PROBLEM DESCRIPTION

One of our latest projects for our customer VyoPath consisted of building a machine learning model capable of detecting different types of network attacks. With NetFlow as the base networking protocol, the model was trained on several data sources, including public datasets, synthetically generated data, and data captured using custom computing.

After the initial training, our client continued to capture data on an ongoing basis. This gave VyoPath the opportunity to re-train the model periodically, keeping it up to date and protected against data drift.

Because network threats are inherently dynamic and new attack modalities keep emerging, the client required an automated solution that would use the incoming stream of data to update the machine learning model on a recurring basis. Ideally, this solution would involve little to no human intervention or maintenance over time.

IMPLEMENTED SOLUTION

Since the data was stored in the cloud in raw format, we had to perform several extraction, cleansing, and transformation tasks before we could re-train the model. Our proposed solution was a KubeFlow pipeline that isolates each task in its own block, connects the blocks in a logical order, and orchestrates scheduled runs.
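The idea of isolated, chained blocks can be illustrated with a minimal plain-Python sketch. It is not the project's pipeline code: the record fields (bytes, packets, label), the cleansing rule, and the derived feature are hypothetical, and in a real KubeFlow pipeline each function would be packaged as its own component.

```python
from typing import Callable, List

Record = dict


def cleanse(records: List[Record]) -> List[Record]:
    """Drop rows with missing fields or zero packets (hypothetical rule)."""
    required = ("bytes", "packets", "label")
    return [
        r for r in records
        if all(r.get(k) is not None for k in required) and r["packets"] > 0
    ]


def transform(records: List[Record]) -> List[Record]:
    """Add a derived feature; the feature itself is illustrative only."""
    return [{**r, "bytes_per_packet": r["bytes"] / r["packets"]} for r in records]


def run_steps(data: List[Record], steps: List[Callable]) -> List[Record]:
    """Run each block in order; a block only sees the previous block's output."""
    for step in steps:
        data = step(data)
    return data


# Toy input standing in for raw, labeled flow records.
raw = [
    {"bytes": 1200, "packets": 4, "label": "benign"},
    {"bytes": None, "packets": 2, "label": "attack"},
    {"bytes": 90, "packets": 0, "label": "attack"},
]
prepared = run_steps(raw, [cleanse, transform])
```

Because each block depends only on its input, a block can be tested, replaced, or re-run on its own, which is the property the pipeline design relies on.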

As a client requirement, the KubeFlow pipeline ran on Google Cloud. Vertex AI Pipelines was therefore used to run and keep track of the pipeline runs, as well as to store artifacts, log events, and send alerts, all while providing seamless integration with other GCP services.

The following diagram is an overview of the architecture we used to solve the client’s needs:

Figure 1 - Proposed Solution Diagram

As the name suggests, the Data capture block is where the data is captured. The data collector and business rules are then used to label the data, which is moved to a Google Cloud Storage bucket (the Input data bucket).

The next block, Automatic model re-training, contains a Cloud Scheduler event and a Cloud Function that together trigger a scheduled run of the KubeFlow pipeline through Vertex AI.
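A Cloud Function of this kind might look roughly like the sketch below. It assumes an HTTP-triggered function called by Cloud Scheduler and the google-cloud-aiplatform SDK. The environment variable names, the "retrain-" display name, and the input_data_uri pipeline parameter are hypothetical and would have to match the actual deployment and the compiled pipeline.

```python
import os
from datetime import datetime, timezone


def build_job_kwargs(env: dict, now: datetime) -> dict:
    """Assemble PipelineJob arguments from environment-style settings.

    All keys read from `env` are assumed names, not the project's real ones.
    """
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return {
        "display_name": f"retrain-{stamp}",
        "template_path": env["PIPELINE_TEMPLATE_URI"],  # compiled pipeline spec
        "pipeline_root": env["PIPELINE_ROOT_URI"],      # where run artifacts go
        "parameter_values": {"input_data_uri": env["INPUT_DATA_URI"]},
        "enable_caching": False,  # re-run every step on fresh data
    }


def trigger_retraining(request=None):
    """HTTP entry point that Cloud Scheduler would call (assumed wiring)."""
    # Imported lazily so the helper above can be exercised without the SDK.
    from google.cloud import aiplatform

    aiplatform.init(
        project=os.environ["GCP_PROJECT"],
        location=os.environ["GCP_REGION"],
    )
    kwargs = build_job_kwargs(dict(os.environ), datetime.now(timezone.utc))
    job = aiplatform.PipelineJob(**kwargs)
    job.submit()  # returns without waiting for the run to finish
    return "submitted", 200
```

Keeping the argument assembly in a separate helper makes the schedule-to-pipeline wiring easy to unit test without touching GCP.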

The pipeline consumes data from the Input data bucket, while the KubeFlow configuration files and the current production model are pulled from the Artifact bucket. If the newly trained model outperforms the existing production version, the new model is pushed to the model repository located in the Artifact bucket. It’s then ready to be leveraged by users and applications.
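The promotion rule described above can be sketched as a small decision function. This is an illustration under stated assumptions: both models are scored on the same held-out data with a metric where higher is better, and the min_gain margin is a hypothetical safeguard that the write-up does not mention.

```python
from typing import Optional


def should_promote(
    candidate_score: float,
    production_score: Optional[float],
    min_gain: float = 0.0,
) -> bool:
    """Return True when the newly trained model should replace production.

    Assumes a higher-is-better metric computed on the same evaluation set.
    `min_gain` (hypothetical) demands a margin so that noise alone does not
    trigger a swap.
    """
    if production_score is None:
        # No production model exists yet, so the candidate is promoted.
        return True
    return candidate_score > production_score + min_gain


# A candidate that ties with production is not promoted.
decision = should_promote(candidate_score=0.91, production_score=0.91)
```

Requiring a strict improvement keeps the production model stable when the two versions are effectively equivalent.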

The figure below shows a simplified diagram of the KubeFlow pipeline created to perform the data cleansing, data transformation, and model training tasks. It also includes the performance comparison module, which decides whether the new model performs better than the production one and, if so, exports it to the model repository.

Figure 2 - Architecture of the KubeFlow Pipeline

Each block of the diagram represents several sub-blocks of the real pipeline. Each sub-block performs a single task, which gives fine-grained control over the execution and makes it more robust. Redundant checks, such as verifying the model’s integrity after the export, were also built into the design to keep the model available every time it’s updated.
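One simple way to check a model's integrity after export is to compare checksums of the trained file and the exported copy. The sketch below assumes this approach and local file paths. The write-up does not say how the check was actually implemented, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_is_intact(source: Path, exported: Path) -> bool:
    """True when the exported copy is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(exported)
```

A stricter variant could also load the exported model and score a few known samples, which would catch files that copy correctly but fail to deserialize.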

Although the solution is fully automated, its alert system informs the client of any failure during execution.

OBTAINED BENEFITS

Due to our expertise in the NetFlow security field and our familiarity with the defined tech stack, VyoPath chose 3XM Group to improve the detection capabilities of its machine learning model.

The main benefits of our automated pipeline solution included:

  • No development time required to train new versions of the model.
  • Significant reduction in error-prone tasks, mainly because no human interaction is needed.
  • Better traceability of the changes and performance metrics.

Once the solution was deployed, the client was able to leverage the newly labeled data, which prevented the machine learning model from drifting over time. This, in turn, increased the accuracy and robustness of the detection system.

Ultimately, having a more accurate and reliable machine learning model should better position VyoPath in the market and lead to a direct increase in revenue.

TECHNOLOGY STACK

→ KubeFlow

→ CatBoost

→ Google Cloud Platform (GCP)

  • Cloud Scheduler
  • Cloud Functions
  • Vertex AI
  • Monitoring
  • Cloud Storage
  • IAM