Extracting valuable insights from data collected by machine sensors can be hard and often requires analyzing data from many sensors in parallel. Because of this complexity, machine learning (ML) methods are becoming increasingly popular for analyzing such datasets. This article describes how to run ML models at the edge to analyze these datasets.
This article was originally published by Crosser.
Why ML @ Edge?
There are two main use cases for ML in industrial IoT:
- Anomaly detection
- Extracting higher-value features, such as remaining uptime
For both of these use cases, there are many situations where it is most natural to execute the ML model at the edge, close to the source of the data. The reasons typically fall into one of three categories:
- Results are only needed locally
For an anomaly detector that analyzes data from one machine and then triggers an action in another machine or a local system, sending data to the cloud for analysis just adds complexity and cost.
- Latency is critical
For machine-to-machine triggers, latency is often critical and timing requirements may not be met if data is analyzed in the cloud.
- Data volume must be reduced
In many cases, the bandwidth available to connected assets is limited, unreliable, or costly, especially for mobile assets. Analyzing the full dataset locally with ML and sending only triggers or higher-value features to the cloud is then often the better solution.
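To make the data-reduction argument concrete, here is a minimal Python sketch in which a rolling z-score detector stands in for a trained ML model (an assumption for illustration, not a Crosser component). Every reading is analyzed locally, but only anomalous readings are returned for forwarding to the cloud. The class `RollingZScoreDetector`, the helper `triggers_only`, and the default window size and threshold are all hypothetical.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flags readings that deviate strongly from a rolling window of recent values.

    Hypothetical stand-in for a trained ML anomaly detector.
    """

    def __init__(self, window_size: int = 50, threshold: float = 3.0):
        self.window = deque(maxlen=window_size)  # most recent readings only
        self.threshold = threshold  # how many standard deviations count as anomalous

    def update(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the current window."""
        is_anomaly = False
        if len(self.window) >= 2:
            n = len(self.window)
            mean = sum(self.window) / n
            std = sqrt(sum((x - mean) ** 2 for x in self.window) / (n - 1))
            # A perfectly constant window (std == 0) never flags anything.
            if std > 0 and abs(value - mean) / std > self.threshold:
                is_anomaly = True
        self.window.append(value)
        return is_anomaly


def triggers_only(readings, detector):
    """Analyze every reading locally, but keep only the anomalies for sending upstream."""
    return [(i, v) for i, v in enumerate(readings) if detector.update(v)]
```

For example, feeding it 30 readings that fluctuate around 10.0, followed by one reading of 25.0, yields a single trigger at index 30; everything else stays on the edge node.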
The ML Workflow
The machine learning workflow consists of two main steps: developing the model and executing it. The first step is an offline operation in which stored data is used to train and tune a model. Once satisfactory results are achieved, the trained model is deployed in an execution environment to make predictions on real-time data. The edge is typically used only for executing the ML model. However, ML model development is an iterative process: the model may be improved over time as more data becomes available or the architecture is refined. Hence, one should expect the ML model at the edge to be updated several times during its life cycle.
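The two-step workflow can be sketched in Python as below. The "model" is deliberately trivial (per-sensor mean and standard deviation learned from normal data), and JSON is assumed as the exchange format; `train_model`, `export_model`, `load_model` and `predict` are hypothetical names. The point is the split: training runs offline on stored data, while the edge only loads the exported artifact and scores live samples, so a retrained model can replace the artifact without changing the edge code.

```python
import json
from statistics import mean, stdev


def train_model(training_rows):
    """Offline step: learn per-sensor statistics from stored historical rows.

    Assumes each sensor column has non-zero spread, since stdev is used as a divisor.
    """
    columns = list(zip(*training_rows))  # one tuple of values per sensor
    return {
        "version": 1,  # bumped whenever the model is retrained
        "means": [mean(c) for c in columns],
        "stds": [stdev(c) for c in columns],
        "threshold": 3.0,
    }


def export_model(model):
    """Serialize the trained model into an artifact that can be shipped to edge nodes."""
    return json.dumps(model)


def load_model(artifact):
    """Edge side: restore the model from the shipped artifact."""
    return json.loads(artifact)


def predict(model, row):
    """Edge step: score one real-time sample; True means 'anomalous'."""
    scores = [abs(x - m) / s for x, m, s in zip(row, model["means"], model["stds"])]
    return max(scores) > model["threshold"]
```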
Machine Learning for Streaming Data
When data is prepared for training an ML model, it is processed to match the requirements of the ML environment. Some common preparation steps are:
- Remove outliers and invalid data, fill in the blanks
- Scale sensor values
- Extract features, such as mean/variance, or perform frequency-domain analysis
- Align values on time
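The first three steps above can be sketched in plain Python as follows; time alignment is a separate problem. The key assumption is that the same functions, with the same constants fixed at training time (valid range `lo`/`hi`, scaling limits `vmin`/`vmax`), run both offline and in the stream, so the model sees identically prepared data. Forward-filling gaps and min-max scaling are only one choice among many, and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def clean(values, lo, hi):
    """Treat None, NaN and out-of-range readings as invalid and forward-fill them
    with the last valid value. Readings before the first valid value are dropped."""
    cleaned, last = [], None
    for v in values:
        valid = v is not None and not math.isnan(v) and lo <= v <= hi
        if valid:
            last = v
        if last is not None:
            cleaned.append(last)
    return cleaned


def scale(values, vmin, vmax):
    """Min-max scale using limits fixed at training time, not per window."""
    span = vmax - vmin
    return [(v - vmin) / span for v in values]


def window_features(values):
    """Summarize a window of prepared readings as (mean, population variance)."""
    return mean(values), pvariance(values)
```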
When executing the ML model in a streaming environment, all of these operations must be applied before the data can be sent to the model. The last operation, aligning data on time, requires special attention.
When training a model, data is usually stored in files or a database with all sensor values present for each time step, so that the model gets the same set of inputs at every step. In a streaming environment, sensor data arrives serially: each sensor sends data at regular intervals, independently of all other sensors and possibly at a different rate. Before we can deliver the streaming data to an ML model, we must align it to regular time boundaries and potentially repeat values from sensors that deliver data at a lower rate.
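A minimal sketch of that alignment in Python, assuming integer timestamps (for example milliseconds) and events given as `(timestamp, sensor, value)` tuples. It walks a regular time grid and, at each boundary, holds the most recent value of every sensor, which repeats values from slower sensors. This illustrates the idea only; it is not how Crosser's Join module is implemented, and `align_on_time` is a hypothetical name. For clarity it works on a batch, whereas a streaming version would emit each row as soon as its boundary has passed.

```python
def align_on_time(events, sensors, interval, start, end):
    """Align serially received (timestamp, sensor, value) events onto a regular grid.

    Each output row is (t, values) where values holds, per sensor, the latest value
    reported at or before t. Rows start once every sensor has reported at least once.
    """
    ordered = sorted(events, key=lambda e: e[0])  # arrival order may differ from time order
    latest = {}
    rows = []
    i = 0
    steps = (end - start) // interval + 1
    for k in range(steps):
        t = start + k * interval  # integer grid avoids floating-point drift
        # Consume every event up to and including this boundary.
        while i < len(ordered) and ordered[i][0] <= t:
            _, sensor, value = ordered[i]
            latest[sensor] = value
            i += 1
        if all(s in latest for s in sensors):
            rows.append((t, tuple(latest[s] for s in sensors)))
    return rows
```

With a vibration sensor reporting every 250 ms, a temperature sensor every 1000 ms and a 500 ms grid, each row contains a fresh vibration value and a temperature value that is repeated until the next one arrives.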
Machine Learning with Crosser
The Crosser Edge Streaming Analytics solution simplifies the development and maintenance of edge computing applications by offering a flow-based programming model, through the FlowStudio visual design tool, and central orchestration of edge nodes through the EdgeDirector. Both tools are available through the Crosser Cloud service. At the edge, the Crosser Edge Node software is installed as a single Docker container, and flows are then deployed and updated through the cloud services on any group of nodes with a single operation.
In addition to the standard tools for cleaning and preparing data, the following tools/coding environments are available to support running ML models at the edge:
- Standard Python environment accessible from within flows
- Central Resource catalog to manage ML models and Python scripts
- Join module for aligning streaming data on time intervals
Bring Your Own AI
Python is the most common environment for developing ML models today. Even so, a wide variety of setups is in use: there are Python versions 2 and 3, and a large number of ML frameworks, such as scikit-learn, TensorFlow, PyTorch, and several commercial options. To ensure that we can host your ML model regardless of the choices made by your data science team, Crosser has decided to introduce the “Bring Your Own AI” concept.
In the Crosser Edge Node, you have access to a standard Python environment (version 2 or 3) and may install any additional libraries or frameworks that can be installed using the standard Python toolchain. This environment is configured through a standard flow module, where you set up the libraries needed and the code you want to run. In this way, you can be sure that your model runs in the environment your model developers expect.
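One way to make that guarantee explicit is to have the model code check, at startup, that the interpreter and installed packages match what the model developers pinned. The sketch below assumes Python 3.8+ (for `importlib.metadata`) and exact version pins; `check_environment` and the pin format are hypothetical and not part of the Crosser module configuration.

```python
import sys
from importlib import metadata


def check_environment(min_python, pinned_packages):
    """Compare the running interpreter and installed packages with the developers' pins.

    min_python: tuple such as (3, 8); pinned_packages: {"package-name": "exact.version"}.
    Returns a list of human-readable problems; an empty list means the environment matches.
    """
    problems = []
    if sys.version_info[:len(min_python)] < tuple(min_python):
        problems.append(
            f"Python {'.'.join(map(str, min_python))}+ required, found {sys.version.split()[0]}"
        )
    for name, wanted in pinned_packages.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}=={wanted} required, not installed")
            continue
        if found != wanted:
            problems.append(f"{name}=={wanted} required, found {found}")
    return problems
```

Failing fast with a clear message at deployment time is usually easier to diagnose than a model that loads but behaves differently because of a mismatched framework version.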
Initiatives such as the ONNX format, initiated by Microsoft and Facebook, aim to provide a standard exchange format for ML models, so that any runtime environment that supports ONNX can execute a model regardless of which framework was used to build it. This will also make it easier to run ML models at the edge. At the time of writing this article, native ONNX support is scheduled for a mid-2019 release.
Managing ML Resources
In order to execute an ML model at the edge, the trained model must be present in each edge node that needs it. In addition, you may need some Python code for the final adaptation of the streaming data and to map the model results into a format that can be used by local consumers.
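The adaptation and mapping code can be as small as the Python sketch below, which assumes messages arrive as dictionaries keyed by sensor name and that the model is any callable returning a score. The feature names in `FEATURE_ORDER`, the output fields, and the 0.5 threshold are hypothetical. The point is that the feature order must match the order used during training, and that local consumers receive a compact, model-independent result instead of raw model output.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical: the feature order the model was trained with.
FEATURE_ORDER = ["temperature", "vibration"]


def to_feature_vector(message: Dict[str, float], feature_order: Sequence[str]) -> List[float]:
    """Adapt a streaming message into the fixed feature order the model expects."""
    missing = [k for k in feature_order if k not in message]
    if missing:
        raise KeyError(f"message lacks features: {missing}")
    return [float(message[k]) for k in feature_order]


def to_consumer_message(timestamp, score: float, threshold: float) -> Dict[str, object]:
    """Map a raw model score into a compact result for local consumers."""
    return {"timestamp": timestamp, "anomaly": score > threshold, "score": round(score, 3)}


def handle(message: Dict[str, float], model: Callable[[List[float]], float],
           threshold: float = 0.5) -> Dict[str, object]:
    """Adapt the input, run the model, and map its output for downstream use."""
    features = to_feature_vector(message, FEATURE_ORDER)
    return to_consumer_message(message.get("timestamp"), model(features), threshold)
```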
The central Resource catalog in Crosser Cloud is used to manage all your ML resources. You can upload trained ML models as well as Python scripts and can then easily reference them when building flows. When you deploy these flows onto edge nodes the system will make sure that all resources needed are downloaded into the relevant edge nodes.
When a flow is updated, for instance, by referencing a new ML model, all edge nodes are automatically updated with a single operation. Flow versioning is used to keep track of your changes.