Gathering insight from data generated across distributed systems in real time, and correlating it with historical data, lets businesses react faster to changes and customer demands. In addition, devices (typically the edges of large distributed systems) are becoming smarter and, correspondingly, able to produce larger and more complex data streams. Simply transferring all raw data from the edges to the backend is not practical, as it impairs the scalability and performance of the overall system.

This is a prototype deliverable that focuses on two major aspects:

1. Providing algorithms that enable anomaly detection on given data. To this end, we have developed two algorithms, (1) outlier detection and (2) binary classification. Both algorithms detect bad runs in the given scenario (see Infineon use case, deliverable D9.2).
   a. The outlier detection algorithm finds values that lie outside an expected band. The detected outliers can be used to inform the operator about the anomaly.
   b. The binary classification uses machine learning approaches: a prediction model is developed and trained on given data. Together with this prediction model, the extracted features are used to predict whether processing data are abnormal or normal.
2. Providing a framework and concepts to support the distributed execution of queries. This means that parts of a query can run close to the data sources (as described above) and other parts can run in the backend. For the algorithms above, this means that the outlier detection, which works on high-rate data, has to run on a node at the data source, while the binary classification can run partly on the node at the source (for extracting and aggregating the features) and partly on the backend (for predicting the outcome).

This document summarizes the core ideas and provides an overview of the developed concepts and algorithms. It is intended to help the reader understand the available prototype and demos. With this deliverable we took a major step towards distributing queries and detecting anomalies on streaming data. Core concepts are available and implemented. We have demonstrated that it is conceptually possible to distribute the query logic between multiple nodes and to detect anomalies early in the process.
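
To make the split between edge and backend more concrete, the following is a minimal Python sketch and not the prototype's actual implementation. It assumes a fixed expected band given as lower and upper bounds, a hand-picked feature set (mean, minimum, maximum), and a linear scoring model with illustrative weights standing in for a trained classifier. All names (`Band`, `detect_outliers`, `extract_features`, `predict_backend`) and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence, Tuple


@dataclass
class Band:
    """Expected value band (assumption: fixed bounds supplied by the operator)."""
    lower: float
    upper: float


def detect_outliers(values: Sequence[float], band: Band) -> List[Tuple[int, float]]:
    """Edge side: return (index, value) pairs that fall outside the band."""
    return [(i, v) for i, v in enumerate(values) if v < band.lower or v > band.upper]


def extract_features(values: Sequence[float]) -> Dict[str, float]:
    """Edge side: aggregate a high-rate run into a few features.

    Only this small dictionary would be sent to the backend,
    instead of the raw data stream.
    """
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def predict_backend(features: Dict[str, float],
                    weights: Dict[str, float],
                    bias: float) -> str:
    """Backend side: classify a run from its features.

    Assumption: a linear score stands in for the trained prediction model.
    """
    score = bias + sum(w * features[name] for name, w in weights.items())
    return "abnormal" if score > 0 else "normal"


if __name__ == "__main__":
    run = [1.0, 1.1, 0.9, 5.0, 1.05]            # hypothetical sensor readings
    band = Band(lower=0.5, upper=1.5)           # hypothetical expected band
    print(detect_outliers(run, band))           # runs on the edge node

    features = extract_features(run)            # runs on the edge node
    weights, bias = {"max": 1.0}, -2.0          # illustrative, not trained
    print(predict_backend(features, weights, bias))  # runs on the backend
```

In this sketch only the aggregated features cross the network, which reflects the motivation stated above for not shipping all raw data from the edges to the backend.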