Anomaly detection in time series data

Anomaly detection is one of the areas I work on at Gauss Algorithmic. So at Machine Learning Prague 2019, I was curious to hear Vítězslav Vlček's lecture "Data-driven System health determination in Monitoring Softwares for Operational Intelligence", in which he was to present his methods of anomaly detection.

Major problems in anomaly detection

Unlike in other problems commonly encountered in machine learning, the hardest part here is determining the "degree of anomaly" of individual cases. The small amount of labeled data is also a problem. Together with the heavy imbalance between anomalous and normal examples, this makes common techniques almost impossible to use.

Time-series anomalies

The main topic of the lecture was anomaly detection in time series, specifically in several interdependent series. The example presented used CPU, RAM and disk usage. Because these three time series are closely interconnected, it is difficult to create a prediction model for all three variables. And even if such a model were successfully created, it would be extremely complex and difficult to interpret.

Vítězslav Vlček presented his own method of solving this problem, inspired by the wave function collapse algorithm. In this approach, behavior that has not previously occurred in these metrics is considered anomalous. To give you a better idea, I have visualized all three signals (CPU, RAM and disk) in a graph.

***Figure 1: Processor, RAM and disk usage***

We split these three signals into tiles according to a selected time interval, which lets us model their interdependence. Their subsequent development can be predicted by "placing" tiles so that they connect as well as possible to all three signals at once. An anomaly is then defined by the difference between the actual development of the signals and the tile-based prediction.
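To make the tile idea more concrete, here is a minimal Python sketch. It is a simplified illustration, not the implementation from the lecture. The function names (`make_tiles`, `predict_end`, `anomaly_score`), the nearest-start matching rule and the use of Euclidean distance are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def make_tiles(signals, width):
    """Cut aligned signals into tiles of `width` steps.

    Each tile keeps only the start and end value of every signal,
    so a tile is a pair (start_values, end_values).
    """
    length = min(len(s) for s in signals)
    tiles = []
    for i in range(0, length - width, width):
        start = tuple(s[i] for s in signals)
        end = tuple(s[i + width] for s in signals)
        tiles.append((start, end))
    return tiles


def predict_end(tiles, start):
    """Assumed matching rule: "place" the known tile whose start is
    closest to the current values of all signals at once."""
    best = min(tiles, key=lambda tile: dist(tile[0], start))
    return best[1]


def anomaly_score(tiles, start, actual_end):
    """Distance between the observed end of the interval and the
    end predicted by the best-fitting tile."""
    return dist(predict_end(tiles, start), actual_end)


# Made-up history in which CPU and RAM rise and fall together.
cpu = [10, 20, 10, 20, 10, 20, 10]
ram = [30, 35, 30, 35, 30, 35, 30]
disk = [5, 5, 5, 5, 5, 5, 5]
tiles = make_tiles([cpu, ram, disk], width=1)

normal = anomaly_score(tiles, start=(10, 30, 5), actual_end=(20, 35, 5))
spike = anomaly_score(tiles, start=(10, 30, 5), actual_end=(90, 35, 5))
```

With this toy history, a step that repeats a known pattern scores zero, while a CPU jump that was never seen before gets a large score. In practice a threshold on the score would have to be chosen for the data at hand.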


***Figure 2: Splitting a time interval into tiles***

The advantage of this method is that anomalies do not repeat: if an anomaly occurred in the past, there is a tile the anomaly can be compared to. If that is not the desired behavior, it is still possible to implement a system of forgetting old tiles, or to use only the tiles that have appeared at least once before. The method is expected to have low memory demands, since it is not necessary to save the entire course of each signal within a tile: only the coordinates of the start and end for each signal are saved.
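The storage and forgetting ideas can be sketched as a small tile store. This is a hypothetical illustration, not code from the lecture: the class name `TileStore`, the `resolution` used to round values so that similar tiles share a key, and the `min_count` and `max_age` parameters are all assumptions.

```python
class TileStore:
    """Remembers tiles by their start and end values only."""

    def __init__(self, resolution=5.0):
        self.resolution = resolution  # assumed bucket size for rounding
        self.seen = {}  # tile key -> [occurrence count, last seen step]

    def _key(self, start, end):
        # Round each value to a bucket so near-identical tiles match.
        def quantize(values):
            return tuple(round(v / self.resolution) for v in values)
        return quantize(start), quantize(end)

    def observe(self, start, end, step):
        """Record one occurrence of a tile at time step `step`."""
        entry = self.seen.setdefault(self._key(start, end), [0, step])
        entry[0] += 1
        entry[1] = step

    def is_known(self, start, end, min_count=1):
        """True if the tile has been seen at least `min_count` times."""
        entry = self.seen.get(self._key(start, end))
        return entry is not None and entry[0] >= min_count

    def forget(self, current_step, max_age):
        """Drop tiles that were last seen more than `max_age` steps ago."""
        self.seen = {
            key: entry
            for key, entry in self.seen.items()
            if current_step - entry[1] <= max_age
        }


store = TileStore(resolution=5.0)
# Values are (CPU, RAM, disk) at the start and end of one interval.
store.observe((10, 30, 5), (20, 35, 5), step=0)

similar_known = store.is_known((11, 31, 5), (19, 34, 6))
spike_known = store.is_known((10, 30, 5), (90, 35, 5))

store.forget(current_step=100, max_age=50)
known_after_forget = store.is_known((10, 30, 5), (20, 35, 5))
```

Each tile costs only two short tuples plus a counter and a time step, regardless of how long the interval is. Raising `min_count` makes rarely seen tiles count as unknown, and `forget` lets a past anomaly be reported again once its tile has aged out.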

Evaluation

In my opinion, the idea behind this method of detecting anomalies is interesting. The algorithm itself is simple and computationally undemanding. However, its use might be somewhat limited, because it applies only to a very specific kind of problem. I will follow the further development of this method as well as its use.