The IEEE BigData 2019 conference launched a new kind of machine learning competition, based on a live data stream of a univariate time series.
Time series prediction is a field of expertise of Craft AI, so we decided to test some tools we have in development during the competition.
The goal was to receive live data every 5 seconds and make a prediction for the next 5 seconds. This pace meant that the solution had to be a program able to run for the whole 3 hours of the competition without human intervention. A few weeks before the real competition, the organizers provided a sample of the data stream so that participants could build their solutions. After some exploration, it appeared that the stream followed a periodic pattern that moved over time, with noise and random anomalies.
Both the period and the pattern were drifting, which ruled out a static model. The solution had to adapt itself, learning from recent data and forgetting old data. First we built a model representing the topology of the stream: a moving period, a moving pattern, some anomalies and some noise.
To work well, this model needed 9 parameters to be fitted, which led us to a common problem in time series prediction: making the parameters evolve continuously over time. This idea of adapting to concept drift and continuously updating the predictive model is a key part of the Craft AI engine. In the context of this competition, we chose a genetic algorithm to fit these parameters, running at the same time as the prediction script. The genetic algorithm keeps trying different sets of parameters, tests them on predictions over the past minutes of the stream, and updates the parameters used by the prediction script each time it finds a set that performs better.
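To make the idea concrete, here is a minimal sketch of how a genetic algorithm can refit parameters against a window of recent observations. It is not the code we ran in the competition: the three-parameter sine model and every name in it (`predict`, `error`, `mutate`, `evolve`, `maybe_update`) are hypothetical stand-ins chosen for illustration, and the real model had 9 parameters.

```python
import math
import random


def predict(params, t):
    """Hypothetical stand-in model: a sine wave with amplitude, period and phase."""
    amplitude, period, phase = params
    period = max(abs(period), 1e-6)  # guard against a zero period after mutation
    return amplitude * math.sin(2 * math.pi * t / period + phase)


def error(params, window):
    """Mean absolute error over a window of recent (time, value) observations."""
    return sum(abs(predict(params, t) - y) for t, y in window) / len(window)


def mutate(params, rng, scale=0.1):
    """Perturb each parameter with Gaussian noise proportional to its magnitude."""
    return tuple(p + rng.gauss(0, scale * max(abs(p), 1e-3)) for p in params)


def evolve(population, window, rng, generations=30, elite=5):
    """Run a few generations and return the best parameter set found."""
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: error(p, window))
        parents = ranked[:elite]  # elitism: the best sets survive unchanged
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each parameter comes from one of the two parents.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            children.append(mutate(child, rng))
        population = parents + children
    return min(population, key=lambda p: error(p, window))


def maybe_update(current, candidate, window):
    """Swap in the candidate only if it beats the parameters currently in use."""
    return candidate if error(candidate, window) < error(current, window) else current
```

In a live setting, something like `evolve` would presumably run in the background on the latest window of the stream, with `maybe_update` deciding whether the prediction script switches to the new parameters. Because the best sets are carried over unchanged from one generation to the next, the error on the window can only stay the same or improve.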
The published solutions can be found at Real-Time Machine Learning Competition on Data Streams at the IEEE Big Data 2019.
It turns out we won the competition, with the best predictions but also with the fastest published solution. Needing less than 0.1 second to make each prediction, our solution could handle data streams 50 times faster than the competition's 5-second interval! Results will be announced during the conference, which we are pleased to attend.