Comparing the craft ai energy solution to other popular predictive models

by Claire Bizon Monroc | Jan 22, 2019 | Use Case | AI, Data

craft ai energy is the craft ai integration kit for energy prediction. Based on craft ai’s explainable predictive models, it provides an easy way to extract meaningful patterns from your energy consumption data and detect anomalies. But how does it compare to other predictive models?

To answer that question, we compared its performance on two prediction tasks to several widely used machine learning and time series algorithms. Every task is defined by a dataset and an evaluation method.

The datasets

The benchmark was run on two public datasets. Both are multi-year captures of a single household’s consumption:

  • The AMPds2 dataset:

The AMPds2 dataset contains electricity measurements of a household located in Burnaby, Canada, at one-minute intervals from April 1, 2012 to April 1, 2014. Weather data from the Vancouver International Airport’s weather station was added at an hourly frequency for the same period. For the purpose of the benchmark, the data was resampled at 30-minute intervals. Each hourly temperature measurement was assumed to stay relevant for the next two half-hours.

  • The UCI dataset:

The University of California, Irvine’s Individual household electric power consumption Data Set (UCI) was released in 2012 by EDF R&D researchers Hebrail and Berard. It contains electricity measurements of a house located in Sceaux, France, at one-minute intervals from December 16, 2006 to November 26, 2010. Since no additional contextual data was provided, we used the craft ai energy kit to automatically retrieve daily minimum and maximum temperatures from the Dark Sky Time Machine Request API. We further resampled the data to an hourly frequency.
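The AMPds2 resampling described above can be sketched with pandas. This is a minimal illustration on synthetic data, not the code used for the benchmark: the series names `power` and `temperature` are hypothetical, and averaging the readings within each interval is an assumption, since the aggregation method is not stated.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the real data (assumption: one power reading per
# minute and one temperature reading per hour, over three hours).
minutes = pd.date_range("2012-04-01", periods=180, freq="min")
power = pd.Series(np.arange(180, dtype=float), index=minutes, name="power")

hours = pd.date_range("2012-04-01", periods=3, freq="60min")
temperature = pd.Series([5.0, 6.0, 7.0], index=hours, name="temperature")

# Aggregate the one-minute readings into 30-minute intervals.
# Averaging is an assumption; the aggregation actually used is not documented.
power_30min = power.resample("30min").mean()

# Carry each hourly temperature forward so that it covers both of the
# half-hours that follow the measurement.
temperature_30min = temperature.reindex(power_30min.index, method="ffill")

data = pd.concat([power_30min, temperature_30min], axis=1)
print(data)
```

Each hourly temperature appears twice in the result, once per half-hour, which is the "stays relevant for the next two half-hours" assumption.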

Here is a complete overview of the data used in both benchmarks:

| | UCI Dataset | AMPds2 Dataset |
|---|---|---|
| Location | Sceaux, France | Burnaby, Canada |
| Type | Individual household | Individual household |
| Measured variable | Minute-averaged active power | Instantaneous active power |
| Unit | kilowatt | watt |
| Frequency | 1 hour | 30 minutes |
| Depth | 4 years | 2 years |
| Size | 34k | 35k |
| Missing values | Yes: 1.2% | No |
| Weather data | Daily maximal and minimal temperature, retrieved from the Dark Sky API | Punctual temperature, provided in the dataset |

What models was craft ai energy compared to?

  • SARIMA : Seasonal Autoregressive Integrated Moving Average
    SARIMA is an autoregressive model for seasonal time series. Looking at the evolution of a variable through time, such as an energy consumption load, it tries to express its next value as a linear function of the previous ones. It’s a widely used statistical method for time series forecasting. For this benchmark, the model was implemented with the statsmodels module for Python.

  • scikit-learn Regressor Tree :
    scikit-learn is a free machine learning library for Python that implements various machine learning algorithms, including regressor decision trees.

  • scikit-learn Random Forest Regressor :
    This Random Forest Regressor is an ensemble model made of several regressor trees. Each member tree learns to predict a value by training on a part of the data. When the model must predict a value, it delegates this task to each one of its members, and gives the average of all their outputs as the final prediction.

  • Facebook Prophet
    Prophet is Facebook’s own tool for time series forecasting. It explicitly looks for yearly, weekly and daily seasonality in the data to produce an accurate approximation of its non-linear trends.
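The averaging behaviour of the Random Forest Regressor described above can be checked directly with scikit-learn. This is a minimal sketch on synthetic data; the data, the number of trees and the depth are arbitrary assumptions chosen for illustration, not the benchmark’s settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): consumption as a noisy function of the hour.
rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(200, 1))
y = np.sin(X[:, 0] * np.pi / 12) + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=10, max_depth=4, random_state=0)
forest.fit(X, y)

# Ask every member tree for its own prediction, then average them by hand.
X_new = np.array([[6.0], [18.0]])
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest's prediction is the mean of its trees' predictions.
matches = np.allclose(manual_average, forest.predict(X_new))
print(matches)
```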

The Benchmark


The models were evaluated with a rolling-prediction benchmark. Each dataset was repeatedly split into training and test data: at each step, one more week of data was added to the training data, and the models were used to predict the power consumption values for the next, unseen week.
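The rolling scheme can be sketched as an expanding-window split. The function below is a hypothetical helper written for illustration, not part of the craft ai energy kit, and it assumes evenly sampled data with no gaps.

```python
def rolling_weekly_splits(n_samples, samples_per_week):
    """Yield (train_indices, test_indices) for an expanding-window benchmark.

    At step k the model trains on the first k weeks and is tested on the
    following, unseen week.
    """
    n_weeks = n_samples // samples_per_week
    for week in range(1, n_weeks):
        train_end = week * samples_per_week
        test_end = train_end + samples_per_week
        yield range(0, train_end), range(train_end, test_end)


# Hourly data (as for UCI) has 24 * 7 = 168 samples per week.
splits = list(rolling_weekly_splits(n_samples=4 * 168, samples_per_week=168))
for train, test in splits:
    print(f"train on {len(train)} samples, predict the next {len(test)}")
```

With four weeks of data, this yields three steps: train on one week and predict the second, train on two and predict the third, and so on.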


Evaluation scores were then computed at every step by comparing each model’s predictions to the observed ground-truth values. Three metrics were used for this purpose:

  • MAE : Mean Absolute Error
    The MAE is the average of the absolute prediction errors made by a model over a given period. The further a model’s predictions are from the actual values, the larger the MAE.
  • MAPE : Mean Absolute Percentage Error
    The MAPE expresses each error as a percentage of the actual value, then averages those percentages.
  • R2 : Coefficient of Determination
    The R2 score grades models by comparing them to a naive predictive model. A naive model would simply compute the mean energy consumption of an individual and systematically output it as a prediction, whatever the time or the context. Thus any model that received a score of 0 on a given period was only as good as this naive, untrained model: it brought very little predictive value relative to its learning capacity.
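The three metrics can be written out in a few lines of plain Python. This is a sketch of the standard definitions; the benchmark itself may have used library implementations, and the sample values below are made up.

```python
def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (undefined if an actual value is 0)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 minus the ratio of the model's squared
    error to the squared error of always predicting the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
naive = [2.5] * 4  # always predict the mean consumption

print(mae(actual, naive))   # 1.0
print(r2(actual, naive))    # 0.0: the naive model scores exactly zero
print(r2(actual, actual))   # 1.0: a perfect model
```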

Preparing the benchmark

In addition to the exogenous temperature data, the notion of time needed to be built into the predictive models. Time-awareness is an inherent feature of time series forecasting solutions. The craft ai API simulates time awareness by automatically generating seasonal features from every sample’s timestamp: day, month, time, year. This behaviour was reproduced for the other tree-based solutions. The final data fed to the scikit-learn Regressor Tree and the scikit-learn Random Forest Regressor therefore included four more features (hour, day of month, month, and year) on top of the weather information.
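Deriving those four features from a timestamp is straightforward. The sketch below uses only the standard library; the function name and the feature keys are hypothetical, chosen for illustration.

```python
from datetime import datetime


def time_features(timestamp):
    """Derive the four calendar features given to the tree-based models."""
    return {
        "hour": timestamp.hour,
        "day_of_month": timestamp.day,
        "month": timestamp.month,
        "year": timestamp.year,
    }


# First day of the UCI dataset, at 5 pm.
features = time_features(datetime(2006, 12, 16, 17, 0))
print(features)
```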

Predictive models are characterized by parameters that can be tuned to adjust the model to the data it tries to describe. To produce a fair comparison of the different performances, the models were first pre-evaluated on the first 30 weeks of data, after which the parameters producing the best scores for each model were selected. However, because the aim of this benchmark was to show how different models adapt to a simple baseline predictive task, we refrained from over-specializing any model. The parameters used for tuning were:

  • craft ai energy : the depth of the regression tree
  • SARIMA : the (p, d, q) order of the model and of its seasonal component. The seasonal period was set manually for each dataset (48 for AMPds2 and 24 for UCI)
  • scikit-learn Regressor Tree : the depth of the tree
  • scikit-learn Random Forest Regressor : the depth of the trees and the number of trees.

Other parameters were kept as default.
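As an illustration of this kind of tuning, the sketch below selects a tree depth for a scikit-learn regression tree. It is a simplified stand-in for the pre-evaluation, under several assumptions: the data is synthetic, the features are hour and day of week, the candidate depths are arbitrary, and the last 5 of the 30 weeks serve as a single validation block.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic hourly data covering 30 weeks (assumption, for illustration only).
rng = np.random.default_rng(0)
hours = np.arange(30 * 168)
X = np.column_stack([hours % 24, (hours // 24) % 7])  # hour, day of week
y = 1.0 + np.sin((hours % 24) * np.pi / 12) + rng.normal(0, 0.2, size=hours.size)

# Hold out the last 5 weeks of the pre-evaluation period for validation.
split = 25 * 168
X_train, y_train = X[:split], y[:split]
X_valid, y_valid = X[split:], y[split:]

# Try each candidate depth and keep the one with the lowest validation MAE.
candidate_depths = [2, 4, 6, 8]
scores = {}
for depth in candidate_depths:
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    scores[depth] = mean_absolute_error(y_valid, model.predict(X_valid))

best_depth = min(scores, key=scores.get)
print(scores, best_depth)
```

The same loop extends to the Random Forest Regressor by also iterating over `n_estimators`.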

The Results

So what results did we achieve? The graph below plots all the predictions for the UCI dataset against the actual values, week by week, throughout the years. For the first predicted week, starting December 24, 2006, the predictions are far off. This is after just one week of training data, which is not enough for any of the models to pick up the trends and variations in the household’s energy consumption.

As expected, the predictions get better as the models train on more and more data. But they’re also sometimes subject to sudden variations as the models get surprised by the adoption of new habits in the household. This is reflected in their scores.

Let’s look at a smoothed evolution of the Mean Absolute Error (in MW) on the UCI prediction task. The clear decreasing trend mirrors the progressive improvement of the models that we expected. This evolution, however, follows a yearly period (52 units on the graph): every year, the models output better predictions of energy consumption in the summer than in the winter.

Evolution of MAE Error on UCI
Evolution of MAPE Error on AMPds2

To have a better sense of what these performances mean, we can use the R2 scores to compare them to a naive model.

Evolution of the R2 Score on the AMPds2 prediction task

The horizontal bar represents our naive model. By looking at the evolution of the score, we can see the moment in the training when each model became better than a naive model. This happened quite early for our autoregressive time-series solutions (Prophet and SARIMA), which both cross the bar after around 10 weeks of training. Tree-based solutions needed a bit more time to cross that threshold: around 20 weeks. They also remained less robust to sudden changes in habits, as on week 37. In this respect craft ai reacts more sturdily than the other tree-based solutions. It competes with the time-series forecasting models throughout, and outperforms them from week 50 onward.

Overall, the craft ai energy kit achieved an R2 score of 0.21 on UCI and 0.22 on AMPds2, performing above the models’ averages of 0.20 and 0.17 respectively. Take a look at the full R2 score table:

| Model | UCI Dataset | AMPds2 Dataset |
|---|---|---|
| craft ai energy | 0.212 | 0.217 |
| SARIMA | 0.207 | 0.186 |
| scikit-learn Regressor Tree | 0.177 | 0.154 |
| scikit-learn Random Forest Regressor | 0.212 | 0.147 |
| Facebook Prophet | 0.184 | 0.153 |


craft ai energy matched or outperformed both similar machine-learning solutions and statistical time-series models on two predictive tasks. This strong performance, combined with an ability to explain every prediction it makes with clear and simple rules, makes it an ideal candidate for energy prediction tasks.

If you want to learn more about the craft ai energy solution, download the official presentation below or contact us!

Download the PDF!

1 Makonin S., Ellert B., Bajic I. V. and Popowich F. (2016). Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014. Scientific Data, 3(160037):1–12.
2 Berard A. and Hebrail G., EDF R&D, Clamart, France (2012). Individual household electric power consumption Data Set. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.