Predicting the future with Machine Learning

The future is uncertain. Luckily, with a bit of machine learning ‘black magic’ we can look into ways of predicting it, and in this blog post I hope to demystify some of that magic. Predicting the future, or more specifically forecasting the future, has fascinated humans for centuries. From the forecasters of ancient Babylon, to weathermen reporting tomorrow’s weather, to forecasting changes in the stock markets and provisioning staff in call centres, predicting the future has become part of our human ‘evolution’.

In this post I will focus on forecasting from data referred to as time series data. A time series is data that changes over time, captured as (time, value) or (x, y) pairs with x being time. Examples of time series data include stock prices, yearly rainfall, inflation and profits. The area I am interested in is the load on a server in the cloud.

Classical Machine Learning

Before explaining how to predict on time series, let us first discuss machine learning. Classical machine learning aims to classify data into categories termed classes. A well-known case is one where we have data points collected from flowers of the Iris family (either Setosa, Virginica or Versicolor). For each flower sample the length and width of the sepals and petals were measured, and these four measurements form what machine learning refers to as features. Using these features we train a classifier (a mathematical model) on some of the collected data, in each training case telling the model which species the particular sample belongs to (also called a label). After the classifier has been trained, it is tested by presenting a new sample whose label is unknown, and it makes a prediction of which species the new sample belongs to. This is called statistical classification with supervised learning.
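To make the idea concrete, here is a minimal sketch of a classifier on Iris-style measurements. The training pairs below are illustrative values I made up to resemble the three species, and the classifier is a simple 1-nearest-neighbour rule rather than any particular library model:

```python
import math

# Illustrative (features, species) training pairs; the features are
# sepal length, sepal width, petal length and petal width in cm.
training = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.1, 1.5, 0.1), "setosa"),
    ((5.9, 2.8, 4.3, 1.3), "versicolor"),
    ((6.1, 2.9, 4.7, 1.4), "versicolor"),
    ((6.5, 3.0, 5.5, 1.8), "virginica"),
    ((7.2, 3.2, 6.0, 1.8), "virginica"),
]

def classify(sample):
    """1-nearest-neighbour: predict the label of the closest training point."""
    return min(training, key=lambda fx: math.dist(fx[0], sample))[1]

print(classify((5.0, 3.4, 1.5, 0.2)))  # nearest to the setosa samples -> "setosa"
```

In a real project one would use a library classifier trained on the full Iris data set, but the pattern is the same: features in, label out.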

Time series

Classical machine learning uses features, like the petal and sepal lengths of flowers, the symptoms presented by a patient or the chemical composition of a wine, in order to make a decision about a new sample. In contrast, a time series does not contain a collection of features, but rather a value at each time step. Time series prediction assumes that the future depends on the past, and under this assumption it views each previous time step x[t-1], x[t-2], … as a feature. A time series typically also has one or more of the following characteristics: it contains randomness, follows a trend, and shows seasonal and/or cyclical behaviour. Predicting or forecasting a value (or set of values) into the future is the ‘decision’ the machine learning model must make. The question now is: how does one approach a time series prediction problem?

Steps in forecasting

The following are the generic steps in forecasting time series:

  1. Define the problem to be solved. It is vital to consider what the end result of the forecast will be and how the forecast result will be used.
  2. Gather data. The saying is “There is no data like more data”; more data enables better training, testing and validation.
  3. Explore your data. Use statistical tools to plot the data in different ways and determine its characteristics: does it contain patterns, is it too random, or does it follow a trend?
  4. Choose models and fit them. These can be simple models, such as using the mean of past data points, or complex approaches such as neural networks. I give a brief overview of models currently applied to time series below.
  5. Evaluate your model accuracy. This entails measuring error metrics and optimising model parameters.

I will now discuss steps 3-5 in more depth. Steps 1 and 2 are straightforward and depend on the specific problem you are facing.

3. Explore your data: Exploratory Data Analysis aims to give insight into the data, discovering the structure, behaviour, patterns, outliers and other information that helps with choosing a forecasting model. A key assumption of time series prediction is that the data is stationary: a stationary process’s joint probability distribution does not change over time. Tools to determine stationarity include calculating the auto-correlation of the data or applying the Augmented Dickey–Fuller test. It is also important to determine whether the data follows a trend or contains seasonal behaviour. Some models are not capable of accurately predicting on data which contains a trend or a periodic pattern, so one will typically remove these behaviours by transforming the data (e.g. using an FFT or first-order differencing).
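Differencing, one of the transformations mentioned above, is simple enough to sketch directly: replace each value with its change from the previous value, which removes a linear trend (and, applied repeatedly, higher-order trends):

```python
def difference(series, order=1):
    """Differencing: d[t] = x[t] - x[t-1], applied `order` times.
    A first-order difference removes a linear trend from the series."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

trended = [2, 4, 6, 8, 10]   # a linear trend: clearly not stationary
print(difference(trended))   # [2, 2, 2, 2] -- the trend is gone
```

After forecasting on the differenced series, the differences are accumulated back to recover forecasts on the original scale.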

4. Choose and fit models: The simplest ‘model’ is calculating a mean over the past few samples and using this as the prediction. One step up is to plot a histogram of your data and use the average of the samples in the most significant bin (the bin into which the majority of samples fall). Moving Average (MA) models calculate an average of the samples in a sliding window over time, with Exponential Smoothing weighting each lagged sample exponentially less. Auto-Regressive (AR) models express the current value as a function of past values plus a random error term e that captures the noise in the data: y(t) = f(t) + e. Combining MA and AR models gives ARMA(p,q) models, with the order (p,q) defining the orders of the AR(p) and MA(q) parts respectively.
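The two simplest of these models fit in a few lines each. The sketch below shows a moving-average forecast over a sliding window and basic exponential smoothing, where each older sample is weighted exponentially less via the smoothing factor alpha:

```python
def moving_average(series, window):
    """MA-style forecast: the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exponential_smoothing(series, alpha):
    """Exponentially weighted forecast: s[t] = alpha*x[t] + (1-alpha)*s[t-1],
    so older samples contribute exponentially less (0 < alpha <= 1)."""
    s = series[0]
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

data = [3.0, 4.0, 5.0, 4.0, 6.0]
print(moving_average(data, window=3))         # (5 + 4 + 6) / 3 = 5.0
print(exponential_smoothing(data, alpha=0.5)) # 5.0625
```

Despite their simplicity, these often make surprisingly strong baselines to compare more complex models against.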

More advanced models include Markov chains and neural networks. The idea behind the former is that one can discretise a time series’ values into M bins, which then denote M states, and use a Markov model to predict the probability of transitioning from one state to another at each time step. Neural networks aim to learn patterns from past values and predict a new value, or a probability distribution over a set of values, into the future. Other advanced methods include combining different models or vectorising a collection of time series in order to improve model accuracy.
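The Markov-chain idea can be sketched directly: once the series is discretised into states, estimate the transition probabilities by counting how often each state follows each other state. The three ‘load bins’ below are an assumed example, in the spirit of the server-load use case:

```python
from collections import defaultdict

def transition_probabilities(states):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

# A series already discretised into three load bins: low / mid / high.
states = ["low", "low", "mid", "high", "mid", "low", "mid"]
probs = transition_probabilities(states)
print(probs["low"])  # e.g. "low" is followed by "mid" 2 times out of 3
```

To forecast, one reads off the most probable next state (or the full distribution) from the row for the current state.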

5. Model evaluation: After fitting a model and forecasting some values, it is vital to be able to measure the accuracy of both the model and the predictions. The typical metric is the Root Mean Squared Error (RMSE), also referred to as the Root Mean Squared Deviation (RMSD), calculated by: \operatorname{RMSD}=\sqrt{\frac{\sum_{t=1}^n (\hat y_t - y_t)^2}{n}}, with \hat y_t being the predicted value and y_t the actual value at time t.
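The RMSE formula translates directly into code, comparing a list of forecasts against the observed values:

```python
import math

def rmse(predicted, actual):
    """Root Mean Squared Error: sqrt(mean of squared prediction errors)."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

print(rmse([2.0, 3.0, 5.0], [2.0, 5.0, 5.0]))  # sqrt(4/3) ~= 1.1547
```

Because the errors are squared before averaging, RMSE penalises a few large misses more heavily than many small ones, which is often what you want when provisioning against load spikes.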


The future is still uncertain, but using forecasting models from machine learning, I conjured some ‘magic’ to help us predict it. This blog post briefly covered classical machine learning, namely classification with supervised learning, and focused on the steps to take when forecasting on time series data. I discussed popular models currently applied to time series and also mentioned advanced approaches using Markov models and neural networks.
