Forecasting with Machine Learning for Marketers

Simon Löfwander
Reading time: 9 minutes
2nd October 2018

Much of the data we see in analytics and digital marketing is presented in a time series format. When we track sessions, goal conversions or any KPI over time, we almost exclusively deal with time series data. It is such a widely used way to present online data that many of us even think of a time series plot when we think of analytics. It makes sense; time series provide an excellent representation to identify what has happened in the past. But what can they tell us about the future?


Figure 1: The familiar-looking time series plot with distinctive seasonality

It’s often overlooked that we can use the very same time series that helps us explain the past to predict what will happen going forward. Statistical models range from naïve approaches, which predict that a future value will equal the observation from the same point a year, month or day earlier, to full-blown machine learning methods. By exploiting patterns from the past, they let us make calculated bets on the future movement of KPIs.
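To make the naïve end of that range concrete, here is a minimal sketch of a seasonal naïve forecast, which simply repeats the most recent season. The function name `seasonal_naive` and the session counts are made up for illustration and are not taken from any particular tool.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast each future point as the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

# Hypothetical daily sessions for two weeks (Mon..Sun), invented for illustration.
sessions = [120, 130, 125, 128, 140, 90, 80,
            122, 133, 127, 131, 144, 95, 84]

# Predict next week by repeating the most recent week.
next_week = seasonal_naive(sessions, season_length=7, horizon=7)
print(next_week)
```

A forecast like this is rarely the final answer, but it is a useful baseline: a more advanced model should at least beat it.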

Why use time series forecasting?

Predicting what the future holds can be a great advantage no matter which field we’re in, provided we’re at least moderately successful at it. Time series forecasting, part of the data science toolbox, is a statistical technique we can use to maximise our chances of success.

Speculating on the future using statistical models was popular long before the terms ‘data science’ and ‘machine learning’ became trendy to throw around, even though time series forecasting is now often categorised under these umbrella terms.

Of course, it was also popular to speculate long before statistical models started to do the heavy lifting, using tools such as ‘your gut’, tea leaves or star constellations. Whether or not one believes in tea-leaf prophecies, we’d be wise to consider the power of predictions made by more sophisticated models to guide us into the future. If you play your cards right, forecasting is a tangible strategy to increase revenue and profits for many businesses.

The most obvious example of a time series forecasting application can be found in the financial markets (it is also the most difficult; there is no ‘free lunch’). If we could somehow figure out the future value of stocks, it would be easy to turn that information into value, and sooner or later we’d all be millionaires. Many have tried, and almost as many have failed, which can be explained by the simple fact that stock movements are hard to predict from past values alone.

There may be some patterns in the data, but when new information hits the market and gets reflected in the stock value, the new price seems random when looking at past values alone (successful stock predictions incorporate additional information alongside past movements, but more on such models later). This touches on a very important subject in the realm of time series forecasting: predictability.

How well is a time series explained by its past values? Is there seasonality, a trend or some other apparent pattern in the data that we can exploit to predict the future? This is important because if a time series is random, the movement on one day has nothing to do with the movements on any of the previous days. Even the most intricate machine learning system would have no chance of predicting it.


Figure 2: A so-called “random walk”, which we can forget about predicting accurately

On the other hand, if there are patterns in the data, we have a good chance of making quality forecasts and bounding future values into ranges of certainty. It’s a chance that increases with our model choice, how well it is tuned and the strength of the patterns in the data.
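One rough way to check whether a series carries exploitable patterns is its autocorrelation, that is, how strongly values correlate with earlier values at a given lag. The sketch below compares a synthetic series with a weekly pattern to the day-to-day steps of a synthetic random walk. All data is generated for illustration, and the helper `autocorr` is a hypothetical name.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)
n = 364  # 52 weeks of synthetic daily data

# Synthetic series with a weekly pattern plus noise (illustrative only).
weekly_pattern = np.array([100, 105, 102, 104, 110, 70, 60], dtype=float)
seasonal = np.tile(weekly_pattern, n // 7) + rng.normal(0, 5, n)

# Synthetic random walk: each value is the previous one plus independent noise.
random_walk = np.cumsum(rng.normal(0, 5, n))
steps = np.diff(random_walk)  # the day-to-day movements

seasonal_lag7 = autocorr(seasonal, 7)
steps_lag1 = autocorr(steps, 1)
print(f"weekly series, lag 7: {seasonal_lag7:.2f}")
print(f"random-walk steps, lag 1: {steps_lag1:.2f}")
```

A value close to 1 at the seasonal lag suggests a pattern worth modelling, while a value close to 0 for the steps of a random walk suggests there is little to learn from past movements.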

On the application in SEM & Analytics

The good news is that website data usually exhibits the characteristics we need to perform reliable forecasts.

An e-commerce site selling student literature is more likely to make sales when a semester starts and on weekdays throughout rather than on weekends, when the target audience is likely out partying, or rather studying. On the other hand, a gambling site is more likely to get new customers on evenings and weekends for its casino games and on match days or during big events for its betting counterpart.

All these events and tendencies show up as patterns in our data, which a machine learning model can use to its advantage to provide high quality forecasts. Having access to reliable forecasts on the number of future visits to your e-commerce or gambling site may not be as valuable as the equivalent for the stock market. However, it can still provide us with a great deal of insight that can be converted into profits if acted on correctly, which we’ll touch on later in this article.

So, how do we know if we can trust a model to predict the future for us? This is not a “one answer to rule them all” kind of question.

As with most things, there are a few ways we can go about it. However, we’ll focus on one approach that uses an important machine learning concept to help determine whether the model we have built and the time series data it has been fed are enough to provide a reliable forecast.

The technique comes from ‘supervised learning’ and works by splitting the time series data into two parts: a training set and a test set. The training set is used, not surprisingly, to train the model. The model identifies the patterns in the data and breaks its movements and characteristics down into different components, such as weekly or monthly seasonality, trends and holiday effects.


Figure 3: Screenshot of the components of a time series and identified anomalies from “Ayima Intelligence”

After a model has been trained, it is used to predict the period covered by the test set, without knowing what happened during it. By comparing the predictions to the real values, we get a good picture of the model’s capacity to predict the future. If we deem that ability good enough, the model is retrained on all available data and used to forecast. Forecasting using this technique is automated in our data science platform “Ayima Intelligence”, from which the screenshots in figures 3 and 4 are taken.


Figure 4: Example of supervised learning in time series forecasting from “Ayima Intelligence”
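The train-and-test procedure can be sketched in a few lines. This is a minimal illustration on synthetic data, not the implementation used in “Ayima Intelligence”. The stand-in model is a simple seasonal naïve forecast, the error metric (MAPE) is one common choice among several, and all names and numbers are assumptions made for the example.

```python
import numpy as np

def seasonal_naive(train, season_length, horizon):
    """Stand-in model: repeat the last observed season."""
    last_season = np.asarray(train[-season_length:], dtype=float)
    reps = -(-horizon // season_length)  # ceiling division
    return np.tile(last_season, reps)[:horizon]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Synthetic daily sessions: weekly pattern, mild upward trend, noise.
rng = np.random.default_rng(42)
n_days = 140
pattern = np.array([100, 105, 102, 104, 110, 70, 60], dtype=float)
series = (np.tile(pattern, n_days // 7)
          + 0.1 * np.arange(n_days)
          + rng.normal(0, 3, n_days))

# Split by time, never at random: the test set is the most recent 28 days.
test_days = 28
train, test = series[:-test_days], series[-test_days:]

# Predict the test period using only the training data, then compare.
forecast = seasonal_naive(train, season_length=7, horizon=test_days)
error = mape(test, forecast)
print(f"MAPE on the held-out period: {error:.1f}%")
```

The important design choice is that the split follows time order, so the model never sees data from the period it is asked to predict.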

Combining the past with our planning

The above example produces forecasts by using patterns found in the past values of the same data, be it trends, seasonality or recurring but irregular events. This is useful, but we often want to include more variables to support the predictions.

Changes in strategy, the market or budgets often have a significant impact on KPIs, similar to how the stock market is influenced by new information. If we fail to incorporate these in times of change, our forecasts might provide little value.

With this perspective comes the next question: how do we take, for example, planned media spend into account when predicting revenue for a site?

The answer to our prayers is called ‘dynamic regression’, which combines the time series forecasting part with a regression component, measuring the causal relationship to another variable. For example, to predict the revenue of this year’s Black Friday sales we can use seasonality and trends, the impact of the event last year, and the planned media spend (which very well could be different this time around).
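As a rough sketch of the idea, the example below fits revenue on media spend plus day-of-week effects using ordinary least squares. This is a simplification: dynamic regression as usually described also models the leftover errors with a time series model (for example ARIMA), which is omitted here. The data is synthetic, and the log-spend form and all names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 182  # 26 weeks of synthetic history
day_of_week = np.arange(n_days) % 7

# Synthetic inputs; every number below is invented for illustration.
spend = rng.uniform(500, 2000, n_days)
weekday_effect = np.array([0, 50, 30, 40, 120, -200, -260], dtype=float)
revenue = (1000 + 400 * np.log(spend)
           + weekday_effect[day_of_week]
           + rng.normal(0, 40, n_days))

def design_matrix(spend, day_of_week):
    """Columns: intercept, log(spend), six day-of-week dummies (day 0 is the baseline)."""
    dummies = (day_of_week[:, None] == np.arange(1, 7)).astype(float)
    return np.column_stack([np.ones(len(spend)), np.log(spend), dummies])

# Fit the seasonal pattern and the spend effect together.
X = design_matrix(spend, day_of_week)
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
spend_effect = coef[1]

# Forecast the next 7 days under two hypothetical media plans.
future_days = np.arange(n_days, n_days + 7) % 7
plan_high = design_matrix(np.full(7, 1800.0), future_days) @ coef
plan_low = design_matrix(np.full(7, 700.0), future_days) @ coef
print(f"estimated log-spend coefficient: {spend_effect:.0f}")
print(f"weekly revenue, high plan: {plan_high.sum():.0f}, low plan: {plan_low.sum():.0f}")
```

Because planned spend enters the model as an input, the same fitted model can be asked “what if” questions about different media plans.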

Transforming insights into value

Now, moving on to the million-dollar question: “how do we turn these predictions into value?”

To answer, let’s first follow up on the example of revenue and media spend. What we base the concluding reasoning on is ‘the law of diminishing returns’, an important concept for profit maximisation.

The law simply states that returns are not linear in spend. In other words, the more we spend, the smaller the marginal effect of each additional unit of spend. This means we want to ‘cap’ our media spend at some level, because the marginal returns beyond that threshold make the extra spend not worthwhile. We are even likely to reduce profits by spending too much.

As mentioned above, dynamic regression can help us find out at what level of media spend we will maximise profit during an event such as Black Friday, or any other period for that matter, by considering the additional impact of each extra unit of spend.

Figure 5: Output of a dynamic regression – the causal relationship with historical media spend is evaluated and used in combination with past patterns to predict revenue over a future period. We clearly see how the lower planned media spend causes the expected revenue to drop.

Another example is demand forecasting, which enables us to find out what products customers want and when they want them. Capitalising on such an approach, an e-commerce site could make changes in product price and appropriately plan product sales.

As an example, imagine we forecasted the expected sales of a dress over some period. The forecasts would enable profit-maximising price decisions when combined with an analysis of customers, cost, price elasticity and competition. Furthermore, they would aid logistics and storage capacity planning, ensuring the most in-demand products will be available and distribution systems prepared. This would then lead to improved customer satisfaction, subsequently increasing retention rates, a metric so important we can’t afford to overlook it.

Some concluding words & results

We hope this helped shine a light – albeit just a little bit – on time series forecasting and its end-goal capacity to increase revenue and profits.

To spark that interest a bit further, let’s revisit the points made regarding figure 5 and try to work out the optimal media spend going forward. To do so, we only need to define profit based on our findings and visualise the profit curve, which is not as hard as it sounds.

Figure 6: By using historical data and the causal relationship between revenue and media spend, we’re able to find the optimal media spend going forward for profit maximisation. To find the optimal spend level, we simply solve for the maximum of the profit curve using the first-order derivative.
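To illustrate the calculation, here is a minimal sketch under stated assumptions. It assumes revenue follows a diminishing-returns curve, revenue = a + b * ln(spend), and that a fixed share of revenue (`margin`) is kept as gross profit. Neither the functional form nor the numbers come from the analysis behind the figures; they are hypothetical. Setting the derivative of profit to zero then gives the optimal spend as margin * b.

```python
import numpy as np

# Hypothetical values; in practice a and b would come from the fitted model.
a, b = 1000.0, 400.0   # assumed curve: revenue(spend) = a + b * ln(spend)
margin = 0.5           # assumed share of revenue kept as gross profit

def profit(spend):
    """Gross profit after media cost under the assumed revenue curve."""
    return margin * (a + b * np.log(spend)) - spend

# First-order condition: d(profit)/d(spend) = margin * b / spend - 1 = 0
optimal_spend = margin * b

# Sanity check against a brute-force search over a grid of spend levels.
grid = np.linspace(1, 2000, 200_000)
grid_best = grid[np.argmax(profit(grid))]
print(f"analytic optimum: {optimal_spend:.0f}, grid optimum: {grid_best:.0f}")
```

The second derivative, -margin * b / spend², is negative for any positive spend, so the point found is a maximum rather than a minimum.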

Now let’s see how using the optimal media spend changes the predicted revenue from figure 5.

Figure 7: Predicted revenue and profit for the prediction period by considering optimal media spend, compared to current spend levels.

Seemingly, by incorporating causal relationships in our forecasts we can optimise profit going forward, whilst simultaneously getting access to forecasts which consider the expected impact our change of plans has on total revenue.

Finally, it should be noted that the predictions are in fact predictions and should not be taken as an indisputable truth. That being said, by carefully training and tuning machine learning models to exploit patterns and causal relationships we’re doing far better than drawing numbers out of a hat, or trying to make sense of tea leaves for that matter.  

If you want to know more, have any questions or need help to produce high quality forecasts, you can learn about our data science offering here.
