CONSTANTIN
31 May 2021 • 7 min read
This is the first in a series of blog articles giving an insight into the research done at Trality.
The majority of research in machine learning for algorithmic trading happens behind closed doors. Even after the alpha of specific strategies vanishes, the knowledge and methodology are kept secret. At Trality we want to change this. Our goal is to create an environment for quants of all levels of experience in which creativity can flourish and in which, in alignment with the ethos of decentralization, investment ideas and capital are crowd-sourced within the community. We provide the tools and infrastructure needed to enable researchers, engineers and traders to focus on what is most important: generating alpha. While we are very clear that the alpha itself should be a well-protected secret, we believe that entering an open discussion and sharing knowledge will benefit the whole community and help it compete with the big players.
This first series of blog articles will cover three topics: common pitfalls, financial data aggregation and stationary time series.
As in any other field, it is a good idea to start one's journey by getting an overview of the currently available research. One important aspect of this is learning about errors that others have already made and avoiding them. This saves a lot of time otherwise wasted following dead-end paths.
We will take a look at some of the most common pitfalls which everyone should be aware of before starting to implement machine learning ideas. This will give you a head start on your research journey.
The first thing that comes to mind when thinking about machine learning for algorithmic trading is simply predicting the future price from the recent price history, i.e. predicting on financial time series data. However, predicting financial time series comes with a particularly nasty pitfall which can be very deceptive: the time lag.
After getting our hands on some price data and implementing a common deep learning model with our favorite machine learning library, we are eager to see some results. After training the model for a few epochs and watching the loss function (e.g. MSE) improve, we plot the predictions to judge them with our own eyes.
At first glance we are quite impressed with what our model has accomplished. Our result might look like Figure 1: the prediction in orange seems to match the real values in blue pretty accurately.
However, we are missing something. We will see what that is if we zoom in on the results in Figure 2.
The magnification shows that the prediction closely resembles the true values but is shifted by exactly one step into the future. In other words, our model simply uses the most recent piece of history, the last known price, to predict the next price. Learning this behaviour often does improve the metrics and, as we saw, it also fools our perception of the model's performance. The model is not doing anything wrong per se, since this prediction often corresponds to a local minimum of the loss. Unfortunately, it does not help us predict in which direction the price is going to move.
To handle time-lag predictions, we can introduce some metrics that help us identify them quickly. To get started, we define a "model" that does exactly what happens in the case of time lag: for any input length, its prediction for the next step at t is simply the value at t-1. We can use this model as a benchmark and compare the losses and metrics of our machine learning model against this simple lag prediction. If our machine learning model beats the benchmark, we are most likely on the right track. Another option is to check the correlation between the lag prediction and the machine learning model's prediction: if they are highly correlated, chances are high that we have run into the local minimum representing the lag prediction. A sketch of such a benchmark check is shown below.
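A minimal sketch of this check in Python (the array names and the 0.95 correlation threshold are illustrative choices, not part of any fixed recipe):

```python
import numpy as np

def lag_benchmark_check(prices: np.ndarray, model_preds: np.ndarray) -> None:
    """Compare model predictions against a naive 'repeat the last value' benchmark.

    prices:      true price series, shape (T,)
    model_preds: the model's one-step-ahead predictions for prices[1:], shape (T-1,)
    """
    targets = prices[1:]        # values the model tries to predict
    lag_preds = prices[:-1]     # benchmark: prediction at t is simply the value at t-1

    mse_model = np.mean((model_preds - targets) ** 2)
    mse_lag = np.mean((lag_preds - targets) ** 2)

    # correlation between the model's predictions and the naive lag prediction
    corr = np.corrcoef(model_preds, lag_preds)[0, 1]

    print(f"MSE model: {mse_model:.6f}  MSE lag benchmark: {mse_lag:.6f}")
    print(f"Correlation model vs. lag benchmark: {corr:.3f}")
    if mse_model >= mse_lag or corr > 0.95:
        print("Warning: the model may just be reproducing the lag prediction.")
```

If the model does not beat the naive benchmark, or its predictions are almost perfectly correlated with it, the impressive-looking plot is most likely just the time lag in disguise.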
Vegard Flovik illustrates this problem with a random walk in his great article: How not to use Machine Learning.
We already learned from the first pitfall that if it looks too good to be true, it most certainly is not true. This applies to machine learning in general, but it is especially important in financial time series prediction because of the low signal-to-noise ratio: most of the time we are searching for very small signals in a lot of noise. If we happen to find one of those precious signals, we should be very thorough in excluding every possibility of information leakage. A great counterexample is the article by Adam King: Deep Learning: Profitable Bitcoin Trading Bot.
Information leakage can happen at many points in a machine learning project, and data preprocessing is especially susceptible to it.
Leakage is often hard to detect and there is no easy textbook solution to the problem. The most important takeaway is to always double- and triple-check good predictions for leakage before basing further research on them, especially if the results are far above expectations. Additionally, one should always paper-test a model before deploying it in production; this is particularly important for trading strategies.
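One common preprocessing leak is fitting a scaler on the full dataset before splitting it into training and test sets, which lets statistics of "future" data bleed into the training features. A minimal sketch using scikit-learn (the random placeholder data and the 80/20 split are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# features X and targets y, ordered in time (no shuffling for time series)
X = np.random.randn(1000, 5)   # placeholder data for illustration
y = np.random.randn(1000)

split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]

# Fit the scaler on the training window only. Fitting it on the full series
# would leak the mean and variance of the test period into the training data.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```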
Last but not least, a common mistake is to neglect data preprocessing. Blindly applying a state-of-the-art deep learning model to a financial time series won't yield satisfying results. Most machine learning models expect stationary input data, i.e. data with roughly constant statistical properties over time. To meet this requirement, the data has to be preprocessed. The easiest approach is to use the first difference of the time series.
To do so, we calculate the returns at every time step; a return r is the difference between the current price p(t) and the previous price: r(t) = p(t) - p(t-1).
By applying this simple form of stationarisation, we lose information about the history of the time series. To know, for example, whether the current price at t is higher or lower than a price further in the past at t-N (N > 1), we would have to add up all the returns in between. Some models might be able to reconstruct this information, but for many of them it is lost.
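A small illustration with pandas (the price values here are made up):

```python
import pandas as pd

# prices indexed by time; a tiny illustrative series
prices = pd.Series([100.0, 101.5, 101.0, 102.3, 103.0])

# first difference: r(t) = p(t) - p(t-1)
returns = prices.diff().dropna()

# to compare p(t) with the price N steps back, we have to sum the N returns in between
N = 3
price_change_over_N = returns.rolling(N).sum()   # equals p(t) - p(t-N)
```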
To address this problem, a method called fractional differentiation exists. Instead of calculating the difference only between neighbouring prices (i.e. t and t-1), it introduces weights (depicted in Figure 4) that are used to form a weighted combination of the current price and the N prices before it.
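A rough sketch of the idea, using the standard iterative weight formula w_0 = 1, w_k = -w_{k-1}·(d-k+1)/k with a fixed window; the differentiation order d and the window length below are illustrative choices, not the values used for the figures:

```python
import numpy as np
import pandas as pd

def frac_diff_weights(d: float, window: int) -> np.ndarray:
    """Iteratively compute the fractional differentiation weights w_0 .. w_{window-1}."""
    w = [1.0]
    for k in range(1, window):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(prices: pd.Series, d: float = 0.4, window: int = 20) -> pd.Series:
    """Apply the weights to the last `window` prices at every time step."""
    w = frac_diff_weights(d, window)[::-1]   # reverse so the oldest price gets the last weight
    values = prices.rolling(window).apply(lambda x: np.dot(w, x), raw=True)
    return values.dropna()
```

For d = 1 the weights reduce to the plain first difference; smaller values of d keep more of the price history in the transformed series.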
In Figure 5 we show the result of the fractional differentiation on one year of bitcoin price data.
Compared to the transformation in Figure 3, we already recover a good amount of historical information while still achieving a sufficient degree of stationarity in the series.
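To judge whether a transformed series is "stationary enough", one standard tool (not covered in detail here) is the Augmented Dickey-Fuller test, available in statsmodels. A minimal sketch with a placeholder series; in practice you would pass in the fractionally differentiated price series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# placeholder series for illustration only
series = np.random.randn(500).cumsum() * 0.01 + np.random.randn(500)

# ADF null hypothesis: the series has a unit root (i.e. is non-stationary);
# a small p-value lets us reject non-stationarity
adf_stat, p_value, *_ = adfuller(series)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.4f}")
```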
We will discuss this method in detail in a separate blog-article which will be about stationary time series.
We discussed three of the most common pitfalls in machine learning for financial time series. The time lag can be confusing at first, but it actually provides us with an easy benchmark to measure performance against. We learned to always be aware of the possibility of leakage and never to deploy a trading algorithm without testing it with paper trading first. Last but not least, we introduced two options for providing machine learning models with stationary data, which is an easy first step towards better predictions.
We will continue this series in the near future, covering financial data aggregation and stationary time series. The most commonly known form of financial data aggregation is simple time candles, as seen in most exchange charts. We already touched on stationary time series in this article; fractional differentiation, however, deserves its own article, where we will go into detail.
We are working on making machine learning modules and direct predictions available to our users, and we have already made huge progress towards that goal.