FEDERICO CORNALBA
02 December 2021 • 8 min read
The term ‘learning’ has become increasingly prevalent in our everyday lives. It is safe to say that a good portion of this trend can be attributed to the phenomenal rise of Machine Learning techniques within Artificial Intelligence, which now holds a prominent position in a wide range of quantitative fields, including Computer Vision (CV), Autonomous Driving, and Natural Language Processing (NLP), to name just a few.
While several Machine Learning problems can effectively be solved in a 'static' way (i.e., the learning process analyses all available data in one go), many other problems are more naturally framed in a 'dynamic' fashion (i.e., the learning process is split into consecutive steps, where each step is associated with an interpretable, meaningful action that either the algorithm or the algorithm's user can take within the scope of the given problem). These dynamically structured problems make up a Machine Learning sub-field commonly referred to as ‘Reinforcement Learning’.
Reinforcement Learning is fascinating from a conceptual point of view. But it also allows us to describe the intrinsically dynamic objects that interest us most here at Trality, namely financial markets, which is why we are kicking off a blog series on the topic. In this first piece, we'll explore the basic idea of Reinforcement Learning before explaining its potential connection with Trality's core business. Drawing our inspiration from the beautifully written report “Reinforcement Learning in Financial Markets - A Survey” by T. G. Fischer, we won't even need to refer to finance-related examples for much of this piece. As a matter of fact, we'll start off rather... basic.
For the time being, we'll link all of our considerations to the following example.
Basic example: Assume you have been invited by a host to try out a new, multi-step game, which essentially involves making strategic decisions at each step, with the goal of maximizing some kind of reward at the end of the game.
You'd be hard-pressed to find a more generic description, but, for the time being, that's basically all you need to know. Don’t worry, we will provide more specific instances of such a game whenever needed.
Depending on how the host decides to set things up, you might find yourself in one of two different but equally realistic scenarios.
Scenario 1. As you have never played before, the host decides to pair you with an experienced player, who has played several times before. This experienced player is able to summarize the overall game’s environment in a set of features (or ‘state’) and is also able to make predictions as to how the state is likely to change from step to step. Your involvement is limited to deciding which ‘action’ to take at each step of the game based on the predictions you receive from the experienced player.
Scenario 2. The host wants you to learn from your mistakes, and therefore forces you to play on your own. Since you are inexperienced, you are given several unofficial game rounds in which you can practice, and at each step you consider the game’s current state and evaluate your actions. You keep track of your evaluations, and keep updating them throughout each round of the game.
In Scenario 1, the experienced player has ‘learnt’ over the course of years that certain states of the game are likely to lead to other specific states. In mathematical terms, the experienced player has seen a lot of pairs (x_i, y_i), with x_i being a state and y_i being the observed state following x_i. This vast knowledge allows the experienced player to make predictions when confronted with a new configuration x.
In many cases, this learning task can also be performed effectively by machines (in a nutshell, this is Machine Learning), provided, of course, that said machines can figure out a suitable ‘mapping’ that faithfully links the data x to the observations y. More precisely:
Machine Learning is concerned with having a machine select, from a very large (very large!) set of possible mappings, the one that most accurately matches the data x to the observations y.
The input provided by humans is usually limited to:

- supplying the data x and the corresponding observations y;
- specifying the (very large) set of candidate mappings to be explored;
- defining the criterion by which the accuracy of a mapping is judged.
The machine does everything else: it uses a great deal of computational power to explore the specified set of mappings and eventually picks out a suitable one. The computational power of modern computers is precisely what makes this exploration, and hence the whole method, feasible.
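As a concrete (and deliberately tiny) illustration of this division of labor, consider the following Python sketch. The linear family of candidate mappings, the toy data, and the mean-squared-error criterion are all assumptions of ours, chosen purely for illustration:

```python
import numpy as np

# Toy data: pairs (x_i, y_i) that the "experienced player" has seen.
# The numbers are entirely made up for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Human input 1: the set of candidate mappings -- here, all lines y = a*x + b.
# Human input 2: the accuracy criterion -- here, the mean squared error.
def loss(a, b):
    return np.mean((a * x + b - y) ** 2)

# The machine's share of the work: explore the set of mappings and
# pick out an accurate one (here via plain gradient descent).
a, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(5000):
    residual = a * x + b - y
    a -= learning_rate * np.mean(2 * residual * x)  # d(loss)/da
    b -= learning_rate * np.mean(2 * residual)      # d(loss)/db

print(f"selected mapping: y = {a:.2f} * x + {b:.2f} (loss {loss(a, b):.3f})")
```

The human supplied the data, the family of lines, and the error criterion; the loop, i.e., the exploration of the set of mappings, is the machine's share of the work.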
In Scenario 1, the roles of the experienced and inexperienced player are ‘detached’. In other words, the inexperienced player takes the prediction of the experienced player (the only one who has gone through training) and acts upon it. This dual approach has a few limitations:

- the experienced player's predictions were learnt independently of the inexperienced player's goals, so they are not judged by the criteria the acting player ultimately cares about;
- the quality of the chosen actions never feeds back into the predictions, so the two halves of the process cannot improve one another;
- constraints that matter when choosing actions may be entirely invisible to the predictor.
To address the above issues, a further refinement of Machine Learning, called ‘Reinforcement Learning’, has been proposed. In a nutshell, this approach ‘merges’ the roles of the experienced and inexperienced players. The resulting player, who needs training on the game, explores the effect of all the actions that can be taken and, crucially, judges them according to the very same criteria he/she is ultimately interested in, while also taking all relevant constraints into account. In other words, the training combines the simultaneous improvement of the state predictions and of the choice of actions, based on a reward function that faithfully summarizes the player's gain. This situation is roughly what we have described in Scenario 2.
We are now acquainted with the basics of Reinforcement Learning, and we've already come across some of the basic terminology, which we'll reiterate here and then render as a short code sketch:

- State: a set of features summarizing the game's environment at a given step;
- Action: the move the player takes at each step, chosen on the basis of the current state;
- Reward: a function that faithfully summarizes the player's gain, and against which the chosen actions are judged.
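Translated into code, this terminology boils down to one recurring pattern: the player observes a state, picks an action, and collects a reward. Below is a minimal Python rendering of that loop; the coin-guessing environment and the random player are made-up placeholders, not any particular library's API:

```python
import random

class CoinFlipGame:
    """A deliberately trivial environment: guess a coin flip for 10 steps.
    The state is just the step number; a correct guess earns reward +1."""

    def reset(self):
        self.step_count = 0
        return self.step_count  # the initial state

    def step(self, action):
        flip = random.choice(["heads", "tails"])
        reward = 1.0 if action == flip else 0.0
        self.step_count += 1
        done = self.step_count >= 10
        return self.step_count, reward, done

def play_one_round(env, choose_action):
    """The generic state -> action -> reward interaction loop."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = choose_action(state)            # the player's decision
        state, reward, done = env.step(action)   # the environment reacts
        total_reward += reward                   # the gain accumulates step by step
    return total_reward

# A (hopeless) random player, just to exercise the loop.
print(play_one_round(CoinFlipGame(), lambda state: random.choice(["heads", "tails"])))
```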
We have so far described Reinforcement Learning as the procedure that combines the simultaneous improvement of the state predictions and of the choice of actions, based on a reward function tailored to the player's needs. Within this general paradigm, there is more classification to be done. As a matter of fact, Reinforcement Learning is further split into three different cases, which we can explain with the help of the maze game described below.
In this game, the player's goal is to maximize the time he/she wanders around in a maze before hitting the first dead end. The player thus has to decide what to do at each crossroad in the maze, and may only rely on two tools:

- a scorecard, on which the player records an evaluation of how promising each possible turn at each crossroad has proved to be so far;
- a strategy, i.e., a rule (possibly a random one) prescribing which turn to take at each crossroad.
Reinforcement Learning techniques are roughly split into three main categories:

- Critic-only methods, which rely solely on the scorecard: the player learns to evaluate the turns available at each crossroad and then simply picks the best-rated one (Q-learning, illustrated in the sketch below, is the best-known example);
- Actor-only methods, which rely solely on the strategy: the player directly adjusts the rule prescribing which turn to take, without keeping explicit evaluations;
- Actor-critic methods, which combine the two tools: the strategy (the ‘actor’) proposes turns, while the scorecard (the ‘critic’) evaluates them, and the two improve each other during training.
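To make the critic-only case concrete, here is a small tabular Q-learning sketch for a maze of our own invention. The maze layout, the rewards, and the hyperparameters are all illustrative assumptions; the update rule in the inner loop, however, is the standard Q-learning one:

```python
import random

# A tiny maze encoded as a graph: at each crossroad (state), each turn
# (action) leads either to another crossroad or to a dead end.
# The layout is entirely made up for this post.
MAZE = {
    "A": {"left": "B", "right": "dead_end"},
    "B": {"left": "C", "right": "A"},
    "C": {"left": "dead_end", "right": "B"},
}
ACTIONS = ["left", "right"]

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in MAZE for a in ACTIONS}  # the 'scorecard'

for _ in range(2000):        # the unofficial practice rounds
    state = "A"
    for _ in range(100):     # cap the round length, just in case
        # Mostly trust the scorecard, but explore at random now and then.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = MAZE[state][action]
        reward = 1.0         # +1 per step survived: wandering longer is the goal
        future = 0.0 if next_state == "dead_end" else max(
            Q[(next_state, a)] for a in ACTIONS
        )
        # Standard Q-learning update:
        # Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
        Q[(state, action)] += alpha * (reward + gamma * future - Q[(state, action)])
        if next_state == "dead_end":
            break
        state = next_state

# The learned scorecard: dead-end turns (A-right, C-left) end up rated lowest.
print({k: round(v, 2) for k, v in Q.items()})
```

On this toy maze, the scorecard quickly learns to rate the two dead-end turns far below the others, which is exactly the behaviour a critic-only player needs.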
Overall, Reinforcement Learning may be summed up like this:
Reinforcement Learning is a sub-field of Machine Learning. In addition to the basic requirements of Machine Learning described above, the machine needs to be trained to describe a dynamic interaction with a given environment. Specifically, the machine's internal algorithms need, on the one hand, to be phrased in terms of consecutive, interpretable actions; on the other hand, they need to be driven by the reward mechanism specified by the user.
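In slightly more formal terms (our own notation, not taken from the survey), the reward mechanism hands the machine a single quantity to maximize over the course of a round:

```latex
% The player's strategy \pi is trained to maximize the expected
% cumulative (discounted) reward collected over a round:
\[
  J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{T} \gamma^{t}\, r_t \right],
  \qquad 0 < \gamma \le 1,
\]
```

where r_t is the reward received at step t, and the discount factor gamma down-weights rewards that arrive later in the round.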
As promised, we have substantiated our ‘Basic Example’ many times throughout our discussion in order to illustrate a few core ideas.
Playing board games like Risk and finding our way through an imaginary maze definitely sound like entertaining activities worth taking up once in a while. However, as we hinted at the beginning of this piece, we here at Trality are ultimately interested in playing a different kind of game (in keeping with the language of this piece).
At its core, Trality has come up with a visionary platform that brings several state-of-the-art tools for the creation, design, and consolidation of automated trading bots (i.e., predefined strategies deployed for buying and selling stocks in the financial markets) to a constantly growing user base. More specifically, Trality is on track to democratize trading and make it accessible to everyone (not just those in the business), and it is achieving this goal by letting the user (or 'bot Creator') be the one in charge of striking the balance between straight-out-of-the-package features and higher-skill coding features that best suits him/her.
Ultimately, any given trading bot created by our users will constantly be confronted with the current 'state' of the financial environment (e.g., stock prices, non-price financial indicators, sentiment data, you name it). Consequently, the bot will need to perform certain 'actions' (such as buying or selling stocks, or re-shaping a user's portfolio). As you may have guessed, the words 'state' and 'actions' reflect the fact that this financial scenario can indeed be framed in the language of Reinforcement Learning introduced above.
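To spell the correspondence out in code, here is a deliberately toy Python sketch of one trading round cast in the state/action/reward mould. Everything in it (the made-up price series, the crude momentum indicator, the three-action set, the profit-and-loss reward) is an assumption of ours for illustration only; it is neither Trality's API nor trading advice:

```python
# Toy illustration only: not Trality's API, not a recommended strategy.
PRICES = [100.0, 101.5, 99.8, 102.2, 103.0, 101.1, 104.4]  # made-up price series

def get_state(t):
    """The 'state': the current price plus a crude momentum indicator."""
    momentum = PRICES[t] - PRICES[t - 1] if t > 0 else 0.0
    return (PRICES[t], momentum)

def step(t, position, action):
    """The 'action' (buy / sell / hold) updates the position; the 'reward'
    is the profit-and-loss realised on the next price move."""
    if action == "buy":
        position = 1
    elif action == "sell":
        position = -1  # a short position, for symmetry
    # 'hold' keeps the current position unchanged
    reward = position * (PRICES[t + 1] - PRICES[t])
    return position, reward

# A naive momentum-following policy, purely to exercise the loop.
position, total_reward = 0, 0.0
for t in range(len(PRICES) - 1):
    price, momentum = get_state(t)
    action = "buy" if momentum >= 0 else "sell"
    position, reward = step(t, position, action)
    total_reward += reward

print(f"cumulative reward over the round: {total_reward:+.2f}")
```

Run as-is, this naive momentum-following policy actually loses money on our toy series, which is precisely why the choice of 'actions', and not only the 'state' predictions, needs to be trained.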
As we strive to offer a better and more inclusive user experience, we strongly believe that an ever-increasing degree of interpretability is a key feature of all our products. The intuitiveness and interpretability provided by the Reinforcement Learning paradigm make it a natural fit for a leading fintech platform such as Trality. It is thus our intention to incorporate elements of Reinforcement Learning into our future products and to use them to craft even better tools for our bot Creators.
Alright, I get the gist of it now. But what does Reinforcement Learning look like in terms of code and what’s the mathematical foundation that makes it possible? Tune in for the second episode in this blog series on Reinforcement Learning and we’ll dig deeper.