FEDERICO CORNALBA
29 March 2022 • 7 min read
In our initial piece on the subject, we spelled out the basics of the so-called Reinforcement Learning (RL) paradigm, and we hinted at the impact that embracing such a paradigm could have on our fintech company. A summary of the main points that we made—laid out as a mock interview—is as follows.
In accordance with the vision summarized in point 4 above, we have been developing our own open source repository devoted to collecting existing (and building new) Reinforcement Learning tools for stock and cryptocurrency trading. And ... here it is:
https://github.com/trality/fire
The core features we are committed to are interpretability, an intuitive UI, a modular code structure, and accessible reproducibility.
Currently, the tools available in our repository are associated with a critic-only Deep Q-Learning RL agent with Hindsight Experience Replay for both single- and multiple-reward learning with generalization (no need to grasp everything now; there's more on this below).
Our plan is for the FiRe repository to be constantly expanded (by Trality as well as by the broader community). In the long run, we hope that FiRe will become a solid and trusted reference, one that bot creators can use to build their bots comfortably and reliably.
Well, pretty straightforward:
We provide the minimal context required to navigate the structure of our FiRe repository and to use its existing tools.
Our current RL agent is a critic-only Deep Q-Learning agent with Hindsight Experience Replay for both single- and multiple-reward learning with generalization. Even though a full description of the agent goes beyond the scope of this piece, we summarize the essential components below. In particular, the agent learns from experience tuples of the following form:
tuple = [
    starting environment state,
    action taken,
    next environment state visited,
    weights for the reward vector,
    scalar reward obtained by weighting the reward vector
]
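To make this concrete, here is a minimal Python sketch of such an experience tuple. The class name Experience, the field names, and the scalarize helper are our own illustrative choices (not identifiers from the FiRe codebase), and the numbers are purely hypothetical.

from typing import NamedTuple

import numpy as np

class Experience(NamedTuple):
    """One stored experience, mirroring the tuple described above."""
    state: np.ndarray           # starting environment state
    action: int                 # action taken
    next_state: np.ndarray      # next environment state visited
    reward_weights: np.ndarray  # weights for the reward vector
    scalar_reward: float        # reward vector weighted into a single scalar

def scalarize(reward_vector: np.ndarray, weights: np.ndarray) -> float:
    """Collapse the multi-reward vector into one scalar via a weighted sum."""
    return float(np.dot(weights, reward_vector))

# Hypothetical example: two rewards weighted 70/30
weights = np.array([0.7, 0.3])
reward_vector = np.array([0.02, -0.01])
experience = Experience(
    state=np.zeros(4),
    action=1,
    next_state=np.zeros(4),
    reward_weights=weights,
    scalar_reward=scalarize(reward_vector, weights),
)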
Cloning. The FiRe repository can be cloned from GitHub by running
$ git clone https://github.com/trality/fire.git
Change to the FiRe directory. Simply run
$ cd fire
Checking your Python version. You'll need Python 3.8.10. You can check your current Python version by typing
$ python3 --version
Creating and activating a virtual environment. Once you have cloned FiRe, we recommend working inside a virtual environment, which you can create and activate by running
$ python3 -m venv .venv
$ source .venv/bin/activate
in the repository's main folder.
Finishing the setup. You can install all relevant packages and download the same hourly BTCUSD dataset we have used for some of our simulations by running
$ make
Setting up an experiment. Now that the repository is installed, and we know in a nutshell what our RL agent does, we can walk through a complete dummy experiment to illustrate the more specific aspects of the code.
An experiment is defined by specifying the chosen dataset, the hyper-parameters of the agent's Neural Network, the chosen reward(s), and all relevant Deep Q-Learning parameters. This information goes into a .json file, which looks something like
{
    "dataset": {
        "path": "datasets/crypto_datasets/btc_h.csv",
        # ...
        # additional fields
        # ...
    },
    "model": {
        "epochs_for_Q_Learning_fit": "auto",
        "batch_size_for_learning": 2048,
        # ...
        # additional fields
        # ...
    },
    "rewards": ["LR", "SR", "ALR", "POWC"],
    "window_size": 24,
    "frequency_q_learning": 1000,
    "Q_learning_iterations": 500,
    "discount_factor_Q_learning": 0.9,
    # ...
    # additional fields
    # ...
}
All fields in the .json file are thoroughly explained in the FiRe repository. Just to mention a few: "path" indicates the chosen dataset; "Q_learning_iterations" is the number of episodes; and "rewards" lists all the considered rewards (the ones we have used so far all appear in the example above).
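As a quick sanity check before launching a run, such a configuration can be loaded and inspected with a few lines of standard Python. The snippet below is just an illustrative sketch (not part of the FiRe codebase); it assumes the file is saved as example.json with the placeholder comment lines removed, since plain JSON does not allow comments.

import json

# Load the experiment configuration saved as example.json
with open("example.json") as f:
    config = json.load(f)

# Inspect a few of the fields discussed above
print(config["dataset"]["path"])        # e.g. datasets/crypto_datasets/btc_h.csv
print(config["rewards"])                # e.g. ['LR', 'SR', 'ALR', 'POWC']
print(config["Q_learning_iterations"])  # number of episodes, 500 in the example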
Running the experiment. Once the example.json file is ready, simply run
$ python3 main.py example.json
to start the experiment.
Outputs of the experiment. In addition to saving all of the agent's features throughout the run (in particular, the agent's Neural Network), the experiment produces relevant plots summarizing the agent's performance.
The first type of plot shows a given cumulative reward obtained by the agent on the train/evaluation sets as the episodes progress; see Figure 1.
The second type of plot shows, for any given episode, the Sharpe Ratio of the best-performing model on the evaluation set up to that episode. These plots are used as statistically significant indicators when comparing multi- and single-reward simulations; see Figure 2.
The third type of plot shows the overall profit (on the test set) associated with the best-performing model on the evaluation set mentioned above; see Figure 3.
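For readers less familiar with the second plot type: in its standard non-annualized form, and assuming a zero risk-free rate, the Sharpe Ratio is simply the mean of a series of returns divided by their standard deviation. A minimal sketch (our own, not taken from the FiRe codebase):

import numpy as np

def sharpe_ratio(returns: np.ndarray) -> float:
    """Standard (non-annualized) Sharpe Ratio with a zero risk-free rate:
    mean of the returns divided by their standard deviation."""
    return float(np.mean(returns) / np.std(returns))

# Hypothetical series of per-period returns
returns = np.array([0.01, -0.005, 0.02, 0.0, 0.012])
print(sharpe_ratio(returns))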
If you are new to our repository (or to RL in general), we appreciate that you might have a lot of questions! To make things as clear as possible, we've put ourselves in your shoes and come up with what we consider the most important questions, listed below. Should some of your questions not be listed, feel free to get in touch with us on Discord.
All of us at Trality have worked towards laying the foundations of a repository that we would like to see grow into a strong, consolidated RL library for the benefit of our bot creators.
At the same time, we also believe that allowing open access to this repository is crucial to giving members of the financial and data science communities the opportunity to work on, improve, and enrich the RL tools therein. We cannot wait to see what the community has to contribute to this cause!
In this long-term effort, we believe the nearest milestones on our side (which we can reasonably achieve within the next six months) will be the following:
Acknowledging sources. A number of features in our code take inspiration from two open source repositories: the gym-anytrading stocks environment and a minimal Deep Q-Learning implementation.
We hope you enjoyed reading this piece, and we would love for you to get involved through open source contributions to the FiRe repo. We look forward to answering any questions you may have.
Stay tuned for our next RL blog piece!