Financial institutions invest heavily to automate their decision-making for trading and portfolio management. In the US, the majority of trading volume is generated through algorithmic trading. [1]
With cloud computing, vast amounts of historical data can be processed in real time and fed into sophisticated machine learning (ML) models. This allows market participants to discover and exploit new patterns for trading and asset managers to use ML models to construct better investment portfolios.
In this post, we explain how to use Amazon SageMaker to deploy algorithmic trading strategies using ML models for trade decisions. In the following sections, we go over the high-level concepts. The GitHub repo has the full source code in Python.
The key ingredients for our solution are the following components:
We use Jupyter notebooks as our central interface for exploring and backtesting new trading strategies. Amazon SageMaker allows you to set up Jupyter notebooks and integrate them with AWS CodeCommit to store different versions of strategies and share them with other team members.
Because we want to focus on ML-based strategies, we need a scalable data store, so we use Amazon Simple Storage Service (Amazon S3) to store historical market data, model artifacts, and backtesting results.
For our trading strategies, we create Docker containers that contain the strategy itself and the libraries required for backtesting. These containers follow the SageMaker Docker container structure so that they can run inside SageMaker alongside the ML models. For more information about the structure of SageMaker containers, see Using the SageMaker Training and Inference Toolkits.
An advantage of this approach is that we can use the same APIs and the SageMaker management console for both ML and backtesting.
The following diagram illustrates this architecture.
For live trading, we assume that we can run trading strategies in a container that connects to a broker that provides market data, takes orders, and sends notifications about trades. To host the containers, we can use SageMaker or AWS Fargate. SageMaker can host the ML models that we use for trade decisions.
The following diagram illustrates this architecture.
In the following sections, we focus on backtesting and ML for algorithmic trading strategies.
For backtesting, we use an open-source backtesting framework. This approach works similarly with other backtesting frameworks as long as they can be run in a Docker container. Defining an algorithmic trading strategy generally follows four steps:
To run the backtest in SageMaker, we build and deploy our SageMaker-compatible container that contains the trading strategy to Amazon Elastic Container Registry (Amazon ECR) and then run it with the following command:
    import sagemaker as sage

    # image, role, sess, job_name, config, and data_location are assumed
    # to be defined earlier in the notebook.
    algo = sage.estimator.Estimator(
        image_uri=image,
        role=role,
        instance_count=1,
        instance_type='ml.m4.xlarge',
        output_path="s3://{}/output".format(sess.default_bucket()),
        sagemaker_session=sess,
        base_job_name=job_name,
        hyperparameters=config,
        # Regexes applied to the job logs to extract performance metrics.
        metric_definitions=[
            {"Name": "algo:pnl", "Regex": "Total PnL:(.*?)]"},
            {"Name": "algo:sharpe_ratio", "Regex": "Sharpe Ratio:(.*?),"},
        ],
    )
    algo.fit(data_location)
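To see what the two metric regexes above capture, the following minimal sketch applies them to a log line with Python's `re` module. The sample log line is hypothetical; the real format depends on what the strategy container prints. The helper name `extract_metrics` is also an assumption made for illustration.

```python
import re

# Same patterns as in metric_definitions above.
METRIC_PATTERNS = {
    "algo:pnl": r"Total PnL:(.*?)]",
    "algo:sharpe_ratio": r"Sharpe Ratio:(.*?),",
}


def extract_metrics(log_line):
    """Return {metric_name: float} for every pattern that matches log_line."""
    found = {}
    for name, pattern in METRIC_PATTERNS.items():
        match = re.search(pattern, log_line)
        if match:
            # The first capture group holds the numeric value.
            found[name] = float(match.group(1))
    return found


# Hypothetical log line, not taken from the actual container output.
sample_line = "[Sharpe Ratio:1.25, Total PnL:1042.50]"
metrics = extract_metrics(sample_line)
print(metrics)
```

Testing the patterns locally like this can help confirm that each regex captures only the number before you start a job.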
When the backtest is complete, we can get the performance metrics that we defined earlier by running the following command:
    from sagemaker.analytics import TrainingJobAnalytics

    latest_job_name = algo.latest_training_job.job_name
    metrics_dataframe = TrainingJobAnalytics(training_job_name=latest_job_name).dataframe()
    metrics_dataframe
The following screenshot shows our results.
When the training job is complete, SageMaker stores the data for the trained model in Amazon S3. For our use case, we run a backtest instead of training a model, and we use this feature to store a chart for the trading strategy. With the following commands, you can display it in the notebook:
    import boto3
    from IPython.display import Image

    # Strip the "s3://<bucket>/" prefix to get the object key of the artifact.
    model_name = algo.model_data.replace('s3://' + sess.default_bucket() + '/', '')
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(sess.default_bucket())
    my_bucket.download_file(model_name, 'model.tar.gz')
    !tar -xzf model.tar.gz
    !rm model.tar.gz
    Image(filename='chart.png')
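Outside a notebook, the shell commands above are not available. As an alternative sketch, the S3 URI can be split with `urllib.parse` and the chart read with the standard `tarfile` module. The helper names, the example bucket name, and the in-memory demo archive are assumptions for illustration; only the file name `chart.png` comes from the command above.

```python
import io
import tarfile
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_member(archive_bytes, member_name):
    """Return the bytes of one file inside a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:  # member exists but is not a regular file
            raise KeyError(member_name)
        return extracted.read()


def build_demo_archive():
    """Create a tiny in-memory model.tar.gz stand-in with placeholder content."""
    payload = b"placeholder-png-bytes"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="chart.png")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Hypothetical URI; in the notebook this would be algo.model_data.
bucket, key = split_s3_uri("s3://example-bucket/output/job-1/output/model.tar.gz")
chart_bytes = read_member(build_demo_archive(), "chart.png")
print(bucket, key, len(chart_bytes))
```

In practice, the archive bytes would come from the downloaded `model.tar.gz` rather than from the demo helper.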
The following screenshot shows our visualization.
To get a better understanding of how this works for a simple strategy without ML, you can run the detailed steps in the Strategy_SMA.ipynb Jupyter notebook, which backtests a moving average crossover strategy with SageMaker. Before you can use the notebooks, follow the instructions in the GitHub repo for setting up the required infrastructure and loading some sample historical price data.
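For readers unfamiliar with the idea, a moving average crossover can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the code from the notebook; the function names, the periods, and the price series are assumptions.

```python
def sma(values, period):
    """Simple moving average; None until `period` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            window = values[i + 1 - period : i + 1]
            out.append(sum(window) / period)
    return out


def crossover_signals(closes, fast_period, slow_period):
    """Return 'buy', 'sell', or None per bar for a fast/slow SMA crossover."""
    fast = sma(closes, fast_period)
    slow = sma(closes, slow_period)
    signals = [None] * len(closes)
    for i in range(1, len(closes)):
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            continue  # not enough history yet
        if fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            signals[i] = "buy"   # fast average crosses above the slow one
        elif fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]:
            signals[i] = "sell"  # fast average crosses below the slow one
    return signals


# Made-up closing prices for illustration only.
closes = [10, 9, 8, 9, 11, 12, 11, 9, 8]
signals = crossover_signals(closes, fast_period=2, slow_period=3)
print(signals)
```

A backtesting framework wraps the same logic with order handling, position sizing, and performance reporting.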
For an ML-based trading strategy, we need to frame the ML problem for our strategy and then train a model with SageMaker. In our example, we use daily historical stock prices, and we train a binary classification model that predicts whether a given price target will be reached in the near future based on historical prices and technical indicators. For the technical indicators, we calculate SMA (simple moving average) and ROC (rate of change) over several lookback periods measured in days. The last close and SMA prices are normalized to the range 0–1. We label each sample with a long and a short column that indicate whether the profit target for a long or short trade was reached in the next few days without the trade being stopped out.
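The feature and label construction described above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses closing prices only, and the horizon, the profit target, the stop-loss percentage, and all function names are hypothetical rather than taken from the notebook.

```python
def rate_of_change(values, period):
    """ROC as a fraction: change versus the value `period` bars earlier."""
    return [
        None if i < period else (values[i] - values[i - period]) / values[i - period]
        for i in range(len(values))
    ]


def min_max(values):
    """Rescale values linearly to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_trade(closes, i, horizon, target_pct, stop_pct, side):
    """Return 1 if a trade entered at closes[i] reaches its profit target
    within `horizon` bars before being stopped out, else 0."""
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    direction = 1 if side == "long" else -1
    entry = closes[i]
    for price in closes[i + 1 : i + 1 + horizon]:
        move = direction * (price - entry) / entry  # return in trade direction
        if move <= -stop_pct:
            return 0  # stopped out first
        if move >= target_pct:
            return 1  # profit target reached
    return 0  # neither level reached within the horizon


# Made-up closing prices and parameters for illustration only.
closes = [100, 101, 103, 99, 97, 98]
roc = rate_of_change(closes, period=2)
normalized = min_max(closes)
long_label = label_trade(closes, 0, horizon=3, target_pct=0.02, stop_pct=0.02, side="long")
short_label = label_trade(closes, 2, horizon=3, target_pct=0.02, stop_pct=0.02, side="short")
print(roc[2], normalized[0], long_label, short_label)
```

A more realistic labeling would check intraday highs and lows rather than closes, because a stop or target can be hit within a day.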
The following screenshot shows a sample of our original data.
For the ML model, the training data has the following structure:
The following screenshot shows a sample of our training data.
To train the model in SageMaker, we build and deploy our SageMaker-compatible container to Amazon ECR and then train it with the following command:
    import sagemaker as sage

    # image, role, sess, job_name, and data_location are assumed
    # to be defined earlier in the notebook.
    classifier = sage.estimator.Estimator(
        image_uri=image,
        role=role,
        instance_count=1,
        instance_type='ml.m4.xlarge',
        output_path="s3://{}/output".format(sess.default_bucket()),
        sagemaker_session=sess,
        base_job_name=job_name,
    )
    classifier.fit(data_location)
You can run the detailed steps in the Train_Model_Forecast.ipynb Jupyter notebook.
After you complete the notebook, you have a model trained on 40% of the historical data. You can host it in SageMaker for inference or combine the model artifact directly with the SageMaker container for the trading strategy.
For our ML-based strategy, we take a simple approach. At each price update, we use our trained classification model to predict whether a long or short trade will be profitable in the following days. If the model predicts with a probability higher than 50% that a long or short trade will be profitable, we take the trade, aim for a percentage-based profit target, and protect the position with a percentage-based stop-loss.
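The decision rule above can be sketched as a small function. The names, the default target and stop percentages, and the tie-breaking rule (take the side with the higher predicted probability when both exceed the threshold) are assumptions for illustration, not the notebook's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    side: str           # "long" or "short"
    entry: float        # price at which the trade is opened
    take_profit: float  # percentage-based profit target
    stop_loss: float    # percentage-based protective stop


def decide(price, p_long, p_short, threshold=0.5,
           target_pct=0.02, stop_pct=0.01) -> Optional[Order]:
    """Turn model probabilities into an order, or None if no trade qualifies."""
    # Assumption: prefer the side the model is more confident about.
    side, prob = ("long", p_long) if p_long >= p_short else ("short", p_short)
    if prob <= threshold:
        return None
    if side == "long":
        return Order("long", price, price * (1 + target_pct), price * (1 - stop_pct))
    return Order("short", price, price * (1 - target_pct), price * (1 + stop_pct))


# Hypothetical probabilities, as if returned by the classifier.
long_order = decide(100.0, p_long=0.7, p_short=0.2)
no_order = decide(100.0, p_long=0.4, p_short=0.45)
short_order = decide(100.0, p_long=0.1, p_short=0.8)
print(long_order, no_order, short_order)
```

Raising the threshold trades less often but only on higher-confidence predictions, which is a natural parameter to vary in backtests.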
You can follow the detailed steps in the Strategy_ML_Forecast.ipynb Jupyter notebook for backtesting this strategy on the remaining 60% of historical data.
After you run through the notebook, you can review the performance metrics and analyze the buy and sell orders for your ML-based strategy.
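The 40%/60% division between training and backtesting data described above is a chronological split: the model never sees the period it is later tested on. A minimal sketch, assuming the rows are already sorted by date and using a hypothetical helper name:

```python
def chronological_split(rows, train_fraction=0.4):
    """Split time-ordered rows into (training, backtest) without shuffling."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Stand-in for ten days of price data, oldest first.
days = list(range(10))
train_rows, backtest_rows = chronological_split(days)
print(train_rows, backtest_rows)
```

Shuffling before splitting would leak future information into training, which is why the split keeps the original order.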
SageMaker provides a flexible and scalable solution for the development of algorithmic trading strategies, especially when combined with ML. More advanced ML approaches, such as reinforcement learning, are likely to change algorithmic trading and portfolio investment fundamentally, and we expect more innovation in this space. For more information, see Portfolio Management with Amazon SageMaker RL.
In this blog post, we described how to use SageMaker for backtesting machine learning-based trading strategies. An important component is the data used for machine learning, which requires a scalable solution for storing and querying financial data. To accelerate setting up such an environment, you can use Amazon FinSpace, which provides a turnkey service designed for financial services customers, with data management and integrated notebooks for analytics. More details can be found in this blog post.
This post is for educational purposes only and past trading performance does not guarantee future performance.
[1] “The stock market is now run by computers, algorithms and passive managers”, 2019: https://www.economist.com/briefing/2019/10/05/the-stockmarket-is-now-run-by-computers-algorithms-and-passive-managers
Oliver Steffmann is a Solutions Architect at AWS based in New York and brings over 18 years of experience in designing and delivering trading and risk solutions for financial service customers. Oliver leverages his knowledge of Big Data and Machine Learning to help customers with their digital transformation.