Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML. SageMaker accelerates innovation within your organization by providing tools for every step of ML development, including labeling, data preparation, feature engineering, statistical bias detection, AutoML, training, tuning, hosting, explainability, monitoring, and workflow automation.
You can use a variety of techniques to deploy new ML models to production, so choosing the right strategy is an important decision. You must weigh the options in terms of the impact of change on the system and on the end users. In this post, we show you how to deploy using a shadow deployment strategy.
Shadow deployment consists of releasing version B alongside version A, forking version A's incoming requests, and sending copies to version B without impacting production traffic. This is particularly useful for testing production load on a new feature and measuring model performance of a new version without impacting current live traffic.
A rollout of the application is triggered when stability and performance meet the requirements.
Shadow deployment has the following advantages:
- You can test a new model version against real production traffic and load without affecting end users.
- Because the shadow version's responses are never returned to callers, a defective new version poses no user-facing risk.
However, shadow deployment brings the following challenges:
- Running two versions side by side roughly doubles the compute cost for the mirrored traffic.
- You need additional tooling to mirror requests and to compare the outputs of the two versions.
In this post, we look at three different options for deploying models using a shadow deployment strategy:
- An offline option that uses the data capture feature of Amazon SageMaker Model Monitor together with batch transform to replay production requests against the new model version
- A real-time option that uses Amazon API Gateway and AWS Lambda to invoke both model versions synchronously
- An asynchronous option that uses an Amazon Simple Queue Service (Amazon SQS) queue to invoke the second model version outside the request path
In this section, we explore the offline process, as shown in the following diagram.
In this option, we use the data capture utility in Model Monitor. Model Monitor continuously monitors the quality of SageMaker ML models in production. With Model Monitor, you can set alerts that notify you when deviations in model quality occur. Early and proactive detection of these deviations enables you to take corrective actions, such as retraining models, auditing upstream systems, or fixing quality issues without having to monitor models manually or build additional tooling.
For shadow deployment, we enable data capture and turn on Model Monitor for a real-time inference endpoint for model version 1 to capture data from requests and responses. Then we store the captured data in an Amazon Simple Storage Service (Amazon S3) bucket. We use the file that data capture generates (input data) and batch transform to get inference for model version 2. Optionally, we can use Amazon Athena and Amazon QuickSight to prepare a dashboard and gain insights from the inferences or simply run a hash compare between the two inference data outputs to show the differences.
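To make this concrete, the following is a minimal sketch using the SageMaker Python SDK. The bucket paths are placeholders, model_v1 and model_v2 are assumed to be already-created SageMaker Model objects, and in practice you first extract the raw inputs from the JSON Lines capture files before replaying them:

from sagemaker.model_monitor import DataCaptureConfig

# Capture every request and response pair to Amazon S3
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://<bucket>/data-capture/",  # placeholder bucket
)

# Deploy model version 1 with data capture enabled
predictor_v1 = model_v1.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    data_capture_config=data_capture_config,
)

# After extracting the captured inputs from the capture files,
# replay them through model version 2 with batch transform
transformer_v2 = model_v2.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<bucket>/v2-inferences/",  # placeholder bucket
)
transformer_v2.transform(
    data="s3://<bucket>/captured-inputs/",
    content_type="text/csv",
)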
You can find the complete example on GitHub.
We now dive deep and demonstrate how to perform shadow deployment for real-time inferences with an ML model using AWS services such as API Gateway, Lambda, and SageMaker.
The following diagram shows our proposed architecture.
The architecture includes the following components:
- An API Gateway REST API that receives inference requests from the client
- A Lambda function that invokes the model version 1 endpoint and returns its prediction to the caller, and also shadow-invokes the model version 2 endpoint (a minimal sketch of this function follows the list)
- Two SageMaker real-time inference endpoints hosting model versions 1 and 2
- An Amazon DynamoDB table that logs each request and the responses from both model versions for comparison
- A viewer endpoint that displays the logged shadow deployment results in the browser
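The following is a minimal sketch of that Lambda function, assuming hypothetical ENDPOINT_NAME_V1, ENDPOINT_NAME_V2, and SHADOW_TABLE_NAME environment variables and a text/csv model interface; the sample repository's actual function may differ:

import json
import os
import boto3

runtime = boto3.client("sagemaker-runtime")
table = boto3.resource("dynamodb").Table(os.environ["SHADOW_TABLE_NAME"])

def handler(event, context):
    payload = json.loads(event["body"])["data"]

    # Invoke the production (version 1) endpoint; this response goes back to the caller
    response_v1 = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME_V1"],
        ContentType="text/csv",
        Body=payload,
    )
    prediction_v1 = response_v1["Body"].read().decode("utf-8")

    # Shadow-invoke the version 2 endpoint; this response is only logged, never returned
    response_v2 = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME_V2"],
        ContentType="text/csv",
        Body=payload,
    )
    prediction_v2 = response_v2["Body"].read().decode("utf-8")

    # Log the request and both predictions for later comparison
    table.put_item(Item={
        "request_id": context.aws_request_id,
        "payload": payload,
        "prediction_v1": prediction_v1,
        "prediction_v2": prediction_v2,
    })

    return {"statusCode": 200, "body": json.dumps({"prediction": prediction_v1})}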
Now that we have walked through the request flow, let's talk about the analysis and comparison.
In our example, we use a binary classification ML problem in which we predict whether the input data indicates a cancerous (malignant) or benign tumor. You can extend this architecture and concept to your own ML problem in a similar manner.
To implement this solution, install the AWS CDK on your device. For instructions, see Working with the AWS CDK in Python.
Complete the following steps:
1. Clone the repository:
git clone https://github.com/aws-samples/amazon-sagemaker-shadow-deploy.git
2. Create and activate a Python virtual environment:
python3 -m venv .venv
source .venv/bin/activate
3. Install the dependencies:
pip install -r requirements.txt
4. Synthesize the stack:
cdk synth
This should generate an AWS CloudFormation template.
5. Run the training notebook included in the repository. This fetches and sets up the data, and performs training with two sets of hyperparameters, which results in two different models (cell 8 and cell 9 in the notebook). We host these two models on two different endpoints.
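For orientation, the following is a rough sketch of what those notebook cells do, assuming the built-in linear-learner algorithm (suggested by the endpoint names), placeholder S3 paths, and illustrative hyperparameters; the actual notebook cells differ:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()
image = image_uris.retrieve("linear-learner", session.boto_region_name, version="1")

def train_and_deploy(hyperparameters, endpoint_name):
    estimator = Estimator(
        image,
        role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        sagemaker_session=session,
    )
    # feature_dim=30 matches the 30-feature breast cancer payload shown later
    estimator.set_hyperparameters(
        feature_dim=30,
        predictor_type="binary_classifier",
        **hyperparameters,
    )
    estimator.fit({"train": TrainingInput("s3://<bucket>/train/", content_type="text/csv")})
    return estimator.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        endpoint_name=endpoint_name,
    )

# Two hyperparameter sets yield two model versions on two endpoints
predictor_v1 = train_and_deploy({"mini_batch_size": 200}, "shadow-linear-endpoint-v1-XXX")
predictor_v2 = train_and_deploy({"mini_batch_size": 100}, "shadow-linear-endpoint-v2-XXX")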
Next, deploy the stack with the AWS CDK, passing the two endpoint names as parameters:
cdk deploy sagemaker-shadow-deploy --parameters endpointNameV1=shadow-linear-endpoint-v1-XXX --parameters endpointNameV2=shadow-linear-endpoint-v2-XXX
Replace the endpoint names with the correct values from your notebook output. Optionally, if you already have endpoints deployed, you can use those.
In our example, we use the following code:
cdk deploy sagemaker-shadow-deploy --parameters endpointNameV1=shadow-linear-endpoint-v1-202101081721 --parameters endpointNameV2=shadow-linear-endpoint-v2-202101081802
The following stack output shows two endpoint URLs:
sagemaker-shadow-deploy.Endpointxxx: use this URL in Postman and send the following data:
{"data": "13.49,22.3,86.91,561.0,0.08752,0.07697999999999999,0.047510000000000004,0.033839999999999995,0.1809,0.057179999999999995,0.2338,1.3530000000000002,1.735,20.2,0.004455,0.013819999999999999,0.02095,0.01184,0.01641,0.001956,15.15,31.82,99.0,698.8,0.1162,0.1711,0.2282,0.1282,0.2871,0.06917000000000001"}
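If you prefer a script over Postman, here's a minimal sketch using the third-party Python requests library; the URL is a placeholder for the Endpointxxx value from your stack output:

import requests  # third-party: pip install requests

# Placeholder: substitute the Endpointxxx URL from your stack output
url = "https://<api-id>.execute-api.<region>.amazonaws.com/prod/"

payload = {"data": "13.49,22.3,86.91,561.0,0.08752,..."}  # truncated; use the full 30-feature vector above
response = requests.post(url, json=payload)
print(response.status_code, response.text)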
Use the URL sagemaker-shadow-deploy.ViewShadowDeploymentsViewerEndpointxxx in your browser to see the model inference results.
For more details, see the GitHub repository.
Another asynchronous approach is shown in the following diagram. Instead of invoking the second model endpoint in the same Lambda function, you can put a message in an Amazon Simple Queue Service (Amazon SQS) queue with the request ID. The queue triggers a Lambda function that fetches the request details and payload from the DynamoDB table and invokes the second model endpoint. The function also logs the response in the DynamoDB table, thereby closing the loop for the request.
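The following is a minimal sketch of that queue consumer, reusing the hypothetical table and environment variable names from the earlier function:

import json
import os
import boto3

runtime = boto3.client("sagemaker-runtime")
table = boto3.resource("dynamodb").Table(os.environ["SHADOW_TABLE_NAME"])

def handler(event, context):
    # One Lambda invocation can receive a batch of SQS messages
    for record in event["Records"]:
        request_id = json.loads(record["body"])["request_id"]

        # Fetch the original payload logged by the synchronous path
        item = table.get_item(Key={"request_id": request_id})["Item"]

        # Shadow-invoke the version 2 endpoint with the same payload
        response = runtime.invoke_endpoint(
            EndpointName=os.environ["ENDPOINT_NAME_V2"],
            ContentType="text/csv",
            Body=item["payload"],
        )

        # Close the loop: store the shadow prediction next to the original request
        table.update_item(
            Key={"request_id": request_id},
            UpdateExpression="SET prediction_v2 = :p",
            ExpressionAttributeValues={":p": response["Body"].read().decode("utf-8")},
        )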
You can follow this simple architecture using AWS services to host a shadow deployment. If your model isn't in AWS, you can import it into SageMaker and host the endpoint. Depending on your business use case, you can use a synchronous or asynchronous approach for shadow deployment. We took a classification business problem, built two different models, and demonstrated how to perform shadow deployment using AWS services. Visit the GitHub sample project and try out the shadow deployment approach for your model.
Ram Vittal is an enterprise solutions architect at AWS. Ram has been helping customers solve challenges across several areas such as security, governance, big data, and machine learning. He has delivered thought leadership on big data, machine learning, and cloud strategies. Ram holds professional and specialty AWS Certifications and has a master’s degree in Computer Engineering. In his spare time, he enjoys tennis and photography.
Neelam Koshiya is an enterprise solutions architect at AWS. Her current focus is to help enterprise customers with their cloud adoption journey for strategic business outcomes. In her spare time, she enjoys reading and being outdoors.
Raghu Ramesha is a Software Development Engineer (AI/ML) with the Amazon SageMaker Services SA team. He focuses on helping customers migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.