A continuous integration and continuous delivery (CI/CD) pipeline helps you automate steps in your machine learning (ML) applications such as data ingestion, data preparation, feature engineering, modeling training, and model deployment. A pipeline across multiple AWS accounts improves security, agility, and resilience because an AWS account provides a natural security and access boundary for your AWS resources. This can keep your production environment safe and available for your customers while keeping training separate.
However, setting up the necessary AWS Identity and Access Management (IAM) permissions for a multi-account CI/CD pipeline for ML workloads can be difficult. AWS provides solutions such as the AWS MLOps Framework and Amazon SageMaker Pipelines to help customers deploy cross-account ML pipelines quickly. However, customers still want to know how to set up the right cross-accounts IAM roles and trust relationships to create a cross-account pipeline while encrypting their central ML artifact store.
This post is aimed is at helping customers who are familiar with, and prefer AWS CodePipeline as their DevOps and automation tool of choice. Although we use machine learning DevOps (MLOps) as an example in this post, you can use this post as a general guide on setting up a cross-account pipeline in CodePipeline incorporating many other AWS services. For MLOps in general, we recommend using SageMaker Pipelines.
We deploy a simple ML pipeline across three AWS accounts. The following in an architecture diagram to represent what you will create.
For this post, we use three accounts:
If you want to follow along, clone the Git repository to your local machine. You must have:
Run the following command to copy the Git repository.
git clone https://github.com/aws-samples/cross-account-ml-train-deploy-codepipeline
In the shared service account (Account A)
It will take a few minutes for AWS CloudFormation to deploy the resources. We’ll use the outputs from the stack in A as input for the stacks in B and C.
In the training account (Account B)
In the production account (Account C)
AWS CloudFormation creates the required IAM policies, roles, and trust relationships for your cross-account pipeline. We’ll take the AWS resources and IAM roles created by the templates to populate our pipeline and step function workflow definitions.
To run the ML pipeline, you must update the pipeline and the Step Functions state machine definition files. You can download the files from the Git repository. Replace the string values within the angle brackets (e.g. ‘
In the shared service account, navigate to the Amazon S3 console and select the Amazon S3 bucket created by AWS CloudFormation. Upload the train.csv file from the Git repository and place in a folder labeled ‘Data’. Additionally, upload the train_SF.json file in the home directory.
Note: The bucket policy denies any upload actions if not used with the KMS key. As a workaround, remove the bucket policy, upload the files, and re-apply the bucket policy.
On to the CodeCommit page select the repository created by AWS CloudFormation. Upload the sf-sm-train-demo.json file into the empty repository. The sf-sm-train-demo.json file should be updated with the values from the AWS CloudFormation template outputs. Provide a name, email, and optional message to the main branch and Commit changes.
Now that everything is set up, you can create and deploy the pipeline.
We secured our S3 bucket with the bucket policy and KMS key. Only the CodePipeline service role, and the cross-account roles created by the AWS CloudFormation template in the training and production accounts can use the key. The same applies to the CodeCommit repository. We can run the following command from the shared services account to create the pipeline.
aws codepipeline create-pipeline –cli-input-json file://test_pipeline_v3.json
After a successful response, the custom deploy-cf-train-test stage creates an AWS CloudFormation template in the training account. You can check the CodePipeline status in the console.
AWS CloudFormation Code deploys a Step Functions state machine to start a model training job by assuming the CodePipeline role in the shared services account. The cross-account role in the training account permits access to the S3 bucket, KMS key, CodeCommit repo, and pass role Step Functions state machine execution.
When the state machine successfully runs, the pipeline requests manual approval before the final stage. On the CodePipeline console choose Review and approve the changes. This moves the pipeline into the final stage of invoking the Lambda function to deploy the model.
When the training job completes, the Lambda function in the production account deploys the model endpoint. To do this, the Lambda function assumes the role in the shared service account to run the required PutJobSuccessResult CodePipeline command.
Congratulations! You’ve built the foundation for setting up a cross-account ML pipeline using CodePipeline for training and deployment. You can now see a live SageMaker endpoint in the shared service account created by the Lambda function in the production account.
In this blog post, you created a cross-account ML pipeline using AWS CodePipeline, AWS CloudFormation, and Lambda. You set up the necessary IAM policies and roles to enable this cross-account access using a shared services account to hold the ML artifacts in an S3 bucket and a customer managed KMS key for encryption. You deployed a pipeline where different accounts deployed different stages of the pipeline using AWS CloudFormation to create a Step Functions state machine for model training and Lambda to invoke it.
You can use the steps outlined here to set up cross-account pipelines to fit your workload. For example, you can use CodePipeline to safely deploy and monitor SageMaker endpoints. CodePipeline helps you automate steps in your software delivery process to enable agility and productivity for your teams. Contact your account team to learn how to you can get started today!
Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications. He enjoys coffee, cooking, staying active, and spending time with his family.
Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.
David Ping is a Principal Machine Learning Architect and Sr. Manager of AI/ML Solutions Architecture at Amazon Web Services. He helps enterprise customers build and operate machine learning solutions on AWS. David enjoys hiking and following the latest machine learning advancement.
Rajdeep Saha is a specialist solutions architect for serverless and containers at Amazon Web Services (AWS). He helps customers design scalable and secure applications on AWS. Rajdeep is passionate about helping and teaching newcomers about cloud computing. He is based out of New York City and uses Twitter, sparingly at @_rajdeepsaha.