Amazon SageMaker projects are AWS Service Catalog provisioned products that enable you to easily create end-to-end machine learning (ML) solutions. SageMaker projects give organizations the ability to use templates that bootstrap ML solutions for their users, which shortens the start time for ML development.
You can now use SageMaker projects to manage custom dependencies through an image building continuous integration and continuous delivery (CI/CD) pipeline that’s available as a first-party template on Amazon SageMaker Studio. This new capability lets you update the images you use for training, processing, and inference by changing the container files in your project’s source control repositories. Pushing a change automatically triggers an image building pipeline. The template uses AWS CodeCommit as the code repository. You can use the newly created images in a SageMaker pipeline for processing, training, and inference.
The new template options are now available via the SageMaker Python SDK or within the Studio IDE, as shown in the following screenshot.
The new template uses AWS CodePipeline to build and push images to Amazon Elastic Container Registry (Amazon ECR) and then trigger a SageMaker pipeline that trains a model and registers it in the SageMaker Model Registry. After the model is in the registry, you can update its status to approved, which triggers the model deployment process via CodePipeline.
The following architectural diagram doesn’t include the CodeCommit repositories for the model building and model deployment code. The focus is on the image building features in the new template.
We use the new MLOps project template for image building CI/CD to provision and configure the following resources, which are discussed in more detail later in this post:
All of the provisioning and configuration required to set up the end-to-end CI/CD pipeline using these resources is automatically performed by SageMaker projects.
Now that we’ve covered how the new feature works, let’s walk through the one-time setup tasks followed by using the new templates.
To create your SageMaker project, complete the following steps:
A message appears indicating that SageMaker is provisioning and configuring the resources.
When the project is complete, you receive a success message.
Your project now appears in the Projects list.
Five CodeCommit repositories are created by this project template:
After all five repositories are cloned, they’re available in the Studio UI.
In this example, we generated all three image building repositories. In this section, you see the structure of the repository and learn how it can be updated to meet your custom requirements.
Each of the image building repositories follows the same structure, as shown in the following screenshot.
The image that’s created in this repository is a simple XGBoost image, but by following this structure, you can update the Dockerfile to meet your own use case. The codebuild-buildspec.yml file configures AWS CodeBuild so that the image can be built and pushed to Amazon ECR.
You can navigate to the CodeBuild console to see the status of the images that are built.
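If you prefer to check build status programmatically rather than on the console, the following is a minimal sketch using the boto3 CodeBuild client. The function name and the project name in the usage comment are hypothetical; the actual CodeBuild project names are generated by the template and aren't listed in this post.

```python
def latest_build_status(codebuild_client, project_name):
    """Return (build_id, status) for the most recent build of a project, or None.

    codebuild_client is expected to be a boto3 CodeBuild client,
    for example boto3.client("codebuild").
    """
    # Ask for build IDs newest first.
    listing = codebuild_client.list_builds_for_project(
        projectName=project_name, sortOrder="DESCENDING"
    )
    build_ids = listing.get("ids", [])
    if not build_ids:
        return None

    # Fetch details only for the newest build.
    builds = codebuild_client.batch_get_builds(ids=build_ids[:1]).get("builds", [])
    if not builds:
        return None
    return builds[0]["id"], builds[0]["buildStatus"]


# Usage (assumes AWS credentials and a hypothetical project name):
#   import boto3
#   print(latest_build_status(boto3.client("codebuild"), "my-image-build-project"))
```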
The CodePipeline pipelines associated with each repository run automatically on project creation. New builds are triggered when changes are pushed to the repository. You can see the images on the Amazon ECR console.
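To reference a newly built image from code instead of looking it up on the console, you can query Amazon ECR for the most recently pushed image and compose its URI. The following is a sketch only: the function name and the repository name in the usage comment are hypothetical, and the URI format assumes a standard commercial AWS Region (other partitions use a different domain suffix).

```python
def latest_image_uri(ecr_client, repository, region):
    """Return the URI of the most recently pushed image in a repository, or None.

    ecr_client is expected to be a boto3 ECR client, for example
    boto3.client("ecr", region_name=region).
    """
    details = []
    kwargs = {"repositoryName": repository}
    # Follow nextToken until every page of image details has been read.
    while True:
        page = ecr_client.describe_images(**kwargs)
        details.extend(page.get("imageDetails", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    if not details:
        return None

    newest = max(details, key=lambda d: d["imagePushedAt"])
    # registryId is the AWS account ID that owns the repository.
    registry = f"{newest['registryId']}.dkr.ecr.{region}.amazonaws.com"
    tags = newest.get("imageTags")
    if tags:
        return f"{registry}/{repository}:{tags[0]}"
    # Untagged images can still be referenced by digest.
    return f"{registry}/{repository}@{newest['imageDigest']}"


# Usage (assumes AWS credentials and a hypothetical repository name):
#   import boto3
#   ecr = boto3.client("ecr", region_name="us-east-1")
#   print(latest_image_uri(ecr, "my-project-training-image", "us-east-1"))
```

The returned URI is the kind of value you would pass as the image for a processing, training, or inference step in a SageMaker pipeline.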
When new code is pushed to any of the image building repositories, the CodeBuild project starts, and the new version of the image is built and pushed to Amazon ECR. A set of Amazon EventBridge rules is created to automate each step of the ML workflow. In this new template, an EventBridge rule triggers the model build pipeline when a new container version is pushed to Amazon ECR.
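The exact rule that the template creates isn't reproduced in this post, but a rule of this kind can be sketched as an EventBridge event pattern that matches successful image pushes to specific ECR repositories. The function name and repository names below are hypothetical; the pattern fields follow the event format that Amazon ECR emits for image actions.

```python
import json


def ecr_push_event_pattern(repository_names):
    """Build an EventBridge event pattern matching successful ECR image pushes."""
    return {
        "source": ["aws.ecr"],
        "detail-type": ["ECR Image Action"],
        "detail": {
            "action-type": ["PUSH"],
            "result": ["SUCCESS"],
            "repository-name": list(repository_names),
        },
    }


# Hypothetical repository names, for illustration only.
pattern = ecr_push_event_pattern(["my-project-training-image"])
pattern_json = json.dumps(pattern)

# To create a rule from this pattern (assumes AWS credentials and a
# hypothetical rule name):
#   import boto3
#   boto3.client("events").put_rule(Name="my-ecr-push-rule", EventPattern=pattern_json)
```

If you create a custom template with more image building repositories, adding their names to the `repository-name` list is one way to have pushes to any of them match the same rule.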
The model build pipeline target starts running the SageMaker pipeline.
In this section, we demonstrate how an update to a Dockerfile in one of the repositories triggers a CodeBuild process that creates and pushes a new image version to Amazon ECR, and the subsequent ML pipeline that’s launched.
The following screenshots contain the logs of the CodeBuild stage that builds the container using the updated Dockerfile pushed to the repository.
The image version being pushed to Amazon ECR triggers the SageMaker pipeline in the model build repository.
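To confirm from code that the image push started a pipeline run, you can look up the most recent execution of the SageMaker pipeline. This is a minimal sketch; the function name and the pipeline name in the usage comment are hypothetical, because the pipeline name generated by the project isn't given in this post.

```python
def latest_pipeline_execution(sm_client, pipeline_name):
    """Return (execution_arn, status) for the newest pipeline execution, or None.

    sm_client is expected to be a boto3 SageMaker client,
    for example boto3.client("sagemaker").
    """
    response = sm_client.list_pipeline_executions(
        PipelineName=pipeline_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("PipelineExecutionSummaries", [])
    if not summaries:
        return None
    newest = summaries[0]
    return newest["PipelineExecutionArn"], newest["PipelineExecutionStatus"]


# Usage (assumes AWS credentials and a hypothetical pipeline name):
#   import boto3
#   print(latest_pipeline_execution(boto3.client("sagemaker"), "my-project-pipeline"))
```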
The model can be viewed and approved in the model registry similar to the workflows in the other MLOps templates on SageMaker Studio.
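Approval can also be done programmatically. The following sketch finds the newest model package that is pending manual approval in a model package group and marks it as approved, which is the status change that starts the deployment process described earlier. The function name and the group name in the usage comment are hypothetical.

```python
def approve_latest_pending_model(sm_client, model_package_group):
    """Approve the newest pending model package in a group and return its ARN.

    Returns None if no package is pending manual approval.
    sm_client is expected to be a boto3 SageMaker client.
    """
    pending = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group,
        ModelApprovalStatus="PendingManualApproval",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    ).get("ModelPackageSummaryList", [])
    if not pending:
        return None

    arn = pending[0]["ModelPackageArn"]
    # Changing the status to Approved is what triggers model deployment.
    sm_client.update_model_package(
        ModelPackageArn=arn, ModelApprovalStatus="Approved"
    )
    return arn


# Usage (assumes AWS credentials and a hypothetical model package group name):
#   import boto3
#   approve_latest_pending_model(boto3.client("sagemaker"), "my-project-model-group")
```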
In this post, we walked through the new SageMaker MLOps project template for image building CI/CD. With the structure provided in the template, you can modify the Dockerfiles to meet your use case, create a custom template with more image building repositories, or create custom rules for the automatic pipeline triggering. Try it out and let us know if you have any questions in the comments section!
Kirit Thadaka is an ML Solutions Architect working in the Amazon SageMaker Service SA team. Prior to joining AWS, Kirit spent time working in early-stage AI startups followed by some time consulting in various roles in AI research, MLOps, and technical leadership.