SageMaker Projects give organizations the ability to easily set up and standardize developer environments for data scientists and CI/CD systems for MLOps engineers. With SageMaker Projects, MLOps engineers or organization admins can define templates that bootstrap the ML workflow with source version control, automated ML pipelines, and a set of code to quickly start iterating over ML use cases. With Projects, dependency management, code repository management, build reproducibility, and artifact sharing and management become easy for organizations to set up. SageMaker Projects are provisioned using AWS Service Catalog products. Project templates are used by organizations to provision Projects for each of their users.
This post describes how SageMaker Project templates can be customized to fit any organization’s use case. This GitHub repository contains examples of custom templates.
Every organization has its own set of standards and practices that provide security and governance for their AWS environment. SageMaker provides a set of 1st party templates for organizations that want to quickly get started with ML workflows and CI/CD. Included in the templates are projects which use AWS native services for CI/CD such as AWS CodeBuild, AWS CodePipeline, and AWS CodeCommit and also projects that use third party tools such as Jenkins and GitHub.
Organizations often need tight control over how MLOps resources are provisioned, restricted, and managed. This includes configuring IAM roles and policies, enforcing resource tags, enforcing encryption, and decoupling resources across multiple accounts. To give organizations the flexibility to do this, SageMaker Projects support custom templates where organizations use AWS CloudFormation templates to define the resources needed for an ML workflow. These custom templates are created as AWS Service Catalog products and provisioned as Organization Templates on the SageMaker Studio UI. This is where data scientists choose a template and have their ML workflow bootstrapped and pre-configured. AWS Service Catalog is an AWS service that enables organizations to create and manage catalogs of products that are approved for use on AWS. These products are created using CloudFormation templates.
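As a minimal sketch of registering a custom template as a Service Catalog product, the snippet below builds the arguments for the Service Catalog `CreateProduct` API. The product name, owner, and template URL are hypothetical placeholders. The `sagemaker:studio-visibility` tag is the tag the SageMaker documentation describes for making a product visible in Studio; confirm it against the current documentation before relying on it.

```python
def build_create_product_request(name, owner, template_url, version="v1"):
    """Build keyword arguments for servicecatalog.create_product."""
    return {
        "Name": name,
        "Owner": owner,
        "ProductType": "CLOUD_FORMATION_TEMPLATE",
        # Tag that marks the product as a SageMaker Studio project template.
        "Tags": [{"Key": "sagemaker:studio-visibility", "Value": "true"}],
        "ProvisioningArtifactParameters": {
            "Name": version,
            "Type": "CLOUD_FORMATION_TEMPLATE",
            "Info": {"LoadTemplateFromURL": template_url},
        },
    }


def create_product(request):
    """Send the request; boto3 is imported lazily so the builder runs offline."""
    import boto3

    return boto3.client("servicecatalog").create_product(**request)


# All values below are hypothetical placeholders.
request = build_create_product_request(
    name="custom-mlops-template",
    owner="mlops-platform-team",
    template_url="https://example-bucket.s3.amazonaws.com/template.yml",
)
```

The product still has to be added to a portfolio with a launch constraint and shared with the Studio users before it appears under Organization Templates.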
To help our customers get started with common model building and deployment paradigms, SageMaker Projects offers a set of 1P templates. The 1P templates generally focus on creating resources for model building and model training.
To understand SageMaker Projects in detail, it helps to break it up into two components – AWS resources and project seed code.
The following sections will reference “MLOps template for model building, training, and deployment”.
The CodeCommit repositories created by the template are pre-populated with seed code for model building and model deployment.
The model can then be approved or rejected in the registry. Organizations should determine how their model approval takes place and have processes that govern which users are allowed to approve models in the registry.
If the model is rejected, no action is taken.
When this 1P SageMaker project is first created, each pipeline runs automatically as the seed code is pushed into these repositories for the first time. As users subsequently update the code in either of the repositories to meet their use case, the pipelines will be re-triggered. Some use cases fit within the paradigm offered by the 1st party templates, making these templates a great way to seed ML projects. For example, adding additional steps to the pipeline, changing the data set, updating the properties of the deployed endpoint like adding DataCaptureConfig, using GitHub or Jenkins instead of the AWS native counterparts are all achievable using the 1st party templates. For more information on the 1st party SageMaker Project templates, visit the SageMaker Projects documentation.
Organizations may want to extend the 1st party templates to support use cases beyond simply training and deploying a model. Custom project templates are a way for organizations to create a standard workflow for machine learning projects. Organizations can create several templates and use IAM policies to manage access to those templates on SageMaker Studio, ensuring that each of their users accesses only the projects dedicated to their use cases.
Here are some common scenarios in which an organization would need to create a custom project template:
Scenario | Current 1P offering | Organization’s use case |
Using SVC systems not supported in the 1st party templates | 1st party templates use AWS CodeStar Connections to authenticate with the repository, limiting the 1st party templates to CodeCommit, GitHub, Bitbucket, and GitHub Enterprise. | Organizations may have version control systems other than the ones currently offered by 1P templates. |
Using a multi-account strategy for model training and deployment. | Currently 1P templates do model training and deployment all in the same account. | Organizations may want to use a multi-account strategy as a best practice with dedicated training accounts and staging/production accounts for deployments. |
Custom approval workflows for model deployment. | 1P templates have 2 approval steps – Approval in the Model Registry, and manual approval in the CI/CD Pipeline | Organizations may have multiple approval steps that need to take place before a model can be deployed. |
Multiple deployment stages | 1P templates deploy models in 2 stages – staging and production. Both endpoints are deployed in the same account. | Organizations may have more than 2 stages (staging, pre-production, production) for deployment. |
Multiple code branches for experimentation | 1P templates assume only a single branch in the repository used | Organizations may have multiple users working on the same repository where each of them works in individual branches for experimentation with the main branch having the best version of the training pipeline |
Custom hosting options | 1P templates use SageMaker hosted endpoints only | Organizations may want to leverage a variety of SageMaker hosting options – MME, Edge etc. |
Single pipeline for multiple use cases | 1P templates use a single SageMaker Pipeline that trains a model | Organizations may have to use a SageMaker Pipeline to train and register multiple models each with its own evaluation criteria to limit the number of pipelines to manage |
Development and production pipelines | 1P templates use a single pipeline for development and production | Organizations may want to first test their pipeline in a development environment and then use a CI/CD process to create and run that pipeline in a production environment |
Using custom seed code | 1P templates have standard seed code for model building and deployment | Organizations may want to provide their developers a set of custom seed code particular to use cases they work on |
Using SageMaker Studio in a VPC | 1P templates use SageMaker in internet-mode | Organizations may be using SageMaker in VPC-only mode |
In this section, organizations will see how they can build their own custom templates and the considerations to account for when designing templates of their own.
The 1P templates use AWS CodeStar Connections to manage authentication between the Project and the repository so that the seed code can be pushed to it. This method supports GitHub, GitHub Enterprise, and Bitbucket in addition to the AWS native repository, CodeCommit. If organizations want to use different repositories, a different authentication mechanism needs to be provided in the Project. A recommended approach is to use an AWS Lambda function with AWS Secrets Manager to authenticate with the repository and push the seed code. Once the seed code has been pushed, authentication with the repository on SageMaker Studio happens via the repository username and password. The method with Lambda and Secrets Manager is meant only for pushing the seed code to the repository when the project is created. Alternative strategies to push seed code into the repository can be explored based on the organization’s repository, authentication mechanism, use case, etc.
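The Lambda and Secrets Manager approach could look roughly like the sketch below, which reads repository credentials from a secret and builds an authenticated HTTPS remote for the seed-code push. The secret layout (`username` and `password` keys) and the repository URL are assumptions made for illustration; the actual clone and push steps depend on the organization's repository and are left out.

```python
import json
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_remote(repo_url, secret_string):
    """Embed credentials from a JSON secret into an HTTPS Git remote URL."""
    secret = json.loads(secret_string)
    # Percent-encode so special characters cannot break the URL.
    user = quote(secret["username"], safe="")
    token = quote(secret["password"], safe="")
    parts = urlsplit(repo_url)
    netloc = f"{user}:{token}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def fetch_secret_string(secret_id):
    """Read the secret at runtime; boto3 is imported lazily for offline use."""
    import boto3

    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_id)["SecretString"]


# Hypothetical secret payload and repository URL.
remote = authenticated_remote(
    "https://git.example.com/team/repo.git",
    '{"username": "svc-user", "password": "p@ss/word"}',
)
```

Inside the Lambda function, the secret string would come from `fetch_secret_string` rather than a literal, and the resulting remote should never be written to logs.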
The seed code pushed to the repository should be customized to support the use case for the project.
In the SageMaker Project, the CI/CD tool is responsible for triggering the model training and deployment process. When the status of a model is changed in the Model Registry, an EventBridge notification is emitted, which can alert the CI/CD tool to begin deployment. Similarly, the CI/CD tool needs to use SageMaker’s API to start the SageMaker Pipeline execution when a change is made to the model building repository. In the 1P template that uses Jenkins for CI/CD, the Jenkins pipeline is triggered by pushes to the SVC repository. The CI/CD pipeline uses AWS CLI commands to start a SageMaker Pipeline for model training and run CloudFormation scripts for deploying the model to the endpoint.
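For CI/CD tools that call the SageMaker API directly rather than through the AWS CLI, starting a pipeline run might be sketched as follows. The pipeline name, commit ID, and pipeline parameter are hypothetical; only the `StartPipelineExecution` argument names come from the SageMaker API.

```python
def build_start_pipeline_request(pipeline_name, commit_id, parameters=None):
    """Build keyword arguments for sagemaker.start_pipeline_execution."""
    return {
        "PipelineName": pipeline_name,
        # Short commit prefix makes executions traceable to a code change.
        "PipelineExecutionDisplayName": f"commit-{commit_id[:7]}",
        "PipelineParameters": [
            {"Name": name, "Value": str(value)}
            for name, value in sorted((parameters or {}).items())
        ],
    }


def start_pipeline(request):
    """Start the execution; boto3 is imported lazily for offline use."""
    import boto3

    return boto3.client("sagemaker").start_pipeline_execution(**request)


# Hypothetical values for illustration.
request = build_start_pipeline_request(
    pipeline_name="example-project-pipeline",
    commit_id="0123456789abcdef",
    parameters={"ModelApprovalStatus": "PendingManualApproval"},
)
```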
In cases where organizations want to use CI/CD tools other than those supported in the 1P templates (Jenkins, CodePipeline), they should make sure that their repository can trigger their CI/CD pipeline and that AWS CLI commands can be invoked from the CI/CD pipeline so the relevant AWS services can be called (SageMaker Pipelines, CloudFormation, etc.). In the 1P template, a Lambda function is used to trigger the Jenkins/CodePipeline pipeline when a model in the Model Registry is approved; the same can be done when using other CI/CD tools.
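A Lambda function of this kind could be sketched as below: an EventBridge rule pattern that matches approved model packages, and a handler that forwards the model package ARN to the CI/CD tool. The `trigger_pipeline` callable is a hypothetical stand-in for whatever API the chosen CI/CD tool exposes, and the event field names should be checked against a real event from the account.

```python
def model_approval_event_pattern(model_package_group_name):
    """EventBridge rule pattern for approved packages in one model group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group_name],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def should_trigger_deployment(event):
    """Return True only for model package events with status Approved."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.sagemaker"
        and event.get("detail-type") == "SageMaker Model Package State Change"
        and detail.get("ModelApprovalStatus") == "Approved"
    )


def make_handler(trigger_pipeline):
    """Create a Lambda handler that calls trigger_pipeline (hypothetical)."""

    def handler(event, context):
        if not should_trigger_deployment(event):
            return {"triggered": False}
        arn = event["detail"]["ModelPackageArn"]
        trigger_pipeline(arn)
        return {"triggered": True, "modelPackageArn": arn}

    return handler
```

Filtering in both the rule pattern and the handler is redundant by design: the handler stays safe even if the rule is later broadened.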
SageMaker Projects require a set of IAM roles that fall under two categories:
The 1P templates use the following AWS managed roles.
For custom templates, the LaunchRole needs to be updated to have enough permissions to deploy all resources in the CloudFormation template, and the UseRole needs to have all the associated services in its trust policy so the services can assume the right role. Customers can define a UseRole for each service instead of a single role for all services.
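To illustrate the trust-policy requirement, the sketch below builds a trust policy document that lets a set of services assume a UseRole. The list of services is an assumption chosen for a typical CodePipeline-based template; a custom template should list only the services it actually uses.

```python
import json


def build_trust_policy(service_principals):
    """Trust policy allowing the given AWS services to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": sorted(service_principals)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Assumed set of services for a single shared UseRole.
use_role_trust = build_trust_policy(
    [
        "sagemaker.amazonaws.com",
        "codebuild.amazonaws.com",
        "codepipeline.amazonaws.com",
        "cloudformation.amazonaws.com",
        "lambda.amazonaws.com",
        "events.amazonaws.com",
    ]
)

# One role per service narrows what each service can do.
codebuild_only_trust = build_trust_policy(["codebuild.amazonaws.com"])
policy_json = json.dumps(use_role_trust, indent=2)
```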
To get started, identify the user personas associated with the application in addition to the administrator (for example, data scientists and the MLOps team), and design IAM policies with least-privilege access for each user and for services such as SageMaker Studio and Service Catalog. See Actions, resources, and condition keys for AWS services for an exhaustive list of IAM policies. A sample set of IAM roles is:
Based on the hosting option that’s right for the use case, the deployment components of the template should be updated to use that hosting option. For example, if the model needs to be deployed to a multi-model endpoint, the CloudFormation scripts should be updated to reflect that. If a Serial Inference Pipeline is used, a PipelineModel should be registered to the Model Registry by the training pipeline, and CloudFormation is used to deploy the PipelineModel to a SageMaker Endpoint. Similarly, the template can be modified to support compiling a model for Edge deployment using SageMaker Neo.
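As one illustration of changing the hosting option, the sketch below generates the CloudFormation resources for a multi-model endpoint as a Python dictionary. The logical IDs, instance type, and function arguments are assumptions; the key difference from a single-model deployment is the container `Mode` and an S3 prefix (rather than a single artifact) in `ModelDataUrl`.

```python
import json


def multi_model_resources(image_uri, model_prefix_s3, role_arn, instance_type="ml.m5.large"):
    """CloudFormation resources for a SageMaker multi-model endpoint."""
    return {
        "Model": {
            "Type": "AWS::SageMaker::Model",
            "Properties": {
                "ExecutionRoleArn": role_arn,
                "PrimaryContainer": {
                    "Image": image_uri,
                    "Mode": "MultiModel",
                    # S3 prefix holding all model artifacts served by the endpoint.
                    "ModelDataUrl": model_prefix_s3,
                },
            },
        },
        "EndpointConfig": {
            "Type": "AWS::SageMaker::EndpointConfig",
            "Properties": {
                "ProductionVariants": [
                    {
                        "ModelName": {"Fn::GetAtt": ["Model", "ModelName"]},
                        "VariantName": "AllTraffic",
                        "InitialInstanceCount": 1,
                        "InstanceType": instance_type,
                        "InitialVariantWeight": 1.0,
                    }
                ]
            },
        },
        "Endpoint": {
            "Type": "AWS::SageMaker::Endpoint",
            "Properties": {
                "EndpointConfigName": {"Fn::GetAtt": ["EndpointConfig", "EndpointConfigName"]}
            },
        },
    }


# Hypothetical image, bucket, and role values.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": multi_model_resources(
        image_uri="111111111111.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        model_prefix_s3="s3://example-bucket/models/",
        role_arn="arn:aws:iam::111111111111:role/example-execution-role",
    ),
}
template_json = json.dumps(template, indent=2)
```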
In addition to the hosting option, the approval strategy should be coded into the custom template. In the 1P templates as described above, the approval process for deployment happens in 2 steps. The first is approving the model in the Model Registry, the second is a manual approval step in CodePipeline or Jenkins. This may not fit into the approval governance mechanisms in place for organizations when they deploy models. An example of a different approval mechanism could be to restrict the users that can update the Model Registry model status so only MLOps Engineers can update the status to “Approved”. Once approved, the CI/CD pipeline can have a step that checks for certain integration tests to be completed along with manual approval from an account admin before deploying the model. Such approval workflows can be designed in the custom template to define a standard practice for deployment across the organization.
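The example approval mechanism above can be expressed as a simple gate that a CI/CD step evaluates before deploying. This is only a sketch of the decision logic; the class and field names are hypothetical, and how each signal is collected (registry status, test results, admin sign-off) depends on the CI/CD tool in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentGate:
    """Signals a CI/CD step checks before deploying a model."""

    model_approval_status: str
    integration_tests_passed: bool
    admin_approved: bool

    def blockers(self):
        """List every reason the deployment may not proceed."""
        reasons = []
        if self.model_approval_status != "Approved":
            reasons.append("model is not approved in the Model Registry")
        if not self.integration_tests_passed:
            reasons.append("integration tests have not passed")
        if not self.admin_approved:
            reasons.append("account admin has not approved")
        return reasons

    def can_deploy(self):
        return not self.blockers()


gate = DeploymentGate(
    model_approval_status="Approved",
    integration_tests_passed=True,
    admin_approved=False,
)
```

Returning the full list of blockers, rather than stopping at the first failure, gives reviewers one complete report per attempted deployment.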
Lastly, the 1P templates operate within a single account. In accordance with CI/CD best practices, organizations may have a multi-account strategy with dedicated dev, staging, and prod accounts. In this case, customers can use AWS Organizations to manage those accounts and leverage cross-account CloudFormation stacks to handle the deployment. The following diagram illustrates how this can be set up.
For detailed instructions on how to set up multi-account deployment using SageMaker Projects, refer to this blog.
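One possible way to implement cross-account stacks is CloudFormation StackSets; the sketch below assumes that choice, which the post itself does not prescribe. It builds the arguments for the `CreateStackInstances` API, with a hypothetical stack set name, account IDs, and a hypothetical `StageName` template parameter.

```python
def build_stack_instances_request(stack_set_name, account_ids, regions, parameters=None):
    """Build keyword arguments for cloudformation.create_stack_instances."""
    return {
        "StackSetName": stack_set_name,
        "Accounts": list(account_ids),
        "Regions": list(regions),
        "ParameterOverrides": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted((parameters or {}).items())
        ],
    }


def create_stack_instances(request):
    """Send the request; boto3 is imported lazily for offline use."""
    import boto3

    return boto3.client("cloudformation").create_stack_instances(**request)


# Hypothetical staging and production accounts.
staging = build_stack_instances_request(
    "model-endpoint", ["111111111111"], ["us-east-1"], {"StageName": "staging"}
)
production = build_stack_instances_request(
    "model-endpoint", ["222222222222"], ["us-east-1"], {"StageName": "prod"}
)
```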
{ "Sid": "EnforceTaggingOnProjects", "Effect": "Allow", "Action": [ "servicecatalog:ProvisionProduct" ], "Resource": "*", "Condition": { "ForAnyValue:StringEquals": { "aws:TagKeys": [
When you create a custom product, you can also use the TagOption Library to enforce the values for each tag. When a tag is specified for a Project, SageMaker propagates the tags to all its resources.
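On the client side, a small helper can check required tags before calling the SageMaker `CreateProject` API, so a missing tag fails early instead of being denied by IAM. The required tag keys, project name, and Service Catalog IDs below are hypothetical placeholders.

```python
REQUIRED_TAG_KEYS = {"team", "cost-center"}  # hypothetical organization standard


def build_create_project_request(project_name, product_id, provisioning_artifact_id, tags):
    """Build keyword arguments for sagemaker.create_project with tag checks."""
    missing = REQUIRED_TAG_KEYS - set(tags)
    if missing:
        raise ValueError(f"missing required tags: {sorted(missing)}")
    return {
        "ProjectName": project_name,
        "ServiceCatalogProvisioningDetails": {
            "ProductId": product_id,
            "ProvisioningArtifactId": provisioning_artifact_id,
        },
        "Tags": [{"Key": key, "Value": value} for key, value in sorted(tags.items())],
    }


def create_project(request):
    """Send the request; boto3 is imported lazily for offline use."""
    import boto3

    return boto3.client("sagemaker").create_project(**request)


# Hypothetical identifiers and tag values.
request = build_create_project_request(
    project_name="churn-model",
    product_id="prod-examplexxxxxxx",
    provisioning_artifact_id="pa-examplexxxxxxx",
    tags={"team": "fraud", "cost-center": "1234"},
)
```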
In an organizational setting, you can also create multiple Project templates (Service Catalog Products) for different teams, and restrict access for each team to their corresponding template using IAM policies.
{ "Sid": "RestrictAccessByTeam", "Effect": "Allow", "Action": [ "servicecatalog:ProvisionProduct" ], "Resource": "*", "Condition": { "StringEquals": { "aws:ResourceTag/team":
This table describes how the best practices described above can help solve a variety of use cases where custom project templates are created.
Scenario | Proposed solution |
Using SVC systems not supported in the 1st party templates | The 1st party templates can be customized to use custom authentication mechanisms like Lambda functions with AWS Secrets Manager or any other way to access code in the repository. Refer to Source Version Control in the section Best Practices for Designing a Custom Project Template. Here is an example of this. |
Using a multi-account strategy for model training and deployment. | Use AWS Organizations to manage multiple accounts and cross-account CloudFormation stacks to manage deployment of models in multiple accounts. Refer to “Model Deployment Strategies” in the section Best Practices for Designing a Custom Project Template. |
Custom approval workflows for model deployment. | Add unit tests, integration tests, additional manual approval steps, multiple evaluation steps in the training pipeline, etc. to have a robust model approval strategy. Refer to “Model Deployment Strategies” in the section Best Practices for Designing a Custom Project Template. |
Multiple deployment stages | The CI/CD pipeline (CodePipeline, Jenkins, etc.) needs to be updated with all the deployment steps required, with one step for each stage of deployment. Refer to “Model Deployment Strategies” in the section Best Practices for Designing a Custom Project Template. |
Multiple code branches for experimentation | A custom template could be created where each time a new branch in SVC is pushed for experimentation, a SageMaker Pipeline for that branch is created and executed. Here is an example of a custom template to enable this strategy. |
Custom hosting options | A custom template can be created that changes the CloudFormation scripts used for endpoint deployment and the deployment stages can be updated to suit the hosting option selected. Refer to “Model Deployment Strategies” in the section Best Practices for Designing a Custom Project Template. |
Single pipeline for multiple use cases | The project seed code can be updated to have a single SageMaker Pipeline train multiple models serving multiple use cases. This is useful, for instance, when a single dataset is used to train multiple models, each on a different subset of the data. It avoids the need to manage multiple pipelines and reduces the number of data preparation steps needed. |
Development and production pipelines | A custom template can be created that creates a SageMaker Pipeline from a pipeline definition in a production environment when the definition is pushed to the SVC repository. This way, data scientists can test their pipeline in a dev environment, iterate over it until the pipeline reaches the desired state, push the pipeline definition file to the repository, have the CI/CD pipeline create a new pipeline using the same definition in the production environment, and start its execution. |
Using custom seed code | A custom template can be created that pulls code from a location that hosts the custom seed code for the project. This can be an organization-managed S3 bucket or repository. Refer to Source Version Control in the section Best Practices for Designing a Custom Project Template. |
Using SageMaker Studio in a VPC | A custom template can be created that has access to the bucket with seed code through a VPC endpoint. Without this, when the project is created, the seed code will not be available to populate the repository. Refer to Security, Encryption, & Tagging in the section Best Practices for Designing a Custom Project Template. |
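For the "Multiple code branches for experimentation" scenario in the table above, a CI/CD job needs a predictable pipeline name for each branch. The sketch below derives one from the project and branch names. The naming scheme is an assumption, and the character and length limits reflect the SageMaker pipeline name constraints as understood here (alphanumerics and hyphens, up to 256 characters); verify them against the API reference.

```python
import re


def pipeline_name_for_branch(project_name, branch, max_length=256):
    """Derive a SageMaker Pipeline name from a Git branch name."""
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", branch).strip("-")
    if not slug:
        raise ValueError(f"branch name {branch!r} has no usable characters")
    # Truncate, then drop any hyphen left dangling at the cut point.
    return f"{project_name}-{slug}"[:max_length].rstrip("-")


# Hypothetical project and branch names.
name = pipeline_name_for_branch("churn", "feature/new_features")
```

The CI/CD job would then create or update the pipeline under this name and start an execution, leaving the main branch's pipeline untouched.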
Using the best practices and guidance described here, organizations can provide their users with standardized workflows for ML that help boost productivity and ensure compliance with organization standards.
Visit this GitHub repository for an example on building your own template and contribute to the repository with custom templates of your own!
Kirit Thadaka is an ML Solutions Architect working in the SageMaker Service SA team. Prior to joining AWS, Kirit spent time working in early-stage AI startups followed by some time in consulting in various roles in AI research, MLOps, and technical leadership.
Durga Sury is a Data Scientist in the Energy Delivery team in Professional Services. Before AWS, she helped non-profit and government agencies derive insights from their data to improve education outcomes. At AWS, she focuses on Natural Language Processing and MLOps.