This post outlines best practices for provisioning Amazon SageMaker Studio for data science teams and provides reference architectures and AWS CloudFormation templates to help you get started. We use AWS Service Catalog to provision a Studio domain and users. AWS Service Catalog lets you provision these resources centrally, so individual users don't need Amazon SageMaker access policies to provision Studio themselves.
SageMaker is a fully managed service that provides every machine learning (ML) developer and data scientist with the ability to build, train, and deploy ML models quickly. Studio is a web-based integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models. Studio provides all the tools that you need to take your models from experimentation to production while boosting your productivity. You can write code, track experiments, visualize data, and debug and monitor within a single, integrated visual interface.
Studio supports three authentication modes: AWS Identity and Access Management (IAM), federated single sign-on (SSO) and AWS Single Sign-On (AWS SSO). The steps outlined in this post apply to IAM and federated SSO modes only. To provision Studio using AWS SSO, see Onboard to Amazon SageMaker Studio Using AWS SSO.
Let’s start by looking at the key components within Studio that we need to consider while provisioning:
The following section describes two different personas that interact with Studio resources and the level of access they need to fulfill their duties. We use this as a high-level requirement to model IAM roles and policies to establish desired controls based on resource ownership at the team and user level.
The following architecture diagram shows the necessary permissions and flow for the two user personas, cloud admin and data scientist.
The architecture has the following workflow:
Before you get started, complete the following prerequisites:
The cloud admin role needs full access to AWS Service Catalog, but no SageMaker access. When you create the role, make sure that the ServiceCatalogAdminFullAccess managed policy is attached. When cloud admins initiate product provisioning from AWS Service Catalog, AWS Service Catalog assumes the provisioning role. You don't need to create the provisioning role; our sample templates create it automatically along with the AWS Service Catalog products (which we discuss in the next section).
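As a minimal sketch, the cloud admin role could be expressed as a CloudFormation `AWS::IAM::Role` resource, built here as a Python dictionary so it can be inspected or serialized with `json.dumps`. The logical ID `CloudAdminRole`, the placeholder account ID, and the same-account trust policy are assumptions for illustration, not taken from the sample templates. The text names the ServiceCatalogAdminFullAccess policy; the sketch assumes its AWS managed counterpart `AWSServiceCatalogAdminFullAccess`, so verify the exact policy name in your IAM console.

```python
import json


def cloud_admin_role_template(account_id: str) -> dict:
    """Build a CloudFormation template containing a cloud admin role (illustrative only)."""
    # Assumption: the role can be assumed by IAM principals in the same account.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "CloudAdminRole": {  # hypothetical logical ID
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": trust_policy,
                    # Service Catalog admin access only; no SageMaker policy is attached.
                    "ManagedPolicyArns": [
                        "arn:aws:iam::aws:policy/AWSServiceCatalogAdminFullAccess"
                    ],
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(cloud_admin_role_template("111122223333"), indent=2))
```

The point of the sketch is what is absent: the role carries no SageMaker permissions, because provisioning happens through a separate role that AWS Service Catalog assumes.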
AWS Service Catalog allows you to create and manage a catalog of services to be provisioned in an AWS account. You can use a CloudFormation template to define how to provision a specific service and release it as an AWS Service Catalog product. Products can also be grouped into portfolios.
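To make the product and portfolio relationship concrete, here is a sketch that builds a CloudFormation template with one `AWS::ServiceCatalog::Portfolio`, one `AWS::ServiceCatalog::CloudFormationProduct` per product template, and an `AWS::ServiceCatalog::PortfolioProductAssociation` linking each product to the portfolio. The logical IDs, display names, owner string, and S3 URLs are hypothetical; the actual launch template in the GitHub repo may be structured differently.

```python
from typing import Dict


def catalog_template(products: Dict[str, str]) -> dict:
    """products maps a product name to the S3 URL of its CloudFormation template."""
    resources = {
        "StudioPortfolio": {  # hypothetical logical ID
            "Type": "AWS::ServiceCatalog::Portfolio",
            "Properties": {"DisplayName": "SageMaker Studio", "ProviderName": "Cloud admin team"},
        }
    }
    for i, (name, url) in enumerate(products.items()):
        product_id = f"Product{i}"
        resources[product_id] = {
            "Type": "AWS::ServiceCatalog::CloudFormationProduct",
            "Properties": {
                "Name": name,
                "Owner": "Cloud admin team",
                # Each provisioning artifact (version) points at a CloudFormation template.
                "ProvisioningArtifactParameters": [
                    {"Name": "v1", "Info": {"LoadTemplateFromURL": url}}
                ],
            },
        }
        # Place the product in the portfolio; Ref resolves to the portfolio and product IDs.
        resources[f"{product_id}Association"] = {
            "Type": "AWS::ServiceCatalog::PortfolioProductAssociation",
            "Properties": {
                "PortfolioId": {"Ref": "StudioPortfolio"},
                "ProductId": {"Ref": product_id},
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
```

For example, `catalog_template({"Studio domain": "https://example-bucket.s3.amazonaws.com/domain.yaml"})` yields a portfolio, a single product, and one association.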
After a product is published to AWS Service Catalog, users with the appropriate access rights can provision it through self-service. Users don't need access to the service being provisioned, only to the AWS Service Catalog products; you can also set up an execution role that AWS Service Catalog uses for provisioning.
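The execution (launch) role is what lets end users provision a product without holding permissions on the underlying service. A minimal sketch, assuming hypothetical logical IDs, pairs an IAM role that trusts `servicecatalog.amazonaws.com` with an `AWS::ServiceCatalog::LaunchRoleConstraint` on a product. The permissions the role needs for SageMaker resources are deliberately omitted, and the `DependsOn` makes sure the product is already in the portfolio when the constraint is created.

```python
def launch_role_resources(portfolio_ref: str, product_ref: str, association_ref: str) -> dict:
    """Return CloudFormation resources for a provisioning role and its launch constraint."""
    return {
        "ProvisioningRole": {  # hypothetical logical ID
            "Type": "AWS::IAM::Role",
            "Properties": {
                # AWS Service Catalog, not the end user, assumes this role when provisioning.
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Principal": {"Service": "servicecatalog.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }
                    ],
                },
                # Policies granting access to the provisioned resources are omitted here.
            },
        },
        "ProvisioningRoleConstraint": {
            "Type": "AWS::ServiceCatalog::LaunchRoleConstraint",
            # Create the constraint only after the product is associated with the portfolio.
            "DependsOn": association_ref,
            "Properties": {
                "PortfolioId": {"Ref": portfolio_ref},
                "ProductId": {"Ref": product_ref},
                "RoleArn": {"Fn::GetAtt": ["ProvisioningRole", "Arn"]},
            },
        },
    }
```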
In this post, we use AWS Service Catalog to provision Studio and onboard Studio users. We provide the underlying CloudFormation templates for SageMaker products as well as a launch template to populate the SageMaker-related products into AWS Service Catalog. You can find these templates in the GitHub repo.
Complete the following steps to run the launch template, which populates the AWS Service Catalog products:
By default, the template launches in us-west-2, but you can switch to another Region before starting the template.
Users who assume this role have access to initiate product provisioning.
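Access to initiate provisioning is typically granted by associating a principal with the portfolio. A sketch of that as an `AWS::ServiceCatalog::PortfolioPrincipalAssociation` resource follows; the portfolio logical ID and the role ARN are placeholders, and restricting the principal to IAM roles is an assumption of this sketch rather than a Service Catalog requirement.

```python
def portfolio_principal_association(portfolio_logical_id: str, role_arn: str) -> dict:
    """Grant an IAM role access to a Service Catalog portfolio (CloudFormation resource)."""
    # This sketch only accepts IAM role ARNs.
    if not (role_arn.startswith("arn:aws:iam::") and ":role/" in role_arn):
        raise ValueError(f"expected an IAM role ARN, got {role_arn!r}")
    return {
        "Type": "AWS::ServiceCatalog::PortfolioPrincipalAssociation",
        "Properties": {
            "PortfolioId": {"Ref": portfolio_logical_id},
            "PrincipalARN": role_arn,
            "PrincipalType": "IAM",
        },
    }
```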
To provision your Studio domain, complete the following steps:
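Under the hood, the domain product is a CloudFormation template. As an illustration only, its core resource could look like the following `AWS::SageMaker::Domain` definition. The function name, VPC and subnet IDs, and role ARN are hypothetical. `AuthMode` defaults to `IAM` because this post covers IAM and federated SSO; `SSO` corresponds to the AWS SSO mode, which this post doesn't cover.

```python
def studio_domain(domain_name, vpc_id, subnet_ids, execution_role_arn, auth_mode="IAM"):
    """Return an AWS::SageMaker::Domain resource definition (illustrative sketch)."""
    if auth_mode not in ("IAM", "SSO"):
        raise ValueError(f"unsupported AuthMode: {auth_mode!r}")
    return {
        "Type": "AWS::SageMaker::Domain",
        "Properties": {
            "DomainName": domain_name,
            "AuthMode": auth_mode,
            "VpcId": vpc_id,
            "SubnetIds": list(subnet_ids),
            # Default execution role that Studio assumes for users in this domain.
            "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
        },
    }
```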
After you create your Studio domain, you can start provisioning your Studio users. To provision a new user profile via AWS Service Catalog, complete the following steps:
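Because the user profile product is provisioned separately from the domain, a reasonable assumption is that it takes the domain ID as an input parameter rather than a CloudFormation reference. The sketch below builds an `AWS::SageMaker::UserProfile` resource and tags it with the user's name; the tag key `studiouserid` is hypothetical and only matters if you later restrict access by tag.

```python
def studio_user_profile(domain_id: str, user_name: str, tag_key: str = "studiouserid") -> dict:
    """Return an AWS::SageMaker::UserProfile resource definition (illustrative sketch)."""
    return {
        "Type": "AWS::SageMaker::UserProfile",
        "Properties": {
            "DomainId": domain_id,  # ID of the existing Studio domain, e.g. passed as a parameter
            "UserProfileName": user_name,
            # Hypothetical tag that ties the profile to its owner for tag-based access control.
            "Tags": [{"Key": tag_key, "Value": user_name}],
        },
    }
```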
For step-by-step instructions on provisioning the Studio domain and user profiles, check out the following videos:
The data scientist role needs permission to create a pre-signed URL that lets users log in to Studio. When you provision the domain, the template also creates the data scientist policy with pre-signed URL access. You can attach this policy either to your data scientist role or directly to IAM users.
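A minimal sketch of such a policy follows, allowing only `sagemaker:CreatePresignedDomainUrl` on user profiles in one domain. The Region, account ID, and domain ID are placeholders, and the policy the sample template generates may differ.

```python
def presigned_url_policy(region: str, account_id: str, domain_id: str, user_profile: str = "*") -> dict:
    """IAM policy allowing pre-signed Studio login URLs for profiles in one domain."""
    resource = f"arn:aws:sagemaker:{region}:{account_id}:user-profile/{domain_id}/{user_profile}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:CreatePresignedDomainUrl",
                "Resource": resource,
            }
        ],
    }
```

With this policy attached, a data scientist could request a login URL through boto3's SageMaker client, `create_presigned_domain_url(DomainId=..., UserProfileName=...)`, and open the `AuthorizedUrl` in the response.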
You don't need additional SageMaker access policies because Studio assumes the execution role on your behalf. Our example CloudFormation templates provision this role for you, but you can customize it to fit your needs.
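What makes it possible for Studio to act on your behalf is the execution role's trust policy, which names the SageMaker service principal. The sketch below shows that trust policy along with a small hypothetical helper, `trusted_services`, that lists which service principals may assume a role. The permission policies the role would also need are omitted.

```python
def execution_role_trust_policy() -> dict:
    """Trust policy letting SageMaker (and therefore Studio) assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trusted_services(policy: dict) -> set:
    """Collect the service principals allowed to assume a role under this trust policy."""
    services = set()
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Allow" and stmt["Action"] == "sts:AssumeRole":
            principal = stmt.get("Principal", {}).get("Service", [])
            services.update([principal] if isinstance(principal, str) else principal)
    return services
```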
You can also restrict access to just those user profiles that are assigned to specific users via tags. For more information, see Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation.
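One way to express this restriction is a condition on the user profile's tag. The sketch below assumes each profile carries a hypothetical `studiouserid` tag whose value matches the IAM user name. `${aws:username}` is a plain string here, resolved by IAM at evaluation time, and it resolves only for IAM users; federated users would need a different key, such as a session tag.

```python
def own_profile_only_policy(region: str, account_id: str, domain_id: str,
                            tag_key: str = "studiouserid") -> dict:
    """Allow login URLs only for profiles whose tag matches the caller's IAM user name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:CreatePresignedDomainUrl",
                "Resource": f"arn:aws:sagemaker:{region}:{account_id}:user-profile/{domain_id}/*",
                # IAM substitutes ${aws:username} at request time; it is not a Python placeholder.
                "Condition": {
                    "StringEquals": {f"sagemaker:ResourceTag/{tag_key}": "${aws:username}"}
                },
            }
        ],
    }
```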
You can use AWS Service Catalog to delete your Studio domain and user profiles.
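Order matters when cleaning up: SageMaker doesn't delete a domain that still has user profiles, so terminate the user profile products first. A small sketch of that ordering follows; the `ProvisionedProduct` class and its `kind` labels are hypothetical bookkeeping, not an AWS type.

```python
from dataclasses import dataclass


@dataclass
class ProvisionedProduct:
    name: str
    kind: str  # hypothetical labels: "user-profile" or "domain"


def termination_order(products):
    """Return product names with user profiles before domains (stable within each kind)."""
    rank = {"user-profile": 0, "domain": 1}
    return [p.name for p in sorted(products, key=lambda p: rank[p.kind])]
```

Each name could then be passed, in order, to boto3's Service Catalog client as `terminate_provisioned_product(ProvisionedProductName=name, TerminateToken=str(uuid.uuid4()))`, waiting for each termination to finish before moving on to the domain.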
In this post, we demonstrated how to provision Studio and onboard Studio users via AWS Service Catalog, which provides better governance. We also demonstrated how to decouple the cloud admin role from the data scientist role. Cloud admins can provision new resources but don't need any SageMaker access. Data scientists need SageMaker access, but they can't provision new user profiles or Studio domains. This separation of roles improves isolation of concerns, governance, and security. You can find these templates in the GitHub repo.
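To illustrate the separation of duties, here is a deliberately simplified, IAM-style check: an explicit `Deny` wins, otherwise any matching `Allow` grants access, and everything else is implicitly denied. It matches actions only, ignoring resources, conditions, permission boundaries, and SCPs. The two policy sets are rough stand-ins; the real AWS Service Catalog managed policy grants more than `servicecatalog:*`.

```python
from fnmatch import fnmatchcase


def is_allowed(policies, action):
    """Simplified evaluation: explicit Deny > Allow > implicit deny (actions only)."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            actions = stmt["Action"]
            if isinstance(actions, str):
                actions = [actions]
            # IAM action names are case-insensitive and support * wildcards.
            if any(fnmatchcase(action.lower(), a.lower()) for a in actions):
                if stmt["Effect"] == "Deny":
                    return False
                allowed = True
    return allowed


# Rough stand-ins for the two personas' permissions (assumptions, not the real policies).
CLOUD_ADMIN = [{"Statement": [{"Effect": "Allow", "Action": "servicecatalog:*"}]}]
DATA_SCIENTIST = [{"Statement": [{"Effect": "Allow", "Action": "sagemaker:CreatePresignedDomainUrl"}]}]
```

With these stand-ins, the cloud admin can call `servicecatalog:ProvisionProduct` but not `sagemaker:CreateDomain`, while the data scientist can log in to Studio but can't call `sagemaker:CreateUserProfile`.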
Andras Garzo is a Solutions Architect in the ML Migration team, helping customers adopt Amazon SageMaker, save costs, and make their ML workloads more performant.
Sam Palani is an AI/ML Specialist Solutions Architect at AWS. He enjoys working with customers to help them architect machine learning solutions at scale. When not helping customers, he enjoys reading and exploring the outdoors.
Rama Thamman is a Software Development Manager with the AI Platforms team, leading the ML Migrations team.