Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting a new Studio notebook. You can use lifecycle configurations to automate customization for your Studio environment. This customization includes installing custom packages, configuring notebook extensions, preloading datasets, and setting up source code repositories. For example, as an administrator for a Studio domain, you may want to save costs by having notebook apps shut down automatically after long periods of inactivity.
The AWS Cloud Development Kit (AWS CDK) is a framework for defining cloud infrastructure through code and provisioning it through AWS CloudFormation stacks. A stack is a collection of AWS resources that can be programmatically updated, moved, or deleted. AWS CDK constructs are the building blocks of AWS CDK applications, representing the blueprint to define cloud architectures.
In this post, we show how to use the AWS CDK to set up Studio, use Studio lifecycle configurations, and enable its access for data scientists and developers in your organization.
The modularity of lifecycle configurations allows you to apply them to all users in a domain or to specific users. This way, you can set up lifecycle configurations and reference them in the Studio kernel gateway or Jupyter server quickly and consistently. The kernel gateway is the entry point to interact with a notebook instance, whereas the Jupyter server represents the Studio instance. This enables you to apply DevOps best practices and meet safety, compliance, and configuration standards across all AWS accounts and Regions. For this post, we use Python as the main language, but the code can be easily changed to other AWS CDK supported languages. For more information, refer to Working with the AWS CDK.
To get started, make sure you have the following prerequisites:
First, clone the GitHub repository.
After you clone the repository, you can see that it is a classic AWS CDK project with the directory studio-lifecycle-config-construct, which contains the construct and resources required to create lifecycle configurations.
The file we want to inspect is aws_sagemaker_lifecycle.py. This file contains the SageMakerStudioLifeCycleConfig construct we use to set up and create lifecycle configurations.
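The SageMakerStudioLifeCycleConfig construct provides the framework for building lifecycle configurations using a custom AWS Lambda function and shell code read in from a file. The construct contains the following parameters: studio_lifecycle_config_content (the base64-encoded content of the shell script), studio_lifecycle_config_app_type (the app type the script applies to, either JupyterServer or KernelGateway), and studio_lifecycle_config_name (the name of the lifecycle configuration). After the construct is created, it exposes the ARN of the lifecycle configuration as the attribute studio_lifecycle_config_arn.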
For more information on the Studio notebook architecture, refer to Dive deep into Amazon SageMaker Studio Notebooks architecture.
The following is a code snippet of the Studio lifecycle config construct (aws_sagemaker_lifecycle.py):
# Imports assume AWS CDK v2.
from aws_cdk import CustomResource, Duration, custom_resources
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as lambda_
from constructs import Construct


class SageMakerStudioLifeCycleConfig(Construct):
    def __init__(
        self,
        scope: Construct,
        id: str,
        studio_lifecycle_config_content: str,
        studio_lifecycle_config_app_type: str,
        studio_lifecycle_config_name: str,
        **kwargs,
    ):
        # The ARN is an output of the construct (set at the end), not an input.
        super().__init__(scope, id)
        self.studio_lifecycle_content = studio_lifecycle_config_content
        self.studio_lifecycle_config_name = studio_lifecycle_config_name
        self.studio_lifecycle_config_app_type = studio_lifecycle_config_app_type

        # Role assumed by the Lambda function that manages the lifecycle config.
        lifecycle_config_role = iam.Role(
            self,
            "SmStudioLifeCycleConfigRole",
            assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
        )
        lifecycle_config_role.add_to_policy(
            iam.PolicyStatement(
                resources=[f"arn:aws:sagemaker:{scope.region}:{scope.account}:*"],
                actions=[
                    "sagemaker:CreateStudioLifecycleConfig",
                    "sagemaker:ListUserProfiles",
                    "sagemaker:UpdateUserProfile",
                    "sagemaker:DeleteStudioLifecycleConfig",
                    "sagemaker:AddTags",
                ],
            )
        )

        # Lambda function that creates and deletes the lifecycle config.
        create_lifecycle_script_lambda = lambda_.Function(
            self,
            "CreateLifeCycleConfigLambda",
            runtime=lambda_.Runtime.PYTHON_3_8,
            timeout=Duration.minutes(3),
            code=lambda_.Code.from_asset(
                "../mlsl-cdk-constructs-lib/src/studiolifecycleconfigconstruct"
            ),
            handler="onEvent.handler",
            role=lifecycle_config_role,
            environment={
                "studio_lifecycle_content": self.studio_lifecycle_content,
                "studio_lifecycle_config_name": self.studio_lifecycle_config_name,
                "studio_lifecycle_config_app_type": self.studio_lifecycle_config_app_type,
            },
        )

        # Custom resource that invokes the Lambda function on stack events.
        config_custom_resource_provider = custom_resources.Provider(
            self,
            "ConfigCustomResourceProvider",
            on_event_handler=create_lifecycle_script_lambda,
        )
        studio_lifecycle_config_custom_resource = CustomResource(
            self,
            "LifeCycleCustomResource",
            service_token=config_custom_resource_provider.service_token,
        )

        # get_att_string returns the attribute as a string token, so it can be
        # passed to properties that expect a str.
        self.studio_lifecycle_config_arn = (
            studio_lifecycle_config_custom_resource.get_att_string(
                "StudioLifecycleConfigArn"
            )
        )
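The construct points the Lambda function at onEvent.handler, which this post does not show. The following is a minimal sketch of what such a handler could look like, not the code from the repository. It assumes the three environment variables set by the construct, the request and response shape of the AWS CDK custom resource provider framework, and the boto3 SageMaker calls create_studio_lifecycle_config and delete_studio_lifecycle_config. The client parameter is a hypothetical addition so the function can be exercised without AWS access. Because SageMaker has no call to update a lifecycle configuration in place, the sketch handles an update by deleting and recreating the configuration, and it assumes the configuration name stays the same across deployments.

```python
import os


def handler(event, context, client=None):
    """Sketch of an onEvent handler for the custom resource provider."""
    if client is None:
        # Imported lazily so the function can be tested without boto3 or AWS.
        import boto3

        client = boto3.client("sagemaker")

    request_type = event["RequestType"]  # "Create", "Update", or "Delete"

    if request_type in ("Update", "Delete"):
        # The physical resource ID is the configuration name (see the return below).
        client.delete_studio_lifecycle_config(
            StudioLifecycleConfigName=event["PhysicalResourceId"]
        )
        if request_type == "Delete":
            return {"PhysicalResourceId": event["PhysicalResourceId"]}

    name = os.environ["studio_lifecycle_config_name"]
    response = client.create_studio_lifecycle_config(
        StudioLifecycleConfigName=name,
        StudioLifecycleConfigContent=os.environ["studio_lifecycle_content"],
        StudioLifecycleConfigAppType=os.environ["studio_lifecycle_config_app_type"],
    )
    # Values under "Data" are what the construct reads back with get_att.
    return {
        "PhysicalResourceId": name,
        "Data": {"StudioLifecycleConfigArn": response["StudioLifecycleConfigArn"]},
    }
```

The key under "Data" must match the attribute name the construct requests (StudioLifecycleConfigArn), otherwise the ARN lookup fails at deployment time.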
After you import and install the construct, you can use it. The following code snippet shows how to create a lifecycle config using the construct in a stack either in app.py or another construct:
my_studio_lifecycle_config = SageMakerStudioLifeCycleConfig(
    self,
    "MLSLBlogPost",
    studio_lifecycle_config_content="base64content",
    studio_lifecycle_config_name="BlogPostTest",
    studio_lifecycle_config_app_type="JupyterServer",
)
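The "base64content" value in the snippet above is a placeholder: SageMaker expects the lifecycle script as a base64-encoded string. A minimal sketch of producing that string from a shell script on disk follows. The helper name encode_lifecycle_script and the file path in the usage comment are hypothetical and are not part of the repository.

```python
import base64
from pathlib import Path


def encode_lifecycle_script(path):
    """Return the shell script at `path` as a base64-encoded ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Hypothetical usage inside a stack:
# studio_lifecycle_config_content=encode_lifecycle_script("scripts/on-start.sh")
```

Reading the file as bytes avoids any newline or encoding translation before the content is encoded.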
To deploy your AWS CDK stack, run the following commands in the location where you cloned the repository.
The command may be python instead of python3 depending on your path configurations.
When the stack is successfully deployed, you should be able to view the stack on the CloudFormation console.
You will also be able to view the lifecycle configuration on the SageMaker console.
Choose the lifecycle configuration to view the shell code that runs as well as any tags you assigned.
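You can run the same check from code instead of the console. The following sketch assumes the boto3 SageMaker call describe_studio_lifecycle_config, which returns the script as base64 text. The function name get_lifecycle_script and the client parameter are hypothetical conveniences that make the example testable offline.

```python
import base64


def get_lifecycle_script(name, client=None):
    """Fetch a Studio lifecycle configuration and return its decoded shell script."""
    if client is None:
        # Imported lazily so the function can be tested without boto3 or AWS.
        import boto3

        client = boto3.client("sagemaker")

    response = client.describe_studio_lifecycle_config(
        StudioLifecycleConfigName=name
    )
    # The service stores the script base64-encoded; decode it for reading.
    return base64.b64decode(response["StudioLifecycleConfigContent"]).decode("utf-8")


# Hypothetical usage with the name from the earlier snippet:
# print(get_lifecycle_script("BlogPostTest"))
```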
There are multiple ways to attach a lifecycle configuration. In this section, we present two methods: using the AWS Management Console, and programmatically using the infrastructure provided.
To use the console, complete the following steps:
From here, you can also set it as default.
You can also retrieve the ARN of the Studio lifecycle configuration created by the construct and attach it to the Studio construct programmatically. The following code shows the lifecycle configuration ARN being passed to a Studio construct:
default_user_settings=sagemaker.CfnDomain.UserSettingsProperty(
    execution_role=self.sagemaker_role.role_arn,
    jupyter_server_app_settings=sagemaker.CfnDomain.JupyterServerAppSettingsProperty(
        default_resource_spec=sagemaker.CfnDomain.ResourceSpecProperty(
            instance_type="system",
            lifecycle_config_arn=my_studio_lifecycle_config.studio_lifecycle_config_arn,
        )
    ),
),
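The domain-level default above applies to every user. To attach the configuration to a specific user instead, one option is to update that user's profile. The following is a sketch under assumptions, not code from the repository: it uses the boto3 SageMaker call update_user_profile, and the function name, the client parameter, and the IDs in the usage comment are hypothetical. It sets the configuration as the Jupyter server default for that user and replaces any lifecycle configuration list already in the profile's Jupyter server settings.

```python
def attach_to_user_profile(domain_id, user_profile_name, lifecycle_config_arn, client=None):
    """Set a lifecycle configuration as the Jupyter server default for one user."""
    if client is None:
        # Imported lazily so the function can be tested without boto3 or AWS.
        import boto3

        client = boto3.client("sagemaker")

    user_settings = {
        "JupyterServerAppSettings": {
            "DefaultResourceSpec": {
                "InstanceType": "system",
                "LifecycleConfigArn": lifecycle_config_arn,
            },
            # The default must also appear in the list of attached configurations.
            "LifecycleConfigArns": [lifecycle_config_arn],
        }
    }
    client.update_user_profile(
        DomainId=domain_id,
        UserProfileName=user_profile_name,
        UserSettings=user_settings,
    )
    return user_settings


# Hypothetical usage:
# attach_to_user_profile("d-xxxxxxxxxxxx", "data-scientist-1", lifecycle_config_arn)
```

For a kernel gateway script, the analogous settings live under KernelGatewayAppSettings rather than JupyterServerAppSettings.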
Complete the steps in this section to clean up your resources.
Delete the Studio lifecycle configuration
To delete your lifecycle configuration, complete the following steps:
Delete the AWS CDK stack
When you’re done with the resources you created, you can destroy your AWS CDK stack by running the following command in the location where you cloned the repository:
When asked to confirm the deletion of the stack, enter yes.
You can also delete the stack on the AWS CloudFormation console with the following steps:
If you run into any errors, you may have to manually delete some resources depending on your account configuration.
In this post, we discussed how Studio serves as an IDE for ML workloads. Studio offers lifecycle configuration support, which allows you to set up custom shell scripts to perform automated tasks or set up development environments at launch. We used AWS CDK constructs to build the infrastructure for the custom resource and lifecycle configuration. Constructs are synthesized into CloudFormation stacks, which are then deployed to create the custom resource and the lifecycle script that are used in Studio and the notebook kernel.
For more information, visit Amazon SageMaker Studio.
Cory Hairston is a Software Engineer with the Amazon ML Solutions Lab. He currently works on providing reusable software solutions.
Alex Chirayath is a Senior Machine Learning Engineer at the Amazon ML Solutions Lab. He leads teams of data scientists and engineers to build AI applications to address business needs.
Gouri Pandeshwar is an Engineering Manager at the Amazon ML Solutions Lab. He and his team of engineers are working to build reusable solutions and frameworks that help accelerate adoption of AWS AI/ML services for customers’ business use cases.