Launched at AWS re:Invent 2021, Amazon SageMaker Ground Truth Plus helps you create high-quality training datasets by removing the undifferentiated heavy lifting associated with building data labeling applications and managing the labeling workforce. All you do is share data along with labeling requirements, and Ground Truth Plus sets up and manages your data labeling workflow based on these requirements. From there, an expert workforce that is trained on a variety of machine learning (ML) tasks labels your data. You don’t even need deep ML expertise or knowledge of workflow design and quality management to use Ground Truth Plus.
Today, we are excited to announce the launch of new built-in interfaces on Ground Truth Plus. With this new capability, multiple Ground Truth Plus users can now create a new project and batch, share data, and receive data using the same AWS account through self-serve interfaces. This enables you to accelerate the development of high-quality training datasets by reducing project setup time. Additionally, you can control fine-grained access to your data by scoping your AWS Identity and Access Management (IAM) role permissions to match your individual level of Amazon Simple Storage Service (Amazon S3) access, and you always have the option to revoke access to certain buckets.
Until now, you had to reach out to your Ground Truth Plus operations program manager (OPM) to create new data labeling projects and batches. This process had some restrictions because it allowed only one user to request a new project and batch. If multiple users within the organization were using the same AWS account, only one of them could request a new data labeling project and batch using the Ground Truth Plus console. Additionally, the process delayed the start of labeling because of multiple manual touchpoints and the troubleshooting required when issues came up. Separately, all the projects used the same IAM role for accessing data. Therefore, to run projects and batches that needed access to different data sources, such as different Amazon S3 buckets, you had to rely on your Ground Truth Plus OPM to provide your account-specific S3 policies, which you had to manually apply to your S3 buckets. This entire operation was manually intensive, resulting in operational overhead.
This post walks you through steps to create a new project and batch, share data, and receive data using the new self-serve interfaces to efficiently kickstart the labeling process. This post assumes that you are familiar with Ground Truth Plus. For more information, see Amazon SageMaker Ground Truth Plus – Create Training Datasets Without Code or In-house Resources.
We demonstrate how to do the following:
Before you get started, make sure you have the following prerequisites:
If you had a Ground Truth Plus project before the new features described in this post launched (December 9, 2022), you need to create and share an IAM role so that you can use these features with your existing Ground Truth Plus project. If you’re a new user of Ground Truth Plus, you can skip this section.
To create an IAM role, complete the following steps:
On the Add permissions tab, you should see the new policy you created (refresh the page if you don’t see it).
To request a new project, complete the following steps:
This is where all your projects are listed.
The Request project page is your opportunity to provide details that will help us schedule an initial consultation call and set up your project.
To label your data, Ground Truth Plus needs temporary access to your raw data in an S3 bucket. When the labeling process is complete, Ground Truth Plus delivers the labeling output back to your S3 bucket. This is done through an IAM role. You can either create a new role here, or navigate to the IAM console to create one (refer to the previous section for instructions).
Your IAM role will allow Ground Truth Plus, identified as sagemaker-ground-truth-plus.amazonaws.com in the role’s trust policy, to run the following actions on your S3 buckets:
[ "s3:GetObject", "s3:PutObject", "s3:GetBucketLocation", "s3:ListBucket" ]
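To make the shape of such a role concrete, the following is a minimal Python sketch that assembles the two JSON documents an IAM role of this kind typically carries: a trust policy naming the Ground Truth Plus service principal, and a permissions policy granting the four S3 actions listed above. The service principal and the actions come from this post. The bucket name `example-labeling-bucket`, the helper function names, and the use of `sts:AssumeRole` as the trust action are illustrative assumptions. Compare the output against the policy that the console or your OPM provides before using it.

```python
import json

# Service principal named in this post for the role's trust policy.
GROUND_TRUTH_PLUS_PRINCIPAL = "sagemaker-ground-truth-plus.amazonaws.com"

# The four S3 actions listed in this post.
S3_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:GetBucketLocation",
    "s3:ListBucket",
]


def build_trust_policy():
    """Trust policy letting the Ground Truth Plus service assume the role.

    Assumption: the standard sts:AssumeRole action is used for the trust
    relationship.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": GROUND_TRUTH_PLUS_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_s3_policy(bucket_names):
    """Permissions policy scoped to the given buckets only.

    Leaving a bucket out of bucket_names (and updating the policy) is one
    way to revoke access to that bucket.
    """
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{name}/*")  # object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": S3_ACTIONS, "Resource": resources}
        ],
    }


if __name__ == "__main__":
    # "example-labeling-bucket" is a hypothetical bucket name.
    print(json.dumps(build_trust_policy(), indent=2))
    print(json.dumps(build_s3_policy(["example-labeling-bucket"]), indent=2))
```

Listing only the buckets a given project needs keeps each role scoped to that project's data, which is the fine-grained access control this post describes.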
A Ground Truth Plus OPM will schedule an initial consultation call with you to discuss your data labeling project requirements and pricing.
After you request a project, you need to create a project team to log in to your project portal. A project team provides access to the members from your organization or team to track projects, view metrics, and review labels. You can use the option Invite new members by email or Import members from existing Amazon Cognito user groups. In this post, we show how to import members from existing Amazon Cognito user groups to add users to your project team.
User pools require a domain and an existing user group.
We recommend using a client generated by Amazon SageMaker.
You can add more team members after creating the project team by choosing Invite new members on the Members page of the Ground Truth Plus console.
After you have successfully submitted the project request and created a project team, you can access the Ground Truth Plus project portal by clicking Open project portal on the Ground Truth Plus console.
You can use the project portal to create batches for a project, but only after the project’s status has changed to Request approved.
To ensure the batch is created successfully, you must meet the following requirements:
Your batch status will show as Request submitted. After Ground Truth Plus has temporary access to your data, AWS experts set up data labeling workflows and operate them on your behalf, which changes the batch status to In-progress. When the labeling is complete, the batch status changes from In-progress to Ready for review. If you want to review your labels before receiving them, choose Review batch. From there, you can choose Accept batch to receive your labeled data.
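The status progression described above can be pictured as a small state model. The following Python sketch is purely illustrative: it covers only the three statuses named in this post, and the `BatchStatus`, `next_status`, and `can_review` names are hypothetical rather than part of any Ground Truth Plus API.

```python
from enum import Enum


class BatchStatus(Enum):
    # Only the three statuses named in this post are modeled here.
    REQUEST_SUBMITTED = "Request submitted"
    IN_PROGRESS = "In-progress"
    READY_FOR_REVIEW = "Ready for review"


# Forward transitions as described in the post.
_NEXT = {
    BatchStatus.REQUEST_SUBMITTED: BatchStatus.IN_PROGRESS,
    BatchStatus.IN_PROGRESS: BatchStatus.READY_FOR_REVIEW,
}


def next_status(status):
    """Return the status that follows, or None for the last modeled status."""
    return _NEXT.get(status)


def can_review(status):
    """Review batch and Accept batch apply once a batch is Ready for review."""
    return status is BatchStatus.READY_FOR_REVIEW


if __name__ == "__main__":
    status = BatchStatus.REQUEST_SUBMITTED
    while status is not None:
        print(f"{status.value}: reviewable={can_review(status)}")
        status = next_status(status)
```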
This post showed you how multiple Ground Truth Plus users can now create a new project and batch, share data, and receive data using the same AWS account through new self-serve interfaces. This new capability allows you to kickstart your labeling projects faster and reduces operational overhead. We also demonstrated how you can control fine-grained access to data by scoping your IAM role permissions to match your individual level of access.
We encourage you to try out this new functionality, and connect with the Machine Learning & AI community if you have any questions or feedback!
Manish Goel is the Product Manager for Amazon SageMaker Ground Truth Plus. He is focused on building products that make it easier for customers to adopt machine learning. In his spare time, he enjoys road trips and reading books.
Karthik Ganduri is a Software Development Engineer at Amazon AWS, where he works on building ML tools for customers and internal solutions. Outside of work, he enjoys clicking pictures.
Zhuling Bai is a Software Development Engineer at Amazon AWS. She works on developing large scale distributed systems to solve machine learning problems.
Aatef Baransy is a Frontend Engineer at Amazon AWS. He writes fast, reliable, and thoroughly tested software to nurture and grow the industry’s most cutting-edge AI applications.
Mohammad Adnan is a Senior Engineer for AI and ML at AWS. He has been part of many AWS service launches, notably Amazon Lookout for Metrics and AWS Panorama. Currently, he is focusing on AWS human-in-the-loop offerings (Amazon SageMaker Ground Truth, Ground Truth Plus, and Augmented AI). He is a clean code advocate and a subject-matter expert on serverless and event-driven architecture. You can follow him on LinkedIn, mohammad-adnan-6a99a829.