If you need to integrate image analysis into your business process to detect objects or scenes unique to your business domain, you need to build your own custom machine learning (ML) model. Building a custom model requires advanced ML expertise and can be a technical challenge if you have limited ML knowledge. Because model performance can change as the model processes new inference data, you need to implement an automated ML workflow that continuously retrains the model with newly captured, human-labeled images. Incorporating a human review process and having access to a readily available human labeling workforce can pose a business challenge. In addition, you need to build flexibility into the ML workflow so that it can change without development rework as business objectives evolve. Developing a customizable ML workflow that behaves like a business rule engine requires significant upfront investment, which can be a resource challenge.
This post is the first in a two-part series that explains how to implement an automated Amazon Rekognition Custom Labels and Amazon Augmented AI (Amazon A2I) ML workflow that can provide continuous model improvement without requiring ML expertise.
With Amazon Rekognition Custom Labels, you can build and deploy ML models that identify custom objects specific to your business domain. Because Amazon Rekognition Custom Labels is built on models already trained by Amazon Rekognition, you need only a small set of training images to build your custom model, without requiring any ML knowledge. Combined with Amazon A2I, you can quickly add a human review process to your ML workflow to capture and label images for model training. Amazon A2I lets you use your own private workforce, a contracted vendor workforce, or the readily accessible Amazon Mechanical Turk workforce to provide the human label review. With AWS Step Functions, you can create and run a series of checkpoints and event-driven processes that orchestrate the entire ML workflow with minimal upfront development. By incorporating AWS Systems Manager Parameter Store, you can use parameters as variables for Step Functions checkpoints to customize the behavior of the ML workflow as needed.
In this post, we explain how we use Step Functions and Parameter Store to allow a model operator to configure the ML workflow, similar to a business rule engine, without requiring development rework.
For this use case, we want to build an Amazon Rekognition Custom Labels model for custom logo detection with training images. As we start using the model, we capture inference images with low detection confidence for human labeling. Captured images that can be properly labeled are added to the training images for model training as part of the continuous model improvement process.
The objective of this ML workflow is to continuously improve the accuracy of the model based on inference performance. Specific to this Amazon Rekognition Custom Labels use case, inference images with a confidence level below the acceptance criteria need to be captured and labeled by a human workforce with Amazon A2I for new model training. The following diagram illustrates this ML workflow.
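To make the capture rule concrete, the following minimal Python sketch decides whether an inference result should be sent for human labeling. It assumes the response shape of the Amazon Rekognition DetectCustomLabels API (a CustomLabels list whose entries carry Name and Confidence). It also assumes that a response with no labels counts as below the acceptance criteria, because DetectCustomLabels omits labels that fall under its MinConfidence filter. The function name needs_human_review and the threshold value are hypothetical, not the solution's actual code.

```python
from typing import Any


def needs_human_review(response: dict[str, Any], min_confidence: float) -> bool:
    """Return True if an inference image should be captured for human labeling.

    `response` follows the DetectCustomLabels shape:
    {"CustomLabels": [{"Name": str, "Confidence": float, ...}, ...]}.
    `min_confidence` is a hypothetical acceptance threshold on a 0-100 scale.
    """
    labels = response.get("CustomLabels", [])
    if not labels:
        # Assumption: no detections means nothing cleared the threshold.
        return True
    # Capture the image when even the best detection is below the threshold.
    best = max(label["Confidence"] for label in labels)
    return best < min_confidence
```

In a Lambda function, the response would come from `boto3.client("rekognition").detect_custom_labels(...)`, and a True result would trigger an Amazon A2I human labeling task.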
We add flexibility to this ML workflow by parameterizing some of the processes, as indicated in green in the preceding diagram. Parameterizing the workflow allows a model operator to change the processes without requiring code changes. We provide seven configurable parameters:
We use Step Functions to deploy the following state machine for the orchestration of the workflow.
The state machine is event-driven and divided into four separate states:
This solution is built on AWS serverless architecture. The architecture is shown in the following diagram.
We use Amazon Rekognition Custom Labels as the core ML service. An Amazon Rekognition Custom Labels project is created as part of the initial AWS CloudFormation deployment process. Project versions (trained models) are created, started, and stopped automatically by Lambda functions orchestrated by the state machine.
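The model lifecycle handled by these Lambda functions can be sketched as a single polling step. This minimal Python sketch assumes a boto3 Rekognition client is passed in as `client` (for example, `boto3.client("rekognition")`) and uses the real DescribeProjectVersions and StartProjectVersion operations. The helper name advance_model and its returned action strings are hypothetical simplifications, not the solution's Lambda code.

```python
from typing import Any

# DescribeProjectVersions statuses that mean "busy, check again later".
IN_PROGRESS = {"TRAINING_IN_PROGRESS", "STARTING", "STOPPING"}
FAILED = {"TRAINING_FAILED", "FAILED"}


def advance_model(client: Any, project_arn: str, version_name: str,
                  min_inference_units: int = 1) -> str:
    """Take one lifecycle step for a model and report the outcome.

    Returns "wait", "started", "running", "failed", or "other".
    """
    resp = client.describe_project_versions(
        ProjectArn=project_arn, VersionNames=[version_name]
    )
    desc = resp["ProjectVersionDescriptions"][0]
    status = desc["Status"]
    if status in IN_PROGRESS:
        return "wait"
    if status == "TRAINING_COMPLETED":
        # Training finished: deploy the model so it can serve inference requests.
        client.start_project_version(
            ProjectVersionArn=desc["ProjectVersionArn"],
            MinInferenceUnits=min_inference_units,
        )
        return "started"
    if status == "RUNNING":
        return "running"
    if status in FAILED:
        return "failed"
    return "other"  # for example STOPPED or DELETING
```

Passing the client in, rather than creating it inside the function, keeps the step easy to test with a stub and lets a state machine call it repeatedly between Wait states.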
We use Amazon A2I to provide a human labeling workflow for images captured during the inference process. A flow definition is created as part of the CloudFormation stack. During custom label detection, a Lambda function creates an Amazon A2I human labeling task (a human loop) for the captured images.
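Creating a human labeling task comes down to one StartHumanLoop call against the Amazon A2I runtime. The following minimal Python sketch builds the arguments for that call. The `taskObject` key in the input content is an assumption: it must match whatever field your worker task template reads. The helper name and the `logo-review` name prefix are hypothetical.

```python
import json
import re
import uuid
from typing import Any

# Human loop names must be lowercase alphanumerics and hyphens, up to 63 characters.
NAME_PATTERN = re.compile(r"[a-z0-9](-*[a-z0-9])*")


def build_human_loop_request(flow_definition_arn: str, image_s3_uri: str,
                             prefix: str = "logo-review") -> dict[str, Any]:
    """Build keyword arguments for the A2I runtime StartHumanLoop call."""
    # A random suffix keeps each human loop name unique.
    name = f"{prefix}-{uuid.uuid4().hex}"[:63].rstrip("-")
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid human loop name: {name!r}")
    # Assumption: the worker task template reads the image URI from "taskObject".
    content = json.dumps({"taskObject": image_s3_uri})
    return {
        "HumanLoopName": name,
        "FlowDefinitionArn": flow_definition_arn,
        "HumanLoopInput": {"InputContent": content},
    }
```

The result can be passed directly as `boto3.client("sagemaker-a2i-runtime").start_human_loop(**request)`.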
We optionally deploy an Amazon SageMaker Ground Truth private workforce and team as part of the CloudFormation stack, if none exists. The flow definition depends on the Ground Truth private team to function.
We optionally deploy an Amazon Cognito user pool and app client as part of the CloudFormation stack, if none exists. The Ground Truth private workforce depends on the user pool and app client to function.
We use Parameter Store in two different ways. First, we provide a set of seven single-value parameters that the model operator uses to configure the ML workflow. Second, we provide a JSON-based parameter that the system uses to store environment variables and operational data.
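Because Parameter Store returns every value as a string, the workflow has to convert single-value parameters to typed settings and decode the JSON parameter. The following minimal Python sketch shows that step for a GetParameters response. The three parameter names and their meanings are hypothetical placeholders, not the seven parameters the solution actually defines.

```python
import json
from typing import Any

# Hypothetical parameter names; the solution's real names may differ.
MIN_CONFIDENCE = "/rekognition-a2i/min-confidence"
AUTO_TRAIN = "/rekognition-a2i/auto-train"
SYSTEM_STATE = "/rekognition-a2i/system-state"


def parse_parameters(response: dict[str, Any]) -> dict[str, Any]:
    """Turn a GetParameters response into typed workflow settings."""
    raw = {p["Name"]: p["Value"] for p in response["Parameters"]}
    return {
        # Operator-facing single-value parameters arrive as strings.
        "min_confidence": float(raw[MIN_CONFIDENCE]),
        "auto_train": raw[AUTO_TRAIN].strip().lower() == "true",
        # The system parameter holds a JSON document of operational data.
        "system_state": json.loads(raw[SYSTEM_STATE]),
    }
```

In practice the response would come from `boto3.client("ssm").get_parameters(Names=[MIN_CONFIDENCE, AUTO_TRAIN, SYSTEM_STATE])`. Keeping operator settings and system state in separate parameters means an operator edit never overwrites data the workflow writes for itself.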
We use two Amazon EventBridge rules to start Step Functions state machine runs. The first rule uses a Systems Manager event pattern: it fires when a parameter in Parameter Store changes and starts the state machine, which invokes a Lambda function to apply the changes to the affected resources. The second rule is a schedule rule: it fires periodically and starts the state machine, which invokes a Lambda function to check for new model training.
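The two triggers can be sketched as an event pattern, a schedule expression, and a small router that tells the resulting events apart. This minimal Python sketch relies on the standard shapes of Parameter Store change events (`source` `aws.ssm`, `detail-type` `Parameter Store Change`) and scheduled events (`source` `aws.events`, `detail-type` `Scheduled Event`). The parameter name prefix, the one-hour interval, and the branch names are hypothetical.

```python
from typing import Any

# Rule 1: fire when a workflow parameter is updated.
# The "/rekognition-a2i/" name prefix is a hypothetical naming convention.
PARAMETER_CHANGE_PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["Parameter Store Change"],
    "detail": {
        "name": [{"prefix": "/rekognition-a2i/"}],
        "operation": ["Update"],
    },
}

# Rule 2: fire on a fixed schedule (the one-hour interval is hypothetical).
SCHEDULE_EXPRESSION = "rate(1 hour)"


def route_event(event: dict[str, Any]) -> str:
    """Pick the workflow branch for an event delivered by either rule."""
    source, detail_type = event.get("source"), event.get("detail-type")
    if source == "aws.ssm" and detail_type == "Parameter Store Change":
        return "apply_parameter_change"
    if source == "aws.events" and detail_type == "Scheduled Event":
        return "check_for_new_training"
    return "ignore"
```

The pattern would be registered with `put_rule(Name=..., EventPattern=json.dumps(PARAMETER_CHANGE_PATTERN))` and the schedule with `put_rule(Name=..., ScheduleExpression=SCHEDULE_EXPRESSION)` on a boto3 EventBridge client.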
We use a Step Functions state machine to orchestrate the ML workflow. The state machine starts different processes based on events received from EventBridge and responses from Lambda. In addition, it uses built-in state types: Wait states pause the workflow while model training and deployment complete, and Choice states evaluate which task runs next.
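A Task, Choice, and Wait loop of this kind can be sketched in Amazon States Language. The following Python sketch holds a minimal definition that polls training status until it completes or fails, plus a helper that checks every transition points to an existing state. The state names, Lambda ARNs, `$.Status` input field, and 300-second wait are hypothetical; this is not the solution's deployed state machine.

```python
from typing import Any

# Hypothetical Lambda ARNs; substitute the functions from your own stack.
CHECK_FN = "arn:aws:lambda:us-east-1:111122223333:function:CheckModelStatus"
START_FN = "arn:aws:lambda:us-east-1:111122223333:function:StartModel"

DEFINITION: dict[str, Any] = {
    "StartAt": "CheckTraining",
    "States": {
        "CheckTraining": {"Type": "Task", "Resource": CHECK_FN, "Next": "IsTrained"},
        # Choice state: branch on the status returned by the check task.
        "IsTrained": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.Status", "StringEquals": "TRAINING_COMPLETED",
                 "Next": "StartModel"},
                {"Variable": "$.Status", "StringEquals": "TRAINING_FAILED",
                 "Next": "TrainingFailed"},
            ],
            "Default": "WaitForTraining",
        },
        # Wait state: pause, then poll the training status again.
        "WaitForTraining": {"Type": "Wait", "Seconds": 300, "Next": "CheckTraining"},
        "StartModel": {"Type": "Task", "Resource": START_FN, "End": True},
        "TrainingFailed": {"Type": "Fail", "Error": "TrainingFailed"},
    },
}


def dangling_targets(definition: dict[str, Any]) -> set[str]:
    """Return transition targets that do not name an existing state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        targets.update(t for t in (state.get("Next"), state.get("Default")) if t)
        targets.update(c["Next"] for c in state.get("Choices", []))
    return targets - set(states)
```

The definition would be serialized with `json.dumps(DEFINITION)` and passed as `definition` to `boto3.client("stepfunctions").create_state_machine(name=..., definition=..., roleArn=...)`. Looping through a Wait state avoids keeping a Lambda function running for the whole training job.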
We use an S3 bucket and a set of predefined folders to store training and inference images and model artifacts. Each folder has a dedicated purpose. The model operator uploads new images to the folder images_labeled_by_folder for training, and the model consumer uploads inference images to the folder images_for_detection for custom label detection.
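Because each folder has a dedicated purpose, an upload handler can classify an object by its key alone. The following minimal Python sketch uses the two folder names from the solution. It assumes that training images sit in one subfolder per label (images_labeled_by_folder/<label>/<file>), so the label comes from the subfolder name, and that only JPEG and PNG files are relevant. Both assumptions and the helper name classify_upload are hypothetical.

```python
from typing import Optional

TRAINING_PREFIX = "images_labeled_by_folder/"
DETECTION_PREFIX = "images_for_detection/"
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def classify_upload(key: str) -> tuple[str, Optional[str]]:
    """Classify an uploaded S3 object key.

    Returns ("training", label), ("detection", None), or ("ignore", None).
    """
    if not key.lower().endswith(IMAGE_SUFFIXES):
        return ("ignore", None)
    if key.startswith(TRAINING_PREFIX):
        parts = key[len(TRAINING_PREFIX):].split("/")
        # Assumption: exactly <label>/<file>; anything else cannot be labeled.
        if len(parts) == 2 and parts[0]:
            return ("training", parts[0])
        return ("ignore", None)
    if key.startswith(DETECTION_PREFIX):
        return ("detection", None)
    return ("ignore", None)
```

For example, `classify_upload("images_labeled_by_folder/acme_logo/img001.jpg")` yields a training image labeled `acme_logo`, while anything under images_for_detection goes to custom label detection.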
We use three different sets of Lambda functions:
We use Amazon Simple Notification Service (Amazon SNS) as a communication mechanism to alert the model operator and model consumer of relevant model training and detection events. All SNS messages are published by the corresponding Lambda functions.
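Publishing an alert from a Lambda function is a single SNS Publish call. This minimal Python sketch assumes a boto3 SNS client is passed in as `client` (for example, `boto3.client("sns")`); the helper name, the subject prefix, and the message wording are hypothetical.

```python
from typing import Any


def notify(client: Any, topic_arn: str, event: str, detail: str) -> str:
    """Publish a workflow alert and return the SNS MessageId."""
    # SNS subjects must be a single line shorter than 100 characters.
    subject = f"[Custom Labels] {event}".replace("\n", " ")[:99]
    response = client.publish(TopicArn=topic_arn, Subject=subject, Message=detail)
    return response["MessageId"]
```

Sending all alerts through one helper keeps the subject format consistent, so the model operator and model consumer can filter training and detection notifications by subject.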
In this post, we walked through a continuous model improvement ML workflow with Amazon Rekognition Custom Labels and Amazon A2I. We explained how we use Step Functions to orchestrate model training and deployment, and custom label detection backed by a human labeling private workforce. We described how we use Parameter Store to parameterize the ML workflow to provide flexibility without needing development rework.
In Part 2 of this series, we provide step-by-step instructions to deploy the solution with AWS CloudFormation.
Les Chan is a Sr. Partner Solutions Architect at Amazon Web Services. He helps AWS Partners build their AWS technical capabilities and build solutions around AWS services. His expertise spans application architecture, DevOps, serverless, and machine learning.
Daniel Duplessis is a Sr. Partner Solutions Architect at Amazon Web Services, based out of Toronto. He helps AWS Partners and customers in enterprise segments build solutions using AWS services. His favorite technical domains are serverless and machine learning.