Running machine learning (ML) experiments in the cloud can span many services and components. The ability to structure, automate, and track ML experiments is essential to enable rapid development of ML models. With the latest advancements in automated machine learning (AutoML), the area of ML dedicated to automating ML processes, you can build accurate decision-making models without needing deep ML knowledge. In this post, we look at AutoGluon, an open-source AutoML framework that allows you to build accurate ML models with just a few lines of Python.
AWS offers a wide range of services to manage and run ML workflows, allowing you to select a solution based on your skills and application. For example, if you already use AWS Step Functions to orchestrate the components of distributed applications, you can use the same service to build and automate your ML workflows. Other MLOps tools offered by AWS include Amazon SageMaker Pipelines, which enables you to build ML models in Amazon SageMaker Studio with MLOps capabilities (such as CI/CD compatibility, model monitoring, and model approvals). Open-source tools, such as Apache Airflow (available on AWS through Amazon Managed Workflows for Apache Airflow) and Kubeflow, as well as hybrid solutions, are also supported. For example, you can manage data ingestion and processing with Step Functions while training and deploying your ML models with SageMaker Pipelines.
In this post, we show how even developers without ML expertise can build and maintain state-of-the-art ML models using AutoGluon on Amazon SageMaker, with Step Functions orchestrating the workflow components.
After an overview of the AutoGluon algorithm, we present the workflow definitions along with examples and a code tutorial that you can apply to your own data.
AutoGluon is an open-source AutoML framework that accelerates the adoption of ML by training accurate ML models with just a few lines of Python code. Although this post focuses on tabular data, AutoGluon also allows you to train state-of-the-art models for image classification, object detection, and text classification. AutoGluon tabular creates and combines different models to find the optimal solution.
The AutoGluon team at AWS released a paper that presents the principles that structure the library:
For more details about the algorithm, refer to the paper released by the AutoGluon team at AWS.
After you install the AutoGluon package and its dependencies, training a model is as easy as writing three lines of code:
from autogluon.tabular import TabularDataset, TabularPredictor
train_data = TabularDataset("s3://my-bucket/datasets/my-csv.csv")
predictor = TabularPredictor(label="my-label", path="my-output-folder").fit(train_data)
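Once a predictor is trained, you would typically score it on held-out data before deploying it. The following is a minimal sketch of that step, assuming a separate test file with the same columns as the training file. The function name and its arguments are illustrative and are not part of the original tutorial.

```python
def train_and_evaluate(train_path, test_path, label, output_dir):
    """Train an AutoGluon tabular predictor and score it on held-out data."""
    # Imported lazily so this module loads even where AutoGluon is not installed.
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset(train_path)
    test_data = TabularDataset(test_path)

    predictor = TabularPredictor(label=label, path=output_dir).fit(train_data)

    # One row per trained model, scored on the held-out set.
    leaderboard = predictor.leaderboard(test_data)
    # Predictions for each held-out row, with the label column removed first.
    predictions = predictor.predict(test_data.drop(columns=[label]))
    return predictor, leaderboard, predictions
```

The leaderboard is useful for seeing how the individual models and the final ensemble compare on data that was not used for training.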
The AutoGluon team demonstrated the strength of the framework by placing in the top 10 on the leaderboard in multiple Kaggle competitions.
We use Step Functions to implement an ML workflow that covers training, evaluation, and deployment. The pipeline design enables fast and configurable experiments by modifying the input parameters that you feed into the pipeline at runtime.
You can configure the pipeline to implement different workflows, such as the following:
The solution consists of a general state machine (see the following diagram) that orchestrates the set of actions to be run based on a set of input parameters.
The steps of the state machine are as follows:
A set of input parameter samples is available on the GitHub repo.
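To illustrate how input parameters drive a run, the following sketch builds an input document and starts a state machine execution with boto3. The keys in the input document (`train`, `deploy`, `mode`, and so on) are hypothetical placeholders; the actual schema is the one defined by the sample inputs in the repository. Only `start_execution` and its `stateMachineArn`, `name`, and `input` arguments come from the Step Functions API.

```python
import json
import uuid


def build_execution_input(train=True, deploy_mode="online"):
    """Assemble the JSON document passed to the state machine at runtime.

    All keys below are hypothetical; the real schema is defined by the
    sample inputs in the project's repository.
    """
    if deploy_mode not in ("online", "offline", None):
        raise ValueError(f"unsupported deploy_mode: {deploy_mode!r}")
    return {
        "train": {"enabled": train, "label": "my-label"},
        "deploy": {"enabled": deploy_mode is not None, "mode": deploy_mode},
    }


def start_pipeline(state_machine_arn, execution_input):
    """Start one execution of the state machine with the given input."""
    import boto3  # imported lazily; needs AWS credentials at call time

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"automl-{uuid.uuid4().hex[:8]}",  # execution names must be unique
        input=json.dumps(execution_input),
    )
    return response["executionArn"]
```

Changing only the input document, for example `build_execution_input(train=False, deploy_mode="offline")`, is then enough to request a different workflow from the same state machine.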
The state machine for training a new ML model using AutoGluon consists of two steps, as illustrated in the following diagram. The first step is a SageMaker training job that creates the model. The second step saves the entries in the SageMaker model registry.
You can run these steps either automatically as part of the main state machine, or as a standalone process.
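A two-step training workflow of this kind could be expressed in Amazon States Language roughly as follows. This is a sketch under assumptions, not the definition shipped with the solution: it assumes the registration step calls `CreateModelPackage` through the Step Functions AWS SDK integration (the original may use a different task type), and every `$.train.*` input path, the instance type, and the timeout are placeholders.

```python
import json


def training_state_machine():
    """Return an Amazon States Language definition with two sequential steps."""
    return {
        "Comment": "Train with AutoGluon, then register the model (sketch)",
        "StartAt": "TrainModel",
        "States": {
            "TrainModel": {
                "Type": "Task",
                # .sync makes Step Functions wait for the training job to finish.
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    "TrainingJobName.$": "$.train.job_name",
                    "RoleArn.$": "$.train.role_arn",
                    "AlgorithmSpecification": {
                        "TrainingImage.$": "$.train.image_uri",
                        "TrainingInputMode": "File",
                    },
                    "InputDataConfig": [{
                        "ChannelName": "train",
                        "DataSource": {"S3DataSource": {
                            "S3DataType": "S3Prefix",
                            "S3Uri.$": "$.train.data_s3_uri",
                        }},
                    }],
                    "OutputDataConfig": {"S3OutputPath.$": "$.train.output_s3_uri"},
                    "ResourceConfig": {
                        "InstanceCount": 1,
                        "InstanceType": "ml.m5.2xlarge",  # placeholder
                        "VolumeSizeInGB": 30,
                    },
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
                },
                # Keep the original input and attach the job result next to it.
                "ResultPath": "$.training",
                "Next": "RegisterModel",
            },
            "RegisterModel": {
                "Type": "Task",
                # Assumption: registration through the AWS SDK integration.
                "Resource": "arn:aws:states:::aws-sdk:sagemaker:createModelPackage",
                "Parameters": {
                    "ModelPackageGroupName.$": "$.train.model_package_group",
                    "ModelApprovalStatus": "PendingManualApproval",
                    "InferenceSpecification": {
                        "Containers": [{
                            "Image.$": "$.train.image_uri",
                            "ModelDataUrl.$": "$.training.ModelArtifacts.S3ModelArtifacts",
                        }],
                        "SupportedContentTypes": ["text/csv"],
                        "SupportedResponseMIMETypes": ["text/csv"],
                    },
                },
                "End": True,
            },
        },
    }
```

Calling `json.dumps(training_state_machine())` yields the string you would pass as the state machine definition. Because the second step reads the model artifact location from the first step's result, the two steps stay decoupled from any hard-coded S3 path.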
Let’s now look at the state machine dedicated to the deployment phase (see the following diagram). As mentioned earlier, the architecture supports both online and offline deployment. The former consists of deploying a SageMaker endpoint, whereas the latter runs a SageMaker batch transform job.
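The choice between online and offline deployment can be sketched as a Choice state that routes to one of two branches. The definition below is illustrative only: it assumes a SageMaker model already exists under the name supplied in the input, and the `$.deploy.*` input paths, state names, and instance types are hypothetical rather than taken from the solution.

```python
import json


def deployment_state_machine():
    """Return an ASL definition that branches on the requested deployment mode."""
    return {
        "StartAt": "ChooseDeployment",
        "States": {
            "ChooseDeployment": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.deploy.mode", "StringEquals": "online",
                     "Next": "CreateEndpointConfig"},
                    {"Variable": "$.deploy.mode", "StringEquals": "offline",
                     "Next": "RunBatchTransform"},
                ],
                "Default": "UnknownMode",
            },
            "CreateEndpointConfig": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createEndpointConfig",
                "Parameters": {
                    "EndpointConfigName.$": "$.deploy.endpoint_name",
                    "ProductionVariants": [{
                        "VariantName": "AllTraffic",
                        "ModelName.$": "$.deploy.model_name",
                        "InitialInstanceCount": 1,
                        "InstanceType": "ml.m5.xlarge",  # placeholder
                    }],
                },
                "ResultPath": None,  # discard the result, pass the input through
                "Next": "CreateEndpoint",
            },
            "CreateEndpoint": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createEndpoint",
                "Parameters": {
                    "EndpointName.$": "$.deploy.endpoint_name",
                    "EndpointConfigName.$": "$.deploy.endpoint_name",
                },
                "End": True,
            },
            "RunBatchTransform": {
                "Type": "Task",
                # .sync waits until the batch transform job completes.
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "Parameters": {
                    "TransformJobName.$": "$.deploy.job_name",
                    "ModelName.$": "$.deploy.model_name",
                    "TransformInput": {
                        "ContentType": "text/csv",
                        "DataSource": {"S3DataSource": {
                            "S3DataType": "S3Prefix",
                            "S3Uri.$": "$.deploy.input_s3_uri",
                        }},
                    },
                    "TransformOutput": {"S3OutputPath.$": "$.deploy.output_s3_uri"},
                    "TransformResources": {
                        "InstanceCount": 1,
                        "InstanceType": "ml.m5.xlarge",  # placeholder
                    },
                },
                "End": True,
            },
            "UnknownMode": {
                "Type": "Fail",
                "Error": "UnknownDeploymentMode",
                "Cause": "deploy.mode must be 'online' or 'offline'",
            },
        },
    }


def referenced_states(definition):
    """Collect every state name that some transition points to."""
    targets = set()
    for state in definition["States"].values():
        targets.update(v for k, v in state.items() if k in ("Next", "Default"))
        targets.update(c["Next"] for c in state.get("Choices", []))
    return targets
```

The `Default` branch makes an unrecognized mode fail fast with a clear error instead of an unmatched-choice runtime error. The small `referenced_states` helper is a convenience for checking that every transition points at a state that exists before you deploy the definition.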
The implementation steps are as follows:
This post presents an easy-to-use pipeline to orchestrate AutoML workflows and enable fast experiments in the cloud, allowing for accurate ML solutions without requiring advanced ML knowledge.
We provide a general pipeline as well as two modular ones that allow you to perform training and deployment separately if needed. Moreover, the solution is fully integrated with SageMaker, benefitting from its features and computational resources.
Get started now with this code tutorial to deploy the resources presented in this post into your AWS account and run your first AutoML experiments.
Federico Piccinini is a Deep Learning Architect for the Amazon Machine Learning Solutions Lab. He is passionate about machine learning, explainable AI, and MLOps. He focuses on designing ML pipelines for AWS customers. Outside of work, he enjoys sports and pizza.
Paolo Irrera is a Data Scientist at the Amazon Machine Learning Solutions Lab, where he helps customers address business problems with ML and cloud capabilities. He holds a PhD in Computer Vision from Telecom ParisTech, Paris.