AutoGluon-Tabular is an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning (ML) models on an unprocessed tabular dataset. In this post, we walk you through using AutoGluon-Tabular as a code-free AWS Marketplace product to train and deploy a highly accurate ML model for a tabular prediction task.
Tabular data prediction, which includes both classification and regression, is the most prevalent class of prediction problems in business. If you’ve worked on this type of prediction problem before, you know that it’s a vast field with extreme diversity of data. Businesses want to build predictive models on top of data obtained through a wide array of sources, such as purchase histories, insurance claims, medical reports, and sensor readings streamed from IoT devices. This diversity has resulted in an enormous variety of modeling techniques.
Classical approaches have typically relied on domain expertise and careful, time-consuming feature engineering. However, if you follow data science competitions like those hosted by Kaggle, you may have noticed a shift. Lately, the most competitive approaches have come not from domain experts doing careful feature engineering, but from ML architecture experts building large ensembles of models. Over time, the ML community has discovered that models that are worse in isolation are often superior in combination. In other contexts, this idea is known as the wisdom of the crowd, and it is formalized by the diversity prediction theorem. The effect is typically greatest when individual models are diverse and make errors in different ways.
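The diversity prediction theorem makes this precise for a simple averaging ensemble: if n models predict s_1, ..., s_n for a true value θ and the ensemble predicts their mean c, then (c − θ)² = (1/n)Σ(s_i − θ)² − (1/n)Σ(s_i − c)². In words, the ensemble's squared error equals the average individual squared error minus the diversity of the predictions. The following minimal Python sketch checks the identity on three made-up predictions for a single target. It illustrates the general principle only; it is not how AutoGluon-Tabular builds its ensembles, and the prediction values are hypothetical.

```python
from statistics import fmean


def squared_error(prediction: float, truth: float) -> float:
    return (prediction - truth) ** 2


def diversity_decomposition(predictions: list[float], truth: float) -> tuple[float, float, float]:
    """Return (ensemble_error, mean_individual_error, diversity) for one target."""
    ensemble = fmean(predictions)  # the ensemble prediction is a plain average
    ensemble_error = squared_error(ensemble, truth)
    mean_individual_error = fmean(squared_error(p, truth) for p in predictions)
    # Diversity: average squared distance of each prediction from the ensemble
    diversity = fmean(squared_error(p, ensemble) for p in predictions)
    return ensemble_error, mean_individual_error, diversity


# Hypothetical predictions from three models for a target whose true value is 10.0.
# The models err in different directions, which is what makes averaging pay off.
predictions = [7.0, 12.0, 13.0]
truth = 10.0

crowd_error, mean_individual_error, diversity = diversity_decomposition(predictions, truth)
print(f"ensemble error:        {crowd_error:.3f}")
print(f"mean individual error: {mean_individual_error:.3f}")
print(f"diversity:             {diversity:.3f}")

# Diversity prediction theorem: ensemble error = mean individual error - diversity
assert abs(crowd_error - (mean_individual_error - diversity)) < 1e-9
```

With these numbers, the ensemble's squared error is 4/9 (about 0.444), lower than that of even the best single model (4.0, for the prediction 12.0), because the individual errors partly cancel. If every model made the same error, the diversity term would be zero and averaging would not help.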
This idea is at the core of AutoGluon-Tabular, which is designed to be straightforward, robust, efficient, accurate, and fault tolerant: in the event of a failure, it resumes from the latest checkpoint. As a library, AutoGluon-Tabular abstracts all this complexity away, so results can often be achieved with only three lines of code.
We’ve taken this one step further by launching AutoGluon-Tabular in AWS Marketplace, which lets you build world-class models without writing a single line of code. In addition, you can take advantage of powerful Amazon SageMaker features. SageMaker is a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy ML models quickly, and it makes it easy to deploy your trained model to production with a single click.
The following sections step you through using AutoGluon-Tabular in AWS Marketplace on the SageMaker console. If you’d rather use it from SageMaker notebooks, refer to the following sample notebook.
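We walk through the following steps: subscribing to AutoGluon-Tabular in AWS Marketplace, creating a training job, creating a model package, deploying an endpoint, running a batch transform job, and cleaning up.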
The first step is to subscribe to AutoGluon-Tabular in AWS Marketplace.
You’re redirected to the SageMaker console.
To create a training job, complete the following steps:
We recommend using an m5 instance type and a volume size of more than 30 GB.
The minimum requirement is to set the name of the label column to predict.
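Because the label column name is the one setting you must supply, it can be worth checking it against your data before launching a training job. The following is a minimal sketch, assuming the training data is a CSV file with a header row; `check_label_column` is a hypothetical helper, not part of AutoGluon or SageMaker, and the sample rows are made up for illustration.

```python
import csv
import io


def check_label_column(csv_text: str, label: str) -> dict[str, int]:
    """Hypothetical pre-flight check: confirm the label column exists and count its values.

    Assumes csv_text is a CSV document whose first row is a header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or label not in reader.fieldnames:
        raise ValueError(f"label column {label!r} not found in header {reader.fieldnames}")

    rows = 0
    missing = 0
    distinct: set[str] = set()
    for row in reader:
        rows += 1
        # A short row yields None for absent fields, so treat None as empty
        value = (row[label] or "").strip()
        if value == "":
            missing += 1
        else:
            distinct.add(value)
    return {"rows": rows, "missing_labels": missing, "distinct_labels": len(distinct)}


# Hypothetical training data with "churned" as the label column
sample = """age,income,churned
34,52000,no
51,,yes
29,61000,
"""

summary = check_label_column(sample, "churned")
print(summary)  # {'rows': 3, 'missing_labels': 1, 'distinct_labels': 2}
```

A missing feature value (the empty income above) is left alone, since AutoGluon-Tabular is designed to work on unprocessed data; the check only flags rows whose label is empty and a label name that doesn't match the header.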
When training is complete, you can create a model package.
To deploy your endpoint, complete the following steps:
To create a batch transform job, complete the following steps:
Finally, delete the endpoint when you’re done so you don’t incur further charges.
In this post, we walked you through how to train ML models and make predictions using AutoGluon-Tabular in AWS Marketplace via the SageMaker console. With this code-free solution, you can harness the power of ML without any prior programming or data science expertise. Try it out and let us know how it goes in the comments!
Yohei Nakayama is a Deep Learning Architect at the Amazon ML Solutions Lab. He works with customers across different verticals to accelerate their use of artificial intelligence and AWS Cloud services to solve their business challenges. He is interested in applying ML/AI technologies to the space industry.
Austin Welch is a Data Scientist at the Amazon ML Solutions Lab, where he helps AWS customers across different industries accelerate their AI and cloud adoption.
Tatsuya Arai, Ph.D., is a biomedical engineer turned deep learning data scientist on the Amazon ML Solutions Lab team. He believes in the true democratization of AI and that the power of AI shouldn’t be exclusive to computer scientists or mathematicians.