Automate model development with Amazon SageMaker Autopilot

Amazon SageMaker Autopilot is a feature set that automates the key tasks of an automated machine learning (AutoML) process. It explores your data, selects the algorithms relevant to your problem type, and prepares the data to facilitate model training and tuning. When appropriate, Autopilot automatically applies a cross-validation resampling procedure to all candidate algorithms to test their ability to predict data they have not been trained on, and it produces metrics to assess the predictive quality of its machine learning model candidates. It ranks all of the optimized models it tests by their performance, finding the best-performing model that you can deploy in a fraction of the time normally required.
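For example, once a job completes you can retrieve this performance ranking with the ListCandidates API. The following is a minimal sketch using the AWS SDK for Python (boto3); the job name is a placeholder, not a real resource.

```python
import boto3

sm = boto3.client("sagemaker")

# List candidates from a completed Autopilot job, best-performing first.
# "my-autopilot-job" is a placeholder for your own job name.
response = sm.list_candidates(
    AutoMLJobName="my-autopilot-job",
    SortBy="FinalObjectiveMetricValue",
    SortOrder="Descending",
    MaxResults=10,
)

for candidate in response["Candidates"]:
    metric = candidate["FinalAutoMLJobObjectiveMetric"]
    print(candidate["CandidateName"], metric["MetricName"], metric["Value"])
```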

You can use Autopilot in different ways: on autopilot (hence the name) or with various degrees of human guidance, without code through Amazon SageMaker Studio or with code using one of the AWS SDKs. Autopilot currently supports regression, binary classification, and multiclass classification problem types. It supports tabular data formatted as CSV or Parquet files in which each column contains a feature with a specific data type and each row contains an observation. The accepted column data types include numerical, categorical, text, and time series consisting of strings of comma-separated numbers. Autopilot supports building machine learning models on large datasets of up to hundreds of GBs.
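As a sketch of the SDK route, the following launches an Autopilot job on a CSV dataset with boto3. The job name, S3 URIs, target column, and IAM role ARN are all placeholders you would replace with your own values.

```python
import boto3

sm = boto3.client("sagemaker")

# Launch an Autopilot job on tabular CSV data in S3. All names, S3 URIs,
# and the role ARN below are placeholders, not real resources.
sm.create_auto_ml_job(
    AutoMLJobName="my-autopilot-job",
    ProblemType="BinaryClassification",  # or "Regression" / "MulticlassClassification"
    AutoMLJobObjective={"MetricName": "F1"},
    InputDataConfig=[
        {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/training-data/",
                }
            },
            "TargetAttributeName": "label",  # the column Autopilot should predict
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/autopilot-output/"},
    RoleArn="arn:aws:iam::111122223333:role/MySageMakerRole",
)
```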

Autopilot also helps explain how models make predictions, using a feature attribution approach developed for Amazon SageMaker Clarify. Autopilot automatically generates a report that indicates the importance of each feature for the predictions made by the best candidate. This explainability functionality can make machine learning models more understandable to AWS customers, and the generated model governance report can be used to inform risk and compliance teams and external regulators.
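As an illustration, the S3 location of this explainability report can be read from the best candidate's artifact locations returned by the DescribeAutoMLJob API. A minimal sketch, again with a placeholder job name:

```python
import boto3

sm = boto3.client("sagemaker")

# Look up where Autopilot stored the explainability report for the
# best candidate. "my-autopilot-job" is a placeholder job name.
job = sm.describe_auto_ml_job(AutoMLJobName="my-autopilot-job")
artifacts = job["BestCandidate"]["CandidateProperties"]["CandidateArtifactLocations"]
print("Explainability report:", artifacts["Explainability"])
```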

You get full visibility into how the data was wrangled and how the models were selected, trained, and tuned for each of the candidates tested. Autopilot provides this through generated notebooks that contain the code used to explore the data and find the best candidate. The notebooks also serve as educational tools that help you learn about and conduct your own ML experiments. By examining the data exploration and candidate definition notebooks that Autopilot exposes, you can learn about the impact of various inputs and the trade-offs made in the experiments. You can also conduct further experiments on the higher-performing candidates by making your own modifications to the notebooks and rerunning them.
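For instance, the S3 locations of these generated notebooks are reported among the job artifacts returned by DescribeAutoMLJob. A minimal sketch with a placeholder job name:

```python
import boto3

sm = boto3.client("sagemaker")

# Fetch the S3 locations of the notebooks Autopilot generated.
# "my-autopilot-job" is a placeholder job name.
job = sm.describe_auto_ml_job(AutoMLJobName="my-autopilot-job")
notebooks = job["AutoMLJobArtifacts"]
print("Data exploration:", notebooks["DataExplorationNotebookLocation"])
print("Candidate definition:", notebooks["CandidateDefinitionNotebookLocation"])
```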

The following graphic outlines the principal tasks of an AutoML process managed by Autopilot.

[Overview of the AutoML process used by Amazon SageMaker Autopilot.]

With Amazon SageMaker, you pay only for what you use. You pay for the underlying compute and storage resources within SageMaker or other AWS services, based on your usage. For more information about the cost of using SageMaker, see Amazon SageMaker Pricing.

Topics