1. Introduction
In this tutorial, we’ll learn what automated machine learning (AutoML) is. First, we’ll briefly introduce machine learning and its types. Then, we’ll explain automated machine learning, discuss its pros and cons, and present the most popular tools currently available.
2. Machine Learning
Machine learning is a subfield of computer science and artificial intelligence. It enables algorithms to learn from data and make predictions without explicitly defining the rules by which the algorithms should work.
In summary, machine learning algorithms analyze data and learn its patterns to make predictions or decisions about it. Some problems are difficult or even impossible to solve without machine learning techniques.
For example, it’s almost impossible to manually define rules that recognize whether an image shows a cat or a dog. Despite that, machine learning algorithms solve this image classification problem with high accuracy. Of course, to do that, we need to provide them with enough images so they can learn relevant patterns and features.
2.1. Types of Machine Learning
In short, we can classify machine learning into four types:
- Supervised learning: algorithms use labeled data or data where target values exist
- Unsupervised learning: algorithms use unlabeled data, and there are no target values
- Semisupervised learning: a combination of supervised and unsupervised where data is partially labeled
- Reinforcement learning: algorithms where agents interact with the environment and learn rules by receiving rewards or punishments for their actions
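To illustrate the difference between the first two types, here’s a minimal pure-Python sketch. The data and the function names (`fit_centroids`, `predict`, `two_means`) are our own assumptions for illustration. The supervised part uses the labels to compute one centroid per class, while the unsupervised part splits the same values into two groups without looking at the labels at all:

```python
from statistics import mean

# Toy 1-D data: animal weights in kg (made-up numbers for illustration).
weights = [3.5, 4.0, 4.2, 20.0, 25.0, 30.0]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]


def fit_centroids(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label)
            for label in set(ys)}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two groups without using any labels."""
    low, high = min(xs), max(xs)  # initial cluster centers
    for _ in range(iterations):
        near_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        near_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low, high = mean(near_low), mean(near_high)
    return near_low, near_high


centroids = fit_centroids(weights, labels)
print(predict(centroids, 5.0))   # class predicted from labeled examples
print(two_means(weights))        # groups discovered without labels
```

Both parts see the same numbers. Only the supervised one can name its groups, because only it was given target values.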
3. Automated Machine Learning
When developing machine learning projects, we repeat the same steps for most problems.
For example, the main ingredient of an ML project is data. Data needs to be collected, preprocessed, and prepared in the format that the ML algorithm expects. This preparation might include handling missing and anomalous values, converting non-numerical values into numerical ones, engineering features, and selecting features, among other steps.
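As a rough sketch of two of these preparation steps, the snippet below fills in a missing value and converts a categorical column into numerical ones. It uses only the standard library. The records and the helper names `impute_mean` and `one_hot` are hypothetical, and real projects typically rely on a dedicated library for this:

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
rows = [
    {"age": 25, "city": "Paris"},
    {"age": None, "city": "Rome"},
    {"age": 35, "city": "Paris"},
]


def impute_mean(rows, column):
    """Replace missing numeric values with the mean of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]}
            for r in rows]


def one_hot(rows, column):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}={c}"] = int(r[column] == c)
        encoded.append(new_row)
    return encoded


prepared = one_hot(impute_mean(rows, "age"), "city")
for row in prepared:
    print(row)
```

After these two calls, every value in `prepared` is numerical, which is the form most ML algorithms require.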
Usually, machine learning algorithms have several hyperparameters that we need to tune. Moreover, to find the best solution, we typically try various machine learning models or construct an ensemble from a few of them.
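To make the tuning step concrete, here’s a minimal sketch of a grid search over one hyperparameter: the number of neighbors k in a k-nearest-neighbors classifier. The data points, the helper names (`knn_predict`, `accuracy`, `grid_search`), and the candidate values are assumptions chosen for illustration. We score each candidate on a held-out validation set and keep the best one:

```python
from collections import Counter

# Made-up labeled points (feature, label); (3.0, "b") acts as a noisy sample.
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"),
         (6.0, "b"), (6.5, "b"), (7.0, "b"), (3.0, "b")]
valid = [(1.2, "a"), (2.6, "a"), (6.2, "b"), (6.8, "b")]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def grid_search(train, valid, candidates):
    """Score every candidate value of k and return the best one."""
    scores = {k: accuracy(train, valid, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # first best value wins on ties
    return best_k, scores


best_k, scores = grid_search(train, valid, [1, 3, 5])
print(best_k, scores)
```

With k = 1, the noisy training point misleads the classifier on one validation sample, while larger values of k smooth it out. AutoML tools run the same kind of search, only over many more hyperparameters and models.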
Once we train and develop the machine learning model, we deploy it in the production environment. Finally, we can deploy extra components, such as monitoring and various triggers, to help with model maintenance.
The goal of automated machine learning (AutoML) is to automate all the steps mentioned above. Thus, it can help non-experts start using machine learning algorithms with minimal knowledge of the field. Furthermore, it helps experienced ML developers automate repetitive tasks.
In short, we’re looking for a single button or function that automatically handles as many steps as possible in the development of a machine learning project.
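The idea of a single function can be sketched in a few lines of pure Python. The entry point `auto_fit`, the two candidate models, and the data below are hypothetical and heavily simplified: the function splits the data, trains every candidate, compares them on the held-out part, and returns the winner refitted on all data:

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Simple least-squares line through the points."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


CANDIDATES = {"mean_baseline": fit_mean, "linear_regression": fit_line}


def auto_fit(xs, ys, holdout=0.25):
    """Hypothetical one-call entry point: split, train all candidates, keep the best."""
    cut = int(len(xs) * (1 - holdout))
    train_x, train_y = xs[:cut], ys[:cut]
    valid_x, valid_y = xs[cut:], ys[cut:]
    errors = {}
    for name, fit in CANDIDATES.items():
        model = fit(train_x, train_y)
        # Mean squared error on the held-out part.
        errors[name] = mean((model(x) - y) ** 2
                            for x, y in zip(valid_x, valid_y))
    best = min(errors, key=errors.get)
    # Refit the winning model on all available data before returning it.
    return best, CANDIDATES[best](xs, ys), errors


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9]  # roughly y = 2x + 1
best, model, errors = auto_fit(xs, ys)
print(best, model(10))
```

A real AutoML tool adds preprocessing, hyperparameter search, ensembling, and often deployment behind the same kind of call.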
4. Pros and Cons of AutoML
In industry, many machine learning projects don’t require experts to develop them. In many cases, it doesn’t matter if the results are a few percent below the optimal solution. The project is successful as long as it satisfies the minimum requirements, even if developers don’t know how the model works or treat it as a black box. In such situations, AutoML is a great tool that can save a lot of resources.
In industry, we usually don’t need to build a project from scratch. Often, we use high-level programming libraries or tools that solve some of the tasks. To some extent, these tools provide AutoML functionality while still leaving us the option to dig deeper into the code logic and modify some steps.
Such AutoML tools can be very convenient for experienced ML practitioners and save them a lot of time.
In contrast, some AutoML tools are real “black boxes” because they don’t let us see how they operate under the hood. Some don’t allow users to modify or create custom utilities. These tools might also be limited to a fixed set of algorithms or techniques that won’t work well for every problem.
The cost of efficient AutoML solutions can increase significantly as the size of the project expands. Additionally, some tools might be misleading by convincing beginners that they can become professionals without a deeper understanding of the ML field.
5. Top 5 Tools for AutoML
In this section, we’ll present some popular AutoML tools.
5.1. Auto-Sklearn
Scikit-learn (Sklearn) is one of the most popular Python machine learning libraries. Auto-Sklearn is built on top of Sklearn and automatically performs algorithm selection and hyperparameter tuning. The following image summarizes how the tool operates:
Auto-Sklearn employs a predefined set of machine learning algorithms and feature preprocessing techniques, but we can also select which ones we want to use. Moreover, we can define whether we want an ensemble model and how many models the ensemble should contain. Finally, it allows users to implement and add new models or feature preprocessing techniques.
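The options described above could be expressed roughly as follows. This is a sketch under assumptions: it targets auto-sklearn 0.15 or later, where `include` restricts the search space and `ensemble_kwargs` sets the ensemble size (older releases use different parameter names). The helper names, time limits, and chosen components are illustrative. The library is imported inside the function, so the settings helper also works without auto-sklearn installed:

```python
def build_settings(time_limit_s=120, ensemble_size=5):
    """Collect constructor arguments for AutoSklearnClassifier (illustrative values)."""
    return {
        "time_left_for_this_task": time_limit_s,   # total search budget in seconds
        "per_run_time_limit": time_limit_s // 4,   # budget for a single model fit
        # Restrict the search to chosen algorithms and feature preprocessors.
        "include": {
            "classifier": ["random_forest", "extra_trees"],
            "feature_preprocessor": ["no_preprocessing", "pca"],
        },
        # Assumption: auto-sklearn >= 0.15 accepts the ensemble size this way.
        "ensemble_kwargs": {"ensemble_size": ensemble_size},
    }


def run_auto_sklearn(settings):
    """Fit Auto-Sklearn on a scikit-learn demo dataset and return test accuracy."""
    # Lazy imports: these require auto-sklearn and scikit-learn to be installed.
    from autosklearn.classification import AutoSklearnClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    automl = AutoSklearnClassifier(**settings)
    automl.fit(X_train, y_train)
    print(automl.leaderboard())  # models found during the search
    return accuracy_score(y_test, automl.predict(X_test))
```

Calling `run_auto_sklearn(build_settings())` would then run the search within the given time budget.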
5.2. PyCaret
PyCaret is an open-source, low-code ML library in Python that automates machine learning workflows. With PyCaret, a few lines of code are enough to create and analyze ML models for problems such as:
- Anomaly detection
- Time series forecasting
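As an example of the low-code style, the following sketch runs anomaly detection with PyCaret’s `pycaret.anomaly` module. The wrapper function `detect_anomalies`, its default model ID, and the seed are our assumptions. We expect `data` to be a pandas DataFrame, and PyCaret must be installed for the call to work:

```python
def detect_anomalies(data, model_id="iforest", seed=42):
    """Label each row of a pandas DataFrame as normal or anomalous with PyCaret."""
    # Lazy import: requires the pycaret package.
    from pycaret.anomaly import setup, create_model, assign_model

    # setup() infers column types and builds the preprocessing pipeline.
    setup(data=data, session_id=seed)
    # "iforest" selects an Isolation Forest; other model IDs are available.
    model = create_model(model_id)
    # assign_model() returns the data with anomaly labels and scores attached.
    return assign_model(model)
```

For a quick try, PyCaret ships sample datasets that we can load with `get_data("anomaly")` from `pycaret.datasets` and pass to this function.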
PyCaret has functions for deploying the entire ML pipeline to the cloud, including AWS S3, GCP, and Azure. Also, with a few lines of code, we can create and run a POST API for inference on top of FastAPI, as presented in the following figure:
5.3. H2O AutoML
H2O AutoML is an automated ML tool that allows users to build machine learning pipelines through R, Python, or a web GUI. It automates several tasks, including:
- Data preprocessing
- Model selection, tuning, and building ensembles
- Model explainability
- Deployment and scaling to clusters using Hadoop, Spark, and Kubernetes
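Through the Python interface, a run could look like the sketch below. The wrapper function `run_h2o_automl`, the CSV path, the target column, and the limits are placeholders that we assume for illustration, and the snippet treats the task as classification. It needs the `h2o` package and a Java runtime:

```python
def run_h2o_automl(csv_path, target, max_models=10, seed=1):
    """Train several models with H2O AutoML and evaluate the best one."""
    # Lazy imports: require the h2o package.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)
    # Mark the target as categorical so H2O treats the task as classification.
    frame[target] = frame[target].asfactor()
    train, test = frame.split_frame(ratios=[0.8], seed=seed)
    features = [c for c in frame.columns if c != target]

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(x=features, y=target, training_frame=train)

    print(aml.leaderboard.head())  # ranked models, including stacked ensembles
    return aml.leader.model_performance(test)
```

The leaderboard ranks all trained models, and `aml.leader` gives us the best one for prediction or export.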
Even without any programming knowledge, we can select all the parameters needed for the ML pipeline by simply clicking through the web GUI, as shown in the following figure:
5.4. Amazon SageMaker Autopilot
Amazon SageMaker Autopilot provides an AutoML cloud service that allows users to automate the end-to-end process of building, training, and deploying ML models without coding knowledge. This tool solves classification and regression problems.
It automatically analyzes data, builds features, performs hyperparameter tuning, and generates multiple models. These models are ranked based on performance metrics. We can select any of the models and check the model explainability notebook.
In addition, there’s an option to deploy the model to the desired machine. All we need to do is upload our data to an AWS S3 bucket and select a few settings for the build process.
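Besides the console, we can start an Autopilot job programmatically. The sketch below assumes the `boto3` SDK and its SageMaker `create_auto_ml_job` operation. The helper names are hypothetical, and the job name, S3 locations, target column, and IAM role passed to them are placeholders. Only the minimal request fields are set, so Autopilot infers the rest:

```python
def build_autopilot_request(job_name, input_s3_uri, output_s3_uri,
                            target, role_arn):
    """Assemble the minimal request for an Autopilot job (all values are placeholders)."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            # Location of the training data that we uploaded to S3.
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3_uri,
                }
            },
            # Column that the models should learn to predict.
            "TargetAttributeName": target,
        }],
        # Where Autopilot writes models, notebooks, and reports.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }


def start_autopilot_job(request):
    """Submit the job; needs boto3 and valid AWS credentials."""
    import boto3  # lazy import so the request builder works without boto3

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job(**request)
```

Separating the request builder from the API call lets us inspect the settings before any AWS resources are used.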
The following figure summarizes the Amazon SageMaker Autopilot capabilities:
5.5. AutoKeras
AutoKeras is an AutoML library for deep learning. It’s developed on top of Keras, the high-level API of TensorFlow 2. This library supports several tasks with a simple interface, including:
- Tabular data classification and regression
- Image classification and regression
- Text classification and regression
- Time series forecasting
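For instance, image classification with AutoKeras could look like the following sketch. The wrapper function `train_image_classifier` and the small default values for `max_trials` and `epochs` are our assumptions to keep a first run short. The snippet requires the `autokeras` package, and the inputs are expected to be NumPy arrays of images and labels:

```python
def train_image_classifier(x_train, y_train, max_trials=1, epochs=1):
    """Search for an image classification network with AutoKeras."""
    # Lazy import: requires the autokeras package (and TensorFlow).
    import autokeras as ak

    # max_trials limits how many candidate architectures are tried.
    clf = ak.ImageClassifier(max_trials=max_trials, overwrite=True)
    clf.fit(x_train, y_train, epochs=epochs)
    return clf
```

The returned object offers `predict` and `evaluate` methods, and `export_model` gives us the best network as a regular Keras model.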
The developers have also announced that functionalities for object detection and image segmentation will be implemented in the future. The following figure presents some of AutoKeras’ functionalities:
6. Conclusion
In this article, we’ve explained what the term AutoML means, as well as its advantages and disadvantages. We’ve also presented some of the most popular AutoML tools.
Overall, there are many great AutoML tools. They can save us a lot of resources and help both expert and novice ML developers.
But in practice, ML tasks are very diverse. It’s impossible to create an AutoML tool that can solve every ML project from start to end. Furthermore, every AutoML tool has limitations. Thus, even with AutoML, we need experienced ML developers to interpret the created models and results.