Machine learning optimization is an important part of all machine learning models. Whether used to classify an image in facial recognition software or cluster users into like-minded customer groups, all types of machine learning model will have undergone a process of algorithm optimization.

This post was originally published in Seldon.

In fact, machine learning itself can be described as solving an optimization problem, as an optimization algorithm is the driving force behind most machine learning models. The iterative learning performed by these models is an example of the process of optimization. Models learn and improve by minimizing the degree of error measured by a loss function.

The aim of machine learning optimization is to lower the degree of error in a machine learning model, improving its accuracy at making predictions on data. Machine learning is generally used to learn the underlying relationship between input and output data from a set of training data. When facing new data in a live environment, the model can use this learned approximated function to predict an outcome from this new data. For example, models are trained to perform classification tasks, labelling unseen data into learned categories. Another example is regression, the prediction of continuous outcomes such as stock market trends.

In both examples, the aim of machine learning optimization is to minimize the degree of error between the real output and the predicted output. This error is measured by the loss function, which quantifies the difference between the real and predicted values. Optimization algorithms automate the minimization of this loss function at a scale beyond the capacity of any manual process. These algorithms use mathematical models to iteratively optimize a machine learning model.
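As a rough illustration of what a loss function measures, the sketch below computes mean squared error, one common choice for regression. The data values and variable names are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between real and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only: one set of real outputs, two sets of predictions.
real = [3.0, 5.0, 7.0]
rough_predictions = [2.0, 6.0, 9.0]
better_predictions = [3.0, 5.5, 7.0]

rough_loss = mean_squared_error(real, rough_predictions)    # (1 + 1 + 4) / 3
better_loss = mean_squared_error(real, better_predictions)  # (0 + 0.25 + 0) / 3
```

The predictions that sit closer to the real values produce the smaller loss, which is the quantity optimization tries to drive down.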

This guide focuses mainly on the optimization of model hyperparameters using an optimization algorithm. Usually, the actual effects of different combinations of hyperparameters aren’t known. Optimization algorithms are therefore leveraged to test and optimize combinations to create the most effective model configurations and learning processes. This guide explores the process of algorithm optimization, including what it is, why it’s important, and examples of the different optimization algorithms used in practice.

**What is Algorithm Optimization For Machine Learning?**

Algorithm optimization is the process of improving the effectiveness and accuracy of a machine learning model, usually through the tweaking of model hyperparameters. Machine learning optimization uses a loss function as a way of measuring the difference between the real and predicted value of output data. By minimizing the degree of error of the loss function, the aim is to make the model more accurate when predicting outcomes or classifying data points.

Optimizing machine learning models usually focuses on tweaking the hyperparameters of the model. As the name suggests, machine learning is the process of a model or system learning from the data itself, often with very little human oversight. Hyperparameters are the elements of a model that are set by the developer or data scientist before training begins. These impact the learning process, and as a result can be tweaked to improve model efficiency.

An example of a hyperparameter would be the total number of clusters or categories a model should classify data into. Other examples include the learning rate or the structure of the model. Hyperparameters are configured before the model is trained, in contrast to model parameters, which are found during the training phase of the machine learning lifecycle. Hyperparameters should be tuned so the model can perform its given task in the most effective way possible.

Hyperparameter tuning or machine learning optimization aims to improve the effectiveness of the model, and minimize the aforementioned loss function. The power of optimization algorithms can be leveraged to find the most effective hyperparameter settings and configurations. Manually testing and tweaking hyperparameters would be a time-consuming task, which would prove impossible with black box models. Instead, optimization algorithms are used to select and assess the best possible combinations of hyperparameters.

**What Is The Need For Optimization Algorithms?**

The concept of hyperparameters in machine learning must first be clarified before understanding the need for optimization algorithms. The model training process deals with achieving the best possible model parameters. For example, during training the weightings of features within the model can be established. Hyperparameters, on the other hand, are set before the training process by the developer or data scientist. These are settings used to configure the overall learning process of the model.

Examples of hyperparameters include the learning rate or the number of clusters used by the model. Optimized hyperparameters are therefore vital, as they ensure the model is at peak efficiency, reducing the loss function and improving the effectiveness of the model as a whole. Each hyperparameter can be tweaked or changed until the optimum configuration is achieved. This means the model is as effective and accurate as possible.

Manual optimization of hyperparameters can take a huge amount of time and resources, as a data scientist must cycle through different combinations and configurations. Optimization algorithms are therefore used to streamline the process, effectively finding the optimum configuration of model hyperparameters. An optimization algorithm will work through many different iterations to find the optimum configuration of the model, far more than a human could test by hand.
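To make the distinction concrete, here is a minimal sketch of automated tuning, assuming a toy one-weight model and made-up data. The weight `w` is a model parameter learned during training, while the learning rate is a hyperparameter fixed beforehand. The loop simply tries each candidate value (a small grid search) and keeps the one with the lowest validation loss.

```python
def train(xs, ys, learning_rate, epochs=50):
    """Fit y = w * x by gradient descent; w is learned from the data."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data following y = 2x, invented for illustration.
train_x, train_y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
valid_x, valid_y = [4.0, 5.0], [8.0, 10.0]

# Candidate hyperparameter values, chosen before any training happens.
candidates = [0.001, 0.01, 0.1, 0.3]
results = {
    lr: mse(train(train_x, train_y, lr), valid_x, valid_y) for lr in candidates
}
best_lr = min(results, key=results.get)
```

On this toy problem the smallest rate learns too slowly and the largest one diverges, so the comparison has to be made empirically rather than guessed.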

Another common issue within black box machine learning is that it can often be impossible to understand the effect of hyperparameters on the wider model. In these cases, manual optimization by a developer wouldn’t be possible. Optimization algorithms are leveraged to improve model configurations even when derivatives are unknown.

The technique of cross validation is usually used to test these optimized hyperparameters on new and unseen data. The model is evaluated on held-out data it did not see during training, which indicates whether it has overfit to the training data. This helps to gauge the model’s ability to generalize when facing new data, an important consideration for any machine learning model. As a result, optimization algorithms are an integral part of the machine learning process.
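A minimal sketch of k-fold cross validation is shown below. It assumes contiguous, unshuffled folds and a deliberately trivial model that predicts the training mean. The helper names `fit_mean` and `mse_constant` are hypothetical, and in practice the data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, fit, loss, k=3):
    """Average the held-out loss over k train/test splits."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        test_x = [xs[i] for i in held_out]
        test_y = [ys[i] for i in held_out]
        scores.append(loss(model, test_x, test_y))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Hypothetical model: always predict the mean of the training targets."""
    return sum(ys) / len(ys)


def mse_constant(model, xs, ys):
    """Mean squared error of a constant prediction."""
    return sum((model - y) ** 2 for y in ys) / len(ys)


xs = list(range(6))
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = cross_validate(xs, ys, fit_mean, mse_constant, k=3)
```

Because every sample is held out exactly once, the averaged score reflects performance on data the model did not train on.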

**What Are The Different Algorithm Optimization Techniques?**

There are many different approaches within optimization algorithms, with different variants of each technique. Although there are different techniques, the aim is generally the same: to minimize the loss or cost function. Through optimization, the difference between the estimated and real value is reduced. Optimization algorithms use different techniques to test and evaluate combinations of hyperparameters, to find the optimal configurations in terms of model performance. The algorithms are often used within the model itself to improve its effectiveness in light of its target function too.

A way of grouping the many different optimization algorithms is whether the derivative of the target function that is being optimized can be established. If the function is differentiable, the derivative can be used within the optimization algorithm as a valuable piece of extra information. The algorithm will use a derivative to improve the direction or focus of its optimization. But in some cases, derivatives may not be accessible or available. In other cases, noisy data may cause derivatives to become unhelpful. Derivative-free optimization techniques are used by optimization algorithms that avoid using derivatives altogether, using just the function values instead.

**Optimization Algorithms For Differentiable Functions**

For machine learning model functions that are differentiable, the function’s derivative can be leveraged during the optimization process. The derivative can inform the direction or selection of each iteration of hyperparameter combinations. The result is a much more focused search area. This can mean optimization algorithms can perform more effectively when compared to derivative-free optimization algorithms.

Common optimization algorithms when the function is differentiable include:

- Gradient Descent
- Fibonacci Search
- Line search

**Gradient Descent**

Gradient descent is a common technique in machine learning optimization. The gradient of the loss function is computed and multiplied by the learning rate hyperparameter to set the size of each step, and the values being optimized are moved in the opposite direction of the gradient to reduce the loss. It’s a common approach within the training of machine learning algorithms too. Gradient descent is in the wider group of first-order algorithms, which use the gradient or first derivative to move through the search space.
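The update described above can be sketched in a few lines. This is a minimal one-dimensional example on an assumed toy function f(x) = (x - 3)^2, not a production optimizer; the function, starting point and step count are chosen purely for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Toy objective f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

With a learning rate of 0.1 each step shrinks the distance to the minimum by a constant factor. A rate that is too large for the function would overshoot instead, which is why the learning rate itself is worth tuning.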

**Fibonacci Search**

Fibonacci Search is a type of optimization technique in the wider group of bracketing algorithms. It’s generally used to find the minimum or maximum of a function, and moves through the search area within a specific range or bracket. Each step in the sequence narrows the bracket around an optimum value, effectively narrowing the search area in each iteration. A similar technique is the golden-section search, which again narrows its boundaries in the search for an optimum value.
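The bracketing idea can be sketched as follows, assuming a function with a single minimum inside the starting bracket. For readability this version evaluates both interior points on every iteration; an efficient implementation would reuse one of the two evaluations from the previous step.

```python
def fibonacci_search(f, low, high, n=30):
    """Narrow [low, high] around the minimum of a unimodal function f."""
    # fib[k] is the k-th Fibonacci number, with fib[1] = fib[2] = 1.
    fib = [0, 1, 1]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])

    for k in range(n, 3, -1):
        width = high - low
        # Interior points placed at ratios of consecutive Fibonacci numbers.
        x1 = low + fib[k - 2] / fib[k] * width
        x2 = low + fib[k - 1] / fib[k] * width
        if f(x1) < f(x2):
            high = x2  # the minimum cannot lie to the right of x2
        else:
            low = x1   # the minimum cannot lie to the left of x1
    return (low + high) / 2


# Toy objective with its minimum at x = 2.
best_x = fibonacci_search(lambda x: (x - 2) ** 2, 0.0, 10.0)
```

Each iteration shrinks the bracket by the ratio of two consecutive Fibonacci numbers, which approaches the golden ratio used by golden-section search.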

**Line Search**

The line search technique uses a descent direction to iteratively refine a target function. The approach performs a bracketed search along a line after the direction of movement is selected. Each iteration will be optimized against the target function until no more optimization is achievable. It’s part of a wider group of optimization algorithms called Local Descent Algorithms.
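As a hedged illustration, the sketch below uses backtracking line search with the Armijo sufficient-decrease condition. This is a common inexact variant rather than the bracketed search described above, chosen here because it is short. It works in one dimension, and the constants `shrink` and `c` are conventional defaults, not values taken from this article.

```python
def backtracking_line_search(f, slope, x, direction, step=1.0, shrink=0.5, c=1e-4):
    """Shrink the step until f decreases enough along the chosen direction.

    `slope` is the directional derivative at x; it must be negative,
    i.e. `direction` must be a descent direction.
    """
    fx = f(x)
    while f(x + step * direction) > fx + c * step * slope:
        step *= shrink
    return step


def descend(f, grad, x, iterations=50):
    """Pick the steepest-descent direction, then line-search along it."""
    for _ in range(iterations):
        g = grad(x)
        if abs(g) < 1e-12:  # gradient is effectively zero: stop
            break
        direction = -g
        step = backtracking_line_search(f, g * direction, x, direction)
        x = x + step * direction
    return x


# Toy objective with its minimum at x = 1.
result = descend(lambda x: (x - 1) ** 2 + 0.5, lambda x: 2 * (x - 1), x=5.0)
```

The direction is chosen first and only the step length is searched, which is the defining feature of the line search family.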

**Derivative-free Optimization Algorithms**

In some cases it can be challenging or impossible to identify derivative information of the machine learning model’s function. This could be because computing derivatives is too expensive, or because the data is so noisy that derivatives aren’t useful. In the case of black box machine learning, derivatives may be difficult to define or identify. They can also be difficult to establish in simulation-based learning environments.

Derivative-free optimization algorithms use only the values of the objective function. This approach is usually less effective or efficient compared to optimization algorithms that use derivatives. This is because the algorithm has less information to inform the optimization process.

Common examples of derivative-free optimization algorithms include:

- Evolution algorithms
- Bayesian optimization
- Random search

**Evolution Algorithms**

Evolution algorithms are a common approach when optimizing deep neural networks. The technique mirrors genetic or evolutionary selection processes to combine and assess hyperparameter combinations. In each iteration, combinations are tested and evaluated, and the most successful ones are recombined to form the next generation. This way, each generation becomes more optimized and effective, mirroring the process of natural selection.
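A minimal sketch of this select, recombine and mutate loop is shown below. The loss function, the two hyperparameters and their bounds are hypothetical stand-ins for a real validation loss, and the population size and mutation scale are arbitrary choices.

```python
import random


def evolve(loss, bounds, population_size=20, generations=30,
           mutation_scale=0.1, seed=0):
    """Evolve hyperparameter vectors towards a lower loss."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(population_size)
    ]
    history = []  # best loss seen at the start of each generation
    for _ in range(generations):
        population.sort(key=loss)
        history.append(loss(population[0]))
        # Selection: the better half survives unchanged.
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            # Crossover: each value is inherited from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: small Gaussian noise, clipped to the allowed range.
            child = [
                max(lo, min(hi, gene + rng.gauss(0, mutation_scale * (hi - lo))))
                for gene, (lo, hi) in zip(child, bounds)
            ]
            children.append(child)
        population = parents + children
    return min(population, key=loss), history


def toy_loss(hyperparameters):
    """Hypothetical validation loss, lowest at learning_rate=0.01, dropout=0.3."""
    learning_rate, dropout = hyperparameters
    return (learning_rate - 0.01) ** 2 + (dropout - 0.3) ** 2


bounds = [(0.0, 0.1), (0.0, 0.9)]
best, history = evolve(toy_loss, bounds)
```

Because the best individuals are carried over unchanged, the best loss in the population can never get worse from one generation to the next.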

**Bayesian Optimization**

Bayesian optimization is one of the most popular approaches for derivative-free optimization algorithms. The technique refines the hyperparameter combinations with each iteration, by focusing on combinations which best meet the target function. This approach avoids a resource or time-intensive blanket approach in which all combinations of hyperparameters may be combined and tested. Instead, Bayesian optimization is a sequence of refinements, with the model selecting hyperparameters with the most value to the target function.
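The sequence of refinements can be sketched with a small Gaussian process surrogate, written in plain Python to stay self-contained. Everything here is an assumption made for illustration: the RBF kernel and its length scale, the lower-confidence-bound rule for choosing the next point, the candidate grid and the toy objective. Real projects would normally rely on a dedicated library instead.

```python
import math


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(xs, ys, query, noise=1e-4):
    """Posterior mean and standard deviation of the surrogate at `query`."""
    offset = sum(ys) / len(ys)  # centre the targets around their mean
    centered = [y - offset for y in ys]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, query) for a in xs]
    mean = offset + sum(k * w for k, w in zip(k_star, solve(K, centered)))
    reduction = sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    variance = rbf(query, query) - reduction
    return mean, math.sqrt(max(variance, 0.0))


def bayesian_optimize(objective, candidates, initial, iterations=8, kappa=2.0):
    """Evaluate the candidate with the lowest confidence bound each round."""
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(iterations):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break

        def lower_confidence_bound(c):
            mean, std = gp_posterior(xs, ys, c)
            return mean - kappa * std  # favour low predictions and high uncertainty

        nxt = min(remaining, key=lower_confidence_bound)
        xs.append(nxt)
        ys.append(objective(nxt))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


def toy_objective(x):
    """Hypothetical validation loss as a function of one hyperparameter."""
    return (x - 0.3) ** 2


candidates = [i / 20 for i in range(21)]
initial = [candidates[0], candidates[10], candidates[20]]
best_x, best_y, evaluated = bayesian_optimize(toy_objective, candidates, initial)
```

Only eleven of the twenty-one candidates are evaluated in this run. The surrogate decides where each new evaluation is likely to be most useful, in contrast to testing every combination.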

**Random Searches**

Random search is a straightforward and commonly used approach to machine learning optimization. Hyperparameter values are sampled at random and combined to discover the most effective combinations. It can be used to discover emerging or new hyperparameter combinations because of the randomized nature of the search. If each hyperparameter configuration is mapped to a grid, the random search technique will randomly sample and combine values from that grid.

This is unlike Bayesian optimization, which is more focused in its approach when used in optimization algorithms. Random searches will usually be limited to a specific number of sequences or iterations. If left unlimited, random searches may take a huge amount of time to complete. Random search is generally used to find the best combination of hyperparameters, avoiding the need for any manual checks.
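A minimal sketch of a random search with a fixed trial budget follows. The search space, the sampling ranges and `toy_loss` are hypothetical; in a real setting the loss would come from training and validating a model with each sampled configuration.

```python
import math
import random


def random_search(loss, search_space, n_trials=50, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter from its sampler.
        config = {name: sample(rng) for name, sample in search_space.items()}
        current = loss(config)
        if current < best_loss:
            best_config, best_loss = config, current
    return best_config, best_loss


# Hypothetical search space: each entry draws one value from its range.
search_space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),  # log-uniform
    "n_clusters": lambda rng: rng.randint(2, 10),
}


def toy_loss(config):
    """Made-up loss, lowest at learning_rate=0.01 and n_clusters=5."""
    return (math.log10(config["learning_rate"]) + 2) ** 2 + (config["n_clusters"] - 5) ** 2


best_config, best_loss = random_search(toy_loss, search_space, n_trials=50)
```

The `n_trials` argument is the iteration limit mentioned above: it caps the cost of the search, at the price of possibly missing the best configuration.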
