There are two major areas we can optimize over for many machine learning algorithms and deep learning networks. One is the hyperparameters, and the second is the neural network architecture.
The selection of hyperparameters is often critical in determining a model's convergence. Optimization techniques are used to determine the best hyperparameters for a learning algorithm: the optimal hyperparameters are those that score best on a user-defined metric. Most of the time, the goal is to minimize the model's loss to within a target accuracy. This process is known as hyperparameter optimization (HPO).
Hyperparameter optimization is done over a fixed architecture, but we can also use a technique known as Neural Architecture Search (NAS), which performs a search over the architecture space of a machine learning model or deep learning network when the model still does not converge after HPO alone.
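To make HPO concrete, a hyperparameter search can be as simple as sampling candidate configurations and keeping whichever scores best on the chosen metric. The sketch below is plain Python with a made-up `validation_loss` standing in for a real training-and-evaluation run; the hyperparameter names and ranges are illustrative assumptions, not from any particular library:

```python
import random

# Toy "validation loss" as a function of two hyperparameters
# (learning rate and regularization strength). In practice this
# would come from training and evaluating a real model.
def validation_loss(lr, reg):
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

def random_search(n_trials=200, seed=0):
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        lr = rng.uniform(0.001, 1.0)   # sample a candidate learning rate
        reg = rng.uniform(0.0, 0.1)    # sample a candidate regularization
        loss = validation_loss(lr, reg)
        if loss < best_loss:           # keep the best configuration seen
            best_params, best_loss = (lr, reg), loss
    return best_params, best_loss
```

Random sampling is only one possible search strategy; the same loop structure applies to the grid-based and evolutionary methods discussed below.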
Some of the top optimization techniques in the field of machine learning
Now let us talk about the various techniques you can use to optimize the hyperparameters of your model.
Exhaustive search is the process of looking for the optimal hyperparameters by checking each candidate in turn to see whether it is a good match. It is like forgetting the code for your bike lock and trying out every possible combination. In machine learning, we do the same thing, but the number of options is usually quite large.
The exhaustive search method is simple. For instance, if you are working with a k-means algorithm, you would manually search for the right number of clusters. However, if there are many options to consider, the search becomes unbearably slow. This makes brute-force search inefficient in most real-life cases.
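Here is a minimal brute-force (grid) search in plain Python. `model_error` is a stand-in for actually fitting and scoring a model, and the candidate grids are arbitrary choices for illustration:

```python
from itertools import product

# Toy objective: pretend this is a model's validation error for a
# given (num_clusters, max_iter) pair. In a real pipeline you would
# fit the model and score it here.
def model_error(k, max_iter):
    return abs(k - 4) + 1.0 / max_iter

def grid_search():
    ks = [2, 3, 4, 5, 6]      # candidate cluster counts
    iters = [10, 50, 100]     # candidate iteration budgets
    best, best_err = None, float("inf")
    for k, m in product(ks, iters):  # try every combination
        err = model_error(k, m)
        if err < best_err:
            best, best_err = (k, m), err
    return best, best_err
```

Note that the number of combinations is the product of the grid sizes, which is exactly why this approach scales so poorly as more hyperparameters are added.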
Gradient descent is the most common algorithm for model optimization, and it works by minimizing the error. To perform gradient descent, you iterate over the training dataset while re-adjusting the model's parameters.
Your goal is to minimize the cost function because it means that you will get the smallest possible error and improve the accuracy of the model.
On a graph of the cost function, you can see how the gradient descent algorithm travels through the variable space. To get started, you take a random point on the graph and choose a direction. If at any point the error is getting larger, that means you chose the wrong direction.
When you are unable to improve anymore, the optimization is over, and you have found a local minimum.
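The loop described above can be sketched in a few lines. The toy cost function below, (x − 3)², has its minimum at x = 3, and its known derivative, 2(x − 3), plays the role of the gradient:

```python
# Minimal gradient descent on a one-dimensional cost function.
def gradient_descent(lr=0.1, steps=100, x0=0.0):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)  # derivative of (x - 3)^2 at the current point
        x -= lr * grad      # step against the gradient
    return x
```

Each update moves x a fraction (the learning rate) of the way down the slope; when the slope is near zero, the updates shrink and the search settles at a minimum.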
Genetic algorithms represent another approach to machine learning optimization. The principle behind these algorithms is an attempt to apply the theory of evolution to machine learning.
In the theory of evolution, only the specimens with the best adaptation mechanisms get to survive and reproduce. But how do you know which specimens are and are not the best in the case of machine learning models?
Imagine you have a bunch of random models at hand. This will be your population. Among these models, each with its own predefined hyperparameters, some are better adjusted than others.
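A minimal sketch of this idea, assuming a single hyperparameter (a learning rate) and a made-up `fitness` function: in each generation, the fittest half of the population survives, and the survivors reproduce with small random mutations.

```python
import random

# Hypothetical fitness: how well a candidate learning rate performs;
# higher is better. A real fitness score would come from evaluating
# the trained model.
def fitness(lr):
    return -(lr - 0.1) ** 2

def evolve(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    # Initial population: random candidate learning rates
    population = [rng.uniform(0.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest half
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction with mutation: perturb the survivors slightly
        children = [max(0.0, s + rng.gauss(0, 0.02)) for s in survivors]
        population = survivors + children
    return max(population, key=fitness)
```

Because the survivors are carried over unchanged, the best candidate found so far is never lost, and the mutations let the population gradually drift toward better hyperparameter values.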