Modern machine learning algorithms: strengths and weaknesses
Machine learning is a subfield of artificial intelligence (AI) that gives computers the ability to learn from data and improve with experience without being explicitly programmed.
In basic terms, machine learning algorithms are classified into three groups: supervised learning, unsupervised learning and reinforcement learning.
- Supervised learning uses labelled examples, where both the input and the correct output are known, to learn a mapping from inputs to outputs. The trained model then predicts outputs for new, unseen inputs.
- Unsupervised learning works with input examples that have no labels or responses, so the algorithm must discover structure in the data, such as groups or patterns, by itself.
- Reinforcement learning trains an agent whose actions affect its environment and earn rewards or penalties. Through trial and error, the agent learns the strategy that yields the best long-term reward.
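To make the trial-and-error idea concrete, here is a minimal sketch of a heavily simplified reinforcement-learning setting, an epsilon-greedy multi-armed bandit. It assumes a toy environment in which each of three actions pays a reward of 1 with a hidden probability; the function name `epsilon_greedy_bandit`, the probabilities and the exploration rate `epsilon` are illustrative assumptions, not part of any library.

```python
import random


def epsilon_greedy_bandit(true_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely by trial and error.

    Toy environment (an assumption): action a returns reward 1 with the
    hidden probability true_probs[a], otherwise 0.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    n_actions = len(true_probs)
    counts = [0] * n_actions        # how often each action has been tried
    estimates = [0.0] * n_actions   # running average reward of each action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            # exploit: pick the action that currently looks best
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_probs[action] else 0.0
        counts[action] += 1
        # incremental mean: move the estimate toward the observed reward
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("best action:", best, "tried", counts[best], "times")
```

The agent is never told which action is best; it only sees rewards. Occasional random exploration keeps it from locking onto an early lucky guess, while exploitation lets it spend most of its time on the action it has learned to prefer.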
The scikit-learn documentation lists many supervised learning algorithms, such as decision trees, support vector machines, generalised linear models and neural networks. However, such a list does not explain the strengths and weaknesses of each method.
In machine learning, you may have heard of the No Free Lunch theorem. In layman's terms, it states that no single algorithm works best for every problem: an algorithm that performs well on one type of problem can perform poorly on another. Thus, you should try several algorithms on your problem and select the one that performs best on held-out data.
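In practice, "try several algorithms and keep the best" usually means fitting each candidate on training data and comparing errors on data the models have not seen. The sketch below assumes a tiny hypothetical data set and three toy candidate models (a mean baseline, a least-squares line and 1-nearest-neighbour); all function names are illustrative, and a real project would typically use cross-validation and a library such as scikit-learn instead.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with one feature: y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def fit_1nn(xs, ys):
    """1-nearest neighbour: copy the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(candidates, train, valid):
    """Fit each candidate on the training split and score it on held-out data."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical toy data: y is roughly 3x + 2 with a small alternating wobble.
xs = list(range(20))
ys = [3 * x + 2 + (0.5 if x % 2 else -0.5) for x in xs]
# Every fourth point is held out for validation; the rest are for training.
train = ([x for x in xs if x % 4], [y for x, y in zip(xs, ys) if x % 4])
valid = ([x for x in xs if x % 4 == 0], [y for x, y in zip(xs, ys) if x % 4 == 0])

candidates = {"mean": fit_mean, "line": fit_line, "1-NN": fit_1nn}
best, scores = select_best(candidates, train, valid)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

On this nearly linear data the straight line wins; on data with a different shape, another candidate could win, which is exactly why the comparison is worth running.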
Here, instead of grouping algorithms by learning style, we categorise them by the machine learning task they perform: regression, classification and clustering.
Regression is a supervised learning task that predicts a continuous output variable. It has its roots in statistics and is often used as an introductory example when studying machine learning.
Because it is supervised, regression learns from labelled examples. The aim is to find the model that best fits the given input data. Common regression algorithms include:
- Linear Regression – One of the most common regression algorithms. It models the relationship between a dependent variable and one or more independent variables using a best-fit straight line.
- Strengths – Linear regression is simple to understand and explain, and it can be regularised to avoid overfitting. The model can also be updated easily with new data using stochastic gradient descent.
- Weakness – It performs poorly when the relationship between the variables is not linear. It is also not flexible enough to capture more complex patterns, and finding the right terms to describe the system can be tricky.
- Regression Trees – A type of decision tree. The data set is repeatedly split into separate subsets, with each split chosen to maximise the information gained (for regression, typically the reduction in variance).
- Strengths – Decision trees can learn non-linear relationships, and they work well in practical cases.
- Weakness – Unconstrained decision trees are prone to overfitting, because the tree can keep branching until it has memorised the training data.
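The following pure-Python sketch contrasts the two regression algorithms above on a hypothetical step-shaped data set: a least-squares line cannot follow the jump, while a depth-1 regression tree (a single split chosen to minimise squared error) fits it exactly. It also shows one stochastic-gradient-descent update, the mechanism that lets a linear model absorb new data points one at a time. Function names, data and the learning rate are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: a * x + (my - a * mx)


def fit_stump(xs, ys):
    """Depth-1 regression tree: the single split x < t that minimises the
    squared error (equivalently, maximises the reduction in variance)."""
    best = None
    for t in sorted(set(xs))[1:]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm  # each leaf predicts its mean


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sgd_step(a, b, x, y, lr=0.01):
    """One gradient step on 0.5 * squared error for a single new point."""
    error = (a * x + b) - y
    return a - lr * error * x, b - lr * error


# Hypothetical step-shaped data: a non-linear relationship.
xs = list(range(10))
ys = [0.0 if x < 5 else 10.0 for x in xs]
line, stump = fit_line(xs, ys), fit_stump(xs, ys)
print("line MSE:", round(mse(line, xs, ys), 2), "stump MSE:", mse(stump, xs, ys))

# Online updating: feed points of y = 2x + 1 one at a time, over many passes.
a, b = 0.0, 0.0
for _ in range(2000):
    for x in range(5):
        a, b = sgd_step(a, b, x, 2 * x + 1)
print("SGD estimate: a =", round(a, 3), "b =", round(b, 3))
```

A single split is enough here only because the data has one jump; letting the tree keep splitting without limits is what leads to the overfitting described above.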
Classification is also a supervised learning task, but its output predictions are discrete classes. Examples include deciding whether an email is spam or not, or whether an image shows a cat or a dog. Algorithms for classification include:
- Logistic Regression – The classification counterpart of linear regression. It uses the logistic function to map each prediction to a value between 0 and 1, which can be read as the probability of the positive class (1 rather than 0).
- Strengths – Its outputs have a natural probabilistic interpretation, and, like linear regression, the algorithm can be regularised to avoid overfitting.
- Weakness – Performance drops when there are multiple or non-linear decision boundaries. It is also not flexible enough to capture more complex relationships.
- Classification Tree – The classification counterpart of regression trees; both are types of decision tree.
- Strengths – Like regression trees, classification trees can learn non-linear relationships, and they are robust and flexible in practical cases.
- Weakness – Same as regression trees.
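Here is a minimal sketch of logistic regression on a single feature, trained by batch gradient descent on the log-loss with an L2 penalty (the regularisation mentioned above). The learning rate, epoch count and penalty strength are arbitrary assumptions, and both toy data sets are hypothetical. The second data set places positives on both sides of the negatives, a pattern that no single logistic boundary on one feature can separate.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=1000, l2=0.01):
    """One-feature logistic regression trained by batch gradient descent on
    the mean log-loss plus an L2 penalty on the weight (not the bias)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * (grad_w / n + l2 * w)
        b -= lr * grad_b / n
    return w, b


# Hypothetical data with a single boundary: negatives left, positives right.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]  # class-1 probabilities
labels = [1 if p >= 0.5 else 0 for p in probs]
print("probabilities:", [round(p, 3) for p in probs], "labels:", labels)

# Positives on both outer sides: no single threshold on x can separate them.
xs2, ys2 = [-2.0, -1.0, 1.0, 2.0], [1, 0, 0, 1]
w2, b2 = fit_logistic(xs2, ys2)
acc2 = sum((sigmoid(w2 * x + b2) >= 0.5) == bool(y) for x, y in zip(xs2, ys2)) / len(xs2)
print("accuracy on the two-boundary data:", acc2)
```

On the first data set the model outputs probabilities that rise smoothly from left to right. On the second, its prediction can only change once along x, so it cannot classify every point correctly, which is the weakness described above.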
Clustering is an unsupervised learning task that finds groups, or clusters, in a data set such that the observations within a single cluster are similar to one another.
- K-Means – The data set is partitioned into K groups based on its features. Each cluster is built around a centroid, and points are assigned so that the distance between each point and its cluster's centroid is as small as possible.
- Strengths – It is simple and easy to understand, especially for beginners, yet flexible enough to give reasonable results for many problems.
- Weakness – The user must specify the number of clusters K, a positive integer, in advance, which is not always easy. Also, because clusters are built around centroids, K-means tends to produce roughly spherical clusters and performs poorly when the true clusters have other shapes.
- Hierarchical Clustering – In its common agglomerative form, the process starts with each data point as its own small cluster. The two most similar clusters are then merged into a bigger one, and the process repeats until a single cluster is left. This produces a hierarchy of clusters.
- Strengths – Unlike in K-means clustering, the clusters need not be spherical in shape.
- Weakness – The user still needs to choose the level of the hierarchy (that is, the number of clusters) that suits the application.
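The sketch below implements both clustering algorithms in plain Python on hypothetical 2-D points: K-means via the standard assign-then-update loop (Lloyd's algorithm), and agglomerative clustering with single linkage, where the distance between two clusters is the distance between their closest members. The deterministic K-means initialisation and the choice of single linkage are simplifying assumptions; libraries typically use randomised or k-means++ initialisation and offer several linkage options. The `n_clusters` argument is where the user picks the level of the hierarchy.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    # Simplified deterministic initialisation (an assumption): spread the
    # k starting centroids evenly through the input list.
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            [sum(coords) / len(cl) for coords in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters


def agglomerative(points, n_clusters):
    """Bottom-up hierarchical clustering with single linkage: every point
    starts as its own cluster, then the two closest clusters are merged
    until n_clusters remain (n_clusters picks the level of the hierarchy)."""
    clusters = [[p] for p in points]

    def linkage(pair):
        i, j = pair  # single linkage: distance between the closest members
        return min(math.dist(p, q) for p in clusters[i] for q in clusters[j])

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=linkage)
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


# Hypothetical data: two compact, well-separated blobs.
blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(blobs, k=2)
print("K-means:", clusters)
print("agglomerative:", agglomerative(blobs, n_clusters=2))

# Hypothetical elongated (non-spherical) clusters: two parallel lines.
lines = [(x, 0) for x in range(10)] + [(x, 3) for x in range(10)]
groups = agglomerative(lines, n_clusters=2)
print("line heights per cluster:", [sorted({y for _, y in g}) for g in groups])
```

On the two parallel lines, single linkage recovers each line as its own cluster. By contrast, the K-means objective gives a lower within-cluster sum of squares to a left/right split that cuts both lines in half than to the line-by-line split, which illustrates the spherical-cluster bias noted above.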