Supervised Learning applies knowledge gained from past, labelled examples to new data: both the inputs and their correct outputs are available, and the algorithm learns a mapping from one to the other so it can predict outputs for future observations.
Unsupervised Learning occurs when an algorithm is given plain examples with no associated responses, and it must discover the structure of the input on its own.
Reinforcement Learning is a method in which an algorithm's actions and decisions produce results in its environment. Through trial and error, it learns the approach that works best.
According to Scikit-Learn, supervised learning covers many algorithm families, such as decision trees, support vector machines, generalised linear models, neural networks and many more. However, the list alone does not explain each method's strengths and weaknesses.
In machine learning, you may have heard of the No Free Lunch theorem. In layman's terms, it states that no single algorithm is best for both general and special cases: an algorithm tuned for general-purpose use can fail on special problems, and one that runs successfully on special cases cannot be relied on in general ones. Thus, you should test as many algorithms as you can and, at the end, select the one that gives the best performance on your task.
Here, instead of following the theoretical categories of learning, we take a task-based approach and group the different algorithms by the machine learning task they solve.
Regression

Regression is a supervised learning task that predicts a continuous response variable as its output. It originates in the field of statistics and is often used as a first example when studying machine learning.
As a subset of supervised learning, it uses labelled examples to drive its predictions. The aim is to find the model that best fits the overall given input data. Common regression algorithms include –
Linear Regression – One of the most common regression algorithms, it models the relation between a dependent variable and one or more independent variables using a best-fit straight line.
Strengths – The method is simple and easy to comprehend, and it can be regularised to avoid overfitting. A linear model can also be updated with new data easily using stochastic gradient descent.
Weakness – It performs poorly when there is no linear relation between the variables. The model is not flexible enough for more complex situations, and finding the right feature transformations to describe the system can be hard and tricky at times.
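As a minimal sketch of the idea (assuming scikit-learn is available; the toy data below is invented for illustration), fitting a best-fit line takes only a few calls:

```python
# Minimal sketch of linear regression with scikit-learn.
# Toy data, invented for illustration: y is roughly 2*x + 1 plus noise.
from sklearn.linear_model import LinearRegression

X = [[0], [1], [2], [3], [4]]      # inputs (one feature per observation)
y = [1.1, 2.9, 5.2, 7.0, 9.1]      # noisy continuous targets

model = LinearRegression().fit(X, y)

# The fitted line: slope close to 2, intercept close to 1.
print(model.coef_[0], model.intercept_)
print(model.predict([[5]]))        # extrapolates along the fitted line
```

The fitted `coef_` and `intercept_` are exactly the "best fit straight line" the text describes.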
Regression Trees – Also known as decision trees, these repeatedly partition the given data set into separate subsets so as to maximise the information gained at each split.
Strengths – Decision trees can learn non-linear relations. The method works well in practical cases and is robust and flexible.
Weakness – Unconstrained decision trees are prone to overfitting: because their depth is not bounded, the tree can keep on branching until each leaf has memorised a single training point.
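A hedged sketch of the same task with a regression tree (again assuming scikit-learn, with made-up data); capping `max_depth` is one common way to keep the tree from branching until it memorises the data:

```python
# Sketch of a regression tree on a non-linear target (y = x squared).
# max_depth bounds the tree so it cannot branch indefinitely.
from sklearn.tree import DecisionTreeRegressor

X = [[x] for x in range(10)]
y = [x * x for x in range(10)]     # non-linear relation a line cannot fit

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[5]]))         # → [25.] (piecewise-constant estimate)
```

A plain linear regression on the same data would miss badly at the ends of the range; the tree handles the non-linearity by partitioning the input.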
Classification

Classification is also a supervised learning task, but here the output predictions are discrete in nature: for example, whether an email is spam or not, or whether an image is a picture of a cat or a picture of a dog. Algorithms for classification include:
Logistic Regression – The classification counterpart of linear regression, it maps its output to a binary value (0 or 1) using a logistic function.
Strengths – The method is simple and fast, its outputs have a natural probabilistic interpretation, and the algorithm can be regularised to avoid overfitting.
Weakness – However, performance suffers when there are many non-linear relationships between the features and the target; the model is not flexible enough to capture more complex patterns.
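As a minimal sketch (assuming scikit-learn; the data is a made-up toy set where small feature values mean class 0 and large ones mean class 1):

```python
# Sketch: logistic regression for a binary (0/1) outcome.
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [10], [11], [12]]   # toy feature values
y = [0, 0, 0, 1, 1, 1]                  # binary labels

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2], [11]]))         # → [0 1]
print(clf.predict_proba([[11]])[0, 1])  # probability of class 1
```

`predict_proba` exposes the probabilistic interpretation mentioned above: the logistic function squeezes the linear score into a value between 0 and 1.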
Classification Trees – The classification counterpart of regression trees; both are commonly known as decision trees.
Strengths – Like regression trees, these work well in practical cases and are robust and flexible.
Weakness – The same as for regression trees: unconstrained trees are prone to overfitting.
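A short sketch (assuming scikit-learn, with an invented toy pattern) of why trees matter here: the positive class sits in the middle of the feature range, so no single straight-line threshold separates it, but a depth-2 tree does:

```python
# Sketch of a classification tree on a pattern no single threshold separates:
# positives occur only in the middle of the feature range.
from sklearn.tree import DecisionTreeClassifier

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 1, 1, 1, 1, 0, 0]            # positive only for 3 <= x <= 6

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[4], [8]]))          # → [1 0]
```

The tree learns two split points (around 2.5 and 6.5), carving out the middle band, which a logistic regression on the raw feature could not do.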
Clustering

Clustering is an unsupervised learning task that finds groups, or clusters, in the given data set such that the observations within a single cluster are similar to one another.
K-Means – Here, the data set is grouped on the basis of its features into K clusters. Each cluster forms around a centroid, chosen so that the distance from each object to its centroid is as small as possible.
Strengths – It is very simple and easy to understand, especially for beginners, and it is fast enough to scale to larger data sets.
Weakness – The user needs to specify the number K in advance, which must be a positive integer. Also, since each cluster is centred around a centroid, K-means implicitly assumes the clusters are roughly spherical in shape.
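A minimal sketch (assuming scikit-learn; the two "blobs" below are invented toy data) showing both the simplicity and the need to pick K yourself:

```python
# Sketch of K-means with K = 2 on two obvious toy blobs.
from sklearn.cluster import KMeans

X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # blob around (1, 1)
     [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]   # blob around (8, 8)

# K (= n_clusters) must be chosen by the user up front.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # first three points share one label, last three the other
print(km.cluster_centers_)  # the two centroids the clusters formed around
```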
Hierarchical Clustering – In its agglomerative form, the process starts with each point as its own small cluster. The most similar clusters are then merged to form bigger ones, and the process continues until a single cluster is left. Thus a hierarchy of clusters is formed.
Strengths – As compared to K-means clustering, the clusters here need not be spherical in shape.
Weakness – The user still needs to choose the level of the hierarchy at which to cut, i.e. how many clusters the application requires.
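A hedged sketch of the agglomerative variant (assuming scikit-learn, reusing the same kind of invented toy blobs); choosing `n_clusters` is exactly the "which level of the hierarchy" decision mentioned above:

```python
# Sketch of agglomerative (hierarchical) clustering: points are merged
# bottom-up, and the hierarchy is cut where it yields 2 clusters.
from sklearn.cluster import AgglomerativeClustering

X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # blob around (1, 1)
     [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]   # blob around (8, 8)

agg = AgglomerativeClustering(n_clusters=2).fit(X)
print(agg.labels_)   # the two toy blobs end up in separate clusters
```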
Support Vector Machines
Support vector machines (SVMs) use a mechanism called kernels, which essentially calculate the distance between two observations. The SVM algorithm then finds a decision boundary that maximizes the margin between the closest members of separate classes.
For example, an SVM with a linear kernel is similar to logistic regression. Therefore, in practice, the benefit of SVMs typically comes from using non-linear kernels to model non-linear decision boundaries.
Strengths: SVMs can model non-linear decision boundaries, and there are many kernels to choose from. They are also reasonably robust against overfitting, especially in high-dimensional space.
Weaknesses: However, SVMs are memory intensive, trickier to tune due to the importance of picking the right kernel, and don't scale well to larger datasets. Currently, in the industry, random forests are usually preferred over SVMs.
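As a hedged sketch (assuming scikit-learn; `C=10` is an illustrative choice, not a recommendation), a non-linear RBF kernel lets an SVM separate the classic XOR pattern, which no linear boundary can:

```python
# Sketch: an SVM with a non-linear (RBF) kernel on the XOR pattern,
# which is impossible to separate with a straight line.
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                        # XOR of the two features

svc = SVC(kernel="rbf", C=10).fit(X, y)
print(svc.predict(X))                   # → [0 1 1 0]
```

Swapping in `kernel="linear"` would fail on this data, which is the practical sense in which the benefit of SVMs comes from the non-linear kernels.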
Naive Bayes

Naive Bayes (NB) is a straightforward algorithm based on conditional probability and counting. Essentially, your model is a probability table that gets updated through your training data. To predict a new observation, you simply "look up" the class probabilities in your "probability table" based on its feature values.
It’s called “naive” because its core assumption of conditional independence (i.e., all input features are independent of one another) rarely holds true in the real world.
Strengths: Even though the conditional independence assumption rarely holds true, NB models actually perform surprisingly well in practice, especially for how simple they are. They are easy to implement and can scale with your dataset.
Weaknesses: Due to their sheer simplicity, NB models are often beaten by models adequately trained and tuned using the previous algorithms listed.
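A minimal sketch (assuming scikit-learn, with invented toy data) using the Gaussian variant: per-class statistics are estimated from the training data, then "looked up" to score a new observation:

```python
# Sketch of Gaussian Naive Bayes: per-class feature statistics are
# counted/estimated at fit time and looked up at prediction time.
from sklearn.naive_bayes import GaussianNB

X = [[1.0], [1.2], [0.9], [7.8], [8.0], [8.2]]   # toy feature values
y = [0, 0, 0, 1, 1, 1]                           # class labels

nb = GaussianNB().fit(X, y)
print(nb.predict([[1.1], [7.9]]))                # → [0 1]
```

Despite the naive independence assumption, this kind of model is hard to beat for its simplicity and trains in a single pass over the data.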