Pros and Cons of Unsupervised Learning

Machine learning is commonly divided into three types: “Supervised learning”, “Unsupervised learning” and “Reinforcement learning”. This post focuses on unsupervised learning.

After reading this article, you will be able to explain in general terms what unsupervised learning is, which methods fall under it, a few of its real-life use cases, and its main advantages and disadvantages. It is meant as a one-stop starting point for exploring this area of machine learning.

What Is Unsupervised Learning?

One of the striking differences between supervised and unsupervised learning is the presence of the target variable. In supervised learning, we know what we are predicting from the model. Thus, each record has a specific label to be predicted.

On the other hand, in unsupervised learning, we do not have a target variable. No record carries a label to guide the model.

To simplify the concept, let us take an example.

Scenario 1: Let’s say you work in a bank and your manager walked up to you this morning with an assignment. She told you that she has emailed you a dataset. The dataset contains information about loan takers, and each record has a label, yes (encoded as 1) or no (encoded as 0), which signifies whether that person defaulted. You need to use the dataset to create a model that predicts the probability of default for each loan taker.

Here, since the end objective of the model can be measured in the form of a label, it is a supervised learning task. Each record is, in effect, supervised by its label.

Scenario 2: Now let’s assume you are a very diligent employee and you finish the first task fairly quickly. Your manager walks up to you again with another assignment. This time she expects you to find patterns in the data that will help minimize potential fraud.

This time, there is no single end objective that you can measure using a label. No record carries a label, which makes this an unsupervised learning task.
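The difference between the two scenarios shows up in the shape of the data itself. The snippet below is only an illustrative sketch: the records, the field names (income, loan_amount, defaulted) and the values are all made up and do not come from any real dataset.

```python
# Hypothetical loan records; field names and values are invented for illustration.
labeled_loans = [
    {"income": 52000, "loan_amount": 12000, "defaulted": 0},
    {"income": 31000, "loan_amount": 25000, "defaulted": 1},
    {"income": 78000, "loan_amount": 8000, "defaulted": 0},
]

# Scenario 1 (supervised): features X plus a known target y for every record.
X_supervised = [(r["income"], r["loan_amount"]) for r in labeled_loans]
y_supervised = [r["defaulted"] for r in labeled_loans]

# Scenario 2 (unsupervised): the same kind of features, but no target column.
X_unsupervised = [(r["income"], r["loan_amount"]) for r in labeled_loans]

print(X_supervised, y_supervised)
print(X_unsupervised)
```

In the first case a model can be scored against y_supervised; in the second there is nothing to score against, so the model can only look for structure in the features.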

Before we go to the next section, I have a fun fact for you. Do you know who the natural unsupervised learners on this earth are (and no, I am not talking about machines)? It’s humans!

We humans have a remarkable ability to identify and distinguish objects. For example, say you show a child five different breeds each of cats and dogs, and then show him an entirely new breed of cat. There is a very high chance that he will correctly identify the new breed as a cat, based on the patterns he observed earlier. Amazing, right?

Real-life Applications Of Unsupervised Learning

Machines are not as quick as humans at this. Training a model to find patterns in data takes a lot of resources. Below are a few real-life applications of unsupervised learning.

Anomaly detection – As more activity moves online, the volume of data keeps growing, and so does the number of unusual records hidden in it, such as the potentially fraudulent cases from Scenario 2. Unsupervised learning has huge scope here, because it can flag records that deviate from the usual patterns without needing a label for each one.
Segmentation – Unsupervised learning can be used to segment customers based on patterns in their data. Customers within a cluster share common properties, while customers in different clusters differ. Customer segmentation is widely used in devising marketing plans.
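To make the anomaly detection idea a little more concrete, here is a minimal sketch using only the Python standard library. It is one simple technique among many, not a method prescribed by this article: it scores each value by its distance from the median, scaled by the median absolute deviation (a “modified z-score”). The function name flag_anomalies, the sample amounts, the 0.6745 scaling constant and the 3.5 cutoff are illustrative assumptions; the last two are conventional choices rather than fixed rules.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against; this sketch flags nothing
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical transaction amounts; no record is labeled "fraud" or "normal".
amounts = [20, 22, 19, 21, 23, 20, 500]
print(flag_anomalies(amounts))  # prints [500]
```

Note that nothing told the function which amount was suspicious. It is flagged only because it sits far from the bulk of the data, and a person still has to decide whether it is actually fraud.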

If you want to learn more about the real-world applications of unsupervised learning, check out this article.

Frequently Used Algorithms Under Unsupervised Learning

There is a wide range of algorithms that can be used for unsupervised learning. A few of them are:

K-means clustering – One of the most widely used clustering algorithms. It partitions the records into k groups by assigning each record to its nearest cluster center. It is commonly applied in customer segmentation, market profiling, RFM analysis, etc.
Principal component analysis – A dimensionality reduction technique. It projects the data onto a small number of new variables (principal components), each a linear combination of the original ones, that capture most of the variance. With fewer dimensions left, the data becomes much easier to visualize and model.
Hierarchical clustering – Another form of clustering, which builds a hierarchy of clusters either top-down (divisive, splitting one big cluster step by step) or bottom-up (agglomerative, merging individual records step by step).
Dendrogram – Strictly speaking a visualization rather than an algorithm. It is the tree diagram that records the order in which hierarchical clustering merges or splits clusters. Cutting the tree at a chosen height yields a particular set of clusters.
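As an illustration of the first algorithm in the list, here is a minimal k-means sketch in plain Python (it needs Python 3.8+ for math.dist). It is a teaching sketch under stated assumptions, not a production implementation: the kmeans function, the customer data and the choice to seed the centers with the first k points are all made up for this example, and real libraries use smarter initialization and stopping rules.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Assumption: seed the centroids with the first k points so the sketch
    # is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (annual spend in thousands, store visits per month)
customers = [(2, 1), (25, 9), (3, 2), (2, 2), (24, 10), (26, 8)]
labels, centroids = kmeans(customers, k=2)
print(labels)     # prints [0, 1, 0, 0, 1, 1]
print(centroids)
```

The algorithm separates low-spend, low-visit customers from high-spend, high-visit ones without ever being told that such groups exist. Naming the groups and deciding what to do with them is still left to the analyst.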

Pros Of Unsupervised Learning

A few of the advantages of unsupervised learning are:

It can surface patterns in large, high-dimensional data that the human mind cannot easily visualize.
It is used to dig out hidden patterns that are valuable to industry, and it has widespread real-world applications.
The outcome of an unsupervised task can point to an entirely new business vertical or venture.
Preparing the data is less complex than for a supervised learning task, because no one is required to create or interpret labels.
Unlabeled data is usually easier to obtain than labeled data.

Cons Of Unsupervised Learning

A few of the disadvantages of unsupervised learning are:

It can be costlier, as it may require human intervention to interpret the patterns and relate them to domain knowledge.
It is not always certain that the obtained results will be useful, since there is no label or output measure to confirm their usefulness.
One cannot define in advance how the data will be grouped or what the output will look like. This depends heavily on the chosen algorithm and its settings.
The results are often less accurate than those of supervised methods, since there are no labels to guide the model.

Conclusion

In this article, we have seen a gentle introduction to the concept of unsupervised learning. It first explained what unsupervised learning is and how it differs from supervised learning. It then presented various real-life applications and the algorithms that can be used for unsupervised learning.

Finally, it presented a few of the prominent pros and cons of unsupervised learning. Overall, it has shown you only the tip of the iceberg. Head to Kaggle.com and you will find ample projects to get you started with unsupervised learning.