When to use semi supervised learning

Are you wondering what semi supervised learning is? Or maybe you want to learn more about when you should and should not use semi supervised learning? Then you are in the right place! In this article, we discuss everything you need to know to understand when you should use semi supervised learning.

We will start by explaining what semi supervised learning is and giving some examples of common semi supervised learning techniques. Next, we will discuss what data is needed in order to implement semi supervised learning. After that, we will cover some of the main advantages and disadvantages of semi supervised learning. This provides context for the final sections of this article, where we discuss when you should and should not use semi supervised learning.

What is semi supervised learning?

What is semi supervised learning? Semi supervised learning is a model training paradigm that can be used when you have a small set of labeled data and a much larger set of unlabeled data. It is generally used in situations where the labeled dataset is too small to train a good model on its own. The idea behind semi supervised learning is that the labeled data and the unlabeled data are used together to create a model that is better than what could be created with just the labeled data.

It is also important to mention that semi supervised learning is a subset of weakly supervised learning. Weakly supervised learning is a large umbrella that covers a variety of techniques that can be applied when there are defects in your labeled data. Semi supervised techniques are applied when the defect in your labeled data is that only some of the data is labeled. Semi supervised learning and active learning are the main types of weakly supervised learning techniques that can be applied in this situation.

Common strategies employed in semi supervised learning

What are some examples of common strategies that are employed in semi supervised learning? In this section, we will describe some families of techniques that are commonly employed in semi supervised learning. Note that this is not a comprehensive list of all semi supervised learning techniques that exist, but rather a preview of the most common types of techniques.

  • Cluster-then-label. Many techniques that are applied in semi supervised learning take the general form of clustering all of the data with an unsupervised model, then using the labeled data to assign labels to the points in each cluster. Different techniques in this family use different types of clustering algorithms. Some use simple distance-based clustering models, whereas others use more advanced graph-based clustering models.
  • Pseudo-labeling. Another family of techniques, often referred to as pseudo-labeling techniques, involves using the labeled data to train an initial supervised model. Once that model has been trained, it is used to predict labels for the larger portion of the dataset that does not have any labels. The predicted labels, often only the ones the model is most confident about, are then treated as if they were real labels and used alongside the original labeled data to train the final model.
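
To make the pseudo-labeling idea more concrete, here is a minimal sketch that uses only the Python standard library. Everything in it is an assumption made for illustration: the nearest-centroid "model", the softmax-style confidence score, the 0.9 threshold, the function names, and the toy data points. It is not a production implementation, just one way the train, predict, and retrain loop could look.

```python
import math

def fit_centroids(points, labels):
    """Toy supervised model: one mean point (centroid) per class."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }

def predict(model, point):
    """Return (label, confidence) using a softmax over negative distances."""
    weights = {label: math.exp(-math.dist(point, c)) for label, c in model.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

def pseudo_label(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_rounds=5):
    """Repeatedly train, label confident unlabeled points, and retrain."""
    train_x, train_y, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(max_rounds):
        model = fit_centroids(train_x, train_y)
        confident, remaining = [], []
        for point in pool:
            label, confidence = predict(model, point)
            (confident if confidence >= threshold else remaining).append((point, label))
        if not confident:
            break  # no prediction cleared the threshold, so stop early
        for point, label in confident:
            train_x.append(point)  # pseudo-labeled points join the training set
            train_y.append(label)
        pool = [point for point, _ in remaining]
    return fit_centroids(train_x, train_y), train_y, pool

# Made-up toy data: two labeled points and five unlabeled points.
labeled_x = [(0.0, 0.0), (5.0, 5.0)]
labeled_y = ["a", "b"]
unlabeled_x = [(0.5, 0.2), (0.1, 0.9), (4.8, 5.3), (5.5, 4.6), (2.5, 2.5)]

model, train_y, still_unlabeled = pseudo_label(labeled_x, labeled_y, unlabeled_x)
print(len(train_y), "training examples;", len(still_unlabeled), "left unlabeled")
```

The point that sits halfway between the two classes never clears the confidence threshold, so it is left unlabeled rather than being given a label the model is unsure about. Lowering the threshold would label more points at the risk of training on more mistakes.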

What data is needed for semi supervised learning?

What data is needed for semi supervised learning? In order to apply semi supervised learning techniques, you need two different types of data. First, you need a large set of unlabeled data, meaning examples without class labels, that can be used in the model you want to build. Second, you need a smaller set of labeled data for the same task.
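
As a rough illustration of what this looks like in practice, the sketch below stores both kinds of data in one list and marks missing labels with None. The records, the class names, and the helper functions are all made up for this example. The encode_targets helper uses -1 for unlabeled rows, which mirrors the convention used by scikit-learn's semi supervised estimators, but treat that as an assumption to check against whatever library you use.

```python
# Toy records (made up for illustration): (features, label).
# The label is None when no label is available for that example.
records = [
    ((0.10, 0.20), "ham"),
    ((0.90, 0.80), "spam"),
    ((0.20, 0.10), None),
    ((0.80, 0.90), None),
    ((0.15, 0.25), None),
    ((0.85, 0.70), None),
    ((0.30, 0.20), None),
    ((0.70, 0.95), None),
]

def split_by_label(records):
    """Separate the small labeled set from the larger unlabeled pool."""
    labeled = [(x, y) for x, y in records if y is not None]
    unlabeled = [x for x, y in records if y is None]
    return labeled, unlabeled

def encode_targets(records, unlabeled_marker=-1):
    """Map class names to integers and mark unlabeled rows with a sentinel."""
    classes = sorted({y for _, y in records if y is not None})
    index = {name: i for i, name in enumerate(classes)}
    targets = [index[y] if y is not None else unlabeled_marker for _, y in records]
    return targets, classes

labeled, unlabeled = split_by_label(records)
targets, classes = encode_targets(records)
labeled_fraction = len(labeled) / len(records)
print(f"{len(labeled)} labeled, {len(unlabeled)} unlabeled ({labeled_fraction:.0%} labeled)")
```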

Advantages and disadvantages of semi supervised learning

What are some of the main advantages and disadvantages of semi supervised learning techniques? In this section, we will describe some of the most common advantages and disadvantages of semi supervised learning.

Advantages of semi supervised learning

Here are some of the main advantages of semi supervised learning.

  • Does not require a large labeled dataset. The main advantage of semi supervised learning is that it can be used even in cases where you do not have large amounts of labeled data. All you need is a relatively small amount of labeled data that can be used to guide your models.
  • Does not require iterative human feedback. Another advantage of semi supervised learning is that it does not require human feedback to be provided at multiple steps in the modeling process. This stands in contrast to other model training regimes that can be applied when only a small portion of your data is labeled, such as active learning.
  • More cost effective. If obtaining high quality labels for your data is time and resource intensive, then using techniques that need labels for only a small portion of your data will generally be more cost effective than labeling everything.

Disadvantages of semi supervised learning

Here are some of the main disadvantages of semi supervised learning.

  • Requires some labeled data. One of the main disadvantages of semi supervised learning is that you still need a reasonable amount of labeled data. This can be a problem if it is hard to obtain even a small amount of labeled data in your situation. In that case, you may be better off using a training paradigm where no labeled data is required.
  • May be computationally intensive to train. Since semi supervised learning often involves developing a strategy to use your labeled training data to apply labels to the unlabeled training data, training semi supervised models is a multi-step activity that may be computationally intensive. Depending on how you are implementing this, the initial step that applies labels to the unlabeled training data may require more computational resources than training the final model itself.
  • Accuracy may be lower than what is achievable with other paradigms. Another disadvantage of semi supervised learning is that the accuracy of your final model may be lower than what is achievable with other paradigms like supervised learning (assuming that there is a way to obtain lots of labeled data).
  • Many techniques have strict assumptions. Another disadvantage of semi supervised learning techniques is that many of them have strict assumptions that must be met in order for the techniques to work appropriately. This is especially true for techniques that rely on clustering data as a first step. For example, cluster-then-label techniques assume that data points that fall in the same cluster tend to share the same label.
  • Higher complexity. Semi supervised learning is more complex and less common than standard supervised learning. That means it will be more difficult to get teammates up to speed and find people who can give you meaningful feedback on your work.
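
The point about strict assumptions is easier to see with a small experiment. The sketch below is a toy cluster-then-label implementation using only the Python standard library. The k-means routine, its farthest-first initialisation, the function names, and both datasets are assumptions made up for illustration. The first dataset has clusters that line up with the classes, and the second has clusters that do not.

```python
import math
from collections import Counter

def kmeans_assign(points, k, iterations=20):
    """Tiny k-means; farthest-first initialisation keeps it deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = []
    for _ in range(iterations):
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centers[j]
            )
        if updated == centers:
            break  # centers stopped moving
        centers = updated
    return assign

def cluster_then_label(labeled_x, labeled_y, unlabeled_x, k):
    """Cluster all points, then give each cluster its majority known label."""
    assign = kmeans_assign(list(labeled_x) + list(unlabeled_x), k)
    votes = {}
    for cluster, label in zip(assign, labeled_y):  # labeled points come first
        votes.setdefault(cluster, Counter())[label] += 1
    # Clusters without any labeled member are left as None.
    return [votes[c].most_common(1)[0][0] if c in votes else None
            for c in assign[len(labeled_x):]]

# Case 1: the two clusters coincide with the two classes.
good = cluster_then_label(
    [(0.0, 0.0), (10.0, 10.0)], ["a", "b"],
    [(0.5, 0.3), (0.2, 0.8), (9.6, 10.2), (10.4, 9.7)], k=2,
)

# Case 2: clusters form along x, but the true class depends on the sign of y.
truth = ["neg", "pos"]
bad = cluster_then_label(
    [(0.0, 1.0), (10.0, -1.0)], ["pos", "neg"],
    [(0.0, -1.0), (10.0, 1.0)], k=2,
)
print("case 1:", good)
print("case 2:", bad, "but the true labels are", truth)
```

In the second case every unlabeled point inherits the wrong label, because the clusters the algorithm finds have nothing to do with the classes. This is the kind of failure to watch for before relying on a clustering-based technique.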

When to use semi supervised learning

In what situations does it make sense to use semi supervised learning? In this section, we will provide some examples of situations where it makes sense to use semi supervised learning.

  • When getting a small amount of labeled data is easy, but scaling up labeling efforts is difficult. In general, semi supervised learning makes sense when you can easily label a small sample of your data but cannot realistically label all of it. Semi supervised learning allows you to exploit all of the data that is available to you without requiring that all of the data be labeled.
  • When getting iterative feedback from humans is difficult. Semi supervised learning is a particularly good fit when it is not easy to give the model periodic feedback and provide additional labeled examples as you iterate on it. If it is easy to give iterative feedback to your model, then it may make sense to employ techniques like active learning, where a human helps to guide the model in the right direction.

When not to use semi supervised learning

When does it not make sense to employ semi supervised learning? Here are some examples of situations where it does not make sense to employ semi supervised learning.

  • When getting any amount of labeled data is difficult. Semi supervised learning requires a small set of labeled data with gold standard labels. If you are in a situation where it is difficult to get any amount of labeled data, then you may be better off opting for methods that do not require any labeled data, like self supervised learning.
  • When there is not a lot of unlabeled data available. Semi supervised learning techniques work best in situations where there is a lot of unlabeled data available. They may not work as well on small datasets. If you are in a situation where there is not a large amount of unlabeled data available, it often makes sense to train a small, less complex supervised model on the labeled data instead.
  • When labeled data is easy to obtain. Sometimes there are already labels that are available to be used for your data. Other times labels are not directly available, but it is easy to obtain labels. If you are in a situation where it is easy to obtain labels for your data, then it generally makes sense to use a standard supervised learning model. A supervised model trained on fully labeled data will generally perform better than a semi supervised model trained on the same amount of data with only some of it labeled.
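
The guidance in the last two sections can be condensed into a small helper. This is only a rough summary of the rules of thumb in this article, not a formal decision procedure. The function name, the parameter names, and the order of the checks are assumptions made for this sketch, and real projects will involve trade-offs that four yes or no questions cannot capture.

```python
def suggest_paradigm(some_labels_obtainable, labeling_scales_easily,
                     plenty_of_unlabeled_data, iterative_feedback_easy):
    """Return a suggested training paradigm based on the article's heuristics."""
    if not some_labels_obtainable:
        # Even a small labeled set is out of reach.
        return "label-free methods such as self supervised learning"
    if labeling_scales_easily:
        # Labels are easy to get for all of the data.
        return "standard supervised learning"
    if not plenty_of_unlabeled_data:
        # Too little unlabeled data for semi supervised learning to help.
        return "a small supervised model on the labeled data"
    if iterative_feedback_easy:
        # A human can keep supplying labels while the model is being built.
        return "active learning"
    return "semi supervised learning"

# Example: a few labels are easy, labeling everything is not, there is plenty
# of unlabeled data, and nobody is available to give ongoing feedback.
print(suggest_paradigm(
    some_labels_obtainable=True,
    labeling_scales_easily=False,
    plenty_of_unlabeled_data=True,
    iterative_feedback_easy=False,
))
```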
