When to use weakly supervised learning


Are you wondering when you should use weakly supervised learning? Or maybe you want to learn more about the difference between weakly supervised learning and other paradigms like supervised learning? Then you are in the right place! In this article, we will discuss everything you need to know to understand when to use weakly supervised learning.

We will start out by discussing what weakly supervised learning is and what types of weakly supervised learning techniques exist. After this, we will discuss what type of data is required in order to employ weakly supervised learning techniques. Next, we will discuss some of the main advantages and disadvantages of weakly supervised learning. This will provide useful context for the later discussion of when weakly supervised learning should and should not be used.

Note. While weakly supervised learning is a large umbrella term that encompasses many model training paradigms, we will primarily focus on weakly supervised learning techniques for data with inaccurate labels (keep reading to learn what this means).

What is weakly supervised learning?

What is weakly supervised learning? Weakly supervised learning is a large umbrella term that is used to refer to a few different families of model training paradigms. In general, weakly supervised learning is used when you want to train a supervised machine learning model but the data that you have to train your model contains insufficient labels. There are a few different factors that can make the labels you have insufficient, ranging from not enough of the data having labels to the labels themselves being noisy and inaccurate.

Types of weakly supervised learning

In this section, we will provide information on a few different families of model training paradigms that fall under the umbrella of weakly supervised learning. Note that while all of these families of techniques are technically considered weakly supervised learning, we will focus more narrowly on one of these families (inaccurate labels) for the rest of the article.

  • Incomplete labels. Weakly supervised learning techniques for incomplete labels are designed for situations where only a small portion of your data is labeled and the rest of your data has no labels at all. This is a common family of techniques, but we will not focus on it much in this article because we have separate articles on its two main sub-families: semi-supervised learning and active learning.
  • Inexact labels. Weakly supervised learning techniques for inexact labels are designed for cases where some labels are available, but they are coarse labels that are not granular enough to be used directly in your model because you are trying to predict a more granular class. We will not focus on these in this article because they are the least commonly used in practice.
  • Inaccurate labels. Weakly supervised techniques for inaccurate labels are designed to be used when you have one or more sets of labels that can be applied to your data, but these labels are noisy and only some of them are correct. This is the family of techniques that we will focus on for the rest of the article. We will focus on this subset of techniques because it is common for people to use the term weakly supervised learning to refer specifically to this family of techniques.
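To make these three label deficiencies concrete, here is a toy sketch using hypothetical sentiment-classification records (all of the text snippets and labels below are illustrative assumptions, not from a real dataset):

```python
# Incomplete labels: only some records are labeled (None = unlabeled).
incomplete = [
    ("great product", "positive"),
    ("terrible support", None),
    ("works fine", None),
]

# Inexact labels: labels exist but are coarser than the classes we want
# to predict (here, only "positive" when we want fine-grained sentiment).
inexact = [
    ("absolutely love it", "positive"),   # true fine-grained class: very positive
    ("it's okay I guess", "positive"),    # true fine-grained class: mildly positive
]

# Inaccurate labels: every record is labeled, but some labels are wrong.
inaccurate = [
    ("great product", "positive"),     # correct label
    ("terrible support", "positive"),  # noisy, incorrect label
]
```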

What data is needed for weakly supervised learning?

What data is needed for weakly supervised learning? In this section, we will describe what type of data is needed in order to apply weakly supervised learning. In particular, we will focus on weakly supervised learning techniques that are used in situations where you have inaccurate labels.

In order to apply weakly supervised learning techniques for inaccurately labeled data, you need a dataset where every record has at least one set of labels. We say at least one set of labels because some techniques that are used in these situations assume that there are multiple different sources of labels that can be applied to each record in the dataset. In these situations, you will have two or more labels for each record, which may agree or disagree with one another. While all of the data does need to have some type of label applied to it, these labels do not have to be highly accurate. It is acceptable for them to be somewhat noisy.
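When multiple label sources can agree or disagree on a record, a common first step is to resolve them with a simple majority vote. Here is a minimal sketch of that idea; the record values and label names ("spam", "ham") are hypothetical:

```python
from collections import Counter

def majority_vote(label_votes, abstain=None):
    """Resolve multiple noisy label sources for one record by majority vote.
    Sources may abstain by emitting `abstain`; ties also resolve to abstain."""
    votes = [v for v in label_votes if v != abstain]
    if not votes:
        return abstain
    counts = Counter(votes).most_common(2)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return abstain  # tie between the top two labels
    return counts[0][0]

# Three hypothetical label sources voting on each of three records.
records = [
    ["spam", "spam", None],  # two sources agree -> "spam"
    ["spam", "ham", None],   # disagreement, tie -> abstain
    [None, None, None],      # no source fired -> abstain
]
resolved = [majority_vote(r) for r in records]
# resolved == ["spam", None, None]
```

More sophisticated aggregation models weight each source by its estimated accuracy instead of treating all sources equally, but the input data has the same shape: one row per record, one vote per label source.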

Advantages and disadvantages of weakly supervised learning

What are some of the main advantages and disadvantages of weakly supervised learning? In this section, we will discuss some of the main advantages and disadvantages of weakly supervised learning techniques. As a reminder, we are going to focus on weakly supervised learning techniques that are used in situations where you have inaccurate labels.

Advantages of weakly supervised learning

Here are some of the main advantages of weakly supervised learning.

  • Does not require high quality labels. One of the main advantages of weakly supervised learning techniques is that you do not need perfectly accurate, high quality labels. Weakly supervised learning methods can be applied in situations where all you have available is noisy labels that are sometimes inaccurate.
  • More cost effective. If you are in a situation where obtaining high quality labels for your data is time and resource intensive, then using techniques that can accommodate less than perfect labeled data will generally be more cost effective.

Disadvantages of weakly supervised learning

Here are some of the main disadvantages of weakly supervised learning.

  • Some type of labels is required. While it is true that you do not need high quality labels for your data in order to employ weakly supervised learning techniques, you do need to have some type of labels available. These labels can be pulled from an existing source, or added after the fact as part of your training pipelines. Regardless, the labels do need to be added at some point.
  • Producing labels can be computationally intensive. If you do not have noisy labels readily available and you need to apply labels to your data before training your model, then your training pipelines will likely be more computationally intensive than they would be for simpler techniques like supervised learning, because there are additional steps you need to take before you can train your model. If you are using a technique that requires you to generate multiple labels per record, then the computational requirements will only increase.
  • Accuracy may be lower than if other paradigms were used. When you are training models using weakly supervised methods, you will often see that the accuracy of your final model is lower than it would otherwise have been if you were able to obtain high quality labels and train a model in a fully supervised fashion.
  • Higher complexity. Weakly supervised learning is more complex and less commonly understood than supervised learning techniques. That means it will be more difficult to get teammates up to speed on your projects and find people who can give you meaningful feedback on your projects.

When to use weakly supervised learning

When should you employ weakly supervised learning techniques on your dataset? Here are some examples of situations where it makes sense to employ weakly supervised learning techniques. As a reminder, we are going to focus on weakly supervised learning techniques for inaccurate labels.

  • When you have data with noisy labels that are sometimes inaccurate. In general, any time you have data with noisy labels that are correct some of the time but not all of the time, you should think about using weakly supervised learning techniques. This is the primary family of techniques that is designed to handle these situations.
  • When it is easy to generate noisy labels for unlabeled data. Even if your data does not come to you with noisy labels attached, that does not mean that weakly supervised learning cannot be employed. If you can think of an easy way (or a few easy ways) to generate some noisy labels for an unlabeled dataset, then you can apply weakly supervised learning techniques on top of those labels.
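The second situation above is often handled by writing simple heuristics (sometimes called labeling functions) that each emit a noisy label or abstain. Here is a minimal sketch using hypothetical keyword rules for routing support tickets; the rules, categories, and ticket text are all illustrative assumptions:

```python
import re

# Hypothetical keyword heuristics for labeling support tickets as
# "billing" or "technical". Each returns None when it abstains.
def lf_billing(text):
    return "billing" if re.search(r"invoice|refund|charge", text, re.I) else None

def lf_technical(text):
    return "technical" if re.search(r"crash|error|bug", text, re.I) else None

labeling_functions = [lf_billing, lf_technical]

tickets = [
    "I was charged twice, please refund me",
    "The app crashes with an error on startup",
    "How do I change my username?",
]

# Each heuristic votes on each ticket, producing a noisy label matrix.
noisy_labels = [[lf(t) for lf in labeling_functions] for t in tickets]
# noisy_labels == [["billing", None], [None, "technical"], [None, None]]
```

Each individual heuristic is noisy and incomplete on its own, but the combined votes provide exactly the kind of inaccurate labels that weakly supervised learning techniques are designed to learn from.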

When not to use weakly supervised learning

When does it not make sense to use weakly supervised learning? In this section, we will provide examples of situations where it does not make much sense to use weakly supervised learning. As a reminder, we will focus on weakly supervised learning techniques for inaccurate labels.

  • When high quality labels are easy to obtain. If you are in a situation where it would be reasonably easy to obtain higher quality labels for your data, then it generally makes sense to use a standard supervised learning model. Supervised models will generally have better performance than weakly supervised models that were trained on the same amount of data.
  • When there are no labels available at all. If you are in a situation where there are no labels available for your data at all and it is not possible to generate labels using something like a heuristic, then you cannot use weakly supervised learning techniques. If this is the case, then you may be better off using techniques like self-supervised learning that do not require any labels.
