Mislabeled Images/Objects

What Are Mislabeled Images/Objects?

Mislabeled Images refer to Images that have been assigned incorrect or inaccurate labels or annotations. When preparing a dataset for computer vision tasks, such as Image classification or Object detection, human annotators or labeling algorithms are often used to assign labels to the Images. However, mistakes can occur during this labeling process, resulting in mislabeled Images.

There are several reasons mislabeling may occur:

  1. Human error: Human annotators may make mistakes while manually labeling Images. They might misinterpret the content of the Image, confuse similar Objects or classes, or accidentally assign the wrong label due to oversight or lack of expertise.

  2. Ambiguous Images: Some Images can be inherently ambiguous, making it difficult for annotators to assign a definitive label. In such cases, different annotators may have different interpretations. This can lead to inconsistent or incorrect labels.

  3. Algorithmic error: In some cases, automated algorithms or machine learning models are used to label Images. If the algorithm has limitations, or was trained on insufficient or biased data, it can produce inaccurate labels, resulting in mislabeled Images.

  4. Evolving knowledge: In certain domains, the understanding of objects or classes may change over time. If the labeling process is not updated to reflect the latest knowledge, previously labeled Images may become mislabeled.

Why Is This a Pain?

Mislabeled Images can be a pain for several reasons:

  1. Training data quality: Mislabeled Images can significantly affect the quality of the training data used in machine learning models. During the training phase, models learn from labeled examples to make predictions on unseen data. If the labels are incorrect or inaccurate, the model will learn from flawed information, leading to poor performance and unreliable results.

  2. Bias and skewed insights: Mislabeling can introduce bias into the training data, leading to biased models. For example, if Images of certain demographics or objects are consistently mislabeled, it can impact the model's ability to recognize or classify them accurately. This can perpetuate biases and unfairness in automated systems, such as facial recognition or object detection.

  3. Resource wastage: Training machine learning models is a resource-intensive process that requires a significant amount of computational power, time, and effort. Mislabeled Images waste these valuable resources. Models trained on mislabeled data might produce subpar results and necessitate retraining or additional data collection efforts, which can be both costly and time-consuming.

  4. Impact on downstream applications: Mislabeled Images can have a cascading effect on downstream applications, which rely on accurate data. For instance, in medical imaging, mislabeling Images could lead to incorrect diagnoses or treatment plans. In autonomous vehicles, mislabeled Images can affect object recognition, potentially compromising safety on the road.

  5. User experience and trust: Mislabeled Images in applications or services that directly interact with users can lead to a poor user experience. For example, if an Image recognition system consistently mislabels user-uploaded Images, it can erode trust and confidence in the system's capabilities, leading to user frustration and dissatisfaction.

Mislabeled Images vs Objects

Mislabeled Images: These are Images likely labeled with the wrong class (e.g., "Found 3 Images likely mislabeled as husky, possible corrections - Wolf, Coyote, Alaskan").

Mislabeled Images can also occur when an Image contains multiple Objects but carries a single label. For instance, an Image showing a cat sitting on a table could be annotated as either "cat" or "table" if both are valid Dataset classes.

Mislabeled Objects: These are Objects likely labeled with the wrong class (e.g., "Found 3 Objects likely mislabeled as husky, possible corrections - Wolf, Coyote, Alaskan").

Unlike Image mislabels, Object mislabels can also arise from incorrectly positioned or sized bounding boxes, which end up capturing partial, occluded, or multiple Objects.
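
To make the bounding-box failure mode concrete, the sketch below flags annotations whose overlap with a trusted reference box (for example, a model prediction or a second annotator's box) falls below an IoU threshold. The data, threshold value, and variable names are illustrative, not part of any particular tool:

```python
# Hypothetical example data: each entry pairs a human-annotated box with a
# trusted reference box (e.g., a model prediction or a second annotation),
# both in (x_min, y_min, x_max, y_max) pixel format.
annotated_boxes = [(10, 10, 100, 100), (50, 40, 90, 80)]
reference_boxes = [(12, 11, 98, 102), (120, 130, 180, 200)]

def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

SUSPECT_IOU = 0.5  # below this, the annotation likely misses the Object
for ann, ref in zip(annotated_boxes, reference_boxes):
    score = iou(ann, ref)
    if score < SUSPECT_IOU:
        print(f"Suspicious box {ann}: IoU {score:.2f} against reference {ref}")
```

A low IoU does not prove a mislabel on its own, but it is a cheap way to rank Object annotations for human review.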

Possible Mitigation

To mitigate these issues, it is crucial to have robust quality assurance processes in place when labeling Images for training data. Regularly validating and verifying labels, employing multiple annotators for cross-checking, and leveraging expert knowledge can help minimize mislabeling errors and improve the overall quality of the Dataset.
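
As an illustration of the cross-checking idea, the following minimal sketch collects labels from several annotators per Image and flags any Image where they disagree; the file names and label sets are hypothetical:

```python
from collections import Counter

# Hypothetical annotations: each Image is labeled by several independent annotators.
annotations = {
    "img_001.jpg": ["husky", "husky", "wolf"],
    "img_002.jpg": ["cat", "cat", "cat"],
}

for image_id, labels in annotations.items():
    winner, votes = Counter(labels).most_common(1)[0]
    agreement = votes / len(labels)
    if agreement < 1.0:  # any disagreement sends the Image for review
        print(f"{image_id}: majority label '{winner}' at {agreement:.0%} agreement - flag for review")
```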

Another possible mitigation is to send Image(s) or Cluster(s) for re-labeling using Visual Layer.

Visual Layer offers multiple ways to find mislabeled data:

  1. Find mismatches in labels using the label filter
  2. Find mislabeled data using Visual Layer's native auto-detection
  3. Use Visual Layer for data selection
  4. Use Visual Layer to export data for relabeling
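
Independent of Visual Layer's own interface, the first approach, filtering for label mismatches, can be illustrated generically: compare each Image's assigned label against a model's prediction and keep confident disagreements as relabeling candidates. A minimal sketch with hypothetical data:

```python
# Hypothetical records: (image id, assigned label, model-predicted label, prediction confidence).
records = [
    ("img_101.jpg", "husky", "wolf", 0.91),
    ("img_102.jpg", "husky", "husky", 0.97),
    ("img_103.jpg", "husky", "coyote", 0.88),
]

CONFIDENCE_FLOOR = 0.85  # only trust confident disagreements
candidates = [
    (image_id, given, predicted)
    for image_id, given, predicted, confidence in records
    if given != predicted and confidence >= CONFIDENCE_FLOOR
]
for image_id, given, predicted in candidates:
    print(f"{image_id}: labeled '{given}', predicted '{predicted}' - candidate for relabeling")
```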