Outlier images are visuals that differ significantly from the majority of your dataset. These anomalies may affect model performance, skew results, or reveal labeling issues.

Outliers typically fall into one of the following categories:

| Category | Description |
| --- | --- |
| Domain or Content Outliers | Images that feel “out of place” compared to the rest of your data: from a different source, domain, or modality (e.g., a drawing in a photo dataset); captured in unexpected conditions (e.g., extreme lighting, occlusions); or showing objects in rare or ambiguous contexts. |
| Quality Outliers | Images that are technically flawed or inconsistent with the rest of the set: blurry, overexposed, or too dark; containing heavy compression artifacts or noise; or corrupted or visually incomplete. |
| Class (Label) Outliers | Images that don’t semantically match any class in your label vocabulary. True outliers: e.g., an image of a sheep in a dog/cat dataset, which doesn’t belong to any existing class. Out-of-distribution examples: e.g., a sketch of a dog in a dataset of real dog photos. These are different from mislabels, where an image is mislabeled but still belongs to a known class. |
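
To make the quality category more concrete, here is a minimal sketch of a heuristic check for dark, overexposed, or blurry images. It is not how Visual Layer detects outliers. The function name `quality_flags`, the nested-list image representation, and all three thresholds are assumptions chosen for illustration.

```python
from statistics import mean, pvariance


def quality_flags(gray, dark_thresh=40.0, bright_thresh=215.0, blur_thresh=50.0):
    """Flag a grayscale image (list of rows, values 0..255) for basic quality issues.

    The thresholds are illustrative assumptions, not calibrated values.
    """
    height, width = len(gray), len(gray[0])
    brightness = mean(v for row in gray for v in row)

    # 4-neighbour Laplacian response on interior pixels.
    # A low variance of this response suggests few sharp edges (possible blur).
    laplacian = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    sharpness = pvariance(laplacian) if laplacian else 0.0

    return {
        "too_dark": brightness < dark_thresh,
        "overexposed": brightness > bright_thresh,
        "blurry": sharpness < blur_thresh,
    }
```

In practice, thresholds like these would need tuning per dataset, since what counts as "too dark" or "blurry" depends on what the rest of the images look like.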

Common Causes of Outliers

| Cause | Description |
| --- | --- |
| Data collection errors | Samples from unrelated categories or domains may be incorrectly included. |
| Artifacts and anomalies | Distortions like blur, noise, or overexposure can make an image an outlier. |
| Rare instances | Rare objects, edge-case events, or unconventional perspectives may introduce visual outliers. |

Why It Matters

| Problem | Impact |
| --- | --- |
| Reduced data quality | Outliers introduce noise and reduce consistency across your dataset. |
| Weaker model performance | Models trained on unfiltered outliers may generalize poorly or become unstable in production. |
| Hidden skew | Outliers may distort validation results or inflate perceived class diversity. |

How to Detect Outliers in Visual Layer

Visual Layer provides a one-click method for detecting and correcting outliers using automated issue detection.

  • Detect Outliers:
    Go to “Add Filter” → “Outliers” → select “IS” as the logic operator → set the desired confidence threshold (default is 0.5).
    Export the results using “Matching the applied filter.”

  • Correct Outliers:
    Go to “Add Filter” → “Outliers” → select “IS NOT” as the logic operator → set the desired confidence threshold (default is 0.5). This keeps only the images that are not flagged as outliers.
    Export the results using “Matching the applied filter.”
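
The effect of the two filters can be sketched outside the UI as a simple threshold split. This is an illustration only, not Visual Layer's API or export format: the function name, the `image` and `outlier_confidence` fields, and the sample records are hypothetical, and treating a score equal to the threshold as an outlier is an assumption.

```python
def split_by_outlier_confidence(records, threshold=0.5):
    """Split records into (outliers, inliers) by a confidence threshold.

    Each record is a dict with a hypothetical "outlier_confidence" score in [0, 1].
    Scores at or above the threshold are treated as outliers (an assumption).
    """
    outliers = [r for r in records if r["outlier_confidence"] >= threshold]
    inliers = [r for r in records if r["outlier_confidence"] < threshold]
    return outliers, inliers


# Made-up records for illustration only.
records = [
    {"image": "a.jpg", "outlier_confidence": 0.91},
    {"image": "b.jpg", "outlier_confidence": 0.12},
    {"image": "c.jpg", "outlier_confidence": 0.50},
]

outliers, inliers = split_by_outlier_confidence(records)
print([r["image"] for r in outliers])  # ['a.jpg', 'c.jpg']
print([r["image"] for r in inliers])   # ['b.jpg']
```

Here the first list plays the role of the “IS” filter (images to review) and the second the “IS NOT” filter (the cleaned set). Raising the threshold flags fewer images; lowering it flags more.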

Managing outliers is an essential step in building reliable, balanced models.