Learn what outliers are, why they affect dataset quality, and how to detect and manage them using Visual Layer.
Outliers generally fall into three categories; a minimal detection sketch follows the table.

| Category | Description |
| --- | --- |
| Domain or Content Outliers | Images that feel “out of place” compared to the rest of your data: they come from a different source, domain, or modality (e.g., a drawing in a photo dataset), were captured in unexpected conditions (e.g., extreme lighting, occlusions), or show objects in rare or ambiguous contexts. |
| Quality Outliers | Images that are technically flawed or inconsistent with the rest of the set: blurry, overexposed, or too dark; containing heavy compression artifacts or noise; or corrupted and visually incomplete. |
| Class (Label) Outliers | Images that don’t semantically match any class in your label vocabulary. True outliers (e.g., an image of a sheep in a dog/cat dataset) don’t belong to any existing class; out-of-distribution examples (e.g., a sketch of a dog in a dataset of real dog photos) fall outside the expected domain. These are different from mislabels, where an image carries the wrong label but still depicts a known class. |
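
Both quality and semantic outliers can be surfaced automatically. The sketch below is written against generic libraries rather than the Visual Layer API: it flags quality outliers with a simple brightness check and scores domain or class outliers by how far each image's embedding sits from its nearest neighbors. Function names and thresholds are illustrative assumptions.

```python
# A minimal sketch (not the Visual Layer API) showing two simple outlier signals:
# a brightness check for quality outliers and a nearest-neighbor distance score
# on precomputed image embeddings for domain/class outliers.
import numpy as np
from PIL import Image
from sklearn.neighbors import NearestNeighbors


def brightness_outlier(path: str, low: float = 0.05, high: float = 0.95) -> bool:
    """Flag images that are far too dark or too bright (quality outliers)."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    mean = gray.mean()
    return mean < low or mean > high


def knn_outlier_scores(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each image by its mean distance to its k nearest neighbors.

    Images far from everything else (domain or class outliers) get high scores.
    `embeddings` is an (n_images, dim) array, e.g. features from any embedding model.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    dists, _ = nn.kneighbors(embeddings)   # first neighbor is the image itself
    return dists[:, 1:].mean(axis=1)


# Example: flag the top 1% most isolated images for review.
# embeddings = ...  # (n, d) float array from your embedding model
# scores = knn_outlier_scores(embeddings)
# threshold = np.quantile(scores, 0.99)
# outlier_indices = np.where(scores > threshold)[0]
```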
Outliers can enter a dataset in several ways:

| Cause | Description |
| --- | --- |
| Data collection errors | Samples from unrelated categories or domains may be incorrectly included. |
| Artifacts and anomalies | Distortions like blur, noise, or overexposure can make an image an outlier. |
| Rare instances | Rare objects, edge-case events, or unconventional perspectives may introduce visual outliers. |
Left unmanaged, outliers affect both data quality and model behavior:

| Problem | Impact |
| --- | --- |
| Reduced data quality | Outliers introduce noise and reduce consistency across your dataset. |
| Weaker model performance | Models trained on unfiltered outliers may generalize poorly or become unstable in production. |
| Hidden skew | Outliers may distort validation results or inflate perceived class diversity. |
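
Once flagged outliers have been reviewed, the usual remedy is to exclude them from training and validation splits before they can skew results. The sketch below assumes a hypothetical workflow where flagged image paths were exported to a plain-text file (`outliers.txt`) alongside a training manifest (`train_images.txt`); neither file name nor format is a Visual Layer export convention.

```python
# A minimal sketch, assuming flagged outlier paths were exported to
# "outliers.txt" and the training set is listed in "train_images.txt",
# one image path per line. Both file names are hypothetical.
from pathlib import Path


def load_flagged(path: str = "outliers.txt") -> set[str]:
    """Read the set of image paths flagged as outliers during review."""
    return {line.strip() for line in Path(path).read_text().splitlines() if line.strip()}


def filter_manifest(manifest: str = "train_images.txt",
                    cleaned: str = "train_images_clean.txt") -> int:
    """Write a copy of the manifest with flagged outliers removed; return the kept count."""
    flagged = load_flagged()
    kept = [line.strip() for line in Path(manifest).read_text().splitlines()
            if line.strip() and line.strip() not in flagged]
    Path(cleaned).write_text("\n".join(kept) + "\n")
    return len(kept)


# Example:
# n = filter_manifest()
# print(f"Kept {n} images after removing flagged outliers.")
```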