Browse all available enrichment models in Visual Layer’s catalog and learn how each can enhance your dataset.

| Model Name | Task Type | Description |
|---|---|---|
| VL-Object-Detector | Object Detection | Identifies and localizes objects with bounding boxes and class labels. |
| VL-Image-Tagger | Multi-Class Classification | Applies multiple labels to the entire image for categorization and metadata generation. |
| VL-Object-Captioner | Object to Text | Generates short captions describing individual objects in context. |
| VL-Image-Captioner | Image to Text | Summarizes the scene or image with natural language. |
| VL-Image-Semantic Search | Semantic Image Search | Enables conceptual search over images using natural language queries. |
| VL-Object-Semantic Search | Semantic Object Search | Enables contextual search for specific objects based on semantics. |
| NVILA-Lite-2B | Image-Text-to-Text (VQA) | Efficient VQA model for visual understanding tasks across multiple frames or images. |
| Janus-Pro-1B | Image-Text-to-Text (VQA) | Autoregressive model for multi-modal reasoning and question answering. |
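
If you want to shortlist models by task before choosing an enrichment, the catalog can be treated as plain data. The sketch below is only an illustration: it copies the model names and task types from the table into a local Python list, and the helper names `models_by_task` and `find_models` are hypothetical. It does not call any Visual Layer API.

```python
from collections import defaultdict

# Rows copied by hand from the catalog table: (model name, task type).
# This is a local illustration, not data fetched from Visual Layer.
CATALOG = [
    ("VL-Object-Detector", "Object Detection"),
    ("VL-Image-Tagger", "Multi-Class Classification"),
    ("VL-Object-Captioner", "Object to Text"),
    ("VL-Image-Captioner", "Image to Text"),
    ("VL-Image-Semantic Search", "Semantic Image Search"),
    ("VL-Object-Semantic Search", "Semantic Object Search"),
    ("NVILA-Lite-2B", "Image-Text-to-Text (VQA)"),
    ("Janus-Pro-1B", "Image-Text-to-Text (VQA)"),
]


def models_by_task(catalog):
    """Group model names by task type, keeping the table's row order."""
    grouped = defaultdict(list)
    for name, task in catalog:
        grouped[task].append(name)
    return dict(grouped)


def find_models(catalog, keyword):
    """Return models whose task type contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [name for name, task in catalog if keyword in task.lower()]


if __name__ == "__main__":
    # Print each task type with the models that cover it.
    for task, names in models_by_task(CATALOG).items():
        print(f"{task}: {', '.join(names)}")
```

For example, `find_models(CATALOG, "vqa")` returns the two visual question answering models, and `find_models(CATALOG, "search")` returns the two semantic search models.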