# Explore Model Catalog

> Browse all available enrichment models in Visual Layer's catalog and learn how each can enhance your dataset.

Visual Layer’s Enrichment Hub lets you generate high-value metadata using pre-trained models tailored for image and video datasets. These enrichment models can:

* Extract descriptive labels, captions, and tags
* Enable advanced semantic and object-level search
* Power downstream filtering, QA, and automation
* Improve annotation coverage and data understanding

## Available Enrichment Models

Visual Layer provides a range of built-in models designed for diverse enrichment tasks:

<div className="integrations-table">
  | Model Name                   | Task Type              | Description                                                                                                                                            |
  | ---------------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
  | **VL-Object-Detector**       | Object Detection       | Identifies and localizes objects within images or videos by drawing bounding boxes and classifying each detected object.                               |
  | **VL-Image-Tagger**          | Image Classification   | Assigns labels or tags to an entire image, categorizing its content for identification and analysis.                                                   |
  | **VL-Face-Detector**         | Face Detection         | Detects faces and extracts facial landmarks for accurate face alignment and recognition workflows.                                                     |
  | **VL-Image-Captioner**       | Image to Text          | Generates descriptive text that summarizes the content and context of the entire image input.                                                          |
  | **VL Advanced Captioner**    | Image to Text          | A state-of-the-art Vision-Language model that generates detailed captions and answers questions about image content (VQA).                             |
  | **VL-Object-Captioner**      | Object to Text         | Generates descriptive text that summarizes detected objects and their interactions in the image.                                                       |
  | **NVILA-Lite-2B**            | Image to Text          | An open VLM from the NVILA family, designed to optimize both efficiency and accuracy for video understanding and multi-image tasks.                   |
  | **VL-Image-Semantic-Search** | Semantic Image Search  | Enhances image search with conceptual queries, identifying content that matches search intent and improving discovery by understanding visual context. |
  | **Advanced-Object-Search**   | Semantic Object Search | Finds objects in images or videos based on meaning and context, beyond simple tags. Quickly retrieves relevant objects using natural language queries. |
  | **Radiology-Image-Search**   | Semantic Image Search  | Applies semantic image search to radiology, improving discovery by interpreting radiology images and terminology.                                     |
</div>

<Note>
  Some models can be applied only after other enrichments are in place:

  * **VL-Object-Captioner** requires **Object Detection** to be applied first.
  * **Semantic Search** models require captions or embeddings from a prior enrichment step.

  Labels may come from user annotations, the **VL-Object-Detector**, or the **VL-Image-Tagger**.
</Note>
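
The dependencies in the note can be sketched as a small pre-flight check. This is an illustration only, not part of any Visual Layer SDK: the `REQUIREMENTS` map, the enrichment kind names (`object_detection`, `captions`, `embeddings`), and both helper functions are hypothetical. It also assumes that "Semantic Search models" means the three catalog entries whose task type is Semantic Image Search or Semantic Object Search.

```python
# Illustrative sketch only: names and structure are assumptions,
# not a Visual Layer API.

# For each model, a list of requirement groups. A group is satisfied when
# at least one of its enrichment kinds already exists on the dataset.
REQUIREMENTS = {
    "VL-Object-Captioner": [{"object_detection"}],
    # Assumed to be the "Semantic Search" models referred to in the note.
    "VL-Image-Semantic-Search": [{"captions", "embeddings"}],
    "Advanced-Object-Search": [{"captions", "embeddings"}],
    "Radiology-Image-Search": [{"captions", "embeddings"}],
}


def unmet_requirements(model, available):
    """Return the requirement groups that `available` does not satisfy."""
    available = set(available)
    return [group for group in REQUIREMENTS.get(model, []) if not group & available]


def can_apply(model, available):
    """True when every requirement group of `model` is satisfied."""
    return not unmet_requirements(model, available)


if __name__ == "__main__":
    existing = {"object_detection"}
    for name in ("VL-Object-Captioner", "VL-Image-Semantic-Search"):
        status = "ready" if can_apply(name, existing) else "blocked"
        print(f"{name}: {status}")
```

Models without an entry in the map, such as **VL-Object-Detector**, are treated as having no prerequisites.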

## Coming Soon

These models are in development and will be available in the enrichment catalog:

<div className="integrations-table">
  | Model Name                | Task Type             | Description                                                                                                |
  | ------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------- |
  | **Nv-grounding dino**     | Object Detection      | An open-vocabulary, zero-shot object detection model that takes natural language prompts.                  |
  | **Advanced-Image-Search** | Semantic Image Search | Enhanced conceptual image retrieval using complex queries, identifying content that matches search intent. |
  | **yolov9**                | Object Detection      | Object detection model for fast and accurate bounding box predictions.                                     |
</div>

## Want Early Access?

<Card title="Get in Touch" href="mailto:support@visual-layer.com">
  Have questions or want to try out upcoming models early?
  [Contact us](mailto:support@visual-layer.com) to request access or learn more.
</Card>
