
πŸ” Visual Layer API: Textual Caption Search

Use the Visual Layer Exploration API to search your image dataset using natural language captions. This feature uses semantic understanding to return images whose captions match your intent, not just your keywords.


🧠 Overview

The GET /api/v1/explore/{dataset_id} endpoint, combined with the image_caption parameter, enables semantic search over image captions.

Example:

A search for "beach sunset" may return captions like:

  • "A man at the beach at sunset"
  • "Many sunsets and beaches"

Wrapping the query in quotation marks makes the search more specific. For example:

  • A search for red car returns images of a car that contains something red (red headlights, red paint patterns on the car).
  • A search for "red car" returns images that contain a red car.

Each result includes a relevance score indicating how well the caption semantically matches your query.


πŸ” Authentication

On-prem installations do not require authentication.
When authentication is enabled, every request must include a bearer token in the Authorization header:

Authorization: Bearer {your_api_token}
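For scripts that run against both authenticated and on-prem deployments, the header can be built conditionally. This is a minimal sketch: build_headers is a hypothetical helper name, and treating a missing token as "no authentication" is an assumption based on the on-prem note above.

```python
from typing import Dict, Optional


def build_headers(api_token: Optional[str] = None) -> Dict[str, str]:
    """Return headers for Explore API requests (hypothetical helper).

    Assumption: no token means an on-prem installation without
    authentication, so no Authorization header is sent at all.
    """
    if not api_token:
        return {}
    return {"Authorization": f"Bearer {api_token}"}


print(build_headers("YOUR_TOKEN"))  # {'Authorization': 'Bearer YOUR_TOKEN'}
print(build_headers())              # {}
```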

📥 Textual Caption Search

Endpoint

GET /api/v1/explore/{dataset_id}

Required Query Parameters

Name            Type     Description
image_caption   string   The natural language caption query
threshold       integer  Clustering similarity threshold (0–4)
entity_type     string   Either IMAGES or OBJECTS

Optional Query Parameters

Name                          Type   Description
textual_similarity_threshold  float  Minimum relevance score (0.0–1.0) a caption match
                                     must reach. Higher values (closer to 1.0) return
                                     only highly relevant results; lower values include
                                     more varied matches.
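A client-side check can catch out-of-range values before a request is sent. This sketch encodes only the ranges listed above; validate_caption_params is a hypothetical name, and the server remains the authority on what it actually accepts.

```python
from typing import Optional

VALID_ENTITY_TYPES = {"IMAGES", "OBJECTS"}


def validate_caption_params(image_caption: str, threshold: int, entity_type: str,
                            textual_similarity_threshold: Optional[float] = None) -> None:
    """Raise ValueError if a parameter falls outside its documented range (hypothetical helper)."""
    if not image_caption or not image_caption.strip():
        raise ValueError("image_caption must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, int) or not 0 <= threshold <= 4:
        raise ValueError("threshold must be an integer from 0 to 4")
    if entity_type not in VALID_ENTITY_TYPES:
        raise ValueError("entity_type must be IMAGES or OBJECTS")
    if textual_similarity_threshold is not None and not 0.0 <= textual_similarity_threshold <= 1.0:
        raise ValueError("textual_similarity_threshold must be between 0.0 and 1.0")


validate_caption_params("beach sunset", 0, "IMAGES", 0.85)  # passes silently
```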

🧪 Example (cURL)

curl -H "Authorization: Bearer YOUR_TOKEN" \
"https://app.visual-layer.com/api/v1/explore/95233006-eddc-11ef-b303-76dbc3993eb2?threshold=0&image_caption=beach%20sunset&entity_type=IMAGES"

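The same request URL can be assembled offline in Python, which is handy for checking the encoding before sending anything. A minimal sketch using only the standard library; build_explore_url is a hypothetical helper, and passing quote_via=quote makes urlencode emit %20 for spaces, matching the cURL example.

```python
from urllib.parse import quote, urlencode


def build_explore_url(base_url: str, dataset_id: str, **params) -> str:
    """Build a GET /api/v1/explore/{dataset_id} URL with percent-encoded query params."""
    # quote_via=quote encodes spaces as %20 instead of urlencode's default '+'.
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/api/v1/explore/{dataset_id}?{query}"


url = build_explore_url(
    "https://app.visual-layer.com",
    "95233006-eddc-11ef-b303-76dbc3993eb2",
    threshold=0,
    image_caption="beach sunset",
    entity_type="IMAGES",
)
print(url)
```

Keyword arguments keep their order, so the printed URL has the same query string as the cURL call.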
📦 Response (Simplified)

{
  "clusters": [
    {
      "id": "...",
      "entity_type": "IMAGES",
      "size": 4,
      "representative_preview": "...",
      "media": [
        {
          "id": "...",
          "preview": "...",
          "filename": "sunset_beach_aerial.jpg",
          "caption": "Golden sun setting over the ocean waves",
          "relevance_score": 0.92
        }
      ]
    }
  ],
  "total_pages": 1,
  "current_page": 0
}

🧠 Understanding the Response

  • clusters: Groups of visually similar images
  • media: Media items that match your query
  • caption: Matched image description
  • relevance_score: Semantic similarity score (0.0–1.0)
  • preview: Image preview URL

Results are sorted by relevance_score, most relevant first.
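To see how these fields nest, the sketch below loads the simplified response with Python's standard json module and pulls out each media item's filename, caption, and relevance_score. It works on the sample shown above only (the "..." placeholders are left as-is); a real response may carry more fields. In this sample, current_page appears to be zero-based, since it reads 0 with total_pages 1.

```python
import json

# The simplified response shown above, verbatim.
SAMPLE_RESPONSE = """
{
  "clusters": [
    {
      "id": "...",
      "entity_type": "IMAGES",
      "size": 4,
      "representative_preview": "...",
      "media": [
        {
          "id": "...",
          "preview": "...",
          "filename": "sunset_beach_aerial.jpg",
          "caption": "Golden sun setting over the ocean waves",
          "relevance_score": 0.92
        }
      ]
    }
  ],
  "total_pages": 1,
  "current_page": 0
}
"""

response = json.loads(SAMPLE_RESPONSE)

# One (filename, caption, relevance_score) tuple per media item, across all clusters.
rows = [
    (item["filename"], item["caption"], item["relevance_score"])
    for cluster in response["clusters"]
    for item in cluster.get("media", [])
]
for filename, caption, score in rows:
    print(f"{score:.2f}  {filename}: {caption}")
```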


🧪 Filtering and Refinement

Set a Minimum Similarity Threshold

...&textual_similarity_threshold=0.85

Combine with Labels and Tags

...&labels=[beach,sunset]&tags=7b89e36c-c2d1-4af9-9d23-f74e018e67c5
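The Python client below formats list filters as a bracketed, comma-separated string, the same form as labels=[beach,sunset] above. This sketch shows what that value looks like once percent-encoded in a query string; format_list_param is a hypothetical helper, and tags are left out because the example above passes a bare ID while the Python client wraps tags in brackets.

```python
from typing import List
from urllib.parse import urlencode


def format_list_param(values: List[str]) -> str:
    """Bracketed, comma-separated list, as in labels=[beach,sunset] (hypothetical helper)."""
    return f"[{','.join(values)}]"


filters = {
    "textual_similarity_threshold": 0.85,
    "labels": format_list_param(["beach", "sunset"]),
}
# urlencode percent-encodes the brackets and comma: [ -> %5B, "," -> %2C, ] -> %5D
query = urlencode(filters)
print(query)  # textual_similarity_threshold=0.85&labels=%5Bbeach%2Csunset%5D
```

If you type unencoded brackets directly into a cURL URL, curl reads [ ] as its own URL globbing syntax; either percent-encode them as above or pass curl's -g (--globoff) option.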

🐍 Python Client

import requests
from typing import Dict, List, Optional

class VisualLayerClient:
    def __init__(self, api_url: str, api_token: str):
        self.api_url = api_url
        self.headers = { 'Authorization': f'Bearer {api_token}' }

    def search_by_caption(self, dataset_id: str, caption_query: str, threshold: int = 0,
                          entity_type: str = 'IMAGES',
                          textual_similarity_threshold: Optional[float] = None,
                          page_number: int = 0,
                          labels: Optional[List[str]] = None,
                          tags: Optional[List[str]] = None) -> Dict:
        params = {
            'image_caption': caption_query,
            'threshold': threshold,
            'entity_type': entity_type,
            'page_number': page_number
        }
        if textual_similarity_threshold is not None:
            params['textual_similarity_threshold'] = textual_similarity_threshold
        if labels:
            params['labels'] = f"[{','.join(labels)}]"
        if tags:
            params['tags'] = "untagged" if tags == ["untagged"] else f"[{','.join(tags)}]"
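        # timeout (seconds) keeps a stalled connection from blocking forever; adjust as needed.
        response = requests.get(f"{self.api_url}/api/v1/explore/{dataset_id}",
                                headers=self.headers, params=params, timeout=30)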
        response.raise_for_status()
        return response.json()

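    def extract_media_info(self, results: Dict) -> List[Dict]:
        """Flatten the media items from every cluster into one list."""
        media_items = []
        for cluster in results.get('clusters', []):
            # The simplified response above nests items under 'media' with 'id'/'preview';
            # fall back to 'previews'/'media_id'/'media_uri' in case a response uses those keys.
            for item in cluster.get('media') or cluster.get('previews', []):
                media_items.append({
                    'id': item.get('id', item.get('media_id')),
                    'preview_url': item.get('preview', item.get('media_uri')),
                    'caption': item.get('caption'),
                    'relevance_score': item.get('relevance_score'),
                    'label': item.get('label')
                })
        return media_items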

Example Usage

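# Placeholders: use your own API URL, token, and dataset ID.
API_URL, API_TOKEN, DATASET_ID = "https://app.visual-layer.com", "YOUR_TOKEN", "YOUR_DATASET_ID"
client = VisualLayerClient(API_URL, API_TOKEN)
results = client.search_by_caption(dataset_id=DATASET_ID, caption_query="beach sunset",
                                   textual_similarity_threshold=0.75)
media = client.extract_media_info(results)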

📚 Advanced Techniques

Phrase Matching

image_caption="\"golden sunset\""

Compound Queries

image_caption=beach%20sunset%20without%20people
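Both query styles are just different caption strings; when you build the URL yourself, the value must be percent-encoded. Python's standard urllib.parse.quote turns the literal double quotes of a phrase match into %22 and spaces into %20, reproducing the compound-query encoding shown above.

```python
from urllib.parse import quote

# Phrase match: the literal double quotes become part of the caption value.
phrase = quote('"golden sunset"')
# Compound query: plain words, spaces encoded as %20.
compound = quote("beach sunset without people")

print(phrase)    # %22golden%20sunset%22
print(compound)  # beach%20sunset%20without%20people
```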

✅ Best Practices

  1. Use specific queries for higher precision
  2. Set a similarity threshold to filter weaker matches
  3. Combine filters like tags and labels for refinement
  4. Use relevance_score to sort or prioritize results
  5. Visual clusters ≠ semantic groups: always use relevance_score for ranking
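Because clusters group images by visual similarity rather than caption relevance, one way to follow practices 4 and 5 is to flatten every cluster's media and sort the combined list. A minimal sketch, assuming the clusters → media → relevance_score shape of the simplified response; rank_by_relevance and the example data are hypothetical, and items without a score are treated as 0.0.

```python
from typing import Dict, List


def rank_by_relevance(results: Dict, min_score: float = 0.0) -> List[Dict]:
    """Flatten media from every cluster, drop items below min_score, sort highest first."""
    items = [
        item
        for cluster in results.get("clusters", [])
        for item in cluster.get("media", [])
    ]
    # Assumption: a missing relevance_score counts as 0.0.
    kept = [i for i in items if (i.get("relevance_score") or 0.0) >= min_score]
    return sorted(kept, key=lambda i: i.get("relevance_score") or 0.0, reverse=True)


# Hypothetical two-cluster response, shaped like the simplified example.
example = {
    "clusters": [
        {"media": [{"id": "a", "relevance_score": 0.71}, {"id": "b", "relevance_score": 0.93}]},
        {"media": [{"id": "c", "relevance_score": 0.88}]},
    ]
}
print([i["id"] for i in rank_by_relevance(example, min_score=0.75)])  # ['b', 'c']
```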

🛠 Pagination Helper

def get_all_caption_matches(client, dataset_id, caption_query, min_similarity=0.75):
    all_media = []
    page = 0
    total_pages = 1
    while page < total_pages:
        results = client.search_by_caption(dataset_id, caption_query,
                                           textual_similarity_threshold=min_similarity,
                                           page_number=page)
        all_media.extend(client.extract_media_info(results))
        total_pages = results.get('total_pages', 1)
        page += 1
    return all_media
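If you only need the first few matches, a generator variant of get_all_caption_matches can stop requesting pages as soon as you stop iterating. This is a sketch: iter_caption_pages and the FakeClient stub (which serves three canned pages so the loop can be tried offline) are hypothetical.

```python
from typing import Dict, Iterator


def iter_caption_pages(client, dataset_id: str, caption_query: str,
                       min_similarity: float = 0.75) -> Iterator[Dict]:
    """Yield one response page at a time; later pages are fetched only when requested."""
    page, total_pages = 0, 1
    while page < total_pages:
        results = client.search_by_caption(dataset_id, caption_query,
                                           textual_similarity_threshold=min_similarity,
                                           page_number=page)
        total_pages = results.get("total_pages", 1)
        yield results
        page += 1


class FakeClient:
    """Hypothetical stand-in for VisualLayerClient that reports three pages."""

    def search_by_caption(self, dataset_id, caption_query,
                          textual_similarity_threshold=None, page_number=0):
        return {"clusters": [], "total_pages": 3, "current_page": page_number}


pages = [p["current_page"] for p in iter_caption_pages(FakeClient(), "ds", "beach sunset")]
print(pages)  # [0, 1, 2]
```

With the real client, breaking out of a for loop over iter_caption_pages(client, DATASET_ID, "beach sunset") skips the remaining requests entirely.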

🚨 Error Handling

import random
import time

import requests


def search_with_retries(client, dataset_id, caption_query, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.search_by_caption(dataset_id, caption_query)
        except requests.exceptions.HTTPError as e:
            status = e.response.status_code
            if status == 429 and attempt < max_retries - 1:
                # Rate limited: exponential backoff with jitter, then retry.
                time.sleep((2 ** attempt) + random.uniform(0, 1))
            elif status == 401:
                raise RuntimeError("Check your API token.") from e
            elif status == 404:
                raise RuntimeError("Dataset not found.") from e
            else:
                # Other errors, or a 429 on the final attempt, propagate to the caller.
                raise

⚠️ Limitations

  • Maximum of 100 results per page
  • textual_similarity_threshold values must be ≥ 0.5 (the clustering threshold parameter uses its own 0–4 range)
  • Caption quality affects match accuracy
  • Complex queries may yield unexpected results