How This Helps

Caption Search lets you retrieve relevant media with natural phrases such as “beach sunset” or “red car”, not just exact keyword matches.


Overview

The GET /api/v1/explore/{dataset_id} endpoint enables semantic caption search using the image_caption parameter.

This type of search returns results based on intent and meaning, not just exact text.

Example

A query for "beach sunset" may return:

  • “A man at the beach on sunset”
  • “Golden sun setting over the ocean waves”

Searching for:

  • red car — matches loosely related red items and cars
  • "red car" — matches images of an actual red car

Each result includes a relevance score between 0.0 and 1.0 that quantifies how well its caption matches the query.


Authentication

All API calls require a bearer token:

Authorization: Bearer {your_api_token}

Endpoint

GET /api/v1/explore/{dataset_id}

Parameters

Name                          Type     Required  Description
image_caption                 string   Yes       The search query (e.g. "beach sunset")
threshold                     integer  Yes       Clustering threshold (0–4)
entity_type                   string   Yes       Must be IMAGES or OBJECTS
textual_similarity_threshold  float    No        Score cutoff (0.0–1.0) for relevance
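
The ranges listed above can be checked client-side before a request is sent. The following is a minimal sketch: the helper name validate_search_params and the error messages are hypothetical and not part of the API, and the server may apply further rules that are not documented here.

```python
from typing import Optional

# Allowed values as listed in the parameter descriptions above.
ALLOWED_ENTITY_TYPES = {"IMAGES", "OBJECTS"}


def validate_search_params(image_caption: str, threshold: int, entity_type: str,
                           textual_similarity_threshold: Optional[float] = None) -> dict:
    """Hypothetical helper: check documented ranges and return a params dict."""
    if not image_caption.strip():
        raise ValueError("image_caption must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, int) or not 0 <= threshold <= 4:
        raise ValueError("threshold must be an integer from 0 to 4")
    if entity_type not in ALLOWED_ENTITY_TYPES:
        raise ValueError("entity_type must be IMAGES or OBJECTS")
    params = {
        "image_caption": image_caption,
        "threshold": threshold,
        "entity_type": entity_type,
    }
    if textual_similarity_threshold is not None:
        if not 0.0 <= textual_similarity_threshold <= 1.0:
            raise ValueError("textual_similarity_threshold must be between 0.0 and 1.0")
        params["textual_similarity_threshold"] = textual_similarity_threshold
    return params


params = validate_search_params("beach sunset", 0, "IMAGES", 0.85)
```

Failing early with a ValueError keeps malformed values out of the request, at the cost of duplicating rules that the server also enforces.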

Example (cURL)

curl -H "Authorization: Bearer YOUR_TOKEN" \
"https://app.visual-layer.com/api/v1/explore/95233006-eddc-11ef-b303-76dbc3993eb2?threshold=0&image_caption=beach%20sunset&entity_type=IMAGES"
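
For readers who assemble the request in Python rather than cURL, the same URL can be built with the standard library. This is a sketch only: build_explore_url is a hypothetical helper name, and the host and dataset ID are copied from the cURL example above.

```python
from urllib.parse import quote, urlencode

# Host taken from the cURL example above.
BASE_URL = "https://app.visual-layer.com"


def build_explore_url(dataset_id: str, image_caption: str,
                      threshold: int = 0, entity_type: str = "IMAGES") -> str:
    """Hypothetical helper: build the explore URL with a percent-encoded query."""
    query = urlencode(
        {
            "threshold": threshold,
            "image_caption": image_caption,
            "entity_type": entity_type,
        },
        quote_via=quote,  # encode spaces as %20 instead of '+'
    )
    return f"{BASE_URL}/api/v1/explore/{dataset_id}?{query}"


url = build_explore_url("95233006-eddc-11ef-b303-76dbc3993eb2", "beach sunset")
```

The bearer token still has to be sent in the Authorization header when the URL is requested.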

Response Example

{
  "clusters": [
    {
      "id": "...",
      "entity_type": "IMAGES",
      "size": 4,
      "representative_preview": "...",
      "media": [
        {
          "id": "...",
          "preview": "...",
          "filename": "sunset_beach_aerial.jpg",
          "caption": "Golden sun setting over the ocean waves",
          "relevance_score": 0.92
        }
      ]
    }
  ],
  "total_pages": 1,
  "current_page": 0
}

Response Breakdown

  • clusters: Groups of visually similar media
  • media: Individual results within a cluster
  • caption: The caption text that was matched against the query
  • relevance_score: Quality of the match (0.0–1.0)
  • preview: Thumbnail preview
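
To make the nesting concrete, the sketch below walks a response shaped like the example above and pulls out one row per media item. The sample data is the example response itself with its elided values left as "..."; iter_media is a hypothetical helper name, and the key names are assumed to match that example.

```python
from typing import Dict, Iterator, Tuple

# Copy of the response example above; "..." values are left elided.
sample_response = {
    "clusters": [
        {
            "id": "...",
            "entity_type": "IMAGES",
            "size": 4,
            "representative_preview": "...",
            "media": [
                {
                    "id": "...",
                    "preview": "...",
                    "filename": "sunset_beach_aerial.jpg",
                    "caption": "Golden sun setting over the ocean waves",
                    "relevance_score": 0.92,
                }
            ],
        }
    ],
    "total_pages": 1,
    "current_page": 0,
}


def iter_media(response: Dict) -> Iterator[Tuple[str, Dict]]:
    """Yield (cluster_id, media_item) pairs from every cluster."""
    for cluster in response.get("clusters", []):
        for item in cluster.get("media", []):
            yield cluster["id"], item


rows = [
    (item["filename"], item["caption"], item["relevance_score"])
    for _, item in iter_media(sample_response)
]
```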

Filter and Refine

Add Similarity Threshold

...&textual_similarity_threshold=0.85

Filter by Tags and Labels

...&labels=[beach,sunset]&tags=7b89e36c-c2d1-4af9-9d23-f74e018e67c5

Python Client

import requests
from typing import Dict, List, Optional

class VisualLayerClient:
    def __init__(self, api_url: str, api_token: str):
        self.api_url = api_url
        self.headers = { 'Authorization': f'Bearer {api_token}' }

    def search_by_caption(self, dataset_id: str, caption_query: str, threshold: int = 0,
                          entity_type: str = 'IMAGES',
                          textual_similarity_threshold: Optional[float] = None,
                          page_number: int = 0,
                          labels: Optional[List[str]] = None,
                          tags: Optional[List[str]] = None) -> Dict:
        params = {
            'image_caption': caption_query,
            'threshold': threshold,
            'entity_type': entity_type,
            'page_number': page_number
        }
        if textual_similarity_threshold is not None:
            params['textual_similarity_threshold'] = textual_similarity_threshold
        if labels:
            params['labels'] = f"[{','.join(labels)}]"
        if tags:
            params['tags'] = "untagged" if tags == ["untagged"] else f"[{','.join(tags)}]"
        response = requests.get(f"{self.api_url}/api/v1/explore/{dataset_id}",
                                headers=self.headers, params=params)
        response.raise_for_status()
        return response.json()

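    def extract_media_info(self, results: Dict) -> List[Dict]:
        # Flatten every cluster into a single list of media records.
        media_items = []
        for cluster in results.get('clusters', []):
            # The response example lists items under 'media' with 'id' and
            # 'preview'; 'previews', 'media_id', and 'media_uri' are accepted
            # as alternative key names.
            for item in cluster.get('media') or cluster.get('previews', []):
                media_items.append({
                    'id': item.get('id', item.get('media_id')),
                    'preview_url': item.get('preview', item.get('media_uri')),
                    'caption': item.get('caption'),
                    'relevance_score': item.get('relevance_score'),
                    'label': item.get('label')
                })
        return media_items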

Example Usage

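API_URL = "https://app.visual-layer.com"  # values from the cURL example; replace with your own
API_TOKEN = "YOUR_TOKEN"
DATASET_ID = "95233006-eddc-11ef-b303-76dbc3993eb2"
client = VisualLayerClient(API_URL, API_TOKEN)
results = client.search_by_caption(dataset_id=DATASET_ID, caption_query="beach sunset", textual_similarity_threshold=0.75)
media = client.extract_media_info(results)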

Advanced Techniques

Phrase Match

image_caption="\"golden sunset\""

Semantic Exclusions

image_caption=beach%20sunset%20without%20people
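
When the URL is assembled by hand, both query forms above need percent-encoding, including the double quotes of a phrase match. A small sketch using the standard library follows; encode_caption is a hypothetical helper name. HTTP libraries such as requests encode values passed through params themselves, so this step is only needed for hand-built URLs.

```python
from urllib.parse import quote


def encode_caption(query: str) -> str:
    """Percent-encode a caption query for use as the image_caption value."""
    return quote(query, safe="")


phrase = encode_caption('"golden sunset"')
exclusion = encode_caption("beach sunset without people")

print(f"image_caption={phrase}")     # image_caption=%22golden%20sunset%22
print(f"image_caption={exclusion}")  # image_caption=beach%20sunset%20without%20people
```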

Best Practices

  • Use specific queries
  • Set a minimum similarity threshold
  • Use tags and labels to focus results
  • Sort using relevance_score
  • Remember: clusters group media by visual similarity, not by caption meaning
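
Two of the practices above, applying a minimum score and sorting by relevance_score, can also be done client-side on results that were already fetched. The sketch below assumes a flat list of dicts with a relevance_score key, like the output of extract_media_info; rank_media is a hypothetical helper name.

```python
from typing import Dict, List


def rank_media(media_items: List[Dict], min_score: float = 0.0) -> List[Dict]:
    """Drop items below min_score and return the rest, best match first."""

    def score(item: Dict) -> float:
        # Treat a missing or null score as 0.0 so sorting never fails.
        return item.get("relevance_score") or 0.0

    kept = [item for item in media_items if score(item) >= min_score]
    return sorted(kept, key=score, reverse=True)


items = [
    {"id": "a", "relevance_score": 0.60},
    {"id": "b", "relevance_score": 0.92},
    {"id": "c", "relevance_score": None},
]
top = rank_media(items, min_score=0.75)
ordered = rank_media(items)
```

Filtering on the server with textual_similarity_threshold is still preferable where possible, since it avoids transferring results that will be discarded.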

Pagination Helper

def get_all_caption_matches(client, dataset_id, caption_query, min_similarity=0.75):
    all_media = []
    page = 0
    total_pages = 1
    while page < total_pages:
        results = client.search_by_caption(dataset_id, caption_query,
                                           textual_similarity_threshold=min_similarity,
                                           page_number=page)
        all_media.extend(client.extract_media_info(results))
        total_pages = results.get('total_pages', 1)
        page += 1
    return all_media
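
The paging loop above can be exercised without network access by substituting a stub for the client. The sketch below restates the loop as a generator that takes any page-fetching callable; iter_pages, fake_fetch, and the fake page data are all hypothetical and only mirror the total_pages field from the response example.

```python
from typing import Callable, Dict, Iterator


def iter_pages(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield each page of results until total_pages is exhausted."""
    page = 0
    total_pages = 1  # updated from the first response
    while page < total_pages:
        results = fetch_page(page)
        yield results
        total_pages = results.get("total_pages", 1)
        page += 1


# Fake two-page result set standing in for real API responses.
FAKE_PAGES = [
    {"clusters": [{"media": [{"id": "a"}]}], "total_pages": 2, "current_page": 0},
    {"clusters": [{"media": [{"id": "b"}]}], "total_pages": 2, "current_page": 1},
]


def fake_fetch(page_number: int) -> Dict:
    return FAKE_PAGES[page_number]


ids = [
    item["id"]
    for page in iter_pages(fake_fetch)
    for cluster in page["clusters"]
    for item in cluster["media"]
]
```

With a real client, fetch_page could be a lambda that calls search_by_caption with page_number set to the requested page.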

Error Handling

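import random
import time

import requests

def search_with_retries(client, dataset_id, caption_query, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.search_by_caption(dataset_id, caption_query)
        except requests.exceptions.HTTPError as e:
            status = e.response.status_code
            if status == 429:
                # Rate limited: back off exponentially with jitter, then retry.
                time.sleep((2 ** attempt) + random.uniform(0, 1))
            elif status == 401:
                raise Exception("Check your API token.") from e
            elif status == 404:
                raise Exception("Dataset not found.") from e
            else:
                raise
    # Every attempt was rate limited; fail loudly instead of returning None.
    raise RuntimeError(f"Rate limited on all {max_retries} attempts.")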
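
The wait time used for 429 responses above grows as 2 ** attempt plus random jitter. The sketch below isolates that calculation so the schedule is easy to inspect; backoff_delay is a hypothetical helper, and the cap parameter is an added assumption that does not appear in the retry code above.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: float = 1.0) -> float:
    """Return the seconds to wait before retry number `attempt` (0-based)."""
    # Exponential growth, limited by cap (an assumption, not in the original loop).
    delay = min(cap, base * (2 ** attempt))
    # Random jitter spreads out retries from clients that failed together.
    return delay + random.uniform(0, jitter)


# With jitter disabled the schedule is deterministic.
schedule = [backoff_delay(attempt, jitter=0.0) for attempt in range(3)]
```

Without a cap, the delay doubles indefinitely as max_retries grows, which is why the sketch bounds it.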

Limitations

  • Max 100 results per page
  • Minimum textual_similarity_threshold: 0.5
  • Quality depends on caption accuracy
  • Complex phrasing may yield fuzzy results