
How This Helps

Caption Search enables you to retrieve relevant media using natural phrases like “beach sunset” or “red car”—not just exact keyword matches.

Overview

The GET /api/v1/explore/{dataset_id} endpoint enables semantic caption search using the image_caption parameter. This type of search returns results based on intent and meaning, not just exact text.

Example

A query for "beach sunset" may return:
  • “A man at the beach at sunset”
  • “Golden sun setting over the ocean waves”
Quoting changes how strictly a phrase is matched:
  • red car — matches loosely related red items and cars
  • "red car" — matches only images of an actual red car
Each result includes a relevance score that quantifies the quality of the match (0.0–1.0).

Authentication

All API calls require a bearer token:
Authorization: Bearer {your_api_token}
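
In Python, the bearer header can be built once and reused across requests. A minimal sketch — the `auth_headers` helper is illustrative, not part of any client library:

```python
def auth_headers(token: str) -> dict:
    """Build the Authorization header expected by the API."""
    return {"Authorization": f"Bearer {token}"}

# Pass the result as the headers argument to each request.
headers = auth_headers("YOUR_TOKEN")  # hypothetical placeholder token
```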

Endpoint

GET /api/v1/explore/{dataset_id}

Parameters

Name                           Type     Description
image_caption                  string   The search query (e.g. "beach sunset")
threshold                      integer  Clustering threshold (0–4)
entity_type                    string   Must be IMAGES or OBJECTS
textual_similarity_threshold   float    Optional score cutoff (0.0–1.0) for relevance

Example (cURL)

curl -H "Authorization: Bearer YOUR_TOKEN" \
"https://app.visual-layer.com/api/v1/explore/95233006-eddc-11ef-b303-76dbc3993eb2?threshold=0&image_caption=beach%20sunset&entity_type=IMAGES"

Response Example

{
  "clusters": [
    {
      "id": "...",
      "entity_type": "IMAGES",
      "size": 4,
      "representative_preview": "...",
      "media": [
        {
          "id": "...",
          "preview": "...",
          "filename": "sunset_beach_aerial.jpg",
          "caption": "Golden sun setting over the ocean waves",
          "relevance_score": 0.92
        }
      ]
    }
  ],
  "total_pages": 1,
  "current_page": 0
}

Response Breakdown

  • clusters: Groups of visually similar results
  • media: Individual media items within a cluster
  • caption: The caption text that matched the query
  • relevance_score: Match quality, from 0.0 to 1.0
  • preview: Thumbnail preview URL
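
Given this shape, a small helper can flatten the clusters and rank items by relevance_score. A sketch assuming the response layout shown above — `top_matches` is a hypothetical name, not part of the API:

```python
def top_matches(response: dict, limit: int = 10) -> list:
    """Flatten clusters into one list of media items, best matches first."""
    items = [
        media
        for cluster in response.get("clusters", [])
        for media in cluster.get("media", [])
    ]
    # Sort descending by relevance_score, treating a missing score as 0.0.
    items.sort(key=lambda m: m.get("relevance_score", 0.0), reverse=True)
    return items[:limit]
```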

Filter and Refine

Add Similarity Threshold

...&textual_similarity_threshold=0.85

Filter by Tags and Labels

...&labels=[beach,sunset]&tags=7b89e36c-c2d1-4af9-9d23-f74e018e67c5
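
The bracketed list format above must be URL-encoded when the query string is built programmatically. A sketch using Python's standard `urlencode` — `build_query` is an illustrative helper, not part of the API client:

```python
from urllib.parse import urlencode

def build_query(caption: str, threshold: int = 0, entity_type: str = "IMAGES",
                labels=None, tags=None, textual_similarity_threshold=None) -> str:
    """Assemble an encoded query string using the bracketed list format."""
    params = {
        "image_caption": caption,
        "threshold": threshold,
        "entity_type": entity_type,
    }
    if textual_similarity_threshold is not None:
        params["textual_similarity_threshold"] = textual_similarity_threshold
    if labels:
        params["labels"] = f"[{','.join(labels)}]"
    if tags:
        params["tags"] = f"[{','.join(tags)}]"
    return urlencode(params)  # encodes spaces, brackets, and commas
```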

Python Client

import requests
from typing import Dict, List, Optional

class VisualLayerClient:
    def __init__(self, api_url: str, api_token: str):
        self.api_url = api_url
        self.headers = { 'Authorization': f'Bearer {api_token}' }

    def search_by_caption(self, dataset_id: str, caption_query: str, threshold: int = 0,
                          entity_type: str = 'IMAGES',
                          textual_similarity_threshold: Optional[float] = None,
                          page_number: int = 0,
                          labels: Optional[List[str]] = None,
                          tags: Optional[List[str]] = None) -> Dict:
        params = {
            'image_caption': caption_query,
            'threshold': threshold,
            'entity_type': entity_type,
            'page_number': page_number
        }
        if textual_similarity_threshold is not None:
            params['textual_similarity_threshold'] = textual_similarity_threshold
        if labels:
            params['labels'] = f"[{','.join(labels)}]"
        if tags:
            params['tags'] = "untagged" if tags == ["untagged"] else f"[{','.join(tags)}]"
        response = requests.get(f"{self.api_url}/api/v1/explore/{dataset_id}",
                                headers=self.headers, params=params)
        response.raise_for_status()
        return response.json()

    def extract_media_info(self, results: Dict) -> List[Dict]:
        """Flatten clusters into media dicts, matching the response shape above."""
        media_items = []
        for cluster in results.get('clusters', []):
            for item in cluster.get('media', []):
                media_items.append({
                    'id': item['id'],
                    'preview_url': item.get('preview'),
                    'filename': item.get('filename'),
                    'caption': item.get('caption'),
                    'relevance_score': item.get('relevance_score')
                })
        return media_items

Example Usage

client = VisualLayerClient(API_URL, API_TOKEN)
results = client.search_by_caption(dataset_id=DATASET_ID, caption_query="beach sunset", textual_similarity_threshold=0.75)
media = client.extract_media_info(results)

Advanced Techniques

Phrase Match

image_caption="\"golden sunset\""

Semantic Exclusions

image_caption=beach%20sunset%20without%20people
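
Exclusion phrases can be composed and encoded in code before being placed in the query string. A minimal sketch — `exclusion_query` is a hypothetical helper:

```python
from urllib.parse import quote

def exclusion_query(caption: str, exclude: str) -> str:
    """Append a natural-language exclusion and percent-encode the result."""
    return quote(f"{caption} without {exclude}")
```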

Best Practices

  • Use specific queries
  • Set a minimum similarity threshold
  • Use tags and labels to focus results
  • Sort using relevance_score
  • Remember: visual clusters ≠ semantic clusters

Pagination Helper

def get_all_caption_matches(client, dataset_id, caption_query, min_similarity=0.75):
    all_media = []
    page = 0
    total_pages = 1
    while page < total_pages:
        results = client.search_by_caption(dataset_id, caption_query,
                                           textual_similarity_threshold=min_similarity,
                                           page_number=page)
        all_media.extend(client.extract_media_info(results))
        total_pages = results.get('total_pages', 1)
        page += 1
    return all_media
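
If the dataset changes between page requests, the combined list may contain duplicate ids; a small safeguard can drop them while preserving order. `dedupe_media` is illustrative, not part of the client:

```python
def dedupe_media(items: list) -> list:
    """Drop duplicate media ids, keeping the first occurrence of each."""
    seen, unique = set(), []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            unique.append(item)
    return unique
```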

Error Handling

import random
import time

def search_with_retries(client, dataset_id, caption_query, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.search_by_caption(dataset_id, caption_query)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                # Back off exponentially with jitter before retrying.
                time.sleep((2 ** attempt) + random.uniform(0, 1))
            elif e.response.status_code == 401:
                raise Exception("Check your API token.")
            elif e.response.status_code == 404:
                raise Exception("Dataset not found.")
            else:
                raise
    raise Exception(f"Still rate-limited after {max_retries} retries.")

Limitations

  • Max 100 results per page
  • Minimum threshold: 0.5
  • Quality depends on caption accuracy
  • Complex phrasing may yield fuzzy results
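
These limits can be enforced client-side before a request is sent, failing fast instead of waiting on an API error. A sketch based on the limits listed above — `validate_search_params` is a hypothetical helper:

```python
def validate_search_params(textual_similarity_threshold=None, page_size=None):
    """Raise early on values outside the documented API limits."""
    if textual_similarity_threshold is not None and textual_similarity_threshold < 0.5:
        raise ValueError("textual_similarity_threshold must be at least 0.5")
    if page_size is not None and page_size > 100:
        raise ValueError("page size is capped at 100 results")
```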