🔍 Visual Layer API: Textual Caption Search
Use the Visual Layer Exploration API to search your image dataset using natural language captions. This feature leverages semantic understanding to return images whose captions match your intent, not just your keywords.
🧠 Overview
The `GET /api/v1/explore/{dataset_id}` endpoint, combined with the `image_caption` parameter, enables caption keyword search.
Example: a search for "beach sunset" may return captions like:
- "A man at the beach on sunset"
- "Many sunsets and beaches"
Adding quotation marks makes a search more specific:
- A search for red car returns any image of a car containing something red (red headlights, red paint patterns on the car)
- A search for "red car" returns only images containing a red car
Each result includes a relevance score indicating how well the caption semantically matches your query.
🔐 Authentication
On-prem installations do not require authentication. If authentication is enabled, all requests require a bearer token in the header:

```
Authorization: Bearer {your_api_token}
```
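If authentication is enabled, the header can be built once and reused on every request (the token value below is a placeholder):

```python
API_TOKEN = "YOUR_TOKEN"  # placeholder; substitute your real token

# The same headers dict is passed to every API call
headers = {"Authorization": f"Bearer {API_TOKEN}"}
```

Pass `headers=headers` to each `requests.get` call, as the Python client below does.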
📥 Textual Caption Search
Endpoint
GET /api/v1/explore/{dataset_id}
Query Parameters

| Name | Type | Description |
|---|---|---|
| `image_caption` | string | The natural language caption query |
| `threshold` | integer | Clustering similarity threshold (0-4) |
| `entity_type` | string | Either `IMAGES` or `OBJECTS` |
| `textual_similarity_threshold` | float | Optional. Sets the minimum semantic match quality for caption searches. Higher values (closer to 1.0) return only highly relevant results, while lower values include more varied matches. This filters results based on the relevance score (0.0-1.0) |
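As a sketch, the parameters above can be assembled with the standard library; the dataset ID is the one from the cURL example below, and `quote_via=quote` keeps spaces as `%20` to match it:

```python
from urllib.parse import quote, urlencode

DATASET_ID = "95233006-eddc-11ef-b303-76dbc3993eb2"  # example dataset ID

params = {
    "image_caption": "beach sunset",
    "threshold": 0,
    "entity_type": "IMAGES",
    "textual_similarity_threshold": 0.8,  # optional; omit for unfiltered results
}
query = urlencode(params, quote_via=quote)
url = f"https://app.visual-layer.com/api/v1/explore/{DATASET_ID}?{query}"
```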
🧪 Example (cURL)
```shell
curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://app.visual-layer.com/api/v1/explore/95233006-eddc-11ef-b303-76dbc3993eb2?threshold=0&image_caption=beach%20sunset&entity_type=IMAGES"
```
📦 Response (Simplified)
```json
{
  "clusters": [
    {
      "id": "...",
      "entity_type": "IMAGES",
      "size": 4,
      "representative_preview": "...",
      "media": [
        {
          "id": "...",
          "preview": "...",
          "filename": "sunset_beach_aerial.jpg",
          "caption": "Golden sun setting over the ocean waves",
          "relevance_score": 0.92
        }
      ]
    }
  ],
  "total_pages": 1,
  "current_page": 0
}
```
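The clusters in a response shaped like the JSON above can be flattened and ranked client-side; a minimal sketch (the filenames and scores here are made up for illustration):

```python
# A response shaped like the simplified JSON above, with hypothetical items
response = {
    "clusters": [
        {"media": [
            {"filename": "a.jpg", "relevance_score": 0.71},
            {"filename": "b.jpg", "relevance_score": 0.92},
        ]}
    ],
    "total_pages": 1,
}

# Flatten every cluster's media list, then rank by semantic score
items = [m for c in response["clusters"] for m in c.get("media", [])]
items.sort(key=lambda m: m["relevance_score"], reverse=True)
```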
🧠 Understanding the Response
- `clusters`: Groups of visually similar images
- `media`: Media items that match your query
- `caption`: Matched image description
- `relevance_score`: Semantic similarity score (0.0-1.0)
- `preview`: Image preview URL

Results are sorted by `relevance_score`, most relevant first.
🧪 Filtering and Refinement
Set a Minimum Similarity Threshold

```
...&textual_similarity_threshold=0.85
```
Combine with Labels and Tags

```
...&labels=[beach,sunset]&tags=7b89e36c-c2d1-4af9-9d23-f74e018e67c5
```
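Mirroring the bracketed list format in the URL fragment above, the filters can be built as a params dict (a sketch; the formats follow these examples, not a verified schema):

```python
labels = ["beach", "sunset"]
params = {
    "image_caption": "beach sunset",
    "textual_similarity_threshold": 0.85,
    "labels": f"[{','.join(labels)}]",  # -> "[beach,sunset]"
    "tags": "7b89e36c-c2d1-4af9-9d23-f74e018e67c5",
}
```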
🐍 Python Client
```python
import requests
from typing import Dict, List, Optional


class VisualLayerClient:
    def __init__(self, api_url: str, api_token: str):
        self.api_url = api_url
        self.headers = {'Authorization': f'Bearer {api_token}'}

    def search_by_caption(self, dataset_id: str, caption_query: str, threshold: int = 0,
                          entity_type: str = 'IMAGES',
                          textual_similarity_threshold: Optional[float] = None,
                          page_number: int = 0,
                          labels: Optional[List[str]] = None,
                          tags: Optional[List[str]] = None) -> Dict:
        params = {
            'image_caption': caption_query,
            'threshold': threshold,
            'entity_type': entity_type,
            'page_number': page_number
        }
        if textual_similarity_threshold is not None:
            params['textual_similarity_threshold'] = textual_similarity_threshold
        if labels:
            params['labels'] = f"[{','.join(labels)}]"
        if tags:
            params['tags'] = "untagged" if tags == ["untagged"] else f"[{','.join(tags)}]"
        response = requests.get(f"{self.api_url}/api/v1/explore/{dataset_id}",
                                headers=self.headers, params=params)
        response.raise_for_status()
        return response.json()

    def extract_media_info(self, results: Dict) -> List[Dict]:
        media_items = []
        for cluster in results.get('clusters', []):
            # Items may appear under 'previews' or under 'media' (as in the
            # simplified response above); handle both field spellings
            for item in cluster.get('previews') or cluster.get('media', []):
                media_items.append({
                    'id': item.get('media_id') or item.get('id'),
                    'preview_url': item.get('media_uri') or item.get('preview'),
                    'caption': item.get('caption'),
                    'relevance_score': item.get('relevance_score'),
                    'label': item.get('label')
                })
        return media_items
```
Example Usage
```python
client = VisualLayerClient(API_URL, API_TOKEN)
results = client.search_by_caption(dataset_id=DATASET_ID, caption_query="beach sunset",
                                   textual_similarity_threshold=0.75)
media = client.extract_media_info(results)
```
🚀 Advanced Techniques
Phrase Matching

```
image_caption="\"golden sunset\""
```
Compound Queries

```
image_caption=beach%20sunset%20without%20people
```
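Both forms must survive URL encoding; a sketch with the standard library's `urllib.parse.quote`:

```python
from urllib.parse import quote

exact = quote('"golden sunset"')  # quoted phrase -> %22golden%20sunset%22
loose = quote("beach sunset")     # plain keywords -> beach%20sunset
```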
✅ Best Practices
- Use specific queries for higher precision
- Set a similarity threshold to filter weaker matches
- Combine filters like tags and labels for refinement
- Use `relevance_score` to sort or prioritize results
- Visual clusters ≠ semantic groups; always use relevance for ranking
📄 Pagination Helper
```python
def get_all_caption_matches(client, dataset_id, caption_query, min_similarity=0.75):
    all_media = []
    page = 0
    total_pages = 1
    while page < total_pages:
        results = client.search_by_caption(dataset_id, caption_query,
                                           textual_similarity_threshold=min_similarity,
                                           page_number=page)
        all_media.extend(client.extract_media_info(results))
        total_pages = results.get('total_pages', 1)
        page += 1
    return all_media
```
🚨 Error Handling
```python
import random
import time

import requests


def search_with_retries(client, dataset_id, caption_query, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.search_by_caption(dataset_id, caption_query)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                # Rate limited: back off exponentially with jitter, then retry
                time.sleep((2 ** attempt) + random.uniform(0, 1))
            elif e.response.status_code == 401:
                raise Exception("Check your API token.")
            elif e.response.status_code == 404:
                raise Exception("Dataset not found.")
            else:
                raise
    raise Exception(f"Still rate limited after {max_retries} retries.")
```
⚠️ Limitations
- Maximum of 100 results per page
- Thresholds must be ≥ 0.5
- Caption quality affects match accuracy
- Complex queries may yield unexpected results