🔍 Visual Layer API: Textual Caption Search
Use the Visual Layer Exploration API to search your image dataset using natural language captions. This feature leverages semantic understanding to return images whose captions match your intent, not just your keywords.
🧠 Overview
The `GET /api/v1/explore/{dataset_id}` endpoint, combined with the `image_caption` parameter, enables caption keyword search.
Example: a search for "beach sunset" may return captions like:
- "A man at the beach on sunset"
- "Many sunsets and beaches"
Adding quotation marks makes a search more specific:
- A search for red car returns any image of a car containing something red (red headlights, red paint patterns on the car)
- A search for "red car" returns only images containing a red car
Each result includes a relevance score indicating how well the caption semantically matches your query.
🔐 Authentication
On-prem installations do not require authentication. If authentication is enabled, all requests require a bearer token in the header:

```
Authorization: Bearer {your_api_token}
```
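If authentication is enabled, the header can be built once and reused on every request (the token value below is a placeholder):

```python
API_TOKEN = "YOUR_TOKEN"  # placeholder; substitute your real token

# The same headers dict is passed to every API call
headers = {"Authorization": f"Bearer {API_TOKEN}"}
```

Pass `headers=headers` to each `requests.get` call, as the Python client below does.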
📥 Textual Caption Search
Endpoint
GET /api/v1/explore/{dataset_id}
Query Parameters

| Name | Type | Description |
|---|---|---|
| `image_caption` | string | The natural language caption query |
| `threshold` | integer | Clustering similarity threshold (0-4) |
| `entity_type` | string | Either `IMAGES` or `OBJECTS` |
| `textual_similarity_threshold` | float | Optional. Sets the minimum semantic match quality for caption searches. Higher values (closer to 1.0) return only highly relevant results, while lower values include more varied matches. This filters results based on the relevance score (0.0-1.0) |
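As a sketch, the parameters above can be assembled with the standard library; the dataset ID is the one from the cURL example below, and `quote_via=quote` keeps spaces as `%20` to match it:

```python
from urllib.parse import quote, urlencode

DATASET_ID = "95233006-eddc-11ef-b303-76dbc3993eb2"  # example dataset ID

params = {
    "image_caption": "beach sunset",
    "threshold": 0,
    "entity_type": "IMAGES",
    "textual_similarity_threshold": 0.8,  # optional; omit for unfiltered results
}
query = urlencode(params, quote_via=quote)
url = f"https://app.visual-layer.com/api/v1/explore/{DATASET_ID}?{query}"
```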
🧪 Example (cURL)
```shell
curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://app.visual-layer.com/api/v1/explore/95233006-eddc-11ef-b303-76dbc3993eb2?threshold=0&image_caption=beach%20sunset&entity_type=IMAGES"
```
📦 Response (Simplified)
```json
{
  "clusters": [
    {
      "id": "...",
      "entity_type": "IMAGES",
      "size": 4,
      "representative_preview": "...",
      "media": [
        {
          "id": "...",
          "preview": "...",
          "filename": "sunset_beach_aerial.jpg",
          "caption": "Golden sun setting over the ocean waves",
          "relevance_score": 0.92
        }
      ]
    }
  ],
  "total_pages": 1,
  "current_page": 0
}
```
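The clusters in a response shaped like the JSON above can be flattened and ranked client-side; a minimal sketch (the filenames and scores here are made up for illustration):

```python
# A response shaped like the simplified JSON above, with hypothetical items
response = {
    "clusters": [
        {"media": [
            {"filename": "a.jpg", "relevance_score": 0.71},
            {"filename": "b.jpg", "relevance_score": 0.92},
        ]}
    ],
    "total_pages": 1,
}

# Flatten every cluster's media list, then rank by semantic score
items = [m for c in response["clusters"] for m in c.get("media", [])]
items.sort(key=lambda m: m["relevance_score"], reverse=True)
```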
🧠 Understanding the Response
- `clusters`: Groups of visually similar images
- `media`: Media items that match your query
- `caption`: Matched image description
- `relevance_score`: Semantic similarity score (0.0-1.0)
- `preview`: Image preview URL

Results are sorted by `relevance_score`, most relevant first.
🧪 Filtering and Refinement
Set a Minimum Similarity Threshold

```
...&textual_similarity_threshold=0.85
```
Combine with Labels and Tags

```
...&labels=[beach,sunset]&tags=7b89e36c-c2d1-4af9-9d23-f74e018e67c5
```
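Mirroring the bracketed list format in the URL fragment above, the filters can be built as a params dict (a sketch; the formats follow these examples, not a verified schema):

```python
labels = ["beach", "sunset"]
params = {
    "image_caption": "beach sunset",
    "textual_similarity_threshold": 0.85,
    "labels": f"[{','.join(labels)}]",  # -> "[beach,sunset]"
    "tags": "7b89e36c-c2d1-4af9-9d23-f74e018e67c5",
}
```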
🐍 Python Client
```python
import requests
from typing import Dict, List, Optional


class VisualLayerClient:
    def __init__(self, api_url: str, api_token: str):
        self.api_url = api_url
        self.headers = {'Authorization': f'Bearer {api_token}'}

    def search_by_caption(self, dataset_id: str, caption_query: str, threshold: int = 0,
                          entity_type: str = 'IMAGES',
                          textual_similarity_threshold: Optional[float] = None,
                          page_number: int = 0,
                          labels: Optional[List[str]] = None,
                          tags: Optional[List[str]] = None) -> Dict:
        params = {
            'image_caption': caption_query,
            'threshold': threshold,
            'entity_type': entity_type,
            'page_number': page_number
        }
        if textual_similarity_threshold is not None:
            params['textual_similarity_threshold'] = textual_similarity_threshold
        if labels:
            params['labels'] = f"[{','.join(labels)}]"
        if tags:
            params['tags'] = "untagged" if tags == ["untagged"] else f"[{','.join(tags)}]"
        response = requests.get(f"{self.api_url}/api/v1/explore/{dataset_id}",
                                headers=self.headers, params=params)
        response.raise_for_status()
        return response.json()

    def extract_media_info(self, results: Dict) -> List[Dict]:
        media_items = []
        for cluster in results.get('clusters', []):
            # Items may appear under 'previews' or under 'media' (as in the
            # simplified response above); handle both field spellings
            for item in cluster.get('previews') or cluster.get('media', []):
                media_items.append({
                    'id': item.get('media_id') or item.get('id'),
                    'preview_url': item.get('media_uri') or item.get('preview'),
                    'caption': item.get('caption'),
                    'relevance_score': item.get('relevance_score'),
                    'label': item.get('label')
                })
        return media_items
```
Example Usage
```python
client = VisualLayerClient(API_URL, API_TOKEN)
results = client.search_by_caption(dataset_id=DATASET_ID, caption_query="beach sunset",
                                   textual_similarity_threshold=0.75)
media = client.extract_media_info(results)
```
🚀 Advanced Techniques
Phrase Matching

```
image_caption="\"golden sunset\""
```
Compound Queries

```
image_caption=beach%20sunset%20without%20people
```
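Both forms must survive URL encoding; a sketch with the standard library's `urllib.parse.quote`:

```python
from urllib.parse import quote

exact = quote('"golden sunset"')  # quoted phrase -> %22golden%20sunset%22
loose = quote("beach sunset")     # plain keywords -> beach%20sunset
```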
✅ Best Practices
- Use specific queries for higher precision
- Set a similarity threshold to filter weaker matches
- Combine filters like tags and labels for refinement
- Use `relevance_score` to sort or prioritize results
- Visual clusters ≠ semantic groups; always use relevance for ranking
📄 Pagination Helper
```python
def get_all_caption_matches(client, dataset_id, caption_query, min_similarity=0.75):
    all_media = []
    page = 0
    total_pages = 1
    while page < total_pages:
        results = client.search_by_caption(dataset_id, caption_query,
                                           textual_similarity_threshold=min_similarity,
                                           page_number=page)
        all_media.extend(client.extract_media_info(results))
        total_pages = results.get('total_pages', 1)
        page += 1
    return all_media
```
🚨 Error Handling
```python
import random
import time

import requests


def search_with_retries(client, dataset_id, caption_query, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.search_by_caption(dataset_id, caption_query)
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                # Rate limited: back off exponentially with jitter, then retry
                time.sleep((2 ** attempt) + random.uniform(0, 1))
            elif e.response.status_code == 401:
                raise Exception("Check your API token.")
            elif e.response.status_code == 404:
                raise Exception("Dataset not found.")
            else:
                raise
    raise Exception(f"Still rate limited after {max_retries} retries.")
```
⚠️ Limitations
- Maximum of 100 results per page
- Thresholds must be ≥ 0.5
- Caption quality affects match accuracy
- Complex queries may yield unexpected results