How This Helps

Incrementally add new media (images or videos) to an already-indexed dataset without re-processing everything from scratch. New media is processed as an independent batch and becomes immediately visible. When ready, trigger a full reindex to re-cluster the entire dataset.

Prerequisites

Before calling the Add Media API, ensure:
  1. Dataset status is READY or PARTIAL INDEX
  2. Dataset has an embedding config — the dataset must have been indexed at least once with an embedding model so new media uses the same model for consistency
  3. Authentication — you need a valid JWT token or session cookie (see Authentication)
You can verify your dataset status using the Retrieve Dataset Status endpoint before attempting to add media.
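As a minimal sketch (assuming the `requests` library; the function names here are illustrative, not part of the API), the prerequisite check can be scripted against the Retrieve Dataset Status endpoint:

```python
import requests

# Statuses that allow add_media (per the prerequisites above)
ALLOWED = {"READY", "PARTIAL INDEX"}

def status_allows_add_media(status_new: str) -> bool:
    """Pure check: can a dataset in this status accept an add_media call?"""
    return status_new in ALLOWED

def can_add_media(base_url: str, dataset_id: str, jwt: str) -> bool:
    """Fetch the dataset and gate on its status_new field."""
    resp = requests.get(
        f"{base_url}/api/v1/dataset/{dataset_id}",
        headers={"Authorization": f"Bearer {jwt}"},
        timeout=30,
    )
    resp.raise_for_status()
    return status_allows_add_media(resp.json()["status_new"])
```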

API Endpoint

POST /api/v1/dataset/{dataset_id}/add_media
Content-Type: multipart/form-data
Authorization: Bearer <jwt>

Media Sources

Exactly one media source must be provided per request:
| Source | Form Field | Description |
|---|---|---|
| S3 Folder (Recommended) | `s3_uri` | Path to an S3 bucket folder containing media files |
| S3 Manifest | `s3_uri` | Path to a `.csv`, `.parquet`, or `.txt` manifest file listing media |
| Direct Upload | `files[]` | One or more files uploaded via multipart form |
| Archive Upload | `archive` | A single `.zip`, `.tar`, or `.tar.gz` archive |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `auto_reindex` | boolean | `false` | Automatically run a full reindex after the partial update completes |
| `assume_role` | string | `null` | AWS IAM role ARN to assume for cross-account S3 access |
| `batch_n_videos` | integer | `null` | Override auto-calculated video count (for resource allocation) |
| `batch_n_images` | integer | `null` | Override auto-calculated image count (for resource allocation) |
| `batch_n_objects` | integer | `null` | Override auto-calculated object count (for resource allocation) |
| `use_spot` | boolean | `null` | Allow pod scheduling on spot instances (cloud only) |

For production workflows, using an S3 folder is the recommended approach. Point s3_uri to a folder in your S3 bucket containing images and/or videos.
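As a sketch of the S3 workflow with optional parameters (assuming the `requests` library; `build_form` and `add_media` are illustrative helper names), booleans are sent as lowercase strings to match the multipart form encoding used by the curl examples below:

```python
import requests

BASE_URL = "https://app.visual-layer.com"  # adjust for your environment

def build_form(s3_uri=None, auto_reindex=None, assume_role=None, use_spot=None):
    """Build the multipart form fields for an S3-based add_media call.

    Optional parameters are only included when set; booleans become
    lowercase strings ("true"/"false").
    """
    if not s3_uri:
        raise ValueError("a media source is required (s3_uri in this sketch)")
    form = {"s3_uri": s3_uri}
    for key, value in [("auto_reindex", auto_reindex),
                       ("assume_role", assume_role),
                       ("use_spot", use_spot)]:
        if value is not None:
            form[key] = str(value).lower() if isinstance(value, bool) else value
    return form

def add_media(dataset_id, jwt, **kwargs):
    """POST the form to add_media; 202 means the pipeline was triggered."""
    resp = requests.post(
        f"{BASE_URL}/api/v1/dataset/{dataset_id}/add_media",
        headers={"Authorization": f"Bearer {jwt}"},
        data=build_form(**kwargs),
        timeout=60,
    )
    return resp.status_code
```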

Step 1: Identify your dataset ID

Find your dataset ID from the Visual Layer UI (it’s in the browser URL when viewing a dataset: https://app.visual-layer.com/dataset/<dataset_id>/data), or list your datasets via the API:
curl -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/datasets

Step 2: Verify dataset is ready

curl -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>
Confirm the response shows "status_new": "READY" or "status_new": "PARTIAL INDEX".

Step 3: Add media from S3

curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://your-bucket/path/to/media/folder/" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media
A successful request returns HTTP 202 Accepted with an empty body. The processing runs asynchronously in the background.

Working Example

The following example was tested against a live Visual Layer Cloud environment and demonstrates adding 5 TikTok videos from an S3 bucket to an existing dataset.

Add videos from S3

curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://visual-layer/datasets/tiktok_users/amit46473/" \
  https://app.visual-layer.com/api/v1/dataset/54c51218-db7a-11f0-b8bd-ea1ec7478511/add_media
Response: HTTP 202 Accepted (empty body)

Poll dataset status

After submitting, the dataset transitions to UPDATING / READ ONLY while processing:
curl -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/dataset/54c51218-db7a-11f0-b8bd-ea1ec7478511
{
  "id": "54c51218-db7a-11f0-b8bd-ea1ec7478511",
  "display_name": "tomer1344",
  "status": "UPDATING",
  "status_new": "READ ONLY",
  "n_images": 17,
  "n_videos": 31,
  "progress": 100
}
Once processing completes, the dataset moves to PARTIAL INDEX, indicating new media has been added but the dataset has not yet been re-clustered.

Add Media with Auto-Reindex

If you want the dataset to be fully re-clustered automatically after the new media is processed, set auto_reindex=true:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://your-bucket/path/to/media/" \
  -F "auto_reindex=true" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media
This runs a combined workflow: partial update (process new media) followed by reindex (re-cluster everything). The dataset returns to READY when complete.
Use auto_reindex=true when you want a single API call to handle everything. Omit it if you plan to add multiple batches before re-clustering.
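The multi-batch workflow can be sketched as a plan of API calls: several `add_media` submissions followed by a single `reindex` (assuming the `requests` library; `plan_batched_update` and `run_plan` are illustrative names, and in practice you would poll the dataset back to PARTIAL INDEX between batches):

```python
import requests

BASE_URL = "https://app.visual-layer.com"

def plan_batched_update(dataset_id, s3_folders):
    """Return the ordered (method, path, form) calls for a multi-batch
    update followed by a single re-cluster at the end."""
    calls = [
        ("POST", f"/api/v1/dataset/{dataset_id}/add_media", {"s3_uri": folder})
        for folder in s3_folders
    ]
    calls.append(("POST", f"/api/v1/dataset/{dataset_id}/reindex", None))
    return calls

def run_plan(jwt, calls):
    """Execute the plan; each call should return 202 Accepted.
    Poll the dataset status between add_media calls before proceeding."""
    headers = {"Authorization": f"Bearer {jwt}"}
    for method, path, form in calls:
        resp = requests.request(method, BASE_URL + path,
                                headers=headers, data=form, timeout=60)
        resp.raise_for_status()
```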

Manual Reindex

If you added media without auto_reindex, the dataset enters PARTIAL INDEX status. You can trigger a manual reindex when ready:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/reindex
Response: HTTP 202 Accepted
The reindex endpoint only accepts datasets in PARTIAL INDEX status. If the dataset is still processing (READ ONLY), wait for it to finish before triggering reindex.
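A sketch of that wait-then-reindex logic (assuming the `requests` library; `next_action` and `reindex_when_ready` are illustrative names):

```python
import time
import requests

def next_action(status_new: str) -> str:
    """Map status_new to what the client should do next."""
    if status_new == "PARTIAL INDEX":
        return "reindex"
    if status_new == "READ ONLY":
        return "wait"
    return "stop"  # READY (nothing to reindex), ERROR, etc.

def reindex_when_ready(base_url, dataset_id, jwt, poll_seconds=30):
    """Wait out the READ ONLY processing phase, then trigger reindex."""
    headers = {"Authorization": f"Bearer {jwt}"}
    while True:
        status = requests.get(f"{base_url}/api/v1/dataset/{dataset_id}",
                              headers=headers, timeout=30).json()["status_new"]
        action = next_action(status)
        if action == "wait":
            time.sleep(poll_seconds)
            continue
        if action == "reindex":
            resp = requests.post(f"{base_url}/api/v1/dataset/{dataset_id}/reindex",
                                 headers=headers, timeout=60)
            return resp.status_code  # 202 on success
        return None  # nothing to reindex, or processing failed
```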

Alternative: Direct File Upload

Upload individual files directly via multipart form:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "files[]=@/path/to/image1.jpg" \
  -F "files[]=@/path/to/image2.jpg" \
  -F "files[]=@/path/to/video.mp4" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media
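The same upload can be done from Python with the `requests` library's multipart support (a sketch; `form_fields` and `upload_files` are illustrative names):

```python
import requests
from pathlib import Path

def form_fields(paths):
    """The (field, filename) pairs the multipart request will send."""
    return [("files[]", Path(p).name) for p in paths]

def upload_files(base_url, dataset_id, jwt, paths):
    """Upload local media files via the files[] multipart field."""
    handles = [open(p, "rb") for p in paths]
    try:
        files = [("files[]", (Path(p).name, fh))
                 for p, fh in zip(paths, handles)]
        resp = requests.post(
            f"{base_url}/api/v1/dataset/{dataset_id}/add_media",
            headers={"Authorization": f"Bearer {jwt}"},
            files=files,
            timeout=300,
        )
        return resp.status_code  # 202 on success
    finally:
        for fh in handles:
            fh.close()
```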

Alternative: Archive Upload

Upload a single archive file:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "archive=@/path/to/media.zip" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media

Cross-Account S3 Access

If your S3 data is in a different AWS account, use the assume_role parameter:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://external-bucket/images/" \
  -F "assume_role=arn:aws:iam::123456789012:role/VisualLayerAccess" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media

Python Example

import requests

VL_BASE_URL = "https://app.visual-layer.com"
JWT_TOKEN = "<your-jwt-token>"
DATASET_ID = "<your-dataset-id>"

headers = {"Authorization": f"Bearer {JWT_TOKEN}"}

# Step 1: Verify dataset is READY
resp = requests.get(
    f"{VL_BASE_URL}/api/v1/dataset/{DATASET_ID}",
    headers=headers,
)
dataset = resp.json()
print(f"Dataset: {dataset['display_name']}, Status: {dataset['status_new']}")

if dataset["status_new"] not in ("READY", "PARTIAL INDEX"):
    print(f"Dataset is {dataset['status_new']} — cannot add media.")
    exit(1)

# Step 2: Add media from S3
resp = requests.post(
    f"{VL_BASE_URL}/api/v1/dataset/{DATASET_ID}/add_media",
    headers=headers,
    data={"s3_uri": "s3://your-bucket/path/to/media/"},
)

if resp.status_code == 202:
    print("Add media accepted — processing in background.")
else:
    print(f"Error: {resp.status_code} {resp.text}")
    exit(1)

# Step 3: Poll status until processing completes
import time

while True:
    resp = requests.get(
        f"{VL_BASE_URL}/api/v1/dataset/{DATASET_ID}",
        headers=headers,
    )
    status_new = resp.json()["status_new"]
    print(f"  Status: {status_new}")
    if status_new in ("READY", "PARTIAL INDEX", "ERROR"):
        break
    time.sleep(30)

print(f"Final status: {status_new}")

Response Codes

Add Media (POST /api/v1/dataset/{dataset_id}/add_media)

| HTTP Code | Status | Meaning | Common Cause |
|---|---|---|---|
| 202 | Accepted | Processing started successfully | Request valid, pipeline triggered asynchronously |
| 400 | Bad Request | Invalid request parameters | Missing embedding config, invalid S3 URI, no media source provided, or multiple media sources in one request |
| 403 | Forbidden | Feature disabled or insufficient permissions | `ADD_MEDIA_ENABLED` is false, or user lacks write access to the dataset |
| 404 | Not Found | Dataset not found | Dataset does not exist, or the authenticated user does not have access to it |
| 409 | Conflict | Dataset state incompatible | Dataset status is not READY or PARTIAL INDEX, or another add media / reindex operation is already running |
| 500 | Internal Server Error | Server-side failure | File upload to S3 failed, or pipeline trigger failed |

Reindex (POST /api/v1/dataset/{dataset_id}/reindex)

| HTTP Code | Status | Meaning | Common Cause |
|---|---|---|---|
| 202 | Accepted | Reindex started successfully | Request valid, reindex pipeline triggered |
| 404 | Not Found | Dataset not found | Dataset does not exist or user lacks access |
| 409 | Conflict | Dataset state incompatible | Dataset status is not PARTIAL INDEX |
| 500 | Internal Server Error | Server-side failure | Pipeline trigger failed |

Error Response Format

Error responses return a JSON body with a detail field:
{
  "detail": "Operation 'add_media' is blocked while MEDIA_ADDITION task is running"
}
Common error messages:
| Error Message | HTTP Code | What to Do |
|---|---|---|
| `"Dataset not found"` | 404 | Check the dataset ID and your access permissions |
| `"Dataset {id} has no embedding_config..."` | 400 | The dataset must be fully indexed at least once before adding media |
| `"Exactly one media source must be provided: files[], s3_uri, or archive"` | 400 | Provide exactly one media source per request |
| `"Invalid S3 URI or no media files found at {uri}"` | 400 | Check that the S3 path exists and contains supported media files |
| `"Operation 'add_media' is blocked while MEDIA_ADDITION task is running"` | 409 | Wait for the current operation to finish before starting another |
| `"Add media feature is not enabled"` | 403 | Contact your administrator to enable the add media feature |
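A small client-side triage helper can map these responses to next steps (a sketch based on the codes above; `explain_error` is an illustrative name, not part of the API):

```python
def explain_error(status_code, body):
    """Suggest a next step from an add_media error response (sketch).

    `body` is the parsed JSON error body with a `detail` field, or None.
    """
    detail = (body or {}).get("detail", "")
    if status_code == 409:
        return "Wait for the running operation to finish, then retry."
    if status_code == 404:
        return "Check the dataset ID and your access permissions."
    if status_code == 403:
        return "Ask an administrator to enable the add media feature."
    if status_code == 400 and "embedding_config" in detail:
        return "Fully index the dataset once before adding media."
    if status_code == 400:
        return "Check the request: exactly one media source and a valid S3 URI."
    return f"Unexpected error ({status_code}): {detail}"
```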

Dataset Status Flow

After calling add media, the dataset goes through these status transitions:
READY ──[add_media]──> UPDATING (READ ONLY) ──> PARTIAL INDEX ──[reindex]──> INDEXING ──> READY
| Phase | `status` | `status_new` | What’s Happening |
|---|---|---|---|
| Before | READY | READY | Dataset is idle, ready for operations |
| Processing | UPDATING | READ ONLY | New media is being processed (browsing still works) |
| Awaiting Reindex | PENDING_INDEX | PARTIAL INDEX | New media is visible but dataset is not re-clustered |
| Reindexing | INDEXING | INDEXING | Full re-clustering in progress |
| Complete | READY | READY | All media processed and clustered |
With auto_reindex=true, the flow goes directly from UPDATING through reindexing back to READY without stopping at PARTIAL INDEX.

On-Premises (Docker Compose)

For on-premises installations, use the pipeline service endpoint:
curl "http://localhost:2080/api/v1/process/add_media?dataset_id=<dataset_id>&path=/data/new_images"
Or use the CLI tool:
# Add media without auto-reindex
./run_profiler.sh -o add_media -d <dataset_id> -p /path/to/new/media

# Add media with auto-reindex
./run_profiler.sh -o add_media -d <dataset_id> -p /path/to/new/media -a

# Manual reindex later
./run_profiler.sh -o reindex -d <dataset_id>
On-premises add media uses local file paths instead of S3 URIs. The path parameter must be an absolute path accessible from the pipeline container.