How This Helps

Incrementally add new media (images or videos) to an already-indexed dataset without re-processing everything from scratch. New media is processed as an independent batch and becomes immediately visible. When ready, trigger a full reindex to re-cluster the entire dataset.

Prerequisites

Before calling the Add Media API, ensure:
  1. Dataset status is READY or PARTIAL INDEX
  2. Dataset has an embedding config — the dataset must have been indexed at least once with an embedding model so new media uses the same model for consistency
  3. Authentication — you need a valid JWT token or session cookie (see Authentication)
You can verify your dataset status using the Retrieve Dataset Status endpoint before attempting to add media.
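As a minimal sketch (assuming the `requests` library; the function names here are illustrative, not part of the API), the prerequisite check can be scripted against the Retrieve Dataset Status endpoint:

```python
import requests

# Statuses that allow add_media (per the prerequisites above)
ALLOWED = {"READY", "PARTIAL INDEX"}

def status_allows_add_media(status_new: str) -> bool:
    """Pure check: can a dataset in this status accept an add_media call?"""
    return status_new in ALLOWED

def can_add_media(base_url: str, dataset_id: str, jwt: str) -> bool:
    """Fetch the dataset and gate on its status_new field."""
    resp = requests.get(
        f"{base_url}/api/v1/dataset/{dataset_id}",
        headers={"Authorization": f"Bearer {jwt}"},
        timeout=30,
    )
    resp.raise_for_status()
    return status_allows_add_media(resp.json()["status_new"])
```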

API Endpoint

POST /api/v1/dataset/{dataset_id}/add_media
Content-Type: multipart/form-data
Authorization: Bearer <jwt>

Media Sources

Exactly one media source must be provided per request:
| Source | Form Field | Description |
|---|---|---|
| S3 Folder (Recommended) | `s3_uri` | Path to an S3 bucket folder containing media files |
| S3 Manifest | `s3_uri` | Path to a `.csv`, `.parquet`, or `.txt` manifest file listing media |
| Direct Upload | `files[]` | One or more files uploaded via multipart form |
| Archive Upload | `archive` | A single `.zip`, `.tar`, or `.tar.gz` archive |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `auto_reindex` | boolean | `false` | Automatically run a full reindex after the partial update completes |
| `assume_role` | string | `null` | AWS IAM role ARN to assume for cross-account S3 access |
| `batch_n_videos` | integer | `null` | Override auto-calculated video count (for resource allocation) |
| `batch_n_images` | integer | `null` | Override auto-calculated image count (for resource allocation) |
| `batch_n_objects` | integer | `null` | Override auto-calculated object count (for resource allocation) |
| `use_spot` | boolean | `null` | Allow pod scheduling on spot instances (cloud only) |

For production workflows, using an S3 folder is the recommended approach. Point s3_uri to a folder in your S3 bucket containing images and/or videos.
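As a sketch of the S3 workflow with optional parameters (assuming the `requests` library; `build_form` and `add_media` are illustrative helper names), booleans are sent as lowercase strings to match the multipart form encoding used by the curl examples below:

```python
import requests

BASE_URL = "https://app.visual-layer.com"  # adjust for your environment

def build_form(s3_uri=None, auto_reindex=None, assume_role=None, use_spot=None):
    """Build the multipart form fields for an S3-based add_media call.

    Optional parameters are only included when set; booleans become
    lowercase strings ("true"/"false").
    """
    if not s3_uri:
        raise ValueError("a media source is required (s3_uri in this sketch)")
    form = {"s3_uri": s3_uri}
    for key, value in [("auto_reindex", auto_reindex),
                       ("assume_role", assume_role),
                       ("use_spot", use_spot)]:
        if value is not None:
            form[key] = str(value).lower() if isinstance(value, bool) else value
    return form

def add_media(dataset_id, jwt, **kwargs):
    """POST the form to add_media; 202 means the pipeline was triggered."""
    resp = requests.post(
        f"{BASE_URL}/api/v1/dataset/{dataset_id}/add_media",
        headers={"Authorization": f"Bearer {jwt}"},
        data=build_form(**kwargs),
        timeout=60,
    )
    return resp.status_code
```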

Step 1: Identify your dataset ID

Find your dataset ID from the Visual Layer UI (it’s in the browser URL when viewing a dataset: https://app.visual-layer.com/dataset/<dataset_id>/data), or list your datasets via the API:
curl -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/datasets

Step 2: Verify dataset is ready

curl -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>
Confirm the response shows "status_new": "READY" or "status_new": "PARTIAL INDEX".

Step 3: Add media from S3

curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://your-bucket/path/to/media/folder/" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media
A successful request returns HTTP 202 Accepted with an empty body. The processing runs asynchronously in the background.

Working Example

The following example was tested against a live Visual Layer Cloud environment and demonstrates adding 5 TikTok videos from an S3 bucket to an existing dataset.

Add videos from S3

curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://visual-layer/datasets/tiktok_users/amit46473/" \
  https://app.visual-layer.com/api/v1/dataset/54c51218-db7a-11f0-b8bd-ea1ec7478511/add_media
Response: HTTP 202 Accepted (empty body)

Poll dataset status

After submitting, the dataset transitions to UPDATING / READ ONLY while processing:
curl -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/dataset/54c51218-db7a-11f0-b8bd-ea1ec7478511
{
  "id": "54c51218-db7a-11f0-b8bd-ea1ec7478511",
  "display_name": "tomer1344",
  "status": "UPDATING",
  "status_new": "READ ONLY",
  "n_images": 17,
  "n_videos": 31,
  "progress": 100
}
Once processing completes, the dataset moves to PARTIAL INDEX, indicating new media has been added but the dataset has not yet been re-clustered.

Add Media with Auto-Reindex

If you want the dataset to be fully re-clustered automatically after the new media is processed, set auto_reindex=true:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://your-bucket/path/to/media/" \
  -F "auto_reindex=true" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media
This runs a combined workflow: partial update (process new media) followed by reindex (re-cluster everything). The dataset returns to READY when complete.
Use auto_reindex=true when you want a single API call to handle everything. Omit it if you plan to add multiple batches before re-clustering.
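The multi-batch workflow can be sketched as a plan of API calls: several `add_media` submissions followed by a single `reindex` (assuming the `requests` library; `plan_batched_update` and `run_plan` are illustrative names, and in practice you would poll the dataset back to PARTIAL INDEX between batches):

```python
import requests

BASE_URL = "https://app.visual-layer.com"

def plan_batched_update(dataset_id, s3_folders):
    """Return the ordered (method, path, form) calls for a multi-batch
    update followed by a single re-cluster at the end."""
    calls = [
        ("POST", f"/api/v1/dataset/{dataset_id}/add_media", {"s3_uri": folder})
        for folder in s3_folders
    ]
    calls.append(("POST", f"/api/v1/dataset/{dataset_id}/reindex", None))
    return calls

def run_plan(jwt, calls):
    """Execute the plan; each call should return 202 Accepted.
    Poll the dataset status between add_media calls before proceeding."""
    headers = {"Authorization": f"Bearer {jwt}"}
    for method, path, form in calls:
        resp = requests.request(method, BASE_URL + path,
                                headers=headers, data=form, timeout=60)
        resp.raise_for_status()
```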

Manual Reindex

If you added media without auto_reindex, the dataset enters PARTIAL INDEX status. You can trigger a manual reindex when ready:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/reindex
Response: HTTP 202 Accepted
The reindex endpoint only accepts datasets in PARTIAL INDEX status. If the dataset is still processing (READ ONLY), wait for it to finish before triggering reindex.
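A sketch of that wait-then-reindex logic (assuming the `requests` library; `next_action` and `reindex_when_ready` are illustrative names):

```python
import time
import requests

def next_action(status_new: str) -> str:
    """Map status_new to what the client should do next."""
    if status_new == "PARTIAL INDEX":
        return "reindex"
    if status_new == "READ ONLY":
        return "wait"
    return "stop"  # READY (nothing to reindex), ERROR, etc.

def reindex_when_ready(base_url, dataset_id, jwt, poll_seconds=30):
    """Wait out the READ ONLY processing phase, then trigger reindex."""
    headers = {"Authorization": f"Bearer {jwt}"}
    while True:
        status = requests.get(f"{base_url}/api/v1/dataset/{dataset_id}",
                              headers=headers, timeout=30).json()["status_new"]
        action = next_action(status)
        if action == "wait":
            time.sleep(poll_seconds)
            continue
        if action == "reindex":
            resp = requests.post(f"{base_url}/api/v1/dataset/{dataset_id}/reindex",
                                 headers=headers, timeout=60)
            return resp.status_code  # 202 on success
        return None  # nothing to reindex, or processing failed
```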

Alternative: Direct File Upload

Upload individual files directly via multipart form:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "files[]=@/path/to/image1.jpg" \
  -F "files[]=@/path/to/image2.jpg" \
  -F "files[]=@/path/to/video.mp4" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media
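The same upload can be done from Python with the `requests` library's multipart support (a sketch; `form_fields` and `upload_files` are illustrative names):

```python
import requests
from pathlib import Path

def form_fields(paths):
    """The (field, filename) pairs the multipart request will send."""
    return [("files[]", Path(p).name) for p in paths]

def upload_files(base_url, dataset_id, jwt, paths):
    """Upload local media files via the files[] multipart field."""
    handles = [open(p, "rb") for p in paths]
    try:
        files = [("files[]", (Path(p).name, fh))
                 for p, fh in zip(paths, handles)]
        resp = requests.post(
            f"{base_url}/api/v1/dataset/{dataset_id}/add_media",
            headers={"Authorization": f"Bearer {jwt}"},
            files=files,
            timeout=300,
        )
        return resp.status_code  # 202 on success
    finally:
        for fh in handles:
            fh.close()
```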

Alternative: Archive Upload

Upload a single archive file:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "archive=@/path/to/media.zip" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media

Cross-Account S3 Access

If your S3 data is in a different AWS account, use the assume_role parameter:
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "s3_uri=s3://external-bucket/images/" \
  -F "assume_role=arn:aws:iam::123456789012:role/VisualLayerAccess" \
  https://app.visual-layer.com/api/v1/dataset/<dataset_id>/add_media

Python Example

import requests

VL_BASE_URL = "https://app.visual-layer.com"
JWT_TOKEN = "<your-jwt-token>"
DATASET_ID = "<your-dataset-id>"

headers = {"Authorization": f"Bearer {JWT_TOKEN}"}

# Step 1: Verify dataset is READY
resp = requests.get(
    f"{VL_BASE_URL}/api/v1/dataset/{DATASET_ID}",
    headers=headers,
)
dataset = resp.json()
print(f"Dataset: {dataset['display_name']}, Status: {dataset['status_new']}")

if dataset["status_new"] not in ("READY", "PARTIAL INDEX"):
    print(f"Dataset is {dataset['status_new']} — cannot add media.")
    exit(1)

# Step 2: Add media from S3
resp = requests.post(
    f"{VL_BASE_URL}/api/v1/dataset/{DATASET_ID}/add_media",
    headers=headers,
    data={"s3_uri": "s3://your-bucket/path/to/media/"},
)

if resp.status_code == 202:
    print("Add media accepted — processing in background.")
else:
    print(f"Error: {resp.status_code} {resp.text}")
    exit(1)

# Step 3: Poll status until processing completes
import time

while True:
    resp = requests.get(
        f"{VL_BASE_URL}/api/v1/dataset/{DATASET_ID}",
        headers=headers,
    )
    status_new = resp.json()["status_new"]
    print(f"  Status: {status_new}")
    if status_new in ("READY", "PARTIAL INDEX", "ERROR"):
        break
    time.sleep(30)

print(f"Final status: {status_new}")

Response Codes

Add Media (POST /api/v1/dataset/{dataset_id}/add_media)

| HTTP Code | Status | Meaning | Common Cause |
|---|---|---|---|
| 202 | Accepted | Processing started successfully | Request valid, pipeline triggered asynchronously |
| 400 | Bad Request | Invalid request parameters | Missing embedding config, invalid S3 URI, no media source provided, or multiple media sources in one request |
| 403 | Forbidden | Feature disabled or insufficient permissions | `ADD_MEDIA_ENABLED` is false, or user lacks write access to the dataset |
| 404 | Not Found | Dataset not found | Dataset does not exist, or the authenticated user does not have access to it |
| 409 | Conflict | Dataset state incompatible | Dataset status is not READY or PARTIAL INDEX, or another add media / reindex operation is already running |
| 500 | Internal Server Error | Server-side failure | File upload to S3 failed, or pipeline trigger failed |

Reindex (POST /api/v1/dataset/{dataset_id}/reindex)

| HTTP Code | Status | Meaning | Common Cause |
|---|---|---|---|
| 202 | Accepted | Reindex started successfully | Request valid, reindex pipeline triggered |
| 404 | Not Found | Dataset not found | Dataset does not exist or user lacks access |
| 409 | Conflict | Dataset state incompatible | Dataset status is not PARTIAL INDEX |
| 500 | Internal Server Error | Server-side failure | Pipeline trigger failed |

Error Response Format

Error responses return a JSON body with a detail field:
{
  "detail": "Operation 'add_media' is blocked while MEDIA_ADDITION task is running"
}
Common error messages:
| Error Message | HTTP Code | What to Do |
|---|---|---|
| `"Dataset not found"` | 404 | Check the dataset ID and your access permissions |
| `"Dataset {id} has no embedding_config..."` | 400 | The dataset must be fully indexed at least once before adding media |
| `"Exactly one media source must be provided: files[], s3_uri, or archive"` | 400 | Provide exactly one media source per request |
| `"Invalid S3 URI or no media files found at {uri}"` | 400 | Check that the S3 path exists and contains supported media files |
| `"Operation 'add_media' is blocked while MEDIA_ADDITION task is running"` | 409 | Wait for the current operation to finish before starting another |
| `"Add media feature is not enabled"` | 403 | Contact your administrator to enable the add media feature |
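A small client-side triage helper can map these responses to next steps (a sketch based on the codes above; `explain_error` is an illustrative name, not part of the API):

```python
def explain_error(status_code, body):
    """Suggest a next step from an add_media error response (sketch).

    `body` is the parsed JSON error body with a `detail` field, or None.
    """
    detail = (body or {}).get("detail", "")
    if status_code == 409:
        return "Wait for the running operation to finish, then retry."
    if status_code == 404:
        return "Check the dataset ID and your access permissions."
    if status_code == 403:
        return "Ask an administrator to enable the add media feature."
    if status_code == 400 and "embedding_config" in detail:
        return "Fully index the dataset once before adding media."
    if status_code == 400:
        return "Check the request: exactly one media source and a valid S3 URI."
    return f"Unexpected error ({status_code}): {detail}"
```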

Dataset Status Flow

After calling add media, the dataset goes through these status transitions:
READY ──[add_media]──> UPDATING (READ ONLY) ──> PARTIAL INDEX ──[reindex]──> INDEXING ──> READY
| Phase | `status` | `status_new` | What’s Happening |
|---|---|---|---|
| Before | READY | READY | Dataset is idle, ready for operations |
| Processing | UPDATING | READ ONLY | New media is being processed (browsing still works) |
| Awaiting Reindex | PENDING_INDEX | PARTIAL INDEX | New media is visible but dataset is not re-clustered |
| Reindexing | INDEXING | INDEXING | Full re-clustering in progress |
| Complete | READY | READY | All media processed and clustered |
With auto_reindex=true, the flow goes directly from UPDATING through reindexing back to READY without stopping at PARTIAL INDEX.

On-Premises (Docker Compose)

For on-premises installations, use the pipeline service endpoint:
curl "http://localhost:2080/api/v1/process/add_media?dataset_id=<dataset_id>&path=/data/new_images"
Or use the CLI tool:
# Add media without auto-reindex
./run_profiler.sh -o add_media -d <dataset_id> -p /path/to/new/media

# Add media with auto-reindex
./run_profiler.sh -o add_media -d <dataset_id> -p /path/to/new/media -a

# Manual reindex later
./run_profiler.sh -o reindex -d <dataset_id>
On-premises add media uses local file paths instead of S3 URIs. The path parameter must be an absolute path accessible from the pipeline container.