Guides

πŸ“€ Exporting a Dataset via `curl` from Visual Layer

This guide walks you through exporting a dataset from the Visual Layer platform using simple curl commands β€” no Python code required!

You’ll learn how to:

  1. βœ… Initiate an asynchronous export
  2. πŸ” Poll the export status
  3. πŸ’Ύ Download the export ZIP file
  4. πŸ“‚ Unzip and explore your dataset

βœ… Step 1: Initiate Dataset Export

To start an export, make a GET request to the export_context_async endpoint.

curl -X GET "https://app.visual-layer.com/api/v1/dataset/<DATASET_ID>/export_context_async" \
  -H "Accept: application/json, text/plain, */*" \
  -G \
  --data-urlencode "file_name=export.zip" \
  --data-urlencode "export_format=json" \
  --data-urlencode "include_images=false"

πŸ” Replace the following:

  • <DATASET_ID>: Your dataset ID from Visual Layer
  • file_name: Desired name for the exported file
  • export_format: Format of export (json, parquet, etc.)
  • include_images: Set to true if you want to include image files

βœ… Example:

curl -X GET "https://app.visual-layer.com/api/v1/dataset/580ed256-f2aa-11ef-a22e-ae309979a730/export_context_async" \
  -H "Accept: application/json, text/plain, */*" \
  -G \
  --data-urlencode "file_name=export.zip" \
  --data-urlencode "export_format=json" \
  --data-urlencode "include_images=false"

The response contains the export task ID in its id field:

{
  "id": "fdb84834-d19b-4797-861a-d48b7a16f908"
}
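The task ID is the value of the id field. Rather than matching on the raw string, it is more robust to extract it with a JSON parser; a sketch using Python's standard json module:

```python
import json

# The body returned by export_context_async, as shown above.
response_body = '{"id": "fdb84834-d19b-4797-861a-d48b7a16f908"}'

# json.loads handles whitespace and field order, unlike grep-style matching.
task_id = json.loads(response_body)["id"]
print(task_id)  # fdb84834-d19b-4797-861a-d48b7a16f908
```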

πŸ” Step 2: Check Export Status

Poll the export status until it returns "status": "COMPLETED".

curl -X GET "https://app.visual-layer.com/api/v1/dataset/<DATASET_ID>/export_status?export_task_id=<TASK_ID>" \
  -H "Accept: application/json, text/plain, */*"

βœ… Example:

curl -X GET "https://app.visual-layer.com/api/v1/dataset/580ed256-f2aa-11ef-a22e-ae309979a730/export_status?export_task_id=fdb84834-d19b-4797-861a-d48b7a16f908" \
  -H "Accept: application/json, text/plain, */*"

Sample response:

{
  "status": "COMPLETED",
  "download_uri": "https://vl-data-export.s3.amazonaws.com/...."
}

If status is still PENDING, wait a few seconds and retry.
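The wait-and-retry loop can be expressed as a small helper. A sketch with a stubbed fetcher standing in for the real HTTP call (poll_until_complete and the stub are illustrative names, not part of the Visual Layer API):

```python
import time

def poll_until_complete(fetch_status, interval=5, max_attempts=60):
    """Call fetch_status() until it reports COMPLETED, FAILED, or we give up.

    fetch_status should return a dict shaped like the export_status response,
    e.g. {"status": "PENDING"} or {"status": "COMPLETED", "download_uri": "..."}.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "COMPLETED":
            return result
        if status == "FAILED":
            raise RuntimeError("Export failed.")
        time.sleep(interval)
    raise TimeoutError("Export did not complete in time.")

# Stub: reports PENDING twice, then COMPLETED, mimicking the API's behavior.
responses = iter([
    {"status": "PENDING"},
    {"status": "PENDING"},
    {"status": "COMPLETED", "download_uri": "https://vl-data-export.s3.amazonaws.com/...."},
])
result = poll_until_complete(lambda: next(responses), interval=0)
print(result["status"])  # COMPLETED
```

In real use, fetch_status would issue the GET request to the export_status endpoint shown above.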


πŸ’Ύ Step 3: Download the Exported ZIP File

Once export status is COMPLETED, use the download_uri from the response to fetch the file:

curl -L "<DOWNLOAD_URI>" --output exported_dataset.zip

βœ… Example:

curl -L "https://vl-data-export.s3.amazonaws.com/your-export.zip?...signed-url..." \
  --output exported_dataset.zip

πŸ’‘ The -L flag ensures curl follows redirects (S3 pre-signed links usually redirect).


πŸ“‚ Step 4: Unzip the Dataset

After the ZIP file is downloaded, unzip it:

unzip exported_dataset.zip -d exported_dataset

You’ll get a folder like:

exported_dataset/
β”œβ”€β”€ metadata.parquet
└── images/
    β”œβ”€β”€ image1.jpg
    β”œβ”€β”€ image2.jpg
    └── ...

πŸš€ Bonus: Full Bash Script Example

Here’s an all-in-one script to automate the full process:

#!/bin/bash

DATASET_ID="your-dataset-id"
FILENAME="export.zip"

echo "πŸš€ Initiating export..."
EXPORT_TASK_RESPONSE=$(curl -s -G "https://app.visual-layer.com/api/v1/dataset/$DATASET_ID/export_context_async" \
  -H "Accept: application/json, text/plain, */*" \
  --data-urlencode "file_name=$FILENAME" \
  --data-urlencode "export_format=json" \
  --data-urlencode "include_images=false")

EXPORT_TASK_ID=$(echo "$EXPORT_TASK_RESPONSE" | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')

echo "πŸ“¦ Export Task ID: $EXPORT_TASK_ID"

echo "⏳ Waiting for export to complete..."
STATUS="PENDING"
while [ "$STATUS" != "COMPLETED" ]; do
  STATUS_RESPONSE=$(curl -s "https://app.visual-layer.com/api/v1/dataset/$DATASET_ID/export_status?export_task_id=$EXPORT_TASK_ID" \
    -H "Accept: application/json, text/plain, */*")
  STATUS=$(echo "$STATUS_RESPONSE" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  echo "Status: $STATUS"
  [ "$STATUS" = "FAILED" ] && echo "❌ Export failed." && exit 1
  [ "$STATUS" != "COMPLETED" ] && sleep 5
done

DOWNLOAD_URI=$(echo "$STATUS_RESPONSE" | sed -n 's/.*"download_uri"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | sed 's/\\//g')
echo "⬇️ Downloading ZIP..."
curl -L "$DOWNLOAD_URI" --output "$FILENAME"

echo "πŸ“‚ Unzipping..."
unzip "$FILENAME" -d exported_dataset

echo "βœ… Done!"

A Python version of this workflow is also available here: https://github.com/visual-layer/documentation/blob/main/notebooks/Export%20via%20api/api_simplified_python.ipynb

πŸ“‚ Working with Exported Dataset (Filter by Uniqueness Score)

After exporting your dataset from Visual Layer, you will receive a ZIP file (e.g., exported_dataset.zip).
This tutorial walks you through:

  1. πŸ”“ Unzipping the file
  2. 🧠 Filtering the dataset using uniqueness_score
  3. πŸ’Ύ Saving the representative subset

βœ… Step 1: Unzip the Exported Dataset

After downloading the export:

unzip exported_dataset.zip -d exported_dataset

You will get a directory like:

exported_dataset/
β”œβ”€β”€ metadata.parquet
└── images/
    β”œβ”€β”€ image1.jpg
    β”œβ”€β”€ image2.jpg
    └── ...

πŸ“Š Step 2: Filter by Uniqueness Score

You can now use a simple Python script to sort and select the most unique images.

🐍 Minimal Python Code:

import pandas as pd

# Load the metadata file
df = pd.read_parquet("exported_dataset/metadata.parquet")

# Sort by uniqueness_score (most unique first)
df_sorted = df.sort_values(by="uniqueness_score", ascending=False)

# Select top 100 most unique images
top_unique_images = df_sorted.head(100)

# Print image filenames
print(top_unique_images["image_filename"].tolist())

# Save the filtered list to a CSV file
top_unique_images.to_csv("top_unique_images.csv", index=False)
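Before running this on a real export, you can sanity-check the sort-and-head selection on a tiny synthetic frame. The column names below mirror the script above; confirm they match the columns in your actual metadata.parquet:

```python
import pandas as pd

# Synthetic stand-in for metadata.parquet, with the two columns used above.
df = pd.DataFrame({
    "image_filename": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "uniqueness_score": [0.2, 0.9, 0.5, 0.7],
})

# Same pattern as the tutorial: sort descending, then take the top N.
top2 = df.sort_values(by="uniqueness_score", ascending=False).head(2)
print(top2["image_filename"].tolist())  # ['b.jpg', 'd.jpg']
```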

πŸ“ Optional: Copy Selected Images

If you want to organize these images in a separate folder:

import os
import shutil

os.makedirs("top_unique_images", exist_ok=True)

# Copy each selected image; skip any filename missing from the export.
for fname in top_unique_images["image_filename"]:
    src = os.path.join("exported_dataset/images", fname)
    dst = os.path.join("top_unique_images", fname)
    if os.path.exists(src):
        shutil.copy(src, dst)

βœ… Summary

You’ve now:

  • Unzipped the dataset
  • Extracted a high-quality representative subset based on uniqueness_score
  • Organized the files for downstream use

This process helps you work efficiently with smaller, high-value subsets of your visual data.