# Exporting a Dataset

> Export dataset metadata and media using the Visual Layer API. Both full dataset export and selective export by cluster or media ID are supported.

<Card title="How This Helps" icon="hand-platter">
  Export your dataset metadata, labels, and media files for use in training pipelines, annotation tools, or downstream analysis. Both full and selective exports are supported.
</Card>

## Prerequisites

* A dataset in `READY` status.
* A dataset ID (visible in the browser URL when viewing a dataset: `https://app.visual-layer.com/dataset/<dataset_id>/data`).
* A valid JWT token. See [Authentication](/api-reference/authentication).

***

## Full Dataset Export

Export all media and metadata for an entire dataset. This is an asynchronous operation — initiate the export, poll for completion, then download the result.

### Step 1: Initiate Export

```http theme={"theme":"monokai"}
GET /api/v1/dataset/{dataset_id}/export_context_async
Authorization: Bearer <jwt>
```

### Parameters

| Parameter        | Type    | Required | Description                                                           |
| ---------------- | ------- | -------- | --------------------------------------------------------------------- |
| `file_name`      | string  | Yes      | Name for the output ZIP file.                                         |
| `export_format`  | string  | Yes      | `json` or `parquet`.                                                  |
| `include_images` | boolean | No       | Set to `true` to include image files in the export. Default: `false`. |

### Example

```bash theme={"theme":"monokai"}
curl -G \
  -H "Authorization: Bearer <jwt>" \
  -H "Accept: application/json" \
  --data-urlencode "file_name=export.zip" \
  --data-urlencode "export_format=json" \
  --data-urlencode "include_images=false" \
  "https://app.visual-layer.com/api/v1/dataset/<dataset_id>/export_context_async"
```

### Response

```json theme={"theme":"monokai"}
{
  "id": "fdb84834-d19b-4797-861a-d48b7a16f908"
}
```

Save the `id`. You pass it as `export_task_id` when polling the export status in Step 2.
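
The same request can be issued from Python. The following is a minimal sketch using only the standard library; `build_export_url` and `start_export` are illustrative names rather than part of any Visual Layer SDK, and the response is assumed to be the `{"id": ...}` object shown above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://app.visual-layer.com/api/v1/dataset"


def build_export_url(dataset_id, file_name, export_format="json", include_images=False):
    """Build the export_context_async URL with URL-encoded query parameters."""
    query = urllib.parse.urlencode({
        "file_name": file_name,
        "export_format": export_format,
        # Booleans are sent as lowercase strings, matching the curl example.
        "include_images": "true" if include_images else "false",
    })
    dataset = urllib.parse.quote(dataset_id, safe="")
    return f"{BASE_URL}/{dataset}/export_context_async?{query}"


def start_export(dataset_id, jwt, **params):
    """Send the GET request and return the export task ID (hypothetical helper)."""
    request = urllib.request.Request(
        build_export_url(dataset_id, **params),
        headers={"Authorization": f"Bearer {jwt}", "Accept": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

A call such as `start_export("<dataset_id>", "<jwt>", file_name="export.zip")` would return the task ID to use when polling.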

### Step 2: Poll Export Status

```bash theme={"theme":"monokai"}
curl -H "Authorization: Bearer <jwt>" \
  "https://app.visual-layer.com/api/v1/dataset/<dataset_id>/export_status?export_task_id=<task_id>"
```

### Status Response

```json theme={"theme":"monokai"}
{
  "status": "COMPLETED",
  "download_uri": "https://s3.amazonaws.com/.../export.zip?..."
}
```

Poll until `status` is `COMPLETED` or `FAILED`.
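
The polling rule above can be sketched as a small loop. This is an illustration, not an official client: `wait_for_export` is a hypothetical helper, `fetch_status` stands for any callable that performs the status request and returns the parsed JSON, and the 5-second interval and one-hour timeout are assumed defaults rather than documented limits.

```python
import time


def wait_for_export(fetch_status, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll until the export reaches COMPLETED or FAILED.

    fetch_status is a zero-argument callable returning the parsed status
    response as a dict, e.g. {"status": "COMPLETED", "download_uri": "..."}.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "COMPLETED":
            return result
        if status == "FAILED":
            raise RuntimeError("Export failed")
        # Any other status is treated as "still running" (an assumption).
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Export still {status!r} after {timeout} seconds")
        sleep(interval)
```

Passing the request as a callable keeps the loop independent of the HTTP library and makes it easy to exercise with canned responses.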

### Step 3: Download

```bash theme={"theme":"monokai"}
curl -L "<download_uri>" --output export.zip
```

Use `-L` to follow S3 redirects.
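
For a Python pipeline, the download and extraction can be sketched with the standard library. `download_and_extract` is a hypothetical helper, and the default paths simply mirror the file names used elsewhere on this page.

```python
import shutil
import urllib.request
import zipfile


def download_and_extract(download_uri, zip_path="export.zip", out_dir="exported_dataset"):
    """Stream the archive to disk, then unpack it into out_dir."""
    # urlopen follows HTTP redirects by default, like curl -L.
    with urllib.request.urlopen(download_uri) as response, open(zip_path, "wb") as out:
        shutil.copyfileobj(response, out)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return out_dir
```

Streaming with `shutil.copyfileobj` avoids loading the whole archive into memory, which matters when images are included.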

***

## Selective Export

Export specific media items or entire clusters using `POST /api/v1/dataset/{dataset_id}/export_entities_async`. This is useful for exporting only the results of a search or filter operation.

```http theme={"theme":"monokai"}
POST /api/v1/dataset/{dataset_id}/export_entities_async
Authorization: Bearer <jwt>
Content-Type: application/json
```

### Query Parameters

| Parameter        | Type    | Description                        |
| ---------------- | ------- | ---------------------------------- |
| `export_format`  | string  | `json` or `parquet`.               |
| `include_images` | boolean | Include image files in the export. |

### Export by Media IDs

Each media item's ID is returned in the `media_id` field of any Explore endpoint response (visual search, semantic search, or duplicate retrieval results). Pass one or more of these IDs in the `media_ids` array.

```bash theme={"theme":"monokai"}
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "media_selection": [
      {
        "type": "media",
        "payload": {
          "media_ids": ["9e8da312-d954-4844-afc7-357c458c5b03"]
        }
      }
    ]
  }' \
  "https://app.visual-layer.com/api/v1/dataset/<dataset_id>/export_entities_async?include_images=true&export_format=json"
```

### Export by Cluster

`cluster_id` values are returned in the `cluster_id` field of Explore endpoint responses and are visible in the Visual Layer UI when browsing clusters.

```bash theme={"theme":"monokai"}
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "media_selection": [
      {
        "type": "cluster",
        "payload": {
          "cluster_id": "4e0e4d51-0fef-4fe1-a8ec-1a82b6f4880b",
          "cluster_filters": {
            "entity_type": "IMAGES"
          },
          "exclude_media_ids": []
        }
      }
    ]
  }' \
  "https://app.visual-layer.com/api/v1/dataset/<dataset_id>/export_entities_async?include_images=true&export_format=json"
```

Both requests return the same response format as the full export: an `id` to use for polling status.
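
When building these request bodies in code, small helpers keep the nesting consistent. The sketch below reproduces the two payload shapes from the curl examples; the function names are illustrative, and whether several entries may be combined in one `media_selection` array is an assumption based on the field being an array.

```python
import json


def media_selection(media_ids):
    """Selection entry that exports specific media items."""
    return {"type": "media", "payload": {"media_ids": list(media_ids)}}


def cluster_selection(cluster_id, entity_type="IMAGES", exclude_media_ids=()):
    """Selection entry that exports one cluster, minus any excluded media."""
    return {
        "type": "cluster",
        "payload": {
            "cluster_id": cluster_id,
            "cluster_filters": {"entity_type": entity_type},
            "exclude_media_ids": list(exclude_media_ids),
        },
    }


def export_body(*selections):
    """Serialize selection entries into the JSON request body."""
    return json.dumps({"media_selection": list(selections)})
```

For example, `export_body(media_selection(["9e8da312-d954-4844-afc7-357c458c5b03"]))` produces the same JSON document as the first curl example.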

***

## Full Automation Script

The following script handles the complete export workflow: initiate, poll, download, and extract.

```bash theme={"theme":"monokai"}
#!/bin/bash
set -euo pipefail

DATASET_ID="your-dataset-id"
JWT_TOKEN="your-jwt-token"
FILENAME="export.zip"
BASE_URL="https://app.visual-layer.com/api/v1/dataset/$DATASET_ID"

# Extract a string value from a JSON response; tolerates whitespace around the colon.
json_value() {
  printf '%s' "$1" | sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

echo "Initiating export..."
EXPORT_TASK_RESPONSE=$(curl -s -G \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Accept: application/json" \
  --data-urlencode "file_name=$FILENAME" \
  --data-urlencode "export_format=json" \
  --data-urlencode "include_images=false" \
  "$BASE_URL/export_context_async")

EXPORT_TASK_ID=$(json_value "$EXPORT_TASK_RESPONSE" "id")
if [ -z "$EXPORT_TASK_ID" ]; then
  echo "Could not read a task ID from the response: $EXPORT_TASK_RESPONSE" >&2
  exit 1
fi
echo "Export Task ID: $EXPORT_TASK_ID"

echo "Polling for completion..."
STATUS="PENDING"
while [ "$STATUS" != "COMPLETED" ]; do
  STATUS_RESPONSE=$(curl -s \
    -H "Authorization: Bearer $JWT_TOKEN" \
    "$BASE_URL/export_status?export_task_id=$EXPORT_TASK_ID")
  STATUS=$(json_value "$STATUS_RESPONSE" "status")
  echo "  Status: $STATUS"
  # Stop on failure, or when no status can be read (for example, an expired token).
  if [ -z "$STATUS" ] || [ "$STATUS" = "FAILED" ]; then
    echo "Export failed or status unreadable: $STATUS_RESPONSE" >&2
    exit 1
  fi
  if [ "$STATUS" != "COMPLETED" ]; then
    sleep 5
  fi
done

# Strip JSON backslash escapes (such as \/) from the download URI.
DOWNLOAD_URI=$(json_value "$STATUS_RESPONSE" "download_uri" | sed 's/\\//g')
echo "Downloading..."
curl -L "$DOWNLOAD_URI" --output "$FILENAME"

echo "Extracting..."
unzip "$FILENAME" -d exported_dataset

echo "Done."
```

***

## Working with Exported Data

After extraction, the archive contains a metadata file (Parquet or JSON, matching `export_format`) and, if `include_images` was `true`, an `images/` folder.
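
To check what an export actually produced before loading it, you can list the metadata files in the extracted directory. This sketch makes no assumption about the exact file names; `find_metadata_files` is a hypothetical helper that simply looks for `.parquet` and `.json` files.

```python
from pathlib import Path


def find_metadata_files(export_dir="exported_dataset"):
    """Return Parquet and JSON files found anywhere under export_dir, sorted by path."""
    root = Path(export_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".parquet", ".json"}
    )
```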

### Filter by Uniqueness Score

```python theme={"theme":"monokai"}
import pandas as pd

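# Requires an export created with export_format=parquet.
df = pd.read_parquet("exported_dataset/metadata.parquet")
# Keep the 100 rows with the highest uniqueness score.
top_unique = df.sort_values(by="uniqueness_score", ascending=False).head(100)
top_unique.to_csv("top_unique_images.csv", index=False)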
```

### Copy Filtered Images

```python theme={"theme":"monokai"}
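import os
import shutil

# Reuses `top_unique` from the previous snippet; requires an export with include_images=true.
os.makedirs("top_unique_images", exist_ok=True)
for fname in top_unique["image_filename"]:
    shutil.copy(f"exported_dataset/images/{fname}", f"top_unique_images/{fname}")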
```

***

## Response Codes

See [Error Handling](/api-reference/errors) for the error response format and Python handling patterns.

| HTTP Code | Meaning                              |
| --------- | ------------------------------------ |
| **200**   | Export task ID returned.             |
| **401**   | Unauthorized — check your JWT token. |
| **404**   | Dataset not found.                   |
| **409**   | Dataset is not in `READY` status.    |
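
In client code, the table above can be turned into a small lookup for readable error messages. This is an illustrative sketch: `describe_status` is a hypothetical helper, and codes not listed in the table fall through to a generic message.

```python
EXPORT_ERRORS = {
    401: "Unauthorized: check your JWT token.",
    404: "Dataset not found.",
    409: "Dataset is not in READY status.",
}


def describe_status(code):
    """Translate an HTTP status code from the export endpoints into a message."""
    if code == 200:
        return "Export task ID returned."
    return EXPORT_ERRORS.get(code, f"Unexpected HTTP status {code}.")
```

With `urllib.request`, a failed call raises `urllib.error.HTTPError`, whose `code` attribute can be passed to `describe_status`.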

***

## Related Resources

<CardGroup cols={2}>
  <Card title="Semantic Search" icon="scan-text" href="/api-reference/semantic-search">
    Use VQL filters to find specific content before exporting.
  </Card>

  <Card title="Share a Dataset" icon="share-2" href="/api-reference/share-a-dataset-with-another-user">
    Grant teammates access to your dataset.
  </Card>
</CardGroup>
