# Create a Dataset from an S3 Bucket

> Create a new Visual Layer dataset by pointing to a public or private S3 bucket using the cloud API.

<Card title="How This Helps" icon="hand-platter">
  Creating a dataset from S3 is the recommended approach for production workflows. Point the API at your S3 bucket path and Visual Layer handles ingestion, indexing, and clustering automatically.
</Card>

<Note>
  Use `status_new` for all status checks. The `status` field is being retired. See [Retrieve Dataset Status](/api-reference/retrieve-dataset-status).
</Note>

## Prerequisites

* A Visual Layer Cloud account with API access.
* A valid JWT token. See [Authentication](/api-reference/authentication).
* An S3 bucket containing your images or videos, accessible to Visual Layer.

***

## Create a Dataset from S3

Send a `POST` request with your bucket path to create a new dataset.

```http theme={"theme":"monokai"}
POST /api/v1/dataset
Authorization: Bearer <jwt>
Content-Type: multipart/form-data
```

### Parameters

| Parameter      | Type   | Required | Description                                                  |
| -------------- | ------ | -------- | ------------------------------------------------------------ |
| `dataset_name` | string | Yes      | The display name for the new dataset.                        |
| `bucket_path`  | string | Yes      | S3 path to the bucket or folder containing your media files. |
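
Checking both fields before sending the request can catch a `400` early. The following is a minimal sketch, not part of the Visual Layer API: `build_create_payload` is a hypothetical helper, and the `s3://` scheme check is an assumption based on the example path on this page.

```python
from urllib.parse import urlparse


def build_create_payload(dataset_name: str, bucket_path: str) -> dict:
    """Return the form fields for POST /api/v1/dataset after basic checks.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    if not dataset_name.strip():
        raise ValueError("dataset_name must not be empty")
    parsed = urlparse(bucket_path)
    # Assumption: paths follow the s3://<bucket>/<optional-prefix> form
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"bucket_path does not look like an S3 path: {bucket_path!r}")
    return {"dataset_name": dataset_name, "bucket_path": bucket_path}
```

The returned dictionary holds the same two fields that the request examples on this page send as form data.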

### Example

```bash theme={"theme":"monokai"}
curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "dataset_name=my_dataset" \
  -F "bucket_path=s3://my-bucket/images/" \
  "https://app.visual-layer.com/api/v1/dataset"
```

### Response

```json theme={"theme":"monokai"}
{
  "dataset_id": "ad48d250-1232-11f1-bfca-fa39f6ed1f22"
}
```

Save the `dataset_id`. You need it for all subsequent operations on this dataset.

<Tip>
  Dataset creation is asynchronous. After the initial request, poll `GET /api/v1/dataset/{dataset_id}` until `status_new` is `READY` before running search or export operations.
</Tip>

***

## Monitor Dataset Status

Poll the dataset status endpoint to track progress.

```bash theme={"theme":"monokai"}
curl -H "Authorization: Bearer <jwt>" \
  "https://app.visual-layer.com/api/v1/dataset/<dataset_id>"
```

The response includes a `status_new` field that changes from `INDEXING` to `READY` once processing completes. See [Retrieve Dataset Status](/api-reference/retrieve-dataset-status) for full status documentation.
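
A polling loop is easier to reuse and test when the HTTP call is passed in as a function. The sketch below is one possible shape, not an official client: `wait_until_ready`, its `timeout_s` and `interval_s` defaults, and the injected `fetch_status` callable are all assumptions. The `ERROR` terminal state is taken from the Python example on this page.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_status() until status_new is READY.

    Raises RuntimeError if the status becomes ERROR and TimeoutError if the
    dataset is still not READY once timeout_s seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_status()
        status = data.get("status_new")
        if status == "READY":
            return data
        if status == "ERROR":
            raise RuntimeError(f"Dataset processing failed: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Dataset still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

In practice, `fetch_status` would wrap the `GET /api/v1/dataset/<dataset_id>` call and return the parsed JSON body. Passing `sleep` as a parameter lets tests run without waiting.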

***

## Python Example

```python theme={"theme":"monokai"}
import requests
import time

VL_BASE_URL = "https://app.visual-layer.com"
JWT_TOKEN = "<your-jwt-token>"

headers = {"Authorization": f"Bearer {JWT_TOKEN}"}

# Step 1: Create dataset from S3
resp = requests.post(
    f"{VL_BASE_URL}/api/v1/dataset",
    headers=headers,
    data={
        "dataset_name": "my_dataset",
        "bucket_path": "s3://my-bucket/images/",
    },
)
resp.raise_for_status()
dataset_id = resp.json()["dataset_id"]
print(f"Created dataset: {dataset_id}")

# Step 2: Poll until READY
while True:
    resp = requests.get(
        f"{VL_BASE_URL}/api/v1/dataset/{dataset_id}",
        headers=headers,
    )
    resp.raise_for_status()
    data = resp.json()
    status = data.get("status_new")
    progress = data.get("progress", 0)
    print(f"  Status: {status} ({progress}%)")
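    if status in ("READY", "ERROR"):
        break
    time.sleep(30)

# ERROR also ends the loop, so confirm success before reporting it
if status != "READY":
    raise RuntimeError(f"Dataset {dataset_id} failed with status: {status}")
print(f"Dataset ready: {dataset_id}")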
```

***

## Response Codes

See [Error Handling](/api-reference/errors) for the error response format and Python handling patterns.

| HTTP Code | Meaning                                                       |
| --------- | ------------------------------------------------------------- |
| **200**   | Dataset created successfully.                                 |
| **400**   | Bad Request — missing or invalid parameters.                  |
| **401**   | Unauthorized — check your JWT token.                          |
| **500**   | Internal Server Error — check that the S3 path is accessible. |
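
The table above can be mirrored in client code so that a failed request produces an actionable hint. This is a minimal sketch covering only the four codes documented here; `RESPONSE_HINTS` and `describe_response` are hypothetical names, and any other code falls through to a generic message.

```python
# Hints copied from the response code table; not an exhaustive list
RESPONSE_HINTS = {
    200: "Dataset created successfully.",
    400: "Bad Request: missing or invalid parameters.",
    401: "Unauthorized: check your JWT token.",
    500: "Internal Server Error: check that the S3 path is accessible.",
}


def describe_response(status_code: int) -> str:
    """Return a human-readable hint for a dataset creation response code."""
    return RESPONSE_HINTS.get(status_code, f"Unexpected HTTP status {status_code}.")
```

A caller could log `describe_response(resp.status_code)` before raising, which keeps the troubleshooting hint next to the failure.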

***

## Related Resources

<CardGroup cols={2}>
  <Card title="Retrieve Dataset Status" icon="circle-check" href="/api-reference/retrieve-dataset-status">
    Poll dataset status until processing completes.
  </Card>

  <Card title="Add Media to an Existing Dataset" icon="database" href="/api-reference/add-media-to-existing-dataset">
    Incrementally add new media to an indexed dataset.
  </Card>
</CardGroup>
