Annotations are metadata labels that describe and categorize images or objects within images. They structure visual datasets and enable search, analysis, and AI training workflows.
Annotations must be uploaded when creating a dataset and cannot be added later.

Why Use Annotations

Annotations improve data organization and model accuracy by providing structured labels for images and objects. They enable:
  • Efficient data retrieval through filtering and search.
  • Enhanced search capabilities within large datasets.
  • Object detection and classification tasks.
  • Improved model accuracy with high-quality labeled data.

Common Annotation Types

Visual Layer supports two primary annotation types:
  • Image Annotations: Assign class labels to entire images and categorize datasets by content.
  • Object Annotations: Label individual objects within images using bounding boxes and improve model accuracy.

Supported Formats

Visual Layer accepts annotation files in the following formats:
  • Parquet and CSV for structured image and object annotations
  • JSON for COCO-format annotations
  • YOLO format with conversion
  • Segmentation masks with conversion
  • Custom folder-based structures with conversion
File Naming Requirements: Your annotation file must be named exactly as one of the following:
  • annotations.json
  • image_annotations.csv
  • object_annotations.csv
  • image_annotations.parquet
  • object_annotations.parquet
Visual Layer cannot process files that do not match these exact names.

Preparing Annotation Files

This section explains how to structure your annotation files for Visual Layer import.

Image Annotations

For full-image class labels, create a file named image_annotations.csv or image_annotations.parquet. Each row represents an image and its corresponding label. Format:
filename                                label
IDX_DF_SIG21341_PlasmasNeg.png          IDX_DF
IDX_DF_ALM00324_PlasmasPos.png          IDX_DF
folder/IDX_RC_ALM04559_PlasmasNeg.png   IDX_RC
Requirements:
  • The filename column must contain relative paths
  • The label column assigns a class to the entire image
  • Multiple labels can be stored as a list: ['t-shirt', 'SKU12345']
  • You may include a caption column for textual metadata
Example with Multiple Labels:
filename          label
cool-tshirt.png   ["t-shirt", "SKU12345"]
cool-pants.jpg    ["pants", "SKU231312"]
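If you generate the annotation file programmatically, the minimal sketch below writes an image_annotations.csv using Python's standard csv module. The filenames and labels are placeholders mirroring the examples above:
import csv

rows = [
    ("IDX_DF_SIG21341_PlasmasNeg.png", "IDX_DF"),
    ("folder/IDX_RC_ALM04559_PlasmasNeg.png", "IDX_RC"),
    # Multiple labels can be written as a list literal in the label column
    ("cool-tshirt.png", "['t-shirt', 'SKU12345']"),
]

with open("image_annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "label"])
    writer.writerows(rows)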

Object Annotations

For object-level annotations, create a file named object_annotations.csv or object_annotations.parquet. Each row represents a detected object with bounding box coordinates and class label. Format:
filename                                col_x   row_y   width   height   label
Kitti/raw/training/image_2/006149.png   0       240     135     133      Car
Kitti/raw/training/image_2/006149.png   608     169     59      43       Car
Requirements:
  • col_x and row_y define the top-left corner of the bounding box
  • width and height must be greater than zero
  • Each row corresponds to a single object within an image
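The following minimal sketch writes an object_annotations.csv with Python's standard csv module; the filenames, coordinates, and labels are placeholders mirroring the example above:
import csv

objects = [
    # filename, col_x, row_y, width, height, label
    ("Kitti/raw/training/image_2/006149.png", 0, 240, 135, 133, "Car"),
    ("Kitti/raw/training/image_2/006149.png", 608, 169, 59, 43, "Car"),
]

with open("object_annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "col_x", "row_y", "width", "height", "label"])
    writer.writerows(objects)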

JSON Annotations

Visual Layer supports COCO-format JSON annotations. Ensure the file is named annotations.json. Example Format:
{
    "images": [
        { "id": 1, "width": 640, "height": 480, "file_name": "image1.jpg" },
        { "id": 2, "width": 800, "height": 600, "file_name": "image2.jpg" }
    ],
    "categories": [
        { "id": 1, "name": "cat" },
        { "id": 2, "name": "dog" },
        { "id": 3, "name": "t-rex" }
    ],
    "annotations": [
        { "id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 100, 200, 200] },
        { "id": 2, "image_id": 2, "category_id": 2, "bbox": [50, 50, 150, 150] },
        { "id": 3, "image_id": 1, "category_id": 3 },
        { "id": 4, "image_id": 2, "category_id": 3 }
    ]
}
Requirements:
  • Bounding boxes follow the format [col_x, row_y, width, height]
  • col_x and row_y define the top-left corner
  • width and height must be greater than zero
  • Remove all comments before uploading (standard JSON does not support comments)
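Before uploading, you can run a quick local sanity check against these requirements. The sketch below is not a Visual Layer validator; it only verifies the constraints listed above on an annotations.json file:
import json

with open("annotations.json") as f:
    coco = json.load(f)

image_ids = {img["id"] for img in coco["images"]}
category_ids = {cat["id"] for cat in coco["categories"]}

for ann in coco["annotations"]:
    assert ann["image_id"] in image_ids, f"Unknown image_id in annotation {ann['id']}"
    assert ann["category_id"] in category_ids, f"Unknown category_id in annotation {ann['id']}"
    if "bbox" in ann:
        _, _, width, height = ann["bbox"]
        assert width > 0 and height > 0, f"Non-positive bbox size in annotation {ann['id']}"

print("annotations.json passed the basic checks")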

Understanding Bounding Box Formats

Different annotation tools and datasets use different bounding box coordinate systems. This section compares common formats and shows you how to convert them to Visual Layer’s format.

Visual Layer Format

Visual Layer uses CSV format with the following structure:
Column Name   Description
filename      The name of the image file containing the object
col_x         The x-coordinate (horizontal position) of the top-left corner
row_y         The y-coordinate (vertical position) of the top-left corner
width         The width of the bounding box, extending from col_x
height        The height of the bounding box, extending from row_y
label         The class or category of the detected object
Example:
filename,col_x,row_y,width,height,label
image1.jpg,50,30,200,150,car
image2.jpg,120,60,80,100,person
image3.jpg,15,10,50,70,dog

Common Format Comparison

Format         Representation                        Normalized?   File Type
Visual Layer   [col_x, row_y, width, height]         No            .csv
COCO           [x_min, y_min, width, height]         No            .json
VOC            (x_min, y_min, x_max, y_max)          No            .xml
YOLO           [x_center, y_center, width, height]   Yes           .txt
TFRecord       (y_min, x_min, y_max, x_max)          Yes           .tfrecord
LabelMe        [[x_min, y_min], [x_max, y_max]]      No            .json
Each format is optimized for different use cases. COCO and VOC are widely used in academic datasets, YOLO for real-time detection, TFRecord for TensorFlow-based training, and LabelMe for manual annotations.
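For corner-based formats such as VOC, the conversion to Visual Layer's format is a simple subtraction, since both use absolute pixel coordinates. A minimal sketch:
def voc_to_visual_layer(x_min, y_min, x_max, y_max):
    # Visual Layer expects the top-left corner plus width and height
    return x_min, y_min, x_max - x_min, y_max - y_min

print(voc_to_visual_layer(50, 30, 250, 180))  # -> (50, 30, 200, 150)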

Converting from Other Formats

If your annotations use a different format, you can convert them to Visual Layer’s format using the scripts and guides below.

Converting YOLO Annotations

YOLO format stores annotations as normalized center coordinates. Each text file corresponds to an image and contains lines in the format:
<class_id> <norm_cx> <norm_cy> <norm_w> <norm_h>
Example:
0 0.5869140625 0.2412109375 0.021484375 0.044921875
0 0.8974609375 0.185546875 0.044921875 0.1015625
Conversion Process: The script converts normalized coordinates to absolute pixel values and calculates top-left coordinates:
top_left_x = (center_x - width/2) * image_width
top_left_y = (center_y - height/2) * image_height
Python Conversion Script:
import os
import csv
import cv2

# Paths to your folders
labels_folder = "output/labels"
images_folder = "output/images"
output_csv = "annotations.csv"

# Open CSV for writing
with open(output_csv, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["filename", "col_x", "row_y", "width", "height", "label"])

    # Process each label file
    for label_file in os.listdir(labels_folder):
        if label_file.endswith(".txt"):
            image_base = os.path.splitext(label_file)[0]
            image_filename = image_base + ".png"  # assumes .png images; adjust the extension if needed
            image_path = os.path.join(images_folder, image_filename)
            label_file_path = os.path.join(labels_folder, label_file)

            if not os.path.exists(image_path):
                print(f"Image not found for {label_file}")
                continue

            # Read image to get dimensions
            image = cv2.imread(image_path)
            if image is None:
                print(f"Failed to load image {image_path}")
                continue

            h_img, w_img = image.shape[:2]

            # Open and read the label file
            with open(label_file_path, "r") as f:
                lines = f.readlines()
                for line in lines:
                    parts = line.strip().split()
                    if len(parts) != 5:
                        continue

                    class_id, norm_cx, norm_cy, norm_w, norm_h = map(float, parts)

                    # Convert normalized values to absolute pixel values
                    abs_cx = norm_cx * w_img
                    abs_cy = norm_cy * h_img
                    abs_w = norm_w * w_img
                    abs_h = norm_h * h_img

                    # Calculate the top-left corner coordinates
                    top_left_x = abs_cx - abs_w / 2
                    top_left_y = abs_cy - abs_h / 2

                    # This example maps class ID 0 to "ship"; replace with your own class-name mapping
                    label_str = "ship" if class_id == 0 else str(int(class_id))

                    writer.writerow([
                        image_filename,
                        int(top_left_x),
                        int(top_left_y),
                        int(abs_w),
                        int(abs_h),
                        label_str
                    ])

Converting Segmentation Masks

Segmentation masks use polygon coordinates to define object boundaries. You can convert these to bounding boxes by finding the minimum and maximum x,y values. Example Segmentation Mask Format:
{
  "version": "4.5.6",
  "shapes": [
    {
      "label": "QSBD",
      "points": [
        [64, 10],
        [64, 15],
        [67, 15],
        [68, 14],
        [68, 10]
      ],
      "shape_type": "polygon"
    }
  ],
  "imagePath": "example_image.png"
}
Conversion Logic:
min_x = minimum x-coordinate from all points
max_x = maximum x-coordinate from all points
min_y = minimum y-coordinate from all points
max_y = maximum y-coordinate from all points

col_x = min_x (left edge)
row_y = min_y (top edge)
width = max_x - min_x
height = max_y - min_y
Python Conversion Script:
import json
import csv
import os
from pathlib import Path
from typing import List, Tuple, Dict, Any

def extract_polygon_points(shape: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Extract polygon points from a shape annotation."""
    if 'points' not in shape:
        return []

    points = []
    for point in shape['points']:
        if len(point) >= 2:
            x, y = int(point[0]), int(point[1])
            points.append((x, y))

    return points

def polygon_to_bbox(points: List[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Convert polygon points to bounding box coordinates."""
    if not points:
        return (0, 0, 0, 0)

    x_coords = [point[0] for point in points]
    y_coords = [point[1] for point in points]

    min_x = min(x_coords)
    max_x = max(x_coords)
    min_y = min(y_coords)
    max_y = max(y_coords)

    col_x = min_x
    row_y = min_y
    width = max_x - min_x
    height = max_y - min_y

    return (col_x, row_y, width, height)

def process_json_file(json_path: Path) -> List[Dict[str, Any]]:
    """Process a JSON annotation file and extract bounding boxes."""
    try:
        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    except (json.JSONDecodeError, FileNotFoundError) as e:
        print(f"Error reading {json_path}: {e}")
        return []

    if 'shapes' not in data:
        print(f"No 'shapes' key found in {json_path}")
        return []

    image_filename = data.get('imagePath', json_path.stem + '.png')

    bboxes = []
    for shape in data['shapes']:
        if shape.get('shape_type') != 'polygon':
            continue

        label = shape.get('label', 'unknown')
        points = extract_polygon_points(shape)

        if not points:
            continue

        col_x, row_y, width, height = polygon_to_bbox(points)

        if width <= 0 or height <= 0:
            continue

        bbox_info = {
            'filename': image_filename,
            'col_x': col_x,
            'row_y': row_y,
            'width': width,
            'height': height,
            'label': label
        }
        bboxes.append(bbox_info)

    return bboxes

# Paths to your folders
annotations_folder = "annotations"
output_csv = "segmentation_annotations.csv"

# Process all JSON files and convert to CSV
all_bboxes = []
json_files = list(Path(annotations_folder).glob('*.json'))

for json_file in json_files:
    print(f"Processing: {json_file.name}")
    bboxes = process_json_file(json_file)
    all_bboxes.extend(bboxes)
    print(f"  Found {len(bboxes)} bounding boxes")

# Write to CSV
fieldnames = ['filename', 'col_x', 'row_y', 'width', 'height', 'label']

with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(all_bboxes)

print(f"Conversion complete! Output saved to: {output_csv}")
print(f"Total bounding boxes: {len(all_bboxes)}")

Creating Annotations from Folder Structure

If your images are organized in folders where each subfolder name represents the class label, you can generate annotation files automatically. Folder Structure:
dataset/
  ├── Ulcer/
  │   ├── image1.bmp
  │   └── image2.bmp
  ├── Normal/
  │   ├── image3.bmp
  │   └── image4.bmp
  └── AVM/
      ├── image5.bmp
      └── image6.bmp
Python Script:
import os
import csv

def create_annotation_csv(root_dir, output_csv):
    rows = []

    for subdir in os.listdir(root_dir):
        subdir_path = os.path.join(root_dir, subdir)
        if os.path.isdir(subdir_path):
            label = subdir
            for filename in os.listdir(subdir_path):
                file_path = os.path.join(subdir_path, filename)
                if os.path.isfile(file_path):
                    relative_path = os.path.join(subdir, filename)
                    rows.append([relative_path, label])

    with open(output_csv, mode='w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(['filename', 'label'])
        writer.writerows(rows)
    print(f"CSV file '{output_csv}' created successfully.")

if __name__ == "__main__":
    root_directory = "."  # run from the dataset root, or set this to the dataset path
    output_csv_file = "image_annotations.csv"
    create_annotation_csv(root_directory, output_csv_file)
Example Output:
filename,label
Ulcer/Ulcer_2024-08-07-08-28-10_81061.bmp,Ulcer
Ulcer/Ulcer_2024-08-07-08-29-37_82025.bmp,Ulcer
Normal/Normal_2024-08-07-08-30-15_12345.bmp,Normal
AVM/AVM_2024-08-07-08-31-22_54321.bmp,AVM

Importing Annotations into Visual Layer

Once your annotation file is properly formatted, you can import it during dataset creation. Steps:
  1. Ensure your file follows the required format and uses one of the supported names.
  2. Upload your annotation file during dataset creation, either from your local machine or from an S3 bucket.
The annotation file must be uploaded at the same time as your images. Visual Layer will process the annotations and make them available for filtering, search, and analysis.

Reusing Caption Data

Caption generation is one of the most time-consuming operations in Visual Layer’s dataset pipeline. When creating multiple datasets with the same images, you can extract and reuse caption data from previous pipeline runs.

Benefits

Reusing caption data allows you to:
  • Skip caption generation on subsequent dataset creations.
  • Maintain consistent captions across multiple datasets.
  • Reduce processing time significantly.
This approach is ideal when you need to create multiple datasets or dataset versions using the same images but with different configurations.

How Caption Data Is Stored

After running a dataset pipeline, Visual Layer stores processed data in:
/.vl/tmp/[dataset-id]/input/metadata/image_annotations.parquet
This parquet file contains all the caption data you need to reuse.
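To confirm the file contains what you expect, you can inspect it locally. A minimal sketch, assuming pandas (with pyarrow) is installed; replace [dataset-id] with your dataset ID:
import pandas as pd

df = pd.read_parquet("/.vl/tmp/[dataset-id]/input/metadata/image_annotations.parquet")
print(df.columns.tolist())   # expect at least "filename" and "caption"
print(df.head())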

Extraction Process

The extraction script processes Visual Layer’s internal parquet files to create a clean annotation file:
  1. Extracts relevant columns: filename and caption
  2. Removes system paths like /hostfs, /mnt, etc.
  3. Creates relative paths by converting absolute paths to relative filenames
  4. Outputs clean parquet file named image_annotations.parquet
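A simplified sketch of this extraction logic, assuming pandas (with pyarrow) and Python 3.9+; the /hostfs prefix is only an example and should be adjusted for your environment:
import pandas as pd

source = "/.vl/tmp/[dataset-id]/input/metadata/image_annotations.parquet"
prefix = "/hostfs"  # example system path prefix to strip

# Keep only the columns Visual Layer needs and make the paths relative
df = pd.read_parquet(source)[["filename", "caption"]]
df["filename"] = df["filename"].apply(lambda p: p.removeprefix(prefix).lstrip("/"))
df.to_parquet("image_annotations.parquet", index=False)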
Script Location: The complete Python script is available in the Useful Scripts guide.

Workflow

Step 1: Create Initial Dataset
Create your first dataset with captioning enabled. After the pipeline completes, locate the parquet file:
# List recent datasets
ls -lt /.vl/tmp/

# Navigate to your dataset's metadata
cd /.vl/tmp/[your-dataset-id]/input/metadata/

# Verify the file exists
ls image_annotations.parquet
Step 2: Run the Extraction Script
Process the parquet file to extract captions:
# Basic usage
python3 process_annotations.py /.vl/tmp/[dataset-id]/input/metadata/image_annotations.parquet

# Specify custom output location
python3 process_annotations.py /.vl/tmp/[dataset-id]/input/metadata/image_annotations.parquet \
  -o /path/to/new-dataset/image_annotations.parquet

# Custom prefix removal
python3 process_annotations.py input.parquet --prefix /custom/prefix/to/remove
Step 3: Copy to New Dataset Directory
Place the extracted parquet file in your new dataset directory alongside the images:
# Copy to new dataset location
cp image_annotations_processed.parquet /path/to/new-dataset/image_annotations.parquet
The parquet file must be named exactly image_annotations.parquet for Visual Layer to recognize it.
Step 4: Create New Dataset
Create your new dataset. Visual Layer will detect the existing image_annotations.parquet file and use the provided captions, completing much faster.

Understanding Relative Paths

Filenames in the parquet file must be relative to the dataset directory location. Visual Layer looks for images relative to where the image_annotations.parquet file is located. Correct - Relative Paths:
Dataset directory: /any/path/dataset/
  ├── image_annotations.parquet
  ├── dog_1.jpg
  ├── dog_2.jpg
  └── dog_3.jpg

Filenames in parquet:
  - dog_1.jpg
  - dog_2.jpg
  - dog_3.jpg
With Subdirectory:
Dataset directory: /any/path/dataset/
  ├── image_annotations.parquet
  └── images/
      ├── dog_1.jpg
      ├── dog_2.jpg
      └── dog_3.jpg

Filenames in parquet:
  - images/dog_1.jpg
  - images/dog_2.jpg
  - images/dog_3.jpg

Troubleshooting

Images Not Found: If Visual Layer cannot find your images, verify:
  1. Parquet file is in the same directory as images
  2. Filenames match exactly (case-sensitive)
  3. Paths in parquet are relative, not absolute
Captions Not Being Used: If Visual Layer is still generating captions:
  1. Verify filename is exactly image_annotations.parquet
  2. Ensure file is in the correct location relative to images
  3. Check that parquet file has both filename and caption columns
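A quick way to check these conditions is a short script run from the new dataset directory. This sketch (assuming pandas with pyarrow is installed) verifies the required columns, flags absolute paths, and confirms that each referenced image exists relative to the parquet file:
from pathlib import Path
import pandas as pd

parquet_path = Path("image_annotations.parquet")
df = pd.read_parquet(parquet_path)

# Both filename and caption columns are needed for caption reuse
missing_cols = {"filename", "caption"} - set(df.columns)
print("Missing columns:", missing_cols or "none")

for name in df["filename"]:
    if name.startswith("/"):
        print("Absolute path found:", name)
    elif not (parquet_path.parent / name).exists():
        print("Image not found:", name)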

Next Steps

Now that you understand how to import annotations, you can create and explore datasets with rich metadata.