
Preparing Annotation Data File

Visual Layer Bounding Box Annotation Format

Overview

Visual Layer bounding box annotations are stored as a CSV file, which makes them easy to process and integrate with machine learning workflows. Each row describes a single bounding box for one object within an image, so an image with several objects appears on several rows.

CSV Format Structure

| Column Name | Description |
| --- | --- |
| filename | The name of the image file containing the object. |
| col_x | The x-coordinate (horizontal position) of the top-left corner of the bounding box. |
| row_y | The y-coordinate (vertical position) of the top-left corner of the bounding box. |
| width | The width of the bounding box, extending from col_x. |
| height | The height of the bounding box, extending from row_y. |
| label | The class or category of the detected object. |

Example Annotation

filename,col_x,row_y,width,height,label
image1.jpg,50,30,200,150,car
image2.jpg,120,60,80,100,person
image3.jpg,15,10,50,70,dog
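
As a minimal sketch of how such a file can be consumed, the Python snippet below reads the CSV with the standard library and derives the bottom-right corner of each box. The function name `read_vl_annotations`, the float parsing, and the positive-size check are assumptions made for illustration; they are not part of the Visual Layer specification.

```python
import csv
import io

# Column names taken from the CSV format structure described above.
VL_COLUMNS = ["filename", "col_x", "row_y", "width", "height", "label"]


def read_vl_annotations(csv_text):
    """Parse Visual Layer CSV text into a list of box dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [name for name in VL_COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    boxes = []
    for row in reader:
        box = {
            "filename": row["filename"],
            "label": row["label"],
            # Assumption: values are parsed as floats so that integer and
            # fractional pixel coordinates are both accepted.
            "col_x": float(row["col_x"]),
            "row_y": float(row["row_y"]),
            "width": float(row["width"]),
            "height": float(row["height"]),
        }
        # Sanity check chosen for this sketch, not a documented rule.
        if box["width"] <= 0 or box["height"] <= 0:
            raise ValueError(f"non-positive box size in {row['filename']}")
        # The bottom-right corner follows from the top-left corner and size.
        box["x_max"] = box["col_x"] + box["width"]
        box["y_max"] = box["row_y"] + box["height"]
        boxes.append(box)
    return boxes


sample = """filename,col_x,row_y,width,height,label
image1.jpg,50,30,200,150,car
image2.jpg,120,60,80,100,person
"""
boxes = read_vl_annotations(sample)
```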

For comparison, five other commonly used bounding box annotation formats for image datasets are:

1. COCO (Common Objects in Context) Format

  • Bounding Box Format: [x_min, y_min, width, height]
  • Details: Uses JSON format with a dictionary structure. The bounding box is represented by the top-left (x_min, y_min) coordinate, along with width and height.
  • Example:
    {
      "bbox": [100, 150, 200, 300]
    }
    
  • Used In: Microsoft COCO dataset.
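
Because COCO already stores a top-left corner plus width and height, mapping it to the Visual Layer columns is mostly a matter of joining annotations with their image and category records. The sketch below assumes a COCO detection dictionary that has already been loaded (for example with `json.load`); the function name `coco_to_vl_rows` is hypothetical.

```python
def coco_to_vl_rows(coco):
    """Convert a loaded COCO detection dict into Visual Layer style rows."""
    # Lookup tables from COCO ids to file names and category names.
    filenames = {img["id"]: img["file_name"] for img in coco["images"]}
    labels = {cat["id"]: cat["name"] for cat in coco["categories"]}

    rows = []
    for ann in coco["annotations"]:
        # COCO bbox order is [x_min, y_min, width, height].
        x_min, y_min, width, height = ann["bbox"]
        rows.append({
            "filename": filenames[ann["image_id"]],
            "col_x": x_min,
            "row_y": y_min,
            "width": width,
            "height": height,
            "label": labels[ann["category_id"]],
        })
    return rows


coco_sample = {
    "images": [{"id": 1, "file_name": "image1.jpg"}],
    "categories": [{"id": 3, "name": "car"}],
    "annotations": [{"image_id": 1, "category_id": 3, "bbox": [100, 150, 200, 300]}],
}
coco_rows = coco_to_vl_rows(coco_sample)
```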

2. PASCAL VOC (Visual Object Classes) Format

  • Bounding Box Format: (x_min, y_min, x_max, y_max)
  • Details: Stored in XML files using a <bndbox> tag. The bounding box is represented using absolute pixel values for top-left (x_min, y_min) and bottom-right (x_max, y_max) coordinates.
  • Example:
    <bndbox>
      <xmin>100</xmin>
      <ymin>150</ymin>
      <xmax>300</xmax>
      <ymax>450</ymax>
    </bndbox>
    
  • Used In: VOC 2012 dataset.
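
VOC stores two corners, so the width and height needed by the Visual Layer format come from subtracting them. The following sketch parses one VOC annotation document with the standard library; the function name `voc_to_vl_rows` is hypothetical, and the sketch takes width as `xmax - xmin` without any inclusive-pixel adjustment, which is an assumption.

```python
import xml.etree.ElementTree as ET


def voc_to_vl_rows(xml_text):
    """Convert one PASCAL VOC annotation document into Visual Layer style rows."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")

    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        x_min = float(box.findtext("xmin"))
        y_min = float(box.findtext("ymin"))
        x_max = float(box.findtext("xmax"))
        y_max = float(box.findtext("ymax"))
        rows.append({
            "filename": filename,
            "col_x": x_min,
            "row_y": y_min,
            # Assumption: size is the plain corner difference.
            "width": x_max - x_min,
            "height": y_max - y_min,
            "label": obj.findtext("name"),
        })
    return rows


voc_sample = """
<annotation>
  <filename>image1.jpg</filename>
  <object>
    <name>car</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>150</ymin>
      <xmax>300</xmax>
      <ymax>450</ymax>
    </bndbox>
  </object>
</annotation>
"""
voc_rows = voc_to_vl_rows(voc_sample)
```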

3. YOLO (You Only Look Once) Format

  • Bounding Box Format: [x_center, y_center, width, height], normalized (0 to 1 scale)
  • Details: Uses plain text .txt files where each row represents an object as class_id x_center y_center width height, with values normalized relative to the image width and height.
  • Example:
    0 0.5 0.6 0.25 0.4
    
  • Used In: YOLO models and datasets.
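
Converting a YOLO line to pixel coordinates requires the image size, since the values are normalized, and a list of class names, since only a class id is stored. The sketch below shows the arithmetic; `yolo_line_to_vl_row` and its parameters are hypothetical names, and the caller is assumed to know the image dimensions and class list.

```python
def yolo_line_to_vl_row(line, filename, image_width, image_height, class_names):
    """Convert one YOLO label line into a Visual Layer style row."""
    class_id, x_center, y_center, width, height = line.split()

    # Scale the normalized size back to pixels.
    box_width = float(width) * image_width
    box_height = float(height) * image_height

    return {
        "filename": filename,
        # Move from the box center to its top-left corner.
        "col_x": float(x_center) * image_width - box_width / 2,
        "row_y": float(y_center) * image_height - box_height / 2,
        "width": box_width,
        "height": box_height,
        "label": class_names[int(class_id)],
    }


yolo_row = yolo_line_to_vl_row(
    "0 0.5 0.6 0.25 0.4", "image1.jpg", 640, 480, ["car", "person"]
)
```

For a 640 by 480 image, the example line corresponds to a box of roughly 160 by 192 pixels whose top-left corner is near (240, 192).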

4. TFRecord / TensorFlow Object Detection API Format

  • Bounding Box Format: Normalized (y_min, x_min, y_max, x_max) (0 to 1 scale)
  • Details: Uses TFRecord format to store images and annotations in protocol buffers (Protobuf). Bounding boxes are stored as normalized float values within .tfrecord files.
  • Example (within a .record file, usually not human-readable):
    {
      "ymin": 0.25,
      "xmin": 0.20,
      "ymax": 0.75,
      "xmax": 0.80
    }
    
  • Used In: TensorFlow Object Detection API.
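
The main pitfall with this layout is the y-first ordering. The sketch below converts one normalized box to the Visual Layer columns without depending on TensorFlow; the function name `tf_box_to_vl_row` is hypothetical, and the image size and label are assumed to be supplied by the caller.

```python
def tf_box_to_vl_row(ymin, xmin, ymax, xmax, filename, image_width, image_height, label):
    """Convert one normalized (y_min, x_min, y_max, x_max) box to a Visual Layer style row."""
    return {
        "filename": filename,
        # x values scale with the image width, y values with the height.
        "col_x": xmin * image_width,
        "row_y": ymin * image_height,
        "width": (xmax - xmin) * image_width,
        "height": (ymax - ymin) * image_height,
        "label": label,
    }


tf_row = tf_box_to_vl_row(0.25, 0.20, 0.75, 0.80, "image1.jpg", 1000, 800, "dog")
```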

5. LabelMe Format

  • Bounding Box Format: [[x_min, y_min], [x_max, y_max]]
  • Details: Uses JSON files like COCO, but with a different structure that also supports polygon annotations. A bounding box is stored as a shape of type "rectangle" whose "points" list holds two corner points, [x_min, y_min] and [x_max, y_max].
  • Example:
    {
      "shapes": [
        {
          "label": "dog",
          "points": [[100, 150], [300, 450]],
          "shape_type": "rectangle"
        }
      ]
    }
    
  • Used In: LabelMe annotation tool.
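
A LabelMe rectangle can be mapped to the Visual Layer columns by taking the minimum and maximum of its two corner points. The sketch below is written defensively on the assumption that the two points may appear in either order; `labelme_to_vl_rows` is a hypothetical name and the file name is passed in by the caller.

```python
def labelme_to_vl_rows(labelme_doc, filename):
    """Convert rectangle shapes in a loaded LabelMe dict to Visual Layer style rows."""
    rows = []
    for shape in labelme_doc["shapes"]:
        # Polygons and other shape types are skipped in this sketch.
        if shape.get("shape_type") != "rectangle":
            continue
        (x1, y1), (x2, y2) = shape["points"]
        x_min, x_max = min(x1, x2), max(x1, x2)
        y_min, y_max = min(y1, y2), max(y1, y2)
        rows.append({
            "filename": filename,
            "col_x": x_min,
            "row_y": y_min,
            "width": x_max - x_min,
            "height": y_max - y_min,
            "label": shape["label"],
        })
    return rows


labelme_sample = {
    "shapes": [
        {"label": "dog", "points": [[300, 450], [100, 150]], "shape_type": "rectangle"},
        {"label": "cat", "points": [[0, 0], [5, 0], [5, 5]], "shape_type": "polygon"},
    ]
}
labelme_rows = labelme_to_vl_rows(labelme_sample, "image3.jpg")
```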

Summary of Bounding Box Formats

| Format | Representation | Normalized? | File Type |
| --- | --- | --- | --- |
| COCO | [x_min, y_min, width, height] | No | .json |
| VOC | (x_min, y_min, x_max, y_max) | No | .xml |
| YOLO | [x_center, y_center, width, height] | Yes | .txt |
| TFRecord | (y_min, x_min, y_max, x_max) | Yes | .tfrecord |
| LabelMe | [[x_min, y_min], [x_max, y_max]] | No | .json |

Each format suits a different use case: COCO and VOC are widely used in academic datasets, YOLO in real-time detection pipelines, TFRecord in TensorFlow-based training, and LabelMe in manual annotation.

Converting to Visual Layer format

  1. Convert from VOC2012 to Visual Layer Object Detection

📘 VL Bounding Boxes

  2. Convert from COCO to Visual Layer Object Detection

📘 VL Bounding Boxes
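
Whatever the source format, the final step of a conversion is writing rows with the six Visual Layer columns to a CSV file. The following is a minimal sketch using the Python standard library; the function name `vl_rows_to_csv` is hypothetical, and the rows are assumed to be dictionaries keyed by the column names.

```python
import csv
import io

VL_COLUMNS = ["filename", "col_x", "row_y", "width", "height", "label"]


def vl_rows_to_csv(rows):
    """Serialize Visual Layer style rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=VL_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = vl_rows_to_csv([
    {"filename": "image1.jpg", "col_x": 100, "row_y": 150,
     "width": 200, "height": 300, "label": "car"},
])
```

To write to disk instead, the same `csv.DictWriter` can be pointed at a file opened with `newline=""`.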