Overview

The Visual Layer bounding box annotation format is stored in CSV format, making it easy to process and integrate with various machine learning workflows. Each row represents a single bounding box annotation for an object within an image.

CSV Format Structure

Column NameDescription
filenameThe name of the image file containing the object.
col_xThe x-coordinate (horizontal position) of the top-left corner of the bounding box.
row_yThe y-coordinate (vertical position) of the top-left corner of the bounding box.
widthThe width of the bounding box, extending from col_x.
heightThe height of the bounding box, extending from row_y.
labelThe class or category of the detected object.

Example Annotation

filename,col_x,row_y,width,height,label
image1.jpg,50,30,200,150,car
image2.jpg,120,60,80,100,person
image3.jpg,15,10,50,70,dog

The top 5 common annotation formats for bounding boxes in image datasets include:

1.COCO (Common Objects in Context) Format

  • Bounding Box Format: [x_min, y_min, width, height]
  • Details: Uses JSON format with a dictionary structure. The bounding box is represented by the top-left (x_min, y_min) coordinate, along with width and height.
  • Example:
    {
      "bbox": [100, 150, 200, 300]
    }
    
  • Used In: Microsoft COCO dataset.

2.PASCAL VOC (Visual Object Classes) Format

  • Bounding Box Format: (x_min, y_min, x_max, y_max)
  • Details: Stored in XML files using a <bndbox> tag. The bounding box is represented using absolute pixel values for top-left (x_min, y_min) and bottom-right (x_max, y_max) coordinates.
  • Example:
    <bndbox>
      <xmin>100</xmin>
      <ymin>150</ymin>
      <xmax>300</xmax>
      <ymax>450</ymax>
    </bndbox>
    
  • Used In: VOC 2012 dataset.

3.YOLO (You Only Look Once) Format

  • Bounding Box Format: [x_center, y_center, width, height], normalized (0 to 1 scale)
  • Details: Uses plain text .txt files where each row represents an object as class_id x_center y_center width height, with values normalized relative to the image width and height.
  • Example:
    0 0.5 0.6 0.25 0.4
    
  • Used In: YOLO models and datasets.

4.TFRecord / TensorFlow Object Detection API Format

  • Bounding Box Format: Normalized (y_min, x_min, y_max, x_max) (0 to 1 scale)
  • Details: Uses TFRecord format to store images and annotations in protocol buffers (Protobuf). Bounding boxes are stored as normalized float values within .tfrecord files.
  • Example(within a .record file, usually not human-readable):
    {
      "ymin": 0.25,
      "xmin": 0.20,
      "ymax": 0.75,
      "xmax": 0.80
    }
    
  • Used In: TensorFlow Object Detection API.

5.LabelMe Format

  • Bounding Box Format: (x_min, y_min, x_max, y_max)
  • Details: Uses JSON files similar to COCO but with a different structure, including polygon annotations. Bounding boxes are stored as [x_min, y_min, x_max, y_max].
  • Example:
    {
      "shapes": [
        {
          "label": "dog",
          "points": [[100, 150], [300, 450]],
          "shape_type": "rectangle"
        }
      ]
    }
    
  • Used In: LabelMe annotation tool.

###Summary of Bounding Box Formats

FormatRepresentationNormalized?File Type
COCO[x_min, y_min, width, height]No.json
VOC(x_min, y_min, x_max, y_max)No.xml
YOLO[x_center, y_center, width, height]Yes.txt
TFRecord(y_min, x_min, y_max, x_max)Yes.tfrecord
LabelMe[[x_min, y_min], [x_max, y_max]]No.json

Each format is optimized for different use cases, with COCO and VOC being the most widely used in academic datasets, YOLO for real-time detection, TFRecord for TensorFlow-based training, and LabelMe for manual annotations.

Converting to Visual Layer format

  1. Convert from rom VOC2012 to Visual Layer Object Detection

📘 VL Bounding Boxes

  1. Convert from rom Coco to Visual Layer Object Detection

📘 VL Bounding Boxes