Converting YOLO Annotations to Visual Layer Format
This guide explains how to convert the YOLO annotations of the Kaggle Masati dataset into a CSV file that conforms to Visual Layer’s object annotation format.
In this dataset, the images and labels are organized as follows:
- Images: Located in the folder output/images (PNG format).
- Annotations: Located in the folder output/labels (text files).
Each text file corresponds to an image and contains zero or more rows in YOLO format.
YOLO Annotation Format
Each line of an annotation file describes one object with five whitespace-separated values:
class_id norm_cx norm_cy norm_w norm_h
- class_id: The object class (e.g., 0 corresponds to “ship”).
- norm_cx, norm_cy: Normalized center coordinates (values between 0 and 1).
- norm_w, norm_h: Normalized width and height (values between 0 and 1).
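To make the format concrete, here is a minimal sketch of a parser for these lines. The names `YoloBox`, `parse_yolo_line`, and `parse_yolo_text` are hypothetical helpers introduced for illustration; they are not part of the guide's script.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One YOLO annotation row (hypothetical helper type for this sketch)."""
    class_id: int
    norm_cx: float
    norm_cy: float
    norm_w: float
    norm_h: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one 'class_id norm_cx norm_cy norm_w norm_h' line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 values, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:])
    return YoloBox(class_id, cx, cy, w, h)


def parse_yolo_text(text: str) -> list[YoloBox]:
    """Parse a whole label file; blank lines are skipped, so an empty file yields []."""
    return [parse_yolo_line(ln) for ln in text.splitlines() if ln.strip()]
```

An empty label file (an image with no objects) parses to an empty list, which matches the "zero or more rows" case described earlier.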
Conversion Process
The conversion script performs the following steps:
- Load Each Image: The script loads each PNG image to retrieve its pixel dimensions (width and height).
- Read the Corresponding Annotation File: It reads the text file in output/labels that corresponds to each image.
- Convert Normalized Coordinates: For each annotation, the script converts the normalized center coordinates and dimensions into absolute pixel values. With w = norm_w × image_width and h = norm_h × image_height, the top-left coordinates are calculated as:
  col_x = norm_cx × image_width - w / 2
  row_y = norm_cy × image_height - h / 2
- Generate the CSV File: The CSV is generated with the following columns:
  - filename: The image file name.
  - col_x: The x-coordinate (column) of the top-left corner.
  - row_y: The y-coordinate (row) of the top-left corner.
  - w: Bounding box width (pixels).
  - h: Bounding box height (pixels).
  - label: The object label (e.g., “ship”).
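The coordinate conversion in the steps above can be sketched as a small function. The name `yolo_to_top_left` is hypothetical, and rounding to whole pixels is an assumption of this sketch (the guide's example output shows integer values but does not state a rounding rule).

```python
def yolo_to_top_left(norm_cx, norm_cy, norm_w, norm_h, img_w, img_h):
    """Return (col_x, row_y, w, h) in pixels for one normalized YOLO box."""
    # Scale the normalized size by the image dimensions.
    w = norm_w * img_w
    h = norm_h * img_h
    # Move from the box center to its top-left corner.
    col_x = norm_cx * img_w - w / 2
    row_y = norm_cy * img_h - h / 2
    # Assumption: round to whole pixels for the CSV.
    return round(col_x), round(row_y), round(w), round(h)


# A box centered in a 512x512 image, spanning a quarter of each side:
print(yolo_to_top_left(0.5, 0.5, 0.25, 0.25, 512, 512))  # (192, 192, 128, 128)
```

Here the center lands at pixel (256, 256) and the box is 128 pixels wide and tall, so subtracting half the size from the center gives a top-left corner of (192, 192).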
Python Conversion Script
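A minimal sketch of a script that follows the steps described above, not necessarily the exact script this guide was written around. Several details are assumptions: the `CLASS_NAMES` mapping (the guide only names class 0 as “ship”), rounding to whole pixels, skipping images whose label file is empty, and the `read_size` parameter, which exists so the OpenCV lookup can be swapped out.

```python
import csv
from pathlib import Path

# Hypothetical mapping for this sketch; the guide only names class 0 as "ship".
CLASS_NAMES = {0: "ship"}


def read_size_cv2(image_path):
    """Return (width, height) of an image using OpenCV."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    image = cv2.imread(str(image_path))
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    height, width = image.shape[:2]
    return width, height


def convert(labels_dir="output/labels", images_dir="output/images",
            out_csv="annotations.csv", read_size=read_size_cv2):
    """Write one CSV row per YOLO annotation; return the number of rows written."""
    rows = []
    for label_path in sorted(Path(labels_dir).glob("*.txt")):
        # Assumes a PNG with the same base name exists for every label file.
        image_path = Path(images_dir) / (label_path.stem + ".png")
        lines = [ln.split() for ln in label_path.read_text().splitlines() if ln.strip()]
        if not lines:
            continue  # no objects in this image, so nothing to write
        img_w, img_h = read_size(image_path)
        for class_id, cx, cy, nw, nh in lines:
            w = float(nw) * img_w
            h = float(nh) * img_h
            col_x = float(cx) * img_w - w / 2
            row_y = float(cy) * img_h - h / 2
            label = CLASS_NAMES.get(int(class_id), str(class_id))
            rows.append([image_path.name, round(col_x), round(row_y),
                         round(w), round(h), label])

    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "col_x", "row_y", "w", "h", "label"])
        writer.writerows(rows)
    return len(rows)
```

Calling `convert()` with its defaults reads output/labels and output/images and writes annotations.csv to the current directory. Passing a different `read_size` callable (for example one based on another imaging library) leaves the rest of the conversion unchanged.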
Script Explanation
- Input & Output Folders: The script iterates over all .txt files in output/labels and assumes a matching .png image exists in output/images.
- Image Dimensions: OpenCV (cv2) is used to load each image to determine its width and height, which are necessary for converting normalized coordinates to pixel values.
- Coordinate Conversion: The normalized center coordinates and dimensions are multiplied by the image dimensions. The top-left corner of the bounding box is then computed by subtracting half the width and height from the center coordinates.
- CSV Output: The CSV file (annotations.csv) is written with each annotation as a row containing the image filename, the computed top-left coordinates, bounding box width and height in pixels, and the object label (e.g., “ship”).
Example CSV Output
After running the script, the CSV output will look similar to this:
filename | col_x | row_y | w | h | label |
---|---|---|---|---|---|
output_000001.png | 150 | 100 | 30 | 40 | ship |
output_000002.png | 200 | 80 | 25 | 35 | ship |
output_000003.png | 175 | 90 | 28 | 32 | ship |
- filename: Name of the image file.
- col_x, row_y: Top-left pixel coordinates of the bounding box.
- w, h: Width and height of the bounding box in pixels.
- label: Object label (“ship”).
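Before uploading, it can help to sanity-check the generated file. The sketch below is one possible check, with a hypothetical helper name `check_annotations`; the specific rules (exact header match, positive box sizes) are assumptions of this sketch and not documented Visual Layer requirements.

```python
import csv
import io

EXPECTED_COLUMNS = ["filename", "col_x", "row_y", "w", "h", "label"]


def check_annotations(fileobj):
    """Return a list of problems found in an annotations CSV (empty if none)."""
    reader = csv.DictReader(fileobj)
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {reader.fieldnames}")
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            w, h = float(row["w"]), float(row["h"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric w/h")
            continue
        if w <= 0 or h <= 0:
            problems.append(f"line {line_no}: non-positive box size")
    return problems


sample = io.StringIO(
    "filename,col_x,row_y,w,h,label\n"
    "output_000001.png,150,100,30,40,ship\n"
)
print(check_annotations(sample))  # []
```

For a file on disk, open it with `open("annotations.csv", newline="")` and pass the file object in the same way.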