Converting YOLO Annotations to Visual Layer Format

This guide explains how to convert the Kaggle Masati dataset, whose annotations are stored in YOLO format, into a CSV file that conforms to Visual Layer's object annotation format. In this dataset, the images and labels are organized as follows:

Images: Located in the folder output/images (PNG format).
Annotations: Located in the folder output/labels (text files).

Each text file shares its base name with one image and contains zero or more rows in YOLO format, one row per annotated object.

YOLO Annotation Format

Each line of an annotation file holds five space-separated values in the following format:

<class_id> <norm_cx> <norm_cy> <norm_w> <norm_h>

For example:

0 0.5869140625 0.2412109375 0.021484375 0.044921875
0 0.8974609375 0.185546875 0.044921875 0.1015625
...

class_id: Integer index of the object class (e.g., 0 corresponds to "ship").
norm_cx, norm_cy: Center coordinates of the bounding box, normalized by the image width and height respectively (values between 0 and 1).
norm_w, norm_h: Width and height of the bounding box, normalized the same way (values between 0 and 1).
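
As an illustration of the format described above, here is a minimal sketch of a parser for a single annotation line. The names YoloBox and parse_yolo_line are hypothetical and are not part of the conversion script in this guide; the range check on the normalized values is an added assumption based on the 0 to 1 range stated above.

```python
from typing import NamedTuple, Optional


class YoloBox(NamedTuple):
    """One YOLO annotation row (hypothetical helper type)."""
    class_id: int
    cx: float
    cy: float
    w: float
    h: float


def parse_yolo_line(line: str) -> Optional[YoloBox]:
    """Parse one '<class_id> <cx> <cy> <w> <h>' line; return None if malformed."""
    parts = line.split()
    if len(parts) != 5:
        return None
    try:
        # Assumes the class id is written as a plain integer such as "0"
        class_id = int(parts[0])
        cx, cy, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return None
    # Normalized values are expected to lie in [0, 1]
    if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
        return None
    return YoloBox(class_id, cx, cy, w, h)


box = parse_yolo_line("0 0.5869140625 0.2412109375 0.021484375 0.044921875")
print(box.class_id, box.cx, box.cy)  # 0 0.5869140625 0.2412109375
```

Returning None for malformed rows mirrors the "skip lines not in expected format" behavior of the script, while also rejecting non-numeric or out-of-range values.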

Conversion Process

The conversion script performs the following steps:

  • Load Each Image:
    The script loads each PNG image to retrieve its pixel dimensions (width and height).
  • Read the Corresponding Annotation File:
    It reads each text file from output/labels that corresponds to an image.
  • Convert Normalized Coordinates:
    For each annotation, the script multiplies the normalized center coordinates and dimensions by the image width and height to obtain absolute pixel values. The top‑left corner is then computed from the pixel-space center and size:
top_left_x = center_x - width / 2
top_left_y = center_y - height / 2

  • Generate the CSV File:
    The CSV is generated with the following columns:

filename: The image file name.
col_x: The x‑coordinate (column) of the top‑left corner.
row_y: The y‑coordinate (row) of the top‑left corner.
width: Bounding box width (pixels).
height: Bounding box height (pixels).
label: The object label (e.g., "ship").
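
The coordinate conversion step can be checked by hand on a single row. The sketch below applies it to the first sample annotation shown earlier, assuming a 512 x 512 image; that size is an assumption made for illustration, and the real script reads the dimensions from each PNG. The function name yolo_to_top_left is hypothetical.

```python
def yolo_to_top_left(norm_cx, norm_cy, norm_w, norm_h, img_w, img_h):
    """Convert a normalized YOLO box to (col_x, row_y, width, height) in pixels."""
    abs_w = norm_w * img_w
    abs_h = norm_h * img_h
    # Shift from the box center to its top-left corner
    col_x = norm_cx * img_w - abs_w / 2
    row_y = norm_cy * img_h - abs_h / 2
    return col_x, row_y, abs_w, abs_h


# First sample row from the annotation example; 512 x 512 is an assumed image size
box = yolo_to_top_left(0.5869140625, 0.2412109375, 0.021484375, 0.044921875, 512, 512)
print(box)  # (295.0, 112.0, 11.0, 23.0)
```

Here the center lands at pixel (300.5, 123.5) and the box is 11 x 23 pixels, so subtracting half the width and height gives a top-left corner of (295, 112).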

Python Conversion Script


import os
import csv
import cv2

# Paths to your folders
labels_folder = "output/labels"
images_folder = "output/images"
output_csv = "annotations.csv"

# Open CSV for writing
with open(output_csv, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    # Write header row
    writer.writerow(["filename", "col_x", "row_y", "width", "height", "label"])

    # Process each label file
    for label_file in os.listdir(labels_folder):
        if label_file.endswith(".txt"):
            # Assume the image file has the same base name with .png extension
            image_base = os.path.splitext(label_file)[0]
            image_filename = image_base + ".png"
            image_path = os.path.join(images_folder, image_filename)
            label_file_path = os.path.join(labels_folder, label_file)

            # Check if the image exists
            if not os.path.exists(image_path):
                print(f"Image not found for {label_file}")
                continue

            # Read image to get dimensions
            image = cv2.imread(image_path)
            if image is None:
                print(f"Failed to load image {image_path}")
                continue

            h_img, w_img = image.shape[:2]

            # Open and read the label file
            with open(label_file_path, "r") as f:
                # Iterate over the file directly, one annotation per line
                for line in f:
                    parts = line.strip().split()
                    if len(parts) != 5:
                        continue  # Skip lines not in expected format

                    # YOLO format: class, norm_cx, norm_cy, norm_w, norm_h
                    class_id, norm_cx, norm_cy, norm_w, norm_h = map(float, parts)

                    # Convert normalized values to absolute pixel values
                    abs_cx = norm_cx * w_img
                    abs_cy = norm_cy * h_img
                    abs_w = norm_w * w_img
                    abs_h = norm_h * h_img

                    # Calculate the top-left corner coordinates
                    top_left_x = abs_cx - abs_w / 2
                    top_left_y = abs_cy - abs_h / 2

                    # Map class_id to a string label; here we assume 0 means "ship"
                    label_str = "ship" if class_id == 0 else str(int(class_id))

                    # Write the row to CSV (coordinates cast to int)
                    writer.writerow([
                        image_filename,
                        int(top_left_x),
                        int(top_left_y),
                        int(abs_w),
                        int(abs_h),
                        label_str
                    ])


Script Explanation

  • Input & Output Folders:
    The script iterates over all .txt files in output/labels and assumes a matching .png image exists in output/images.
  • Image Dimensions:
    OpenCV (cv2) is used to load each image to determine its width and height, which are necessary for converting normalized coordinates to pixel values.
  • Coordinate Conversion:
    The normalized center coordinates and dimensions are multiplied by the image dimensions. The top‑left corner of the bounding box is then computed by subtracting half the width and height from the center coordinates.
  • CSV Output:
    The CSV file (annotations.csv) is written with each annotation as a row containing the image filename, the computed top‑left coordinates, bounding box width and height in pixels, and the object label (e.g., "ship").
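
One detail worth noting: the script casts values with int(), which truncates toward zero, and a box that touches the image edge can end up with a slightly negative corner or extend past the border. If in-bounds integer boxes are preferred (an assumption; this guide does not state it as a requirement of the format), a helper along these lines could be applied before writing each row. The name clamp_box is hypothetical.

```python
def clamp_box(col_x, row_y, w, h, img_w, img_h):
    """Round a pixel-space box and clip it to the image bounds.

    Returns (col_x, row_y, width, height) as integers.
    """
    # Round both corners, then clip each to [0, image size]
    x0 = max(0, min(round(col_x), img_w))
    y0 = max(0, min(round(row_y), img_h))
    x1 = max(0, min(round(col_x + w), img_w))
    y1 = max(0, min(round(row_y + h), img_h))
    return x0, y0, x1 - x0, y1 - y0


# A box hanging over the left edge of an assumed 512 x 512 image
print(clamp_box(-2.4, 10.6, 20.0, 15.0, 512, 512))  # (0, 11, 18, 15)
```

Clipping the two corners rather than the width keeps the right and bottom edges of the box where they were, so only the part outside the image is discarded.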

Example CSV Output
After running the script, the CSV output will look similar to this:

filename,col_x,row_y,width,height,label
output_000001.png,150,100,30,40,ship
output_000002.png,200,80,25,35,ship
output_000003.png,175,90,28,32,ship

filename: Name of the image file.
col_x & row_y: Top‑left pixel coordinates of the bounding box.
width & height: Width and height of the bounding box in pixels.
label: Object label ("ship").
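
Before using the file, it can help to run a quick sanity check on it. The following is a minimal sketch, not part of the original script: it verifies the header written by the conversion script and reports the line numbers of rows whose coordinates are negative or whose size is not positive. The function name find_bad_rows and the specific checks are assumptions chosen for illustration.

```python
import csv
import io

EXPECTED_COLUMNS = ["filename", "col_x", "row_y", "width", "height", "label"]


def find_bad_rows(csv_file):
    """Return the 1-based file line numbers of rows that fail basic checks."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected header: {reader.fieldnames}")
    bad = []
    # Data rows start on line 2, right after the header
    for line_no, row in enumerate(reader, start=2):
        try:
            x, y, w, h = (int(row[k]) for k in EXPECTED_COLUMNS[1:5])
        except (TypeError, ValueError):
            bad.append(line_no)  # Missing or non-integer field
            continue
        if x < 0 or y < 0 or w <= 0 or h <= 0:
            bad.append(line_no)
    return bad


# In-memory sample; the second data row has a zero width
sample = io.StringIO(
    "filename,col_x,row_y,width,height,label\n"
    "output_000001.png,150,100,30,40,ship\n"
    "output_000002.png,200,80,0,35,ship\n"
)
print(find_bad_rows(sample))  # [3]
```

To check the real output, pass an open file instead of the in-memory sample, for example with open("annotations.csv", newline="") as f: print(find_bad_rows(f)).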