- Part 1 - Dataset Enrichment with Zero-Shot Classification Models
- Part 2 - Dataset Enrichment with Zero-Shot Detection Models
- Part 3 - Dataset Enrichment with Zero-Shot Segmentation Models
👍 Purpose In this notebook, we show an end-to-end example of how you can enrich the metadata of your visual dataset using open-source zero-shot models such as Grounding DINO, building on the output we obtained from Part 1. By the end of the notebook, you’ll learn how to:
- Install and load the Grounding DINO model in fastdup.
- Enrich your dataset using bounding boxes and labels generated by the Grounding DINO model.
- Run inference using Grounding DINO on a single image.
- Specify a custom prompt to search for objects of interest in your dataset.
- Export the enriched dataset into the COCO .json format.
Installation
First, let’s install the necessary packages:
- fastdup - To analyze issues in the dataset.
- MMEngine, MMDetection, groundingdino-py - To use the Grounding DINO and MMDetection models.
- gdown - To download demo data hosted on Google Drive.
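To install the packages above, a typical notebook cell looks like the following. This is a sketch: versions are not pinned here, and the exact set of MMDetection dependencies may vary with your environment.

```python
# Run in a notebook cell. Pin versions if you need a reproducible setup.
!pip install -U fastdup gdown groundingdino-py
!pip install -U mmengine mmdet
```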
🚧 CUDA Runtime fastdup runs perfectly on CPUs, but larger models like Grounding DINO run much slower on CPU than on GPU. The code in this notebook can be run on either CPU or GPU, but we highly recommend a CUDA-enabled environment to reduce the run time. Running this notebook in Google Colab or Kaggle is a good start!
Download Dataset
Download the coco-minitrain dataset - a curated mini training set consisting of 20% of the COCO 2017 training set. coco-minitrain consists of 25,000 images and their annotations.
First, let’s download and extract the dataset.
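As a sketch, the archive can be fetched with gdown and unpacked with Python's zipfile module. The Google Drive file ID below is a placeholder, not the real one; take the actual ID from the coco-minitrain repository.

```python
import zipfile

import gdown

# Placeholder ID -- substitute the actual file ID from the coco-minitrain repo.
gdown.download(id="<coco-minitrain-file-id>", output="coco_minitrain_25k.zip", quiet=False)

# Extract the images and annotations into the working directory.
with zipfile.ZipFile("coco_minitrain_25k.zip") as zf:
    zf.extractall("coco_minitrain_25k")
```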
Zero-Shot Detection with Grounding DINO
Apart from zero-shot recognition models, fastdup also supports zero-shot detection models like Grounding DINO (and more to come). Grounding DINO is a powerful open-set zero-shot detection model. It accepts image-text pairs as input and outputs bounding boxes.
1. Inference on a bulk of images
In Part 1 of the enrichment notebook series, we utilized zero-shot image tagging models such as the Recognize Anything Model (RAM) and ran inference over the images in our dataset. We ended up with a DataFrame consisting of filename and ram_tags columns, as follows.
If you’d like to reproduce the above DataFrame, the Part 1 notebook details the code you need to run.
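If you only want to follow along with the enrichment API, a toy stand-in for that DataFrame is enough. The file names and tags below are made up for illustration; they are not from the actual dataset.

```python
import pandas as pd

# A made-up stand-in for the Part 1 output: one row per image, with the
# RAM tags stored as a " . "-separated string.
df = pd.DataFrame(
    {
        "filename": [
            "coco_minitrain_25k/images/img_0001.jpg",
            "coco_minitrain_25k/images/img_0002.jpg",
        ],
        "ram_tags": ["person . dog . frisbee", "car . traffic light . street"],
    }
)
```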
We can now use the image tags from the above DataFrame in combination with Grounding DINO to further enrich the dataset with bounding boxes.
To run the enrichment on a DataFrame, use the fd.enrich method and specify model='grounding-dino'. By default, fastdup loads the smaller variant (Swin-T backbone) for enrichment.
Also specify the DataFrame to run the enrichment on and the name of the column to use as input to the Grounding DINO model. In this example, we take the text prompt from the ram_tags column, which we computed earlier.
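Putting this together, the call looks roughly like the sketch below. The task and argument names follow the description above, but double-check them against the fd.enrich signature in your fastdup version.

```python
import fastdup

# Point fastdup at the downloaded images.
fd = fastdup.create(input_dir="coco_minitrain_25k")

# Run Grounding DINO over every row, using the RAM tags as the text prompt.
df = fd.enrich(
    task="zero-shot-detection",
    model="grounding-dino",
    input_df=df,
    input_col="ram_tags",
)
```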
📘 More on fd.enrich
fd.enrich enriches an input DataFrame by applying a specified model to perform a specific task; see the fd.enrich documentation for the currently supported parameters.
Once done, you’ll notice that 3 new columns are appended to the DataFrame, namely grounding_dino_bboxes, grounding_dino_scores, and grounding_dino_labels.
Now let’s plot the results of the enrichment using the plot_annotations function.
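A sketch of the call is below; the import path and column-name arguments are assumptions, so adjust them to match your fastdup version.

```python
from fastdup.models_utils import plot_annotations  # import path is an assumption

# Visualize the enriched rows: image, boxes, per-box scores and labels.
plot_annotations(
    df,
    image_col="filename",
    bbox_col="grounding_dino_bboxes",
    scores_col="grounding_dino_scores",
    labels_col="grounding_dino_labels",
    num_rows=5,  # plot the first few rows only
)
```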
Search for Specific Objects with Custom Text Prompt
Suppose you’d like to search for specific objects in your dataset. You can create a column in the DataFrame specifying the objects of interest and run the .enrich method.
Let’s create a column in our DataFrame and name it custom_prompt.
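For example (the prompt below is an arbitrary illustration; remember to separate objects with " . "):

```python
# Search every image for these objects only; " . " separates the classes.
df["custom_prompt"] = "person . bicycle . dog"

# Re-run the enrichment, this time reading prompts from the new column.
df = fd.enrich(
    task="zero-shot-detection",
    model="grounding-dino",
    input_df=df,
    input_col="custom_prompt",
)
```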
2. Inference on a single image
fastdup provides an easy way to load the Grounding DINO model and run inference. Suppose we have the following image and would like to run inference with the Grounding DINO model.
📘 Note
Text prompts must be separated with " . ".
By default, fastdup uses the smaller variant of Grounding DINO (Swin-T backbone).
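A minimal sketch, assuming the wrapper lives in fastdup.models_grounding_dino and exposes a run_inference method (check your fastdup version); the image path and prompt are placeholders.

```python
from fastdup.models_grounding_dino import GroundingDINO  # module path is an assumption

# Loads the Swin-T variant by default.
model = GroundingDINO()

results = model.run_inference(
    image_path="coco_minitrain_25k/images/img_0001.jpg",  # placeholder path
    text_prompt="person . dog .",  # classes separated by " . "
)
```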
The results variable contains a dict with labels, scores and bounding boxes.
You can visualize the detections on the image using the annotate_image convenience function.
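A sketch of its usage, assuming it accepts the image path and the results dict; the import path and signature are assumptions, so verify them in your fastdup version.

```python
from fastdup.models_utils import annotate_image  # import path is an assumption

# Draw the predicted boxes and labels onto the image.
annotate_image("coco_minitrain_25k/images/img_0001.jpg", results)
```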
To load a larger variant (Swin-B backbone), specify it in the GroundingDINO constructor.
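For instance, a sketch of loading the Swin-B weights. The argument names and file names below are assumptions based on the upstream Grounding DINO release, so check the constructor's signature in your fastdup version.

```python
# Hypothetical argument names -- verify against the GroundingDINO constructor.
model = GroundingDINO(
    model_config="GroundingDINO_SwinB.cfg.py",
    model_weights="groundingdino_swinb_cogcoor.pth",
)
```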
Convert Annotations to COCO Format
Once the enrichment is complete, you can also conveniently export the DataFrame into the COCO .json annotation format. For now, only the bounding boxes and labels are exported; masks will be added in a future release.
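fastdup ships its own exporter; the snippet below is only a manual sketch of what the target COCO structure looks like, assuming the enriched boxes are stored as [x1, y1, x2, y2] lists.

```python
import json

# Manual illustration of the COCO layout; not fastdup's built-in exporter.
coco = {"images": [], "annotations": [], "categories": []}
cat_ids, ann_id = {}, 1

for img_id, row in enumerate(df.itertuples(), start=1):
    coco["images"].append({"id": img_id, "file_name": row.filename})
    for (x1, y1, x2, y2), label in zip(row.grounding_dino_bboxes, row.grounding_dino_labels):
        if label not in cat_ids:
            cat_ids[label] = len(cat_ids) + 1
            coco["categories"].append({"id": cat_ids[label], "name": label})
        coco["annotations"].append({
            "id": ann_id,
            "image_id": img_id,
            "category_id": cat_ids[label],
            # COCO uses [x, y, width, height]; convert from [x1, y1, x2, y2].
            "bbox": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],
        })
        ann_id += 1

with open("annotations_coco.json", "w") as f:
    json.dump(coco, f)
```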
Wrap Up
In this tutorial, we showed how you can run zero-shot object detection models to enrich your dataset. This notebook is Part 2 of the dataset enrichment notebook series, where we utilize various zero-shot models to enrich datasets:
- Part 1 - Dataset Enrichment with Zero-Shot Classification Models
- Part 2 - Dataset Enrichment with Zero-Shot Detection Models
- Part 3 - Dataset Enrichment with Zero-Shot Segmentation Models
👍 Next Up Try out the Google Colab and Kaggle notebooks to reproduce this example. Also, check out Part 3 of the series, where we explore how to generate segmentation masks using zero-shot segmentation models like the Segment Anything Model (SAM). See you there!
Questions about this tutorial? Reach out to us on our Slack channel!
VL Profiler - A faster and easier way to diagnose and visualize dataset issues
The team behind fastdup also recently launched VL Profiler, a no-code cloud-based platform that lets you leverage fastdup in the browser. VL Profiler lets you find:
- Duplicates/near-duplicates.
- Outliers.
- Mislabels.
- Non-useful images.
👍 Free Usage Use VL Profiler for free to analyze issues on your dataset with up to 1,000,000 images. Get started for free.
Not convinced yet? Interact with a collection of datasets like ImageNet-21K, COCO, and DeepFashion here. No sign-ups needed.