numpy matrix, along with a list of the matching filenames.
End-to-end Example
This section walks through an end-to-end example: computing feature vectors with DINOv2, then using those pre-computed features in a fastdup run.
Step 1: Compute the feature vectors using the DINOv2 model in fastdup.
Tip: Try out our DINOv2 example on Colab/Kaggle and pre-compute the feature vectors of your dataset. Or use fastdup to compute the feature vectors with your own ONNX model.
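As a minimal sketch of Step 1, the function below creates a fastdup object and runs it with the small DINOv2 model. The function name and both directory arguments are hypothetical placeholders, and the `model_path="dinov2s"` value is an assumption based on the model name used later in this tutorial; check it against the fastdup version you have installed.

```python
def compute_dinov2_features(input_dir, work_dir):
    """Run fastdup with a DINOv2 model so it writes feature vectors to work_dir.

    input_dir and work_dir are placeholder paths chosen by the caller.
    """
    # Imported lazily so this module can be loaded without fastdup installed.
    import fastdup

    fd = fastdup.create(input_dir=input_dir, work_dir=work_dir)
    # Assumption: "dinov2s" selects the small DINOv2 model (384-dim output).
    fd.run(model_path="dinov2s")
    return fd
```

Calling `compute_dinov2_features("images/", "work_dir/")` with your own paths would run the feature extraction once, so later runs can reuse the saved vectors.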
Step 2: Load the pre-computed feature vectors.
Upon completion of the run, fastdup saves the feature vectors locally in the work_dir/atrain_features.dat file.
Let's load them with:
Note: Read more on DINOv2 here.
In the shape of the loaded feature matrix, 7384 corresponds to the number of images in the dataset and 384 corresponds to the output dimension of the DINOv2s model.
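The loading step and the shape check can be sketched as follows. `load_features` wraps fastdup's `load_binary_feature` helper with `d=384`, which is assumed here to match the DINOv2s output dimension. `check_features` is a hypothetical helper, not part of fastdup, that confirms the matrix has one row per filename and the expected number of columns.

```python
import numpy as np


def load_features(features_path, dim=384):
    """Load filenames and the feature matrix saved by a fastdup run."""
    # Imported lazily so check_features can be used without fastdup installed.
    import fastdup

    filenames, features = fastdup.load_binary_feature(features_path, d=dim)
    return filenames, features


def check_features(filenames, features, dim=384):
    """Hypothetical sanity check: one row per file, `dim` columns, finite values."""
    features = np.asarray(features)
    if features.ndim != 2:
        raise ValueError("expected a 2-D feature matrix")
    n_rows, n_cols = features.shape
    if n_rows != len(filenames):
        raise ValueError("number of rows does not match number of filenames")
    if n_cols != dim:
        raise ValueError("unexpected feature dimension")
    if not np.isfinite(features).all():
        raise ValueError("feature matrix contains NaN or infinite values")
    return features.shape
```

For the dataset in this tutorial, `check_features(*load_features("work_dir/atrain_features.dat"))` would be expected to return the shape described above.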
Step 3: Run fastdup.
To run fastdup on the pre-computed feature vectors, point the annotations parameter to the filenames and the embeddings parameter to the feature vectors.
Tip: The benefit of running fastdup over pre-computed feature vectors is speed. It takes only a fraction of the time needed to run fastdup on the raw images. The time it takes to run the above code is approximately 15s.
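A minimal sketch of this step is shown below. It takes an already created fastdup object and passes the filenames and features through the `annotations` and `embeddings` parameters named above. The function name is hypothetical, and whether `annotations` expects a plain list of filenames or a DataFrame with a filename column is an assumption to verify against your fastdup version.

```python
import numpy as np


def run_on_precomputed(fd, filenames, features):
    """Run fastdup on pre-computed features instead of raw images.

    fd is an object returned by fastdup.create(...); filenames and features
    are the values loaded from the saved feature file.
    """
    features = np.asarray(features)
    # Each feature row must correspond to exactly one filename.
    if features.ndim != 2 or features.shape[0] != len(filenames):
        raise ValueError("features must have one row per filename")
    # Assumption: annotations accepts the filenames as loaded.
    fd.run(annotations=filenames, embeddings=features)
    return fd
```

Checking the row count before the call catches a mismatched filename list early, rather than partway through the run.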
Step 4: View Galleries.
You can use all of fastdup's gallery methods to view duplicates, clusters, etc.
Step 5: Iterate
Suppose you are not satisfied with the image cluster results above. You can always tweak the run parameters until you reach the desired outcome.
For example, let's rerun fastdup with ccthreshold=0.8 and visualize the clusters again.
Tip: Read more on how to tune the run parameters to obtain a desired output on your dataset here.
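To build intuition for how a similarity threshold changes clustering, here is a toy illustration in plain NumPy. It is not fastdup's implementation. It assumes cosine similarity between feature vectors and groups images into connected components whenever a pair's similarity is at or above the threshold. The function name is hypothetical.

```python
import numpy as np


def cluster_by_threshold(features, threshold):
    """Toy clustering: link rows whose cosine similarity >= threshold."""
    feats = np.asarray(features, dtype=np.float32)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)  # avoid division by zero
    sim = unit @ unit.T

    # Union-find over image indices.
    parent = list(range(len(unit)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Only the upper triangle is needed because similarity is symmetric.
    rows, cols = np.nonzero(np.triu(sim >= threshold, k=1))
    for a, b in zip(rows.tolist(), cols.tolist()):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # One cluster label (the root index) per image.
    return [find(i) for i in range(len(unit))]
```

In this sketch, a lower threshold links more pairs and so produces fewer, larger clusters, while a higher threshold keeps only very close matches together.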
Wrap Up
In this tutorial, we showed how you can run fastdup using pre-computed feature vectors. Running over pre-computed feature vectors significantly reduces run time compared to running over raw image files. Questions about this tutorial? Reach out to us on our Slack channel!
VL Profiler - A faster and easier way to diagnose and visualize dataset issues
The team behind fastdup also recently launched VL Profiler, a no-code cloud-based platform that lets you leverage fastdup in the browser. VL Profiler lets you find:
- Duplicates/near-duplicates.
- Outliers.
- Mislabels.
- Non-useful images.
Free Usage: Use VL Profiler for free to analyze issues on your dataset with up to 1,000,000 images. Get started for free.
Not convinced yet? Interact with a collection of datasets like ImageNet-21K, COCO, and DeepFashion here. No sign-ups needed.