Using Computer Vision to Become 10x More Productive with Image Datasets
Discover how Computer Vision can help data scientists become 10x more productive when dealing with image datasets. Learn about the team at @akridata and their platform, and see screenshots of the platform in action.

Santiago
Machine Learning. I run https://t.co/iZifcK7n47 and write @0xbnomial.

99% of data scientists have no idea what's in their data.
— Santiago (@svpino) March 6, 2023
They train their models and hope for the best, but reality catches up quickly: garbage in, garbage out.
I found a solution that will make you 10x more productive when dealing with a dataset of images:
90% of my work is Computer Vision, and there's nothing worse than spending hours looking at random photos.
The team at @akridata showed me a demo of their platform.
I want to show you a few screenshots:
I loaded a dataset with 3,000 street photos into their product.
Right away, their service clustered my data into three groups plus outliers. It used HDBSCAN, but I could pick a different method.
Right off the bat, this is already helping me separate related images! pic.twitter.com/c0GlGJ5iXL
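To make "three groups plus outliers" concrete, here is a minimal sketch of density-based clustering over image embeddings. The thread does not say how the platform computes embeddings or configures HDBSCAN, so everything here is an assumption: the 2-D vectors are made up, and the code implements plain DBSCAN (a simpler relative of HDBSCAN, swapped in to keep the example dependency-free). The names `dbscan`, `eps`, and `min_pts` are hypothetical.

```python
import math

NOISE = -1  # label given to outliers


def dbscan(points, eps, min_pts):
    """Plain DBSCAN (not HDBSCAN): one label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Made-up 2-D "embeddings": three tight groups and one stray image.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
    (20.0, 20.0),
]
labels = dbscan(embeddings, eps=0.5, min_pts=3)
print(labels)
```

HDBSCAN differs mainly in that it does not need a fixed `eps`; it builds a hierarchy over many density levels and picks the most stable clusters, which is presumably why it works well as a default on an unknown dataset.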
If I look at one of the clusters, every picture is related:
The platform determined that these pictures have something in common and automatically put them in the same cluster:
Night-time pictures.
How much time did I save by not having to do this manually? pic.twitter.com/UNVxYkFOca
I can see that a few daytime pictures snuck into the cluster.
I can start with these images and search using positive and negative samples to refine the results.
I can use the thumbs-up and thumbs-down buttons to "teach" the platform exactly what I want. pic.twitter.com/JRrqTyMtdS
For example, I wanted to see how quickly I could find yellow cabs.
I started with one picture and searched for similar ones. I repeatedly added positive and negative samples.
After 55 samples, I had a lot of yellow cab photos!
It took me 90 seconds to do this. pic.twitter.com/5Iowc31t2N
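The thread does not explain how the platform turns thumbs-up and thumbs-down votes into better results. One common way to do this kind of refinement is Rocchio-style relevance feedback: nudge the query vector toward the positives and away from the negatives, then search again. The sketch below assumes that approach; the 3-D embeddings, the image names, and the functions `refine_query` and `search` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors, dim):
    """Component-wise mean; a zero vector if there are no samples."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def refine_query(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio update: move toward liked samples, away from disliked ones."""
    pos = mean(positives, len(query))
    neg = mean(negatives, len(query))
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def search(query, dataset, top_k):
    """Names of the top_k images ranked by cosine similarity to the query."""
    ranked = sorted(dataset, key=lambda name: cosine(query, dataset[name]), reverse=True)
    return ranked[:top_k]


# Made-up embeddings; imagine the axes as (yellow, vehicle, vegetation).
dataset = {
    "cab_1": (0.9, 0.9, 0.0),
    "cab_2": (0.8, 1.0, 0.1),
    "yellow_sign": (1.0, 0.1, 0.0),
    "black_car": (0.1, 1.0, 0.0),
    "tree": (0.0, 0.1, 1.0),
}
seed = (1.0, 0.4, 0.0)  # embedding of the single starting picture

before = search(seed, dataset, top_k=1)
refined = refine_query(
    seed,
    positives=[dataset["cab_1"]],        # thumbs-up
    negatives=[dataset["yellow_sign"]],  # thumbs-down
)
after = search(refined, dataset, top_k=2)
print(before, after)
```

In this toy data the seed alone ranks a yellow sign first; after one positive and one negative sample, both cabs move to the top. Repeating the loop with more votes is the same operation with longer `positives` and `negatives` lists.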
But I can go one step further:
I want to search for pedestrian crossings.
Instead of using positive and negative samples, I can open an image and draw a rectangle around the crossing.
I can then search my dataset for images that show a similar patch. pic.twitter.com/60UlWi7UrW
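As a rough illustration of patch-based search, the sketch below crops a rectangle from one image, slides it over every other image, and ranks images by their best-matching window. This is an assumption about the general idea, not the platform's method: a real system would more likely compare learned patch embeddings than raw pixels. The tiny grayscale grids and the functions `crop`, `best_match`, and `search_by_patch` are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut a rectangle out of an image stored as a list of pixel rows."""
    return [row[left:left + width] for row in image[top:top + height]]


def patch_distance(a, b):
    """Mean absolute pixel difference between two equally sized patches."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def best_match(image, patch):
    """Slide the patch over the image; return (distance, top, left) of the best window.

    Assumes the image is at least as large as the patch.
    """
    ph, pw = len(patch), len(patch[0])
    best = None
    for top in range(len(image) - ph + 1):
        for left in range(len(image[0]) - pw + 1):
            d = patch_distance(crop(image, top, left, ph, pw), patch)
            if best is None or d < best[0]:
                best = (d, top, left)
    return best


def search_by_patch(images, patch):
    """Rank image names by how closely any window matches the patch."""
    scored = [(best_match(img, patch)[0], name) for name, img in images.items()]
    return [name for _, name in sorted(scored)]


# The rectangle drawn around the crossing: alternating dark/bright stripes.
query_image = [
    [90, 90, 90, 90, 90],
    [90, 0, 255, 0, 255],
    [90, 0, 255, 0, 255],
]
patch = crop(query_image, top=1, left=1, height=2, width=4)

images = {
    "plain_road": [[90] * 6 for _ in range(4)],
    "crossing": [
        [90, 90, 90, 90, 90, 90],
        [90, 90, 0, 255, 0, 255],
        [90, 90, 0, 255, 0, 255],
        [90, 90, 90, 90, 90, 90],
    ],
    "faded_crossing": [
        [90, 90, 90, 90, 90, 90],
        [40, 215, 40, 215, 90, 90],
        [40, 215, 40, 215, 90, 90],
        [90, 90, 90, 90, 90, 90],
    ],
}
results = search_by_patch(images, patch)
print(results)
```

Because the score comes from the best window rather than the whole image, a crossing is found wherever it sits in the frame, which is the point of searching by patch instead of by full-image similarity.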
The results are impressive!
Using a single patch from one image, the platform returned many pictures showing pedestrian crossings.
From here, I can give the results a thumbs-up or thumbs-down to refine the search even more.
It took me less than 30 seconds to do this! pic.twitter.com/lpmTGWZ0og
I've used many products that visualize and help me deal with a dataset of pictures.
But I haven't seen one that helps me slice and dice a dataset this quickly.
If you have to deal with images, check out @akridata's tool for free: https://t.co/zAOHXtafZf