Object Detection: A 2-Minute Introduction
Object Detection is one of the most useful applications of Machine Learning. Here is a 2-minute introduction to Object Detection, and how it can help us find objects in pictures.
While everyone is looking at Large Language Models, Object Detection is one of the most useful applications of Machine Learning.— Santiago (@svpino) June 27, 2023
Here is a 2-minute introduction to Object Detection:
Object Detection helps us find objects in pictures: it tells us what each object is and where it is.
We can do that by training a Machine Learning model on lots of labeled example pictures until it can spot those objects by itself.
There are two main ways a computer detects objects:
The first approach is to propose regions that might contain an object, then classify what each one is. This is called a "two-stage" detector (Faster R-CNN is a well-known example). It is usually slower but more accurate.
The second is called a "single-stage" detector (YOLO and SSD are popular examples), and it predicts the boxes and their classes in a single pass. It is usually faster but less accurate.
There's a big issue with Object Detection:
Training a model from scratch requires a lot of time, money, and labeled pictures.
Instead, we start from a pre-trained model, one that has already been trained on a large dataset, and "fine-tune" it with our own pictures.
Imagine you want to detect birds, and you have a dataset of 500 photos.
Rather than training from scratch, you can take a model that was pre-trained on millions of pictures and fine-tune it with your photos.
This is much cheaper and will usually get you better results than training on those 500 photos alone.
I'm sure you have seen these annotations before.
That's the result we get from an Object Detection model: a bounding box around every object we care about, each with a class label and a confidence score.
[GIF: @Cometml displaying annotations around every person on the screen.]
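To make that output concrete, here is a minimal sketch of how a single detection might be represented in Python. It assumes boxes are stored as pixel coordinates `(x1, y1, x2, y2)` (top-left and bottom-right corners), which matches the convention TorchVision's detection models use for their `boxes` output. The `Detection` class, the `keep_confident` helper, the 0.5 cutoff, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted object: a box in pixel coordinates, a class label, and a confidence score."""
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge
    label: str
    score: float  # model confidence in [0, 1]


def keep_confident(detections, min_score=0.5):
    """Drop low-confidence predictions (the 0.5 default is an arbitrary example value)."""
    return [d for d in detections if d.score >= min_score]


# Hypothetical model output for one image.
predictions = [
    Detection(34, 50, 120, 210, "person", 0.97),
    Detection(300, 40, 380, 200, "person", 0.88),
    Detection(150, 60, 190, 110, "bird", 0.31),
]

for d in keep_confident(predictions):
    print(d.label, (d.x1, d.y1, d.x2, d.y2), d.score)
# person (34, 50, 120, 210) 0.97
# person (300, 40, 380, 200) 0.88
```

Raising `min_score` keeps fewer but more reliable boxes; lowering it keeps more boxes along with more mistakes.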
To evaluate an Object Detection model, we compare the bounding boxes it predicts to the actual boxes from our annotated dataset.
If they overlap a lot, the model is doing a good job.
This metric is called "Intersection over Union," or IoU for short: the area where the two boxes overlap divided by the total area they cover together. It ranges from 0 (no overlap) to 1 (a perfect match).
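IoU is simple enough to compute directly. This is a minimal sketch assuming boxes given as `(x1, y1, x2, y2)` pixel coordinates (top-left and bottom-right corners); the function name `iou` and the example boxes are illustrative, not taken from the article.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an intersection of 0, not a negative area.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # The union counts the overlapping area only once.
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (50, 50, 150, 150)
actual = (60, 60, 160, 160)
print(round(iou(predicted, actual), 3))  # 8100 / 11900 -> 0.681
```

A common convention (used, for example, by the PASCAL VOC benchmark) is to count a predicted box as correct when its IoU with a ground-truth box is at least 0.5.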
We can also look at the Precision and Recall of the model.
Precision tells us what fraction of the model's detections are correct, and Recall tells us what fraction of the actual objects the model finds.
There's a trade-off between Precision and Recall: raising the confidence threshold usually increases Precision but lowers Recall, and lowering it does the opposite. Ideally, we get a model that balances them appropriately for our use case.
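Here is one simplified way to compute Precision and Recall for a single class. Assumptions: each prediction is a `(box, score)` pair, boxes use `(x1, y1, x2, y2)` corners, predictions are matched greedily to ground-truth boxes in order of decreasing confidence, and a match requires an IoU of at least 0.5. Each ground-truth box can be claimed only once, so duplicate detections count as false positives. The function names and the bird boxes are made up for illustration. Sweeping the score threshold shows the trade-off described above.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, score_threshold=0.5, iou_threshold=0.5):
    """Precision and Recall for one class.

    predictions: list of (box, score) pairs; ground_truth: list of boxes.
    """
    # Keep confident predictions, most confident first, so they get matched first.
    kept = sorted(
        (p for p in predictions if p[1] >= score_threshold),
        key=lambda p: p[1],
        reverse=True,
    )
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for box, _score in kept:
        # Find the best-overlapping ground-truth box that is still unclaimed.
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    # Precision: fraction of kept predictions that are correct.
    precision = true_positives / len(kept) if kept else 0.0
    # Recall: fraction of actual objects that were found.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical annotations: three birds in one photo.
ground_truth = [(10, 10, 50, 50), (100, 100, 150, 150), (200, 30, 260, 90)]
predictions = [
    ((12, 12, 50, 52), 0.95),      # matches the first bird
    ((102, 98, 148, 152), 0.80),   # matches the second bird
    ((300, 300, 340, 340), 0.60),  # false positive: nothing there
    ((205, 35, 255, 95), 0.30),    # matches the third bird, low confidence
    ((400, 10, 440, 50), 0.10),    # false positive
]

for threshold in (0.9, 0.5, 0.05):
    p, r = precision_recall(predictions, ground_truth, score_threshold=threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.9: precision=1.00, recall=0.33
# threshold=0.5: precision=0.67, recall=0.67
# threshold=0.05: precision=0.60, recall=1.00
```

Standard benchmarks go further: COCO-style evaluation, for instance, summarizes the whole Precision/Recall curve as average precision and averages it over several IoU thresholds.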
Here is a fantastic article that shows you, step by step, how to build and compare different Object Detection models using TorchVision and @Cometml.
The best part: open the Colab notebook that comes with the article and follow along! https://t.co/vgN97Tycis