Object Detection: A 2-Minute Introduction
Object Detection is one of the most useful applications of Machine Learning. Here is a 2-minute introduction to Object Detection, and how it can help us find objects in pictures.
Santiago
Machine Learning. I run https://t.co/iZifcK7n47 and write @0xbnomial.
While everyone is looking at Large Language Models, Object Detection is one of the most useful applications of Machine Learning.
— Santiago (@svpino) June 27, 2023
Here is a 2-minute introduction to Object Detection:
Object Detection helps us find objects in pictures.
We can do that by training a Machine Learning model with lots of example pictures until it can spot objects by itself.
There are two main ways a computer detects objects:
The first approach is to find potential objects, then guess what each object is. This is called a "two-stage" detector. It is typically slower but more accurate.
The second is called a "single-stage" detector, and it attempts to do both things at once. It is typically faster but less accurate.
There's a big issue with Object Detection:
Teaching a computer from scratch requires too much time, money, and too many pictures.
Instead of starting from scratch, we use models that were pre-trained on large datasets and "fine-tune" them with our pictures.
Imagine you want to detect birds, and you have a dataset of 500 photos.
Instead of starting from scratch, you can find a model that was pre-trained on millions of pictures and fine-tune it with your photos.
It will be cheaper and will usually get you better results.
I'm sure you have seen these annotations before.
That's the result we get from an Object Detection model: a set of bounding boxes containing every object we care about.
Here is a GIF showing how @Cometml displays annotations around every person on the screen. pic.twitter.com/ybOlqGBCii
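To make that output concrete, here is a minimal sketch of what a set of bounding boxes might look like in code. The field names, the confidence scores, and the `keep_confident` helper are assumptions made for illustration; they are not the format of any specific library.

```python
# Hypothetical detections for one image. Each box is
# (x_min, y_min, x_max, y_max) in pixels; the field names are illustrative.
detections = [
    {"label": "person", "score": 0.97, "box": (34, 50, 120, 310)},
    {"label": "person", "score": 0.88, "box": (200, 42, 290, 305)},
    {"label": "person", "score": 0.21, "box": (400, 60, 430, 110)},
]


def keep_confident(detections, min_score=0.5):
    """Return only the detections whose score is at least min_score."""
    return [d for d in detections if d["score"] >= min_score]


confident = keep_confident(detections)
for d in confident:
    x_min, y_min, x_max, y_max = d["box"]
    print(f'{d["label"]} ({d["score"]:.2f}): ({x_min}, {y_min}) to ({x_max}, {y_max})')
```

Drawing a rectangle for each remaining box gives the kind of annotation shown in the GIF.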
To evaluate an Object Detection model, we compare the bounding boxes it predicts to the actual boxes from our annotated dataset.
If they overlap a lot, then our model is doing a good job.
This metric is called "Intersection over Union," or IoU for short: the area where the two boxes overlap divided by the total area they cover together.
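The overlap check can be sketched in a few lines. This is a minimal version that assumes boxes are given as (x_min, y_min, x_max, y_max); the function name `iou` and the sample boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes don't touch).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (10, 10, 50, 50)  # hypothetical box from the model
actual = (30, 30, 70, 70)     # hypothetical box from the annotations
print(iou(predicted, actual))
```

An IoU of 1.0 means the two boxes match exactly, and 0.0 means they don't overlap at all. A common choice is to count a prediction as correct when its IoU with an actual box is at least 0.5, but that threshold is a convention, not a rule.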
We can also look at the Precision and Recall of the model.
Precision tells us how many of the objects our model detects are correct, and Recall tells us how many of the actual objects our model finds.
There's a trade-off between Precision and Recall: pushing a model to find more objects usually means more wrong detections. Ideally, we get a model that balances them appropriately.
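Here is a rough sketch of how Precision and Recall could be computed for one image, and how the trade-off shows up when we change the confidence threshold. It assumes each prediction carries a confidence score, uses a simple greedy matching on IoU, and treats an empty set of predictions as Precision 1.0. The function names, thresholds, and sample data are all illustrative, and real evaluation tools use more careful protocols.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, min_score=0.5, min_iou=0.5):
    """predictions: list of (score, box). ground_truth: list of boxes."""
    # Keep confident predictions, highest score first.
    kept = sorted((p for p in predictions if p[0] >= min_score),
                  key=lambda p: p[0], reverse=True)
    unmatched = list(ground_truth)
    true_positives = 0
    for _, box in kept:
        # Greedy matching: pair each prediction with its best remaining actual box.
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= min_iou:
            unmatched.remove(best)
            true_positives += 1
    # Assumed convention: no predictions (or no objects) counts as a perfect score.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(ground_truth) if ground_truth else 1.0
    return precision, recall


# Hypothetical annotated boxes and model predictions for one image.
ground_truth = [(10, 10, 50, 50), (60, 60, 100, 100), (120, 20, 160, 60)]
predictions = [
    (0.95, (12, 12, 50, 50)),
    (0.80, (62, 58, 100, 98)),
    (0.40, (118, 22, 158, 62)),
    (0.30, (200, 200, 240, 240)),  # nothing is actually here
]

strict = precision_recall(predictions, ground_truth, min_score=0.5)
lenient = precision_recall(predictions, ground_truth, min_score=0.25)
print("strict threshold:", strict)
print("lenient threshold:", lenient)
```

With the strict threshold, every detection is correct but one object is missed. With the lenient threshold, every object is found but one detection is wrong.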
Here is a fantastic article that will show you step-by-step how to build and compare different Object Detection models using TorchVision and @Cometml.
The best part: Open the Colab notebook that comes with the article and make sure you follow along! https://t.co/vgN97Tycis