Solving the One Million Dollar Problem: How to Keep Machine Learning Models Accurate Over Time
Companies spend a lot of money keeping their Machine Learning models up to date and accurate. Here are three examples of the problem and one way to approach it.
Santiago
Machine Learning. I run https://t.co/iZifcK7n47 and write @0xbnomial.
The one million dollar problem:
How do you keep a Machine Learning model returning accurate predictions over time?
Companies spend absurd amounts of money every year trying to solve this, yet many have no idea where to start.
— Santiago (@svpino) April 12, 2023
Here are three examples:
A model that uses stock prices to make predictions needs updates every second as prices change.
Netflix's recommendations change as often as you and the people around you watch more movies.
Amazon's sales predictions change when customers spend money on the site.
These problems have something in common:
You have a billion samples but can't use them all to train your model. Instead, you need a slice of the data.
Most people will tell you to take a random slice, but that doesn't work: a small random slice can easily miss the rare cases your model needs to learn.
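To make the random-slice problem concrete, here is a small Python sketch (my illustration, not from the thread) of one failure mode: rare cases. It computes the hypergeometric probability that a uniform random slice, drawn without replacement, contains none of the rare samples. The numbers are hypothetical: 100,000 samples, 50 of them rare, and a 0.1% slice of 100 samples.

```python
import math


def prob_slice_misses_rare(total: int, rare: int, slice_size: int) -> float:
    """Probability that a uniform random slice (drawn without replacement)
    of `slice_size` samples out of `total` contains none of the `rare` samples.

    This is the hypergeometric probability of zero successes:
    C(total - rare, slice_size) / C(total, slice_size).
    """
    if not 0 <= rare <= total or not 0 <= slice_size <= total:
        raise ValueError("rare and slice_size must be between 0 and total")
    return math.comb(total - rare, slice_size) / math.comb(total, slice_size)


# Hypothetical numbers: 100,000 samples, 50 rare ones, a 100-sample (0.1%) slice.
p = prob_slice_misses_rare(total=100_000, rare=50, slice_size=100)
print(f"{p:.3f}")  # 0.951
```

In other words, roughly 95% of the time this slice would contain no rare samples at all, so a model trained on it would never see them.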
The solution:
You need to "condense" the data while preserving its information.
In other words, use a slice of the data that preserves the original distribution.
But this is easier said than done.
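The thread doesn't say how to build such a slice. As a simple stand-in (stratified sampling, not the coreset approach discussed next), here is a minimal sketch. It assumes each sample already has a group label, for example a class or a cluster ID computed from image embeddings; the labels, `stratified_slice`, and the data below are all hypothetical. It takes the same fraction from every group, never drops a group entirely, and attaches a weight to each selected sample.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T")


def stratified_slice(
    items: Sequence[T],
    groups: Sequence[Hashable],
    fraction: float,
    seed: int = 0,
) -> list[tuple[T, float]]:
    """Take roughly `fraction` of each group (at least one sample per group).

    Each selected item carries a weight equal to group_size / samples_taken,
    so weighted counts per group match the full dataset's counts.
    """
    if len(items) != len(groups):
        raise ValueError("items and groups must have the same length")
    rng = random.Random(seed)

    # Bucket the items by their (hypothetical) group label.
    buckets: dict[Hashable, list[T]] = defaultdict(list)
    for item, group in zip(items, groups):
        buckets[group].append(item)

    selected: list[tuple[T, float]] = []
    for members in buckets.values():
        # Proportional allocation, but never drop a group entirely.
        k = min(len(members), max(1, round(fraction * len(members))))
        weight = len(members) / k
        selected.extend((item, weight) for item in rng.sample(members, k))
    return selected


# Hypothetical data: 100,000 samples, the first 50 belong to a rare group.
items = list(range(100_000))
groups = ["rare" if i < 50 else "common" for i in items]
slice_ = stratified_slice(items, groups, fraction=0.001)
print(len(slice_))                # 101
print(sum(w for _, w in slice_))  # 100000.0
```

With a 0.1% budget, the rare group is guaranteed to appear. The "at least one per group" rule oversamples tiny groups, and the weights correct for that when you train or compute statistics on the slice.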
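Here is a paper that proposes a solution using coresets: https://t.co/2zjo8P8MHA
A coreset is a small, weighted subset of the data chosen so that the model's loss computed on the subset closely approximates the loss on the full dataset.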
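I can't tell from the shortened link which construction the paper uses, so treat this as one well-known example rather than the paper's method: to my knowledge, the "lightweight coreset" for k-means by Bachem, Lucic and Krause (2018) samples points with a probability that is half uniform and half proportional to each point's squared distance from the data mean, then weights every draw by the inverse of its probability. The function name and the synthetic data below are hypothetical.

```python
import random
from typing import Sequence

Point = tuple[float, ...]


def lightweight_coreset(
    points: Sequence[Point], m: int, seed: int = 0
) -> list[tuple[Point, float]]:
    """Sample m weighted points for k-means using a lightweight-coreset scheme.

    Sampling distribution (half uniform, half proportional to squared
    distance from the data mean):
        q(x) = 1 / (2n) + d(x, mean)^2 / (2 * sum_x' d(x', mean)^2)
    Each draw (with replacement) gets weight 1 / (m * q(x)), so the weights
    sum to n in expectation.
    """
    n = len(points)
    if n == 0 or m <= 0:
        raise ValueError("need at least one point and m > 0")
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)

    if total == 0:
        q = [1.0 / n] * n  # all points identical: fall back to uniform
    else:
        q = [0.5 / n + 0.5 * d / total for d in sq_dist]

    rng = random.Random(seed)
    picks = rng.choices(range(n), weights=q, k=m)
    return [(points[i], 1.0 / (m * q[i])) for i in picks]


# Hypothetical data: 10,000 two-dimensional points, condensed to 200.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(10_000)]
coreset = lightweight_coreset(data, m=200)
print(len(coreset))  # 200
```

You would then fit a weighted k-means on the 200 points instead of the 10,000. The uniform half of the distribution keeps every point's sampling probability above zero, which also caps how large any single weight can get.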
Fortunately, you don't have to do this manually!
I work with the team at @akridata. Their platform is awesome, and if you work with image datasets, you need to check it out!
If you start with a large dataset, you can get a slice of your data that maintains the underlying distribution with a single click.
Look at the example images:
We can select 1% of the data without losing any information.
The tool just works! pic.twitter.com/Lym0JBbfR5
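Whatever tool produces the slice, it is worth sanity-checking the result on your own data. A minimal sketch, assuming each image has a class label (hypothetical here), compares the label proportions of the full dataset and the slice using total variation distance. It only checks label proportions, not the full image distribution, so treat it as a quick check rather than proof.

```python
from collections import Counter
from typing import Hashable, Iterable


def label_tv_distance(full: Iterable[Hashable], subset: Iterable[Hashable]) -> float:
    """Total variation distance between the label proportions of two collections.

    0.0 means the subset has exactly the same label proportions as the full
    dataset; 1.0 means the two share no labels at all.
    """
    p, q = Counter(full), Counter(subset)
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        raise ValueError("both collections must be non-empty")
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in labels)


# Hypothetical labels: 1,000 images and a 1% slice that happens to miss "fox".
full_labels = ["cat"] * 900 + ["dog"] * 90 + ["fox"] * 10
slice_labels = ["cat"] * 9 + ["dog"] * 1
print(round(label_tv_distance(full_labels, slice_labels), 3))  # 0.01
```

Note that the slice scores a small distance even though an entire class is missing, so in practice you would also list labels present in the full data but absent from the slice.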
I've used many products that visualize and help me deal with datasets of images.
But I haven't seen one that helps me slice and dice a dataset this quickly.
Check out @akridata's Data Explorer for free using this link: https://t.co/zAOHXtafZf