The State-of-the-Art Model in Speech Recognition: Conformer-1
AssemblyAI just released a state-of-the-art model in speech recognition: Conformer-1. Learn how they achieved it, how the Transformer architecture fits in, and what Convolutional Neural Networks bring to the table.

Santiago
Machine Learning. I run https://t.co/iZifcK7n47 and write @0xbnomial.
GPT-4 is old news.
— Santiago (@svpino) March 15, 2023
The team at @AssemblyAI just released a state-of-the-art model in speech recognition: Conformer-1.
But what's interesting is how they achieved this.
Here is everything you need to know:
Google introduced the Transformer architecture back in 2017.
It quickly became one of the most influential advances in the history of AI.
Transformer models are good at capturing global interactions in the data they process.
But can we do better than this?
Convolutional Neural Networks have been around for way longer than Transformers.
Some people argued they were destined to die.
But they do something very well: A CNN knows how to exploit local features in the data.
What happens if we combine Transformers and CNNs?
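Here is a toy PyTorch sketch of the two behaviors side by side; the sequence length, feature size, and kernel width are arbitrary choices for illustration only:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 100, 256)  # 1 utterance, 100 audio frames, 256 features

# Self-attention: every frame can attend to every other frame,
# so a single layer already sees global context.
attention = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
global_view, _ = attention(x, x, x)

# A 1D convolution only mixes information inside its kernel window
# (here 31 neighboring frames): great at local patterns, blind to the rest.
conv = nn.Conv1d(in_channels=256, out_channels=256, kernel_size=31, padding=15)
local_view = conv(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, time, features)
```

Attention sees the whole utterance at once; the convolution only sees a 31-frame neighborhood.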
Combining them is exactly what the Google Brain team did with the Conformer architecture:
They combined Transformers and CNNs to get the best of both worlds.
Conformer can efficiently model an audio sequence's local and global dependencies.
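Roughly, a Conformer block sandwiches a self-attention layer (global context) and a convolution module (local context) between two feed-forward layers, each wrapped in a residual connection. Below is a minimal PyTorch sketch of that structure; it follows the Conformer paper loosely, leaves out details like relative positional encoding and dropout, and the dimensions are illustrative rather than Conformer-1's actual configuration.

```python
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    """Simplified sketch of one Conformer block: half-step feed-forward,
    self-attention (global context), convolution module (local context),
    half-step feed-forward, with residual connections around each part."""

    def __init__(self, dim=256, heads=4, kernel_size=31):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.SiLU(), nn.Linear(dim * 4, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Convolution module: pointwise conv + GLU, depthwise conv, pointwise conv.
        self.conv_norm = nn.LayerNorm(dim)
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim * 2, kernel_size=1),
            nn.GLU(dim=1),
            nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
            nn.BatchNorm1d(dim),
            nn.SiLU(),
            nn.Conv1d(dim, dim, kernel_size=1),
        )
        self.ff2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.SiLU(), nn.Linear(dim * 4, dim))
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, time, dim)
        x = x + 0.5 * self.ff1(x)              # half-step feed-forward
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a)[0]          # global context via attention
        c = self.conv_norm(x).transpose(1, 2)  # (batch, dim, time) for the convs
        x = x + self.conv(c).transpose(1, 2)   # local context via convolution
        x = x + 0.5 * self.ff2(x)              # second half-step feed-forward
        return self.final_norm(x)

frames = torch.randn(2, 100, 256)          # 2 utterances, 100 frames, 256 features
print(ConformerBlock()(frames).shape)      # torch.Size([2, 100, 256])
```

A full Conformer model stacks many of these blocks (on top of a convolutional subsampling front end in the original paper).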
The team at @AssemblyAI used the Conformer architecture in their Conformer-1 release.
But that's not all!
They also leveraged new scaling laws, originally developed for large language models, to train the model optimally:
In May 2020, OpenAI published the GPT-3 paper.
They announced their data scaling laws:
To train an LLM with 175B parameters, they needed 300B tokens. That gives us 1.7 tokens per parameter.
But this law didn't last for long.
Last year, DeepMind proposed something different:
An LLM with 70B parameters needs 1.4T tokens.
Instead of the 1.7 tokens per parameter proposed by OpenAI, DeepMind suggested 20 tokens per parameter.
Conformer-1 uses the Conformer architecture with DeepMind's scaling laws.
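The two ratios are easy to verify with a bit of arithmetic:

```python
# 300B tokens for 175B parameters (GPT-3) vs. 1.4T tokens for 70B parameters (Chinchilla).
gpt3_ratio = 300e9 / 175e9
chinchilla_ratio = 1.4e12 / 70e9

print(f"OpenAI:   {gpt3_ratio:.1f} tokens per parameter")    # ~1.7
print(f"DeepMind: {chinchilla_ratio:.0f} tokens per parameter")  # 20
```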
The result speaks for itself:
Conformer-1 is more robust on real-world data than popular commercially available models.
For example, Conformer-1 makes 43% fewer errors on noisy data than Whisper.
You can read more about Conformer-1 in this blog post: https://t.co/qa6dI6bO6l
The model is already available in @AssemblyAI's API, and you can try it for free on the playground: https://t.co/e2OSizCxsH.
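For reference, here is roughly what a transcription request looks like against AssemblyAI's REST API; the API key and audio URL below are placeholders, and the blog post and playground linked above are the authoritative source for the exact parameters.

```python
import time
import requests

API_KEY = "your-assemblyai-api-key"  # placeholder
headers = {"authorization": API_KEY}

# Submit a publicly accessible audio file for transcription.
job = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json={"audio_url": "https://example.com/meeting.mp3"},  # placeholder URL
).json()

# Poll until the transcript is ready.
while True:
    result = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{job['id']}",
        headers=headers,
    ).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)

print(result.get("text") or result.get("error"))
```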