The State-of-the-Art Model in Speech Recognition: Conformer-1
AssemblyAI just released a state-of-the-art model in speech recognition: Conformer-1. Learn how they achieved it, how the Transformer architecture fits in, and what Convolutional Neural Networks bring to the table.

Santiago
Machine Learning. I run https://t.co/iZifcK7n47 and write @0xbnomial.
GPT-4 is old news.
— Santiago (@svpino) March 15, 2023
The team at @AssemblyAI just released a state-of-the-art model in speech recognition: Conformer-1.
But what's interesting is how they achieved this.
Here is everything you need to know:
Google introduced the Transformer architecture back in 2017.
It quickly became one of the most influential advances in the history of AI.
Transformer models are good at capturing global interactions in the data they process.
But can we do better than this?
Convolutional Neural Networks have been around for way longer than Transformers.
Some people argued they were destined to die.
But they do something very well: A CNN knows how to exploit local features in the data.
What happens if we combine Transformers and CNNs?
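Here is a toy PyTorch sketch of the two behaviors side by side; the sequence length, feature size, and kernel width are arbitrary choices for illustration only:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 100, 256)  # 1 utterance, 100 audio frames, 256 features

# Self-attention: every frame can attend to every other frame,
# so a single layer already sees global context.
attention = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
global_view, _ = attention(x, x, x)

# A 1D convolution only mixes information inside its kernel window
# (here 31 neighboring frames): great at local patterns, blind to the rest.
conv = nn.Conv1d(in_channels=256, out_channels=256, kernel_size=31, padding=15)
local_view = conv(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, time, features)
```

Attention sees the whole utterance at once; the convolution only sees a 31-frame neighborhood.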
Combining them is exactly what the Google Brain team did with the Conformer architecture:
They combined Transformers and CNNs to get the best of both worlds.
Conformer can efficiently model an audio sequence's local and global dependencies.
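Roughly, a Conformer block sandwiches a self-attention layer (global context) and a convolution module (local context) between two feed-forward layers, each wrapped in a residual connection. Below is a minimal PyTorch sketch of that structure; it follows the Conformer paper loosely, leaves out details like relative positional encoding and dropout, and the dimensions are illustrative rather than Conformer-1's actual configuration.

```python
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    """Simplified sketch of one Conformer block: half-step feed-forward,
    self-attention (global context), convolution module (local context),
    half-step feed-forward, with residual connections around each part."""

    def __init__(self, dim=256, heads=4, kernel_size=31):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.SiLU(), nn.Linear(dim * 4, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Convolution module: pointwise conv + GLU, depthwise conv, pointwise conv.
        self.conv_norm = nn.LayerNorm(dim)
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim * 2, kernel_size=1),
            nn.GLU(dim=1),
            nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
            nn.BatchNorm1d(dim),
            nn.SiLU(),
            nn.Conv1d(dim, dim, kernel_size=1),
        )
        self.ff2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.SiLU(), nn.Linear(dim * 4, dim))
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, time, dim)
        x = x + 0.5 * self.ff1(x)              # half-step feed-forward
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a)[0]          # global context via attention
        c = self.conv_norm(x).transpose(1, 2)  # (batch, dim, time) for the convs
        x = x + self.conv(c).transpose(1, 2)   # local context via convolution
        x = x + 0.5 * self.ff2(x)              # second half-step feed-forward
        return self.final_norm(x)

frames = torch.randn(2, 100, 256)          # 2 utterances, 100 frames, 256 features
print(ConformerBlock()(frames).shape)      # torch.Size([2, 100, 256])
```

A full Conformer model stacks many of these blocks (on top of a convolutional subsampling front end in the original paper).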
The team at @AssemblyAI used the Conformer architecture in their Conformer-1 release.
But that's not all!
They also leveraged new scaling laws, originally developed for large language models, to train the model optimally:
In May 2020, OpenAI published the GPT-3 paper.
They announced their data scaling laws:
To train an LLM with 175B parameters, they needed 300B tokens. That gives us 1.7 tokens per parameter.
But this law didn't last for long.
Last year, DeepMind proposed something different:
An LLM with 70B parameters needs 1.4T tokens.
Instead of the 1.7 tokens per parameter proposed by OpenAI, DeepMind suggested 20 tokens per parameter.
Conformer-1 uses the Conformer architecture with DeepMind's scaling laws.
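The two ratios are easy to verify with a bit of arithmetic:

```python
# 300B tokens for 175B parameters (GPT-3) vs. 1.4T tokens for 70B parameters (Chinchilla).
gpt3_ratio = 300e9 / 175e9
chinchilla_ratio = 1.4e12 / 70e9

print(f"OpenAI:   {gpt3_ratio:.1f} tokens per parameter")    # ~1.7
print(f"DeepMind: {chinchilla_ratio:.0f} tokens per parameter")  # 20
```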
The result speaks for itself:
Conformer-1 is more robust on real-world data than popular commercially available models.
For example, Conformer-1 makes 43% fewer errors on noisy data than Whisper.
You can read more about Conformer-1 in this blog post: https://t.co/qa6dI6bO6l
The model is already available in @AssemblyAI's API, and you can try it for free on the playground: https://t.co/e2OSizCxsH.
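For reference, here is roughly what a transcription request looks like against AssemblyAI's REST API; the API key and audio URL below are placeholders, and the blog post and playground linked above are the authoritative source for the exact parameters.

```python
import time
import requests

API_KEY = "your-assemblyai-api-key"  # placeholder
headers = {"authorization": API_KEY}

# Submit a publicly accessible audio file for transcription.
job = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json={"audio_url": "https://example.com/meeting.mp3"},  # placeholder URL
).json()

# Poll until the transcript is ready.
while True:
    result = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{job['id']}",
        headers=headers,
    ).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)

print(result.get("text") or result.get("error"))
```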