Mozilla’s DeepSpeech project accumulated 26,500 GitHub stars before the repository was officially archived on June 19, 2025. The final model, version 0.9.3, recorded a 7.06% word error rate (WER) on the LibriSpeech clean test set. Newer open-source alternatives now post lower error rates, with NVIDIA Canary-Qwen averaging 5.63% WER across the Hugging Face Open ASR Leaderboard test sets. This article covers DeepSpeech statistics in 2026, from its accuracy benchmarks and adoption legacy to the speech recognition market trends that have moved past it.
DeepSpeech Statistics — TL;DR
Mozilla archived the DeepSpeech GitHub repository on June 19, 2025, making it read-only after years of inactivity.
The final 0.9.3 model, released in December 2020, achieved a 7.06% word error rate on the LibriSpeech clean test corpus.
DeepSpeech supported 19 languages — well behind OpenAI Whisper’s 99+ language coverage as of 2025.
The global speech recognition market reached $9.66 billion in 2025 and is projected to grow to $23.11 billion by 2030 at a 19.1% CAGR, according to MarketsandMarkets.
NVIDIA Canary-Qwen 2.5B now leads the Hugging Face Open ASR Leaderboard with a 5.63% average WER, replacing Whisper and DeepSpeech-era models at the top.
What Is DeepSpeech and Why Was It Discontinued?
DeepSpeech was an open-source speech-to-text engine created by Mozilla, based on Baidu’s Deep Speech research paper. It used TensorFlow and a recurrent neural network architecture to convert audio into text. The engine ran on hardware as small as a Raspberry Pi 4 and scaled up to high-power GPU servers.
Mozilla released DeepSpeech in 2017 and published the last meaningful update, version 0.9.3, in December 2020. The company went through major layoffs in 2020, which stalled development. Active code commits stopped in 2021. On June 19, 2025, Mozilla formally archived the repository, and it became read-only. The project’s spiritual successor, Coqui AI (a fork started by former DeepSpeech contributors), also shut down in early 2024.
DeepSpeech Statistics: Speech Recognition Accuracy and Benchmarks
DeepSpeech 0.9.3 achieved a 7.06% word error rate on the LibriSpeech clean test corpus. On the broader Common Voice dataset, independent benchmarks placed DeepSpeech at 43.82% WER — a much weaker result on diverse, real-world audio. In noisy factory environments, however, one 2025 benchmarking report from InvexTech found DeepSpeech outperformed Whisper, scoring 89% accuracy versus 76%.
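For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the scoring code used by any of the benchmarks cited here. The function name and the simple lowercase-and-split normalization are assumptions; real benchmark toolkits apply their own text normalization, which affects the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two substituted words out of four reference words gives a WER of 0.5.
example = word_error_rate("turn on the lights", "turn off the light")
```

Under this definition, a 7.06% WER means roughly 7 word-level errors per 100 reference words.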
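Here’s how DeepSpeech’s accuracy stacks up against current open-source models, including those on the Hugging Face Open ASR Leaderboard. The DeepSpeech and Kaldi rows report per-dataset results rather than a leaderboard average, so the comparison is approximate:
| Model | WER (Lower = Better) | Languages | Status (2026) |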
|---|---|---|---|
| NVIDIA Canary-Qwen 2.5B | 5.63% | 4+ | Active |
| IBM Granite Speech 3.3 8B | 5.85% | 4+ | Active |
| OpenAI Whisper Large V3 | ~6.4% | 99+ | Active |
| Mozilla DeepSpeech 0.9.3 | 7.06% (clean) / 43.82% (diverse) | 19 | Discontinued |
| Kaldi Gigaspeech XL | 3.8% (clean) / 8.76% (other) | Limited | Legacy |
Source: Hugging Face Open ASR Leaderboard, Gladia, WhisperAPI
DeepSpeech Statistics: GitHub and Community Metrics
Before its archival, the DeepSpeech repository had gathered 26,500 stars and over 4,100 forks on GitHub. The project received contributions from the open-source community and Mozilla’s machine learning team. Training the 0.9.3 model required 8 Quadro RTX 6000 GPUs, each with 24 GB of VRAM, using cuDNN RNN acceleration and 2,000 hours of American English audio with synthetic noise augmentation.
Standard model sizes ranged from 50 MB to 1.5 GB depending on configuration, which made the lighter versions practical for embedded and edge devices. Audio input was limited to 10-second clips, which restricted DeepSpeech to command processing rather than long-form transcription.
| Metric | Value |
|---|---|
| GitHub Stars | 26,500 |
| GitHub Forks | 4,100+ |
| Final Version | 0.9.3 (December 2020) |
| Repository Archived | June 19, 2025 |
| Training Data | 2,000 hours (American English) |
| Model Size Range | 50 MB – 1.5 GB |
| Max Audio Clip Length | 10 seconds |
| Languages Supported | 19 |
Source: Mozilla/DeepSpeech GitHub, QuantumRun
How Big Is the Speech Recognition Market in 2026?
The speech and voice recognition market was valued at $9.66 billion in 2025, according to MarketsandMarkets. It is projected to reach $23.11 billion by 2030, growing at a 19.1% compound annual growth rate. Statista’s narrower “speech recognition” segment estimate put the 2025 figure at $8.58 billion, with a forecast of $15.87 billion by 2030 at 13.09% CAGR.
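The growth rates quoted above can be checked from the start and end values alone. The following is a small sketch of the standard compound annual growth rate formula; the function names are illustrative, and the five-year span (2025 to 2030) is taken from the figures in the paragraph above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.19 means 19%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` for `years` years."""
    return start_value * (1 + rate) ** years


# MarketsandMarkets: $9.66B (2025) to $23.11B (2030), about 19.1% per year.
mm_rate = cagr(9.66, 23.11, 5)

# Statista: $8.58B (2025) to $15.87B (2030), about 13.09% per year.
statista_rate = cagr(8.58, 15.87, 5)

print(f"MarketsandMarkets CAGR: {mm_rate:.2%}")
print(f"Statista CAGR: {statista_rate:.2%}")
```

Both published rates agree with their endpoints to within rounding, so the gap between the two forecasts comes from the differing baselines and growth assumptions rather than from an arithmetic slip.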
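Cloud-based deployments held 59% of the market in 2025. North America accounted for the largest regional share at approximately 42%, followed by Asia-Pacific at around 22%. The U.S. market alone was valued at $5.60 billion in 2025, a figure that exceeds 42% of $9.66 billion and so appears to come from a different, larger market estimate.
Estimates differ widely by source and market definition. The table below uses Scoop Market.us figures, which put the 2025 market at $25.0 billion, well above the MarketsandMarkets and Statista estimates.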
| Year | Market Size (USD Billion) |
|---|---|
| 2022 | $14.0 |
| 2023 | $17.0 |
| 2024 | $20.0 |
| 2025 | $25.0 |
| 2026 (Projected) | $30.0 |
| 2027 (Projected) | $36.0 |
| 2030 (Projected) | $56.0 |
Source: Scoop Market.us
DeepSpeech Statistics vs. Modern ASR Models: What Changed?
The open-source ASR field looks nothing like it did when DeepSpeech was active. NVIDIA Canary-Qwen 2.5B, released in June 2025, topped the Hugging Face Open ASR Leaderboard with 5.63% average WER. It uses a FastConformer encoder paired with a Qwen3-1.7B language model decoder. IBM Granite Speech 3.3 8B came in second at 5.85% WER.
Whisper, despite being dethroned on accuracy benchmarks, remains the most-used open-source ASR model in 2026 because of its 99+ language support and large ecosystem. Whisper Large V3 Turbo reduced decoder layers from 32 to 4, delivering 6x faster inference while keeping accuracy within 1–2% of the full model. For real-time speed, NVIDIA’s Parakeet CTC 1.1B processes audio 2,728x faster than real-time, though it ranks 23rd in accuracy.
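To put the speed figures in concrete terms, a "times faster than real-time" number can be converted into wall-clock processing time. The sketch below assumes the 2,728x figure is an inverse real-time factor (audio seconds processed per second of compute); the function names are illustrative, and actual throughput depends on hardware and batch size.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def time_to_transcribe(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock seconds needed for a clip at a given speed factor."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


ONE_HOUR = 3600.0

# At 2,728x real-time, an hour of audio takes a little over one second.
parakeet_seconds = time_to_transcribe(ONE_HOUR, 2728)

# A 6x speedup turns a 60-second inference job into a 10-second one.
turbo_seconds = time_to_transcribe(60.0, 6)

print(f"1 hour at 2,728x: {parakeet_seconds:.2f} s")
print(f"60 s job at 6x: {turbo_seconds:.1f} s")
```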
DeepSpeech retained two advantages in some benchmarks: it processed audio 30% faster than Whisper in real-time tests, and it held up better in noisy industrial audio. But without active development, these edges are disappearing against newer, purpose-built streaming models like NVIDIA Riva (sub-100 ms latency) and faster-whisper (4x speed gains over reference Whisper).
Speech Recognition Adoption by Industry
Healthcare, automotive, and consumer electronics are the three largest verticals driving speech recognition spending. In healthcare, clinical documentation and medical transcription generate strong demand — the healthcare speech recognition segment is projected to reach $14.11 billion in revenue by 2032. Nuance, now owned by Microsoft, reported a 30% improvement in transcription accuracy and 25% reduction in medical report turnaround time with its 2025 update.
In automotive, in-car voice assistants passed 240 million active users by late 2024, and 50 million new vehicles shipped with embedded voice connectivity that year. Consumer electronics accounted for 29.48% of the voice recognition market in 2025. Authentication and security held 36.93% of the market by application segment.
| Segment | 2025 Share / Stat |
|---|---|
| Consumer Electronics | 29.48% of market |
| Authentication & Security | 36.93% of revenue by application |
| Smartphones & Tablets | 39.17% by device type |
| Cloud Deployment | 59% of market |
| North America | ~42% regional share |
| Asia-Pacific | ~22% regional share |
Source: Fortune Business Insights, MarketsandMarkets
Voice Assistant and Speech Recognition Adoption Trends in 2026
Statista projects that 157.1 million Americans will use voice assistants by 2026. Globally, there are 8.4 billion voice-enabled devices in use, a number that exceeds the world population. Google Assistant leads the U.S. market with approximately 92.4 million users, followed by Siri at 87 million and Alexa at 77.6 million.
Smart speaker ownership in the U.S. has plateaued at 35% of Americans aged 12 and older — roughly 101 million people — and has stayed near that level for four consecutive years, per Edison Research. About 30% of global internet users aged 16–64 use a voice assistant every week. Voice search accounts for 20.5% of global internet queries, with 56% of those searches happening on smartphones.
DeepSpeech Statistics: What Are the Best Alternatives in 2026?
Teams still running DeepSpeech in production need a migration plan. The Gladia team and other ASR analysts recommend three primary alternatives depending on use case. For batch transcription where accuracy matters most, OpenAI Whisper (or its faster-whisper variant) is the default choice. For real-time streaming with low latency, NVIDIA Riva delivers sub-100 ms response times on T4 or A10G GPUs. For on-device and edge use cases — where DeepSpeech once had an advantage — Moonshine and whisper.cpp now fill that gap.
Forasoft’s 2026 vendor guide stated clearly that Mozilla DeepSpeech, CMU Sphinx, and original Coqui should not be used in new builds. Legacy open-source ASR projects have lost both their development momentum and their monetization path.
| Use Case | Recommended Alternative | Key Advantage |
|---|---|---|
| Batch Transcription | OpenAI Whisper / faster-whisper | Highest accuracy, 99+ languages |
| Real-Time Streaming | NVIDIA Riva | Sub-100 ms latency |
| On-Device / Edge | Moonshine / whisper.cpp | CPU and ARM support, offline |
| Best Benchmark WER | NVIDIA Canary-Qwen 2.5B | 5.63% avg WER (#1 on leaderboard) |
| Enterprise Voice Agents | Deepgram Nova-3 | Sub-300 ms, per-minute pricing |
Source: Gladia, Forasoft, Northflank
DeepSpeech Statistics: Regional Market Breakdown
North America generated $7.96 billion in speech and voice recognition revenue in 2025 and is projected to reach $9.79 billion in 2026, according to Fortune Business Insights. The U.S. alone is expected to account for $6.01 billion of that 2026 total. Asia-Pacific generated $4.25 billion in 2025 (22.30% of the global market) and is expected to grow to $5.37 billion in 2026, with the fastest expansion rate among all regions.
Latin America held 3.20% of the global market in 2025 at $0.6 billion. Europe accounted for roughly 27% of market share, driven by enterprise usage and multilingual voice assistant demand across French, Spanish, German, and other European languages.
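A regional revenue figure paired with its market share implies a global total, which is a quick way to see which baseline a source is using. The sketch below applies that arithmetic to the Asia-Pacific and Latin America figures above; the function name is illustrative, and the result is only as precise as the rounded inputs.

```python
def implied_global_market(regional_revenue_bn: float, share_percent: float) -> float:
    """Global market size (USD billion) implied by a region's revenue and share."""
    if not 0 < share_percent <= 100:
        raise ValueError("share_percent must be in (0, 100]")
    return regional_revenue_bn / (share_percent / 100)


# Asia-Pacific: $4.25B at 22.30% of the 2025 global market.
from_apac = implied_global_market(4.25, 22.30)

# Latin America: $0.6B at 3.20% of the 2025 global market.
from_latam = implied_global_market(0.6, 3.20)

print(f"Implied global total (Asia-Pacific): ${from_apac:.2f}B")
print(f"Implied global total (Latin America): ${from_latam:.2f}B")
```

Both regions point to a 2025 global total of roughly $19 billion, which suggests the regional figures rest on a baseline that sits between the MarketsandMarkets and Scoop Market.us estimates cited earlier.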
FAQ
Is Mozilla DeepSpeech still maintained in 2026?
No. Mozilla archived the DeepSpeech GitHub repository on June 19, 2025, making it read-only. The last code commit was in 2021, and the project is officially discontinued.
What was DeepSpeech’s best word error rate?
DeepSpeech 0.9.3 achieved a 7.06% word error rate on the LibriSpeech clean test corpus. On diverse real-world audio, the error rate was significantly higher at 43.82%.
What replaced DeepSpeech as the top open-source ASR model?
NVIDIA Canary-Qwen 2.5B leads the Hugging Face Open ASR Leaderboard in 2026 with a 5.63% average WER. OpenAI Whisper remains the most widely used model overall.
How large is the speech recognition market in 2026?
Estimates vary by source. Scoop Market.us projects the global speech and voice recognition market at $30 billion in 2026, growing at roughly 19–20% annually, while MarketsandMarkets valued it at a much lower $9.66 billion in 2025.
How many people use voice assistants in the United States?
Statista projects that 157.1 million Americans will use voice assistants by 2026. Google Assistant leads with approximately 92.4 million U.S. users.
https://www.gladia.io/blog/best-open-source-speech-to-text-models
https://www.marketsandmarkets.com/Market-Reports/speech-voice-recognition-market-202401714.html
https://www.fortunebusinessinsights.com/industry-reports/speech-and-voice-recognition-market-101382
https://scoop.market.us/speech-and-voice-recognition-statistics/
