Mistral 7B crossed 500,000 downloads on Hugging Face within its first month of release — a number that set a pace few open-weight language models had matched before it. Released in September 2023 with 7.3 billion parameters, the model outperformed LLaMA 2 13B on nearly every standard benchmark while requiring roughly half the compute. This article covers the key Mistral 7B statistics you need to know in 2026, from benchmark scores to inference speeds, community adoption, and Mistral AI’s broader financial trajectory.
Mistral 7B Statistics: Key Numbers at a Glance
- Mistral 7B scored 60.1% on the MMLU benchmark, outperforming LLaMA 2 13B’s 55.6% by 4.5 percentage points.
- The model reached 500,000 Hugging Face downloads within its first month of release in September 2023.
- On TensorRT-LLM with H100 GPUs, Baseten benchmarks measured Mistral 7B at 93.63 tokens per second in batch-optimized workloads.
- Mistral AI’s valuation climbed from $260 million in June 2023 to $6.2 billion by June 2024.
- Orca-Math, a fine-tune built on Mistral 7B’s architecture, scored 86.81% on the GSM8K math reasoning benchmark without external tools.
What Are Mistral 7B’s Core Technical Specifications?
Mistral 7B packs 7.3 billion parameters into a transformer architecture that uses two efficiency-focused mechanisms: Grouped-Query Attention (GQA) and Sliding Window Attention (SWA). GQA speeds up inference and cuts memory use compared to full multi-head attention. SWA allows the model to process long sequences without the memory scaling problems that typically come with standard attention layers.
The base context window spans 32,768 tokens, which Mistral AI has maintained across v0.1 through v0.3. The Apache 2.0 license permits commercial use, modification, and redistribution without royalty fees — a decision that directly drove rapid community uptake.
| Specification | Detail |
|---|---|
| Parameters | 7.3 billion |
| Architecture | Transformer with GQA + SWA |
| Context Window | 32,768 tokens |
| License | Apache 2.0 |
| Release Date | September 27, 2023 |
| Training Hardware | CoreWeave cluster |
| Variants Available | Base, Instruct (v0.1, v0.2, v0.3) |
Source: Mistral AI Release Blog, Stanford FMTI May 2024
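The two attention mechanisms can be illustrated with a toy NumPy sketch. This is not Mistral's implementation: the real model uses 32 query heads sharing 8 key/value heads and a 4,096-token sliding window per the release paper, while the head counts, window, and sequence length below are deliberately tiny for readability.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal mask where position i attends only to [i - window + 1, i]."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def grouped_query_attention(q, k, v, window: int):
    """Toy GQA: many query heads share a few key/value heads."""
    n_q_heads, seq_len, d = q.shape
    group = n_q_heads // k.shape[0]          # query heads per KV head
    k = np.repeat(k, group, axis=0)          # broadcast the shared KV heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    mask = sliding_window_mask(seq_len, window)
    scores = np.where(mask, scores, -1e9)    # block attention outside window
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 6, 16))   # 8 query heads
k = rng.normal(size=(2, 6, 16))   # only 2 KV heads -> 4x smaller KV cache
v = rng.normal(size=(2, 6, 16))
out = grouped_query_attention(q, k, v, window=4)
print(out.shape)  # (8, 6, 16)
```

The KV-cache saving is visible in the shapes: eight query heads read from only two stored key/value heads, which is where GQA's memory reduction comes from at inference time.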
How Does Mistral 7B Perform on Benchmarks?
Mistral 7B scored 60.1% on MMLU, which measures knowledge and reasoning across 57 subjects. That places it above LLaMA 2 13B (55.6%) despite having nearly half as many parameters. On code generation tasks, the gap is wider: Mistral 7B hit 31.1% on HumanEval versus just 11.6% for LLaMA 2 7B — a 168% improvement. The MBPP coding benchmark showed a similar split, with Mistral 7B at 52.5% against LLaMA 2 7B’s 26.1%.
On reasoning and commonsense tasks, Mistral 7B holds parity with or exceeds LLaMA 2 34B in several categories, according to Mistral AI’s internal evaluation pipeline.
Source: Mistral AI, Quantumrun Foresight
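The percentage-point and relative gains quoted above follow directly from the raw scores; a quick check, using only figures cited in this section:

```python
def pct_point_gain(a: float, b: float) -> float:
    """Absolute difference between two benchmark scores, in points."""
    return round(a - b, 1)

def relative_gain(a: float, b: float) -> int:
    """Relative improvement of score a over baseline b, in percent."""
    return round((a - b) / b * 100)

# MMLU: Mistral 7B (60.1%) vs LLaMA 2 13B (55.6%)
assert pct_point_gain(60.1, 55.6) == 4.5
# HumanEval: Mistral 7B (31.1%) vs LLaMA 2 7B (11.6%)
assert relative_gain(31.1, 11.6) == 168
# MBPP: Mistral 7B (52.5%) vs LLaMA 2 7B (26.1%)
print(relative_gain(52.5, 26.1))  # prints 101
```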
Fine-Tuned Derivatives and Leaderboard Performance
The Mistral 7B base model has spawned hundreds of fine-tuned variants. Linq-Embed-Mistral claimed first place on the MTEB retrieval leaderboard in May 2024, posting a 68.2 average score across 56 datasets. Orca-Math, trained on the Mistral architecture, reached 86.81% on GSM8K without any external tool use. MistralOrca, fine-tuned on a filtered GPT-4 augmented dataset, ranked first on the Open LLM Leaderboard for models under 30 billion parameters at the time of its release.
| Derivative Model | Achievement | Score / Rank |
|---|---|---|
| Linq-Embed-Mistral | MTEB retrieval leaderboard (May 2024) | 68.2 avg across 56 datasets |
| Orca-Math | GSM8K (no external tools) | 86.81% |
| MistralOrca | Open LLM Leaderboard, sub-30B models | #1 at release |
| Mistral 7B Instruct v0.1 | MT-Bench | Top 7B model at release |
Source: Quantumrun Foresight, Open-Orca / Hugging Face
Mistral 7B Inference Speed and Deployment Statistics
Production inference numbers vary by hardware and runtime. On Mistral’s own API, the Instruct variant generates output at 182.8 tokens per second with a time to first token of 0.29 seconds, according to Artificial Analysis benchmarks. That throughput sits well above the 87.2 tokens-per-second median for open-weight models of similar size.
On standard hardware, without specialized optimization, Mistral 7B achieves around 170 tokens per second of sustained throughput and a 130-millisecond time to first token. Baseten’s TensorRT-LLM benchmarks on H100 GPUs report 93.63 tokens per second in batch-optimized workloads, a figure measured under batched serving conditions and therefore not directly comparable to the single-request numbers above. Pricing on Mistral’s API sits at $0.25 per million input tokens and $0.25 per million output tokens.
Source: Artificial Analysis, Baseten Benchmarking (March 2024), Quantumrun Foresight
| Runtime / Hardware | Throughput (tokens/sec) | Time to First Token |
|---|---|---|
| Mistral API (standard) | 182.8 | 0.29s |
| Standard hardware (no opt.) | ~170 | 130ms |
| TensorRT-LLM (H100) | 93.63 (batch-optimized) | — |
Source: Artificial Analysis, Baseten (March 2024)
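A back-of-envelope latency and cost estimate can be derived from the API figures in the table and the $0.25-per-million-token pricing. The request sizes below are illustrative, not benchmark data:

```python
def request_latency_s(output_tokens: int, ttft_s: float,
                      tokens_per_s: float) -> float:
    """End-to-end generation time: time to first token + streaming time."""
    return ttft_s + output_tokens / tokens_per_s

def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: float = 0.25e-6,
                     out_price: float = 0.25e-6) -> float:
    """$0.25 per million tokens for both input and output."""
    return input_tokens * in_price + output_tokens * out_price

# A hypothetical 500-token completion on Mistral's API
# (182.8 tok/s throughput, 0.29 s time to first token):
lat = request_latency_s(500, ttft_s=0.29, tokens_per_s=182.8)
cost = request_cost_usd(input_tokens=1_000, output_tokens=500)
print(f"{lat:.2f} s, ${cost:.6f}")  # prints "3.03 s, $0.000375"
```

At these prices a million such requests would cost about $375, which is the scale argument the article makes for small models in production.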
Mistral 7B on Hugging Face: Community Adoption
Mistral 7B’s first month on Hugging Face produced over 500,000 downloads — a rate that placed it among the fastest-adopted open-weight models ever listed on the platform. The model is available across Hugging Face Hub, Vertex AI, Replicate, SageMaker JumpStart, Baseten, and Kaggle Models. This distribution across multiple inference providers made it accessible without requiring local GPU infrastructure, which accelerated adoption in research and enterprise settings.
The base model has spawned a large library of community fine-tunes. Popular quantized GGUF versions from TheBloke alone account for millions of additional downloads, enabling the model to run on consumer hardware without full-precision weights, which is the practical constraint for anyone doing lightweight local inference.
Three API providers currently offer Mistral 7B Instruct access, making it one of the more portable open-weight models for production deployments. The wide distribution contrasts with closed models that require vendor-specific platforms.
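Why quantized GGUF builds fit consumer hardware comes down to simple arithmetic on weight storage. The bits-per-weight figures below for the quantized formats are rough community estimates, not official specifications, and the calculation ignores KV cache and activation memory:

```python
PARAMS = 7.3e9  # Mistral 7B parameter count

def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage only (excludes KV cache, activations)."""
    return params * bits_per_weight / 8 / 1e9

# Approximate bits per weight per format (quantized values are estimates):
formats = [("fp16", 16.0), ("8-bit (Q8_0)", 8.5), ("4-bit (Q4_K_M)", 4.8)]
for name, bits in formats:
    print(f"{name:>15}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Full-precision fp16 weights need roughly 14.6 GB, beyond most consumer GPUs, while a 4-bit quantization drops that to around 4 to 5 GB, which is what makes laptop and single-GPU deployment feasible.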
Mistral 7B Multilingual Performance
On the MEGAVERSE multilingual benchmark’s classification tasks, Mistral 7B shows uneven performance across languages. German achieved the highest accuracy at 68.6% on the base model, but the Instruct variant dropped that score to 60.7%, a 7.9 percentage point decline suggesting that instruction tuning introduces trade-offs for certain languages. Korean and Chinese accuracy improved by 4 to 5 percentage points in the Instruct variant.
| Language | Base Model Accuracy | Instruct Variant | Change |
|---|---|---|---|
| German | 68.6% | 60.7% | −7.9 pp |
| Korean | ~52% | ~56–57% | +4–5 pp |
| Chinese | ~54% | ~58–59% | +4–5 pp |
Source: Quantumrun Foresight — Mistral 7B Statistics and User Trends
The multilingual results indicate Mistral 7B performs more reliably in European languages than in Asian languages, which aligns with the training data distribution typical of models in its generation. Teams deploying it in multilingual settings should evaluate the Instruct variant separately from the base model per language.
Mistral AI Revenue, Valuation, and Funding Statistics 2026
Mistral AI generated an estimated $10 million in revenue in 2023, its first full year of commercial operations. That figure grew to approximately $30 million in 2024 — a 200% increase — driven by enterprise adoption of its API and model access products. Revenue projections for 2025 stood at $60 million, representing a further 100% year-over-year gain.
The company’s valuation moved from $260 million at its June 2023 seed round to $6.2 billion by June 2024. By late 2025, reports pegged the valuation near $10 billion as Mistral entered discussions for further funding. Total capital raised reached $1.2 billion across four rounds, with the $640 million Series B in June 2024 accounting for 54% of all funding. Mistral employed approximately 316 people as of 2025.
Source: ElectroIQ — Mistral AI Statistics 2025, TapTwice Digital
The company’s investor base includes Lightspeed Venture Partners, Andreessen Horowitz, General Catalyst, Microsoft, Nvidia, and Salesforce, alongside European institutional backers such as BNP Paribas and Bpifrance. That mix reflects both Silicon Valley confidence and a deliberate European anchor strategy.
| Round | Date | Amount | Valuation |
|---|---|---|---|
| Seed | June 2023 | $113M | $260M |
| Series A | December 2023 | $415M | ~$2B |
| Series A Extension | February 2024 | $16M | — |
| Series B | June 2024 | $640M | $6.2B |
Source: ElectroIQ, TapTwice Digital — Mistral AI Statistics
The funding pace mirrors broader AI infrastructure spending trends, where enterprise cloud adoption has increased demand for smaller, efficient models that run at lower cost per inference than frontier models. Mistral 7B sits at the intersection of that demand: capable enough for many production tasks, cheap enough to operate at scale.
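The round amounts in the table above reproduce the totals quoted in this section, including the Series B’s 54% share:

```python
# Funding rounds from the table, in millions of USD
rounds = {
    "Seed (Jun 2023)": 113,
    "Series A (Dec 2023)": 415,
    "Series A ext. (Feb 2024)": 16,
    "Series B (Jun 2024)": 640,
}

total = sum(rounds.values())  # 1,184 -> the "~$1.2 billion" figure
for name, amount in rounds.items():
    share = amount / total
    print(f"{name:<26} ${amount}M  ({share:.0%} of total)")
```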
How Does Mistral 7B Compare to Competitors?
On the Artificial Analysis Intelligence Index — a composite benchmark spanning reasoning, knowledge, math, and coding — Mistral 7B Instruct scores 7, against a median of 11 for comparable open-weight non-reasoning models. That places it at the lower end of the current field, which has grown considerably since September 2023. Models like LLaMA 3 8B and Mistral’s own later releases have moved the benchmark ceiling upward.
The cost-per-inference picture is mixed. At $0.25 per million tokens for both input and output, Mistral 7B Instruct sits above the $0.09 median input-token price for comparable open-weight models, though its speed advantage (182.8 tokens per second versus a median of 87.2) partially offsets that for latency-sensitive deployments. For teams running lightweight AI tasks without specialized hardware, the model’s efficiency relative to its parameter count remains a practical argument for its use.
Source: Artificial Analysis Intelligence Index
Mistral’s newer model families, including Mistral Small, Mistral Large 2 (123B parameters, 128K context), and Mixtral 8×7B, have since extended the product line. But Mistral 7B remains the entry point most frequently cited in fine-tuning guides, research papers, and open-source tooling, largely because it was the first model to demonstrate that a sub-10B parameter model could challenge 13B baselines on standard benchmarks.
FAQ
What benchmark score did Mistral 7B achieve on MMLU?
Mistral 7B scored 60.1% on MMLU, outperforming LLaMA 2 13B (55.6%) by 4.5 percentage points despite having nearly half as many parameters.
How many Hugging Face downloads did Mistral 7B get?
Mistral 7B exceeded 500,000 downloads on Hugging Face within its first month of release in September 2023, one of the fastest adoption rates for an open-weight model at the time.
What is Mistral 7B’s inference speed?
On Mistral’s API, the Instruct model runs at 182.8 tokens per second with a 0.29-second time to first token. On standard hardware, throughput is around 170 tokens per second.
What is Mistral AI’s current valuation?
Mistral AI reached a valuation of $6.2 billion in June 2024. By late 2025, reports indicated discussions at a valuation near $10 billion as the company sought additional funding.
Is Mistral 7B free to use commercially?
Yes. Mistral 7B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution without royalty fees or usage restrictions.
