
    AI Hallucination Rates Across Different Models 2026

    By Dominic Reigns | October 24, 2025 | Updated: February 16, 2026

    Google’s Gemini-2.0-Flash-001 recorded a hallucination rate of just 0.7% on Vectara’s benchmark as of April 2025, making it the most factually consistent large language model tested to date. Four models now sit below the 1% threshold. Yet the average hallucination rate across all models for general knowledge questions remains around 9.2%, and reasoning-focused models like OpenAI’s o3 and o4-mini have pushed error rates in the opposite direction, reaching 33% and 48% on person-specific questions. Global financial losses tied to AI hallucinations hit $67.4 billion in 2024. Here is where AI hallucination rates stand heading into 2026.

    AI Hallucination Rates Key Statistics

    • Gemini-2.0-Flash-001 recorded the lowest AI hallucination rate at 0.7% as of April 2025 (Vectara Leaderboard).
    • Four AI models now have sub-1% hallucination rates on summarization benchmarks.
    • OpenAI’s o3 reasoning model hallucinated 33% of the time on PersonQA, double the rate of its predecessor o1.
    • 47% of enterprise AI users made at least one major business decision based on hallucinated content in 2024 (Deloitte).
    • Retrieval-Augmented Generation (RAG) reduces AI hallucination rates by up to 71% when properly implemented.

    AI Hallucination Rates by Model (2025 Rankings)

    Vectara’s Hughes Hallucination Evaluation Model (HHEM) has tested over 100 language models on document summarization tasks. Each model summarized 831 short documents, and outputs were scored for factual consistency against source material. The results show a wide gap between the best and worst performers.
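
    To make the scoring concrete, here is a minimal, self-contained Python sketch of how a benchmark of this kind turns per-document consistency scores into a headline hallucination rate. The word-overlap judge is a toy stand-in for a trained evaluator like HHEM, and the 0.5 threshold is an illustrative assumption, not Vectara's specification.

    ```python
    import string

    def normalize(text: str) -> list[str]:
        """Lowercase, strip punctuation, split into words."""
        cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
        return cleaned.split()

    def toy_judge(source: str, summary: str) -> float:
        """Toy consistency score: fraction of summary words found in the
        source. A real benchmark would use a trained judge model here."""
        src_words = set(normalize(source))
        summ_words = normalize(summary)
        return sum(w in src_words for w in summ_words) / len(summ_words)

    def hallucination_rate(pairs, judge, threshold=0.5) -> float:
        """Share of (source, summary) pairs scoring below the threshold,
        i.e., the benchmark's headline number."""
        flagged = sum(1 for src, summ in pairs if judge(src, summ) < threshold)
        return flagged / len(pairs)

    pairs = [
        ("Revenue grew 5% last quarter.", "Revenue grew 5 percent last quarter."),  # faithful
        ("Revenue grew 5% last quarter.", "The CEO resigned after the results."),   # fabricated
    ]
    print(f"{hallucination_rate(pairs, toy_judge):.0%}")  # 50% on this toy set
    ```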

    Model | Hallucination Rate | Category
    Google Gemini-2.0-Flash-001 | 0.7% | Sub-1%
    Google Gemini-2.0-Pro-Exp | 0.8% | Sub-1%
    OpenAI o3-mini-high | 0.8% | Sub-1%
    GPT-4o (ChatGPT) | 1.5% | Low
    Claude 3.5 Sonnet | 4.4% | Medium
    Claude 3 Opus | 10.1% | High
    TII Falcon-7B-Instruct | 29.9% | Very High

    Google’s Gemini family dominates the top of the leaderboard. GPT-4o sits at 1.5%, placing it comfortably in the low-hallucination tier. Claude models range from 4.4% (Sonnet) to 10.1% (Opus). At the bottom, TII’s Falcon-7B-Instruct hallucinates in nearly one out of every three responses.

    These numbers measure grounded summarization, where models are given source text and asked to stay faithful to it. Open-ended factual questions produce much higher error rates across the board.

    AI Hallucination Rates Over Time (2021–2025)

    The best-performing models have improved from a 21.8% hallucination rate in 2021 to 0.7% in 2025, a 96.8% reduction over four years. According to analysis of the Hugging Face Hallucination Leaderboard, AI hallucination rates across different models decline by roughly 3 percentage points per year on standardized benchmarks.

    Some individual models saw their hallucination rates drop by as much as 64% during 2025 alone. If the current trajectory holds, projections suggest near-zero rates by 2027, though researchers caution that this depends on continued investment in training data quality and model architecture.
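
    As a quick sanity check on those endpoints, the short Python sketch below recomputes the total reduction and projects the curve forward. The geometric fit (a constant relative drop each year) is an assumption introduced here for illustration; note that the roughly 3-point-per-year figure above is a leaderboard-wide average, while these endpoints track only the best model.

    ```python
    # Sanity check on the 2021-2025 endpoints quoted above. Only the two
    # benchmark figures are inputs; the geometric projection is an
    # illustrative assumption, not the leaderboard's own model.

    start_rate, end_rate = 21.8, 0.7   # best-model hallucination %, 2021 vs 2025
    years = 4

    total_reduction = (start_rate - end_rate) / start_rate
    print(f"Total reduction: {total_reduction:.1%}")        # 96.8%

    # Constant relative drop per year that connects the two endpoints.
    annual_ratio = (end_rate / start_rate) ** (1 / years)   # ~0.42
    rate = end_rate
    for year in (2026, 2027):
        rate *= annual_ratio
        print(f"Projected best-model rate, {year}: {rate:.2f}%")  # ~0.30%, ~0.13%
    ```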

    A 2025 mathematical proof confirmed that hallucinations cannot be fully eliminated under current LLM architectures. These systems generate statistically probable responses based on pattern matching rather than retrieving verified facts, which means some level of confabulation is baked into how they work.

    Reasoning Models and the AI Hallucination Rates Paradox

    A strange pattern emerged in 2025: models built for deeper reasoning actually hallucinated more on factual benchmarks. OpenAI’s o3 model hallucinated 33% of the time on the PersonQA benchmark, more than double the 16% rate of its predecessor o1. The smaller o4-mini performed even worse at 48%.

    This trade-off appears structural. Models optimized for chain-of-thought reasoning excel at complex problems but tend to fill knowledge gaps with plausible-sounding guesses rather than abstaining. On Vectara’s summarization benchmark, reasoning models like o3-mini-high scored well at 0.8%. The divergence shows up when models are asked open-ended questions without source documents to anchor their responses.

    A Columbia Journalism Review study from March 2025 tested models on a different task: identifying the original source of news excerpts. Grok-3 got answers wrong 94% of the time. Paid models actually fared worse than free versions in this test, and most failed to express any uncertainty despite frequent errors.

    AI Hallucination Rates by Domain

    Hallucination rates vary widely depending on subject matter. Legal content proved especially problematic. A Stanford study found that when LLMs answered legal questions, they hallucinated at least 75% of the time about court rulings, producing over 120 fabricated cases with realistic names and detailed but fictional reasoning.

    Even top-performing models showed a 6.4% hallucination rate on legal information, compared to 0.8% for general knowledge. Medical hallucinations occurred at a 2.3% rate among the best models, while domain-specific evaluations in scientific and technical fields reported rates of 10% to 20% or higher.

    These domain gaps matter for businesses deploying AI in specialized contexts. Organizations concerned about AI privacy and data integrity face additional complications when hallucinated outputs enter regulated workflows.

    AI Hallucination Rates in Legal Proceedings

    Courts dealt with hundreds of rulings addressing AI-generated hallucinations in legal filings during 2025. Judges issued sanctions, procedural penalties, and standing orders requiring disclosure of AI use. In Australia, a Deloitte report submitted to the government contained fabricated academic sources and a fake court quote, prompting Deloitte to refund part of its A$440,000 fee. A separate Deloitte report for the Newfoundland government included at least four non-existent research papers.

    NeurIPS 2025 accepted papers weren’t immune either. GPTZero’s analysis of over 4,000 accepted papers found hundreds of flawed references across at least 50 papers, including entirely invented citations, altered author names, and fabricated journal titles.

    Enterprise Impact of AI Hallucination Rates

    The business consequences go beyond embarrassment. A Deloitte survey found that 47% of enterprise AI users made at least one major decision based on hallucinated content in 2024. Knowledge workers now spend an average of 4.3 hours per week verifying AI outputs, according to Microsoft’s 2025 data.

    The financial toll reached $67.4 billion globally in 2024. Each enterprise employee costs companies roughly $14,200 per year in hallucination-related mitigation efforts, per Forrester Research. The market for hallucination detection tools grew 318% between 2023 and 2025 as organizations scrambled for solutions.
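
    Those two per-employee figures are roughly consistent with each other, as the back-of-envelope check below shows. The 48-week working year and the implied hourly rate are assumptions introduced here, not numbers from the cited surveys.

    ```python
    # Back-of-envelope check linking the two per-employee figures above.
    # The 48-week working year and implied hourly rate are assumptions,
    # not survey data.

    hours_per_week = 4.3      # Microsoft 2025: weekly AI-output verification time
    working_weeks = 48        # assumption: ~48 working weeks per year
    annual_cost = 14_200      # Forrester: yearly mitigation cost per employee

    annual_hours = hours_per_week * working_weeks   # ~206 hours/year
    implied_rate = annual_cost / annual_hours       # ~$69/hour loaded cost
    print(f"{annual_hours:.0f} h/year -> ${implied_rate:.0f}/hour implied")
    ```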

    In Q1 2025 alone, 12,842 AI-generated articles were removed from online platforms because they contained hallucinated content. And 76% of enterprises now run human-in-the-loop processes specifically to catch hallucinations before deployment. With generative AI adoption rates climbing past 78% of organizations, the scale of the verification burden keeps growing.

    AI Hallucination Rates Mitigation Techniques

    RAG remains the most effective countermeasure, cutting hallucination rates by 71% when properly integrated. Google’s 2025 research showed that models with built-in reasoning verification reduced hallucinations by up to 65%. But these gains come with trade-offs in latency, cost, and complexity.
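
    Here is a minimal, self-contained sketch of the RAG pattern in Python. The tiny corpus and keyword retriever are illustrative stand-ins, not any vendor's implementation; a production system would use a vector store and send the assembled prompt to a real LLM API.

    ```python
    # Minimal sketch of the RAG pattern described above. The toy corpus
    # and keyword retriever are illustrative stand-ins; in production,
    # retrieval would hit a vector store and the prompt would go to an
    # actual LLM API.

    CORPUS = [
        "Vectara's HHEM benchmark scores summaries for factual consistency.",
        "Gemini-2.0-Flash-001 recorded a 0.7% hallucination rate in April 2025.",
        "RAG grounds model answers in retrieved source documents.",
    ]

    def retrieve(question: str, k: int = 2) -> list[str]:
        """Rank passages by naive keyword overlap (toy retriever)."""
        q_words = set(question.lower().split())
        ranked = sorted(CORPUS, key=lambda p: -len(q_words & set(p.lower().split())))
        return ranked[:k]

    def build_prompt(question: str) -> str:
        """Assemble a grounded prompt. The abstention instruction plus the
        retrieved context is where most of the hallucination reduction
        comes from."""
        context = "\n".join(f"- {p}" for p in retrieve(question))
        return (
            "Answer using ONLY the context below. If the answer is not in "
            "the context, reply 'I don't know.'\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )

    print(build_prompt("What hallucination rate did Gemini record?"))
    ```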

    Technique | Reduction | Trade-off
    Retrieval-Augmented Generation | Up to 71% | Added latency, infrastructure cost
    Self-consistency checking (sketched below) | Up to 65% | Higher compute, slower responses
    Model scaling (more parameters) | ~3 pp per year | Exponential compute requirements
    Prompt engineering (abstention) | Varies widely | Lower answer rates
    Human-in-the-loop review | High (manual) | 4.3 hours/week per worker
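
    Self-consistency checking, the second technique in the table, is easy to sketch: sample the same question several times and flag the answer for human review when the samples disagree. In the toy example below, the list of sampled answers stands in for repeated LLM calls at non-zero temperature (the source of the extra compute cost), and the 0.6 agreement threshold is an illustrative assumption.

    ```python
    # Toy sketch of self-consistency checking. `samples` stands in for
    # repeated LLM calls at non-zero temperature; the 0.6 agreement
    # threshold is an illustrative assumption.

    from collections import Counter

    def self_consistency(sampled_answers: list[str], min_agreement: float = 0.6):
        """Return (majority_answer, confident) based on sample agreement."""
        top, count = Counter(sampled_answers).most_common(1)[0]
        return top, count / len(sampled_answers) >= min_agreement

    # Three of five samples agree -> 0.6 agreement, accepted; anything
    # below the threshold would be routed to human review instead.
    samples = ["Paris", "Paris", "Lyon", "Paris", "Marseille"]
    print(self_consistency(samples))  # ('Paris', True)
    ```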

    Anthropic’s 2025 interpretability research on Claude identified internal circuits responsible for declining to answer when the model lacks sufficient information. Hallucinations occurred when these circuits were incorrectly inhibited — for example, when Claude recognized a name but lacked actual knowledge about the person, generating plausible but false responses instead of saying “I don’t know.”

    As AI usage statistics show increased adoption across more business functions, the fact that 91% of enterprises now run explicit hallucination mitigation protocols signals that organizations treat this as a persistent operational risk rather than a problem with a clean fix.

    AI Hallucination Rates Outlook for 2026 and Beyond

    If the 3-percentage-point annual decline holds, top models could approach near-zero hallucination rates by 2027. Analysis of the Hugging Face leaderboard data suggests zero hallucinations would require models with roughly 10 trillion parameters, a scale expected around 2027.

    But the picture is more nuanced than a single trend line. On grounded tasks with source documents, hallucinations are clearly declining. On open-ended questions, complex reasoning tasks, and domain-specific queries, the numbers have actually worsened for some model families. And the gap between how models like ChatGPT or Perplexity perform on summarization and how they handle novel questions keeps widening.

    The Vectara leaderboard itself is evolving. Its November 2025 update introduced a larger, harder dataset with domain-specific evaluations, and the new benchmark produced much higher hallucination rates across the board. That recalibration is a useful reminder: the improvement story depends heavily on what yardstick you use.

    FAQ

    What is the lowest AI hallucination rate recorded in 2025?

    Google’s Gemini-2.0-Flash-001 recorded 0.7% on Vectara’s summarization benchmark as of April 2025. Three other models also achieved sub-1% rates on the same test.

    How much do AI hallucinations cost businesses?

    Global losses from AI hallucinations reached $67.4 billion in 2024. Per employee, enterprises spend approximately $14,200 annually on hallucination mitigation, including 4.3 hours per week of fact-checking time.

    Do reasoning AI models hallucinate more?

    On open-ended factual benchmarks, yes. OpenAI’s o3 hallucinated 33% of the time on PersonQA, double its predecessor. On grounded summarization tasks, reasoning models like o3-mini-high scored 0.8%.

    Can AI hallucinations be fully eliminated?

    Current research says no. A 2025 mathematical proof showed that hallucinations are structurally inevitable under existing LLM architectures. RAG and human review reduce but cannot eliminate them.

    Which technique best reduces AI hallucination rates?

    Retrieval-Augmented Generation (RAG) is the most effective method, reducing hallucinations by up to 71% when properly implemented. Self-consistency checking can cut hallucinations by up to 65% in some setups.

    Sources

    1. AllAboutAI – AI Hallucination Report 2026

    2. Vectara Hallucination Leaderboard (GitHub)

    3. ScottGraffius.com – Are AI Hallucinations Getting Better or Worse

    4. Drainpipe.io – The Reality of AI Hallucinations in 2025
