LegalBERT records 72,362 monthly downloads on Hugging Face as of May 2026, with 105 fine-tuned derivative models built on top of it. The domain-adapted transformer — trained on 12 GB of legal text spanning US case law, EU legislation, and SEC contracts — outperforms generic BERT by 5 to 12 percentage points across standard legal NLP benchmarks. This article covers LegalBERT’s accuracy metrics, benchmark scores, the legal AI market it feeds into, and adoption patterns across law firms and corporate legal departments in 2026.
LegalBERT Statistics in 2026 — TL;DR
LegalBERT has 110 million parameters. The related CaseLaw-BERT model was trained on 3.4 million legal decisions from the Harvard Law case corpus.
Its contract-focused variant (Contracts-BERT-base) hits an F1 score of 0.94 on named entity recognition tasks, based on peer-reviewed benchmarks published in Neural Computing and Applications.
Legal-Vocab-BERT reaches 95.28% test accuracy on majority-dissent opinion classification, 1.13 percentage points above standard BERT.
The legal AI software market reached $5.21 billion in 2026, per Fortune Business Insights, and is projected to grow at a 29.4% compound annual rate through 2034.
69% of legal professionals now use generative AI tools for work, more than double the 31% recorded in 2025, according to the 8am 2026 Legal Industry Report.
How Accurate Is LegalBERT?
LegalBERT’s accuracy varies by task. On contract-level named entity recognition — identifying parties, dates, amounts, and clauses — the Contracts-BERT-base variant from the LEGAL-BERT family scored an F1 of 0.94, the highest among all tested models including CRFs and BiLSTM configurations. The research was published in Springer’s Neural Computing and Applications journal.
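F1 is the harmonic mean of precision (the share of predicted entities that are correct) and recall (the share of gold entities the model found). The minimal sketch below scores entities as exact (start, end, label) spans. That matching rule is a common convention and an assumption here, since the study's exact evaluation protocol may differ; the function name, spans, and labels are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span format: (start token, end token, entity label).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match entity spans (assumed convention)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one contract clause.
gold = {(0, 2, "PARTY"), (5, 7, "DATE"), (10, 11, "AMOUNT"), (14, 16, "GOV_LAW")}
predicted = {(0, 2, "PARTY"), (5, 7, "DATE"), (10, 11, "AMOUNT"), (18, 19, "PARTY")}
p, r, f = entity_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With three of the four predictions matching gold spans, precision, recall, and F1 all come out to 0.75.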
For legal reasoning classification on US Supreme Court opinions, LEGAL-BERT-SC reached a macro-F1 of 0.70, outperforming generic BERT (0.68), DistilBERT (0.67), and T5-base (0.64). On the LexGLUE benchmark, legal-oriented models like LEGAL-BERT and CaseLaw-BERT consistently beat general-purpose transformers, with the gap widest on US case law tasks at 2 to 4 percentage points in macro-F1.
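Macro-F1 computes F1 separately for each class and averages the results with equal weight, so a model that ignores rare classes is penalized even when its overall accuracy looks reasonable. A minimal pure-Python sketch with hypothetical labels, contrasting it with micro-F1 (which, for single-label classification, equals accuracy); the helper names are illustrative:

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """F1 for every label appearing in either the gold or the predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # 2TP / (2TP + FP + FN) is the harmonic mean of precision and recall.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Pooled F1; for single-label classification this equals accuracy."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical labels: class A is common, B and C are rare.
gold = ["A"] * 6 + ["B", "B", "C", "C"]
pred = ["A"] * 6 + ["B", "A", "A", "A"]
print(f"micro-F1={micro_f1(gold, pred):.2f} macro-F1={macro_f1(gold, pred):.2f}")
```

In this toy case the model is right 7 times out of 10 (micro-F1 0.70), but because it never predicts class C, macro-F1 drops to about 0.49.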
A 2025 study from Frontiers in Artificial Intelligence built LegNER on Legal-BERT foundations and reported 99% accuracy and over 99% F1 on court case entity recognition across six entity types, processing more than 12 documents per second.
| Task | Model | Score | Metric |
|---|---|---|---|
| Contract NER | Contracts-BERT-base | 0.94 | F1 |
| Opinion Classification | Legal-Vocab-BERT | 95.28% | Accuracy |
| Legal Reasoning (SCOTUS) | LEGAL-BERT-SC | 0.70 | Macro-F1 |
| Court Case NER | LegNER | 99% | Accuracy |
| Uncertain Tax Position (UTP) Classification | LegalBERT | 82.0% | Accuracy |
Source: Springer (Neural Computing and Applications), Frontiers in AI, LexGLUE Benchmark [ACL 2022], Emergent Mind
LegalBERT vs Generic BERT — Performance Gap
Domain-specific pretraining explains the gap. LEGAL-BERT was trained on 12 GB of legal corpora: 164,141 US court cases, 116,062 EU legislation documents, 19,867 European Court of Justice cases, 12,554 ECHR cases, 61,826 UK statutes, and 76,366 SEC contracts. Standard BERT was trained on Wikipedia and BookCorpus, roughly 16 GB of general-purpose text.
On the CHANCERY corporate governance benchmark, LegalBERT scored 82.0% accuracy on uncertain tax position classification, while GPT-3.5 managed 78.4% on the same test. A domain-adapted LLaMA variant topped both at 90.5%, but at significantly higher compute cost. For mid-range NLP tasks where large generative models are overkill, LEGAL-BERT remains a practical option: lighter and cheaper to run, and more accurate than GPT-3.5 on this test.
| Model | Parameters | Training Data | Legal NLP Improvement vs BERT |
|---|---|---|---|
| BERT-base | 110M | 16 GB (Wikipedia, Books) | Baseline |
| LEGAL-BERT-SC | 110M | 12 GB (Legal corpora) | +5–12% |
| CaseLaw-BERT | 110M | 37 GB (Harvard Law) | +2–6% |
| PoL-BERT-Large | 335M | 256 GB (Pile of Law) | +3–8% |
Source: Chalkidis et al. (EMNLP 2020), Henderson et al. (2022), Hugging Face
LegalBERT Downloads and Community Adoption
The primary LEGAL-BERT model (nlpaueb/legal-bert-base-uncased) on Hugging Face logged 72,362 downloads in the most recent month. It has 309 likes, 105 fine-tuned derivative models, 10 adapter models, and more than 100 active Spaces running on top of it.
Multiple variants exist. The CaseHOLD LegalBERT focuses on US case law holdings. Custom-LegalBERT adds domain vocabulary. InLegalBERT starts from LEGAL-BERT-SC and continues pretraining on Indian legal text for an additional 300,000 steps. The Pile of Law BERT-Large model was trained on 256 GB of combined legal and administrative data. The Barcelona Supercomputing Center released MrBERT-legal in February 2026 for European legal classification tasks.
How Big Is the Legal AI Market in 2026?
The legal AI market reached $5.59 billion in 2026, up from $4.59 billion in 2025, according to ResearchAndMarkets. That is a 21.8% year-over-year increase, and the firm projects a 22.3% compound annual growth rate through 2030. Fortune Business Insights places the legal AI software segment specifically at $5.21 billion in 2026, projecting growth to $40.94 billion by 2034 at a 29.4% CAGR.
North America accounts for roughly 46% of the global legal AI market. Asia-Pacific is the fastest-growing region. Machine learning and deep learning technologies made up over 63% of the market by technology type in 2024, based on data compiled across multiple analyst reports.
| Year | Legal AI Market Size | YoY Growth |
|---|---|---|
| 2023 | $3.10B | — |
| 2024 | $3.75B | 21.0% |
| 2025 | $4.59B | 22.4% |
| 2026 | $5.59B | 21.8% |
| 2030 (proj.) | $12.49B | — |
Source: ResearchAndMarkets “AI in Legal Global Market Report 2026”
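The growth figures can be rechecked from the table. A short sketch with two illustrative helper functions: year-over-year growth is current / previous - 1, and CAGR over n years is (end / start)^(1/n) - 1. Applied to the ResearchAndMarkets figures, the 2025-to-2026 step works out to 21.8%, while 22.3% matches the compound rate implied between $5.59 billion in 2026 and $12.49 billion in 2030.

```python
def yoy_growth(previous: float, current: float) -> float:
    """Year-over-year growth as a fraction (0.218 means 21.8%)."""
    return current / previous - 1


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Market sizes in USD billions, taken from the ResearchAndMarkets table above.
market = {2023: 3.10, 2024: 3.75, 2025: 4.59, 2026: 5.59}
for year in (2024, 2025, 2026):
    print(year, f"{yoy_growth(market[year - 1], market[year]):.1%}")

print("ResearchAndMarkets 2026-2030 CAGR:", f"{cagr(5.59, 12.49, 4):.1%}")
print("Fortune Business Insights 2026-2034 CAGR:", f"{cagr(5.21, 40.94, 8):.1%}")
```

The same `cagr` helper reproduces Fortune Business Insights' 29.4% from its $5.21 billion (2026) and $40.94 billion (2034) endpoints.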
Legal Industry AI Adoption Rates
Generative AI adoption among legal professionals hit 69% in 2026, per the 8am 2026 Legal Industry Report surveying over 1,300 practitioners. That was 31% just one year earlier. Among those users, 42% now work with tools built specifically for legal practice, not just general-purpose chatbots like ChatGPT or Google Gemini.
Corporate legal departments moved even faster. FTI Consulting and Relativity reported that 87% of general counsel now use AI within their teams, nearly double the 44% recorded in 2025. Among law firms, those with 51 or more lawyers reported 39% AI adoption, while smaller firms trailed at around 20%.
Law firm technology spending grew 9.7% in 2025, the fastest real growth on record in the legal industry, according to the 2026 Thomson Reuters/Georgetown Law report. Spending on knowledge management tools grew 10.5%. Firms with a formal AI strategy were 3.9 times more likely to report measurable benefits than those without one.
| Metric | 2024 | 2025 | 2026 |
|---|---|---|---|
| Legal Professionals Using GenAI | 27% | 31% | 69% |
| General Counsel Reporting AI Use | — | 44% | 87% |
| GenAI Integration (Active, Global) | 14% | 26% | — |
| Large Firms (500+) Deploying AI | 82% | — | — |
Source: 8am 2026 Legal Industry Report, FTI Consulting/Relativity General Counsel Report 2026, Thomson Reuters
AI Accuracy in Contract Review
AI-assisted contract review reaches 95% accuracy based on 2024 benchmarks. Manual review by attorneys averages 80%. Ivo, a contract analysis engine, reported 97% accuracy on the Contract Understanding Atticus Dataset (CUAD). These numbers come from platforms built on transformer models that descend from or parallel LegalBERT’s architecture.
Key clause extraction accuracy sits at 98%. AI e-discovery recall improved to 90%, up from 75% in earlier systems. Compliance monitoring false positive rates dropped to 5%. These gains help explain why 52% of law firms adopted AI for contract review in 2023, almost double the 28% from 2022. The cost savings are substantial: firms using AI report 25 to 35% reductions in operational costs and average annual research savings of $100,000 per lawyer.
LegalBERT Training Data and Architecture
LEGAL-BERT follows the BERT-base-uncased architecture: 12 hidden layers, 768 hidden dimensions, 12 attention heads, and 110 million parameters. It was trained for 1 million steps with batches of 256 sequences of length 512 using a learning rate of 1e-4. The team used a single Google Cloud TPU v3-8.
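The 110 million figure follows from the architecture. The sketch below counts the weights of a BERT-base encoder component by component. The 3,072-unit feed-forward width and the 30,522-entry vocabulary are BERT-base-uncased defaults assumed here; LEGAL-BERT builds its own legal vocabulary, so its exact count may differ slightly. The sketch also multiplies out the stated training budget, which is an upper bound on real tokens processed, since shorter or padded sequences fill fewer positions. Function names are illustrative.

```python
def bert_param_count(vocab_size: int = 30_522, hidden: int = 768, layers: int = 12,
                     intermediate: int = 3_072, max_positions: int = 512,
                     type_vocab: int = 2, with_pooler: bool = True) -> int:
    """Count trainable weights in a BERT encoder (biases and LayerNorms included)."""
    layer_norm = 2 * hidden  # scale and shift vectors
    # Token, position and segment embedding tables, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Q, K, V and output projections (hidden x hidden weights + bias each), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two-layer feed-forward block (hidden -> intermediate -> hidden), then LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    # Optional pooler: one dense layer applied to the [CLS] representation.
    pooler = hidden * hidden + hidden if with_pooler else 0
    return embeddings + layers * (attention + feed_forward) + pooler


def token_positions(steps: int, batch_size: int, seq_len: int) -> int:
    """Upper bound on token positions seen: steps x sequences per batch x sequence length."""
    return steps * batch_size * seq_len


print(f"parameters: {bert_param_count():,}")
print(f"token positions: {token_positions(1_000_000, 256, 512):,}")
```

With these defaults the count is 109,482,240 parameters including the pooler, and 1 million steps of 256 sequences of 512 tokens is about 131 billion token positions.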
The training corpus covers six sources: EU legislation from EUR-Lex (116,062 documents), UK legislation (61,826 documents), European Court of Justice cases (19,867), ECHR cases from HUDOC (12,554), US court cases from the Case Law Access Project (164,141), and US contracts from SEC’s EDGAR database (76,366). A lightweight variant, LEGAL-BERT-SMALL, runs about four times faster than the base model with competitive accuracy, making it practical for resource-constrained environments.
| Training Source | Documents | Jurisdiction |
|---|---|---|
| EUR-Lex Legislation | 116,062 | EU |
| UK Legislation Portal | 61,826 | UK |
| ECJ Cases | 19,867 | EU |
| ECHR Cases (HUDOC) | 12,554 | Council of Europe |
| US Court Cases (Case Law Access) | 164,141 | US |
| SEC Contracts (EDGAR) | 76,366 | US |
Source: Chalkidis et al., “LEGAL-BERT: The Muppets straight out of Law School” (EMNLP 2020)
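Grouping the table by jurisdiction shows how US-heavy the corpus is. The minimal sketch below sums the document counts per jurisdiction. Shares by document count are not shares by volume (the corpus size is reported in GB, and documents vary widely in length), and listing ECHR cases under the Council of Europe is a labeling choice made for this sketch.

```python
from collections import defaultdict

# (source, document count, jurisdiction), with counts from the training-source table above.
SOURCES = [
    ("EUR-Lex Legislation", 116_062, "EU"),
    ("UK Legislation Portal", 61_826, "UK"),
    ("ECJ Cases", 19_867, "EU"),
    ("ECHR Cases (HUDOC)", 12_554, "ECHR (Council of Europe)"),
    ("US Court Cases (Case Law Access)", 164_141, "US"),
    ("SEC Contracts (EDGAR)", 76_366, "US"),
]

# Sum document counts per jurisdiction.
by_jurisdiction = defaultdict(int)
for _name, docs, jurisdiction in SOURCES:
    by_jurisdiction[jurisdiction] += docs

total = sum(by_jurisdiction.values())
# Print jurisdictions from largest to smallest with their share of all documents.
for jurisdiction, docs in sorted(by_jurisdiction.items(), key=lambda kv: -kv[1]):
    print(f"{jurisdiction:<26}{docs:>9,}  {docs / total:.1%}")
print(f"{'Total':<26}{total:>9,}")
```

By document count, US sources make up roughly 53% of the 450,816 documents, EU sources about 30%, UK legislation about 14%, and ECHR cases about 3%.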
LegalBERT Statistics by Use Case
Named Entity Recognition
Contract NER is where LegalBERT variants perform strongest. The Contracts-BERT-base model achieved F1 of 0.94 on identifying parties, dates, amounts, governing law, and notification periods in English-language contracts. Standard BERT scored lower across all entity types in the same evaluation.
Document Classification
On LEDGAR, the LexGLUE dataset for classifying contract provisions into categories, LegalBERT produced best-in-class results. The LexGLUE benchmark covers seven datasets across human rights law, US case law, EU law, and contract law, giving a broad picture of model performance on legal text.
Case Outcome Prediction
Predictive analytics tools in law now report 85% accuracy in forecasting case outcomes. LegalBERT and its derivatives contribute to this pipeline as feature extractors and classifiers. On CaseHOLD, the LexGLUE task of choosing which of five candidate holdings matches a passage citing a prior decision, legal-oriented models outperformed generic ones by 2 to 4 points in macro-F1.
What Are the Gaps in Legal AI Adoption?
Speed of individual adoption has outpaced institutional readiness. 54% of firms provide no AI training. Fewer than 10% enforce AI usage policies. Only 22% report strategic clarity about their AI investments, according to the 8am 2026 report.
The Thomson Reuters 2026 legal market analysis flagged risks around an AI bubble in legal tech. Firms increased technology spending by 39.3% between 2021 and 2025, with 2025 alone seeing nearly 11% growth. If client demand softens or AI tools fail to deliver measurable ROI, the correction could be sharp.
The NLP market itself is projected to reach $70.11 billion in 2026, per MarketsandMarkets, with legal applications representing a growing but still small slice of overall demand.
FAQ
What is LegalBERT?
LegalBERT is a BERT-based transformer model pre-trained on 12 GB of legal text from US case law, EU legislation, and SEC contracts. It has 110 million parameters and is designed for legal NLP tasks like document classification and entity recognition.
How accurate is LegalBERT for contract analysis?
The Contracts-BERT-base variant scores an F1 of 0.94 on contract named entity recognition. Legal-Vocab-BERT reaches 95.28% accuracy on opinion classification tasks.
How does LegalBERT compare to generic BERT?
LegalBERT outperforms standard BERT by 5 to 12 percentage points on some legal NLP benchmarks. On LexGLUE the margin is narrower, peaking at 2 to 4 points in macro-F1 on US case law tasks like SCOTUS classification and CaseHOLD.
How many law firms use AI in 2026?
69% of legal professionals use generative AI for work in 2026, per the 8am Legal Industry Report, and 87% of general counsel report AI use in their departments, per FTI Consulting and Relativity.
What is the legal AI market size in 2026?
The legal AI market reached $5.59 billion in 2026, a 21.8% increase over 2025. ResearchAndMarkets projects it will hit $12.49 billion by 2030, a compound annual growth rate of 22.3%.