Google’s Med-PaLM 2 scored 86.5% on the MedQA benchmark of USMLE-style medical exam questions, roughly 19 percentage points above its predecessor, making it the first large language model to reach expert-level accuracy on that dataset. This article covers the latest Med-PaLM 2 statistics for 2026, including benchmark performance, physician preference ratings, safety scores, competitive comparisons, and the broader healthcare AI market that Med-PaLM 2 helped shape before Google transitioned to Med-Gemini.
Med-PaLM 2 Statistics — TL;DR
Med-PaLM 2 reached 86.5% accuracy on the MedQA (USMLE) benchmark, up from Med-PaLM’s 67.6%.
Physicians preferred Med-PaLM 2 answers over human physician answers on 8 of 9 clinical evaluation axes.
The model recorded a 90.6% low-risk-of-harm rating in adversarial safety testing, up from Med-PaLM’s 79.4%.
92.6% of Med-PaLM 2 outputs aligned with established scientific and medical consensus.
Med-PaLM 2 powers MedLM, which became available to Google Cloud healthcare customers in the United States through Vertex AI in late 2023.
Google has since introduced Med-Gemini (91.1% on MedQA) and the open-source MedGemma as successors.
The global AI market in healthcare reached $36.96 billion in 2025 and is projected to grow to $51.20 billion in 2026.
How Accurate Is Med-PaLM 2 on Medical Exams?
Med-PaLM 2 scored 86.5% on MedQA, which draws from USMLE-style questions used to license physicians in the United States. The original Med-PaLM scored 67.6% on the same benchmark. That 19 percentage point gap came from improvements to the base PaLM 2 model, medical domain fine-tuning, and a new ensemble refinement prompting strategy.
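The article names ensemble refinement without describing it. The following is a minimal sketch of the idea, assuming it works roughly as the Med-PaLM 2 paper outlines: sample several reasoning drafts, feed them back to the model for a refined answer, and take a plurality vote. The `generate` callable, the prompt wording, the `Answer: <letter>` output format, and the default sample counts are all hypothetical and are not Google's implementation.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical model interface: (prompt, temperature) -> generated text.
Generate = Callable[[str, float], str]


def extract_answer(text: str) -> str:
    """Assumes a generation ends with a line such as 'Answer: B'."""
    for line in reversed(text.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip().upper()
    return ""  # no recognizable answer line


def ensemble_refinement(question: str, generate: Generate,
                        n_drafts: int = 5, n_refinements: int = 3) -> str:
    # Stage 1: sample several independent reasoning drafts.
    drafts: List[str] = [generate(question, 0.7) for _ in range(n_drafts)]

    # Stage 2: condition the model on its own drafts and ask for a refinement.
    context = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    refine_prompt = (
        f"{question}\n\nCandidate reasoning:\n{context}\n\n"
        "Considering the drafts above, give a refined explanation, "
        "then a final line 'Answer: <letter>'."
    )
    refined = [generate(refine_prompt, 0.7) for _ in range(n_refinements)]

    # Plurality vote over the refined answers; unparseable outputs are ignored.
    votes = Counter(a for a in map(extract_answer, refined) if a)
    return votes.most_common(1)[0][0] if votes else ""
```

To try it, pass any function with the `(prompt, temperature)` signature as `generate`, for example a thin wrapper around whichever model client is available.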
The model also posted strong results on other medical benchmarks. On MedMCQA, PubMedQA, and MMLU clinical topics, Med-PaLM 2 approached or exceeded prior records. Evaluation by 15 physicians from the US, UK, and India confirmed these gains through structured human review.
| Model | MedQA Accuracy | Year |
|---|---|---|
| GPT-3.5 | 60.2% | 2023 |
| Flan-PaLM | 67.6% | 2022 |
| Med-PaLM | 67.6% | 2022 |
| GPT-4 (base) | 86.1% | 2023 |
| Med-PaLM 2 | 86.5% | 2023 |
| Med-Gemini | 91.1% | 2024 |
Source: Google Research, arXiv (2305.09617), Nature
Med-PaLM 2 Safety and Harm Ratings
In adversarial testing designed to probe the model’s limitations, Med-PaLM 2 responses were rated as low risk of harm 90.6% of the time. The original Med-PaLM scored 79.4% on the same tests — an 11.2 percentage point gap. Statistical significance was confirmed at P < 0.001 across all evaluation axes comparing Med-PaLM 2 to its predecessor.
The model showed 92.6% alignment with scientific consensus and no detectable demographic bias across tested population subgroups. When compared against GPT-4 and GPT-3.5 on a 140-question MultiMedQA subset, Med-PaLM 2 produced safer outputs with lower potential for patient harm, according to the physician evaluation panel.
| Metric | Med-PaLM | Med-PaLM 2 |
|---|---|---|
| Low Risk of Harm Rating | 79.4% | 90.6% |
| Scientific Consensus Alignment | Not reported | 92.6% |
| Physician Preference (vs. human) | Baseline | Preferred on 8 of 9 axes |
Source: Google Research, Nature (Med-PaLM 2 Paper)
Do Physicians Prefer Med-PaLM 2 Over Human Answers?
In a pairwise comparison of 1,066 consumer medical questions, physicians preferred Med-PaLM 2 responses over those written by fellow physicians on 8 out of 9 clinical axes. The one axis where physician-written answers still held an edge was the inclusion of inaccurate or irrelevant information: Med-PaLM 2 answers were more often judged worse on that error dimension. On the scientific consensus axis, physicians chose Med-PaLM 2 answers 72.9% of the time.
A separate pilot study using real-world bedside consultation questions found that specialist physicians preferred Med-PaLM 2 answers over generalist physician answers 65% of the time. Generalist evaluators rated the two roughly equal at 50% preference. Both specialist and generalist physicians rated Med-PaLM 2 as equally safe compared to physician-generated answers across all criteria.
Med-PaLM 2 Statistics Compared to Competitors
Med-PaLM 2’s 86.5% on MedQA edged out GPT-4-base’s 86.1% by a narrow margin. The gap was larger against general-purpose models: GPT-3.5 reached only 60.2% on the same benchmark. Medical domain-specific fine-tuning gave Med-PaLM 2 better safety characteristics in physician evaluations than general-purpose generative AI models, even when raw accuracy scores were close.
Google’s own successor, Med-Gemini, has since reached 91.1% on MedQA — a 4.6 percentage point improvement over Med-PaLM 2. Med-Gemini also added multimodal capabilities including medical image interpretation across radiology, pathology, dermatology, ophthalmology, and genomics.
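The gaps quoted in this section are percentage-point differences, which are easy to confuse with relative improvements. The short sketch below recomputes both from the MedQA figures quoted in this article; the helper names are illustrative only. The "19 point" jump over Med-PaLM is 18.9 points before rounding.

```python
# MedQA accuracy figures as quoted in this article (percent).
medqa = {
    "GPT-3.5": 60.2,
    "Med-PaLM": 67.6,
    "GPT-4 (base)": 86.1,
    "Med-PaLM 2": 86.5,
    "Med-Gemini": 91.1,
}


def point_gap(a: str, b: str) -> float:
    """Percentage-point difference (a minus b), rounded to one decimal."""
    return round(medqa[a] - medqa[b], 1)


def relative_gain(a: str, b: str) -> float:
    """Relative improvement of a over b, as a percent of b's score."""
    return round((medqa[a] - medqa[b]) / medqa[b] * 100, 1)


for newer, older in [
    ("Med-PaLM 2", "Med-PaLM"),
    ("Med-PaLM 2", "GPT-4 (base)"),
    ("Med-Gemini", "Med-PaLM 2"),
]:
    print(f"{newer} vs {older}: "
          f"{point_gap(newer, older)} points, "
          f"{relative_gain(newer, older)}% relative")
```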
From Med-PaLM 2 to MedLM and Med-Gemini
Med-PaLM 2 became one of the research models powering MedLM, a family of healthcare-focused foundation models Google made available to Cloud customers through Vertex AI in December 2023. MedLM shipped with two model sizes built on Med-PaLM 2 architecture, covering tasks from simple Q&A to complex clinical workflows. At Google I/O in May 2025, Google announced MedGemma, an open-source model based on Gemma 3 for medical text and image comprehension.
Google tested Med-PaLM 2 at the Mayo Clinic starting in July 2023, though neither party has publicly released results from that pilot. The transition from Med-PaLM 2 to Med-Gemini happened without extensive real-world clinical validation reports being published, a gap that independent researchers have flagged.
Healthcare AI Market Size in 2026
The global AI in healthcare market reached $36.96 billion in 2025, according to Precedence Research. That figure is expected to grow to $51.20 billion in 2026, on a trajectory toward $613.81 billion by 2034 at a 36.83% CAGR. North America held the largest regional share at roughly 45% of the global market in 2025, with the US alone contributing $17.51 billion.
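As a rough sanity check on these projections, the sketch below compounds the 2026 figure forward at the stated CAGR and also back-solves the growth rate implied by the 2026 and 2034 endpoints. The source does not say which base year the 36.83% rate is measured from, so treating 2026 as the base is an assumption.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` forward by `rate` for `years` periods."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in this article, in USD billions.
size_2026 = 51.20
size_2034 = 613.81
stated_cagr = 0.3683

# Assumption: 2026 is the base year for the stated rate.
projected_2034 = project(size_2026, stated_cagr, 2034 - 2026)
cagr_from_2026 = implied_cagr(size_2026, size_2034, 2034 - 2026)

print(f"2034 at stated CAGR: ${projected_2034:.1f}B")
print(f"CAGR implied by endpoints: {cagr_from_2026:.2%}")
```

The two results land close to the published numbers but not exactly on them, which is typical when a report rounds its rate or uses a different base year.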
AI captured 46% of all healthcare venture investment in 2025, totaling over $18 billion across deals. Q1 2026 digital health funding hit $4 billion, the strongest opening quarter since the pandemic peak. Average deal size climbed to $36.7 million, with 12 megadeals above $100 million accounting for 59% of that quarterly total.
| Year | Market Size (USD Billions) |
|---|---|
| 2024 | $14.92 |
| 2025 | $36.96 |
| 2026 (Projected) | $51.20 |
| 2030 (Projected) | $110.61 |
| 2034 (Projected) | $613.81 |
Source: Precedence Research, Fortune Business Insights
FDA-Approved AI Medical Devices in 2025
The FDA authorized a record 295 AI/ML-enabled medical devices in 2025, bringing the cumulative total to 1,451 since tracking began in 1995. Radiology accounted for 76% of all authorized devices (1,104 total), followed by cardiology at 8.8% and neurology at 4.7%. The pace of approvals has grown sharply — up from 221 in 2023 and 253 in 2024.
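The share and growth figures above follow from simple arithmetic on the quoted counts. A small sketch, using only the numbers in this section (the variable and function names are illustrative):

```python
# Annual FDA AI/ML device authorizations as quoted in this article.
authorizations = {2023: 221, 2024: 253, 2025: 295}
cumulative_total = 1451   # all authorizations since 1995
radiology_total = 1104    # radiology devices within that total

# Radiology's share of all authorized devices, in percent.
radiology_share = radiology_total / cumulative_total * 100


def yoy_growth(counts: dict) -> dict:
    """Year-over-year growth in percent, keyed by the later year."""
    years = sorted(counts)
    return {
        later: (counts[later] - counts[earlier]) / counts[earlier] * 100
        for earlier, later in zip(years, years[1:])
    }


print(f"Radiology share: {radiology_share:.1f}%")
for year, pct in yoy_growth(authorizations).items():
    print(f"{year}: {pct:.1f}% more authorizations than the prior year")
```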
Aidoc’s CARE1 foundation model received FDA clearance in February 2025, becoming the first foundation-model-powered clinical AI to do so. GE HealthCare held the most radiology AI authorizations at 120, followed by Siemens Healthineers at 89 and Philips at 50. Physician adoption of AI tools reached 63% in the November 2025 to January 2026 Doximity survey, up from 47% in March 2025.
Med-PaLM 2 Multimodal Capabilities
Med-PaLM M, the multimodal extension of the Med-PaLM architecture, processes medical images, genomic data, and clinical text within a single framework. Three parameter variants (12B, 84B, and 562B) were tested across 14 biomedical tasks. The 84B parameter version achieved the best balance between accuracy and error rates in radiology report generation. Clinicians preferred Med-PaLM M radiology reports over radiologist-written reports in up to 40.5% of cases across a sample of 246 retrospective chest X-rays.
These multimodal capabilities carried forward into Med-Gemini and eventually into MedGemma, which Google released as an open model for developers building health applications. MedGemma’s baseline performance on clinical knowledge and reasoning tasks is comparable to earlier closed models, according to Google’s own benchmarks shared at I/O 2025.
Physician AI Adoption Statistics in 2026
As of the November 2025–January 2026 Doximity survey, 63% of US physicians reported using AI tools in clinical practice, up from 47% roughly nine months earlier and 38% in 2023. A separate reading put health AI use at 66% of physicians by mid-2025, reported as a 78% increase from the 38% recorded in 2023. Meanwhile, 68% of physicians recognized at least some advantage of AI in patient care, up from 63% the prior year.
At HIMSS 2026, Google Cloud showcased production-scale deployments: Humana deployed Gemini-powered Agent Assist to 20,000 member advocates handling roughly 80 million calls per year. Highmark Health scaled its AI assistant across 74 use cases, processing over 6 million prompts and delivering an estimated $27.9 million in value during 2025. CVS Health launched Health100, a Gemini-backed consumer engagement platform integrating wearable devices and health records.
| Period | US Physician AI Adoption Rate |
|---|---|
| 2023 | 38% |
| March 2025 | 47% |
| Mid-2025 | 66% |
| Nov 2025 – Jan 2026 | 63% |
Source: Doximity Physician Survey, AMA Digital Health Research
Generative AI in Healthcare Market
The generative AI in healthcare market is projected at $4.7 billion in 2026, climbing to $39.8 billion by 2035 at a 26.7% CAGR, per Roots Analysis. North America accounts for roughly 56% of global generative AI healthcare spend. Treatment planning and clinical documentation are the leading use cases. Medical imaging and diagnostics hold 22.30% of the broader healthcare AI market share in 2026.
AI-generated operative reports showed 87.3% accuracy in a 2025 study of 158 cases, outperforming surgeon-written reports at 72.8%. The ROI on AI in healthcare averages $3.20 for every $1 invested, with a typical return realized within 14 months. Hippocratic AI raised $126 million in Series C in 2025 to scale clinical safety models, backed by NVIDIA’s NVentures and Menlo Ventures.
What Are the Limitations of Med-PaLM 2?
Med-PaLM 2 had a higher percentage of inferior answers (27%) compared to physicians (14%) on the dimension of inaccurate or irrelevant information. Both scored roughly equal at 59% for responses rated as fully accurate. Google researchers themselves stated the model is “not ready for autonomous clinical decision-making” and is intended to augment, not replace, human clinicians.
The lack of interpretability remains a barrier to clinical adoption. Medical professionals need to justify their decisions, and the “black-box” nature of deep learning makes that difficult. Google’s own senior research director said in 2023 that he wouldn’t want the technology in his family’s healthcare journey at that stage, while acknowledging its long-term potential. The Gemini-powered successors address some of these gaps through improved multimodal reasoning and long-context processing, but real-world clinical validation data remains sparse.
FAQ
What is Med-PaLM 2’s accuracy on the USMLE exam?
Med-PaLM 2 scored 86.5% on the MedQA dataset of USMLE-style questions, making it the first LLM to reach expert-level performance on that benchmark when it was released in 2023.
How does Med-PaLM 2 compare to GPT-4 in medical tasks?
Med-PaLM 2 scored 86.5% on MedQA, slightly above GPT-4-base at 86.1%. Physician evaluators rated Med-PaLM 2 outputs as safer due to its medical-specific fine-tuning.
Is Med-PaLM 2 still used in 2026?
Med-PaLM 2 powers MedLM on Google Cloud’s Vertex AI. Google has since introduced Med-Gemini (91.1% MedQA) and the open-source MedGemma as newer alternatives.
How large is the healthcare AI market in 2026?
The global AI in healthcare market is projected at $51.20 billion in 2026, growing from $36.96 billion in 2025 at a 36.83% CAGR, per Precedence Research.
What percentage of physicians use AI tools in 2026?
63% of US physicians reported using AI tools in the November 2025 to January 2026 Doximity survey, up from 38% in 2023.
Sources:
https://arxiv.org/abs/2305.09617
https://sites.research.google/gr/med-palm/
https://research.google/blog/advancing-medical-ai-with-med-gemini/
https://www.precedenceresearch.com/artificial-intelligence-in-healthcare-market
