
    Galactica Statistics 2026: Scientific Research Reports And AI Capabilities

    By Dominic Reigns | December 12, 2025 | Updated: May 30, 2026 | 8 Mins Read

    Meta’s Galactica was trained on 106 billion tokens drawn from a corpus that included 48 million scientific papers, and its public demo was pulled within three days of its November 2022 launch. The 120-billion-parameter model scored 68.2% on LaTeX equation prediction, where GPT-3 managed 49.0%, yet it also fabricated citations attributed to real researchers. This page covers the latest Galactica statistics for 2026, including training data, model architecture, benchmark performance, and the model’s influence on AI adoption in scientific research.

    Galactica Statistics in 2026 — TL;DR

    Galactica was trained on a 106-billion-token corpus for 4.25 epochs, processing roughly 450 billion tokens in total.

    The largest variant had 120 billion parameters and was sized to run on a single node of NVIDIA A100 80 GB GPUs.

    Meta pulled the public demo on November 17, 2022, two days after its November 15 launch, once users had documented fabricated outputs.

    Citation prediction accuracy ranged from 36.6% to 69.1% depending on the evaluation dataset.

    As of 2024, 80.9% of published researchers report using LLMs in at least one area of their work, according to a survey of over 800 verified authors.

    How Much Data Was Galactica Trained On?

    Galactica’s training corpus contained 106 billion tokens drawn from 48 million scientific papers, 360 million in-context citations, roughly 8 million textbooks and lecture notes, and 2 million code samples. Meta called this the “NatureBook” dataset and described it as curated rather than web-scraped, a distinction from how most general-purpose LLMs source their data.

    The corpus combined natural language (papers, encyclopedias, reference material) with scientific sequence data such as protein (amino acid) sequences and chemical compounds written in SMILES notation. Special tokens marked citations, step-by-step reasoning, and molecular data so the model could switch between modalities within a single text stream.

    Data Source | Volume
    Scientific Papers | 48 million
    In-Context Citations | 360+ million
    Unique References | 50+ million
    Textbooks & Lecture Notes | ~8 million
    Code Samples | ~2 million
    Total Tokens | 106 billion

    Source: Meta AI / Galactica Paper (Taylor et al., 2022)
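    To make the idea of modality markers concrete, here is a small Python sketch that wraps text in the kind of special tokens the Galactica paper describes. The exact marker strings ([START_REF], [START_SMILES], [START_AMINO], <work> and their closing forms) are assumptions based on the paper and the Hugging Face model card, so check them against the released tokenizer before relying on them; the helper names wrap and citation_prompt are hypothetical.

```python
# Marker strings ASSUMED from the Galactica paper / model card; verify them
# against the tokenizer shipped with the checkpoints before using them in prompts.
MARKERS = {
    "smiles": ("[START_SMILES]", "[END_SMILES]"),  # chemical compounds
    "amino": ("[START_AMINO]", "[END_AMINO]"),     # protein sequences
    "ref": ("[START_REF]", "[END_REF]"),           # citations
    "work": ("<work>", "</work>"),                 # step-by-step reasoning
}


def wrap(kind: str, payload: str) -> str:
    """Surround a payload with the start/end markers for one modality (hypothetical helper)."""
    start, end = MARKERS[kind]
    return f"{start}{payload}{end}"


def citation_prompt(sentence: str) -> str:
    """Leave a citation marker open so a model is prompted to complete the reference."""
    start, _ = MARKERS["ref"]
    return f"{sentence} {start}"


# A single text stream can mix prose with several modalities.
prompt = (
    "Ethanol " + wrap("smiles", "CCO")
    + " is a small alcohol; a short peptide is " + wrap("amino", "MKTAYIAK") + "."
)
print(prompt)
print(citation_prompt("The Transformer architecture"))
```

    Keeping the markers in one table makes it easy to correct a string in a single place if the released tokenizer spells it differently.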

    Galactica Model Variants and Architecture

    Meta released five Galactica variants: 125M, 1.3B, 6.7B, 30B, and 120B parameters. All used a decoder-only Transformer architecture with a 2,048-token context window and a 50,000-token vocabulary. The 120B model was designed to run on a single NVIDIA A100 node, which made it more accessible to academic researchers with limited compute budgets than comparably sized models.

    Variant | Parameters | Disk Size
    Mini | 125M | 480 MB
    Base | 1.3B | 5 GB
    Standard | 6.7B | 26 GB
    Large | 30B | 56 GB
    Huge | 120B | 453 GB

    Source: Hugging Face / Meta AI
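    A rough way to see how the five sizes relate is the standard decoder-only estimate of about 12·L·d² weights in the Transformer blocks (L layers, hidden width d) plus the embedding tables. The layer counts and widths below are assumptions (the usual OPT-style shapes for these sizes) rather than figures from this article, so compare them with each checkpoint's config on Hugging Face. The sketch also converts parameters into bytes at 16-bit and 32-bit precision, which shows why the 120B model needs a multi-GPU node.

```python
# Rough parameter and weight-memory estimates for decoder-only Transformers.
# Layer/width values are ASSUMED OPT-style shapes, not confirmed Galactica configs.
VOCAB = 50_000   # vocabulary size stated in the article
CONTEXT = 2_048  # context window stated in the article

# variant -> (assumed number of layers, assumed hidden width d_model)
SHAPES = {
    "mini (125M)": (12, 768),
    "base (1.3B)": (24, 2_048),
    "standard (6.7B)": (32, 4_096),
    "large (30B)": (48, 7_168),
    "huge (120B)": (96, 10_240),
}


def estimate_params(layers: int, d_model: int) -> int:
    """~12*d^2 weights per block (attention 4d^2 + MLP 8d^2) plus token and position embeddings.

    Biases and layer-norm weights are ignored, so this is an approximation.
    """
    blocks = 12 * layers * d_model**2
    embeddings = (VOCAB + CONTEXT) * d_model
    return blocks + embeddings


def weight_gb(params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, (layers, d_model) in SHAPES.items():
    p = estimate_params(layers, d_model)
    print(f"{name:16} ~{p / 1e9:7.2f}B params | "
          f"fp16 {weight_gb(p, 2):7.1f} GB | fp32 {weight_gb(p, 4):7.1f} GB")
```

    Under these assumptions the 120B estimate lands near 121 billion parameters and roughly 243 GB of 16-bit weights, several times the 80 GB available on a single A100 card.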

    Training used 128 NVIDIA A100 nodes (512 GPUs total) with 16,384 GB of combined memory. The model ran for 4.25 epochs over the corpus, meaning each token was seen roughly four times. Validation loss kept falling with repeated passes, which challenged the common assumption that repeated tokens degrade performance in LLMs.
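    The epoch figure is easy to sanity-check: multiplying the corpus size by the number of passes gives the total training tokens quoted in the summary above. The sketch assumes each epoch is one complete pass over the 106-billion-token corpus, so 4.25 epochs means four full passes plus a quarter of the data a fifth time; the variable names are illustrative.

```python
# Arithmetic behind "4.25 epochs over a 106B-token corpus".
# ASSUMPTION: each epoch is one complete pass over the corpus.
corpus_tokens = 106e9  # unique tokens in the training corpus
epochs = 4.25          # passes over the corpus

total_tokens = corpus_tokens * epochs
full_passes = int(epochs)                # complete passes over every token
partial_fraction = epochs - full_passes  # share of the corpus seen a fifth time

print(f"total tokens processed: {total_tokens / 1e9:.1f}B")
print(f"{full_passes} full passes + {partial_fraction:.0%} of the corpus once more")
```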

    How Did Galactica Perform on Scientific Benchmarks?

    Galactica 120B scored 68.2% accuracy on LaTeX equation prediction across 434 equations from chemistry, physics, math, statistics, and economics. GPT-3 scored 49.0% on the same test. On MATH, the 120B model hit 20.4% compared to PaLM 540B’s 8.8%. The 30B variant also beat PaLM 540B on MATH, despite having 18 times fewer parameters.

    On medical and biomedical QA, Galactica set new state-of-the-art scores at the time: 77.6% on PubMedQA and 52.9% on MedMCQA. It also outperformed BLOOM and OPT-175B on BIG-bench despite not being trained on a general web corpus.

    Galactica vs. Other Models — Benchmark Scores
    Benchmark | Galactica 120B | Competitor | Competitor Score
    LaTeX Equations | 68.2% | GPT-3 | 49.0%
    MATH | 20.4% | PaLM 540B | 8.8%
    MMLU (Math) | 41.3% | Chinchilla | 35.7%
    PubMedQA | 77.6% | Previous SOTA | 72.2%
    MedMCQA | 52.9% | Previous SOTA | n/a

    Source: Galactica Paper (Taylor et al., 2022)

    Galactica LaTeX Accuracy by Model Size

    Performance on LaTeX equations scaled predictably with parameter count. The 125M variant scored 0.5%, the 1.3B scored 20.5%, the 6.7B reached 41.7%, the 30B hit 51.5%, and the 120B topped out at 68.2%. This steady climb suggests scientific knowledge acquisition in LLMs is closely tied to model capacity.
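    One way to put a number on "scaled predictably" is to fit accuracy against the logarithm of parameter count. The log-linear model is an assumption chosen for illustration, not a fit reported in the Galactica paper; with only five points it describes the trend rather than predicting accuracy at unseen sizes.

```python
import math

# LaTeX equation accuracy (%) by parameter count, as reported in the article.
results = {
    125e6: 0.5,
    1.3e9: 20.5,
    6.7e9: 41.7,
    30e9: 51.5,
    120e9: 68.2,
}

# ASSUMED model: accuracy = intercept + slope * log10(params), fit by ordinary least squares.
xs = [math.log10(n) for n in results]
ys = list(results.values())
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(f"~{slope:.1f} accuracy points per 10x increase in parameters")
for n, acc in results.items():
    fitted = intercept + slope * math.log10(n)
    print(f"{n / 1e9:>7.3f}B params: reported {acc:5.1f}%, fitted {fitted:5.1f}%")
```

    On these five points the slope comes out at roughly 23 percentage points per tenfold increase in parameters; comparing reported and fitted values shows which sizes sit above or below that line.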


    Why Was Galactica Shut Down?

    Meta launched the Galactica public demo on November 15, 2022, and removed it on November 17. Within 24 hours of launch, researchers found the model generating citations to non-existent papers attributed to real scientists, including fabricated publications from Meta’s own Reality Labs and Google AI researchers. It also produced fictitious abstracts that read as plausible but contained made-up claims.

    Michael Black, a director at the Max Planck Institute, flagged the model for producing fake papers under his name. Carl Bergstrom, a University of Washington professor, called it “a random bullshit generator.” The backlash was fast and broad across academic social media. Meta AI VP Joelle Pineau later confirmed the model “was never meant to be a product.”

    Citation prediction accuracy ranged from 36.6% to 69.1%. Larger variants showed stronger preference for well-cited papers, which raised concerns about reinforcing existing academic hierarchies rather than surfacing lesser-known but relevant work.

    Galactica Statistics — Impact on Meta’s AI Strategy

    The Galactica backlash directly shaped how Meta handled its next major model release. Yann LeCun, Meta’s chief AI scientist, confirmed that the cautious rollout of Llama in February 2023, which used form-based researcher access instead of an open demo, was a direct response to what happened with Galactica. Llama models have since crossed 650 million downloads, averaging about one million per day since launch.

    Meta’s overall AI spending has scaled up sharply since Galactica’s release. Capital expenditures reached $39.2 billion in 2024, with projections of $65 billion for 2025: roughly a 130% increase over two years, and about 66% above the 2024 figure. Meta AI monthly users reached approximately 600 million, with projections above 700 million.

    Chart: Meta AI Capital Expenditure (2022–2025)

    Is Galactica Still Available in 2026?

    Yes. The public demo is gone, but all five Galactica variants remain available on Hugging Face and in the open-source GitHub repository (paperswithcode/galai), which has about 2.74k stars. Researchers continue to build on the model: OPI-Galactica-6.7B, a variant fine-tuned for protein tasks, was accepted at the NeurIPS 2024 Workshop on Foundation Models for Science.
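    For readers who want to try the released weights, the sketch below follows the usage pattern shown on the Hugging Face model card (AutoTokenizer plus OPTForCausalLM from the transformers library). The repository IDs for the other four sizes are assumed to follow the same facebook/galactica-<size> pattern as the 1.3B page linked in the sources, and the galactica_generate helper is hypothetical; transformers and PyTorch must be installed for it to run.

```python
# Hugging Face repo IDs for the five checkpoints. Only the 1.3B ID appears in this
# article's sources; the others are ASSUMED to follow the same naming pattern.
GALACTICA_REPOS = {
    "mini": "facebook/galactica-125m",
    "base": "facebook/galactica-1.3b",
    "standard": "facebook/galactica-6.7b",
    "large": "facebook/galactica-30b",
    "huge": "facebook/galactica-120b",
}


def galactica_generate(prompt: str, variant: str = "base", max_new_tokens: int = 60) -> str:
    """Download the chosen checkpoint (on first call) and continue the prompt (hypothetical helper)."""
    # Imported lazily so the repo table above is usable without transformers installed.
    from transformers import AutoTokenizer, OPTForCausalLM

    repo = GALACTICA_REPOS[variant]
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = OPTForCausalLM.from_pretrained(repo)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0])
```

    For example, galactica_generate("The Transformer architecture [START_REF]") asks the 1.3B model to complete a citation. The first call downloads several gigabytes of weights, and the larger variants need correspondingly more disk space and memory.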

    ChatGPT launched on November 30, 2022, 13 days after Galactica’s demo was pulled, and shifted public and media attention toward conversational AI and away from domain-specific scientific models.

    Galactica Statistics and AI in Scientific Research

    Galactica aimed to solve a real problem: information overload in scientific publishing. The volume of AI-related papers in materials science grew 241-fold between 1980 and 2024. In physics, the growth was 307-fold over the same period. Researchers are publishing more than ever, and AI is accelerating that trend.

    A Cornell University study published in Science in December 2025 analyzed 2.1 million abstracts and found that scientists using LLM tools posted far more papers. Output in social sciences and humanities increased 59.8%. Biology and life sciences saw a 52.9% rise. Physics and math recorded a 36.2% boost.

    Chart: AI-Assisted Research Output Increase by Field

    According to the 2026 Stanford HAI AI Index Report, natural sciences produced approximately 80,150 AI publications in 2025 — up 26% from 2024. AI now accounts for 5.8% to 8.8% of all scientific research output depending on the discipline, up from below 1% in 2010.

    Researcher LLM Adoption

    A survey of over 800 verified published authors found that 80.9% reported using LLMs in at least one research area as of 2024. Enterprise AI adoption reached 78% in the same year. The global LLM market was valued at $4.5 billion in 2023 and is projected to reach $82.1 billion by 2033, a compound annual growth rate of 33.7%.

    Metric | Value | Year
    Researcher LLM Adoption | 80.9% | 2024
    Enterprise AI Adoption | 78% | 2024
    AI Share of Scientific Output | 5.8%–8.8% | 2025
    Natural Science AI Publications | ~80,150 | 2025
    Global LLM Market Size | $4.5B | 2023
    LLM Market Projection | $82.1B | 2033

    Source: Stanford HAI 2026 AI Index Report; McKinsey; Precedence Research
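    The three LLM market figures above are internally consistent, which a short compound-growth calculation confirms. The function name cagr is illustrative; the formula is the standard definition, (end / start) ^ (1 / years) - 1.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate linking two values."""
    return (end_value / start_value) ** (1 / years) - 1


# LLM market figures from the article: $4.5B (2023) -> $82.1B (2033).
rate = cagr(4.5, 82.1, 2033 - 2023)
print(f"implied CAGR: {rate:.1%}")  # 33.7%

# Growing $4.5B at that rate for ten years lands back on the projection.
print(f"2033 check: ${4.5 * (1 + rate) ** 10:.1f}B")
```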

    Galactica Statistics vs. Modern Scientific AI Models

    Galactica was released before ChatGPT, and the AI field has moved fast since. On ChemBench, today’s best models surpass human expert averages across 2,700+ chemistry questions, according to the 2026 Stanford HAI report. On ReplicationBench, frontier models still score below 20% on full paper-scale replication in astrophysics. AION-1, trained on over 200 million celestial objects, became the first astronomy foundation model.

    Galactica’s approach, training a single model on curated scientific text rather than a general web corpus, was ahead of its time. The model suggested that data quality could outweigh data breadth: it beat general-purpose models on BIG-bench tasks despite not being trained on a general corpus. That finding still informs how researchers think about domain-specific training today.

    FAQ

    How many parameters does the largest Galactica model have?

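    The largest Galactica variant contains 120 billion parameters. It was designed to run on a single node of NVIDIA A100 80 GB GPUs; at 16-bit precision its weights alone take about 240 GB, so they are split across the node’s GPUs rather than held on one card.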

    Is Galactica still accessible for research in 2026?

    Yes. All five model variants are available on Hugging Face and the open-source GitHub repository. The public demo was removed in November 2022, but the weights and code remain downloadable.

    How did Galactica compare to GPT-3 on scientific tasks?

    Galactica 120B scored 68.2% on LaTeX equation prediction versus GPT-3’s 49.0%. It also outperformed GPT-3 on MMLU math subtasks and set new highs on PubMedQA and MedMCQA.

    Why did Meta shut down the Galactica demo?

    Researchers found the model fabricating citations to non-existent papers and generating fictitious abstracts. Meta pulled the demo within 72 hours of launch after widespread criticism from scientists.

    What percentage of researchers now use LLMs?

    A 2024 survey of over 800 published authors found that 80.9% used LLMs in at least one area of their research. In some fields, AI-assisted research output has risen by as much as 59.8%.

    Sources:

    https://arxiv.org/pdf/2211.09085

    https://hai.stanford.edu/ai-index/2026-ai-index-report/science

    https://huggingface.co/facebook/galactica-1.3b

    https://phys.org/news/2025-12-scientists-ai-tools-publishing-papers.html

    Dominic Reigns

    As a senior analyst, I benchmark and review gadgets and PC components, including desktop processors, GPUs, monitors, and storage solutions on Aboutchromebooks.com. Outside of work, I enjoy skating and putting my culinary training to use by cooking for friends.

    © 2026 About Chrome Books. All rights reserved.
