
    CodeT5 Statistics 2026: Code Generation Accuracy, Adoption and AI Capabilities

    By Dominic Reigns | January 6, 2026 | Updated: May 26, 2026

    CodeT5-base recorded 22,172 monthly downloads on Hugging Face as of 2026, with 761+ related models deployed across the platform. Developed by Salesforce Research and released under the Apache 2.0 license, the CodeT5 family spans eight model variants ranging from 60 million to 16 billion parameters. This article covers the latest CodeT5 statistics for 2026, including code generation accuracy on benchmarks, adoption metrics, training data composition, and how the model family compares to other open-source code LLMs.

    CodeT5 Statistics in 2026 — TL;DR

    CodeT5-base records 22,172 monthly downloads on Hugging Face, and 761+ related models are hosted on the platform as of 2026.

    InstructCodeT5+ 16B scored 35.0% pass@1 on HumanEval in zero-shot settings, beating OpenAI’s code-cushman-001 at 33.5%.

    CodeT5+ was trained on 51.5 billion tokens, roughly 50 times the size of the CodeSearchNet corpus (8.35 million instances) behind the original CodeT5.

    The CodeT5+ 770M model matched performance of models 8 to 80 times its size, reaching 15.5% pass@1 on HumanEval.

    Salesforce archived the official GitHub repository in May 2025. Model weights remain on Hugging Face and community forks stay active.

    How Many Parameters Does Each CodeT5 Model Have?

    The CodeT5 family includes eight variants. Salesforce released CodeT5-small and CodeT5-base in 2021 as the original CodeT5 line, and CodeT5-large followed in 2022 alongside the CodeRL paper. Five more arrived with CodeT5+ in 2023, introducing flexible encoder-decoder architectures and shallow-encoder/deep-decoder configurations for the larger models.

    Model Variant | Parameters | Architecture                  | Year
    CodeT5-small  | 60M        | Encoder-Decoder               | 2021
    CodeT5-base   | 220M       | Encoder-Decoder               | 2021
    CodeT5-large  | 770M       | Encoder-Decoder               | 2022
    CodeT5+ 220M  | 220M       | Flexible Encoder-Decoder      | 2023
    CodeT5+ 770M  | 770M       | Flexible Encoder-Decoder      | 2023
    CodeT5+ 2B    | 2B         | Shallow Encoder, Deep Decoder | 2023
    CodeT5+ 6B    | 6B         | Shallow Encoder, Deep Decoder | 2023
    CodeT5+ 16B   | 16B        | Shallow Encoder, Deep Decoder | 2023

    Source: Salesforce Research / arXiv

    CodeT5+ can operate in three modes — encoder-only, decoder-only, or full encoder-decoder — allowing teams to pick the right setup for tasks like code completion, embedding, or translation without maintaining separate model instances.

    CodeT5 Code Generation Accuracy on HumanEval

    HumanEval is a benchmark of 164 hand-authored Python programming problems. Each problem requires generating a function body that passes hidden unit tests. InstructCodeT5+ 16B hit 35.0% pass@1 and 54.5% pass@10 in zero-shot evaluation, according to the original CodeT5+ paper. When paired with CodeT test generation, accuracy climbed to 42.9% pass@1 and 67.8% pass@10.

    Model                       | Pass@1 | Pass@10 | Setting
    InstructCodeT5+ 16B         | 35.0%  | 54.5%   | Zero-shot
    InstructCodeT5+ 16B + CodeT | 42.9%  | 67.8%   | Zero-shot + Test Gen
    CodeT5+ 770M-py             | 15.5%  | -       | Zero-shot
    OpenAI code-cushman-001     | 33.5%  | -       | Zero-shot
    GPT-NeoX 20B                | 15.4%  | -       | Zero-shot
    PaLM 62B                    | 15.9%  | -       | Zero-shot

    Source: Wang et al., arXiv 2305.07922
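
For readers unfamiliar with the metric, pass@k is usually estimated by sampling n completions per problem, counting the c that pass the unit tests, and averaging an unbiased estimate across problems. The sketch below uses the estimator that is standard for HumanEval-style benchmarks; the sample counts are hypothetical, and it is an assumption that the CodeT5+ results above were computed with this exact procedure.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: completions that passed the tests,
    k: budget of attempts being scored.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical (n, c) counts for three problems, not taken from the paper.
per_problem = [(20, 7), (20, 0), (20, 20)]

scores = [pass_at_k(n, c, 1) for n, c in per_problem]
benchmark_pass_at_1 = sum(scores) / len(scores)
print(f"pass@1 over {len(scores)} problems: {benchmark_pass_at_1:.3f}")
```

A benchmark-level score such as 35.0% pass@1 is the mean of these per-problem estimates over all 164 HumanEval problems.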

    The 770M variant’s 15.5% pass@1 came close to PaLM 62B (15.9%) despite having roughly 80 times fewer parameters. That gap between model size and output quality is one of the more notable findings from the CodeT5+ evaluation.
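
The size comparison can be checked with simple arithmetic. This is a minimal sketch that assumes the nominal parameter counts implied by each model's name (770 million, 20 billion, 62 billion); exact counts may differ slightly.

```python
# Nominal parameter counts, assumed from the model names in the table above.
PARAMS = {
    "CodeT5+ 770M": 770e6,
    "GPT-NeoX 20B": 20e9,
    "PaLM 62B": 62e9,
}

# HumanEval pass@1 scores as reported in the table above.
PASS_AT_1 = {
    "CodeT5+ 770M": 15.5,
    "GPT-NeoX 20B": 15.4,
    "PaLM 62B": 15.9,
}


def size_ratio(larger: str, smaller: str) -> float:
    """How many times more parameters `larger` has than `smaller`."""
    return PARAMS[larger] / PARAMS[smaller]


palm_ratio = size_ratio("PaLM 62B", "CodeT5+ 770M")
neox_ratio = size_ratio("GPT-NeoX 20B", "CodeT5+ 770M")

for name, ratio in [("PaLM 62B", palm_ratio), ("GPT-NeoX 20B", neox_ratio)]:
    gap = PASS_AT_1[name] - PASS_AT_1["CodeT5+ 770M"]
    print(f"{name}: {ratio:.1f}x the parameters, {gap:+.1f} points pass@1")
```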

    CodeT5 Training Data and Programming Language Support

    CodeT5+ was trained on 51.5 billion tokens from permissively licensed GitHub repositories, roughly 50 times the size of the CodeSearchNet corpus behind the original CodeT5, which contained 8.35 million function-level instances. The training data only includes code under MIT, Apache-2.0, BSD-3-Clause, BSD-2-Clause, CC0-1.0, Unlicense, and ISC licenses, which reduces licensing risk for commercial deployment.
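
A license allowlist like the one described above can be sketched as a simple filter. The repository records and the `license` field below are hypothetical, and this illustrates the general idea rather than Salesforce's actual data pipeline.

```python
# The seven permissive licenses named above, as lowercase identifiers.
ALLOWED_LICENSES = frozenset({
    "mit",
    "apache-2.0",
    "bsd-3-clause",
    "bsd-2-clause",
    "cc0-1.0",
    "unlicense",
    "isc",
})


def is_permissive(license_id):
    """Return True if the license identifier is on the allowlist."""
    if license_id is None:
        # Repositories without a declared license are excluded.
        return False
    return license_id.strip().lower() in ALLOWED_LICENSES


# Hypothetical repository metadata for illustration only.
repos = [
    {"name": "example/alpha", "license": "MIT"},
    {"name": "example/beta", "license": "GPL-3.0"},
    {"name": "example/gamma", "license": "Apache-2.0"},
    {"name": "example/delta", "license": None},
]

kept = [repo["name"] for repo in repos if is_permissive(repo["license"])]
print(kept)
```

Excluding repositories with no declared license is a conservative choice made for this sketch; the source does not say how such cases were handled.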

    The model supports nine programming languages. The original CodeT5 covered eight. CodeT5+ added C++ to the roster.

    Language   | CodeT5 | CodeT5+
    Python     | Yes    | Yes
    Java       | Yes    | Yes
    JavaScript | Yes    | Yes
    Go         | Yes    | Yes
    Ruby       | Yes    | Yes
    PHP        | Yes    | Yes
    C          | Yes    | Yes
    C#         | Yes    | Yes
    C++        | No     | Yes

    Source: Hugging Face / Salesforce Research

    CodeT5 Hugging Face Adoption Metrics in 2026

    CodeT5-base generated 22,172 monthly downloads on Hugging Face as of 2026. The model accumulated 132 community likes and powers 36 dependent spaces on the platform. Developers built 86 finetuned derivative models and 17 adapter models from the base checkpoint. Across all variants and finetuned offshoots, the CodeT5 family totals 761+ models on Hugging Face Hub.

    Metric                          | Value
    Monthly Downloads (CodeT5-base) | 22,172
    Community Likes                 | 132
    Dependent Spaces                | 36
    Finetuned Derivative Models     | 86
    Adapter Models                  | 17
    Total Models on Hub             | 761+

    Source: Hugging Face

    The GitHub repository at salesforce/CodeT5 has 3,100+ stars and 486+ forks. Salesforce archived it in May 2025, but model weights stay available through Hugging Face and community-maintained forks continue development under the Apache 2.0 license.

    CodeT5 Downstream Task Performance

    CodeT5+ was evaluated on over 20 code-related benchmarks across zero-shot, finetuning, and instruction-tuning settings. The strongest gains showed up in retrieval-augmented code generation, where scores jumped by an average of +5.8 BLEU-4 points over prior baselines. Text-to-code retrieval improved by +3.2 average MRR across eight tasks. Line-level code completion added +2.1 average exact match across two benchmarks.

    Task Category                | Benchmarks | Improvement
    Text-to-Code Retrieval       | 8 tasks    | +3.2 avg. MRR
    Line-Level Code Completion   | 2 tasks    | +2.1 avg. Exact Match
    Retrieval-Augmented Code Gen | 2 tasks    | +5.8 avg. BLEU-4

    Source: Wang et al., arXiv 2305.07922
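
Two of the metrics in the table are simple to compute by hand. The sketch below shows mean reciprocal rank (MRR) for retrieval and exact match for line completion, using toy inputs that are hypothetical rather than drawn from the CodeT5+ benchmarks; the paper's evaluation scripts may normalize text differently.

```python
def mean_reciprocal_rank(ranks):
    """MRR over queries.

    ranks: 1-based rank of the first correct result for each query,
    or None when the correct snippet was not retrieved at all.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)


def exact_match(predictions, references):
    """Fraction of predictions identical to the reference line.

    Surrounding whitespace is ignored; this normalization is an assumption.
    """
    if not references or len(predictions) != len(references):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical retrieval results: correct snippet ranked 1st, 2nd, missing, 4th.
mrr = mean_reciprocal_rank([1, 2, None, 4])

# Hypothetical line completions compared with their references.
em = exact_match(
    ["return x + 1", "i += 1", "print(y)"],
    ["return x + 1", "i += 1", "print(x)"],
)

print(f"MRR: {mrr:.4f}, exact match: {em:.4f}")
```

Both metrics are typically reported as percentages, so a gain of +3.2 MRR corresponds to 0.032 on the 0-to-1 scale used here.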

    On mathematical programming, CodeT5+ 770M reached 87.4% pass@80 on MathQA-Python and 73.8% pass@100 on GSM8K-Python after finetuning — results that outperformed models with up to 137 billion parameters.

    How Does CodeT5 Compare to Other Open Code LLMs?

    When InstructCodeT5+ 16B launched, its 35.0% pass@1 on HumanEval was the top score among open-source code models. Since then, newer models like StarCoder2 (15B parameters, trained on more than 4 trillion tokens) and Code Llama (up to 70B parameters) have arrived. StarCoder2-15B matched Code Llama 34B, a model more than twice its size, on several code completion tasks, according to the BigCode project’s evaluation. CodeT5 retains an edge in encoder-decoder tasks like code summarization and code search, where separate research found it outperformed PLBART across all generation benchmarks.

    The broader AI code assistant market reached an estimated $8.5 billion by 2026, with 84% of developers reporting they use or plan to use AI tools. Within that space, CodeT5 serves as a popular base model for teams building custom code intelligence pipelines rather than competing directly with end-user products like GitHub Copilot or Cursor.

    CodeT5 Licensing and Open-Source Status

    All CodeT5 and CodeT5+ models are released under the Apache 2.0 license. Organizations can modify, distribute, and deploy without paying fees or requesting additional permissions. The training code uses a BSD-3-Clause license. Salesforce archived the official GitHub repo in May 2025, but the archive status doesn’t affect the license terms or access to model weights on Hugging Face.

    This open licensing approach has contributed to CodeT5’s adoption in both academic and enterprise settings. Researchers at multiple universities have finetuned CodeT5 for tasks including automated program repair, code review automation, and vulnerability detection.

    CodeT5 Environmental Footprint

    Training CodeT5-base produced 49.25 kg of CO2 emissions on Google Cloud Platform, according to Salesforce’s own reporting. Google Cloud offset those emissions through its carbon credit programs. By releasing pretrained weights openly, the model eliminates the need for other teams to repeat the pretraining step from scratch, reducing cumulative compute and energy costs across the community.

    FAQ

    How many monthly downloads does CodeT5-base get on Hugging Face?

    CodeT5-base recorded 22,172 monthly downloads on Hugging Face as of 2026, with 761+ total models in the CodeT5 family deployed on the platform.

    What accuracy did CodeT5+ achieve on HumanEval?

    InstructCodeT5+ 16B scored 35.0% pass@1 on HumanEval in zero-shot settings. With CodeT test generation, that number rose to 42.9% pass@1.

    How many programming languages does CodeT5 support?

    CodeT5+ supports nine languages: Python, Java, JavaScript, Go, Ruby, PHP, C, C#, and C++. The original CodeT5 covered eight (no C++).

    Is CodeT5 free for commercial use?

    Yes. All CodeT5 models use the Apache 2.0 license, allowing unrestricted commercial deployment, modification, and distribution without fees.

    Is the CodeT5 GitHub repository still active?

    Salesforce archived the official repository in May 2025. Model weights remain on Hugging Face, and community forks continue active development.

    Sources:

    https://arxiv.org/pdf/2305.07922

    https://huggingface.co/Salesforce/codet5-base

    https://github.com/salesforce/CodeT5

    https://www.salesforce.com/blog/codet5-open-code-large-language-models/
