Meta unveiled Make-A-Video in September 2022 as one of the first large-scale text-to-video generation systems, trained on 2.3 billion text-image pairs. The model established foundational approaches for an industry valued at $614.8 million in 2024. Marketing adoption of AI video tools surged from 18% in 2023 to 41% in 2024, representing 128% year-over-year growth.
Make-A-Video Key Statistics
- Make-A-Video trained on 2.3 billion text-image pairs combined with 20 million unlabeled videos from WebVid-10M and HD-VILA-100M datasets.
- The AI video generator market reached $614.8 million in 2024 and projects growth to $2,562.9 million by 2032 at a 20.0% compound annual growth rate.
- Marketing adoption of AI video generation increased from 18% in 2023 to 41% in 2024, while 75% of video marketers now employ AI tools.
- Meta invested $39.2 billion in capital expenditure for AI development in 2024, with projections reaching $60-65 billion in 2025.
- Organizations implementing AI video tools report efficiency gains up to 80% in time and budget savings across content production workflows.
Make-A-Video Technical Architecture
Make-A-Video employs a diffusion-based architecture with pseudo-3D convolutions and temporal attention mechanisms. The system generates 16-frame sequences at a native resolution of 64 × 64 pixels, which super-resolution networks upscale to 768 × 768 pixels.
The model produces silent videos with a maximum duration of 5 seconds; it has no audio generation capability. Meta trained the system using separate datasets for visual understanding and motion learning, eliminating the requirement for paired text-video training data.
| Specification | Value |
|---|---|
| Training Image Dataset | 2.3 billion text-image pairs |
| Primary Video Dataset | WebVid-10M (10 million videos) |
| Secondary Video Dataset | HD-VILA-100M subset |
| Output Frame Count | 16 frames per video |
| Native Resolution | 64 × 64 pixels |
| Upscaled Resolution | 768 × 768 pixels |
| Maximum Duration | 5 seconds |
| Audio Capability | Silent (no audio) |
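The term pseudo-3D convolution generally refers to factorizing a full 3D convolution into a 2D spatial convolution applied to each frame, followed by a 1D convolution along the time axis. The sketch below illustrates that idea in pure Python on a toy video. It is not Meta's implementation: the function names, kernel values, zero padding, and the 4-frame 3 × 3 input are all assumptions chosen for illustration.

```python
# Illustrative pseudo-3D convolution: a 2D spatial pass per frame, then a
# 1D temporal pass per pixel. All names, kernels and shapes are assumptions.

def conv2d_same(frame, kernel):
    """Zero-padded 2D cross-correlation that keeps the frame size (odd kernel)."""
    h, w = len(frame), len(frame[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += frame[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def conv1d_time_same(video, kernel):
    """Zero-padded 1D cross-correlation along the frame axis, per pixel."""
    t, h, w = len(video), len(video[0]), len(video[0][0])
    p = len(kernel) // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(t)]
    for f in range(t):
        for y in range(h):
            for x in range(w):
                acc = 0.0
                for i, weight in enumerate(kernel):
                    ff = f + i - p
                    if 0 <= ff < t:
                        acc += video[ff][y][x] * weight
                out[f][y][x] = acc
    return out


def pseudo3d_conv(video, spatial_kernel, temporal_kernel):
    """Spatial convolution on every frame, then temporal convolution."""
    spatial = [conv2d_same(frame, spatial_kernel) for frame in video]
    return conv1d_time_same(spatial, temporal_kernel)


# Toy video: 4 frames of 3 x 3 pixels (hypothetical values).
video = [[[float(f + y + x) for x in range(3)] for y in range(3)]
         for f in range(4)]
box_blur = [[1 / 9] * 3 for _ in range(3)]   # spatial smoothing kernel
identity_time = [0.0, 1.0, 0.0]              # passes each frame through unchanged
smooth_time = [1 / 3, 1 / 3, 1 / 3]          # mixes neighbouring frames

per_frame = [conv2d_same(frame, box_blur) for frame in video]
factorized_identity = pseudo3d_conv(video, box_blur, identity_time)
factorized_smooth = pseudo3d_conv(video, box_blur, smooth_time)
```

With the identity temporal kernel, the factorized layer returns exactly the per-frame spatial result, so the temporal pass adds nothing until its weights change. The smoothing kernel shows how the same layer starts mixing information across neighbouring frames. The factorized form also needs far fewer weights than a full 3D kernel of the same extent (here 9 + 3 instead of 27).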
Make-A-Video Training Dataset Composition
The training methodology combined labeled image-text pairs from a filtered LAION dataset subset with unlabeled video footage from Shutterstock via WebVid-10M. This approach enabled object recognition through static images while learning motion dynamics from video sequences.
Meta implemented filtering processes to remove NSFW content, toxic text, and images with watermark probability exceeding 0.5. The 2.3 billion text-image pairs provided visual understanding, while 20 million videos taught temporal coherence without requiring text descriptions.
| Dataset | Size | Purpose |
|---|---|---|
| LAION Subset | 2.3 billion pairs | Object recognition and text understanding |
| WebVid-10M | 10 million videos | Motion learning and frame interpolation |
| HD-VILA-100M Subset | 10 million videos | Super-resolution training |
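The filtering rules described above can be sketched as a simple predicate over image-text records. This is a minimal illustration, not Meta's pipeline: the `ImageTextPair` record and its field names are hypothetical, and the NSFW and toxicity flags are assumed to come from upstream classifiers that the source does not describe. Only the 0.5 watermark threshold is taken from the text.

```python
from dataclasses import dataclass

WATERMARK_THRESHOLD = 0.5  # from the text: drop images with probability above 0.5


@dataclass
class ImageTextPair:
    """Hypothetical record layout; field names are assumptions."""
    caption: str
    nsfw: bool              # assumed output of an upstream NSFW classifier
    toxic_caption: bool     # assumed output of an upstream text-toxicity check
    watermark_prob: float   # assumed watermark-detector score in [0, 1]


def keep(pair: ImageTextPair) -> bool:
    """Return True if the pair passes all three filters."""
    return (
        not pair.nsfw
        and not pair.toxic_caption
        and pair.watermark_prob <= WATERMARK_THRESHOLD
    )


samples = [
    ImageTextPair("a dog running on a beach", False, False, 0.10),
    ImageTextPair("stock photo of a city", False, False, 0.92),   # watermarked
    ImageTextPair("offensive caption", False, True, 0.05),        # toxic text
    ImageTextPair("borderline watermark", False, False, 0.50),    # not above 0.5
]

kept = [pair for pair in samples if keep(pair)]
```

Because the rule removes images whose watermark probability exceeds 0.5, a score of exactly 0.5 is kept in this sketch.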
AI Video Generator Market Growth
The text-to-video generation market established by Make-A-Video recorded a valuation of $614.8 million in 2024. Research firms project the market will reach $2,562.9 million by 2032, representing a compound annual growth rate of 20.0%.
North America maintains market dominance with a 40.61% share, recording $249.7 million in 2024. The United States market is projected at $155.3 million for 2025. Asia-Pacific captured a 31.40% revenue share in 2024.
| Metric | Value |
|---|---|
| Global Market Size (2024) | $614.8 million |
| Projected Market Size (2032) | $2,562.9 million |
| CAGR (2025-2032) | 20.0% |
| North America Market Share (2024) | 40.61% |
| Asia-Pacific Revenue Share (2024) | 31.40% |
| Solutions Segment Revenue Share (2024) | 63.31% |
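The market figures above can be cross-checked with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch below applies it to the 2024 and 2032 values from the table. The function and variable names are illustrative, and measuring from 2024 is an assumption made only for this check, since the reported CAGR covers 2025-2032 and the 2025 base value is not given here.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.20 means 20%)."""
    return (end_value / start_value) ** (1 / years) - 1


market_2024 = 614.8    # USD millions, from the table
market_2032 = 2562.9   # USD millions, from the table

# Growth rate implied by the two endpoints over the eight years 2024 to 2032.
implied_rate = cagr(market_2024, market_2032, 2032 - 2024)

# North America's 2024 value implied by its 40.61% share.
north_america_2024 = market_2024 * 0.4061

print(f"Implied CAGR 2024-2032: {implied_rate:.1%}")
print(f"North America 2024: ${north_america_2024:.1f} million")
```

Measured from 2024, the implied rate comes out at roughly 19.5%, slightly below the reported 20.0%, which is consistent with the reported figure being calculated over 2025-2032 from a different base value. The share calculation reproduces the $249.7 million North America figure.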
Make-A-Video Impact on Marketing Adoption
Professional adoption of AI video generation accelerated following Make-A-Video’s demonstration of text-to-video capabilities. Marketer usage increased from 18% in 2023 to 41% in 2024, marking 128% year-over-year growth.
Video marketers showed higher adoption rates, with 75% employing AI tools in their workflows. Survey data indicates 96% of marketers believe AI is critical for video marketing strategies, while 68% of CMOs deployed AI for video generation.
| Adoption Metric | Percentage |
|---|---|
| Marketers Using AI in Video Production (2024) | 41% |
| Marketers Using AI in Video Production (2023) | 18% |
| Video Marketers Employing AI Tools | 75% |
| Marketers Believing AI Critical | 96% |
| CMOs Deploying AI for Video Generation | 68% |
| Small Business Adoption | 50% |
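The 128% figure is a relative change, which is easy to confuse with the change in percentage points. A short sketch of both calculations, using the 2023 and 2024 adoption rates from the table (function names are illustrative):

```python
def relative_growth(old: float, new: float) -> float:
    """Relative change as a fraction of the old value."""
    return (new - old) / old


def point_change(old: float, new: float) -> float:
    """Absolute change in percentage points."""
    return new - old


adoption_2023 = 18  # percent of marketers, from the table
adoption_2024 = 41  # percent of marketers, from the table

growth = relative_growth(adoption_2023, adoption_2024)
points = point_change(adoption_2023, adoption_2024)

print(f"Relative growth: {growth:.0%}")
print(f"Change: {points} percentage points")
```

Adoption rose by 23 percentage points, which is about 128% growth relative to the 2023 level.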
Content Production Efficiency Gains
Organizations implementing AI video tools report time and budget savings of up to 80%. Text-to-video creation time decreased by over 50%, while video production costs fell by up to 60%.
AI-generated product demonstrations showed a 40% boost in conversion rates. Small businesses are closing the gap with larger enterprises, with 50% now using AI video creation tools. AI-generated video featured in 57% of digital advertising campaigns.
Regional Market Distribution
Geographic adoption patterns show North America leading with a market value of $249.7 million in 2024 and a projected compound annual growth rate of 20.3%. The United States market is projected at $155.3 million for 2025.
Germany shows the highest European growth rate at 20.5%, driven by a government commitment to invest €5 billion in AI development by 2025. France is projected at $30.2 million for 2025, and the United Kingdom at $26.8 million.
Text-to-Video Model Comparison
Multiple text-to-video models emerged following Make-A-Video’s 2022 release. CogVideo from Tsinghua University deployed 9.4 billion parameters, while Google’s Imagen Video utilized 11.6 billion parameters in the same year.
Meta’s successor model Movie Gen scaled to 30 billion parameters for video generation and 13 billion for audio in 2024. The system extended maximum video length to 16 seconds at 16 frames per second, representing substantial architectural improvements over Make-A-Video’s 5-second limitation.
| Model | Developer | Parameters | Release Year |
|---|---|---|---|
| CogVideo | Tsinghua University | 9.4 billion | 2022 |
| Make-A-Video | Meta | Not disclosed | 2022 |
| Imagen Video | Google | 11.6 billion | 2022 |
| ModelScope | Alibaba | 1.7 billion | 2023 |
| Movie Gen Video | Meta | 30 billion | 2024 |
Meta AI Investment Following Make-A-Video
Meta expanded AI video capabilities through increased infrastructure investment following Make-A-Video’s introduction. Capital expenditure reached $39.2 billion in 2024, with projections indicating $60-65 billion for 2025.
The company planned deployment of over 1.3 million GPUs in 2025 to support AI development. Meta AI reached 1 billion monthly active users in Q1 2025, demonstrating substantial user engagement with the company’s AI products.
FAQ
When was Make-A-Video released?
Meta unveiled Make-A-Video in September 2022 as a text-to-video generation system trained on 2.3 billion text-image pairs and 20 million videos.
How large is the AI video generator market?
The global AI video generator market reached $614.8 million in 2024 and projects growth to $2,562.9 million by 2032 at a 20.0% compound annual growth rate.
What training data did Make-A-Video use?
Make-A-Video trained on 2.3 billion text-image pairs from a filtered subset of the LAION dataset, 10 million videos from WebVid-10M, and 10 million videos from a subset of HD-VILA-100M.
How much has marketing adoption of AI video increased?
Marketing adoption increased from 18% in 2023 to 41% in 2024, representing 128% year-over-year growth. Video marketers show 75% adoption rates for AI tools.
What efficiency gains do AI video tools provide?
Organizations report time and budget savings of up to 80%, text-to-video creation time reductions of over 50%, and video production cost reductions of up to 60%.
Sources
Simon Willison’s Analysis of Make-A-Video
Fortune Business Insights Market Research
