Part of: Does realized enterprise economic adoption of generative AI materially lag AI infrastructure investment expansion by December 31, 2026?
Event · POLICY & GEOPOLITICS

Will Meta release a Llama model exceeding GPT-4-class capability on standard benchmarks during 2026?

Resolves Dec 31, 2026
Probability
64%

4-model average

Confidence
LOW

building category history

Stability
—

Models
Mixed

12pt spread

The three supporting readings tell you how much weight to put on the probability: confidence reflects the category-level track record, stability tracks how the estimate has moved over time, and models shows whether the four models agree.

Per-model probabilities

GPT-4o
65% (+1 vs avg)

Meta's Llama model series has shown rapid improvement, with Llama 2 released in mid-2023 improving on prior benchmarks significantly, indicating strong R&D momentum. Additionally, Meta's substantial investment in AI, reported to exceed $10 billion annually, along with favorable economic conditions for tech investment, suggest a high likelihood of achieving GPT-4-class capabilities by 2026. However, competition from other leading AI firms, such as OpenAI and Google, remains a critical concern.

Key uncertainty: The pace of technological advancement by competitors, particularly developments from OpenAI and Google's Bard models, could influence the effectiveness of Meta's strategic plan and investment outcomes.
Gemini
60% (-4 vs avg)

Meta has demonstrated rapid progress with Llama models, achieving near GPT-4 performance with Llama 3. Given the accelerating pace of AI development, substantial investment by Meta, and the continuous architectural advancements in LLMs, it is highly probable they will surpass GPT-4's benchmark performance within the next two years. Historical trends show rapid iteration and improvement in this field, making continued breakthroughs likely.

Key uncertainty: The exact capabilities and release timeline of OpenAI's next-generation model (GPT-5 or equivalent), which will set the new "GPT-4-class" benchmark.
Claude
72% (+8 vs avg)

Meta has demonstrated rapid iteration capability, with Llama 2 (70B) achieving competitive performance and Llama 3 (405B) showing substantial improvements, and the company has committed significant resources to LLM development, with quarterly AI infrastructure investments exceeding $5B. GPT-4's dominance on benchmarks like MMLU (88.7%) and coding tasks is substantial but not insurmountable: Llama 3 already matches or exceeds GPT-4 on several metrics (math, reasoning), and with 18+ months remaining in 2026, the trajectory suggests Meta can close remaining gaps through scaling, architecture improvements, and training innovations. The main headwind is that OpenAI may also advance GPT-4 capabilities during this period, creating a moving target.

Key uncertainty: Whether "GPT-4-class capability" is defined as matching the original GPT-4 (released March 2023) or any evolved GPT-4 variant OpenAI releases before end-2026. Settling this definition would shift the probability by ±15 percentage points depending on interpretation.
Grok
62% (-2 vs avg)

Meta's Llama 3.1 405B already reached 88.6 on MMLU and 86.6 on HumanEval in July 2024, trailing GPT-4o (89.3/90.2) by only 1-4 points, while Meta's 2025 budget for Llama training is $8-10B with 600k H100-equivalent GPUs, exceeding OpenAI's disclosed spend. Historical precedent shows Meta closed a 12-point MMLU gap in 14 months (Llama-2 to Llama-3), and the 2026 timeline allows two full training runs on 10x larger clusters. Scaling laws indicate a 2-3 trillion parameter model trained on 30T tokens would surpass GPT-4-class on aggregate benchmarks.

Key uncertainty: Whether U.S. export controls on advanced Nvidia chips to non-allied nations will delay Meta's 2026 cluster build-out.
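
The headline figures on this page (the 4-model average, the 12pt spread, and each "vs avg" offset) can be reproduced from the four per-model probabilities listed above. The following is a minimal sketch, not the platform's actual code: the page does not state how the average is computed, so the unweighted mean and the truncation to a whole percent are assumptions chosen because they match the displayed 64%. The names `MODEL_PROBS` and `summarize` are hypothetical.

```python
import math

# Per-model probabilities in percent, copied from the listing above.
MODEL_PROBS = {"GPT-4o": 65, "Gemini": 60, "Claude": 72, "Grok": 62}


def summarize(probs):
    """Return (exact mean, displayed mean, spread, per-model offsets)."""
    values = list(probs.values())
    # Assumption: the average is an unweighted mean of the four models.
    exact_mean = sum(values) / len(values)
    # Assumption: the headline drops the fractional part (64.75 -> 64).
    shown_mean = math.floor(exact_mean)
    # Spread is the gap between the highest and lowest model estimate.
    spread = max(values) - min(values)
    # Offsets are measured against the displayed (truncated) average.
    deltas = {name: p - shown_mean for name, p in probs.items()}
    return exact_mean, shown_mean, spread, deltas


exact_mean, shown_mean, spread, deltas = summarize(MODEL_PROBS)
print(f"average: {exact_mean:.2f}% (shown as {shown_mean}%)")
print(f"spread: {spread}pt")
for name, delta in deltas.items():
    print(f"{name}: {delta:+d} vs avg")
```

Under these assumptions the exact mean is 64.75%, so conventional rounding would display 65% and shift every offset by one point. The listed offsets (+1, -4, +8, -2) are only consistent with a displayed average of 64%.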

Resolution criteria

Source: Meta AI publications, standard benchmark leaderboards 2026
CRENE-AIER-C078-20261231 · Generated Jun 17, 2026