4-model average
building category history
loading
12pt spread
The three supporting readings tell you how much weight to put on the probability: confidence reflects category-level track record, stability tracks how the estimate has moved over time, models shows whether the four agree.
Loading belief trajectory...
Meta's Llama model series has shown rapid improvement, with Llama 2 released in mid-2023 improving on prior benchmarks significantly, indicating strong R&D momentum. Additionally, Meta's substantial investment in AI, reported to exceed $10 billion annually, along with favorable economic conditions for tech investment, suggest a high likelihood of achieving GPT-4-class capabilities by 2026. However, competition from other leading AI firms, such as OpenAI and Google, remains a critical concern.
Meta has demonstrated rapid progress with Llama models, achieving near GPT-4 performance with Llama 3. Given the accelerating pace of AI development, substantial investment by Meta, and the continuous architectural advancements in LLMs, it is highly probable they will surpass GPT-4's benchmark performance within the next two years. Historical trends show rapid iteration and improvement in this field, making continued breakthroughs likely.
Meta has demonstrated rapid iteration capability with Llama 2 (70B) achieving competitive performance with Llama 3 (405B) showing substantial improvements, and the company has committed significant resources to LLM development with quarterly AI infrastructure investments exceeding $5B. GPT-4's dominance on benchmarks like MMLU (88.7%) and coding tasks is substantial but not insurmountable—Llama 3 already matches or exceeds GPT-4 on several metrics (math, reasoning), and with 18+ months remaining in 2026, the trajectory suggests Meta can close remaining gaps through scaling, architecture improvements, and training innovations. The main headwind is that OpenAI may also advance GPT-4 capabilities during this period, creating a moving target.
Meta's Llama 3.1 405B already reached 88.6 on MMLU and 86.6 on HumanEval in July 2024, trailing GPT-4o (89.3/90.2) by only 1-4 points, while Meta's 2025 budget for Llama training is $8-10B with 600k H100-equivalent GPUs, exceeding OpenAI's disclosed spend. Historical precedent shows Meta closed a 12-point MMLU gap in 14 months (Llama-2 to Llama-3), and the 2026 timeline allows two full training runs on 10x larger clusters. Scaling laws indicate a 2-3 trillion parameter model trained on 30T tokens would surpass GPT-4-class on aggregate benchmarks.