Assumptions, model disagreement, and rethink triggers, updated weekly before your PM, risk, or IC discussion. Currently accepting one macro thesis and one AI-economy thesis for July.
Consensus sits at 34% across the four models. The models diverge by a 53-point spread, so the average conceals live disagreement worth inspecting below. Category confidence is LOW because the category's track record is still being built.
Confidence reflects the category-level track record, Stability tracks how much the estimate moves, and Models shows whether the four models agree.
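The page does not publish its aggregation formulas. The sketch below assumes the consensus is an unweighted mean of the four model probabilities and the spread is the highest estimate minus the lowest, in percentage points. `summarize` is a hypothetical helper, and the inputs are illustrative values, not the four models' actual estimates.

```python
from statistics import mean


def summarize(estimates_pct: list[float]) -> tuple[float, float]:
    """Return (consensus, spread) for model probabilities given in percent.

    Assumption: consensus is the unweighted mean and spread is max minus min,
    in percentage points. The page does not state its exact formulas.
    """
    if not estimates_pct:
        raise ValueError("need at least one model estimate")
    consensus = mean(estimates_pct)
    spread = max(estimates_pct) - min(estimates_pct)
    return consensus, spread


# Hypothetical inputs for illustration only, not the models' real estimates.
consensus, spread = summarize([20.0, 30.0, 40.0, 50.0])
print(f"consensus {consensus:.0f}%, spread {spread:.0f}pt")
```

Under these assumptions, two panels with the same mean can carry very different spreads, which is why a single average is not enough to judge agreement.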
Given the trajectory of AI development and current investment levels, the cost of generating AI content should keep falling, but a reduction of more than 95% from 2025 levels by 2028 is overly aggressive. As of 2023, leading models such as OpenAI's GPT-4 still carried substantial operating costs, with estimates placing high-quality content generation at roughly $0.002 to $0.01 per word. Cutting that cost by more than 95% by 2028 would require extraordinary gains in efficiency and reductions in power consumption that have not historically been achieved at that scale within so short a period.
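As a rough scale check, a 95% cut leaves 5% of the baseline cost. The sketch applies that to the per-word range quoted above, treating the 2023 estimate as a stand-in for the 2025 baseline (an assumption, since this rationale gives no 2025 figure). `cost_after_reduction` is a hypothetical helper. Under that assumption, a 95% cut would take the range to roughly $0.0001 to $0.0005 per word.

```python
def cost_after_reduction(cost: float, reduction: float) -> float:
    """Cost remaining after a fractional reduction (0.95 means a 95% cut)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return cost * (1.0 - reduction)


# Per-word range quoted above; a 2023 estimate used as a stand-in baseline.
low, high = 0.002, 0.01
print(f"${cost_after_reduction(low, 0.95):.4f} to "
      f"${cost_after_reduction(high, 0.95):.4f} per word")
```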
While AI content generation costs have fallen rapidly, a 95% drop from 2025 to 2028 (a three-year period) is exceptionally aggressive. Current trends point to continued cost reduction from algorithmic and hardware improvements, but the slowing of Moore's Law and the growing complexity of advanced AI models will likely slow the rate of decline. Achieving such a drastic reduction would require a fundamental, unforeseen breakthrough in AI efficiency or hardware architecture.
AI inference costs declined approximately 50-60% over 2022-2025 due to improved model efficiency, hardware optimization, and competition (GPT-4 API pricing, for example, fell ~90% from its initial launch price, but from an artificially high baseline). A 95% reduction from 2025 levels by 2028 means a cumulative 20x cost decline over 3 years, which implies annual cost reductions of roughly 63% (each year retaining about 0.05^(1/3) ≈ 37% of the prior year's cost), well above the historical 40-50% annual improvements observed over 2023-2025. While semiconductor advances (improved chip yields, architectural efficiency) and algorithmic improvements continue, physical compute constraints and training-data scaling limits suggest diminishing returns. The most likely scenario is a 60-80% cost reduction by 2028 (significant, but well short of 95%), with further gains bottlenecked by energy costs and hardware manufacturing constraints.
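The compounding arithmetic behind the required annual rate can be checked directly. This sketch assumes a constant annual rate of decline, and both helper names are hypothetical. It solves (1 - r)^3 = 0.05 for r, then compounds the cited 40-50% historical pace over the same three years for comparison.

```python
def annual_decline_for_total(total_reduction: float, years: float) -> float:
    """Constant annual decline r that compounds to total_reduction.

    Solves (1 - r) ** years == 1 - total_reduction for r.
    """
    remaining = 1.0 - total_reduction
    return 1.0 - remaining ** (1.0 / years)


def total_decline_for_annual(annual_reduction: float, years: float) -> float:
    """Total reduction produced by a constant annual decline over `years`."""
    return 1.0 - (1.0 - annual_reduction) ** years


# 95% cut (a 20x decline) over 2025-2028.
needed = annual_decline_for_total(0.95, 3)
print(f"required annual decline: {needed:.1%}")

# Historical annual range cited in the rationale, compounded over 3 years.
for r in (0.40, 0.50):
    print(f"{r:.0%}/yr for 3 years -> {total_decline_for_annual(r, 3):.1%} total")
```

Under the constant-rate assumption, the required pace is about 63% per year. Sustaining the cited 40-50% annual pace for three years would itself compound to roughly 78-88%.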
Historical cost declines for frontier LLMs show 4-6x reductions every 18-24 months (GPT-3 at $60/M tokens in 2020 to GPT-4o at ~$2.50/M tokens by 2024), driven by 3-4x hardware efficiency gains per Nvidia generation and algorithmic improvements such as mixture-of-experts, which reduces active parameters by 5-10x. The current 2025 baseline of ~$0.50-1.00/M output tokens faces continued downward pressure from TSMC's 2nm ramp in 2026 and from Google/Amazon custom silicon targeting 3-5x inference-efficiency gains, though energy and data constraints may slow the full 20x drop needed by 2028.
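Extrapolating the cited historical pace shows where the required 20x falls. This sketch assumes a constant exponential rate and a 36-month horizon from the 2025 baseline to 2028; `projected_fold_reduction` is a hypothetical helper. At the corners of the 4-6x per 18-24 months range, the projection spans 8x to 36x, which brackets the 20x needed.

```python
def projected_fold_reduction(fold_per_period: float, period_months: float,
                             horizon_months: float = 36.0) -> float:
    """Fold reduction after horizon_months, assuming costs keep falling
    fold_per_period-x every period_months at a constant exponential rate."""
    return fold_per_period ** (horizon_months / period_months)


# Corners of the historical range cited above: 4-6x every 18-24 months.
for fold in (4.0, 6.0):
    for months in (18.0, 24.0):
        x = projected_fold_reduction(fold, months)
        print(f"{fold:.0f}x per {months:.0f} mo -> {x:.1f}x by 2028 "
              f"({1 - 1 / x:.1%} reduction)")
```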