Assumptions, model disagreement, and rethink triggers, updated weekly before your PM, risk, or IC discussion. Currently accepting one macro thesis and one AI-economy thesis for July.
Consensus sits at 69% across the four models. Models are aligned at a 7pt spread. Category confidence is LOW (building category history).
Confidence reflects category-level track record. Stability tracks estimate movement. Models shows whether the four agree.
Current advancements in edge AI, such as Apple's M1 and M2 chips demonstrating robust on-device machine learning capabilities, indicate substantial progress toward parity with cloud models. High-performance edge devices combined with developments in model distillation and quantization are enabling significant improvements in speed and efficiency. However, cloud AI solutions continue to leverage vast computational resources and larger datasets, creating an ongoing competitive advantage.
Edge AI inference has already reached quality parity with cloud models on many tasks (vision, speech recognition, language understanding), but generative content (text, images, video) lags significantly because of parameter constraints. Current flagship edge chips (Apple Neural Engine, Qualcomm Snapdragon X) support 10-20B parameters efficiently, while SOTA cloud models (GPT-4, Claude) use 100B-1T parameters. Quantization techniques (4-bit, 3-bit) and architectural innovations (mixture-of-experts, distillation) are closing this gap at roughly 15-20% per year. By 2028 (4 years out), a reasonable extrapolation suggests edge models could reach 40-60B effective parameters, with quality approaching today's 70B cloud models though not the 2024 frontier. The primary constraint is memory bandwidth, which improves ~25% annually and supports this trajectory.
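The extrapolation above can be sanity-checked with simple compounding. This is a minimal sketch, assuming the quoted annual rates compound and apply directly to the parameter count an edge device can serve; the rationale does not state that mapping, and the function name `compound` is illustrative only.

```python
def compound(start: float, annual_rate: float, years: int) -> float:
    """Project a quantity forward at a fixed compounding annual rate."""
    return start * (1.0 + annual_rate) ** years


# Assumed inputs taken from the rationale: 10-20B parameters today,
# 15-25% annual improvement, a 4-year horizon to 2028.
for start_b in (10, 20):
    for rate in (0.15, 0.20, 0.25):
        projected = compound(start_b, rate, 4)
        print(f"{start_b}B at {rate:.0%}/yr for 4 years: {projected:.1f}B")
```

Under these assumptions, only the top of the current range (20B) compounding at 20-25% per year lands inside the 40-60B target, so the forecast leans on the optimistic end of its own inputs.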
On-device models rose from 15% to 65% of cloud benchmark scores between 2022 and 2024 (MLPerf Mobile 3.0 vs 4.0). Apple A18 and Qualcomm Snapdragon 8 Gen 3 deliver 15-20 TOPS of NPU performance, which now matches 2022 cloud GPUs, and Samsung and Google have committed 2025-2026 silicon roadmaps targeting 40+ TOPS, in line with the 3-year doubling cycle observed since 2021. Quantization advances (GPTQ/AQLM) and speculative decoding have reduced memory requirements 4-6x while retaining 92-95% of original quality on MT-Bench, enabling 7-13B parameter models to fit within 8-12GB of mobile DRAM. As historical precedent, mobile GPUs reached 80% of desktop performance within 4 years of introduction (2014-2018), suggesting similar compression curves will continue through 2027.
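The memory claim above can be checked with back-of-envelope arithmetic. This is a rough sketch, assuming a 16-bit baseline and counting weight storage only; it ignores the KV cache, activations, and per-group quantization metadata, all of which add to real footprints. The helper name `weights_gb` is hypothetical.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8.0


# Assumed baseline: 16-bit weights. Compare against 4-bit and 3-bit quantization.
for params_b in (7, 13):
    for bits in (16, 4, 3):
        reduction = 16 / bits
        print(f"{params_b}B @ {bits}-bit: {weights_gb(params_b, bits):.2f} GB "
              f"({reduction:.2f}x smaller than 16-bit)")
```

On these assumptions, a 13B model at 4-bit needs about 6.5 GB for weights, which fits in 8-12GB of DRAM only if the operating system and other apps leave that much free. Going from 16-bit to 4-bit or 3-bit gives roughly a 4x to 5.3x reduction, which sits within the quoted 4-6x range.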