Will any AI tutoring system match human tutoring outcomes in a controlled study with over 1000 participants by 2028?
Resolves Dec 31, 2028
53% probability
4-model average
LOW confidence
building category history
— stability (not yet loaded)
Diverging models
37pt spread
The three supporting readings tell you how much weight to put on the probability: confidence reflects the category-level track record, stability tracks how the estimate has moved over time, and models shows whether the four models agree.
Belief trajectory (chart not loaded)
Per-model probabilities
GPT-4o: 65% (+12 vs avg)
Gemini: 35% (-18 vs avg)
Claude: 72% (+19 vs avg)
Grok: 42% (-11 vs avg)
Key disagreement: Claude (72%) vs Gemini (35%), attributed to different weighting of factors
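
The headline figures can be cross-checked against the per-model probabilities listed above. The sketch below is illustrative only: the names model_probs, summarize, and displayed_avg are hypothetical, and it assumes the headline is a plain unweighted mean of the four models. That unweighted mean comes out at 53.5, while the page shows 53%, so the page's exact rounding or weighting is not known from the text. The "vs avg" offsets and the 37pt spread do match when measured against the displayed 53%.

```python
# Per-model probabilities (percent), copied from the listing above.
model_probs = {"GPT-4o": 65, "Gemini": 35, "Claude": 72, "Grok": 42}


def summarize(probs):
    """Return the unweighted mean, the max-min spread, and the extreme models."""
    values = list(probs.values())
    return {
        "mean": sum(values) / len(values),    # assumption: simple unweighted mean
        "spread": max(values) - min(values),  # gap between highest and lowest model
        "high": max(probs, key=probs.get),    # most optimistic model
        "low": min(probs, key=probs.get),     # most pessimistic model
    }


summary = summarize(model_probs)

# The page displays 53%; the offsets are measured against that displayed value.
displayed_avg = 53
deltas = {name: p - displayed_avg for name, p in model_probs.items()}

print(summary)
print(deltas)
```

The high and low entries identify the pair named in the key disagreement line, and the spread is the difference between their two probabilities.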