The three supporting readings tell you how much weight to put on the probability: confidence reflects the category-level track record, stability tracks how the estimate has moved over time, and models shows whether the four models agree.
Belief trajectory
Per-model probabilities
GPT-4o: 75% (+28 vs avg)
Gemini: 60% (+13 vs avg)
Claude: 28% (-19 vs avg)
Grok: 25% (-22 vs avg)
Key disagreement: GPT-4o (75%) vs Grok (25%): Different weighting of factors
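The "vs avg" figures and the key disagreement can be reproduced with a little arithmetic. The sketch below is illustrative only. It assumes "avg" is the unweighted mean of the four model probabilities and that the key disagreement is the pair with the largest gap; the page states neither rule, and the variable names are hypothetical.

```python
# Per-model probabilities (percent), copied from the readout above.
probabilities = {"GPT-4o": 75, "Gemini": 60, "Claude": 28, "Grok": 25}

# Assumption: "avg" is the unweighted mean across the four models.
average = sum(probabilities.values()) / len(probabilities)

# Signed deviation of each model from the average, in percentage points.
deviations = {name: p - average for name, p in probabilities.items()}

# Assumption: the "key disagreement" is the highest vs the lowest estimate.
highest = max(probabilities, key=probabilities.get)
lowest = min(probabilities, key=probabilities.get)
spread = probabilities[highest] - probabilities[lowest]

for name, p in probabilities.items():
    print(f"{name}: {p}% ({deviations[name]:+.0f} vs avg)")
print(
    f"Key disagreement: {highest} ({probabilities[highest]}%) "
    f"vs {lowest} ({probabilities[lowest]}%), spread {spread} points"
)
```

Under these assumptions the mean is 47%, which matches the listed offsets (+28, +13, -19, -22). The 50-point spread between GPT-4o and Grok is a sign that the models do not agree, so the headline probability deserves less weight.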