Crene Research

Research · March 2026

Four frontier AI models independently forecast 1,091 active events spanning revenue beats, EPS surprises, macro releases, and central bank decisions, and each forecast is scored with the Brier score against the verified outcome once the event resolves. CRENE is building the first large-scale benchmark of frontier AI models forecasting real-world economic events. When the models agree with high confidence, accuracy reaches 73%.

Active Forecasts: 1,091 across 9 categories
Resolved Events: CRENE-native predictions, verified against official sources
High Confidence Accuracy: 73% (+15pt lift over baseline)
What Makes This Dataset Unique
Multi-Model Ensemble (4 models)

Four frontier LLMs forecast independently, so no model anchors on another's answer. The spread across their forecasts reveals uncertainty that a single-model system misses.

Structured Resolution (official sources)

Every prediction has named resolution criteria and an authoritative source (SEC filings, BLS data, Fed statements). Outcomes are verified against those sources, not crowd-sourced.

Per-Model Calibration (per-event scoring)

Brier scores are computed for every model on every event, which supports model-level analysis: which LLM forecasts best in which domain?
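For a binary event, the Brier score of a forecast probability p against the outcome y (1 if the event happened, 0 if not) is (p − y)², so 0 is perfect and lower is better. A minimal Python sketch of per-model, per-domain averaging follows; the flat (event_id, category, model, probability) record layout and the category labels are hypothetical illustrations, not the Crene schema.

```python
from collections import defaultdict


def brier(p, outcome):
    """Binary Brier score: squared error between probability p and a 0/1 outcome."""
    return (p - outcome) ** 2


def brier_by_model_and_category(forecasts, outcomes):
    """Mean Brier score per (model, category).

    forecasts: iterable of (event_id, category, model, probability); this
               record layout is a HYPOTHETICAL example, not the Crene schema.
    outcomes:  {event_id: 0 or 1} for resolved events only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event_id, category, model, p in forecasts:
        if event_id not in outcomes:
            continue  # unresolved events are not scored yet
        key = (model, category)
        sums[key] += brier(p, outcomes[event_id])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Sorting the resulting scores within each category, lowest first, gives a per-domain ranking of the models.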

Confidence Signal Performance
Baseline: 58.5% accuracy (5,688 predictions)
High Confidence: 73% accuracy (212 predictions)
Very High Confidence: 68.6% accuracy (35 predictions)
When all four models converge on a high-probability outcome (low spread), accuracy reaches 73%, about 15 points above the 58.5% baseline. The Very High Confidence tier rests on only 35 predictions, so its 68.6% figure is less reliable. This spread-based confidence signal is a core output of the Crene dataset.
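The tier accuracies and the lift are simple ratios. The following minimal Python sketch assumes each scored prediction carries a tier label, counts a forecast as correct when it puts at least 0.5 on the outcome that occurred (the page does not state its exact rule), and treats all scored predictions as the baseline, which is also an assumption. With the published figures the lift is 73 − 58.5 = 14.5 points, shown rounded as +15pt.

```python
from collections import defaultdict


def is_correct(p, outcome):
    # ASSUMED rule: the forecast calls YES when p >= 0.5, otherwise NO.
    return (p >= 0.5) == (outcome == 1)


def accuracy_by_tier(rows):
    """rows: iterable of (tier, probability, outcome) with outcome in {0, 1}.

    Returns {tier: accuracy}, plus an "all" entry used here as the baseline
    (an ASSUMPTION; the source may define its baseline differently).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for tier, p, y in rows:
        for key in (tier, "all"):
            hits[key] += is_correct(p, y)
            totals[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def lift_points(tier_accuracy, baseline_accuracy):
    """Difference in percentage points, e.g. 0.73 vs 0.585 gives about 14.5."""
    return 100 * (tier_accuracy - baseline_accuracy)
```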
Calibration Analysis

Are the probabilities meaningful? A model is well calibrated when, of all the events it assigns 70%, about 70% actually happen. Points near the dashed diagonal indicate good calibration.
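Calibration can be checked by bucketing forecasts by predicted probability and comparing each bucket's average forecast with how often the event actually occurred; plotting one against the other gives points to compare with the diagonal. A minimal sketch using ten equal-width bins (the bin count and function name are assumptions; the page does not say how its calibration plot is built):

```python
def calibration_bins(probs, outcomes, n_bins=10):
    """Equal-width calibration bins (bin count is an ASSUMPTION).

    Returns, for each non-empty bin in order, a tuple of
    (mean forecast probability, observed frequency of YES, count).
    A perfectly calibrated forecaster has mean forecast == observed frequency.
    """
    sums_p = [0.0] * n_bins
    sums_y = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums_p[i] += p
        sums_y[i] += y
        counts[i] += 1
    return [
        (sums_p[i] / counts[i], sums_y[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i]
    ]
```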

Methodology
01 Event Detection

Automated scanners detect upcoming earnings reports (via Polygon.io financials data), macro releases (CPI, NFP, PMI), central bank meetings, and market events. Each event gets structured binary resolution criteria and a named authoritative source.
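A minimal sketch of what a structured binary event record might look like, assuming Python dataclasses. The field names, category labels, and the example event (including the ticker "ACME") are hypothetical, not Crene's actual schema; the point is that an event cannot be created without explicit resolution criteria and a named source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ForecastEvent:
    """HYPOTHETICAL event record; field names are illustrative only."""

    event_id: str
    category: str             # e.g. "earnings", "macro", "central_bank"
    question: str             # a yes/no question
    resolution_criteria: str  # the exact condition that makes the answer YES
    resolution_source: str    # named authoritative source
    resolves_on: date

    def __post_init__(self):
        if not self.resolution_criteria.strip():
            raise ValueError("every event needs explicit resolution criteria")
        if not self.resolution_source.strip():
            raise ValueError("every event needs a named resolution source")


# Hypothetical example event.
event = ForecastEvent(
    event_id="demo-001",
    category="earnings",
    question="Will ACME report quarterly revenue above the consensus estimate?",
    resolution_criteria="YES if reported quarterly revenue exceeds the consensus estimate",
    resolution_source="Company 10-Q filing on SEC EDGAR",
    resolves_on=date(2026, 4, 30),
)
```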

02 Four-Model Consensus

GPT-4o, Gemini 2.0 Flash, Claude 3.5 Haiku, and Grok 3 each forecast independently; no model sees another's output. The ensemble consensus is the mean of the four probabilities, and the spread (max − min) measures disagreement.
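The consensus and spread reduce to two lines of arithmetic. A minimal Python sketch; the dictionary keys are illustrative labels for the four models and the probabilities in the example are made up:

```python
from statistics import mean


def consensus_and_spread(probs_by_model):
    """probs_by_model: {model_label: probability of YES}.

    Consensus is the mean probability; spread (max - min) measures disagreement.
    """
    probs = list(probs_by_model.values())
    return mean(probs), max(probs) - min(probs)


# Made-up example probabilities; keys are illustrative labels only.
consensus, spread = consensus_and_spread({
    "gpt-4o": 0.82,
    "gemini-2.0-flash": 0.78,
    "claude-3.5-haiku": 0.85,
    "grok-3": 0.80,
})
# consensus ≈ 0.81, spread ≈ 0.07
```

Using max − min rather than a standard deviation keeps the disagreement measure easy to read: a spread of 0.07 means no two models are more than 7 points apart.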

03 Confidence Signals

Low spread combined with a high consensus probability marks a high-confidence prediction. These predictions have historically been correct about 73% of the time. The spread-based signal is a core dataset output that users can build downstream signals on.
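One way to turn consensus and spread into confidence tiers, as a minimal sketch. The thresholds (spread at most 0.10 and conviction at least 0.75 for High; at most 0.05 and 0.90 for Very High) are hypothetical placeholders, since the page does not publish its cutoffs. "Conviction" treats a consensus of 0.15 (85% NO) the same as 0.85 (85% YES).

```python
def confidence_tier(consensus, spread,
                    high=(0.10, 0.75), very_high=(0.05, 0.90)):
    """Classify one ensemble forecast into a confidence tier.

    consensus: mean YES probability across models
    spread:    max - min of the individual model probabilities
    high / very_high: (max_spread, min_conviction) pairs; these are
    HYPOTHETICAL thresholds, not the values used by Crene.
    """
    # Conviction is the ensemble's probability for whichever side it favors.
    conviction = max(consensus, 1 - consensus)
    max_spread, min_conviction = very_high
    if spread <= max_spread and conviction >= min_conviction:
        return "very_high"
    max_spread, min_conviction = high
    if spread <= max_spread and conviction >= min_conviction:
        return "high"
    return "standard"
```

Checking the stricter tier first guarantees that a forecast qualifying for both is labeled Very High.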

04 Resolution & Scoring

Events are resolved automatically each day against SEC filings, BLS data, and central bank statements, and earnings events are resolved through the Polygon.io financials API. Brier scores are computed for every model on every event, and all data is served through a public REST API.
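A minimal sketch of a daily resolution pass for EPS-surprise events. It assumes the reported EPS figures have already been fetched; the Polygon.io request itself is deliberately left out rather than guessed at. The "strictly above consensus counts as a beat" rule and all function and field names are hypothetical.

```python
from datetime import date


def resolve_eps_beat(actual_eps, consensus_eps):
    """HYPOTHETICAL rule: YES (1) only if reported EPS strictly exceeds consensus."""
    return 1 if actual_eps > consensus_eps else 0


def daily_resolution(pending, reported, today):
    """One daily pass over pending EPS events.

    pending:  {event_id: (resolves_on, consensus_eps)}
    reported: {event_id: actual_eps}, assumed already fetched from the data source
    Returns {event_id: outcome} for events due on or before `today` whose
    results are available; events still missing data wait for the next run.
    """
    resolved = {}
    for event_id, (resolves_on, consensus_eps) in pending.items():
        if resolves_on <= today and event_id in reported:
            resolved[event_id] = resolve_eps_beat(reported[event_id], consensus_eps)
    return resolved
```

The 0/1 outcomes this produces are what each model's probability is then scored against with the Brier score.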

Dataset Coverage
Predictions: 1,100+ active forecasts across categories
Categories: 10 (earnings, macro, crypto, and more)
AI Models: 4, each giving an independent probability estimate
Resolution Cadence: Daily, verified against official sources
4 frontier LLMs · 9 categories · Updated daily · Brier scored · All data public