Crene Research
Four frontier AI models independently forecast active events across revenue beats, EPS surprises, macro releases, and central bank decisions, each scored with Brier metrics against verified outcomes. CRENE is building the first large-scale benchmark of frontier AI models forecasting real-world economic events. When the models show high confidence, accuracy reaches 73%.
Four frontier LLMs forecast independently with no anchoring. Cross-model spread reveals uncertainty that single-model systems miss.
Every prediction has named resolution criteria and an authoritative source (SEC filings, BLS, Fed statements). Not crowdsourced. Verified.
Brier scores computed per model per event. Enables model-level analysis: which LLM forecasts best in which domain?
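As a rough illustration of per-model, per-event scoring, the sketch below computes the Brier score (the squared difference between the forecast probability and the 0/1 outcome) and averages it by model and domain. The record layout, model names, and numbers are hypothetical placeholders, not CRENE's actual schema or results.

```python
from collections import defaultdict


def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and a binary outcome (0 or 1)."""
    return (probability - outcome) ** 2


# Hypothetical records: (model, domain, forecast probability, resolved outcome).
forecasts = [
    ("model_a", "earnings", 0.80, 1),
    ("model_a", "macro", 0.30, 1),
    ("model_b", "earnings", 0.60, 1),
    ("model_b", "macro", 0.50, 0),
]


def mean_brier_by_model_and_domain(records):
    """Average the per-event Brier scores for each (model, domain) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, event count]
    for model, domain, probability, outcome in records:
        bucket = totals[(model, domain)]
        bucket[0] += brier_score(probability, outcome)
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


scores = mean_brier_by_model_and_domain(forecasts)
for (model, domain), score in sorted(scores.items()):
    print(f"{model:8s} {domain:9s} {score:.3f}")
```

Lower is better: a perfect forecast scores 0, and always answering 50% scores 0.25.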
Are the probabilities meaningful? When a well-calibrated model predicts 70%, the event occurs about 70% of the time. Points near the dashed line indicate good calibration.
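One common way to check calibration is to group forecasts into probability bins and compare the mean forecast in each bin with the observed hit rate. The sketch below assumes equal-width bins and uses made-up data; the binning actually used for the plot is not stated here.

```python
def calibration_table(probabilities, outcomes, n_bins=10):
    """Return (mean forecast, observed hit rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for probability, outcome in zip(probabilities, outcomes):
        # Clamp so that a forecast of exactly 1.0 lands in the last bin.
        index = min(int(probability * n_bins), n_bins - 1)
        bins[index].append((probability, outcome))

    rows = []
    for members in bins:
        if not members:
            continue
        mean_forecast = sum(p for p, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        rows.append((mean_forecast, hit_rate, len(members)))
    return rows


# Made-up data: ten 70% forecasts with 7 hits, five 20% forecasts with 1 hit.
probabilities = [0.7] * 10 + [0.2] * 5
outcomes = [1] * 7 + [0] * 3 + [1] + [0] * 4

rows = calibration_table(probabilities, outcomes)
for mean_forecast, hit_rate, count in rows:
    print(f"forecast {mean_forecast:.2f}  observed {hit_rate:.2f}  n={count}")
```

Plotting the mean forecast against the hit rate gives the calibration curve; a perfectly calibrated model would sit on the line where the two are equal.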
Automated scanners detect upcoming earnings (Polygon.io financials), macro releases (CPI, NFP, PMI), central bank meetings, and market events. Each event gets structured binary resolution criteria and a named authoritative source.
GPT-4o, Gemini 2.5 Flash Lite, Claude Haiku 4.5, and Grok 4 Fast each forecast independently with no model seeing another's output. Ensemble consensus is the mean probability. Spread (max minus min) measures disagreement.
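The two ensemble statistics defined above are simple to compute. In this minimal sketch the dictionary keys are illustrative labels rather than API identifiers, and the probabilities are invented.

```python
from statistics import mean


def consensus_and_spread(model_probabilities):
    """Return (mean probability, max minus min) across the models' forecasts."""
    values = list(model_probabilities.values())
    return mean(values), max(values) - min(values)


# Invented forecasts for a single event.
example = {
    "gpt-4o": 0.72,
    "gemini-2.5-flash-lite": 0.68,
    "claude-haiku-4.5": 0.75,
    "grok-4-fast": 0.65,
}

consensus, spread = consensus_and_spread(example)
print(f"consensus={consensus:.2f} spread={spread:.2f}")
```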
Low spread + high consensus = high confidence. These predictions historically achieve 73%+ accuracy. The spread-based signal is a core dataset output that enables downstream signal construction.
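A high-confidence flag along these lines could be sketched as follows. Both thresholds are hypothetical, since the cut-offs are not stated here, and "high consensus" is interpreted as a consensus probability far from 50% in either direction, which is also an assumption.

```python
# Hypothetical thresholds, chosen only for illustration.
MAX_SPREAD = 0.10      # models must agree to within 10 percentage points
MIN_CONVICTION = 0.20  # consensus must be at least 20 points away from 50%


def is_high_confidence(consensus, spread,
                       max_spread=MAX_SPREAD, min_conviction=MIN_CONVICTION):
    """Flag forecasts where the models agree closely and lean clearly one way."""
    return spread <= max_spread and abs(consensus - 0.5) >= min_conviction


print(is_high_confidence(consensus=0.85, spread=0.05))  # agree and lean strongly
print(is_high_confidence(consensus=0.85, spread=0.30))  # models disagree
print(is_high_confidence(consensus=0.55, spread=0.05))  # agree, but near 50%
```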
Earnings are resolved daily against Polygon.io SEC-derived financials as the primary source, with an Alpha Vantage cross-check and a per-event audit trail recording every source response. Macro events are resolved via Gemini search grounding, with the model-cited source URL classified against an allowlist of authoritative sources (government statistical agencies, central banks, regulators). Brier scores are computed per model per event. All data is served via a public REST API.
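The allowlist check on a cited source URL might look something like the sketch below. The three domains are illustrative examples of the source types named above, not the actual allowlist, and matching on the hostname suffix is an assumed design choice.

```python
from urllib.parse import urlparse

# Illustrative entries only; the real allowlist is not published here.
AUTHORITATIVE_HOSTS = {"bls.gov", "federalreserve.gov", "sec.gov"}


def is_authoritative(url, allowlist=AUTHORITATIVE_HOSTS):
    """True if the URL's host is an allowlisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == allowed or host.endswith("." + allowed)
               for allowed in allowlist)


print(is_authoritative("https://www.bls.gov/news.release/cpi.nr0.htm"))
print(is_authoritative("https://bls.gov.example.com/cpi"))
```

Matching on the parsed hostname rather than on a substring of the URL keeps look-alike domains such as `bls.gov.example.com` from passing the check.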