LLM Forecasting Benchmark

Measuring how AI forecasts the real world.

Four frontier AI models forecast against live prediction markets from Polymarket and Kalshi every 4 hours. The result: the largest public dataset of LLM forecasting behavior, with 76% accuracy at high divergence.

Accuracy by confidence level
High: 76%
Medium: 69%
Low: 57%
AI models: 4
Performance

Accuracy by domain

Every forecast resolved against verified outcomes. Brier scored and calibrated.
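As a rough illustration of what "Brier scored" means, the sketch below computes the standard Brier score (mean squared error between a forecast probability and the 0/1 outcome). The function name and the sample numbers are illustrative assumptions, not values from the benchmark.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecaster, and always answering 0.5
    scores 0.25 regardless of the outcomes.
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("at least one forecast is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts (probability of "yes") and their resolved outcomes.
forecasts = [0.9, 0.7, 0.2, 0.5]
resolved = [1, 1, 0, 0]
print(round(brier_score(forecasts, resolved), 4))
```

The 0.25 baseline for a constant 0.5 forecast is a useful yardstick when reading Brier scores on binary markets.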

Top Alpha vs Market
Data Structure

Forecast taxonomy

Every forecast is tagged with structured metadata, so it can be filtered by domain, time horizon, specificity, and measurability for quantitative research.

{
  "domain":      "btc_price",
  "time_horizon": "months",
  "specificity":  4,
  "measurability": 5,
  "entities":     ["SEC", "BlackRock"],
  "keywords":     ["bitcoin", "etf"],
  "deadline":     "2026-06-01"
}
domain (e.g. btc_price, fed_rates, us_election): Specific forecast sub-domain
time_horizon (days / weeks / months / quarters / years): Expected resolution timeframe
specificity (1-5 scale): How precise the forecast is
measurability (1-5 scale): Binary verifiable vs subjective
entities (e.g. SEC, BlackRock, Fed): Companies, people, orgs referenced
resolution_deadline (e.g. 2026-06-01): Expected outcome date
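One way to work with this taxonomy is sketched below: a small record type mirroring the sample JSON, plus a filter over its fields. The `ForecastMeta` class, the `filter_forecasts` helper, and the second and third sample records are hypothetical; only the first record comes from the example above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ForecastMeta:
    """Taxonomy metadata for one forecast (field names follow the sample JSON)."""
    domain: str
    time_horizon: str
    specificity: int      # 1-5 scale
    measurability: int    # 1-5 scale
    entities: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    deadline: str = ""    # ISO date string, e.g. "2026-06-01"


def filter_forecasts(
    records: Iterable[ForecastMeta],
    domain: Optional[str] = None,
    time_horizon: Optional[str] = None,
    min_specificity: int = 1,
    min_measurability: int = 1,
) -> List[ForecastMeta]:
    """Keep records matching the given domain/horizon and minimum scores."""
    selected = []
    for record in records:
        if domain is not None and record.domain != domain:
            continue
        if time_horizon is not None and record.time_horizon != time_horizon:
            continue
        if record.specificity < min_specificity:
            continue
        if record.measurability < min_measurability:
            continue
        selected.append(record)
    return selected


records = [
    # From the sample JSON above.
    ForecastMeta("btc_price", "months", 4, 5,
                 ["SEC", "BlackRock"], ["bitcoin", "etf"], "2026-06-01"),
    # Hypothetical records, for illustration only.
    ForecastMeta("fed_rates", "weeks", 3, 5, ["Fed"], ["rates"], "2026-03-18"),
    ForecastMeta("us_election", "years", 2, 3, [], ["election"], "2028-11-07"),
]

strictly_verifiable = filter_forecasts(records, min_measurability=5)
print([r.domain for r in strictly_verifiable])
```

Filtering on `measurability` first is a reasonable default for quantitative work, since subjective forecasts are harder to score consistently.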
Built for

Built for teams that study forecasting

AI Labs & Eval Teams

Benchmark how your models perform on real-world forecasting. Per-model Brier scores, calibration curves, and blind vs informed comparison across 8 domains.

Per-model metrics · Blind vs informed · Calibration data

Forecasting Researchers

The largest public dataset of LLM forecasting behavior. CSV/Parquet exports, standardized schema, and reproducible methodology.

Dataset exports · Calibration curves · Reproducible methodology

Data Science Teams

Structured prediction data with full provenance. Filter by domain, time horizon, and model. Designed for data pipelines and analysis.

Structured JSON/CSV · Full taxonomy · API access

Media & Journalism

Which AI model is best at predicting what? Structured data and visualizations for stories about AI capabilities and prediction markets.

Embeddable data · Model comparisons · Domain breakdowns
API

Dataset access for researchers and teams

Forecasting data, evaluation metrics, and model comparisons, delivered as JSON or CSV. Public endpoints cover research use; exports require authentication.

REST API · JSON + CSV · API Key Auth · 5K req/day · Brier Scores
Developer docs
$ curl -H "X-API-Key: crn_..." \
  api-get.crene.com/api/predictions/analytics/

{
  "total_resolved": 8977,
  "confidence_signal": {
    "high":   75.6%,
    "medium": 69.4%,
    "low":    56.5%
  },
  "brier_scores":  {
    "ai_consensus": 0.236,
    "market":       0.233
  },
  "calibration":   [...],
  "category_alpha": {...}
}
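A response of this shape could be summarized along the following lines. This is a sketch under assumptions: it presumes the confidence percentages arrive as plain JSON numbers (for example 75.6), it omits the elided `calibration` and `category_alpha` fields, and the `summarize` helper is hypothetical rather than part of any client library.

```python
import json

# Sample payload modelled on the response above; percentages are assumed
# to be plain numbers, and elided fields are left out.
SAMPLE_RESPONSE = """
{
  "total_resolved": 8977,
  "confidence_signal": {"high": 75.6, "medium": 69.4, "low": 56.5},
  "brier_scores": {"ai_consensus": 0.236, "market": 0.233}
}
"""


def summarize(payload: dict) -> dict:
    """Compare AI consensus with the market and measure the confidence spread."""
    briers = payload["brier_scores"]
    confidence = payload["confidence_signal"]
    # Lower Brier is better, so a positive gap means the market scored better.
    gap = briers["ai_consensus"] - briers["market"]
    return {
        "total_resolved": payload["total_resolved"],
        "brier_gap": round(gap, 3),
        "market_leads": gap > 0,
        "high_minus_low_accuracy": round(confidence["high"] - confidence["low"], 1),
    }


summary = summarize(json.loads(SAMPLE_RESPONSE))
print(summary)
```

With the sample figures, the market's Brier score is marginally lower than the AI consensus, while high-confidence forecasts resolve correctly about 19 points more often than low-confidence ones.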

How well can AI predict the future?

Explore the data, or get in touch about research collaboration.

Crene — Who's Right About the Future?