Crene is investment thesis review infrastructure. One live investment thesis is decomposed into assumptions, independent model reads, weekly movement, and rethink conditions. Public examples include thesis maps for yes or no views, factor maps for continuous distributions, and scenario records for multiple coherent futures. Calibration results are published separately for the leakage controlled benchmark and the broader resolved corpus.

How does AI prediction calibration work at Crene?

Calibration applies at the consensus layer, not at the underlying model layer. Crene queries four frontier models in isolation, computes a cross model median consensus, and tracks resolution against a tiered source allowlist. Every resolved event becomes a permanent calibration point. Per tier accuracy, per model Brier scores, and per domain accuracy are computed live and published at https://crene.com/methodology.
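As a rough illustration of the consensus step, the sketch below takes one probability per model for a single question and returns the cross model median. The model keys, the `consensus` function name, and the definition of spread as maximum minus minimum are assumptions made for this example; the page does not specify how Crene computes spread.

```python
from statistics import median


def consensus(reads: dict[str, float]) -> tuple[float, float]:
    """Return (median probability, spread) for one question's isolated model reads."""
    probs = list(reads.values())
    if not probs:
        raise ValueError("at least one model read is required")
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    # Spread as max minus min is an assumption made for this sketch.
    return median(probs), max(probs) - min(probs)


# Hypothetical reads; each model is polled without seeing the others.
reads = {"model_a": 0.62, "model_b": 0.55, "model_c": 0.70, "model_d": 0.58}
cons, spread = consensus(reads)
print(f"consensus={cons:.2f} spread={spread:.2f}")
```

With four reads the median is the mean of the two middle values, so a single outlying model moves the consensus less than it would move a simple average.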

What is a Brier score?

A Brier score is a proper scoring rule for probability forecasts on binary outcomes. It is the mean squared error between the forecast probability and the realized outcome (0 or 1). Lower scores indicate more accurate probability forecasts. The no skill baseline is 0.25 (always predicting 0.5); a perfect forecaster scores 0.0. Crene publishes the consensus Brier score and per model Brier scores across the resolved event corpus.
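The definition above can be written as BS = (1/N) Σ (p_i − o_i)², where p_i is the forecast probability and o_i is the realized outcome. A minimal sketch follows; the function name and the sample inputs are illustrative and are not Crene data.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Always predicting 0.5 yields the no skill baseline, whatever the outcomes are.
baseline = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1])

# Fully confident and correct forecasts yield a score of zero.
perfect = brier_score([1.0, 0.0], [1, 0])
```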

Crene serves investment teams that need a repeatable thesis review process: hedge funds, asset managers, macro PMs, CIO offices, family offices, and thematic investment teams. A team brings one thesis it is actively debating. Crene maps what the thesis depends on, where models disagree, what changed each week, and what would force a review. Data and API access remain available as the proof layer behind the workflow.

What is the difference between an event, a thesis map, a factor map, and a scenario at Crene?

Events are base binary outcomes with scalar probabilities. Thesis maps decompose yes or no investment views into assumptions. Factor maps decompose continuous variables into driver distributions. Scenario records map long horizon questions into coherent pathways. Public collection pages distinguish live maps from frozen thesis records and archived scenario records.

Methodology · JUL 2026

A timestamped record of an investment thesis.

Crene decomposes a live thesis into governed assumptions, updates independent model estimates each day, and preserves how the thesis changed over time. Deterministic detectors surface movement, disagreement, concentration, conflict, and staleness that may require review. Suggested research steps and resolution branches define what to examine next. Weekly freezes preserve the information available to the team at the time.

Methodological distinction

Scoring, structure, and review are separate claims.

Calibration tests whether probabilities deserve trust. Decomposition makes the thesis inspectable. Review detection identifies which changes require attention.

01

Scoring

Four models forecast fixed questions independently. Reads are timestamped, preserved, and scored against named resolution sources where outcomes exist.

02

Structure

A thesis is represented as governed assumptions, mechanisms, relationships, drivers, and pathways. Structure makes the view inspectable without claiming that every relationship is causal.

03

Review

Deterministic detectors scan movement, disagreement, concentration, conflict, and staleness to identify what may require research, discussion, or renewed review.

Time and record discipline

Four cadences govern the record.

Model estimates update daily. Weekly reviews freeze the available state. Resolutions update calibration. Structural changes require explicit human promotion.

Daily

Probabilities, model spread, pressure, trajectories, candidate findings, suggested actions, and resolution branches.

Weekly

An immutable review snapshot records the thesis state that was actually available during the review period.

On resolution

Resolved outcomes update Brier scores, calibration curves, directional accuracy, and sample sizes.

Human governed

New assumptions, retired assumptions, taxonomy edits, and relationship changes require explicit human promotion and a new structure version.
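To make the "On resolution" cadence concrete, the sketch below recomputes calibration bins and directional accuracy from a list of resolved forecasts. The equal width binning, the bin count, and the choice to exclude forecasts of exactly 0.5 from directional accuracy are assumptions for this example, not documented Crene behavior.

```python
def calibration_bins(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal width bins and compare mean forecast with observed rate."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # a forecast of 1.0 falls in the top bin
        bins[idx].append((p, o))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # empty bins are omitted rather than reported as zero
        n = len(members)
        rows.append({
            "bin": idx,
            "n": n,  # sample size, so thin bins are visible to the reader
            "mean_forecast": sum(p for p, _ in members) / n,
            "observed_rate": sum(o for _, o in members) / n,
        })
    return rows


def directional_accuracy(forecasts, outcomes):
    """Share of forecasts on the correct side of 0.5 (exact 0.5 reads are skipped)."""
    scored = [(p, o) for p, o in zip(forecasts, outcomes) if p != 0.5]
    if not scored:
        raise ValueError("no directional forecasts to score")
    return sum((p > 0.5) == bool(o) for p, o in scored) / len(scored)


# Illustrative resolved questions, not Crene results.
forecasts = [0.10, 0.15, 0.90, 0.85, 0.65]
outcomes = [0, 0, 1, 1, 0]
rows = calibration_bins(forecasts, outcomes, n_bins=5)
accuracy = directional_accuracy(forecasts, outcomes)
```

Each row keeps its sample size because a bin with only a few resolved questions can look well calibrated or badly calibrated by chance.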

Candidate findings

The detector identifies a pattern. It does not declare a verdict.

Crene evaluates the latest thesis state against preset detector rules. The public language is constrained to what the evidence actually establishes.

Coordinated repricing

Several important assumptions move materially in the same direction during the review window.

Moving disagreement

An assumption important to the decision moves quickly while the model ensemble remains materially divided.

Masked disagreement

The headline spread appears tighter than disagreement across important assumptions underneath it.

Conflicting movement

Support and resistance move inside the same mechanism, including asymmetric tilts.

Evidence concentration

A disproportionate share of weighted support comes from one mechanism or narrative cluster.

Stale thesis state

Underlying conditions move materially while the anchor remains comparatively unchanged.
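Two of the detector patterns above can be sketched as deterministic rules over the latest thesis state. Everything specific here is hypothetical: the `AssumptionState` fields, the thresholds, and the use of raw probability change as "direction" are assumptions for illustration, since the page does not publish the preset rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssumptionState:
    name: str
    weight: float  # decision weight on a hypothetical 0..1 scale
    delta: float   # change in consensus probability over the review window
    spread: float  # disagreement across model reads (max minus min, assumed)


def coordinated_repricing(states, min_weight=0.1, min_move=0.05, min_count=3):
    """Names of important assumptions that moved materially in the same direction."""
    movers = [s for s in states if s.weight >= min_weight and abs(s.delta) >= min_move]
    up = [s.name for s in movers if s.delta > 0]
    down = [s.name for s in movers if s.delta < 0]
    side = up if len(up) >= len(down) else down
    return side if len(side) >= min_count else []


def moving_disagreement(states, min_weight=0.1, min_move=0.05, min_spread=0.2):
    """Names of important assumptions that moved while the models stayed divided."""
    return [
        s.name for s in states
        if s.weight >= min_weight and abs(s.delta) >= min_move and s.spread >= min_spread
    ]


# Hypothetical thesis state for one review window.
states = [
    AssumptionState("a", 0.30, +0.08, 0.10),
    AssumptionState("b", 0.20, +0.06, 0.25),
    AssumptionState("c", 0.15, +0.07, 0.05),
    AssumptionState("d", 0.05, +0.20, 0.30),  # too little weight to count
    AssumptionState("e", 0.20, -0.01, 0.30),  # too little movement to count
]
repricing = coordinated_repricing(states)
divided = moving_disagreement(states)
```

Both functions return the matching assumptions rather than a conclusion, which mirrors the stated principle that a detector identifies a pattern and does not declare a verdict.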

Selection and ranking

Findings are ranked by detector score, decision weight, movement, spread, and mechanism coverage. Type caps prevent one detector from filling the entire public set. The selected findings can therefore change from one daily run to the next as the thesis state changes.
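The effect of type caps can be sketched as a greedy selection over ranked findings. This example assumes each finding already carries a single composite `score`; how detector score, decision weight, movement, spread, and mechanism coverage are combined is not specified on this page, and the cap values and type labels are hypothetical.

```python
def select_findings(findings, max_total=5, max_per_type=2):
    """Pick the top findings by score while capping how many any one detector contributes."""
    ranked = sorted(findings, key=lambda f: f["score"], reverse=True)
    selected = []
    per_type = {}
    for finding in ranked:
        if len(selected) == max_total:
            break
        used = per_type.get(finding["type"], 0)
        if used >= max_per_type:
            continue  # this detector type has already filled its cap
        selected.append(finding)
        per_type[finding["type"]] = used + 1
    return selected


# Hypothetical findings from one daily run.
findings = [
    {"id": "f1", "type": "coordinated_repricing", "score": 0.9},
    {"id": "f2", "type": "coordinated_repricing", "score": 0.8},
    {"id": "f3", "type": "coordinated_repricing", "score": 0.7},
    {"id": "f4", "type": "masked_disagreement", "score": 0.6},
    {"id": "f5", "type": "stale_thesis_state", "score": 0.5},
]
public_set = [f["id"] for f in select_findings(findings, max_total=3, max_per_type=2)]
```

In this example the third coordinated repricing finding is skipped even though it outscores the masked disagreement finding, which is the behavior a type cap is meant to produce.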

From finding to review path

Crene proposes what to examine next and what would distinguish the branches.

Suggested actions are constrained to research, review, escalation, and thesis maintenance. They are not portfolio instructions.

01

Finding

A detector establishes an observable pattern in the thesis map.

02

Suggested action

A constrained rule proposes a research or review step tied to the detected pattern.

03

Resolution branches

Plausible branches explain how the thesis state would change under different resolutions.

04

Observable trigger

Each branch identifies the evidence that would make that resolution more consistent with the observed record.

The action layer does not recommend buying, selling, hedging, or position sizing. Portfolio decisions require mandate, exposure, liquidity, risk, and correlation context that the public thesis page does not possess.

Governance and weekly freezes

The record cannot be silently rewritten after the fact.

Daily reads are timestamped when polled. Weekly reviews freeze those reads against the structure version that was live at the time.

Immutable state

A completed weekly record preserves consensus, available model reads, spread, movement, freshness, and the active structure reference.

Versioned structure

Promoted assumption, taxonomy, pathway, or relationship changes create a new structure version with a changelog.

No backfilling

A native weekly freeze records what the system said during that week. It does not resample or substitute a later answer.
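One way to picture an immutable weekly record is a frozen data structure with a content digest, so that any later rewrite would be detectable. The sketch below is an assumption about how such a record could be modeled, not a description of Crene's storage; the field names, the week label format, and the use of SHA-256 are all hypothetical, and only a subset of the preserved fields is shown.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class WeeklyFreeze:
    week: str                                   # hypothetical label format
    structure_version: str                      # structure live at freeze time
    consensus: float
    model_reads: tuple[tuple[str, float], ...]  # reads available that week
    spread: float
    movement: float

    def digest(self) -> str:
        """Stable hash of the record contents; changes if any field changes."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


freeze = WeeklyFreeze(
    week="2026-W27",
    structure_version="v3",
    consensus=0.60,
    model_reads=(("model_a", 0.62), ("model_b", 0.55)),
    spread=0.07,
    movement=-0.02,
)

# Attempting to overwrite a frozen field raises instead of silently succeeding.
try:
    freeze.consensus = 0.90
except FrozenInstanceError:
    rewritten = False
else:
    rewritten = True
```

Storing the structure version inside the record means a historical read can be interpreted against the assumption map that was live when it was taken.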

Live corpus

Current system coverage

These figures are loaded from the live Crene APIs. Counts change as questions resolve and scenario, cluster, and factor records are updated.

—

active binary questions

Active binary anchors and assumptions are supplied by the live analytics API.

—

archived scenario records

Preserved scenario maps with API supplied component and pathway coverage.

—

resolved short horizon questions

Earlier short horizon events resolved against named sources and included in the scoring layer.

4

independent models

Claude, GPT, Gemini, and Grok are polled without seeing one another.

Calibration evidence

Calibration is the trust layer, not the product claim.

Crene's resolved scoring corpus comes from an earlier phase focused on short horizon events. It demonstrates that the forecasting pipeline operates end to end, from advance question definition and independent model polling through source governed resolution and scoring. It does not establish the accuracy of Crene's current long horizon thesis maps, which are a different question class and are now building their own forward record.

Metric	Current value	Population	Purpose
Leakage controlled consensus	—	n=—	Tests ensemble calibration against a — base rate Brier. Reported skill: —.
Macro ex earnings	0.2294	n=320	Supplementary evidence for a thinner macro population. Reported separately from the leakage controlled benchmark.

Calibration record updates from the live Crene analytics layer.

As of JUL 2026. The resolved corpus and calibration bins update as new outcomes are recorded.
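The table reports skill against a base rate Brier. A conventional way to express that comparison is the Brier skill score, 1 − BS / BS_ref, where BS_ref is the score of a constant forecast equal to the observed base rate. Whether Crene's reported skill uses exactly this form is an assumption, and the numbers below are illustrative only, not Crene results.

```python
def base_rate_brier(base_rate: float) -> float:
    """Brier score of always forecasting the observed base rate: b * (1 - b)."""
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must lie in [0, 1]")
    return base_rate * (1.0 - base_rate)


def brier_skill(brier: float, reference: float) -> float:
    """Fractional improvement over the reference; 0 means no better, 1 means perfect."""
    if reference <= 0.0:
        raise ValueError("reference Brier must be positive")
    return 1.0 - brier / reference


# Illustrative values: a 30% base rate and a consensus Brier of 0.168.
reference = base_rate_brier(0.30)       # about 0.21
skill = brier_skill(0.168, reference)   # about 0.20
```

A base rate reference is stricter than the fixed 0.25 baseline whenever outcomes are unbalanced, because always forecasting the base rate already scores below 0.25.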

Claims and limitations

The methodology separates what is measured from what remains a hypothesis.

What is measured

Timestamped probabilities, resolved outcomes, Brier scores, calibration bins, movement, model spread, detector inputs, and frozen review records.

What is structural

Assumption maps, mechanisms, pathways, driver families, authored relationships, ontology fields, and editorial taxonomies.

What remains open

Whether daily movement contains incremental decision value, whether detector findings improve outcomes, and whether disagreement across models predicts realized uncertainty.

Crene does not claim

That AI forecasts outperform liquid prediction markets.
That model agreement reliably predicts correctness.
That a probability is a deterministic outcome.
That calibration automatically validates a thesis decomposition.
That suggested actions are trading recommendations.

That calibration validates decomposition

Prompt design, model selection, taxonomy, probability bands, weighting, and category structure can change results. Public records preserve the configuration and structure version needed to interpret a historical read, but full robustness across configurations has not yet been established.

That thin categories establish domain specific skill

Several categories remain statistically thin. Category level Brier and accuracy rows are exploratory when sample sizes are small. The page reports sample counts so a reader can distinguish evidence from apparent precision.

Decomposition systems

Different questions require different structural representations.

The current long horizon maps are a different question class from the earlier resolved corpus. Their present value is making uncertainty inspectable, contestable, and eventually scorable while their own forward record matures.

Clusters

Binary thesis maps

A binary anchor is decomposed into governed, falsifiable assumptions grouped by mechanisms or categories.

Factors

Continuous distributions

A continuous anchor is represented as model percentile distributions and a driver matrix.

Scenarios

Coherent world states

A strategic question is represented as multiple internally coherent pathways rather than one compressed probability.

Cluster construction

Candidate assumptions are generated, filtered for falsifiability and specificity, screened for useful probability range, deduplicated, and manually curated before entering the live map. The taxonomy and probability band are editorial choices, not neutral truths.

Factor construction and calibration

Factors use p5, p25, p50, p75, and p95 outputs rather than binary probabilities. Meaningful factor calibration requires multiple resolved horizons and will be evaluated using interval coverage, percentile hit rates, and continuous ranked probability score.
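Two of the planned checks, interval coverage and percentile hit rates, can be sketched directly from percentile outputs once realized values exist. The record layout and sample values below are hypothetical; the continuous ranked probability score is not implemented here.

```python
def interval_coverage(records, lo="p5", hi="p95"):
    """Share of realized values that fall inside the [lo, hi] percentile interval."""
    if not records:
        raise ValueError("at least one resolved record is required")
    hits = sum(1 for q, realized in records if q[lo] <= realized <= q[hi])
    return hits / len(records)


def percentile_hit_rate(records, level="p50"):
    """Share of realized values at or below the stated percentile."""
    if not records:
        raise ValueError("at least one resolved record is required")
    return sum(1 for q, realized in records if realized <= q[level]) / len(records)


# Hypothetical resolved horizons: (percentile forecast, realized value).
quantiles = {"p5": 1.0, "p25": 2.0, "p50": 3.0, "p75": 4.0, "p95": 5.0}
records = [(quantiles, 3.5), (quantiles, 0.5), (quantiles, 2.0), (quantiles, 4.5)]

outer = interval_coverage(records)                        # p5 to p95
inner = interval_coverage(records, lo="p25", hi="p75")    # p25 to p75
median_rate = percentile_hit_rate(records, level="p50")
```

For well calibrated distributions the p5 to p95 interval should contain roughly 90% of realized values and the p50 hit rate should sit near 50%, which is why multiple resolved horizons are needed before the figures mean much.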

Scenario construction

Scenarios use joint state reasoning across binary and continuous components. Pathways fix only load bearing variables and leave other variables determined by the models. Pathway labels and distributions are editorial framings rather than probability weights.

Relationship and ontology discipline

Structural recurrence means the same theme appears across multiple anchors. It does not, by itself, establish covariance, statistical dependence, predictive correlation, or causal influence. Those require separate empirical evidence.

Inspect the system

Follow the methodology into the live record.

Inspect a live thesis, review the resolved corpus, or see the broader map of scenarios and factors.
