Public AI market benchmark

CapitalBench

CapitalBench tests AI models on real market decisions. Each round gives every model the same prompt and frozen data, records the picks before outcomes are known, then publishes the scores and full audit trail.

Current public round: CB-2026-05-10-1M
Official run: official-round-1-clean
Score state: Pending
Official submissions: 4 of 4 providers
Frozen universe: 40 options
Input hashes: 6
Decision state: 4 of 4 models selected SEMICONDUCTORS
Decision deadline: 2026-05-11T01:00:00Z
Entry date: 2026-05-08
Exit date: 2026-06-10
Resolution gate: Exit prices due after 2026-06-10
Reader promise: Research benchmark, not financial advice

Round 1 lifecycle

A Clear Line Between Decisions And Performance

The site separates collected decisions from scored performance. Picks and entry prices are public; realized returns are withheld until the round resolves after the exit date.

  1. Inputs frozen

    Briefing, prompt, universe, and market context are hashed before any model is called.

  2. Submissions collected

    Four valid official one-shot picks are published.

  3. Entry prices published

    Entry prices are visible for the selected options, the S&P 500, and cash.

  4. Exit prices pending

    Performance unlocks once the 2026-06-10 exit prices are collected and scored.
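The "inputs frozen" step can be illustrated with a minimal sketch, assuming SHA-256 over UTF-8 text and canonical JSON for structured inputs. The hash function, the serialization, and the names `hash_input`, `freeze_inputs`, and `verify_input` are illustrative assumptions, not CapitalBench's published method, and the input contents are placeholders.

```python
import hashlib
import json
from typing import Any, Dict


def hash_input(value: Any) -> str:
    """Return the SHA-256 hex digest of one frozen input."""
    if isinstance(value, str):
        payload = value.encode("utf-8")
    else:
        # Canonical JSON (sorted keys, compact separators) so equal data hashes identically.
        payload = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def freeze_inputs(inputs: Dict[str, Any]) -> Dict[str, str]:
    """Hash every named input; in this sketch it runs before any model call."""
    return {name: hash_input(value) for name, value in inputs.items()}


def verify_input(value: Any, published_digest: str) -> bool:
    """Recompute a digest and compare it with a published one."""
    return hash_input(value) == published_digest


# Placeholder inputs (hypothetical); the real briefing, prompt, universe, and context are not reproduced here.
frozen = {
    "briefing": "Market briefing text for the round",
    "prompt": "Choose exactly one option from the universe",
    "universe": ["SEMICONDUCTORS", "SP500", "CASH"],
    "market_context": {"as_of": "2026-05-08"},
}
hashes = freeze_inputs(frozen)
for name, digest in hashes.items():
    print(f"{name}: {digest[:16]}")
```

Hashing before the model calls means any later edit to an input would produce a different digest, so a reader holding the published hashes can detect it.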

Benchmark safeguards

Built For Inspection Before Interpretation

The benchmark's trust model is visible before any score: fixed inputs, isolated runs, public hashes, and no provisional alpha.

Protocol

One official call per model

Official picks are not averaged with repeated stability runs, mock output, or private smoke tests.
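The one-official-call rule can be sketched as a filter that keeps a single official submission per model and ignores every other run. The `Submission` type, the `run_kind` labels, and `official_picks` are hypothetical; the page does not say how runs are labeled internally.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Submission:
    model_id: str
    run_kind: str  # hypothetical labels: "official", "stability", "mock", "smoke"
    pick: str
    confidence: float


def official_picks(submissions: List[Submission]) -> Dict[str, Submission]:
    """Return one official submission per model, ignoring all other run kinds."""
    picks: Dict[str, Submission] = {}
    for sub in submissions:
        if sub.run_kind != "official":
            continue  # stability runs, mock output, and smoke tests are never averaged in
        if sub.model_id in picks:
            # A second official call would break the one-shot rule, so fail loudly.
            raise ValueError(f"more than one official call for {sub.model_id}")
        picks[sub.model_id] = sub
    return picks


runs = [
    Submission("anthropic-claude-opus-4-7", "official", "SEMICONDUCTORS", 0.58),
    # Hypothetical repeated stability run; it must not change the official pick.
    Submission("anthropic-claude-opus-4-7", "stability", "CASH", 0.40),
]
official = official_picks(runs)
print(official["anthropic-claude-opus-4-7"].pick)  # SEMICONDUCTORS
```

Raising on a duplicate official run, rather than keeping the first or last, makes a protocol violation visible instead of silently resolving it.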

Audit Read Model

50 public rows

Round metadata, options, official submissions, and hashes are mirrored into Supabase for the public site.

Boundary

No performance before resolution

Pending pages show decisions and artifacts without implying realized alpha.
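The pending boundary amounts to a gate: no return is computed until the exit date has passed and exit prices exist. This sketch assumes a simple price return and measures the pick against the S&P 500; the page does not publish its scoring formula, so treating alpha as pick return minus S&P 500 return, along with the `Prices` type and the function names, is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Prices:
    entry: float
    exit: Optional[float] = None  # None until exit prices are collected


def simple_return(p: Prices) -> Optional[float]:
    """Price return from entry to exit, or None if no exit price exists yet."""
    if p.exit is None:
        return None
    return p.exit / p.entry - 1.0


def excess_return(pick: Prices, benchmark: Prices, today: date, exit_date: date) -> Optional[float]:
    """Pick return minus benchmark return (assumed alpha), or None while the round is pending."""
    if today <= exit_date:
        return None  # exit prices are only due after the exit date
    r_pick, r_bench = simple_return(pick), simple_return(benchmark)
    if r_pick is None or r_bench is None:
        return None  # still pending: no provisional alpha
    return r_pick - r_bench
```

Returning `None` instead of `0.0` keeps "pending" distinct from "zero alpha", which matches the rule that pending pages show no realized performance.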

Current round

Official Picks

All four official Round 1 models selected the same option. The table shows decisions, confidence, rationales, and risks without presenting performance while the round is pending.

Pending score

Model            Model ID                   Provider   Pick            Confidence  Status
Claude Opus 4.7  anthropic-claude-opus-4-7  Anthropic  SEMICONDUCTORS  0.58        Pending
Gemini 3.1 Pro   google-gemini-3-1-pro      Google     SEMICONDUCTORS  0.60        Pending
GPT-5.5          openai-gpt-5-5             OpenAI     SEMICONDUCTORS  0.34        Pending
Grok 4.3         xai-grok-4-3               xAI        SEMICONDUCTORS  0.55        Pending

Claude Opus 4.7 rationale

Semiconductors show dominant momentum with AI capex tailwinds, while credit and volatility conditions remain benign.

Claude Opus 4.7 key risks
  • Middle East escalation could spike oil and pressure high-beta tech
  • Hot CPI could push real yields higher
  • Weak payroll growth could undermine cyclical chip demand

Latest leaderboard

Resolution Pending

The official leaderboard appears here after exit prices are available, the selected run is scored, and the result artifacts are regenerated.

No resolved official leaderboard yet. Round 1 remains pending until exit prices are available.