Documentation

Methodology

TrustBench is a public registry of x402-style endpoints with nightly liveness telemetry and Ed25519-signed scorecards. This page documents exactly how data is collected, how scores are computed, and what each metric represents — so anyone integrating against the registry knows what they're working with.

Data collection

  • A scheduled job runs once per day on a single cloud host.
  • For each provider URL, the prober sends three sequential requests per run, tagged us-east / eu-west / asia-southeast. All three currently originate from the same single host, so the tags do not yet represent distinct vantage points; multi-host probing is on the roadmap.
  • Each request is HEAD with an 8-second timeout, falling back to GET if the server returns 405.
  • HTTP status codes 200, 201, 204, 401, 402, 403, 404, 405, 429 are recorded as "endpoint is alive". Other statuses, connection errors, and timeouts are recorded as failures.
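
The probe rules above can be sketched in JavaScript as follows. This is a minimal illustration, not the actual TrustBench prober: the names ALIVE_STATUSES, isAlive, and probeOnce are hypothetical, and the sketch assumes Node 18+ for the global fetch and AbortSignal.timeout. Applying the 8-second timeout to the GET fallback as well, and measuring latency across both attempts, are assumptions. Redirect handling is not specified on this page, so fetch's default is left in place.

```javascript
// Hypothetical sketch of the liveness rules described above.
const ALIVE_STATUSES = new Set([200, 201, 204, 401, 402, 403, 404, 405, 429]);

// True when an HTTP status is recorded as "endpoint is alive".
function isAlive(status) {
  return ALIVE_STATUSES.has(status);
}

// One probe: HEAD first, retried as GET if the server answers 405.
// fetchImpl is injectable so the logic can be exercised without a network.
async function probeOnce(url, fetchImpl = fetch, timeoutMs = 8000) {
  const started = Date.now();
  try {
    let res = await fetchImpl(url, {
      method: "HEAD",
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (res.status === 405) {
      res = await fetchImpl(url, {
        method: "GET",
        signal: AbortSignal.timeout(timeoutMs),
      });
    }
    return { ok: isAlive(res.status), status: res.status, latencyMs: Date.now() - started };
  } catch (err) {
    // Connection errors and timeouts are failures and yield no latency sample.
    return { ok: false, status: null, latencyMs: null };
  }
}
```

Note that 405 appears in the alive list, so an endpoint that rejects both HEAD and GET with 405 is still recorded as alive under these rules.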

Scoring

score = 15
      + 45 · successRate
      + 35 · latencyHealth        // max(0, min(1, 1 - p50 / 2000))
      +  3 · consistencyBonus     // max(0, min(1, 1 - jitter))
clamped to [40, 98]
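
As a worked illustration, the formula above can be written as a small JavaScript function. This is a sketch under stated assumptions: the names clamp and trustScore are hypothetical, p50 is assumed to be in milliseconds, and jitter is assumed to arrive as an already-normalized number because its definition is not given here.

```javascript
// Clamp x into the closed interval [lo, hi].
const clamp = (x, lo, hi) => Math.max(lo, Math.min(hi, x));

// successRate is a fraction in [0, 1].
// p50Ms is assumed to be milliseconds; jitter is assumed pre-normalized.
function trustScore({ successRate, p50Ms, jitter }) {
  const latencyHealth = clamp(1 - p50Ms / 2000, 0, 1);
  const consistencyBonus = clamp(1 - jitter, 0, 1);
  const raw = 15 + 45 * successRate + 35 * latencyHealth + 3 * consistencyBonus;
  return clamp(raw, 40, 98);
}

// 15 + 45*1 + 35*0.8 + 3*0.9
console.log(trustScore({ successRate: 1, p50Ms: 400, jitter: 0.1 }).toFixed(1)); // "90.7"
```

The weights sum to 98, so the upper clamp only coincides with a perfect result. The floor matters more: an endpoint with successRate 0.5, a p50 of 2000 or higher, and jitter of 1 has a raw value of 37.5 but is reported as 40.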

p50 and p95 latency are computed over successful probes only, using linear-interpolation percentiles. Timeouts and other failures lower successRate but are excluded from the latency calculation, so a single failed probe does not distort the latency figures.
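
The percentile step can be sketched as follows. Several linear-interpolation conventions exist and this page does not say which one is used; the sketch assumes the common rank = q · (n − 1) convention. The names percentile and summarize, and the { ok, latencyMs } probe shape, are hypothetical.

```javascript
// Linear-interpolation percentile over a list of numbers, with q in [0, 1].
// Assumes the rank = q * (n - 1) convention; other conventions exist.
function percentile(values, q) {
  if (values.length === 0) return null; // no successful probes, no latency figure
  const sorted = [...values].sort((a, b) => a - b);
  const rank = q * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
}

// Failures count against successRate but contribute no latency sample.
function summarize(probes) {
  const latencies = probes.filter((p) => p.ok).map((p) => p.latencyMs);
  return {
    successRate: probes.length ? latencies.length / probes.length : 0,
    p50: percentile(latencies, 0.5),
    p95: percentile(latencies, 0.95),
  };
}

const sample = [
  { ok: true, latencyMs: 120 },
  { ok: true, latencyMs: 180 },
  { ok: false, latencyMs: null }, // timeout
  { ok: true, latencyMs: 300 },
];
console.log(summarize(sample));
```

For this sample the success rate is 0.75, p50 is 180 ms, and p95 is roughly 288 ms. The timeout lowers the success rate without pulling the latency figures toward the 8-second limit.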

What this measurement does NOT tell you

  • Score reflects reachability and response time, not capability quality. One of the accepted 4xx responses (for example 402 or 429) confirms the endpoint is up and responding, but does not confirm the underlying API behaves correctly when authenticated and paid.
  • Latency is single-origin. All measurements come from one host today. Real-world latency from an agent's location will differ.
  • Payment behavior is not yet measured. The current probe does NOT execute x402 payments, observe settlement latency, or validate payment-gated responses. A capability-aware paid-probe layer ships alongside the router.
  • Scorecards are signed with Ed25519. The public key is served at /.well-known/trustbench-pubkey for any third party to verify a TrustBench scorecard independently.

Verifying a scorecard or receipt

Every scorecard returned by /rankings/paid and every receipt at /receipts/:id carries an Ed25519 signature you can verify offline using the published public key.

npm run verify-scorecard
npm run verify-receipt -- <receipt_id>
npm run verify-receipt -- <receipt_id> --check-chain

Reference verifier in scripts/verify-scorecard.js →
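
For readers who want to see what an offline Ed25519 check involves, here is a minimal sketch using Node's built-in crypto module. It is not the reference verifier, which remains authoritative. The function name verifySigned, the base64 signature encoding, and the choice of JSON.stringify as the signed byte representation are all assumptions; a throwaway key pair stands in for the published public key so the example runs without network access.

```javascript
const crypto = require("node:crypto");

// Assumption: the signed bytes are the UTF-8 JSON serialization of the payload.
// The real canonicalization is whatever the reference verifier defines.
function verifySigned(payload, signatureBase64, publicKey) {
  const data = Buffer.from(JSON.stringify(payload), "utf8");
  const signature = Buffer.from(signatureBase64, "base64");
  // Ed25519 takes no separate digest, so the algorithm argument is null.
  return crypto.verify(null, data, publicKey, signature);
}

// Demo with a locally generated key pair standing in for the published key.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const scorecard = { provider: "https://example.com/api", score: 90.7 };
const sig = crypto
  .sign(null, Buffer.from(JSON.stringify(scorecard), "utf8"), privateKey)
  .toString("base64");

console.log(verifySigned(scorecard, sig, publicKey));                   // true
console.log(verifySigned({ ...scorecard, score: 98 }, sig, publicKey)); // false
```

The second call fails because any change to the signed bytes invalidates the signature, which is what lets a third party detect a modified scorecard without contacting TrustBench.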

Roadmap

  • Phase 0 · DONE · Honest framing
    Public registry positioning + measurement-honest copy
  • Phase 1 · DONE · Ed25519 scorecard signing
    Ed25519 keys generated, public key published, reference verifier shipped
  • Phase 2 · DONE · Builder validation
    Three real conversations + written expressions of interest (closed 2026-04-30)
  • Phase 3 · DONE · Non-custodial router
    Idempotency, hard spend caps, signed receipts, /receipts/:id audit (closed 2026-05-04)
  • Phase 4a · DONE · Discovery surfaces + reservation caps + first paid receipt
    /skill.md, /llms.txt, /.well-known/trustbench.json, strict reservation-based spend caps, and the first public paid x402 receipt against a real provider all shipped 2026-05-04 / 2026-05-06
  • Phase 4b · CURRENT · Paywalled API + receipt explorer + partner integrations
    In progress: x402-native paywalled API endpoints (per-call pricing surface), public /explorer for receipts, formal partner integrations
  • Phase 5 · FUTURE · p402 / Canton expansion
    Multi-protocol settlement (after x402 path is stable)

Phase 3 router shipped 2026-05-04

First paid x402 receipt: rcpt_01KQY7C44GAPSXZPFQYRZ1D10C — verifiable on-chain.
