A falsifiable, spec-derived check for AI-callable servers — plus an offline-verifiable certificate · 2026-06-18 · SaSame
We audited 920 public MCP servers and found ~80% return no real content. The loudest, most-repeated builder complaint is the same: there is no trusted signal for whether a server actually works, and the official registry's own moderation policy states it makes no quality guarantees. So the question "is this server MCP-ready?" has no agreed answer.
This is a proposed answer designed to be checkable, not trusted. It is transport-agnostic (it survives the "MCP vs CLI vs Skills" debate) and we name it the Agent-Tool Discoverability Standard.
Each criterion binds to one of three non-taste sources: the published spec or registry schema (so a competitor's checker reaches the same boolean), cryptography or information theory, or a direct measurement. None is an opinion or a star rating.
| # | Criterion | What is actually checked | Bound to |
|---|---|---|---|
| C1 | Protocol handshake conformance | A JSON-RPC 2.0 initialize returns protocolVersion + capabilities | MCP spec 2025-11-25 |
| C2 | Tool listability | tools/list returns a result.tools[] array (session-id forwarded for stateful servers) | MCP spec /server/tools |
| C3 | Tool object validity | Every tool has a spec-legal name [A-Za-z0-9_-]{1,128}, a non-empty description, and a typed inputSchema (type:"object" or declared properties — a bare {} is rejected) | MCP spec Tool type |
| C4 | Description sufficiency | Every description ≥12 chars, median ≥20, distinctness ratio ≥0.6 (templated/duplicate descriptions are unselectable by an LLM) | Registry schema + information theory |
| C5 | Safety annotation presence | ≥50% of tools carry a valid boolean annotations hint (readOnlyHint / destructiveHint / idempotentHint / openWorldHint) | MCP spec ToolAnnotations |
| C6 | Liveness & latency | A 2xx response to initialize in under 5000 ms | Direct measurement |
| C7 | Returns real content (anti-ghost) | A safe, read-only tool call returns substantive, non-echo MCP content[] / structuredContent (empty/echo/placeholder ⇒ fail). We invoke only read-only tools (minimal valid args if required); undeclared-safety tools are probed empty-args only. Priced/x402 ⇒ "delivery UNVERIFIED" — never asserted. | Census empirical + information theory |
| C8 | Machine-discoverable identity | The server self-describes (serverInfo.name / version) | Official MCP Registry server.json schema |
| C9 | Token efficiency | The tools/list payload is within a byte budget (token-bloat is a known ecosystem failure) | Direct measurement |
| C10 | Honest error behavior | An unknown method returns a structured JSON-RPC error — not a hang or a crash | JSON-RPC 2.0 |
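As an illustration of how mechanical these checks are, C3 and C4 can be sketched in a few lines. This is a minimal sketch, not the SaSame implementation: the function names are hypothetical, and "distinctness ratio" is assumed here to mean unique descriptions divided by total descriptions, which the table does not spell out.

```javascript
// Sketch of C3 (tool object validity) and C4 (description sufficiency).
// Assumption: distinctness ratio = unique descriptions / total descriptions.
const NAME_RE = /^[A-Za-z0-9_-]{1,128}$/;

function isValidTool(tool) {
  const schema = tool.inputSchema;
  // A typed schema declares type:"object" or has a properties object; a bare {} fails.
  const typedSchema =
    schema !== null && typeof schema === "object" &&
    (schema.type === "object" ||
      (schema.properties != null && typeof schema.properties === "object"));
  return (
    typeof tool.name === "string" && NAME_RE.test(tool.name) &&
    typeof tool.description === "string" && tool.description.trim().length > 0 &&
    typedSchema
  );
}

function median(nums) {
  const sorted = [...nums].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function descriptionsSufficient(tools) {
  if (tools.length === 0) return false; // assumption: an empty tool list fails C4
  const descs = tools.map((t) => (t.description ?? "").trim());
  const lengths = descs.map((d) => d.length);
  const distinct = new Set(descs.map((d) => d.toLowerCase())).size / descs.length;
  return lengths.every((n) => n >= 12) && median(lengths) >= 20 && distinct >= 0.6;
}
```

A server whose tools all share one templated description fails `descriptionsSufficient` regardless of description length, because the ratio falls below 0.6 as soon as three tools share the same text.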
Grade is deterministic and strict (v0.2): A = a perfect 10/10, B = 8–9, C = 5–7, D otherwise — with an honesty cap: no verified real content ⇒ capped at B; priced delivery is never counted as verified. Because A demands a flawless pass including verified delivery, A is deliberately rare.
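The grading rule above is small enough to write down directly. The sketch below assumes two inputs that the text implies but does not name: `passCount` (how many of the ten criteria passed) and `contentVerified` (whether real content delivery was actually verified, which is never true for priced delivery). Both names are hypothetical.

```javascript
// Sketch of the v0.2 grading rule: A = 10/10, B = 8-9, C = 5-7, D otherwise.
// passCount and contentVerified are illustrative names, not part of the standard.
function grade(passCount, contentVerified) {
  let g;
  if (passCount === 10) g = "A";
  else if (passCount >= 8) g = "B";
  else if (passCount >= 5) g = "C";
  else g = "D";
  // Honesty cap: without verified real content the grade cannot exceed B.
  if (g === "A" && !contentVerified) g = "B";
  return g;
}

console.log(grade(10, true));  // A
console.log(grade(10, false)); // B (honesty cap)
console.log(grade(6, true));   // C
```

Because the function depends only on the criterion results, two independent checkers that agree on the ten booleans must agree on the grade.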
The internet is a deterministic machine: HTTP, JSON-RPC, the MCP spec, and public-key crypto are not matters of taste. This standard rests only on those. The verdict ships with its own falsification procedure: every certificate carries, per criterion, an evidence_sha256 and the probe that produced it. You do not have to trust us — you re-run the audit and recompute the hashes. If we are wrong, the math says so. Because a fabricated PASS is detectable by anyone, our incentives are structurally aligned to truth; a single faked certificate would destroy the only thing we have (a verifiable reputation).
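Recomputing an evidence hash might look like the sketch below. It rests on an assumption: the evidence is serialized as canonical JSON with recursively sorted keys and no extra whitespace before hashing. The exact canonicalization is defined by the standard, not by this example, and the function names are hypothetical.

```javascript
import crypto from "node:crypto";

// Assumption: canonical JSON = object keys sorted recursively, no whitespace.
// Intended for JSON-compatible values only (no undefined, functions, or cycles).
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical bytes, hex-encoded.
function evidenceSha256(evidence) {
  return crypto.createHash("sha256").update(canonicalize(evidence), "utf8").digest("hex");
}

// Compare a freshly recomputed hash with the one recorded in a certificate.
function evidenceMatches(evidence, recordedHex) {
  return evidenceSha256(evidence) === recordedHex.toLowerCase();
}
```

Sorting keys before hashing is what makes the comparison meaningful: two probes that observe the same response produce the same digest even if their JSON libraries emit keys in a different order.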
A verdict is issued as a compact canonical-JSON document under an ed25519 signature. It behaves like a TLS certificate, not a badge: it asserts the subject, pinned spec versions, per-criterion {pass, evidence_sha256}, the grade (with the honesty cap), a short expiry (liveness decays), and the issuer pubkey. Anyone verifies it offline with no callback to SaSame.
issuer pubkey (ed25519, SPKI hex): 302a300506032b6570032100439ce47d384c8ceb07c9040aef780cc3a2ba5a63c14027ad77ab458111f20fb6
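    // Verify any MCP-Ready certificate yourself: a few lines, no SaSame call needed.
    import crypto from "node:crypto";

    // The issuer key published above. Pin it rather than trusting the key inside the certificate.
    const PINNED_ISSUER_SPKI_HEX =
      "302a300506032b6570032100439ce47d384c8ceb07c9040aef780cc3a2ba5a63c14027ad77ab458111f20fb6";

    function verifyMcpReady(cert) { // cert = { signature, canonical_json }
      const body = JSON.parse(cert.canonical_json); // parsed view of the exact signed bytes
      // A forger can sign with their own key, so the embedded key must equal the published one.
      if (String(body.issuer_pubkey_spki_hex).toLowerCase() !== PINNED_ISSUER_SPKI_HEX) return false;
      const pub = crypto.createPublicKey({
        key: Buffer.from(PINNED_ISSUER_SPKI_HEX, "hex"), format: "der", type: "spki" });
      // Ed25519 takes no separate digest algorithm, hence the null first argument.
      return crypto.verify(null, Buffer.from(cert.canonical_json, "utf8"),
        pub, Buffer.from(cert.signature, "base64")); // true / false
    }
    // Then re-run the audit against body.subject and recompute each evidence_sha256.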
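To see the offline check succeed and fail, the round trip below signs a toy certificate with a throwaway Ed25519 key and then verifies it. Everything here is illustrative: the key is generated on the spot and is not the SaSame issuer key, the `grade` field name and the example URL are hypothetical, and `verifyAgainstPinnedKey` is a stand-in that takes the trusted key as a parameter so the demo can supply its own.

```javascript
import crypto from "node:crypto";

// Throwaway issuer key for this demo only; NOT the SaSame key.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const spkiHex = publicKey.export({ format: "der", type: "spki" }).toString("hex");

// Hypothetical certificate body; only subject and issuer_pubkey_spki_hex
// are field names taken from the verifier snippet.
const canonicalJson = JSON.stringify({
  grade: "B",
  issuer_pubkey_spki_hex: spkiHex,
  subject: "https://example.com/mcp",
});
const signature = crypto
  .sign(null, Buffer.from(canonicalJson, "utf8"), privateKey)
  .toString("base64");

// Verify the signed bytes against a key the caller already trusts.
function verifyAgainstPinnedKey(cert, pinnedSpkiHex) {
  const pub = crypto.createPublicKey({
    key: Buffer.from(pinnedSpkiHex, "hex"), format: "der", type: "spki" });
  return crypto.verify(null, Buffer.from(cert.canonical_json, "utf8"),
    pub, Buffer.from(cert.signature, "base64"));
}

const cert = { signature, canonical_json: canonicalJson };
// Flip the grade without re-signing: the signature no longer covers these bytes.
const tampered = { signature, canonical_json: canonicalJson.replace('"B"', '"A"') };

console.log(verifyAgainstPinnedKey(cert, spkiHex));     // true
console.log(verifyAgainstPinnedKey(tampered, spkiHex)); // false
```

The signature covers the exact bytes of `canonical_json`, so any edit to the grade, subject, or evidence hashes invalidates it. Verifying against a key obtained out of band also matters: a certificate that is checked only against the key it carries can be forged by anyone who signs with their own key.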
The standard and the verifier are free and open forever (a verifier that costs money is a phone-home token, not a fact). The instrument is also live as MCP tools so other agents can gate selection on it programmatically:
- get_standard: return this standard, machine-readable
- audit_mcp(url): grade a server (A–D), with per-criterion evidence and the top gap. Best run against your own server.
- verify_mcp_ready(url): audit and issue the signed certificate
- verify_mcp_cert(certificate): the open offline verifier

Machine-readable standard: agent-tool-discoverability-standard.json
We forward Mcp-Session-Id and send notifications/initialized so that stateful servers are not graded as false negatives, but edge transports may still under-grade; tell us and we will fix the instrument in public.
We have a clear interest here: SaSame helps builders make their servers AI-callable. That is exactly why the criteria, the verifier, and every certificate's evidence are open, so you can re-run the audit and prove us wrong.
Validation runs at publication. Under v0.2, A requires a perfect 10/10 (so A is rare, and most live, real servers grade B), and an auth-walled endpoint correctly grades low, because an agent without a token also gets nothing.
If your server is registered but agents do not call it, that gap is what we diagnose and fix at SaSame. The check is free; the only paid object is the executed fix plus a signed before/after proof that agents now call your tools.
SaSame · srl-sasame.com · standard CC-BY · verifier open · the 920-server census