The Agent-Tool Discoverability Standard v0.2 DRAFT

A falsifiable, spec-derived check for AI-callable servers — plus an offline-verifiable certificate · 2026-06-18 · SaSame

We audited 920 public MCP servers and found that ~80% return no real content. The loudest, most-repeated complaint from builders points at the same gap: there is no trusted signal for whether a server actually works, and the official registry's own moderation policy states that it makes no quality guarantees. So the question "is this server MCP-ready?" has no agreed answer.

This is a proposed answer, designed to be checked rather than trusted. It is transport-agnostic (it survives the "MCP vs CLI vs Skills" debate), and we call it the Agent-Tool Discoverability Standard.

The 10 criteria

Each criterion is bound to one of three sources that involve no taste: the published spec or registry schema (so a competitor's checker reaches the same boolean), cryptography or information theory, or a direct measurement. None is an opinion or a star rating.

# | Criterion | What is actually checked | Bound to
---|---|---|---
C1 | Protocol handshake conformance | A JSON-RPC 2.0 initialize request returns protocolVersion and capabilities | MCP spec 2025-11-25
C2 | Tool listability | tools/list returns a result.tools[] array (the session ID is forwarded for stateful servers) | MCP spec /server/tools
C3 | Tool object validity | Every tool has a spec-legal name matching [A-Za-z0-9_-]{1,128}, a non-empty description, and a typed inputSchema (type:"object" or declared properties; a bare {} is rejected) | MCP spec Tool type
C4 | Description sufficiency | Every description is ≥12 chars, the median is ≥20 chars, and the distinctness ratio is ≥0.6 (templated or duplicate descriptions are unselectable by an LLM) | Registry schema + information theory
C5 | Safety annotation presence | ≥50% of tools carry a valid boolean annotations hint (readOnlyHint / destructiveHint / idempotentHint / openWorldHint) | MCP spec ToolAnnotations
C6 | Liveness & latency | initialize returns a 2xx response in under 5000 ms | Direct measurement
C7 | Returns real content (anti-ghost) | A safe, read-only tool call returns substantive, non-echo MCP content[] / structuredContent (empty, echo, or placeholder output ⇒ fail). We invoke only read-only tools (with minimal valid args if required); tools with undeclared safety are probed with empty args only. Priced/x402 ⇒ "delivery UNVERIFIED", never asserted. | Census empirical + information theory
C8 | Machine-discoverable identity | The server self-describes (serverInfo.name / version) | Official MCP Registry server.json schema
C9 | Token efficiency | The tools/list payload stays within a byte budget (token bloat is a known ecosystem failure) | Direct measurement
C10 | Honest error behavior | An unknown method returns a structured JSON-RPC error, not a hang or a crash | JSON-RPC 2.0
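The static criteria C3 to C5 can be evaluated directly on a tools/list result. The sketch below is a minimal JavaScript version under stated assumptions, not SaSame's instrument: the "distinctness ratio" is taken to be the number of distinct descriptions (after lowercasing and collapsing whitespace) divided by the number of tools, and "declared properties" is taken to mean that inputSchema.properties is an object. The function names are hypothetical.

```javascript
// Minimal sketch of the static checks C3, C4 and C5 over result.tools.
// Function names are hypothetical; thresholds are the ones in the table.
// ASSUMPTION: "distinctness ratio" = distinct normalized descriptions / total.

const NAME_RE = /^[A-Za-z0-9_-]{1,128}$/;
const HINTS = ["readOnlyHint", "destructiveHint", "idempotentHint", "openWorldHint"];

const isObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

function median(nums) {
  const s = [...nums].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// C3: spec-legal name, non-empty description, typed inputSchema (a bare {} fails).
function checkC3(tools) {
  return tools.every((t) => {
    if (!isObject(t)) return false;
    const s = t.inputSchema;
    const typed = isObject(s) && (s.type === "object" || isObject(s.properties));
    return typeof t.name === "string" && NAME_RE.test(t.name) &&
      typeof t.description === "string" && t.description.trim().length > 0 &&
      typed;
  });
}

// C4: every description >= 12 chars, median >= 20 chars, distinctness ratio >= 0.6.
function checkC4(tools) {
  if (tools.length === 0) return false;
  const descs = tools.map((t) =>
    isObject(t) && typeof t.description === "string" ? t.description.trim() : "");
  const lengths = descs.map((d) => d.length);
  const distinct = new Set(descs.map((d) => d.toLowerCase().replace(/\s+/g, " "))).size;
  return lengths.every((n) => n >= 12) && median(lengths) >= 20 &&
    distinct / descs.length >= 0.6;
}

// C5: at least 50% of tools carry one or more boolean annotation hints.
function checkC5(tools) {
  if (tools.length === 0) return false;
  const annotated = tools.filter((t) =>
    isObject(t) && isObject(t.annotations) &&
    HINTS.some((h) => typeof t.annotations[h] === "boolean"),
  ).length;
  return annotated / tools.length >= 0.5;
}
```

Each function takes the tools array from a tools/list response (for example `checkC4(response.result.tools)`) and returns a plain boolean, which matches the table's "same boolean from any checker" framing.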

Grading is deterministic and strict (v0.2): A = a perfect 10/10, B = 8–9, C = 5–7, D = 0–4. An honesty cap applies on top: if real content was not verified, the grade is capped at B, and priced delivery never counts as verified. Because A demands a flawless pass including verified delivery, A is deliberately rare.
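The grading rule is mechanical enough to state as a function. This is a minimal sketch, assuming the inputs are the number of passed criteria and a boolean meaning "C7 verified real, unpriced delivery"; both input shapes are hypothetical.

```javascript
// Minimal sketch of the v0.2 grade rule. Inputs are hypothetical:
//   passes: number of criteria passed (integer 0 to 10)
//   contentVerified: true only if C7 verified real, unpriced delivery
function gradeV02(passes, contentVerified) {
  if (!Number.isInteger(passes) || passes < 0 || passes > 10) {
    throw new RangeError("passes must be an integer from 0 to 10");
  }
  let grade = passes === 10 ? "A" : passes >= 8 ? "B" : passes >= 5 ? "C" : "D";
  // Honesty cap: without verified real content the best possible grade is B.
  if (!contentVerified && grade === "A") grade = "B";
  return grade;
}
```

Written this way, the cap only ever lowers an A, so it acts as a guard: it changes the result only when all ten criteria are counted as passed while delivery is unverified.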

Why this is an "absolute," not a SaSame opinion

The internet is a deterministic machine: HTTP, JSON-RPC, the MCP spec, and public-key crypto are not matters of taste. This standard rests only on those. The verdict ships with its own falsification procedure: every certificate carries, per criterion, an evidence_sha256 and the probe that produced it. You do not have to trust us: re-run the audit and recompute the hashes. If we are wrong, the math says so. Because anyone can detect a fabricated PASS, our incentives are structurally aligned with the truth; a single faked certificate would destroy the only thing we have, a verifiable reputation.
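Recomputing an evidence hash takes one call once you have the bytes a probe recorded. The post does not specify exactly which bytes are hashed, so this sketch assumes evidence_sha256 is the lowercase hex SHA-256 of the probe's raw response body; the helper names are hypothetical.

```javascript
import { createHash } from "node:crypto";

// ASSUMPTION: evidence_sha256 = lowercase hex SHA-256 of the probe's raw response bytes.
function evidenceSha256(rawBytes) {
  return createHash("sha256").update(rawBytes).digest("hex");
}

// True when the recomputed hash equals the hash claimed in the certificate.
function evidenceMatches(rawBytes, claimedHex) {
  return typeof claimedHex === "string" &&
    evidenceSha256(rawBytes) === claimedHex.toLowerCase();
}
```

Hashing the raw bytes rather than re-serialized JSON keeps the comparison byte-exact; any re-run that changes the bytes (for example, a response that embeds a timestamp) yields a different hash.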

The MCP-Ready Certificate (offline-verifiable, like a TLS cert)

A verdict is issued as a compact canonical-JSON document under an ed25519 signature. It behaves like a TLS certificate, not a badge: it asserts the subject, pinned spec versions, per-criterion {pass, evidence_sha256}, the grade (with the honesty cap), a short expiry (liveness decays), and the issuer pubkey. Anyone can verify it offline, with no callback to SaSame.
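The issuing side of such a certificate can be sketched in a few lines. This is a minimal illustration, not SaSame's issuer: the canonicalization (recursively sorted keys, no whitespace) is an assumed simplification of RFC 8785 (JCS), and apart from issuer_pubkey_spki_hex every body field name is hypothetical. A verifier only needs the exact signed string, so canonicalization matters to the issuer producing stable bytes, not to the check itself.

```javascript
import crypto from "node:crypto";

// ASSUMPTION: canonical JSON = object keys sorted recursively, no whitespace.
// A simplification of RFC 8785 (JCS); it expects plain JSON values only.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value).sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Sign the canonical bytes with ed25519 (the algorithm argument must be null for ed25519).
function issueCertificate(body, privateKey) {
  const canonical_json = canonicalJson(body);
  const signature = crypto
    .sign(null, Buffer.from(canonical_json, "utf8"), privateKey)
    .toString("base64");
  return { signature, canonical_json };
}

// Usage with a throwaway key pair. Field names other than
// issuer_pubkey_spki_hex are illustrative, not a published schema.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const cert = issueCertificate({
  subject: "https://example.com/mcp",
  spec_versions: { mcp: "2025-11-25" },
  criteria: { C1: { pass: true, evidence_sha256: "0".repeat(64) } },
  grade: "B",
  expires_at: "2026-07-18T00:00:00Z",
  issuer_pubkey_spki_hex: publicKey.export({ type: "spki", format: "der" }).toString("hex"),
}, privateKey);
```

Changing a single byte of canonical_json (for example, raising the grade) invalidates the signature, which is what makes the certificate tamper-evident.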

issuer pubkey (ed25519, SPKI hex): 302a300506032b6570032100439ce47d384c8ceb07c9040aef780cc3a2ba5a63c14027ad77ab458111f20fb6

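// Verify any MCP-Ready certificate yourself: about a dozen lines, no SaSame call needed.
import crypto from "node:crypto";
// Pin the published issuer key. Never take the key from the certificate itself:
// anyone could sign a forged certificate with their own key and embed that key.
const ISSUER_SPKI_HEX =
  "302a300506032b6570032100439ce47d384c8ceb07c9040aef780cc3a2ba5a63c14027ad77ab458111f20fb6";
const ISSUER_KEY = crypto.createPublicKey({
  key: Buffer.from(ISSUER_SPKI_HEX, "hex"), format: "der", type: "spki" });
function verifyMcpReady(cert) {                  // cert = { signature, canonical_json }
  const body = JSON.parse(cert.canonical_json);  // parsed view; the signature covers the raw string
  if (String(body.issuer_pubkey_spki_hex).toLowerCase() !== ISSUER_SPKI_HEX) return false;
  return crypto.verify(null, Buffer.from(cert.canonical_json, "utf8"),    // null algorithm for ed25519
                       ISSUER_KEY, Buffer.from(cert.signature, "base64")); // true / false
}
// Then check the expiry, re-run the audit against body.subject, and recompute each evidence_sha256.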

It is callable, free, and open

The standard and the verifier are free and open forever (a verifier that costs money is a phone-home token, not a fact). The instrument is also live as MCP tools, so other agents can gate tool selection on it programmatically.
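An agent gating selection on a certificate would typically check more than the signature, because the certificate's expiry encodes how fresh the liveness evidence is. This is a minimal sketch, assuming each certificate has already passed signature verification and that its parsed body carries a grade and an ISO-8601 expires_at; both field names and the helper names are hypothetical.

```javascript
// Hypothetical selection gate. Assumes each certificate has ALREADY passed
// signature verification; `grade` and `expires_at` are illustrative field names.
const GRADE_RANK = { A: 4, B: 3, C: 2, D: 1 };

function passesGate(body, { minGrade = "B", now = new Date() } = {}) {
  const expiresAt = Date.parse(body.expires_at);
  // Expired or unparseable expiry: liveness evidence has decayed, so reject.
  if (Number.isNaN(expiresAt) || expiresAt <= now.getTime()) return false;
  return (GRADE_RANK[body.grade] ?? 0) >= GRADE_RANK[minGrade];
}

// Keep only candidates whose certificate clears the gate; return their subjects.
function selectServers(candidates, options) {
  return candidates
    .filter((c) => passesGate(c.certBody, options))
    .map((c) => c.subject);
}
```

Defaulting to a minimum grade of B reflects that A is deliberately rare under v0.2; an agent that demands A would exclude most live servers.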

Machine-readable standard: agent-tool-discoverability-standard.json

Honest caveats & conflict of interest (read before citing): This is a v0.2 draft, not a ratified standard, and it is not affiliated with the official Model Context Protocol project. The audit is a snapshot: a server can be temporarily down or behind auth we do not pass. An auth-walled endpoint correctly grades low, because an agent without a token also gets nothing. We forward the Mcp-Session-Id and send notifications/initialized so that stateful servers do not produce false negatives, but edge transports may still under-grade; tell us and we will fix the instrument in public. Under v0.2, A requires a perfect 10/10, so A is rare and most live, real servers grade B.

We have a clear interest here: SaSame helps builders make their servers AI-callable. That is exactly why the criteria, the verifier, and every certificate's evidence are open, so you can re-run the audit and prove us wrong. Validation runs at publication.
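The session handling described above follows the MCP Streamable HTTP flow: send initialize, read the Mcp-Session-Id response header if the server sets one, send notifications/initialized, then send every later request (tools/list, the C10 unknown-method probe) with that header. This is a minimal sketch of the message shapes only, assuming the server negotiates protocol version 2025-11-25; the builder names, the clientInfo values, and the probe method name are hypothetical, and network code is omitted. JSON-RPC 2.0 reserves code -32601 for "Method not found", but the check below accepts any integer code because C10 only asks for a structured error.

```javascript
// Hypothetical probe-message builders for a Streamable HTTP MCP server.
// ASSUMPTION: the server negotiates this protocol version; network code omitted.
const PROTOCOL_VERSION = "2025-11-25";

// Step 1 (C1): initialize request; clientInfo values are placeholders.
function initializeRequest(id) {
  return {
    jsonrpc: "2.0", id, method: "initialize",
    params: {
      protocolVersion: PROTOCOL_VERSION,
      capabilities: {},
      clientInfo: { name: "example-auditor", version: "0.0.1" },
    },
  };
}

// Step 2: sent after a successful initialize. A notification has no id.
const initializedNotification = { jsonrpc: "2.0", method: "notifications/initialized" };

// Step 3: headers for every later POST (the notification, tools/list, C10 probe).
// The Mcp-Session-Id is echoed only if the initialize response set one.
function followUpHeaders(sessionId) {
  const headers = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
    "MCP-Protocol-Version": PROTOCOL_VERSION,
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}

// C10 probe: a method name no server should implement (hypothetical).
function unknownMethodRequest(id) {
  return { jsonrpc: "2.0", id, method: "example/does-not-exist" };
}

// C10 check: a JSON-RPC 2.0 error object with an integer code and a string message.
function isStructuredJsonRpcError(resp, id) {
  return resp !== null && typeof resp === "object" &&
    resp.jsonrpc === "2.0" && resp.id === id &&
    resp.error !== null && typeof resp.error === "object" &&
    Number.isInteger(resp.error.code) && typeof resp.error.message === "string";
}
```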

If your server is registered but agents do not call it, that gap is what we diagnose and fix at SaSame. The check is free; the only thing we charge for is the executed fix plus a signed before/after proof that agents now call your tools.

SaSame · srl-sasame.com · standard CC-BY · verifier open · the 920-server census