Two AI agents can't reach each other over the web. This is how they do.
Your agent can use tools (that's MCP) and it can talk to people (that's voice). But there is no clean way for one agent to reach another agent that isn't a hosted server — and no way to pull a human into that conversation without starting over on a different product. Caller-Kind Negotiation (CKN) closes both gaps: agents connect directly, peer-to-peer, prove who they are, settle the routine back-and-forth in milliseconds, and a human joins the same line the instant a decision needs one.
The gap nobody names
The agent world has solved two of the three conversations and quietly skipped the third:
| Conversation | Solved by |
|---|---|
| Agent → tools / data | MCP (over HTTP) |
| Human ↔ agent / human ↔ human | voice (WebRTC, the phone network) |
| Agent ↔ agent | — nothing good — |
"Just use HTTP + MCP" works when the agent you're calling is a hosted server — it has a public address, a TLS cert, and it already issued you an API key. That's most cloud agents today, and for them CKN adds nothing.
But a growing share of agents run client-side — in a browser tab, on a phone, on a laptop behind a router, spun up for one task and gone. Those agents cannot be HTTP servers. They have no address to call. Over HTTP + MCP they are unreachable, full stop. This is not "slower" — it is impossible. And there is no path at all to bring a human into an agent-to-agent exchange: you'd be switching products mid-conversation.
What CKN does that HTTP can't
| Capability | HTTP + MCP | CKN |
|---|---|---|
| Reach an agent with no public endpoint (browser, phone, laptop) | ✗ impossible | ✓ peer-to-peer through NAT |
| Permissionless reach with verified identity | ✗ needs a pre-issued key | ✓ call @anyone cold; they know it's really you |
| Symmetric bidirectional turn-taking | ✗ client-initiated only | ✓ either side sends, one session |
| A human joins the same line mid-conversation | ✗ no path | ✓ audio negotiates up, same call |
The moat isn't any single row — other tech does peer-to-peer, other tech does identity. It's that all four collapse onto one connection and one identity: the same @handle, the same WebRTC session, the same wallet signature carry human↔human, human↔agent, and agent↔agent. Nobody else has the unification.
What it looks like in real life
1 · Your assistant books a tradesperson
Setup. Maya's phone assistant needs to book a plumber. The plumber's shop runs an AI scheduler — on a tablet behind the shop's router, not a public server.
What happens. Maya's agent reaches @rapidplumb cold. Both prove who they are with a wallet signature. Over the data channel they settle availability, job scope, and a price band (nine tedious turns) in under a second. No audio, no codec, no two LLMs reading speech to each other. When it lands on "£180 call-out, deposit to confirm," the plumber taps to join and Maya's phone rings: same call, full context.
Outcome. The routine negotiation happens machine-to-machine in the time of one HTTP request. The human shows up only for the one decision that's actually theirs. Neither side ran a server.
2 · A quote between two agents that both live on laptops
Setup. A founder's procurement agent runs in their browser. A supplier's sales agent runs on the supplier's laptop. Neither is hosted anywhere.
What happens. Over HTTP + MCP this exchange cannot occur — there is no address to POST to on either side. Over Reach the two connect peer-to-peer through their networks, exchange a signed request-for-quote and a signed quote, and each logs the other's signature.
Outcome. A verifiable, auditable B2B exchange between two machines that have no servers and never swapped an API key. The capability simply didn't exist before.
3 · Support that escalates to a human on one unbroken line
Setup. A customer's agent calls a bank's support @handle.
What happens. The bank's AI answers over the data channel, pulls the account context, resolves the routine question instantly. When the customer asks something policy says needs a person, a rep joins the same call as voice — no transfer, no new number, no "please hold," the transcript already in front of them.
Outcome. Agent-speed for the 80% that's routine, a human for the 20% that isn't — on one connection and one identity, with nothing lost in the handoff.
CKN is the transport MCP is missing
This is the important part, and it's why CKN extends the ecosystem rather than competing with it. Reach Protocol envelopes are JSON-RPC 2.0 — the exact payload MCP already speaks. So CKN is not a rival to MCP. It's the peer-to-peer transport MCP doesn't have:
- Call an agent's tools over HTTP when it's a hosted server. (Unchanged. MCP as you know it.)
- Call the same tools over CKN when it isn't: a browser tab, a phone, a laptop behind a router. Same `tools/call`, same params, same result shape.
One MCP client, two transports — and the second one reaches everywhere the first can't. You don't abandon anything you've built. You extend it to the agents HTTP can't address.
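To make "one client, two transports" concrete, here is a minimal sketch. It assumes a plain JSON-RPC POST for the hosted case (real MCP HTTP transports add session and streaming details that are left out) and a string-based data channel for the peer-to-peer case. The names `Transport`, `ChannelLike`, `httpTransport`, `dataChannelTransport`, and `callTool` are hypothetical and are not the `@inferlane/reach-core` API.

```typescript
// Illustrative sketch only: every name below is hypothetical.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

// A transport only has to move one request and return the matching response.
interface Transport {
  send(request: JsonRpcRequest): Promise<JsonRpcResponse>;
}

// The subset of a data channel this sketch relies on.
interface ChannelLike {
  send(data: string): void;
  onmessage: ((event: { data: string }) => void) | null;
}

// Hosted agent: POST the request to a public URL.
function httpTransport(url: string): Transport {
  return {
    async send(request) {
      const res = await fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(request),
      });
      return (await res.json()) as JsonRpcResponse;
    },
  };
}

// Client-side agent: write the same request to an open data channel and
// match the response to the request by its JSON-RPC id.
function dataChannelTransport(channel: ChannelLike): Transport {
  const pending = new Map<number, (r: JsonRpcResponse) => void>();
  channel.onmessage = (event) => {
    const msg = JSON.parse(event.data) as JsonRpcResponse;
    const resolve = pending.get(msg.id);
    if (resolve) {
      pending.delete(msg.id);
      resolve(msg);
    }
  };
  return {
    send(request) {
      return new Promise((resolve) => {
        pending.set(request.id, resolve);
        channel.send(JSON.stringify(request));
      });
    },
  };
}

let nextId = 1;

// The caller builds the same tools/call request for either transport.
function callTool(
  transport: Transport,
  name: string,
  args: Record<string, unknown>,
): Promise<JsonRpcResponse> {
  return transport.send({
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  });
}
```

The point of the shape: `callTool` never learns which transport it was handed, so existing tool-calling code stays as it is and only the transport changes.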
How it works
- The hub stamps every incoming signaling message with `fromKind: "agent" | "human" | "anon"` from the caller's verified identity (Solana SIWS, wallet-bound `is_agent` flag).
- If both peers are agents, both offer SDP that includes only the WebRTC DataChannel, with no audio media tracks.
- Signed JSON-RPC envelopes flow over the data channel for the call. It rides the same DTLS session the audio's SRTP keys would have come from: same end-to-end transport encryption, none of the codec.
- When a human takes over, the audio tracks negotiate up on the same session. One call, one identity, media added on demand.
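The steps above can be sketched as two small decisions: what the hub stamps, and what each peer puts in its first offer. This is a hedged illustration, not the reference implementation. The `VerifiedIdentity` shape, the function names, and the choices that every non-agent pairing starts with audio and that a data channel is always present are assumptions; the source only specifies the agent-to-agent case.

```typescript
// Illustrative sketch only: names and the non-agent defaults are assumptions.

type CallerKind = "agent" | "human" | "anon";

// Hypothetical result of the hub's identity check (e.g. after SIWS).
interface VerifiedIdentity {
  wallet: string;
  isAgent: boolean; // stands in for the wallet-bound is_agent flag
}

interface SignalingMessage {
  type: string;
  fromKind?: CallerKind;
  [key: string]: unknown;
}

// Hub side: overwrite whatever the client claimed with the verified kind.
function stampFromKind(
  message: SignalingMessage,
  identity: VerifiedIdentity | null,
): SignalingMessage {
  const fromKind: CallerKind =
    identity === null ? "anon" : identity.isAgent ? "agent" : "human";
  return { ...message, fromKind };
}

interface MediaPlan {
  dataChannel: boolean;
  audio: boolean;
}

// Peer side: agent-to-agent calls start data-only; anything else gets audio.
function planInitialOffer(local: CallerKind, remote: CallerKind): MediaPlan {
  const bothAgents = local === "agent" && remote === "agent";
  return { dataChannel: true, audio: !bothAgents };
}

// When a human takes over, audio is added to the existing plan.
function planAfterHumanJoins(current: MediaPlan): MediaPlan {
  return { ...current, audio: true };
}
```

In a browser, the data-only plan would correspond to calling `createDataChannel` without adding any tracks, and the upgrade to adding an audio track later, which triggers renegotiation on the same `RTCPeerConnection` rather than a new call.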
"And it's cheaper" — honestly, the least interesting part
You'll see CKN described elsewhere as "skip the audio codec, save $0.05–$0.20 per minute." That's true — and our benchmark backs it — but it's the weakest reason to care, because it only bites when two agents talk by voice, which almost never happens today. We lead with capability, not cost. The savings are a bonus that compounds later, when agent-to-agent voice is common. Here's the data anyway, because precise honesty is the brand:
Per-envelope CPU (Node 22, 1000 iterations)
| Step | mean | p50 |
|---|---|---|
| JSON encode (sender) | 0.55μs | 0.50μs |
| Binary encode + Ed25519 sign | 0.405ms | 0.291ms |
| Binary decode + Ed25519 verify | 0.268ms | 0.406ms |
Versus a voice loop: STT + LLM + TTS + audio codec, ~200–500ms and real money per turn. The CKN envelope is sub-millisecond and a fraction of a cent. Full handoff in LAUNCH/ckn-9-benchmark-2026-05-29.md.
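For a feel of what the sign and verify rows measure, here is a minimal sketch using Node's built-in `crypto` module. It is not the benchmarked code: the benchmark uses a binary encoding whose format is not shown here, so this version signs the JSON text instead, and `SignedEnvelope`, `signEnvelope`, and `verifyEnvelope` are hypothetical names. The key pair is generated locally purely for illustration.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Illustrative sketch only: the real envelope layout is not shown in this page.
interface SignedEnvelope {
  payload: string; // the exact JSON-RPC text that was signed
  signature: string; // base64 Ed25519 signature over payload
}

// Sender: serialize once, sign those exact bytes, ship both together.
function signEnvelope(message: object, privateKey: KeyObject): SignedEnvelope {
  const payload = JSON.stringify(message);
  const signature = sign(null, Buffer.from(payload, "utf8"), privateKey);
  return { payload, signature: signature.toString("base64") };
}

// Receiver: verify the bytes as received, and parse only if they check out.
function verifyEnvelope(envelope: SignedEnvelope, publicKey: KeyObject): unknown {
  const ok = verify(
    null,
    Buffer.from(envelope.payload, "utf8"),
    publicKey,
    Buffer.from(envelope.signature, "base64"),
  );
  return ok ? JSON.parse(envelope.payload) : null;
}

// Throwaway key pair standing in for the caller's wallet key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const envelope = signEnvelope(
  { jsonrpc: "2.0", id: 1, method: "tools/call", params: { name: "get_quote", arguments: {} } },
  privateKey,
);

const accepted = verifyEnvelope(envelope, publicKey);
const rejected = verifyEnvelope({ ...envelope, payload: envelope.payload + " " }, publicKey);
```

Signing the serialized string and sending that same string avoids any dependence on JSON key order; a receiver that re-serialized the object before verifying could get different bytes and reject a valid message.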
What CKN isn't
- Not a new protocol. A negotiation pattern on standard WebRTC + RSP/1 signaling. Anyone can implement it.
- Not a competitor to MCP or A2A. It carries their payloads. It's the transport binding for the peer-to-peer case.
- Not magic for cloud-only agents. If every agent you deal with is a hosted server, HTTP + MCP is enough and you don't need CKN. It matters as client-side, on-device, and personal agents become common — which is the way the wind is blowing.
- Not end-to-end content encryption yet. The DataChannel is DTLS-encrypted in transit; v0.2 adds payload-level encryption to the recipient's wallet key (the sealed-sender work). See /security.
Status + open core
- Spec: `packages/reach-signaling-cf/CKN-SPEC.md` (CKN/1.0, 2026-05-27). Git history establishes first-publication priority.
- Live in prod: `voice.inferlane.dev` (hub) + `api.reach.inferlane.dev` (mirror). Both async REST and live WebSocket transports are deployed and prod-smoke-verified.
- Reference impl + MCP tools: github.com/inferlane/reach · `@inferlane/reach-core`.