MCP Server Billing · Technical Blueprint

Monetize the Context Layer:
The MCP Server Blueprint

MCP runs three transports: HTTP, SSE, and stdio. Standard API gateways see HTTP — and nothing else. Every SSE stream and every stdio tunnel passes through completely invisible, unmetered, and unbilled.

Aforo closes the gap with a dual interception strategy: a Kong Lua plugin for HTTP transport and a one-line SDK decorator for stdio and SSE. Every tools/call is metered — agent ID, tool name, execution duration, token metadata — then routed through Redis enforcement before the response is returned.

MCP Protocol / Tool Metering

JSON-RPC Request: tools/call
{
  "method": "tools/call",
  "params": {
    "name": "db_search",
    "arguments": {
      "query": "revenue Q1"
    }
  }
}
agent: claude-3.5 · session: s_4f2a

Billing Event: METERED
tool: db_search
agent_id: claude-3.5
duration: 142ms
tokens_in: 384
tokens_out: 1,206
unit_cost: $0.005
Billed: $0.005
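To make the request-to-billing-event step concrete, here is a minimal sketch of how a metered tools/call could be priced. The names `toBillingEvent`, `CallContext`, and the flat `PRICE_PER_CALL` table are hypothetical illustrations, not Aforo's actual API or rate-plan model.

```typescript
// Hypothetical shapes for illustration only; not Aforo's real schema.
interface ToolCall {
  method: string;
  params?: { name?: string; arguments?: unknown };
}

interface CallContext {
  agentId: string;
  durationMs: number;
  tokensIn: number;
  tokensOut: number;
}

interface BillingEvent extends CallContext {
  tool: string;
  unitCost: number; // USD per call
}

// Assumed flat per-call price table. A Map avoids accidental hits on
// inherited object keys such as "toString".
const PRICE_PER_CALL = new Map<string, number>([["db_search", 0.005]]);

// Returns a billing event for a priced tools/call, or null when the
// message is not a tool call or the tool has no price.
function toBillingEvent(call: ToolCall, ctx: CallContext): BillingEvent | null {
  if (call.method !== "tools/call") return null;
  const tool = call.params?.name;
  if (typeof tool !== "string") return null;
  const unitCost = PRICE_PER_CALL.get(tool);
  if (unitCost === undefined) return null;
  return { tool, unitCost, ...ctx };
}
```

Calling it with the db_search request shown above and the same agent, duration, and token figures would yield an event with a unit cost of 0.005.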
The Protocol Blindspot

Your Gateway Is Watching
One of Three Transports

MCP defines three wire protocols. Enterprise gateways only intercept HTTP. The other two carry real workloads — unbilled.

Gateway Visible

HTTP Transport

Standard JSON-RPC over HTTP POST. Kong, Apigee, and AWS API Gateway can see every tools/call request, so quotas are enforceable and usage is trackable.

POST /mcp  ←  gateway intercepts
Content-Type: application/json
{"method":"tools/call","params":{...}}
Gateway Blind

SSE Transport

Server-Sent Events over a persistent connection. The gateway sees one long-lived HTTP GET — not individual JSON-RPC frames. Tool calls are invisible.

GET /mcp/sse  ←  one open socket

data: {"method":"tools/call",...}  ✗
data: {"method":"tools/call",...}  ✗
data: {"method":"tools/call",...}  ✗
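Seeing the frames marked ✗ above requires parsing the stream body itself, which a request-level gateway does not do. The sketch below shows one way to count tools/call frames in a raw SSE body; `countToolCalls` is a hypothetical helper, and it assumes each event carries one JSON-RPC message in its `data:` lines.

```typescript
// Counts JSON-RPC "tools/call" messages inside a raw SSE stream body.
// Hypothetical helper for illustration; not part of any gateway or SDK.
function countToolCalls(sseBody: string): number {
  let count = 0;
  // SSE events are separated by a blank line.
  for (const event of sseBody.split(/\r?\n\r?\n/)) {
    // Payload lines start with "data:"; comment lines start with ":".
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data === "") continue;
    try {
      const msg: unknown = JSON.parse(data);
      if (
        typeof msg === "object" &&
        msg !== null &&
        (msg as { method?: unknown }).method === "tools/call"
      ) {
        count += 1;
      }
    } catch {
      // Not JSON: ignore the frame rather than fail the stream.
    }
  }
  return count;
}
```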
Completely Opaque

stdio Transport

Process-level communication. MCP clients spawn a local server subprocess and communicate via stdin/stdout. No network hop — zero gateway coverage.

# spawned subprocess:
node server.js | claude-desktop

# gateway never sees:
tools/call → web_search
tools/call → analyze_data
tools/call → query_db
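The stdio transport frames messages as newline-delimited JSON on stdin/stdout, so any metering has to happen inside the process that reads those pipes. As a rough sketch of what that reader looks like, the hypothetical `LineFramer` class below reassembles complete JSON-RPC messages from arbitrary chunks; it is an illustration, not code from the MCP SDK or from Aforo.

```typescript
// Reassembles newline-delimited JSON messages from arbitrary stdin chunks.
// Hypothetical helper for illustration only.
class LineFramer {
  private buffer = "";

  // Feed a chunk; returns every complete message it finished.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece is an incomplete line (or ""), so keep it buffered.
    this.buffer = lines.pop() ?? "";
    const messages: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        messages.push(JSON.parse(trimmed));
      } catch {
        // Skip malformed lines instead of breaking the session.
      }
    }
    return messages;
  }
}
```

A metering hook would inspect each returned message for `method === "tools/call"` before handing it to the tool dispatcher.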
In most MCP deployments today, two of three transports (67%) are unmetered
Claude Desktop and Claude Code both default to stdio transport. The fastest-growing agentic surface area has never touched your billing system. Aforo's dual interception closes all three transports simultaneously.
Dual Interception Architecture

Two Hooks.
All Three Transports Covered.

For HTTP MCP servers, Aforo deploys a Kong Lua plugin that intercepts at the post-response log phase — zero latency added to inference. For stdio and SSE, a one-line SDK decorator wraps each tool handler.

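handler.lua (Kong plugin) · HTTP Transport
-- Kong plugin handler: meters MCP tools/call requests.
-- Metering runs in the log phase, AFTER the response is sent,
-- so it adds zero latency to the inference hot path.

local cjson = require "cjson.safe"  -- decode returns nil instead of raising

-- Assumed helper module (not shown; the path is a placeholder): buffers
-- events in a lua_shared_dict and flushes them asynchronously
-- to POST /v1/ingest/batch.
local aforo_buffer = require "kong.plugins.aforo-metering.buffer"

-- PRIORITY and VERSION values are placeholders.
local AforoMeteringHandler = { PRIORITY = 10, VERSION = "0.1.0" }

local function detect_mcp_tool_call(body)
  if not body then return nil end
  local parsed = cjson.decode(body)
  if type(parsed) ~= "table" then return nil end

  if parsed.jsonrpc ~= "2.0" then return nil end
  if parsed.method ~= "tools/call" then return nil end
  if type(parsed.params) ~= "table" then return nil end

  -- get_consumer() returns nil for unauthenticated requests.
  local consumer = kong.client.get_consumer()

  return {
    tool_name  = parsed.params.name,
    agent_id   = kong.request.get_header("X-Agent-Id"),
    tenant_id  = consumer and consumer.custom_id,
    started_at = ngx.req.start_time() * 1000,  -- request start, in ms
  }
end

-- The raw body is not readable in the log phase, so capture it in access.
function AforoMeteringHandler:access()
  kong.ctx.plugin.request_body = kong.request.get_raw_body()
end

-- Wired in log phase:
function AforoMeteringHandler:log()
  local event = detect_mcp_tool_call(kong.ctx.plugin.request_body)
  if event then
    aforo_buffer:push(event)   -- non-blocking
  end
end

return AforoMeteringHandler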
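server.ts (@aforo/mcp-metering) · stdio · SSE Transport
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";
import { AforoClient } from "@aforo/mcp-metering";
import { yourToolRouter } from "./tools"; // placeholder: your own tool dispatcher

const server = new Server(
  { name: "example-server", version: "1.0.0" }, // placeholder identity
  { capabilities: { tools: {} } }
);

const billing = new AforoClient({
  tenantId:  process.env.AFORO_TENANT_ID,
  productId: process.env.AFORO_PRODUCT_ID,
  apiKey:    process.env.AFORO_API_KEY,
});

// One-line decorator. Works for stdio and SSE.
// The wrapper measures wall-clock duration automatically, records
// token metadata, flushes batches to POST /v1/ingest/batch, and
// receives the kill signal on the heartbeat ACK.
const wrapped = billing.wrapToolHandler(
  async (name: string, args: unknown) => {
    return await yourToolRouter(name, args);
  }
);

// Register as your MCP tool dispatcher. The MCP SDK passes the whole
// request, so unpack the tool name and arguments for the wrapper.
server.setRequestHandler(
  CallToolRequestSchema,
  async (request) => wrapped(request.params.name, request.params.arguments)
);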
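For readers wondering what a decorator of this kind does internally, here is a minimal sketch under stated assumptions. This `wrapToolHandler` is a standalone illustration with an injected `sink` and clock; it is not the implementation inside @aforo/mcp-metering, whose internals the page does not show.

```typescript
type ToolHandler<R> = (name: string, args: unknown) => Promise<R>;

// Hypothetical event shape for this sketch.
interface UsageEvent {
  toolName: string;
  executionDurationMs: number;
  ok: boolean;
}

type Sink = (event: UsageEvent) => void;

// Wraps a tool handler so every call emits one usage event.
// `now` is injectable so the timing can be tested deterministically.
function wrapToolHandler<R>(
  handler: ToolHandler<R>,
  sink: Sink,
  now: () => number = Date.now
): ToolHandler<R> {
  return async (name, args) => {
    const start = now();
    let ok = false;
    try {
      const result = await handler(name, args);
      ok = true;
      return result;
    } finally {
      try {
        sink({ toolName: name, executionDurationMs: now() - start, ok });
      } catch {
        // Fail open: a metering failure must never break the tool call.
      }
    }
  };
}
```

The event is emitted in `finally`, so failed tool calls are recorded too, and a throwing sink cannot change what the caller sees.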
Log Phase Execution
Kong plugin runs post-response. Inference completes first — Aforo never adds to round-trip time.
Async Batch Flush
Events buffered in lua_shared_dict and flushed in the background. No synchronous API calls mid-request.
Fail-Open by Default
If Aforo is unreachable, the decorator returns the tool result immediately. Enforcement gaps are logged, not silenced.
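The buffering and fail-open behaviour described above can be sketched in a few lines. `BatchBuffer` below is a hypothetical in-memory stand-in for the lua_shared_dict buffer; `send` stands in for the HTTP POST to /v1/ingest/batch and is synchronous here only to keep the example short.

```typescript
// In-memory batch buffer with fail-open flushing. Illustration only.
class BatchBuffer<T> {
  private items: T[] = [];
  private dropped = 0;
  private readonly maxBatch: number;
  private readonly send: (batch: T[]) => void;

  constructor(maxBatch: number, send: (batch: T[]) => void) {
    this.maxBatch = maxBatch;
    this.send = send;
  }

  // Non-blocking from the caller's view: just append, flush when full.
  push(item: T): void {
    this.items.push(item);
    if (this.items.length >= this.maxBatch) this.flush();
  }

  // In a real deployment this would also run on a background timer.
  flush(): void {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    try {
      this.send(batch);
    } catch {
      // Fail open: count the gap so it can be logged, never rethrow.
      this.dropped += batch.length;
    }
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

Dropping on failure is one possible policy; retrying or spilling to disk would trade memory and complexity for fewer enforcement gaps.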
Real-Time Enforcement

Hard Limits on
Persistent stdio Sessions

Closing a TCP connection terminates an HTTP API call. But stdio MCP servers run as long-lived subprocesses — a single session can invoke thousands of tools before the parent process exits. Standard revocation does nothing.

Aforo uses a Heartbeat Kill Signal: the SDK sends a periodic heartbeat to the ingestor, and the ingestor answers each one with an ACK. Every metered call runs through COMPARE_AND_INCREMENT in Redis; once that check finds the session's usage past the hard limit, the next heartbeat ACK carries a kill payload. The SDK receives it and terminates the stdio transport immediately.
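One plausible reading of that flow, sketched in memory: the hypothetical `SessionQuota` class stands in for the Redis script (single-threaded JavaScript gives the same no-interleaving guarantee that an atomic script gives in Redis), and `onHeartbeatAck` stands in for the SDK side. The names, the payload shape, and the exact limit semantics are assumptions.

```typescript
// Assumed ACK payload shape for this sketch.
interface HeartbeatAck {
  kill: boolean;
  used: number;
  limit: number;
}

class SessionQuota {
  private readonly used = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records `amount` units of usage, then reports whether the session
  // is still within its hard limit. Usage is recorded even past the
  // limit because the tool call has already run by the time it is metered.
  compareAndIncrement(sessionId: string, amount: number = 1): boolean {
    const next = (this.used.get(sessionId) ?? 0) + amount;
    this.used.set(sessionId, next);
    return next <= this.limit;
  }

  // Built by the ingestor when a heartbeat arrives.
  heartbeatAck(sessionId: string): HeartbeatAck {
    const used = this.used.get(sessionId) ?? 0;
    return { kill: used > this.limit, used, limit: this.limit };
  }
}

// SDK side: close the transport as soon as an ACK carries the kill flag.
function onHeartbeatAck(ack: HeartbeatAck, closeTransport: () => void): void {
  if (ack.kill) closeTransport();
}
```

Because the kill flag travels on the next ACK, calls made between the breach and that ACK still run, which is why the heartbeat interval bounds the overage.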

30s heartbeat interval
Max overage exposure per session before termination
Atomic Lua enforcement
COMPARE_AND_INCREMENT runs as an atomic Redis Lua script, so concurrent calls cannot race
L1/L2/L3 Margin Guard
Warn at 20% margin, throttle at 10%, hard block below 0%
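The three margin tiers reduce to a small threshold function. The sketch below assumes margin is computed as (revenue - COGS) / revenue and that the 20% and 10% thresholds are inclusive; `marginGuard` and its return labels are hypothetical names.

```typescript
type GuardAction = "ok" | "warn" | "throttle" | "block";

// Maps a session's revenue and cost of goods sold to a guard action.
// Thresholds follow the tiers above; inclusivity is an assumption.
function marginGuard(revenue: number, cogs: number): GuardAction {
  if (revenue <= 0) {
    // No revenue yet: any cost at all means a negative margin.
    return cogs > 0 ? "block" : "ok";
  }
  const margin = (revenue - cogs) / revenue;
  if (margin < 0) return "block";        // L3: hard block below 0%
  if (margin <= 0.1) return "throttle";  // L2: throttle at 10%
  if (margin <= 0.2) return "warn";      // L1: warn at 20%
  return "ok";
}
```

For example, $5.00 of revenue against $4.25 of COGS is a 15% margin, which lands in the warn tier.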
Telemetry Payload

Every Invocation Captured
in ClickHouse

Each tool call emits a structured event to the usage ingestor. ClickHouse materializes it into mcp_daily_stats_mv for real-time per-tool revenue analytics.

POST /v1/ingest/events
{
  "metricName": "mcp_server.tool_invocations",
  "productType": "MCP_SERVER",
  "toolName": "web_search",
  "sessionId": "sess_abc123",
  "agentId": "agent_gpt4o_prod",
  "executionDurationMs": 542,
  "metadata": {
    "inputTokens": 1200,
    "outputTokens": 450
  }
}
metricName: "mcp_server.tool_invocations"
  Billable unit key; maps to your Aforo rate plan
productType: "MCP_SERVER"
  Routes to MCP_SERVER pricing engine
toolName: "web_search"
  Per-tool pricing differentials supported
sessionId: "sess_abc123"
  Persistent session scoping for stdio quota enforcement
agentId: "agent_gpt4o_prod"
  Agent-level billing and quota allocation
executionDurationMs: 542
  Wall-clock duration, available for time-based pricing
metadata.inputTokens: 1200
  Token metadata for compound M:N pricing models
metadata.outputTokens: 450
  Output token charge surface (separate rate line)
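On the client side, the payload above can be given a type and checked before it is sent. The interface below mirrors the fields in the example; the `isMcpUsageEvent` guard and its validation rules (non-negative finite numbers, fixed product type) are assumptions for illustration, not the ingestor's documented contract.

```typescript
// Mirrors the example payload; field rules are assumed, not documented.
interface McpUsageEvent {
  metricName: string;
  productType: "MCP_SERVER";
  toolName: string;
  sessionId: string;
  agentId: string;
  executionDurationMs: number;
  metadata: { inputTokens: number; outputTokens: number };
}

function isNonNegativeNumber(n: unknown): boolean {
  return typeof n === "number" && Number.isFinite(n) && n >= 0;
}

// Type guard: true only when every field in the example is present
// and has the expected primitive type.
function isMcpUsageEvent(value: unknown): value is McpUsageEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const meta = v["metadata"];
  if (typeof meta !== "object" || meta === null) return false;
  const m = meta as Record<string, unknown>;
  return (
    typeof v["metricName"] === "string" &&
    v["productType"] === "MCP_SERVER" &&
    typeof v["toolName"] === "string" &&
    typeof v["sessionId"] === "string" &&
    typeof v["agentId"] === "string" &&
    isNonNegativeNumber(v["executionDurationMs"]) &&
    isNonNegativeNumber(m["inputTokens"]) &&
    isNonNegativeNumber(m["outputTokens"])
  );
}
```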
ClickHouse Materialized Views
Events aggregate into mcp_daily_stats_mv and mcp_session_summary_mv — per-tool revenue, per-agent COGS, and session-level margin available in real time.
by-tool revenue: /analytics/mcp/revenue/by-tool
by-agent trend: /analytics/mcp/by-agent
session summary: /analytics/mcp/summary
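As an in-memory analogue of what a per-tool daily rollup computes, the sketch below groups invocations by day and tool. It is TypeScript rather than ClickHouse SQL, the `ToolInvocation` shape and `aggregateByToolAndDay` name are hypothetical, and prices are held as integer micro-dollars (an assumption) so that sums stay exact.

```typescript
// Hypothetical input row; unitCostMicros is USD * 1,000,000.
interface ToolInvocation {
  day: string; // e.g. "2025-01-15"
  toolName: string;
  unitCostMicros: number;
  executionDurationMs: number;
}

interface ToolDailyStats {
  calls: number;
  revenueMicros: number;
  totalDurationMs: number;
}

// Groups events by (day, tool) and sums calls, revenue, and duration.
function aggregateByToolAndDay(
  events: ToolInvocation[]
): Map<string, ToolDailyStats> {
  const stats = new Map<string, ToolDailyStats>();
  for (const e of events) {
    const key = `${e.day}|${e.toolName}`;
    const row = stats.get(key) ?? { calls: 0, revenueMicros: 0, totalDurationMs: 0 };
    row.calls += 1;
    row.revenueMicros += e.unitCostMicros;
    row.totalDurationMs += e.executionDurationMs;
    stats.set(key, row);
  }
  return stats;
}
```

A materialized view performs this kind of rollup incrementally as rows arrive, rather than over a full array as done here.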

Your MCP tools are already being called.

Start getting paid for every invocation. Integrate the SDK in minutes. Set per-tool pricing. Real-time analytics. Automatic billing. No setup fees. Pay as you grow.