Token and context usage metrics
Hardproof includes a usage overlay in every scan report under usage_metrics. The overlay measures how much of an agent's context a server consumes, focusing on the tools/list catalog, tool schema payloads, and typical response sizes.
Why this exists
- Oversized tool catalogs crowd out the actual user task from the model context.
- Oversized responses make agents brittle (truncation, high cost, hard-to-diff regressions).
- Schema bloat increases the steady-state prompt budget for every tool call.
What’s measured
The report includes byte counts and token counts for:
- Tool catalog: size of tools/list and its token footprint.
- Descriptions and tool count: average and maximum description size, plus overall tool count.
- Schema footprint: total tool input schema size and token footprint.
- Response footprint: typical response payload token footprint (p50/p95).
- Metadata-to-payload ratio: how much schema and descriptor overhead the server adds compared with the actual payload it returns.
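The measurements above can be approximated from a raw tools/list result and a handful of sample responses. The Python sketch below is a minimal illustration, not Hardproof's implementation. It assumes a crude ceil(bytes / 4) token heuristic in place of a real tokenizer, uses nearest-rank percentiles for p50/p95, and defines the metadata-to-payload ratio as description plus schema tokens divided by the p50 response size. That ratio definition and every output key name are hypothetical.

```python
import json
import math


def estimate_tokens(text: str) -> int:
    # Assumption: ~4 UTF-8 bytes per token stands in for a real tokenizer.
    return math.ceil(len(text.encode("utf-8")) / 4)


def nearest_rank(values: list, pct: float) -> int:
    # Nearest-rank percentile: smallest sample with at least pct% of samples <= it.
    if not values:
        return 0
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def compact(obj) -> str:
    # Serialize without whitespace so formatting does not inflate byte counts.
    return json.dumps(obj, separators=(",", ":"))


def usage_footprint(tools_list_result: dict, responses: list) -> dict:
    tools = tools_list_result["tools"]
    catalog = compact(tools_list_result)
    desc_tokens = [estimate_tokens(t.get("description", "")) for t in tools]
    schema_tokens = sum(estimate_tokens(compact(t.get("inputSchema", {}))) for t in tools)
    resp_tokens = [estimate_tokens(r) for r in responses]
    p50 = nearest_rank(resp_tokens, 50)
    overhead = sum(desc_tokens) + schema_tokens  # descriptor + schema tokens
    return {
        "tool_count": len(tools),
        "catalog_bytes": len(catalog.encode("utf-8")),
        "catalog_tokens": estimate_tokens(catalog),
        "avg_description_tokens": sum(desc_tokens) / len(tools) if tools else 0.0,
        "max_description_tokens": max(desc_tokens, default=0),
        "schema_tokens": schema_tokens,
        "response_tokens_p50": p50,
        "response_tokens_p95": nearest_rank(resp_tokens, 95),
        # Hypothetical definition: overhead as a percentage of the median response.
        "metadata_to_payload_ratio_pct": 100 * overhead / p50 if p50 else math.inf,
    }
```

Compact serialization keeps pretty-printing whitespace from being counted as context cost; a server whose catalog is mostly schema will show a high ratio even when each individual description is short.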
Truth classes
There is no single, universal “real token count” for an MCP server unless you either choose a tokenizer family or ingest a real client trace. Hardproof makes this explicit by recording which kind of count it reports in usage_metrics.usage_mode:
- usage_mode=estimate: deterministic estimate fallback (default).
- usage_mode=tokenizer_exact: exact counts under a chosen tokenizer profile (for example --tokenizer openai:o200k_base).
- usage_mode=trace_observed: observed counts from a real client trace (--token-trace).
- usage_mode=mixed: per-metric mix of exact and observed counts when both are available.
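Because each truth class measures something different, numbers from different modes should not be compared directly. A minimal consumer-side guard might look like the sketch below. It reads only usage_metrics.usage_mode; the policy of treating mixed reports as non-comparable is an assumption made here because mixed mode varies per metric. For tokenizer_exact reports, a stricter check would also compare the tokenizer profile, wherever the report records it.

```python
KNOWN_MODES = {"estimate", "tokenizer_exact", "trace_observed", "mixed"}


def usage_mode(report: dict) -> str:
    # Read the truth class from usage_metrics.usage_mode and reject unknown values.
    mode = report["usage_metrics"]["usage_mode"]
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown usage_mode: {mode!r}")
    return mode


def comparable(a: dict, b: dict) -> bool:
    # Only compare like with like. Mixed reports are treated as non-comparable
    # here because the exact/observed split can differ per metric.
    mode_a, mode_b = usage_mode(a), usage_mode(b)
    return mode_a == mode_b and mode_a != "mixed"
```

Raising on an unknown mode, rather than defaulting, keeps a future truth class from being silently compared against an older one.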
Estimator metadata
In estimate mode, the usage overlay records estimator_family, estimator_version, and confidence next to the estimate fields. These values are deterministic comparison signals, not billing-grade truth.
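The point of recording estimator metadata is that an estimate is only a useful comparison signal when both sides were produced the same way. The sketch below illustrates that idea with a hypothetical byte-based heuristic; the family name "bytes_div_4", the version "1", and the "low" confidence value are invented for illustration and are not the values Hardproof writes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    tokens: int
    estimator_family: str
    estimator_version: str
    confidence: str


def estimate(text: str) -> Estimate:
    # Hypothetical heuristic: ceil(UTF-8 bytes / 4). It is deterministic, so the
    # same input always produces the same count across runs.
    tokens = math.ceil(len(text.encode("utf-8")) / 4)
    return Estimate(tokens, "bytes_div_4", "1", "low")


def same_estimator(a: Estimate, b: Estimate) -> bool:
    # A diff between two estimates is only meaningful if both came from the
    # same estimator family and version.
    return (a.estimator_family, a.estimator_version) == (b.estimator_family, b.estimator_version)
```

Versioning the estimator means that changing the heuristic shows up as a metadata change rather than as an apparent regression in the server.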
Why two token estimates exist
The report keeps tool-catalog estimates for both the cl100k and o200k tokenizer families (used by GPT-4-class and GPT-4o-class OpenAI models, respectively), so consumers can compare context pressure across the model families most commonly in use.
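The two estimates can be combined into a single planning number. One simple policy, sketched below as an assumption rather than anything Hardproof prescribes, is to budget against the larger estimate and track how far the two tokenizers disagree. The function takes plain integers, so it does not depend on the report's actual field names; the context window size is whatever the caller's target model provides.

```python
def catalog_pressure(cl100k_tokens: int, o200k_tokens: int, context_window: int) -> dict:
    # Budget against the larger of the two estimates so neither model family is
    # under-provisioned, and report how far apart the two tokenizers are.
    worst = max(cl100k_tokens, o200k_tokens)
    spread_pct = 100 * abs(cl100k_tokens - o200k_tokens) / worst if worst else 0.0
    return {
        "worst_case_tokens": worst,
        "tokenizer_spread_pct": spread_pct,
        "context_share_pct": 100 * worst / context_window,
    }
```

For example, a catalog estimated at 1,200 cl100k tokens and 1,000 o200k tokens would be budgeted at 1,200 tokens, under one percent of a 128,000-token window.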
How to keep usage healthy
- Keep tool descriptions short and remove redundant examples.
- Prefer fewer tools with clearer names over many narrowly scoped tools.
- Return only necessary fields; paginate and filter instead of returning “full objects”.
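Trimming responses is a tool-design change on the server side. As an illustration only, the sketch below shows a helper a tool handler could use to return a requested subset of fields one page at a time with an opaque cursor. The helper name, its parameters, and the nextCursor key (borrowed from the cursor style MCP uses for its list methods) are assumptions, not part of Hardproof.

```python
import base64


def page_of(items: list, fields: list, cursor=None, limit: int = 20) -> dict:
    # Decode the opaque cursor back into a start offset (0 on the first call).
    start = int(base64.urlsafe_b64decode(cursor).decode()) if cursor else 0
    window = items[start:start + limit]
    # Project each item down to the requested fields instead of the full object.
    page = [{k: item[k] for k in fields if k in item} for item in window]
    end = start + len(window)
    # Hand back a cursor only if more items remain.
    next_cursor = None
    if end < len(items):
        next_cursor = base64.urlsafe_b64encode(str(end).encode()).decode()
    return {"items": page, "nextCursor": next_cursor}
```

Encoding the offset keeps the cursor opaque to the agent, so the server can later switch to a different paging scheme without changing the tool's contract.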
CI policy
Hardproof can gate on usage directly with thresholds such as --max-avg-tool-description-tokens, --max-tool-count, and --max-metadata-to-payload-ratio-pct.
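The built-in flags are the supported way to gate a build. If a pipeline also needs a custom check on a saved report, the same idea can be sketched in a few lines of Python. Everything here is an assumption: the flat keys under usage_metrics, the limits, and the strict greater-than comparison (Hardproof's own flags may treat the boundary differently).

```python
import json

# Hypothetical field names and limits; check the report format for the real keys.
THRESHOLDS = {
    "avg_tool_description_tokens": 60,
    "tool_count": 25,
    "metadata_to_payload_ratio_pct": 300,
}


def violations(usage_metrics: dict, thresholds: dict) -> list:
    problems = []
    for field, limit in thresholds.items():
        value = usage_metrics.get(field)
        if value is None:
            # Fail closed: a missing metric should not silently pass the gate.
            problems.append(f"{field} missing from report")
        elif value > limit:
            problems.append(f"{field}={value} exceeds {limit}")
    return problems


def gate(report_path: str, thresholds: dict = THRESHOLDS) -> int:
    # Return a process exit code: 0 when every metric is within its limit, 1 otherwise.
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    problems = violations(report["usage_metrics"], thresholds)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

Treating a missing metric as a failure is a deliberate choice: a renamed or dropped field should break the build loudly rather than let an oversized server through.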
Next
- Methodology: /hardproof/methodology
- Report format: /hardproof/report-format