Hardproof quality report pipeline (draft)
This is a draft outline for the first quality report pipeline. It is scaffolding for methodology and reproducible aggregation, not a public proof page yet.
Data flow (proposed)
- Corpus selection: a checked-in manifest describing which targets are in-scope, with explicit exclusions and rationale.
- Run manifests: pinned tool versions, baseline versions, and deterministic scan inputs (including trust inputs when required).
- Execution: run `hardproof corpus run` to produce a stable per-target directory of scan artifacts.
- Aggregation: an offline script/notebook that computes summary tables and charts from corpus outputs without scraping logs.
- Publication: a report viewer page that embeds the methodology appendix and links to reproducible artifacts.
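The aggregation step above can be sketched as a small offline script. Everything about the on-disk layout here is an assumption for illustration (the per-target `result.json` artifact name and its `score`/`status`/`warnings` fields are hypothetical, not the pipeline's actual format):

```python
import json
from pathlib import Path


def aggregate(corpus_dir: str) -> list[dict]:
    """Compute one summary row per target from structured scan artifacts.

    Reads only structured JSON artifacts under the corpus output directory
    (never scrapes logs), so the summary is reproducible from corpus
    outputs alone. Field names are illustrative assumptions.
    """
    rows = []
    for result_path in sorted(Path(corpus_dir).glob("*/result.json")):
        artifact = json.loads(result_path.read_text())
        rows.append({
            "target": result_path.parent.name,            # per-target directory name
            "score": artifact.get("score"),               # may be absent if insufficient
            "status": artifact.get("status", "unknown"),  # e.g. publishable / partial
            "warnings": len(artifact.get("warnings", [])),
        })
    return rows
```

A notebook can then feed these rows directly into summary tables and charts without touching raw logs.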
Methodology appendix (sections)
- Score truth: publishable vs partial vs insufficient, and what evidence is required before a number is treated as publishable.
- Confidence: which signals are estimate-grade (usage metrics) and which probes are bounded smoke signals (performance), including any confidence markers and sample counts.
- Fairness: what comparisons are and are not meaningful across different server classes and transports.
- Determinism: what makes corpus runs reproducible and what is intentionally excluded (non-deterministic load testing, LLM judging).
- Interpretation: how to interpret warnings vs failures vs partial scores, and what follow-up evidence is expected before making stronger claims.
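As one way to make the score-truth tiers above concrete, the sketch below classifies a computed number by the evidence behind it. The function name, parameters, and threshold logic are illustrative assumptions, not the appendix's actual rules:

```python
def score_truth(evidence_count: int, required: int) -> str:
    """Classify a computed number before it is treated as publishable.

    - "publishable": the required amount of evidence is present
    - "partial": some evidence exists, but short of the requirement
    - "insufficient": no usable evidence at all

    The threshold comparison is a placeholder for whatever evidence
    policy the methodology appendix actually specifies.
    """
    if evidence_count <= 0:
        return "insufficient"
    if evidence_count >= required:
        return "publishable"
    return "partial"
```

Under this framing, only "publishable" results would surface as headline numbers in the report viewer; "partial" and "insufficient" results would be shown with their tier rather than as bare scores.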