Hardproof quality report pipeline (draft)
This is a draft outline for the first quality report pipeline. It is scaffolding for methodology and reproducible aggregation, not a public proof page yet.
Data flow (proposed)
- Corpus selection: a checked-in manifest describing which targets are in-scope, with explicit exclusions and rationale.
- Run manifests: pinned tool versions, baseline versions, and deterministic scan inputs (including trust inputs when required).
- Execution: run `hardproof corpus run` to produce a stable per-target directory of scan artifacts.
- Aggregation: an offline script/notebook that computes summary tables and charts from corpus outputs without scraping logs.
- Publication: a report viewer page that embeds the methodology appendix and links to reproducible artifacts.
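The aggregation step above can be sketched as a small offline script. Everything about the on-disk layout here is an assumption for illustration (the per-target `result.json` artifact name and its `score`/`status`/`warnings` fields are hypothetical, not the pipeline's actual format):

```python
import json
from pathlib import Path


def aggregate(corpus_dir: str) -> list[dict]:
    """Compute one summary row per target from structured scan artifacts.

    Reads only structured JSON artifacts under the corpus output directory
    (never scrapes logs), so the summary is reproducible from corpus
    outputs alone. Field names are illustrative assumptions.
    """
    rows = []
    for result_path in sorted(Path(corpus_dir).glob("*/result.json")):
        artifact = json.loads(result_path.read_text())
        rows.append({
            "target": result_path.parent.name,            # per-target directory name
            "score": artifact.get("score"),               # may be absent if insufficient
            "status": artifact.get("status", "unknown"),  # e.g. publishable / partial
            "warnings": len(artifact.get("warnings", [])),
        })
    return rows
```

A notebook can then feed these rows directly into summary tables and charts without touching raw logs.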
Methodology appendix (sections)
- Score truth: publishable vs partial vs insufficient, and what evidence is required before a number is treated as publishable.
- Confidence: which signals are estimate-grade (usage metrics) and which probes are bounded smoke signals (performance), including any confidence markers and sample counts.
- Fairness: what comparisons are and are not meaningful across different server classes and transports.
- Determinism: what makes corpus runs reproducible and what is intentionally excluded (non-deterministic load testing, LLM judging).
- Interpretation: how to interpret warnings vs failures vs partial scores, and what follow-up evidence is expected before making stronger claims.
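As one way to make the score-truth tiers above concrete, the sketch below classifies a computed number by the evidence behind it. The function name, parameters, and threshold logic are illustrative assumptions, not the appendix's actual rules:

```python
def score_truth(evidence_count: int, required: int) -> str:
    """Classify a computed number before it is treated as publishable.

    - "publishable": the required amount of evidence is present
    - "partial": some evidence exists, but short of the requirement
    - "insufficient": no usable evidence at all

    The threshold comparison is a placeholder for whatever evidence
    policy the methodology appendix actually specifies.
    """
    if evidence_count <= 0:
        return "insufficient"
    if evidence_count >= required:
        return "publishable"
    return "partial"
```

Under this framing, only "publishable" results would surface as headline numbers in the report viewer; "partial" and "insufficient" results would be shown with their tier rather than as bare scores.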