EVIDENCE HUB

From controlled end-to-end traces to reproducible benchmark artifacts and live Gemini-backed core adapter runs with Ed25519 attestation. No platform trust required.

Reproducible verification: CLI steps + raw logs, no platform trust required
Integrity anchor: fixture_dir_hash + git_commit + run_id
Verification: python3 scripts/run_benchmark.py
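
To make the integrity anchor concrete, here is a minimal sketch of how a fixture directory hash could be derived and bundled with git_commit and run_id. The hashing scheme (SHA-256 over sorted relative paths and file contents) and both function names are assumptions for illustration; the actual scheme used by scripts/run_benchmark.py is not documented on this page and may differ.

```python
import hashlib
from pathlib import Path


def fixture_dir_hash(fixture_dir: str) -> str:
    """Hash every file under fixture_dir in a stable order (hypothetical scheme)."""
    root = Path(fixture_dir)
    # Sort by POSIX-style relative path so the result is the same on every OS.
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    digest = hashlib.sha256()
    for rel, path in files:
        name = rel.encode("utf-8")
        data = path.read_bytes()
        # Length-prefix name and content so file boundaries cannot be confused.
        digest.update(len(name).to_bytes(8, "big") + name)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def integrity_anchor(fixture_dir: str, git_commit: str, run_id: str) -> dict:
    """Bundle the three anchor fields named above into one record."""
    return {
        "fixture_dir_hash": fixture_dir_hash(fixture_dir),
        "git_commit": git_commit,
        "run_id": run_id,
    }
```

Recomputing the hash on a fresh checkout of the same commit and comparing it with the value recorded in a run artifact is one way to confirm that the fixtures were not changed between runs.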

EVIDENCE SURFACES

PATH CORRECTNESS

Controlled E2E Trace

A single, redacted end-to-end decision trace showing the complete path: LLM output → Adapter normalization → Kernel adjudication → DecisionTrace → Attestation. Proves the path exists and is correctly wired.

artifacts/evidence/e2e_real_llm_trace.json

SCENARIO CONSISTENCY

Benchmark Run Artifact

Aggregate result across 7 canonical governance scenarios covering ALLOW, BLOCK, and security-critical boundary cases. Proves consistency under the current fixture set — not universal proof.

artifacts/benchmark/benchmark_run_{id}.json

CORE-BACKED LIVE · GEMINI

Adapter-Mode Pinned Runs

Four pinned core-backed adapter-mode runs across two phases. Each carries a real custody_hash, real input_event_hash, and Ed25519 Class A attestation.

E1 (2026-03-30), dreamer=mock: ALLOW + BLOCK
E2 (2026-03-31), dreamer=gemini: ALLOW + BLOCK. In E2, Gemini generated the ProposalSpec; the Kernel decided.

GET /api/adapter-runs (E1 + E2 phases)

ARTIFACT CLASSIFICATION

artifact_class: demo-replay

Produced by the replay lane. Bounded scenario registry. demo_trace_ref is a display identifier only — not a Kernel custody hash. No real aegisai.core.Kernel call.

artifact_class: core-backed

Produced by the adapter lane via the real aegisai.core.Kernel. Real custody_hash, real input_event_hash. Ed25519 Class A attestation signed at the backend. Externally verifiable.

Phase E2 (source: gemini): Gemini generates the ProposalSpec. The Kernel is sovereign; Gemini carries zero execution authority.
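
The distinction between the two artifact classes can be sketched as a small structural check. This is an illustrative sketch only: the field names artifact_class, custody_hash, and input_event_hash come from the descriptions above, while the "attestation" key, the "unverified" label, and the classify_artifact function are assumptions. It checks only that the expected fields are present; it does not verify the Ed25519 signature.

```python
def classify_artifact(artifact: dict) -> str:
    """Return 'core-backed', 'demo-replay', or 'unverified' for a decoded artifact."""
    declared = artifact.get("artifact_class")
    if declared == "demo-replay":
        # Replay-lane artifacts carry only a display identifier (demo_trace_ref).
        return "demo-replay"
    # "attestation" is a hypothetical key name for the Ed25519 attestation payload.
    required = ("custody_hash", "input_event_hash", "attestation")
    if declared == "core-backed" and all(artifact.get(key) for key in required):
        return "core-backed"
    # Declared core-backed but missing evidence fields, or an unknown class.
    return "unverified"
```

A record that declares itself core-backed but lacks any of the three evidence fields falls through to "unverified" rather than being trusted on its label alone.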

ARTIFACT DISCIPLINE

benchmark_latest.json

Mutable convenience pointer. Updated on every benchmark run. Use for live operational status.

benchmark_run_{id}.json

Immutable evidence artifact. Never overwritten after creation. Safe to share as DD-grade evidence.
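
One way to enforce this split between an immutable per-run file and a mutable pointer is sketched below. The publish_benchmark function, its parameters, and the temp-file name are hypothetical; the sketch assumes both files live in the same directory and does not describe how scripts/run_benchmark.py actually writes them.

```python
import json
import os
from pathlib import Path


def publish_benchmark(out_dir: str, run_id: str, report: dict) -> Path:
    """Write an immutable per-run artifact, then refresh the mutable pointer."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(report, indent=2, sort_keys=True)

    run_path = out / f"benchmark_run_{run_id}.json"
    # Mode "x" fails with FileExistsError if the run artifact already exists,
    # so an existing evidence file is never overwritten.
    with open(run_path, "x", encoding="utf-8") as fh:
        fh.write(payload)

    # Write the pointer to a temporary file, then swap it into place so readers
    # never observe a half-written benchmark_latest.json.
    tmp_path = out / "benchmark_latest.json.tmp"
    tmp_path.write_text(payload, encoding="utf-8")
    os.replace(tmp_path, out / "benchmark_latest.json")
    return run_path
```

Because the per-run write happens first and refuses to overwrite, a repeated run_id raises before the pointer is touched, which keeps benchmark_latest.json consistent with the last successful run.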

An artifact without attestation is not governance proof. A benchmark report reflects scenario-set consistency under the current fixture set, not a universal safety certification.