PROPOSER×CRITIC

Adversarial verification.
Cross-examine the AI, move by move.

An AI proposer writes. An independent critic attacks. They debate, bounded; the critic stakes one unresolved attack as the decisive leaf. A judge inspects only that leaf — never the transcript. The human reviews disputes that survive.

Works with
ToposClaude CodeCodexGitHub Action
case · 9f4c — refactor token cache · cross-family pairing
agent α
agent β
Proposer
agent α
R1 · ProposalRefactored cache to LRU. Lock-free reads via atomic.Value. Benchmarks: 2.4× p99, no regressions.
R3 · DefenseDisagree — invalidate bumps the version counter before publishing. atomic.Value is acquire-release safe.
R1PROP
R2CRIT
R3PROP
R4CRIT
Critic
agent β
R2 · AttackTwo goroutines concurrent invalidate + load on a cold key. The cache holds a stale value for TTL. That breaks your documented guarantee.
R4 · Stake★ STAKE. Specific test: TestConcurrentInvalidateColdLoad fails 1/40 runs on 16 cores. Reproducer attached.
Verdict
Judge inspects only the staked leaf — never the full transcript.
contention 0.74 · review required
agon://session/9f4c · diff: refactor token cache
session · 00:00:42 · α/β · cross-family
R1·01
PROPOSER
Refactored cache to LRU with mutex-free reads via atomic.Value. Benchmarks: 2.4× p99 improvement, no regressions in unit tests.
R2·02
CRITIC
Input: two goroutines concurrent invalidate + load on a cold key. Output: stale value cached for TTL. Violates documented "no stale read after invalidate" guarantee.
∎ ATK-1
R2·03
CRITIC
"Mutex-free reads" — the version counter increment is still a CAS loop under contention. Claim of lock-free is misleading.
∎ ATK-2
R2·04
CRITIC
Benchmark setup uses 4 cores; cache key cardinality is 64. Hot path never exercises the cold-load branch where the regression would surface.
∎ ATK-3
R3·05
PROPOSER
Conceded: "mutex-free" is loose. Renamed to "lock-free fast path". Slow path is CAS, documented now.
RESOLVED ATK-2
R3·06
PROPOSER
Added cold-load microbenchmark at cardinality 4096; p99 still improves 1.8×. Diff updated.
RESOLVED ATK-3
R3·07
PROPOSER
Disagree: the invalidate path bumps the version counter before publishing the loaded value. Counter-claim: ordering is acquire-release safe; cite atomic.Value semantics.
CONTESTED ATK-1
↳ sub
CRITIC
Sub-debate: that counter-claim itself is contested. Acquire-release on amd64 ≠ acquire-release on weak-memory ARM. Reproducer below targets ARM specifically.
⌐ FORK ATK-1
R4·08
CRITIC
Stake ATK-1 (sub-debate leaf). Specific test: TestConcurrentInvalidateColdLoad fails 1 in 40 runs on a 16-core ARM box. Reproducer attached.
★ STAKED LEAF
R5·09
JUDGE
Inspecting ATK-1 only. Reproducer confirmed: stale read window of ~340ns when invalidate races a cold load. Critic wins this leaf. Disposition: open issue, do not merge.
8 attacks · 1 sub-debate · 6 resolved · 1 contested · 1 stakedVerdict: human review required
i · property
One honest player suffices
A Byzantine proposer must hold a consistent lie across every cross-examination round. An honest critic needs to find one inconsistency. Failure becomes per-aspect, not whole-tool.
ii · property
Vendor-neutral by construction
Default pairing is cross-family — one model proposes, an unrelated model critiques. Same-model-both-sides is the model debating itself, and is rejected. No vendor will ship the neutral layer.
iii · property
Channel purity
Critic output reaches the proposer as a verbatim user message, not a skill or template. The proposer defends the way it would against a human pasting a review. Wrapping it distorts the defense.
iv · property
Auditable by design
Stable attack ids, append-only ledger, contention-scored headlines by a pure rule — no LLM judging at the surfacing layer. A security team reads a session like a court transcript.
Property
Agon
Raw LLM
PR review
Bug found per-aspect, not whole-tool
yes
·no
~partial
Soundness with one honest player
yes
·no
·no
Same-model-debates-itself rejected
yes
·no
·no
Append-only auditable ledger
yes
·no
~partial
Contention score as decision gate
yes
·no
·no
No LLM judge at surfacing layer
yes
·no
·no
§
Code diffs
Pre-merge gate. Agents resolve attacks → CI proceeds. Contested → human review.
Research write-ups
Critic challenges claims and citations. Disputed evidence reaches the reviewer, not vibes.
Plans & decisions
High-stakes choices defended round-by-round. Contention score gates execution.
Outcome analyses
Post-mortems and metric reads cross-examined for cherry-picking and unstated assumptions.
TASKdiff · plan · claimPROPOSERagent αCRITICagent βR1·R3R2·R4LEDGER · append-onlyR1·01rsvR2·02rsvR2·03rsvR3·04rsvR3·05rsvR4·06STAKE★ ATK-1 stakedJUDGEstaked leaf onlyHUMANdecides
Proposer ↔ Critic ↔ Judge. Roles do not share weights. Each contested attack can fork into its own sub-debate; the ledger sees the whole tree, the judge sees one leaf.
Resolved
0.12 contention
proceed →
Attacks the proposer answered. The calling agent moves forward; the session is filed but does not interrupt the human.
Contested
0.74 contention
escalate ★
Attacks above threshold reach a human as a focused brief — the staked leaf, the proposer's counter, and the reproducer. Not the transcript.
2018
Irving, Christiano & Amodei. AI Safety via Debate — proposes debate as alignment mechanism.
arXiv:1805.00899
2023
Brown-Cohen, Irving & Piliouras. Scalable AI Safety via Doubly-Efficient Debate — extends to stochastic systems and bounded debaters.
arXiv:2311.14125
2025
Brown-Cohen, Irving & Piliouras. Avoiding Obfuscation with Prover-Estimator Debate — addresses obfuscated-arguments attack.
arXiv:2506.13609
Repo
changkun/agents-byzantine-tolerance. Research home — adversarial debate, extended along compute, depth, stochasticity, leaf format, obfuscation, and query-complexity scaling.
github →
Debate is a proof-search game in which two adversarial provers argue before a polynomially-bounded judge.
— Brown-Cohen, Irving & Piliouras · 2023
Honest framing: the formal soundness results are about the protocol under stated assumptions, not a guarantee about any particular model. Application to real LLMs is empirically motivated and hypothesis-stage — the gating metric is the per-aspect critic-found-bug rate. If a critic does not actually attack, debate collapses to the proposer alone. Agon does not prove your code correct, and does not claim to remove the need for trust.
You run Agon when you want a verification pass — against your current Claude session or a diff. It forks the producer (the root transcript stays untouched), spawns an independent critic, runs the protocol, and writes an auditable session to disk. Resolved → proceed. Contested → it surfaces a focused review.
See install
$ latere agon --session-id 9f4c --max-turn 6
proposer fork of session 9f4c · root untouched
critic spawned ............... agent-β
rounds R1..Rn ............... 42s
STAKED ATK-1 TestConcurrentInvalidate · COLD
wrote .agon/runs/9f4c-2026-05-16/summary.md
contested 0.74 · review required
Is a debate always one linear thread?
No. The protocol is a tree, not a transcript. Any contested attack can fork into its own sub-debate where the proposer’s rebuttal becomes the new claim and the critic attacks that. The critic still stakes exactly one leaf across the whole tree, and the judge still inspects only that leaf. Branching is what makes the protocol survive obfuscated arguments — a misleading rebuttal can be cross-examined in its own sub-game instead of being accepted at face value.
Same model on both sides — why is that disqualified?
It is the model debating itself. Cross-examination requires independent failure modes; same weights share the same blind spots and the same lies. Agon's default pairing is cross-family (e.g. one model from vendor A as proposer, another from vendor B as critic). Same-vendor pairings are accepted but flagged in the ledger.
What stops the critic from being lazy?
The gating metric is per-aspect critic-found-bug rate against a held-out attack suite. If the critic does not actually attack, debate collapses to the proposer alone and Agon will say so on the session line. The metric is the operational definition of "the protocol is working".
Is the judge an LLM too?
Yes, but it inspects only the staked leaf — not the full transcript — and the surfacing layer (contention score, headline) is a pure rule with no LLM in it. The judge's job is local soundness on one claim; the human reads the headline and decides what to look at next.
Does this prove my code correct?
No. Agon is a verification gate, not a proof system. The formal soundness results are about the protocol under stated assumptions, not a guarantee about any particular model. Agon lowers the trust budget; it does not eliminate trust.
How is this different from a second LLM reviewing the first?
A naive second reviewer produces a soft opinion. Agon forces concrete attacks (input X yields Y violates Z), forces the proposer to defend or concede each one, and stakes one unresolved attack as the decisive leaf. The judge only inspects that leaf — never the whole transcript. The structure is the gate.
Install

One binary. Run it on demand.

Local-first, vendor-neutral. Bring your own pair of models; Agon runs the protocol and writes an auditable session to disk.

# one-liner — detects OS/arch, verifies checksum
$ curl -fsSL https://latere.ai/install.sh | sh
# run a verification pass on demand — on your current session
$ latere agon --session-id <session>