Agon:
make AI-produced work defend itself
An independent critic cross-examines what an AI produced. The producer must defend or concede every concrete attack. Only the disputes that survive reach a human.
Not a judge. A protocol.
Agon sits between an AI agent and whoever, or whatever, consumes its output. It runs an honest, competent critic against the work, forces the producing agent to defend or fix every concrete attack, and surfaces only what stays contested.
The artifact need not be code. A diff, a research write-up, an outcome analysis, a plan, a high-stakes decision: the same protocol applies. What resolves becomes a pass signal another agent can consume directly; what stays contested is what a person reviews.
Propose, attack, defend, until only the unresolved remains
A proposer and one or more independent critics, each pressing a distinct aspect. Cross-examination runs until the dispute reaches steady state or a round cap. Only what stays unresolved surfaces, ranked by a pure contention score.
The proposer answers the task: a claim, a diff, an argument.
Each critic picks its own aspect (functional logic, security, code quality, performance) and produces concrete attacks: a specific input X yielding output Y that violates Z. Not vibes.
The proposer responds to each attack: it either concedes, in which case a proposer clone applies the fix, or rebuts with a specific counter-claim.
Every attack carries a stable id in an append-only ledger. Only unresolved disputes surface, ranked by how many rounds each survived. No model scores the outcome.
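The propose-attack-defend loop above can be sketched in a few lines. This is a minimal illustration under hypothetical names, not Agon's actual API; `attack` and `defend` stand in for the critic and proposer model calls.

```go
package main

import "fmt"

// Entry is one record in the append-only ledger. An attack keeps one
// stable ID for its whole lifetime; later rounds reference that ID
// rather than rewriting earlier entries.
type Entry struct {
	AttackID string
	Round    int
	Kind     string // "attack", "concede", or "rebut"
	Body     string
}

// runRounds drives cross-examination until the critic has nothing new
// to press (steady state) or the round cap is reached.
func runRounds(maxRounds int, attack func(round int) []Entry, defend func(Entry) Entry) []Entry {
	var ledger []Entry
	for round := 1; round <= maxRounds; round++ {
		attacks := attack(round)
		if len(attacks) == 0 {
			return ledger // steady state: no live disputes
		}
		for _, a := range attacks {
			ledger = append(ledger, a, defend(a)) // append-only, never mutated
		}
	}
	return ledger
}

func main() {
	attack := func(round int) []Entry {
		if round > 2 {
			return nil // critic has nothing left to press
		}
		return []Entry{{AttackID: "A1", Round: round, Kind: "attack", Body: "input X yields Y, violating Z"}}
	}
	defend := func(a Entry) Entry {
		return Entry{AttackID: a.AttackID, Round: a.Round, Kind: "rebut", Body: "specific counter-claim"}
	}
	fmt.Println(len(runRounds(5, attack, defend))) // 4: two rounds, one attack and one defense each
}
```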
The surfacing layer is a pure rule: the contention score is rounds survived plus a bit for whether an attack was re-raised. There is no model in that loop, on purpose. The bounded-judge result from the adversarial-verification literature is the intuition behind this, not the runtime; its honest limits are set out in the academic grounding below.
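The rule is small enough to state as code. A minimal sketch, with illustrative names rather than Agon's actual interface:

```go
package main

import "fmt"

// contentionScore is the pure surfacing rule: the number of rounds an
// attack survived, plus one if it was ever re-raised. No model call
// sits on this path, so the ranking is deterministic and auditable.
func contentionScore(roundsSurvived int, reRaised bool) int {
	score := roundsSurvived
	if reRaised {
		score++
	}
	return score
}

func main() {
	fmt.Println(contentionScore(3, true)) // 4: survived three rounds, re-raised once
}
```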
Four properties the alternatives do not replicate
One honest player suffices
A dishonest proposer must hold a consistent lie across every cross-examination round; an honest critic needs one inconsistency. Because each critic owns a distinct aspect, a weak critic on one aspect does not break coverage on the others; weak aspects get dropped from defaults, not the tool.
Vendor-neutral by construction
The default pairing is cross-family: Claude proposes, Codex critiques. Same model on both sides is the model reviewing itself, and is rejected. No model vendor will ship the neutral layer; the incentive is to sell more of its own tokens.
Channel purity
Critic output reaches the proposer as a verbatim user message, not a skill or template. The proposer defends the way it would against a person pasting a review. Wrapping it distorts the defense.
Auditable by design
Stable attack ids, an append-only ledger, contention-scored headlines by a pure rule with no model judging at the surfacing layer. A security team reads a session like a court transcript.
A machine-readable signal, not just a human headline
Inside an agent loop the contention score is a decision gate: attacks the proposer resolves are a pass the calling agent proceeds on; the contested tail above the threshold is what escalates to a human.
This is the high-stakes-decision use: if the agents resolve every attack, the caller proceeds; if anything stays contested, a person reviews. It is not a new capability: it is what the existing ledger and contention score already provide when the consumer is another agent rather than a person.
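As a sketch, such a gate reduces to a threshold over per-attack scores. The threshold, map shape, and names here are illustrative assumptions, not Agon's actual interface:

```go
package main

import "fmt"

// gate splits a scored session into a pass signal and a contested tail.
// Resolved attacks score zero and pass silently; any attack at or above
// the threshold escalates to a human.
func gate(scores map[string]int, threshold int) (pass bool, escalate []string) {
	for id, s := range scores {
		if s >= threshold {
			escalate = append(escalate, id)
		}
	}
	return len(escalate) == 0, escalate
}

func main() {
	pass, contested := gate(map[string]int{"A1": 0, "A2": 5}, 3)
	fmt.Println(pass, contested) // false [A2]: one attack stays contested, a person reviews
}
```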
Grounded in the adversarial-verification literature, and honest about the limits
Agon productizes the adversarial-verification architecture of Irving, Christiano & Amodei (2018). Its complexity-theoretic intuition, that the adversarial protocol reaches PSPACE under optimal play, strictly above NP, is suggestive, not a claim about LLMs: LLMs are not optimal players.
The closer theoretical fit is Brown-Cohen, Irving & Piliouras (2023), which extends the result to stochastic systems and to honest players with polynomial simulation budgets, both required for it to apply to LLMs at all. Their 2025 Prover-Estimator protocol addresses the obfuscated-arguments attack on the plain protocol.
Irving, Christiano & Amodei (2018). AI Safety via Debate. arXiv:1805.00899
Brown-Cohen, Irving & Piliouras (2023). Scalable AI Safety via Doubly-Efficient Debate. arXiv:2311.14125
Brown-Cohen, Irving & Piliouras (2025). Avoiding Obfuscation with Prover-Estimator Debate. arXiv:2506.13609
Research home. agents-byzantine-tolerance: the open research suite this productizes, probing adversarial verification across compute, depth, stochasticity, leaf format, obfuscation, and query-complexity scaling. github.com/changkun/agents-byzantine-tolerance
Honest framing: the formal soundness results are about the protocol under stated assumptions, not a guarantee about any particular model. Application to real LLMs is empirically motivated and hypothesis-stage. The gating metric is the per-aspect critic-found-bug rate; if a critic does not actually attack, the protocol collapses to the proposer alone. Agon does not prove your code correct, and does not claim to remove the need for trust.
One binary, an optional Stop hook
Vendor-neutral and local-first. Bring your own Claude / Codex; Agon runs the protocol and writes an auditable session to disk.
# one-liner (detects OS/arch, verifies checksum, installs the Stop hook)
curl -fsSL https://raw.githubusercontent.com/latere-ai/debate/main/install.sh | sh

# from source (Go 1.26+)
go install latere.ai/x/debate/cmd/debate@latest
debate install-hook --scope user