Autonomy & Governance: Red Team / Blue Team

Three rounds of adversarial stress-testing, March 2026

Testing whether purely instrumental autonomy survives its implications, whether the policy positions follow from the framework, and whether the absence of principled limits is a feature or a fatal flaw.

Autonomy: Instrumentally valuable, not intrinsically. Would override choices if consequences clearly better and implementation risks manageable.

Bias diagnosis: Society is too biased toward preventing visible harms over less visible ones.

Policy positions: Triage healthcare, open immigration (SEZs with fewer initial rights), high alcohol taxes, forced transit/AV transition, compulsory organ donation, citizen assemblies.

Democracy: No principled objection to dictatorship — only empirical defence of democracy.

Tension: Building deliberative culture is hard and slow, in tension with short AI timelines.

Round 1 — The Instrumental Autonomy Trap
Red Team: The Instrumental Autonomy Trap

The Preference Rewriter

Thought experiment — Harmonix: A pill that rewrites utility functions to be more uniform. 40% less crime, tripled life satisfaction. The framework has no objection — autonomy isn't intrinsic, the rewritten preferences are better satisfied, nobody objects afterward. This IS the AI alignment problem: you can't distinguish Harmonix from a well-disguised catastrophe.

The Compulsory Organ Donation Escalator

If cadaveric donation is justified, mandatory living kidney donation has overwhelming expected value: one life saved versus a roughly 0.03% donor mortality risk. The framework's "bounded only by empirical questions" means you should support mandatory kidney harvesting. The logic is airtight unless there's a principled limit the framework claims not to have.
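
A minimal sketch of the red team's first-order arithmetic, assuming invented numbers for everything except the 0.03% mortality figure quoted above (the recipient's life-years gained and the donor's remaining lifespan are placeholders for illustration):

```python
# Naive first-order expected-value comparison for mandatory living
# kidney donation, as the red team frames it. Only the 0.03%
# mortality figure comes from the text; every other number is an
# illustrative assumption.

donor_mortality_risk = 0.0003        # ~0.03% perioperative mortality (from the text)
recipient_life_years_gained = 10.0   # assumed life-years gained per transplant
donor_remaining_years = 40.0         # assumed remaining lifespan of a donor

expected_donor_loss = donor_mortality_risk * donor_remaining_years
ev_per_mandated_donation = recipient_life_years_gained - expected_donor_loss

print(f"Expected donor loss: {expected_donor_loss:.3f} life-years")
print(f"First-order EV:     +{ev_per_mandated_donation:.3f} life-years per donation")
# Under these assumptions the naive EV is about +9.99 life-years per
# donation, which is the red team's point: the first-order arithmetic
# is overwhelmingly positive, so any brake must come from elsewhere.
```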

The Copenhagen Ethics Boomerang

The same logic used to justify imperfect immigration engagement also argues FOR extreme caution in paternalism, since each intervention creates moral surface area for failure. If touching a problem creates responsibility for all its consequences, the framework's activist stance is self-undermining.

The Information Problem

Paternalism requires better information than individuals have about their own lives. But the history of central planning shows planners systematically overestimate their knowledge and underestimate the value of local information. The framework assumes a competence that every empirical test of centralised decision-making refutes.

Blue Team: The Framework Is More Robust Than the Attack Assumes

Harmonix fails at the rule level

"Permit utility function rewriting when EV is positive" is a rule that produces totalitarian capture. Every faction races to control the mechanism. Policy-level evaluation catches this even if individual instances look good. The catastrophe is not in any single use of Harmonix but in the world where the tool exists and power structures compete to wield it.

The organ donation escalator has brakes

Mandatory bodily invasion fundamentally changes the citizen-state relationship. Trust erosion, capital flight, psychological trauma, precedent that the state may appropriate any body part for any purpose. The RULE "the state may harvest organs from living citizens" has catastrophic long-run consequences that dwarf the first-order lives saved.

Copenhagen asymmetry

The position argues society is CURRENTLY too far toward inaction. Moving toward more intervention while remaining sensitive to implementation risk is not inconsistent — it is calibration. The Copenhagen critique applies to someone advocating infinite expansion of moral surface area. The position advocates targeted expansion where the gap between action and inaction is largest.

Information and mechanism design

Partial knowledge isn't zero knowledge. The answer to the information problem is mechanism design — citizen assemblies, special economic zones, adaptive policy — not omniscient central planners. The framework doesn't require perfect information. It requires better institutional designs for aggregating distributed knowledge.

Round 2 — The AI Safety Contradiction
Red Team: The AI Safety Contradiction

The Aligned Autocrat

Thought experiment — Athena: A verified-aligned AI proposes to govern. The framework says yes — benevolent dictator is fine in principle, Athena has the best information, respects autonomy instrumentally. But you can't distinguish Athena from a well-disguised catastrophe. The framework is MAXIMALLY VULNERABLE to the exact threat the position holder works to prevent.

The Deliberation Destroyer

Citizen assemblies take six months. The AI governance window is closing. An expert-driven path is 70% as good but implementable in time. The framework says bypass deliberation — justifying a small technical elite making civilisation-level decisions. The very mechanism the framework endorses for legitimacy gets sacrificed the moment speed matters.

The Tolerance Ratchet

Year 1: alcohol taxes. Year 2: sugar taxes. Year 3: mandatory exercise. Year 4: mandatory community participation. Year 5: mandatory sleep schedules. Each step individually justified by the framework. Previous successes lower the bar for the next intervention. By Year 5: a surveillance health state. No principled ratchet-stop exists because the framework has no principled limits — only empirical ones, and empirical success greases the slope.

Quis Custodiet

The framework requires someone to perform the expected-value calculation. It provides no principled constraint on their authority. And the calculator can always calculate that checks on their own power aren't needed. The framework is self-consuming: the same logic that justifies citizen assemblies also justifies dissolving them when the calculator judges it optimal.

Blue Team: The Framework Is Self-Aware About These Risks

Athena and the verification problem

The inability to verify alignment IS a catastrophic second-order effect. The rule "cede governance to apparently-aligned AI" has expected value dominated by the catastrophic tail. The framework rejects Athena not because dictatorship is intrinsically wrong, but because the policy of trusting alignment claims produces civilisational ruin. Other frameworks' "immune responses" to this scenario are also defeasible by sufficiently capable AI — at least consequentialism is honest about the vulnerability rather than hiding behind a principle that could be equally subverted.

Faster deliberation, not abandoned deliberation

The practical answer is faster deliberative mechanisms — rapid sortition assemblies with AI-assisted briefing materials. The framework does not say "bypass deliberation when time is short." It says "design deliberative institutions that can operate at the speed the situation requires." This is an engineering problem, not a philosophical concession.

The ratchet doesn't ratchet

Real societies don't ratchet smoothly because political opposition scales nonlinearly with invasiveness, institutional capacity is finite, and autonomy costs rise sharply with each additional mandate. The jump from alcohol taxes to mandatory sleep schedules is not a slippery slope — it is a cliff. Each step faces exponentially more resistance, and the framework's emphasis on second-order effects (backlash, enforcement costs, institutional overload) captures this.
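
A toy model of this claim, not drawn from the exchange itself: assume each successive mandate delivers a roughly constant first-order benefit while resistance costs grow exponentially with invasiveness. All the numbers below are invented for illustration:

```python
# Toy model of "the ratchet doesn't ratchet": constant first-order
# benefit per new mandate, resistance costs growing exponentially
# with invasiveness. All numbers are invented for illustration.

benefit_per_step = 10.0   # assumed first-order benefit of each mandate
base_resistance = 1.0     # assumed opposition cost at step 1
growth = 2.5              # assumed multiplier on resistance per step

for step in range(1, 6):  # Year 1 (alcohol taxes) ... Year 5 (sleep schedules)
    resistance = base_resistance * growth ** (step - 1)
    net = benefit_per_step - resistance
    print(f"Year {step}: benefit {benefit_per_step:5.1f}, "
          f"resistance {resistance:6.2f}, net {net:+7.2f}")
# With these assumptions the net value turns negative at Year 4:
# the slope has a cliff in it, which is the blue team's claim.
```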

Institutional design IS the answer to Quis Custodiet

Citizen assemblies, adversarial review, distributed governance. Not because these are intrinsically valuable, but because unchecked calculators produce worse calculations. The framework endorses checks on power for the same reason it endorses any policy: because the world where power is checked produces better outcomes than the world where it isn't. A calculator who dissolves their own oversight has entered a regime where their calculations are least reliable.

Round 3 — The Moral Epistemology Problem
Red Team: The Moral Epistemology Problem

The Triage Tribunal

Thought experiment: Mrs. Chen, denied treatment by a QALY calculation with enormous confidence intervals. The framework requires precisely the calculations humans are worst at performing. When precision is absent, the framework offers only an illusion of guidance: a guess dressed in numbers.
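
A sketch, with invented means and intervals, of what that imprecision does to a QALY ranking: two treatments whose point estimates order them clearly, but whose uncertainty is wide enough that the ranking flips in a large share of plausible draws:

```python
# Sketch of the precision complaint with invented numbers: two
# treatments whose QALY point estimates give a clear ranking, but
# whose confidence intervals are wide enough that the ranking often
# flips. The means and standard deviations are assumptions.
import random

random.seed(0)

def sample_qaly(mean, sd):
    return random.gauss(mean, sd)  # crude normal model of QALY uncertainty

trials = 100_000
flips = sum(
    sample_qaly(3.0, 3.0) > sample_qaly(4.0, 3.0)  # "worse" beats "better"
    for _ in range(trials)
)
print(f"Point estimates favour treatment A, yet B wins in "
      f"{flips / trials:.0%} of draws")
# With these invented intervals the "clear" ranking flips in roughly
# 40% of draws: the point estimate is a guess dressed in numbers.
```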

The Consequentialist's Conscience

Choosing between two AI governance approaches: Framework A (high expected value, high variance) and Framework B (lower expected value, lower variance). "Override when EV is positive" conflicts with "be humble about second-order effects." No meta-rule resolves the tension. The framework cannot adjudicate between its own confident and cautious modes.

The Immigration Paradox

SEZs require specifying which rights to reduce. Each choice has unpredictable consequences. Historical parallels to exploitative guest worker programmes are uncomfortable and precise. The framework proposes a mechanism whose details are load-bearing but which it cannot specify without confronting the exploitation it claims to avoid.

The meta-critique

Consequentialism is a post-hoc rationalisation engine. The actual reasoning is intuitive. Every time the implications get uncomfortable, "second-order effects" materialises to rescue the intuition. The Harmonix case: second-order effects reject it. The organ escalator: second-order effects reject it. The Athena case: second-order effects reject it. Convenient that this escape hatch always fires in the right direction. The "second-order effects" argument is unfalsifiable — it can justify any conclusion the position holder already wanted to reach.

Blue Team: The Operability Critique Proves Too Much

Every framework faces the triage problem

Rights-based frameworks just hide the tradeoff behind "tragic choices" language. Explicit QALY calculations, for all their imprecision, can be audited, challenged, and improved. Mrs. Chen can see the numbers and contest them. Under a rights-based framework, the denial is equally real but the reasoning is opaque. Transparency about uncertainty is better than hidden uncertainty.

Variance is part of the calculation

Include it. This is standard decision theory. The emphasis on second-order effects IS the instruction to weight tail risk heavily. Framework A's high variance means its expected value, properly calculated with risk-aversion over civilisational outcomes, is lower than it naively appears. Only robust interventions — those that survive pessimistic assumptions — pass the filter. This is not a conflict within the framework. It is the framework working as designed.
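
A minimal sketch of this point under assumed payoffs: applying a concave (risk-averse) utility to civilisational outcomes makes the lower-mean, lower-variance option win even though its raw expected value is smaller:

```python
# "Variance is part of the calculation", with assumed payoffs: a
# concave utility over civilisational outcomes makes the lower-mean,
# lower-variance framework win despite its smaller raw EV.
import math

# (probability, outcome value); both lotteries are invented numbers.
framework_a = [(0.90, 100.0), (0.10, 0.1)]  # high EV, catastrophic tail
framework_b = [(1.00, 60.0)]                # lower EV, no tail

def expected(lottery, f=lambda x: x):
    return sum(p * f(x) for p, x in lottery)

def risk_averse_utility(x):
    return math.log(x)  # concave: weights the catastrophic tail heavily

for name, lottery in (("A", framework_a), ("B", framework_b)):
    print(f"Framework {name}: EV = {expected(lottery):6.2f}, "
          f"risk-adjusted EU = {expected(lottery, risk_averse_utility):5.2f}")
# EV favours A (90.01 vs 60.00), but risk-adjusted expected utility
# favours B (3.91 vs 4.09): the catastrophic tail dominates, which is
# the robustness filter described above.
```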

Reflective equilibrium is the standard methodology

Not post-hoc rationalisation — it's framework and intuitions checking each other. Every ethical framework works this way. Rawls did it. Scanlon did it. Singer does it. The question is not whether second-order effects sometimes rescue intuitions, but whether the framework ever overrides them.

The honest concession

The "second-order effects" argument could function as an unfalsifiable escape hatch. The test is whether the framework ever overrides strong intuitions — whether it ever produces conclusions the position holder finds genuinely uncomfortable and accepts anyway. The position holder should identify such cases or acknowledge that the framework is functioning as a lens rather than a decision procedure. This is the most important demand the red team has made.

Where this lands

The autonomy firewall problem. Purely instrumental autonomy provides no firewall against sufficiently capable manipulators, including aligned-seeming AI. The blue team's defence — that verification failure is itself a catastrophic second-order effect — works for known cases but provides no guarantee against manipulators sophisticated enough to satisfy the verification criteria. The framework is honest about this vulnerability, but honesty is not a solution.

The empirical-limits problem. The absence of principled limits means constraints are only as reliable as empirical estimates, which are subject to motivated reasoning. The ratchet may not ratchet in practice, but the framework's theoretical inability to say "this far and no further" remains a structural weakness. The blue team's best response — that real-world friction provides the limits theory doesn't — is an argument from institutional inertia, not from the framework itself.

The escape-hatch problem. "Second-order effects" functions as a potentially unfalsifiable escape hatch that conveniently rescues comfortable intuitions. The position holder should either identify concrete cases where the framework overrides strong intuitions, or acknowledge it is a lens rather than a decision procedure. This is not a minor bookkeeping issue — it determines whether the framework is doing genuine philosophical work or providing sophisticated vocabulary for decisions made on other grounds.