AI: Red Team / Blue Team

Three rounds of adversarial stress-testing, March 2026

Testing whether the AI views are internally consistent, whether the equilibrium between effort and hopelessness is framework-justified, and whether the acknowledged tensions are worse than they appear.

Position: Pro most individual AI uses; aggregate trajectory probably net negative (x-risk, disempowerment, s-risk).

Timelines & outlook: Short timelines, fairly hopeless outlook, still trying.

Action: Works on AI safety; hasn't found obvious high-leverage intervention; money isn't the binding constraint.

Emotional weight: Significant; tried dedicating fully, burned out.

Current equilibrium: Working on safety while staying functional, accepting the gap.

Tensions: Techno-optimism elsewhere vs AI pessimism; "can't find where effort helps" is unfalsifiable.

Round 1 — The Comfortable Catastrophist
Red Team: The Comfortable Catastrophist

The doctor-and-plague analogy

Imagine a doctor who believes a plague will kill 90% of humanity, yet spends her days treating flu patients and gives 10% of her income to malaria nets. The actions don't match the beliefs. If you genuinely believe the stakes are existential, the comfortable equilibrium requires explanation.

The "equilibrium" framing reveals the problem

If the stakes are genuinely existential, forces should not balance. Calling it equilibrium smuggles in commensurability of personal wellbeing with existential outcomes. The word itself is a tell — it implies forces have found a resting point, when the correct response to existential risk is that personal comfort shouldn't register on the same scale.

The unfalsifiable load-bearing beam

"Can't find where marginal effort helps" is the single load-bearing beam holding up the equilibrium. It's unfalsifiable, which is reason to trust it less, not more. Any proposed intervention can be dismissed as "not obviously high-leverage." The belief conveniently justifies exactly the level of effort that is comfortable.

The techno-optimism contradiction

Techno-optimism elsewhere (cryonics, AVs) suggests a strong prior that technology works out. AI pessimism requires explaining why those arguments don't apply. Maybe the AI pessimism is more socially constructed than epistemically grounded — shaped by the community she's embedded in rather than by independent analysis.

The counterfactual test

Thought experiment: If she woke to proof AI will go well, what changes? If removing the belief doesn't require major restructuring, it wasn't shaping decisions much.

The "entertainment" critique extends

On short timelines, 10% to charity, "staying functional," the entire equilibrium — these are consumption goods. They're what you'd do if you wanted to feel like a good person while the world ends, not what you'd do if you were actually trying to prevent it.

Blue Team: Constrained Agency Under Catastrophic Uncertainty

The plague analogy fails

In a plague, interventions are well-understood. Distribute vaccines, quarantine the sick, fund research with clear endpoints. In AI safety, nobody knows what to do. Leading researchers disagree on theory of change. "Can't find where effort helps" is an accurate field description, not motivated reasoning — ask anyone working in alignment what the single best marginal intervention is and watch them struggle.

The demand collapses into "try harder"

Red team demands "do something more radical" — but what? Physical proximity isn't the constraint. Career risks toward what? The demand is structurally empty. It has the form of a devastating critique but offers no actionable alternative, because there may not be one.

The firefighter analogy

A firefighter who loves firefighting and would be happy as a chef isn't insincere about fire risk. She has restructured her life — 90% of resources to AI safety. The question isn't whether she's doing enough relative to some hypothetical maximum, but whether more would actually help.

AI is categorically different

AI is categorically different from cryonics and autonomous vehicles: those are bounded-domain technologies with local failure modes, while AI means systems that match or exceed human cognition across domains, including strategic planning. Expecting someone who enjoys swimming to also enjoy tsunamis conflates a difference of scale with a difference of kind.

Total self-sacrifice produces zero output

She tried intensity, and it broke her. A functional person working steadily beats a burned-out person producing nothing. This isn't a hypothetical; it's an empirical result from her own life.

The belief is real

She cries about it. Humans aren't designed to sustain genuine existential dread while remaining functional. Functioning despite it is resilience, not insincerity. The demand that beliefs be reflected in constant crisis-mode action misunderstands how human psychology works under sustained stress.

Round 2 — The Diversification Fallacy and Timeline Contradiction
Red Team: The Diversification Fallacy and Timeline Contradiction

Person A vs Person B

Thought experiment: an asteroid is coming. Person A maintains hobbies and equilibrium. Person B liquidates retirement accounts and works with controlled urgency. The consequentialist framework clearly favours B. The "burnout" defence rests on a sample size of one.

The massive EV miscalculation

If short timelines are real, expected value of global health donations is tiny. 10% redirected to speculative AI safety bets has enormous expected value. Diversification disguises a massive EV miscalculation — it's the financial equivalent of hedging your portfolio while your house is on fire.
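The red team's arithmetic can be made explicit with a toy two-scenario model. All probabilities and impact figures below are invented for illustration; nothing here is taken from the position being critiqued.

```python
# Toy expected-value comparison across two timeline scenarios.
# Numbers are hypothetical placeholders, chosen only to show the structure
# of the red team's argument.

def expected_value(p_short_timelines: float,
                   value_if_short: float,
                   value_if_long: float) -> float:
    """EV of a donation target across short- and long-timeline worlds."""
    return (p_short_timelines * value_if_short
            + (1 - p_short_timelines) * value_if_long)

p_short = 0.5  # hypothetical credence in short timelines

# Hypothetical "impact units" per dollar share in each world.
ev_global_health = expected_value(p_short, value_if_short=0.1, value_if_long=1.0)
ev_ai_safety = expected_value(p_short, value_if_short=5.0, value_if_long=0.2)

print(f"global health EV: {ev_global_health:.2f}")  # 0.55
print(f"AI safety EV:     {ev_ai_safety:.2f}")      # 2.60
```

On these made-up numbers the speculative bet dominates, which is the red team's point: once short timelines get non-trivial probability, the EV gap is driven by the stakes, not the precise credence.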

The time-allocation mismatch

She says short timelines make most moral reasoning "somewhat irrelevant" — then continues investing in ethical frameworks. Complete mismatch between stated beliefs and time allocation.

"Money isn't the binding constraint" as excuse

"Money isn't the binding constraint" should lead to identifying what is the constraint and acting on it. Instead it functions as another excuse. Together, the excuses justify exactly the life she'd live without the catastrophic belief.

Crying as release valve

The crying may be part of the equilibrium — the release valve that processes dissonance as sadness rather than a demand for change. Emotion that doesn't produce action is just consumption of the feeling of caring.

The killer question

What would it look like if she didn't believe AI was going catastrophically wrong? If "roughly what she's doing now minus the crying," the beliefs aren't decision-relevant.

Blue Team: The Tyranny of the Hypothetical Maximizer

The demand for total war

This is consequentialism's oldest failure mode: the logic that eats everything. Person C tries total war, has mental breakdown, produces less than Person A over 10 years. The hypothetical maximizer always looks better in the thought experiment than in reality.
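The blue team's claim here is essentially a claim about expected cumulative output, and it can be sketched as a toy simulation. Effort levels and burnout probabilities below are invented for illustration, not measured from anyone's life.

```python
import random

def ten_year_output(effort_per_year: float,
                    burnout_risk_per_year: float,
                    rng: random.Random) -> float:
    """Cumulative output over 10 years; burnout halts all further output."""
    total = 0.0
    for _ in range(10):
        if rng.random() < burnout_risk_per_year:
            break  # burned out: produces nothing afterwards
        total += effort_per_year
    return total

rng = random.Random(0)
trials = 10_000

# Person A: sustainable pace, low burnout risk (hypothetical numbers).
sustainable = sum(ten_year_output(1.0, 0.02, rng) for _ in range(trials)) / trials
# Person C: double the intensity, much higher burnout risk (hypothetical).
total_war = sum(ten_year_output(2.0, 0.30, rng) for _ in range(trials)) / trials

print(f"sustainable avg output: {sustainable:.1f}")
print(f"total-war avg output:   {total_war:.1f}")
```

Under these assumptions the sustainable strategy produces roughly twice the expected output, despite working half as hard in any given year: the maximizer's advantage is eaten by the probability of producing nothing at all.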

Higher intensity doesn't produce higher output

In creative and research domains, moderate sustainable effort with intellectual cross-pollination outperforms grinding. This is not a cope — it's a well-documented feature of how difficult intellectual work actually functions.

Diversification under Knightian uncertainty

Under genuine Knightian uncertainty (not calculable risk), diversification is the rational strategy. The 10% is insurance against timeline estimates being wrong. If timelines are long, the global health donations did real good. If timelines are short, the safety work was the right call. Demanding all-in commitment requires certainty the position explicitly disclaims.
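The blue team's hedging logic can be sketched as a worst-case (maximin) comparison over scenarios. The payoff figures are hypothetical, chosen only to show why a split allocation can dominate either all-in strategy when no scenario probabilities are trusted.

```python
# Toy model of diversification under deep uncertainty: with no credible
# probabilities, compare strategies by their worst-case payoff.
# All payoffs are illustrative units, not estimates from the text.

scenarios = ["short_timelines", "long_timelines"]

payoffs = {
    "all_in_safety": {"short_timelines": 10.0, "long_timelines": 0.5},
    "all_in_health": {"short_timelines": 0.2, "long_timelines": 8.0},
    "split_90_10": {"short_timelines": 9.0, "long_timelines": 1.2},
}

def worst_case(strategy: str) -> float:
    """Payoff of a strategy in its least favourable scenario."""
    return min(payoffs[strategy][s] for s in scenarios)

best = max(payoffs, key=worst_case)
print(best, worst_case(best))  # split_90_10 1.2
```

The split strategy wins on the maximin criterion precisely because it does real good in both worlds, which is the insurance argument stated formally: under Knightian uncertainty you optimize the floor, not a weighted average you cannot compute.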

Ethical reasoning IS AI safety work

Alignment is fundamentally an ethics problem. Training moral reasoning capacity isn't separate from the mission — it's a core component. The time-allocation mismatch dissolves if you recognise that thinking carefully about values is directly relevant to building systems that respect them.

The career-level counterfactual

Removing the AI belief changes almost everything — she wouldn't work in safety, wouldn't orient her career around it, wouldn't talk to alignment researchers. The belief operates at the strategic level, not tactical. Red team looks at daily life and sees similarity; the structural differences are enormous.

The unanswered question

Red team never answers: "What would convince her to try harder again?" If no answer exists, the critique is as unfalsifiable as what it attacks. A demand with no satisfiable conditions is not an argument — it's a trap.

Round 3 — The Recursive Trap
Red Team: The Recursive Trap — When Hopelessness Becomes Self-Fulfilling

The dam engineer thought experiment

A dam engineer believes the dam will fail. Version 1 says "it's fairly hopeless, I cry about it." Version 2 presents a ranked list of interventions with costs and probabilities attached. Same belief, different outcomes. Version 2 treats communication itself as a consequential act. A second-order consequentialist should be Version 2.

Broadcasting despair has consequences

A second-order consequentialist should evaluate the effects of expressing hopelessness. "Fairly hopeless, still trying" is a demotivator — people update toward "smart people think it's hopeless, why bother?" Broadcasting despair while believing broadcasts have consequences is a first-order failure of the framework she endorses.

Hopelessness as epistemic attractor

Hopelessness is an epistemic attractor state: bad news confirms it, good news gets explained away. Low motivation from hopelessness reduces energy for seeking disconfirming evidence. Classic feedback loop. The position may be empirically correct, but the epistemic hygiene around maintaining it is poor.

Moral learned helplessness

"I tried hard, burned out, therefore trying hard doesn't work" generalises from past failure to future impossibility. Feels like wisdom, might be giving up. The sample size problem cuts both ways — one person's burnout doesn't prove the approach is wrong.

The critical question

What would convince her to try harder again? If she can't specify, the position is unfalsifiable.

The 2035 thought experiment

If things go well in 2035, it was because of people who maintained urgency despite uncertainty, not people who rested in sad equilibria. History favours the people who kept pushing.

Blue Team: The Morality of Sustainable Resistance

The argument that produced the burnout

The red team's argument is the exact reasoning that produced the burnout. Its practical effect, if accepted, would destroy the person it claims to help. This is not a rebuttal — it's an empirical observation about what happens when this logic is actually applied.

The dam analogy assumes listable interventions

A dam engineer knows the failure modes; nobody knows the failure modes of superintelligent AI. AI safety is more like a strange illness no one has seen before, so demanding a ranked intervention list would produce false confidence, not insight. Demanding the form of engineering certainty is demanding the wrong thing.

False hope has bad second-order effects too

Reduced urgency, reduced political will for regulation, reduced accuracy. Honest fear can motivate — the perception that smart, sincere people are scared is motivating for recruitment. The second-order effects of honest pessimism are not obviously worse than the second-order effects of performed optimism.

The surgeon analogy

Extreme beliefs don't require extreme action — they require well-calibrated action. Optimal action under extreme beliefs is the action most likely to produce good outcomes, which may look moderate from the outside. A surgeon facing a 90% mortality case doesn't operate more frantically; they operate more carefully.

Persistence over heroism

Most real problems are solved by large numbers of people working competently and persistently, not by a few heroes pushing beyond their limits. The model of individual sacrifice is romantic but historically inaccurate.

The unanswered question persists

Red team never answered: what would convince her to try harder? If no answer exists, the critique is a trap regardless of how wise it feels. A demand that cannot be satisfied is not wisdom — it's cruelty dressed as rigour.

Where this lands

The position survives as a defensible but imperfect response to a genuinely terrible situation. Its greatest vulnerability is not logical inconsistency but the risk that pragmatic accommodation becomes permanent surrender — and the person holding it may not be able to tell the difference from the inside.

The most important unresolved question: what specific evidence or changed circumstances would justify re-evaluating the equilibrium?