A comprehensive guide to the theories, thought experiments, and debates that shape how we think about rational choice
The story of decision theory begins with gamblers, theologians, and mathematicians trying to answer a deceptively simple question: when facing uncertainty, what should you do?
Over three centuries, the answer evolved from intuitive rules of thumb into one of the most elegant mathematical frameworks in all of philosophy. But cracks appeared almost immediately—and those cracks eventually split the field wide open.
In the summer of 1654, Blaise Pascal and Pierre de Fermat exchanged a series of letters about the "problem of points"—how to fairly divide the stakes of an interrupted gambling game. In working out the answer, they invented probability theory.
Pascal then applied this new mathematics to the biggest bet of all. In his Pensées (published posthumously in 1670), he presented what is now called Pascal's Wager: a decision-theoretic argument for believing in God.
Either God exists or God doesn't. You can either believe or not believe. If God exists and you believe, you gain infinite happiness. If God exists and you don't believe, you suffer. If God doesn't exist, the cost of belief is finite. Therefore, for any positive probability that God exists, the expected value of believing is infinite.
The Wager is often dismissed today, but its significance for decision theory is enormous: it was the first explicit use of what we now call expected value reasoning applied to a practical choice under uncertainty. Pascal introduced the idea of multiplying the probability of each outcome by its value and summing the results—the core operation that would eventually become expected utility theory.
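To make that operation concrete, here is a minimal sketch in Python (the die bet and its numbers are invented for illustration, not Pascal's own):

```python
# Expected value: weight each outcome's value by its probability and sum.
def expected_value(outcomes):
    """outcomes: iterable of (probability, value) pairs whose probabilities sum to 1."""
    return sum(p * v for p, v in outcomes)

# Illustrative bet: win $6 if a fair die shows six, nothing otherwise.
print(expected_value([(1/6, 6.0), (5/6, 0.0)]))  # 1.0 -- the fair price of the bet
```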
Stanford Encyclopedia of Philosophy: Pascal's Wager Comprehensive entry covering all versions of the argument, the infinite utility problem, and the many-gods objection.

In 1713, Nicolas Bernoulli posed a puzzle in a letter to Pierre Raymond de Montmort that would haunt decision theory for centuries. His cousin Daniel Bernoulli published the definitive analysis in 1738.
A fair coin is flipped repeatedly until it lands tails. If the first tails appears on flip $n$, you receive $2^n$ dollars. What is this game worth to you?
The expected monetary value is:
$$E[\text{payoff}] = \sum_{n=1}^{\infty} \frac{1}{2^n} \cdot 2^n = \sum_{n=1}^{\infty} 1 = \infty$$

Yet no one would pay a million dollars for a single play.
Daniel Bernoulli's solution was revolutionary: people don't maximize expected money—they maximize expected utility. He proposed that utility is a concave function of wealth, specifically logarithmic:
$$u(w) = \ln(w)$$

Under this assumption, the expected utility of the St. Petersburg game is finite, resolving the paradox. More importantly, Bernoulli introduced the foundational concept of decision theory: the utility function—a mathematical representation of subjective value that can differ from objective monetary value.
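A quick numerical sketch (truncating the infinite sum, and taking utility to be the log of the prize alone, ignoring initial wealth) shows how the concave utility tames the divergence:

```python
import math

N = 60  # truncation point; the monetary sum keeps growing with N, the utility sum does not

ev_money = sum(2**-n * 2**n for n in range(1, N + 1))            # equals N: diverges as N grows
eu_log   = sum(2**-n * math.log(2**n) for n in range(1, N + 1))  # converges to 2*ln(2)

print(ev_money)            # 60.0
print(eu_log)              # ~1.386
print(math.exp(eu_log))    # ~4.0: certainty equivalent of roughly $4 under log utility
```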
SEP: The St. Petersburg Paradox Full history from Nicolas Bernoulli through modern resolutions, including bounded utility and risk-weighting approaches.

Frank Ramsey, a Cambridge mathematician who died at age 26, wrote a paper that was decades ahead of its time. In "Truth and Probability" (written 1926, published posthumously 1931), Ramsey showed how to derive both probability and utility simultaneously from an agent's preferences over bets.
His key insight: if you prefer bet A to bet B, and bet B to bet C, these preferences implicitly reveal both how much you value the outcomes and how likely you think the relevant events are. There's no need to start with an objective probability—probability emerges from the structure of coherent preferences.
Ramsey also introduced the Dutch Book argument: if your degrees of belief don't satisfy the probability axioms, a bookie can construct a series of bets that guarantee you lose money regardless of what happens. This was the first rigorous argument that rational beliefs must obey probability theory.
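A toy version of the construction, assuming a hypothetical agent whose credences in an event and its negation sum to more than 1 and who will buy a $1 bet on any event at a price equal to their credence:

```python
# Dutch Book sketch: incoherent credences are exploitable.
credence_E, credence_not_E = 0.6, 0.5   # sums to 1.1 -- violates the probability axioms

# The bookie sells the agent a $1 bet on E and a $1 bet on not-E at those prices.
for E_occurs in (True, False):
    winnings = 1.0                       # exactly one of the two bets pays out
    cost = credence_E + credence_not_E   # the agent paid 1.1 up front
    print(E_occurs, round(winnings - cost, 2))   # -0.1 either way: a guaranteed loss
```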
Ramsey's work was largely ignored until Leonard Savage rediscovered and extended it in the 1950s. Today he is recognized as the true pioneer of subjective probability.
In their monumental Theory of Games and Economic Behavior (1944), John von Neumann and Oskar Morgenstern provided the first rigorous axiomatization of expected utility theory. Their result, the VNM utility theorem, showed that if your preferences over lotteries satisfy four seemingly innocuous axioms, then you must be acting as if you are maximizing expected utility for some utility function.
The four axioms:

- Completeness: any two lotteries can be compared.
- Transitivity: preferences do not cycle.
- Continuity: if $A \succeq B \succeq C$, some probability mixture of $A$ and $C$ is ranked exactly even with $B$.
- Independence: mixing two lotteries with the same third lottery, in the same proportion, does not change their ranking.
The Independence axiom is the crucial one—and the one that would be challenged by Allais. It says that your preference between two options shouldn't change just because you mix both with the same third option.
The theorem: If preferences satisfy these four axioms, there exists a utility function $u$ (unique up to positive affine transformation) such that:
$$L_1 \succeq L_2 \iff \mathbb{E}[u(L_1)] \geq \mathbb{E}[u(L_2)]$$

This is a representation theorem: it doesn't say you should maximize expected utility; it says that any "coherent" preferences can be described as expected utility maximization. The normative force comes from arguing that the axioms are requirements of rationality.
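A small sketch of what "unique up to positive affine transformation" means in practice: rescaling the utility function as $a \cdot u + b$ with $a > 0$ never changes which lottery has higher expected utility (the utilities and lotteries below are invented for illustration):

```python
def expected_utility(lottery, u):
    """lottery: list of (probability, monetary outcome) pairs."""
    return sum(p * u(x) for p, x in lottery)

u = lambda x: x ** 0.5          # an arbitrary concave (risk-averse) utility function
v = lambda x: 3.0 * u(x) + 7.0  # a positive affine transformation of u

L1 = [(1.0, 100)]               # $100 for certain
L2 = [(0.5, 300), (0.5, 0)]     # coin flip between $300 and $0

print(expected_utility(L1, u) > expected_utility(L2, u))  # True: L1 preferred under u
print(expected_utility(L1, v) > expected_utility(L2, v))  # True: same ranking under v
```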
Yudkowsky: "Coherent Decisions Imply Consistent Utilities" (LessWrong/Arbital) An accessible, non-technical introduction to why violating the VNM axioms means "leaving money on the table." Written for the LessWrong audience. "Why You Must Maximize Expected Utility" (LessWrong) A more mathematical walk-through of the VNM theorem and Dutch Book arguments.At a famous 1952 Paris conference attended by many of the founders of decision theory, French economist Maurice Allais presented a set of choices designed to embarrass expected utility theory. Consider:
Choice 1:

- 1A: $1,000,000 with certainty.
- 1B: $1,000,000 with probability 0.89, $5,000,000 with probability 0.10, nothing with probability 0.01.

Choice 2:

- 2A: $1,000,000 with probability 0.11, nothing with probability 0.89.
- 2B: $5,000,000 with probability 0.10, nothing with probability 0.90.
Most people prefer 1A over 1B (the certainty is appealing), but also prefer 2B over 2A (might as well go for the bigger prize). This combination violates the Independence axiom.
The algebraic proof is clean. If you prefer 1A to 1B:
$$u(1M) > 0.89 \cdot u(1M) + 0.10 \cdot u(5M) + 0.01 \cdot u(0)$$

Rearranging: $0.11 \cdot u(1M) > 0.10 \cdot u(5M) + 0.01 \cdot u(0)$
But preferring 2B to 2A implies:
$$0.10 \cdot u(5M) + 0.90 \cdot u(0) > 0.11 \cdot u(1M) + 0.89 \cdot u(0)$$

Which gives: $0.10 \cdot u(5M) + 0.01 \cdot u(0) > 0.11 \cdot u(1M)$
This directly contradicts the first inequality. The common choice pattern is algebraically inconsistent with any utility function.
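The inconsistency can also be checked by brute force. Normalizing $u(0) = 0$ and $u(1M) = 1$ (harmless, since utility is only defined up to positive affine transformation), no value of $u(5M)$ satisfies both preferences:

```python
import numpy as np

found = False
for u5 in np.linspace(1.0, 100.0, 100_000):     # candidate values of u(5M), with u(0)=0, u(1M)=1
    prefers_1A = 1.0 > 0.89 * 1.0 + 0.10 * u5   # requires u(5M) < 1.1
    prefers_2B = 0.10 * u5 > 0.11 * 1.0         # requires u(5M) > 1.1
    if prefers_1A and prefers_2B:
        found = True
        break

print(found)   # False: the common Allais pattern fits no utility function
```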
Legend has it that Leonard Savage himself displayed the Allais pattern when first confronted with the choices, then revised his answers upon seeing the proof. The Allais Paradox demonstrates the certainty effect: people overweight outcomes that are certain relative to outcomes that are merely probable, violating the Independence axiom. This later inspired Kahneman and Tversky's Prospect Theory.
Leonard Savage's The Foundations of Statistics (1954) is perhaps the most important single work in decision theory. Savage synthesized the ideas of Ramsey (subjective probability from preferences) and von Neumann-Morgenstern (expected utility from axioms) into a unified framework that simultaneously derives both a subjective probability function and a utility function from preferences.
Savage's framework has three primitives: states of the world (things beyond the agent's control), consequences (outcomes the agent cares about), and acts (functions from states to consequences—i.e., what happens depends on what you do and how the world is).
His seven axioms (P1–P7) include the celebrated Sure-Thing Principle:
If you would prefer act $f$ to act $g$ both when event $E$ obtains and when event $E$ does not obtain, then you should prefer $f$ to $g$ unconditionally.
This is Savage's version of the Independence axiom, and it's the principle that both the Allais Paradox and Newcomb's Problem put under pressure.
Savage's representation theorem: Preferences satisfying his axioms uniquely determine a probability measure $P$ over states and a utility function $u$ over consequences such that act $f$ is preferred to act $g$ if and only if:
$$\mathbb{E}_P[u(f)] = \int u(f(s)) \, dP(s) \geq \int u(g(s)) \, dP(s) = \mathbb{E}_P[u(g)]$$

A crucial feature of Savage's framework: states are assumed to be probabilistically independent of acts. This assumption is exactly what Evidential Decision Theory would later challenge.
SEP: Decision Theory Comprehensive survey of normative decision theory, covering Savage's framework, the VNM theorem, and the debates they spawned.

Daniel Ellsberg (yes, the Pentagon Papers Ellsberg—he was an economist before he was a whistleblower) presented another challenge to expected utility theory in his 1961 paper "Risk, Ambiguity, and the Savage Axioms."
An urn contains 30 red balls and 60 balls that are either green or blue, in unknown proportions. You can bet on drawing a specific color; every winning bet pays the same prize.

- Bet A: win if the ball drawn is red (known probability 1/3).
- Bet B: win if the ball drawn is green (probability anywhere between 0 and 2/3).

Most people prefer A. But now consider:

- Bet C: win if the ball drawn is red or blue.
- Bet D: win if the ball drawn is green or blue (known probability 2/3).
Most people prefer D. But preferring A to B implies $P(\text{red}) > P(\text{green})$, and preferring D to C implies $P(\text{green or blue}) > P(\text{red or blue})$, i.e., $P(\text{green}) > P(\text{red})$—a contradiction.
The Ellsberg Paradox reveals ambiguity aversion: people prefer known risks over unknown risks, even when no assignment of probabilities can rationalize their preferences. This violates Savage's framework, which requires a unique subjective probability over all events.
This matters for our story because it shows that the classical foundations, while beautiful, don't fully capture human reasoning about uncertainty. And the biggest challenge to those foundations was just around the corner.
"The Savage Theorem and the Ellsberg Paradox" (LessWrong) A clear walkthrough of how the Ellsberg choices violate Savage's Sure-Thing Principle, with discussion of whether ambiguity aversion is rational.In 1963, at a cocktail party, Harvard philosopher Robert Nozick heard about a puzzle from mathematician Martin Kruskal, who had learned it from physicist William Newcomb at Lawrence Livermore Laboratory. Nozick later called it "the most consequential party I have attended."
In 1969, Nozick published the problem, and it detonated like a bomb in the middle of decision theory. Decades later, the rubble is still being sorted.
A superintelligent being called Omega (a near-perfect predictor) presents you with two boxes:

- Box A is transparent and contains $1,000.
- Box B is opaque and contains either $1,000,000 or nothing.
You may take both boxes ("two-box") or only Box B ("one-box").
The catch: Omega has already predicted your choice. If it predicted you'd one-box, it placed $1,000,000 in Box B. If it predicted you'd two-box, Box B is empty. Omega has been correct in every observed case.
The argument for two-boxing (dominance): Box B already contains whatever it contains. Your choice can't change the past. Whatever is in Box B, taking both boxes gets you $1,000 more. Two-boxing strictly dominates.
$$\forall S: \quad U(\text{two-box} \mid S) > U(\text{one-box} \mid S)$$

The argument for one-boxing (expected utility): One-boxers walk away with $1,000,000. Two-boxers walk away with $1,000. If Omega's prediction accuracy is $p$:
$$EU(\text{one-box}) = p \cdot \$1{,}000{,}000 + (1-p) \cdot \$0$$

$$EU(\text{two-box}) = (1-p) \cdot \$1{,}001{,}000 + p \cdot \$1{,}000$$

For $p = 0.99$: one-boxing yields $\$990{,}000$ in expectation vs. two-boxing's $\$11{,}000$. The break-even accuracy is remarkably low: $p \approx 0.5005$.
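The calculation is easy to reproduce (a minimal sketch using the payoffs above):

```python
def eu_one_box(p):  # p = Omega's prediction accuracy
    return p * 1_000_000 + (1 - p) * 0

def eu_two_box(p):
    return (1 - p) * 1_001_000 + p * 1_000

print(eu_one_box(0.99), eu_two_box(0.99))   # 990000.0 11000.0

# Break-even accuracy: solve p * 1_000_000 = (1 - p) * 1_001_000 + p * 1_000
p_star = 1_001_000 / 2_000_000
print(p_star)                               # 0.5005
```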
Both arguments seem airtight, and yet they give opposite answers. As Nozick observed, when he presented the problem to people, they "divide almost evenly, with large numbers thinking that the [other side is] just being silly."
The problem forces a choice between two bedrock principles of rational decision-making—dominance reasoning and expected utility maximization—and in doing so, it splits decision theory into two camps that are still arguing today.
Nozick, "Newcomb's Problem and Two Principles of Choice" (1969, PDF) The original paper that started it all. Presents the problem, both arguments, and Nozick's (tentative) two-boxing position. Yudkowsky, "Newcomb's Problem and Regret of Rationality" (LessWrong) Yudkowsky's influential argument for one-boxing: "If it's stupid but it works, it's not stupid." SEP: Causal Decision Theory The main encyclopedia entry covering Newcomb's Problem in depth, including formal setups and both sides of the debate.How do people actually split? The 2020 PhilPapers Survey of professional philosophers found: 39% two-box, 31.2% one-box, ~30% other/undecided. Martin Gardner's 1973 Scientific American column generated reader mail running ~71% for one-boxing. The LessWrong rationalist community overwhelmingly one-boxes, influenced by FDT-style reasoning.
Oesterheld, "A Survey of Polls on Newcomb's Problem" Meta-survey compiling results from multiple polls of philosophers, students, and the general public.Evidential Decision Theory traces to Richard Jeffrey's The Logic of Decision (1965, revised 1983), though Jeffrey didn't use the term "EDT"—the label was crystallized later when Gibbard and Harper (1978) drew the EDT/CDT distinction.
EDT's prescription is elegantly simple: choose the action that is the best news you could learn about yourself. More formally: choose the action with the highest conditional expected utility.
$$V(A) = \sum_{s} P(s \mid A) \cdot U(A, s)$$

where $P(s \mid A)$ is your conditional probability of state $s$ given that you perform act $A$, and $U(A,s)$ is the utility of the outcome.
The key word is conditional. Your action is treated as evidence about the state of the world. If one-boxing is evidence that Box B contains a million dollars (because Omega predicted your one-boxing), then one-boxing has high conditional expected utility.
Jeffrey's major innovation was treating acts, states, and outcomes as propositions in a single Boolean algebra, rather than maintaining Savage's tripartite distinction. This means acts can be probabilistically dependent on states—precisely what's needed for Newcomb-like reasoning.
Jeffrey's desirability formula:
$$\text{Des}(A) = \sum_{i} P(S_i \mid A) \cdot \text{Des}(A \wedge S_i)$$

Jeffrey also developed probability kinematics (Jeffrey conditionalization), a generalization of Bayesian updating for uncertain evidence:
$$P_{\text{new}}(H) = \sum_{i} P_{\text{old}}(H \mid E_i) \cdot P_{\text{new}}(E_i)$$

This plays a supporting role in EDT by providing the epistemological foundation for how agents update beliefs.
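A minimal sketch of Jeffrey conditionalization with invented numbers: the agent glimpses a cloth in poor light and shifts credence over the evidence partition {looks green, looks blue} to 0.7/0.3, without becoming certain of either cell.

```python
# Joint prior over hypothesis H ("the cloth is green") and evidence partition {E1, E2}.
P_old = {("H", "E1"): 0.28, ("H", "E2"): 0.02,
         ("~H", "E1"): 0.12, ("~H", "E2"): 0.58}

P_old_E1 = P_old[("H", "E1")] + P_old[("~H", "E1")]   # 0.40
P_old_E2 = P_old[("H", "E2")] + P_old[("~H", "E2")]   # 0.60

P_new_E = {"E1": 0.7, "E2": 0.3}   # uncertain evidence: the partition shifts, nothing becomes certain

# P_new(H) = sum_i P_old(H | E_i) * P_new(E_i)
P_new_H = (P_old[("H", "E1")] / P_old_E1) * P_new_E["E1"] \
        + (P_old[("H", "E2")] / P_old_E2) * P_new_E["E2"]
print(round(P_new_H, 3))   # 0.5
```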
EDT: One-box. ($1,000,000)
One-boxing is evidence that Omega predicted one-boxing, so Box B almost certainly contains $1M. EDT follows the evidence.
EDT (naive): Don't smoke. EDT (with tickle defense): Smoke.
Naive EDT says smoking is evidence of the cancer-causing lesion. But the tickle defense (Eells 1982) argues that once you know your own desire to smoke, the act itself provides no additional evidence about the lesion. Sophisticated EDT smokes.
EDT: Cooperate. ($5 each)
Your cooperation is evidence that your twin cooperates. EDT cooperates, aligning with Hofstadter's "superrationality."
The most prominent contemporary EDT defender is Arif Ahmed, whose Evidence, Decision and Causality (Cambridge, 2014) mounts a systematic defense. Ahmed argues EDT is more parsimonious than CDT (it requires only probability, not a theory of causation), and presses the "Why Ain'cha Rich?" argument: if two-boxing is rational, why do two-boxers end up with $1,000 while one-boxers get $1,000,000?
Ahmed also presents the Betting on the Past scenario, where CDT recommends a predictably losing bet, and argues this is a reductio of CDT.
Summary of Ahmed's "Evidence, Decision and Causality" (EA Forum) Detailed chapter-by-chapter summary of the most important contemporary defense of EDT.

David Lewis famously accused EDT of recommending "an irrational policy of managing the news"—choosing actions that give you good news about the world rather than actions that actually make the world better. EDT's defenders reply that in Newcomb-like cases, the good news is the good outcome: one-boxers really do end up richer.
Jeffrey, The Logic of Decision (U. Chicago Press, 1983) The foundational text. Readable and elegant. The 1983 edition adds the crucial discussion of ratifiability.

Causal Decision Theory emerged as a direct response to EDT's recommendation in Newcomb's Problem. Its founders—Allan Gibbard, William Harper, David Lewis, and Brian Skyrms—argued that rational decision-making should attend to what your actions cause, not merely what they're evidence for.
CDT replaces EDT's conditional probabilities with causal or counterfactual probabilities. The key question becomes: "What would happen if I were to do $A$?"—using a subjunctive conditional rather than conditioning on evidence.
Lewis's formulation (1981) uses dependency hypotheses:
$$U(A) = \sum_K P(K) \cdot V(A \wedge K)$$

where $K$ ranges over dependency hypotheses—maximally specific propositions about how outcomes depend on actions, held fixed while evaluating the action.
Gibbard-Harper formulation (1978) uses counterfactual conditionals:
$$U(A) = \sum_S P(A \boxright S) \cdot V(S)$$

where $A \boxright S$ means "if $A$ were performed, $S$ would obtain." The probability $P(A \boxright S)$ is evaluated using Stalnaker's closest-world semantics, not by conditioning on $A$ as evidence.
There's also a deep connection to Judea Pearl's do-calculus, which formalizes causal reasoning using interventions in structural causal models. Pearl's $P(Y \mid \text{do}(X))$ captures exactly the causal probability CDT needs, distinguishing seeing that $X$ happened from making $X$ happen.
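A toy sketch of the seeing/doing distinction on the Smoking Lesion structure (all numbers invented): the lesion causes both smoking and cancer, smoking itself does nothing, so conditioning on smoking raises the probability of cancer while intervening on smoking leaves it at the base rate.

```python
# Structural model: Lesion -> Smoke, Lesion -> Cancer; no edge from Smoke to Cancer.
p_lesion = 0.2
p_smoke_given_lesion  = {True: 0.9, False: 0.1}
p_cancer_given_lesion = {True: 0.8, False: 0.05}   # smoking is causally irrelevant

def p_of_lesion(l):
    return p_lesion if l else 1 - p_lesion

# Observational: P(cancer | smoke) -- smoking is evidence about the lesion.
p_smoke = sum(p_of_lesion(l) * p_smoke_given_lesion[l] for l in (True, False))
p_cancer_and_smoke = sum(p_of_lesion(l) * p_smoke_given_lesion[l] * p_cancer_given_lesion[l]
                         for l in (True, False))
print(round(p_cancer_and_smoke / p_smoke, 3))   # ~0.569: seeing smoking raises P(cancer)

# Interventional: P(cancer | do(smoke)) -- cut the Lesion -> Smoke edge, keep the lesion's prior.
print(sum(p_of_lesion(l) * p_cancer_given_lesion[l] for l in (True, False)))   # 0.2: the base rate
```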
CDT: Two-box. ($1,000)
Your choice cannot causally affect Omega's past prediction. Box B's contents are fixed. Two-boxing strictly dominates.
CDT: Smoke. (Correct)
Smoking doesn't cause cancer (the lesion does). CDT correctly ignores the evidential correlation and recommends the pleasurable, causally harmless action.
CDT: Defect. ($1)
Your choice can't causally affect your twin's independent decision. Defection is the dominant strategy. CDT defects—and both twins end up with $1 each.
Joyce's The Foundations of Causal Decision Theory (Cambridge, 1999) provides the most comprehensive book-length defense of CDT, including a representation theorem showing that both CDT and EDT are instances of a more general conditional decision theory.
CDT's Achilles' heel is Newcomb's Problem itself. CDT two-boxes and gets $1,000 while one-boxers get $1,000,000. Gibbard and Harper themselves noticed this concern, and CDT defenders have struggled with the "Why Ain'cha Rich?" retort ever since.
Beyond Newcomb's, CDT faces problems with:

- Decision instability in cases like Death in Damascus and the Psychopath Button, where no pure act is ratifiable.
- Commitment problems like Parfit's Hitchhiker and the Toxin Puzzle, where the inability to bind one's future self is costly.
- The persistent "Why Ain'cha Rich?" challenge: the agents CDT calls irrational keep ending up richer.
Decision theories are tested by thought experiments the way physical theories are tested by experiments. Each problem probes a different aspect of rational choice. Here's the bestiary, with what each theory recommends.
A genetic lesion causes both (a) a desire to smoke and (b) lung cancer. Smoking itself does not cause cancer. Smoking is pleasurable. Should you smoke?
| Theory | Answer | Reasoning |
|---|---|---|
| EDT (naive) | Don't smoke | Smoking is evidence of the lesion/cancer |
| EDT (tickle) | Smoke | Desire screens off the act from the lesion |
| CDT | Smoke | Smoking doesn't cause cancer |
| FDT | Smoke | The lesion isn't computing your decision function |
This is CDT's showcase problem and EDT's embarrassment. It's also the mirror image of Newcomb's: in Newcomb's, EDT gets it "right"; here, EDT gets it "wrong." FDT claims to handle both correctly.
Death, a perfect predictor, tells you in Damascus that you have an appointment tomorrow. Death's appointment book (written in advance) lists the city where you will die. You can stay in Damascus or flee to Aleppo. Death will be waiting wherever the book says.
CDT enters an infinite regret loop: if you plan to stay, Death is in Damascus, so you want to flee; but if you plan to flee, Death is in Aleppo, so you want to stay. No pure strategy is self-ratifying under CDT. CDT typically resolves this via mixed strategies (flip a fair coin).
FDT recognizes that either way you're doomed (Death predicted correctly), so it stays to save the trouble of traveling. Levinstein & Soares (2020) provide a careful analysis distinguishing different versions of Death's prediction rule.
Levinstein & Soares, "Cheating Death in Damascus" (2020, PDF) Published in The Journal of Philosophy. Distinguishes versions of Death in Damascus and shows how FDT handles each. The most rigorous academic treatment of FDT.You're dying in the desert. A driver will save you if and only if she predicts (with ~99% accuracy via reading micro-expressions) that you will pay her $100 upon reaching town. Once you're safely in town, should you pay?
| Theory | Answer | Result |
|---|---|---|
| CDT | Don't pay | Dies in desert (can't credibly commit) |
| EDT | Pay | Survives (paying is evidence of the type the driver picks up) |
| FDT | Pay | Survives (decision algorithm determines both prediction and action) |
This is structurally equivalent to the Transparent Newcomb Problem and is one of the clearest cases where CDT's inability to make credible commitments is costly.
A billionaire will pay you $1,000,000 tomorrow morning if at midnight tonight you genuinely intend to drink a mildly unpleasant (but harmless) toxin tomorrow afternoon. You need not actually drink it—you just need to form a genuine intention. Can you?
The puzzle isolates intentions from actions. Both CDT and EDT struggle: a CDT agent who knows it won't drink can't form a genuine intention. FDT-style agents can intend to drink because they evaluate the policy of "intend and follow through" as superior to "try to game it."
Newcomb's Problem, but both boxes are transparent—you can see whether Box B contains $1M before choosing. If you see the $1M, should you take just Box B or both boxes?
If you see $1M in Box B, one-boxing means knowingly leaving $1,000 on the table. Yet FDT one-boxes: the reason you see the million is that you're the kind of agent who one-boxes. If you were a two-boxer, you'd be staring at an empty box. FDT agents seeing $1M in the box get $1M; CDT agents consistently see $0.
Omega flips a fair coin. Heads: Omega gives you $10,000 if it predicts you'd pay $100 on tails. Tails: Omega asks you for $100. You see tails. Do you pay?
| Theory | Answer | Expected Value of Policy |
|---|---|---|
| CDT | Don't pay | $0 (never gets the $10,000) |
| EDT | Don't pay | $0 (after updating on tails, paying is pure loss) |
| TDT | Struggles | Updates on tails, may refuse |
| UDT/FDT | Pay | $4,950 (= 0.5 × $10,000 − 0.5 × $100) |
This is the problem that broke TDT and motivated UDT. The "pay" policy dominates when evaluated from the prior (before seeing the coin), but all "updateful" theories (CDT, EDT, TDT) refuse to pay after seeing tails.
"Counterfactual Mugging" (LessWrong, original post by Vladimir Nesov) The original formulation. Sparked extensive debate about updatelessness and the nature of rational commitment.You hear a rumor about $1M termites. A greedy predictor sends a letter: "I sent this iff exactly one of: (i) no termites and you pay me $1,000, or (ii) termites and you don't pay." You received the letter. Pay?
EDT pays (it's "good news" about not having termites). CDT and FDT refuse—the termites are already there or not, independent of your decision algorithm. XOR Blackmail is a clean counterexample to EDT that doesn't involve the Smoking Lesion's common-cause structure.
Paul can press a button that kills all psychopaths. He believes only a psychopath would press it. Paul strongly prefers living to a world without psychopaths.
CDT presses (pressing doesn't cause you to be a psychopath), which likely kills Paul. EDT doesn't press (pressing is evidence of psychopathy). FDT doesn't press either—your decision algorithm determines both your action and your character, and the algorithm that outputs "don't press" is evidence of non-psychopathy.
| Problem | EDT | CDT | FDT |
|---|---|---|---|
| Newcomb's Problem | One-box ($1M) | Two-box ($1K) | One-box ($1M) |
| Smoking Lesion | Don't smoke* | Smoke | Smoke |
| Death in Damascus | Unstable | Mixed strategy | Stay (context-dep.) |
| Twin PD | Cooperate ($5) | Defect ($1) | Cooperate ($5) |
| Parfit's Hitchhiker | Pay (survives) | Don't pay (dies) | Pay (survives) |
| Toxin Puzzle | Can't intend | Can't intend | Intends & drinks |
| Transparent Newcomb | Two-box ($0) | Two-box ($0) | One-box ($1M) |
| Counterfactual Mugging | Don't pay | Don't pay | Pay |
| XOR Blackmail | Pay (wrong) | Don't pay | Don't pay |
| Psychopath Button | Don't press | Press (dies!) | Don't press |
*EDT with the tickle defense smokes. The table shows naive EDT.
The pattern: EDT gets Newcomb-like problems right (where prediction tracks your algorithm) but fails on "medical Newcomb" problems (where correlation doesn't track your algorithm). CDT gets medical Newcomb right but fails on standard Newcomb. FDT claims to get both classes right by asking: is the correlation mediated by something computing your decision function?
By the late 2000s, the EDT-CDT stalemate had persisted for decades. The first serious attempt at a third way came from the rationalist community around LessWrong.
Douglas Hofstadter's "Superrationality" (1983) planted the seed. In the Scientific American essay later collected in Metamagical Themas (1985), Hofstadter argued that identical reasoners in a symmetric Prisoner's Dilemma should cooperate: "Whatever I decide, my opponent decides the same thing, so I'm choosing between mutual cooperation and mutual defection."
Gary Drescher's Good and Real (2006) developed this further with the concept of subjunctive means-end relations—non-causal links between actions and outcomes that are stronger than mere evidence. Drescher argued for "acausal" counterfactual reasoning where it makes sense to act as if your choice affects conditions preceding the choice, even without any causal link.
Drescher, Good and Real (MIT Press, 2006, PDF via Gwern) The conceptual ancestor of logical decision theories. Develops "subjunctive means-end relations" and applies them to Newcomb's Problem, the PD, and ethics.

In 2010, Eliezer Yudkowsky published "Timeless Decision Theory" through MIRI. The central thesis:
Agents should decide as if they are determining the output of the abstract computation that they implement, including the output of all other instantiations and simulations of that computation.
This is the fundamental insight of the "logical decision theory" family. You're not choosing an action—you're choosing the output of an algorithm. Since the same algorithm may be instantiated in your brain, simulated by Omega, running in your twin's brain, etc., choosing its output simultaneously determines what happens everywhere it runs.
TDT extends causal Bayesian networks with computation nodes representing abstract computations. The agent's decision is modeled as a computation node that influences both the agent's own physical action and anything else that instantiates or models the same computation (Omega's prediction, a twin's deliberation, a simulation).

The algorithm: perform a causal intervention on the output of that computation node (rather than on the physical act alone), propagate the consequences through every node that depends on it, and output the value with the highest expected utility.
TDT one-boxes on Newcomb's (like EDT), smokes on the Smoking Lesion (like CDT), cooperates in the Twin PD, and pays in Parfit's Hitchhiker. It was designed to be the theory that wins—and it does, on these cases.
TDT's critical flaw: it updates on observations before computing expected utility. After seeing tails in Counterfactual Mugging, TDT reasons within the "tails branch" and may refuse to pay—missing the cross-branch benefits of the "always pay" policy.
This flaw directly motivated the development of Updateless Decision Theory.
Yudkowsky, "Timeless Decision Theory" (MIRI, 2010, PDF) The original TDT paper. 16 pages. Develops timeless decision diagrams and proves TDT's reflective consistency. Yudkowsky, "TDT: Problems I Can't Solve" (LessWrong) Yudkowsky's own list of open problems and limitations of TDT. Refreshingly honest about the theory's incompleteness.In March 2009, Wei Dai posted "Towards a New Decision Theory" on LessWrong—partly as "a guess about Timeless Decision Theory" since "there seems to be little hope that Eliezer will publish his TDT any time soon." The result was arguably more important than TDT itself.
UDT's principle is startlingly simple:
The optimal agent commits to the best policy—the best mapping from observations to actions—as estimated by its prior beliefs, before any observations are made.
In Wei Dai's words: "We give up the idea of 'conditioning on the blue box' and instead just choose the action that will maximize the unconditional expected utility."
The key word is unconditional. Standard decision theories (CDT, EDT, and even TDT) update on observations and then choose actions. UDT never updates. It commits to a policy evaluated from the standpoint of the prior.
The deepest distinction between UDT and earlier theories is the object of choice: earlier theories choose actions after updating; UDT chooses a policy, a complete mapping from observations to actions, evaluated once from the prior.
UDT selects the policy that maximizes expected utility according to the prior:
$$\pi^* = \arg\max_{\pi} \sum_{w \in W} P(w) \cdot U(\text{outcome}(w, \pi))$$

Then upon receiving observation $o$, it simply executes $\pi^*(o)$—the action prescribed by the optimal policy.
UDT 1.0 optimized each action independently: for each observation, find the best action. Wei Dai discovered this could produce globally suboptimal policies (the agent fails to "coordinate with itself" across different observations).
UDT 1.1 ("Explicit Optimization of Global Strategy") fixes this by iterating over complete policies rather than individual actions. UDT 1.1 finds the globally optimal policy first, then looks up what it prescribes for the current observation.
UDT handles Counterfactual Mugging trivially: evaluated from the prior, the "pay on tails" policy is worth 0.5 × $10,000 − 0.5 × $100 = $4,950, while the "never pay" policy is worth $0.
The "pay" policy dominates. Upon observing tails, UDT simply executes the optimal policy and pays. No agonizing needed.
UDT functions as an automatic commitment device. In game theory, commitment devices are external mechanisms that bind you to future actions. UDT achieves this without any external mechanism: an agent that selects the globally optimal policy from the prior automatically behaves as if it has made all beneficial precommitments.
UDT's elegance comes at a steep price:

- It requires a prior over all possible worlds and over the outputs of the relevant computations, which no realistic agent can specify.
- It needs an account of logical counterfactuals: what "would" happen if a fixed algorithm output something other than what it actually outputs (see the discussion of FDT's open problems below).
- Optimizing over complete policies is computationally intractable, and no full formalization exists.
These problems remain open. As one LessWrong post put it: "Formalising decision theory is hard."
Wei Dai, "Towards a New Decision Theory" (LessWrong, 2009) The original UDT post. Crisp and foundational. Introduces policy selection and updatelessness. "What is Wei Dai's Updateless Decision Theory?" (LessWrong) A community-written explanation of UDT aimed at newcomers.In 2017, Eliezer Yudkowsky and Nate Soares published "Functional Decision Theory: A New Theory of Instrumental Rationality" through MIRI (also on arXiv). FDT was intended as an umbrella framework capturing the shared insights of TDT and UDT in a more accessible formulation.
An agent should treat its decision as the output of a fixed mathematical function and choose the output that maximizes expected utility, taking into account all the consequences of that function outputting that value.
The key shift from CDT: rather than asking "What would happen if I did action $A$?" (intervening on the physical action), FDT asks "What would happen if my decision algorithm output $A$?" This seemingly subtle change has dramatic consequences when the algorithm is modeled, simulated, or predicted elsewhere.
The paper distinguishes three expected utility calculations via the operator used:
EDT uses conditional probability:
$$\text{EDT}(P, G) = \arg\max_a \sum_o P(o \mid a) \cdot G(o)$$

CDT uses causal probability (do-operator):
$$\text{CDT}(P, G) = \arg\max_a \sum_o P(o \| a) \cdot G(o)$$

FDT uses the subjunctive/dagger operator:
$$\text{FDT}(P, G) = \arg\max_a \sum_o P(o \mathbin{\dag} a) \cdot G(o)$$

where $P(o \mathbin{\dag} a)$ represents the probability of outcome $o$ were the agent's decision function to output $a$. The dagger captures subjunctive dependence: connections that flow through shared computational structure.
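A rough sketch (not the paper's formalism) of how the three calculations come apart on Newcomb's Problem, assuming a prediction accuracy of 0.99 and an arbitrary 0.5 prior credence that Box B is already full: the evidential and subjunctive calculations let the act (or the decision function's output) carry information about Box B, while the causal calculation holds the contents fixed.

```python
p = 0.99      # predictor accuracy
q_full = 0.5  # CDT's unconditional credence that Box B is already full (illustrative)

def payoff(action, box_b_full):
    return (1_000_000 if box_b_full else 0) + (1_000 if action == "two-box" else 0)

def edt(action):   # P(o | a): the act is evidence about the prediction
    p_full = p if action == "one-box" else 1 - p
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)

def cdt(action):   # P(o || a): intervening on the act leaves the past contents untouched
    return q_full * payoff(action, True) + (1 - q_full) * payoff(action, False)

def fdt(action):   # P(o dagger a): if the decision function output `action`, so did Omega's model of it
    p_full = p if action == "one-box" else 1 - p
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)

for name, f in (("EDT", edt), ("CDT", cdt), ("FDT", fdt)):
    print(name, max(("one-box", "two-box"), key=f))   # EDT one-box, CDT two-box, FDT one-box
```

On vanilla Newcomb the evidential and subjunctive numbers coincide; they diverge on Smoking Lesion-type cases, where the correlation is not mediated by anything computing the agent's decision function.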
FDT's central innovation is the concept of subjunctive dependence: two physical systems (say, your brain and Omega's model of you) subjunctively depend on the same function when they both compute it, so fixing that function's output settles what both systems do.

This is what lets FDT handle both classes of problems: in Newcomb-like cases the predictor is computing your decision function, so subjunctive dependence holds and FDT one-boxes; in medical Newcomb cases like the Smoking Lesion, the lesion merely correlates with your decision without computing it, so there is no subjunctive dependence and FDT smokes.
From the paper: FDT "manages consequences, not news."
A helpful restatement: "Questions in decision theory are not questions about what choices you should make with some sort of unpredictable free will. They are questions about what type of source code you should be running."
Yudkowsky & Soares, "Functional Decision Theory" (arXiv, 2017) The original FDT paper. Introduces subjunctive dependence, the dagger operator, and works through all major problems. "An Intuitive Introduction to Functional Decision Theory" (LessWrong) A gentler introduction with worked examples. Good starting point before the paper. MIRI Announcement: Functional Decision Theory MIRI's blog post introducing FDT with summary and context.FDT has drawn both serious academic criticism and vigorous community debate. The theory remains an arXiv preprint—it was rejected from journal publication after revisions—though the companion paper "Cheating Death in Damascus" was published in The Journal of Philosophy.
Philosopher Wolfgang Schwarz identifies three fundamental problems:
MacAskill's critique (prompted by Carl Shulman) raises several concerns:
Perhaps the deepest open problem in the entire field. FDT requires reasoning about what would happen if a deterministic function produced a different output than it actually does. But in a deterministic setting, the function's output is a mathematical fact—asking "what if $f(x) \neq y$?" is supposing a logical impossibility.
Standard counterfactual semantics (Lewis/Stalnaker) typically treat counterpossibles as vacuously true, which would make all FDT calculations trivial. MIRI's research program identified logical counterfactuals as a core open problem, and Scott Garrabrant's "Logical Induction" (2016) was a significant step toward a theory of logical uncertainty, but the full problem remains open.
Garrabrant et al., "Logical Induction" (arXiv, 2016) MIRI's formal framework for logical uncertainty. Defines a logical inductor that assigns probabilities to mathematical sentences in a way that satisfies a strong coherence criterion.Caspar Oesterheld argues there's no theory-neutral metric for comparing decision theories. The causal metric (expected payoff from replacing an agent's action) favors CDT; the evidential metric (expected payoff given observation of agent's action) favors EDT; FDT implicitly uses a "subjunctive metric" that isn't independently motivated. This challenges claims that any theory objectively "outperforms" the others.
The gap between the rationalist community and academic philosophy remains wide. Most academic decision theorists view FDT's problems as either already handled by sophisticated versions of CDT/EDT or as not genuinely problematic. Meanwhile, the rationalist community largely treats FDT (or some successor) as the correct approach. Bridging this gap remains an open social and intellectual challenge.
Oesterheld, "Decision Theory Research Overview" Excellent overview of the field from an AI alignment perspective, covering all major theories and open problems. "Dissolving Confusion around Functional Decision Theory" (LessWrong) Community response to common critiques, clarifying FDT's claims and addressing misunderstandings.Why does MIRI care so much about decision theory? Because if we build AI systems that make decisions, they implicitly use some decision theory. And the choice of decision theory has profound implications for AI safety:
Traditional decision theory assumes the agent is separate from the environment—a "Cartesian" setup. But real agents (including AIs) are embedded within the world they're reasoning about. Demski & Garrabrant (2018) identified four sub-problems: decision theory, embedded world-models, robust delegation, and subsystem alignment.
Vanessa Kosoy's Infra-Bayesianism (developed on LessWrong, 2020–present) proposes a decision-theoretic framework based on credal sets (sets of probability distributions rather than a single distribution) and a maximin decision rule. This connects to Ellsberg-style ambiguity aversion and Knightian uncertainty, and may provide foundations for naturalized induction—an agent reasoning about a universe it's embedded in.
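As a rough illustration of the credal-set idea (not Kosoy's actual formalism), a maximin rule over the distributions consistent with the Ellsberg urn reproduces the ambiguity-averse choices from earlier:

```python
import numpy as np

# Ellsberg urn: 30 red, 60 green-or-blue in unknown proportion.
# Credal set: every distribution with P(red) = 1/3 and P(green) ranging over [0, 2/3].
credal_set = [{"red": 1/3, "green": g, "blue": 2/3 - g} for g in np.linspace(0.0, 2/3, 201)]

def maximin_win_prob(winning_colors):
    """Worst-case probability (over the credal set) that a bet on these colors wins."""
    return min(sum(P[c] for c in winning_colors) for P in credal_set)

print(round(maximin_win_prob({"red"}), 3))            # 0.333  -> Bet A preferred to Bet B
print(round(maximin_win_prob({"green"}), 3))          # 0.0
print(round(maximin_win_prob({"green", "blue"}), 3))  # 0.667  -> Bet D preferred to Bet C
print(round(maximin_win_prob({"red", "blue"}), 3))    # 0.333
```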
If FDT-style reasoning is correct, agents can "cooperate" with other agents they've never met, as long as both reason about each other's decision algorithms. This leads to exotic concepts like acausal trade, in which agents that never interact causally benefit one another purely by modeling each other's decision procedures, and superrational cooperation among agents scattered across a large universe.
| Year | Paper | Significance |
|---|---|---|
| 1969 | Nozick, "Newcomb's Problem and Two Principles of Choice" | Introduced Newcomb's Problem |
| 1978 | Gibbard & Harper, "Counterfactuals and Two Kinds of Expected Utility" | Founded CDT; V vs. U distinction |
| 1979 | Lewis, "Prisoners' Dilemma is a Newcomb Problem" | Connected PD to Newcomb's |
| 1981 | Lewis, "Causal Decision Theory" | Classic CDT defense |
| 1983 | Kavka, "The Toxin Puzzle" (Analysis 43(1)) | Isolated intention from action |
| 1999 | Joyce, The Foundations of Causal Decision Theory | Definitive book-length CDT defense |
| 2006 | Drescher, Good and Real | Subjunctive means-end relations |
| 2009 | Wei Dai, "Towards a New Decision Theory" | Introduced UDT |
| 2010 | Yudkowsky, "Timeless Decision Theory" | TDT formal paper |
| 2014 | Ahmed, Evidence, Decision and Causality | Major EDT defense |
| 2016 | Garrabrant et al., "Logical Induction" | Formal logical uncertainty framework |
| 2017 | Yudkowsky & Soares, "Functional Decision Theory" | Introduced FDT |
| 2020 | Levinstein & Soares, "Cheating Death in Damascus" | FDT in Journal of Philosophy |
For the mathematically literate reader coming in fresh:
This guide was compiled in March 2026, synthesizing research from academic papers, the Stanford Encyclopedia of Philosophy, LessWrong, MIRI technical reports, and the broader rationalist and philosophical communities. All errors are the compiler's own. Corrections and suggestions welcome.
For comprehensive research notes underlying this guide, see the /research/ directory—organized by topic with full bibliographies, summaries, and link catalogs.