A comprehensive guide to the theories, thought experiments, and debates that shape how we think about rational choice
The story of decision theory begins with gamblers, theologians, and mathematicians trying to answer a deceptively simple question: when facing uncertainty, what should you do?
Over three centuries, the answer evolved from intuitive rules of thumb into one of the most elegant mathematical frameworks in all of philosophy. But cracks appeared almost immediately—and those cracks eventually split the field wide open.
In the summer of 1654, Blaise Pascal and Pierre de Fermat exchanged a series of letters about the "problem of points"—how to fairly divide the stakes of an interrupted gambling game. In working out the answer, they invented probability theory.
Pascal then applied this new mathematics to the biggest bet of all. In his Pensées (published posthumously in 1670), he presented what is now called Pascal's Wager: a decision-theoretic argument for believing in God.
Either God exists or God doesn't. You can either believe or not believe. If God exists and you believe, you gain infinite happiness. If God exists and you don't believe, you suffer. If God doesn't exist, the cost of belief is finite. Therefore, for any positive probability that God exists, the expected value of believing is infinite.
The Wager is often dismissed today, but its significance for decision theory is enormous: it was the first explicit use of what we now call expected value reasoning applied to a practical choice under uncertainty. Pascal introduced the idea of multiplying the probability of each outcome by its value and summing the results—the core operation that would eventually become expected utility theory.
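To make that operation concrete, here is a minimal sketch in Python (the die bet and its numbers are invented for illustration, not Pascal's own):

```python
# Expected value: weight each outcome's value by its probability and sum.
def expected_value(outcomes):
    """outcomes: iterable of (probability, value) pairs whose probabilities sum to 1."""
    return sum(p * v for p, v in outcomes)

# Illustrative bet: win $6 if a fair die shows six, nothing otherwise.
print(expected_value([(1/6, 6.0), (5/6, 0.0)]))  # 1.0 -- the fair price of the bet
```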
Stanford Encyclopedia of Philosophy: Pascal's Wager Comprehensive entry covering all versions of the argument, the infinite utility problem, and the many-gods objection.

In 1713, Nicolas Bernoulli posed a puzzle in a letter to Pierre Raymond de Montmort that would haunt decision theory for centuries. His cousin Daniel Bernoulli published the definitive analysis in 1738.
A fair coin is flipped repeatedly until it lands tails. If the first tails appears on flip $n$, you receive $2^n$ dollars. What is this game worth to you?
The expected monetary value is:
$$E[\text{payoff}] = \sum_{n=1}^{\infty} \frac{1}{2^n} \cdot 2^n = \sum_{n=1}^{\infty} 1 = \infty$$

Yet no one would pay a million dollars for a single play.
Daniel Bernoulli's solution was revolutionary: people don't maximize expected money—they maximize expected utility. He proposed that utility is a concave function of wealth, specifically logarithmic:
$$u(w) = \ln(w)$$

Under this assumption, the expected utility of the St. Petersburg game is finite, resolving the paradox. More importantly, Bernoulli introduced the foundational concept of decision theory: the utility function—a mathematical representation of subjective value that can differ from objective monetary value.
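A quick numerical sketch (truncating the infinite sum, and taking utility to be the log of the prize alone, ignoring initial wealth) shows how the concave utility tames the divergence:

```python
import math

N = 60  # truncation point; the monetary sum keeps growing with N, the utility sum does not

ev_money = sum(2**-n * 2**n for n in range(1, N + 1))            # equals N: diverges as N grows
eu_log   = sum(2**-n * math.log(2**n) for n in range(1, N + 1))  # converges to 2*ln(2)

print(ev_money)            # 60.0
print(eu_log)              # ~1.386
print(math.exp(eu_log))    # ~4.0: certainty equivalent of roughly $4 under log utility
```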
SEP: The St. Petersburg Paradox Full history from Nicolas Bernoulli through modern resolutions, including bounded utility and risk-weighting approaches.

Frank Ramsey, a Cambridge mathematician who died at age 26, wrote a paper that was decades ahead of its time. In "Truth and Probability" (written 1926, published posthumously 1931), Ramsey showed how to derive both probability and utility simultaneously from an agent's preferences over bets.
His key insight: if you prefer bet A to bet B, and bet B to bet C, these preferences implicitly reveal both how much you value the outcomes and how likely you think the relevant events are. There's no need to start with an objective probability—probability emerges from the structure of coherent preferences.
Ramsey also introduced the Dutch Book argument: if your degrees of belief don't satisfy the probability axioms, a bookie can construct a series of bets that guarantee you lose money regardless of what happens. This was the first rigorous argument that rational beliefs must obey probability theory.
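A toy version of the construction, assuming a hypothetical agent whose credences in an event and its negation sum to more than 1 and who will buy a $1 bet on any event at a price equal to their credence:

```python
# Dutch Book sketch: incoherent credences are exploitable.
credence_E, credence_not_E = 0.6, 0.5   # sums to 1.1 -- violates the probability axioms

# The bookie sells the agent a $1 bet on E and a $1 bet on not-E at those prices.
for E_occurs in (True, False):
    winnings = 1.0                       # exactly one of the two bets pays out
    cost = credence_E + credence_not_E   # the agent paid 1.1 up front
    print(E_occurs, round(winnings - cost, 2))   # -0.1 either way: a guaranteed loss
```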
Ramsey's work was largely ignored until Leonard Savage rediscovered and extended it in the 1950s. Today he is recognized as the true pioneer of subjective probability.
In their monumental Theory of Games and Economic Behavior (1944), John von Neumann and Oskar Morgenstern provided the first rigorous axiomatization of expected utility theory. Their result, the VNM utility theorem, showed that if your preferences over lotteries satisfy four seemingly innocuous axioms, then you must be acting as if you are maximizing expected utility for some utility function.
The four axioms:

- Completeness: any two lotteries can be compared.
- Transitivity: preferences do not cycle.
- Continuity: if $A \succeq B \succeq C$, some probability mixture of $A$ and $C$ is ranked exactly even with $B$.
- Independence: mixing two lotteries with the same third lottery, in the same proportion, does not change their ranking.
The Independence axiom is the crucial one—and the one that would be challenged by Allais. It says that your preference between two options shouldn't change just because you mix both with the same third option.
The theorem: If preferences satisfy these four axioms, there exists a utility function $u$ (unique up to positive affine transformation) such that:
$$L_1 \succeq L_2 \iff \mathbb{E}[u(L_1)] \geq \mathbb{E}[u(L_2)]$$

This is a representation theorem: it doesn't say you should maximize expected utility; it says that any "coherent" preferences can be described as expected utility maximization. The normative force comes from arguing that the axioms are requirements of rationality.
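A small sketch of what "unique up to positive affine transformation" means in practice: rescaling the utility function as $a \cdot u + b$ with $a > 0$ never changes which lottery has higher expected utility (the utilities and lotteries below are invented for illustration):

```python
def expected_utility(lottery, u):
    """lottery: list of (probability, monetary outcome) pairs."""
    return sum(p * u(x) for p, x in lottery)

u = lambda x: x ** 0.5          # an arbitrary concave (risk-averse) utility function
v = lambda x: 3.0 * u(x) + 7.0  # a positive affine transformation of u

L1 = [(1.0, 100)]               # $100 for certain
L2 = [(0.5, 300), (0.5, 0)]     # coin flip between $300 and $0

print(expected_utility(L1, u) > expected_utility(L2, u))  # True: L1 preferred under u
print(expected_utility(L1, v) > expected_utility(L2, v))  # True: same ranking under v
```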
Yudkowsky: "Coherent Decisions Imply Consistent Utilities" (LessWrong/Arbital) An accessible, non-technical introduction to why violating the VNM axioms means "leaving money on the table." Written for the LessWrong audience. "Why You Must Maximize Expected Utility" (LessWrong) A more mathematical walk-through of the VNM theorem and Dutch Book arguments.At a famous 1952 Paris conference attended by many of the founders of decision theory, French economist Maurice Allais presented a set of choices designed to embarrass expected utility theory. Consider:
Choice 1:

- 1A: $1,000,000 with certainty.
- 1B: $1,000,000 with probability 0.89, $5,000,000 with probability 0.10, nothing with probability 0.01.

Choice 2:

- 2A: $1,000,000 with probability 0.11, nothing with probability 0.89.
- 2B: $5,000,000 with probability 0.10, nothing with probability 0.90.
Most people prefer 1A over 1B (the certainty is appealing), but also prefer 2B over 2A (might as well go for the bigger prize). This combination violates the Independence axiom.
The algebraic proof is clean. If you prefer 1A to 1B:
$$u(1M) > 0.89 \cdot u(1M) + 0.10 \cdot u(5M) + 0.01 \cdot u(0)$$

Rearranging: $0.11 \cdot u(1M) > 0.10 \cdot u(5M) + 0.01 \cdot u(0)$
But preferring 2B to 2A implies:
$$0.10 \cdot u(5M) + 0.90 \cdot u(0) > 0.11 \cdot u(1M) + 0.89 \cdot u(0)$$

Which gives: $0.10 \cdot u(5M) + 0.01 \cdot u(0) > 0.11 \cdot u(1M)$
This directly contradicts the first inequality. The common choice pattern is algebraically inconsistent with any utility function.
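The inconsistency can also be checked by brute force. Normalizing $u(0) = 0$ and $u(1M) = 1$ (harmless, since utility is only defined up to positive affine transformation), no value of $u(5M)$ satisfies both preferences:

```python
import numpy as np

found = False
for u5 in np.linspace(1.0, 100.0, 100_000):     # candidate values of u(5M), with u(0)=0, u(1M)=1
    prefers_1A = 1.0 > 0.89 * 1.0 + 0.10 * u5   # requires u(5M) < 1.1
    prefers_2B = 0.10 * u5 > 0.11 * 1.0         # requires u(5M) > 1.1
    if prefers_1A and prefers_2B:
        found = True
        break

print(found)   # False: the common Allais pattern fits no utility function
```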
Legend has it that Leonard Savage himself displayed the Allais pattern when first confronted with the choices, then revised his answers upon seeing the proof. The Allais Paradox demonstrates the certainty effect: people overweight outcomes that are certain relative to outcomes that are merely probable, violating the Independence axiom. This later inspired Kahneman and Tversky's Prospect Theory.
Leonard Savage's The Foundations of Statistics (1954) is perhaps the most important single work in decision theory. Savage synthesized the ideas of Ramsey (subjective probability from preferences) and von Neumann-Morgenstern (expected utility from axioms) into a unified framework that simultaneously derives both a subjective probability function and a utility function from preferences.
Savage's framework has three primitives: states of the world (things beyond the agent's control), consequences (outcomes the agent cares about), and acts (functions from states to consequences—i.e., what happens depends on what you do and how the world is).
His seven axioms (P1–P7) include the celebrated Sure-Thing Principle:
If you would prefer act $f$ to act $g$ both when event $E$ obtains and when event $E$ does not obtain, then you should prefer $f$ to $g$ unconditionally.
This is Savage's version of the Independence axiom, and it's the principle that both the Allais Paradox and Newcomb's Problem put under pressure.
Savage's representation theorem: Preferences satisfying his axioms uniquely determine a probability measure $P$ over states and a utility function $u$ over consequences such that act $f$ is preferred to act $g$ if and only if:
$$\mathbb{E}_P[u(f)] = \int u(f(s)) \, dP(s) \geq \int u(g(s)) \, dP(s) = \mathbb{E}_P[u(g)]$$

A crucial feature of Savage's framework: states are assumed to be probabilistically independent of acts. This assumption is exactly what Evidential Decision Theory would later challenge.
SEP: Decision Theory Comprehensive survey of normative decision theory, covering Savage's framework, the VNM theorem, and the debates they spawned.

Daniel Ellsberg (yes, the Pentagon Papers Ellsberg—he was an economist before he was a whistleblower) presented another challenge to expected utility theory in his 1961 paper "Risk, Ambiguity, and the Savage Axioms."
An urn contains 30 red balls and 60 balls that are either green or blue, in unknown proportions. You can bet on drawing a specific color; every winning bet pays the same prize.

- Bet A: win if the ball drawn is red (known probability 1/3).
- Bet B: win if the ball drawn is green (probability anywhere between 0 and 2/3).

Most people prefer A. But now consider:

- Bet C: win if the ball drawn is red or blue.
- Bet D: win if the ball drawn is green or blue (known probability 2/3).
Most people prefer D. But preferring A to B implies $P(\text{red}) > P(\text{green})$, and preferring D to C implies $P(\text{green or blue}) > P(\text{red or blue})$, i.e., $P(\text{green}) > P(\text{red})$—a contradiction.
The Ellsberg Paradox reveals ambiguity aversion: people prefer known risks over unknown risks, even when no assignment of probabilities can rationalize their preferences. This violates Savage's framework, which requires a unique subjective probability over all events.
This matters for our story because it shows that the classical foundations, while beautiful, don't fully capture human reasoning about uncertainty. And the biggest challenge to those foundations was just around the corner.
"The Savage Theorem and the Ellsberg Paradox" (LessWrong) A clear walkthrough of how the Ellsberg choices violate Savage's Sure-Thing Principle, with discussion of whether ambiguity aversion is rational.In 1963, at a cocktail party, Harvard philosopher Robert Nozick heard about a puzzle from mathematician Martin Kruskal, who had learned it from physicist William Newcomb at Lawrence Livermore Laboratory. Nozick later called it "the most consequential party I have attended."
In 1969, Nozick published the problem, and it detonated like a bomb in the middle of decision theory. Decades later, the rubble is still being sorted.
A superintelligent being called Omega (a near-perfect predictor) presents you with two boxes:

- Box A is transparent and contains $1,000.
- Box B is opaque and contains either $1,000,000 or nothing.
You may take both boxes ("two-box") or only Box B ("one-box").
The catch: Omega has already predicted your choice. If it predicted you'd one-box, it placed $1,000,000 in Box B. If it predicted you'd two-box, Box B is empty. Omega has been correct in every observed case.
The argument for two-boxing (dominance): Box B already contains whatever it contains. Your choice can't change the past. Whatever is in Box B, taking both boxes gets you $1,000 more. Two-boxing strictly dominates.
$$\forall S: \quad U(\text{two-box} \mid S) > U(\text{one-box} \mid S)$$

The argument for one-boxing (expected utility): One-boxers walk away with $1,000,000. Two-boxers walk away with $1,000. If Omega's prediction accuracy is $p$:
$$EU(\text{one-box}) = p \cdot \$1{,}000{,}000 + (1-p) \cdot \$0$$

$$EU(\text{two-box}) = (1-p) \cdot \$1{,}001{,}000 + p \cdot \$1{,}000$$

For $p = 0.99$: one-boxing yields $\$990{,}000$ in expectation vs. two-boxing's $\$11{,}000$. The break-even accuracy is remarkably low: $p \approx 0.5005$.
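The calculation is easy to reproduce (a minimal sketch using the payoffs above):

```python
def eu_one_box(p):  # p = Omega's prediction accuracy
    return p * 1_000_000 + (1 - p) * 0

def eu_two_box(p):
    return (1 - p) * 1_001_000 + p * 1_000

print(eu_one_box(0.99), eu_two_box(0.99))   # 990000.0 11000.0

# Break-even accuracy: solve p * 1_000_000 = (1 - p) * 1_001_000 + p * 1_000
p_star = 1_001_000 / 2_000_000
print(p_star)                               # 0.5005
```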
Both arguments seem airtight, and yet they give opposite answers. As Nozick observed, when he presented the problem to people, they "divide almost evenly, with large numbers thinking that the [other side is] just being silly."
The problem forces a choice between two bedrock principles of rational decision-making—dominance reasoning and expected utility maximization—and in doing so, it splits decision theory into two camps that are still arguing today.
Nozick, "Newcomb's Problem and Two Principles of Choice" (1969, PDF) The original paper that started it all. Presents the problem, both arguments, and Nozick's (tentative) two-boxing position. Yudkowsky, "Newcomb's Problem and Regret of Rationality" (LessWrong) Yudkowsky's influential argument for one-boxing: "If it's stupid but it works, it's not stupid." SEP: Causal Decision Theory The main encyclopedia entry covering Newcomb's Problem in depth, including formal setups and both sides of the debate.How do people actually split? The 2020 PhilPapers Survey of professional philosophers found: 39% two-box, 31.2% one-box, ~30% other/undecided. Martin Gardner's 1973 Scientific American column generated reader mail running ~71% for one-boxing. The LessWrong rationalist community overwhelmingly one-boxes, influenced by FDT-style reasoning.
Oesterheld, "A Survey of Polls on Newcomb's Problem" Meta-survey compiling results from multiple polls of philosophers, students, and the general public.Evidential Decision Theory traces to Richard Jeffrey's The Logic of Decision (1965, revised 1983), though Jeffrey didn't use the term "EDT"—the label was crystallized later when Gibbard and Harper (1978) drew the EDT/CDT distinction.
EDT's prescription is elegantly simple: choose the action that is the best news you could learn about yourself. More formally: choose the action with the highest conditional expected utility.
$$V(A) = \sum_{s} P(s \mid A) \cdot U(A, s)$$

where $P(s \mid A)$ is your conditional probability of state $s$ given that you perform act $A$, and $U(A,s)$ is the utility of the outcome.
The key word is conditional. Your action is treated as evidence about the state of the world. If one-boxing is evidence that Box B contains a million dollars (because Omega predicted your one-boxing), then one-boxing has high conditional expected utility.
Jeffrey's major innovation was treating acts, states, and outcomes as propositions in a single Boolean algebra, rather than maintaining Savage's tripartite distinction. This means acts can be probabilistically dependent on states—precisely what's needed for Newcomb-like reasoning.
Jeffrey's desirability formula:
$$\text{Des}(A) = \sum_{i} P(S_i \mid A) \cdot \text{Des}(A \wedge S_i)$$

Jeffrey also developed probability kinematics (Jeffrey conditionalization), a generalization of Bayesian updating for uncertain evidence:
$$P_{\text{new}}(H) = \sum_{i} P_{\text{old}}(H \mid E_i) \cdot P_{\text{new}}(E_i)$$

This plays a supporting role in EDT by providing the epistemological foundation for how agents update beliefs.
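A minimal sketch of Jeffrey conditionalization with invented numbers: the agent glimpses a cloth in poor light and shifts credence over the evidence partition {looks green, looks blue} to 0.7/0.3, without becoming certain of either cell.

```python
# Joint prior over hypothesis H ("the cloth is green") and evidence partition {E1, E2}.
P_old = {("H", "E1"): 0.28, ("H", "E2"): 0.02,
         ("~H", "E1"): 0.12, ("~H", "E2"): 0.58}

P_old_E1 = P_old[("H", "E1")] + P_old[("~H", "E1")]   # 0.40
P_old_E2 = P_old[("H", "E2")] + P_old[("~H", "E2")]   # 0.60

P_new_E = {"E1": 0.7, "E2": 0.3}   # uncertain evidence: the partition shifts, nothing becomes certain

# P_new(H) = sum_i P_old(H | E_i) * P_new(E_i)
P_new_H = (P_old[("H", "E1")] / P_old_E1) * P_new_E["E1"] \
        + (P_old[("H", "E2")] / P_old_E2) * P_new_E["E2"]
print(round(P_new_H, 3))   # 0.5
```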
EDT: One-box. ($1,000,000)
One-boxing is evidence that Omega predicted one-boxing, so Box B almost certainly contains $1M. EDT follows the evidence.
EDT (naive): Don't smoke. EDT (with tickle defense): Smoke.
Naive EDT says smoking is evidence of the cancer-causing lesion. But the tickle defense (Eells 1982) argues that once you know your own desire to smoke, the act itself provides no additional evidence about the lesion. Sophisticated EDT smokes.
EDT: Cooperate. ($5 each)
Your cooperation is evidence that your twin cooperates. EDT cooperates, aligning with Hofstadter's "superrationality."
The most prominent contemporary EDT defender is Arif Ahmed, whose Evidence, Decision and Causality (Cambridge, 2014) mounts a systematic defense. Ahmed argues EDT is more parsimonious than CDT (it requires only probability, not a theory of causation), and presses the "Why Ain'cha Rich?" argument: if two-boxing is rational, why do two-boxers end up with $1,000 while one-boxers get $1,000,000?
Ahmed also presents the Betting on the Past scenario, where CDT recommends a predictably losing bet, and argues this is a reductio of CDT.
Summary of Ahmed's "Evidence, Decision and Causality" (EA Forum) Detailed chapter-by-chapter summary of the most important contemporary defense of EDT.

David Lewis famously accused EDT of recommending "an irrational policy of managing the news"—choosing actions that give you good news about the world rather than actions that actually make the world better. EDT's defenders reply that in Newcomb-like cases, the good news is the good outcome: one-boxers really do end up richer.
Jeffrey, The Logic of Decision (U. Chicago Press, 1983) The foundational text. Readable and elegant. The 1983 edition adds the crucial discussion of ratifiability.

Causal Decision Theory emerged as a direct response to EDT's recommendation in Newcomb's Problem. Its founders—Allan Gibbard, William Harper, David Lewis, and Brian Skyrms—argued that rational decision-making should attend to what your actions cause, not merely what they're evidence for.
CDT replaces EDT's conditional probabilities with causal or counterfactual probabilities. The key question becomes: "What would happen if I were to do $A$?"—using a subjunctive conditional rather than conditioning on evidence.
Lewis's formulation (1981) uses dependency hypotheses:
$$U(A) = \sum_K P(K) \cdot V(A \wedge K)$$

where $K$ ranges over dependency hypotheses—maximally specific propositions about how outcomes depend on actions, held fixed while evaluating the action.
Gibbard-Harper formulation (1978) uses counterfactual conditionals:
$$U(A) = \sum_S P(A \boxright S) \cdot V(S)$$

where $A \boxright S$ means "if $A$ were performed, $S$ would obtain." The probability $P(A \boxright S)$ is evaluated using Stalnaker's closest-world semantics, not by conditioning on $A$ as evidence.
There's also a deep connection to Judea Pearl's do-calculus, which formalizes causal reasoning using interventions in structural causal models. Pearl's $P(Y \mid \text{do}(X))$ captures exactly the causal probability CDT needs, distinguishing seeing that $X$ happened from making $X$ happen.
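A toy sketch of the seeing/doing distinction on the Smoking Lesion structure (all numbers invented): the lesion causes both smoking and cancer, smoking itself does nothing, so conditioning on smoking raises the probability of cancer while intervening on smoking leaves it at the base rate.

```python
# Structural model: Lesion -> Smoke, Lesion -> Cancer; no edge from Smoke to Cancer.
p_lesion = 0.2
p_smoke_given_lesion  = {True: 0.9, False: 0.1}
p_cancer_given_lesion = {True: 0.8, False: 0.05}   # smoking is causally irrelevant

def p_of_lesion(l):
    return p_lesion if l else 1 - p_lesion

# Observational: P(cancer | smoke) -- smoking is evidence about the lesion.
p_smoke = sum(p_of_lesion(l) * p_smoke_given_lesion[l] for l in (True, False))
p_cancer_and_smoke = sum(p_of_lesion(l) * p_smoke_given_lesion[l] * p_cancer_given_lesion[l]
                         for l in (True, False))
print(round(p_cancer_and_smoke / p_smoke, 3))   # ~0.569: seeing smoking raises P(cancer)

# Interventional: P(cancer | do(smoke)) -- cut the Lesion -> Smoke edge, keep the lesion's prior.
print(sum(p_of_lesion(l) * p_cancer_given_lesion[l] for l in (True, False)))   # 0.2: the base rate
```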
CDT: Two-box. ($1,000)
Your choice cannot causally affect Omega's past prediction. Box B's contents are fixed. Two-boxing strictly dominates.
CDT: Smoke. (Correct)
Smoking doesn't cause cancer (the lesion does). CDT correctly ignores the evidential correlation and recommends the pleasurable, causally harmless action.
CDT: Defect. ($1)
Your choice can't causally affect your twin's independent decision. Defection is the dominant strategy. CDT defects—and both twins end up with $1 each.
Joyce's The Foundations of Causal Decision Theory (Cambridge, 1999) provides the most comprehensive book-length defense of CDT, including a representation theorem showing that both CDT and EDT are instances of a more general conditional decision theory.
CDT's Achilles' heel is Newcomb's Problem itself. CDT two-boxes and gets $1,000 while one-boxers get $1,000,000. Gibbard and Harper themselves noticed this concern, and CDT defenders have struggled with the "Why Ain'cha Rich?" retort ever since.
Beyond Newcomb's, CDT faces problems with:

- Decision instability in cases like Death in Damascus and the Psychopath Button, where no pure act is ratifiable.
- Commitment problems like Parfit's Hitchhiker and the Toxin Puzzle, where the inability to bind one's future self is costly.
- The persistent "Why Ain'cha Rich?" challenge: the agents CDT calls irrational keep ending up richer.
Decision theories are tested by thought experiments the way physical theories are tested by experiments. Each problem probes a different aspect of rational choice. Here's the bestiary, with what each theory recommends.
A genetic lesion causes both (a) a desire to smoke and (b) lung cancer. Smoking itself does not cause cancer. Smoking is pleasurable. Should you smoke?
| Theory | Answer | Reasoning |
|---|---|---|
| EDT (naive) | Don't smoke | Smoking is evidence of the lesion/cancer |
| EDT (tickle) | Smoke | Desire screens off the act from the lesion |
| CDT | Smoke | Smoking doesn't cause cancer |
| FDT | Smoke | The lesion isn't computing your decision function |
This is CDT's showcase problem and EDT's embarrassment. It's also the mirror image of Newcomb's: in Newcomb's, EDT gets it "right"; here, EDT gets it "wrong." FDT claims to handle both correctly.
Death, a perfect predictor, tells you in Damascus that you have an appointment tomorrow. Death's appointment book (written in advance) lists the city where you will die. You can stay in Damascus or flee to Aleppo. Death will be waiting wherever the book says.
CDT enters an infinite regret loop: if you plan to stay, Death is in Damascus, so you want to flee; but if you plan to flee, Death is in Aleppo, so you want to stay. No pure strategy is self-ratifying under CDT. CDT typically resolves this via mixed strategies (flip a fair coin).
FDT recognizes that either way you're doomed (Death predicted correctly), so it stays to save the trouble of traveling. Levinstein & Soares (2020) provide a careful analysis distinguishing different versions of Death's prediction rule.
Levinstein & Soares, "Cheating Death in Damascus" (2020, PDF) Published in The Journal of Philosophy. Distinguishes versions of Death in Damascus and shows how FDT handles each. The most rigorous academic treatment of FDT.You're dying in the desert. A driver will save you if and only if she predicts (with ~99% accuracy via reading micro-expressions) that you will pay her $100 upon reaching town. Once you're safely in town, should you pay?
| Theory | Answer | Result |
|---|---|---|
| CDT | Don't pay | Dies in desert (can't credibly commit) |
| EDT | Pay | Survives (paying is evidence of the type the driver picks up) |
| FDT | Pay | Survives (decision algorithm determines both prediction and action) |
This is structurally equivalent to the Transparent Newcomb Problem and is one of the clearest cases where CDT's inability to make credible commitments is costly.
A billionaire will pay you $1,000,000 tomorrow morning if at midnight tonight you genuinely intend to drink a mildly unpleasant (but harmless) toxin tomorrow afternoon. You need not actually drink it—you just need to form a genuine intention. Can you?
The puzzle isolates intentions from actions. Both CDT and EDT struggle: a CDT agent who knows it won't drink can't form a genuine intention. FDT-style agents can intend to drink because they evaluate the policy of "intend and follow through" as superior to "try to game it."
Newcomb's Problem, but both boxes are transparent—you can see whether Box B contains $1M before choosing. If you see the $1M, should you take just Box B or both boxes?
If you see $1M in Box B, one-boxing means knowingly leaving $1,000 on the table. Yet FDT one-boxes: the reason you see the million is that you're the kind of agent who one-boxes. If you were a two-boxer, you'd be staring at an empty box. FDT agents seeing $1M in the box get $1M; CDT agents consistently see $0.
Omega flips a fair coin. Heads: Omega gives you $10,000 if it predicts you'd pay $100 on tails. Tails: Omega asks you for $100. You see tails. Do you pay?
| Theory | Answer | Expected Value of Policy |
|---|---|---|
| CDT | Don't pay | $0 (never gets the $10,000) |
| EDT | Don't pay | $0 (after updating on tails, paying is pure loss) |
| TDT | Struggles | Updates on tails, may refuse |
| UDT/FDT | Pay | $4,950 (= 0.5 × $10,000 − 0.5 × $100) |
This is the problem that broke TDT and motivated UDT. The "pay" policy dominates when evaluated from the prior (before seeing the coin), but all "updateful" theories (CDT, EDT, TDT) refuse to pay after seeing tails.
"Counterfactual Mugging" (LessWrong, original post by Vladimir Nesov) The original formulation. Sparked extensive debate about updatelessness and the nature of rational commitment.You hear a rumor about $1M termites. A greedy predictor sends a letter: "I sent this iff exactly one of: (i) no termites and you pay me $1,000, or (ii) termites and you don't pay." You received the letter. Pay?
EDT pays (it's "good news" about not having termites). CDT and FDT refuse—the termites are already there or not, independent of your decision algorithm. XOR Blackmail is a clean counterexample to EDT that doesn't involve the Smoking Lesion's common-cause structure.
Paul can press a button that kills all psychopaths. He believes only a psychopath would press it. Paul strongly prefers living to a world without psychopaths.
CDT presses (pressing doesn't cause you to be a psychopath), which likely kills Paul. EDT doesn't press (pressing is evidence of psychopathy). FDT doesn't press either—your decision algorithm determines both your action and your character, and the algorithm that outputs "don't press" is evidence of non-psychopathy.
| Problem | EDT | CDT | FDT |
|---|---|---|---|
| Newcomb's Problem | One-box ($1M) | Two-box ($1K) | One-box ($1M) |
| Smoking Lesion | Don't smoke* | Smoke | Smoke |
| Death in Damascus | Unstable | Mixed strategy | Stay (context-dep.) |
| Twin PD | Cooperate ($5) | Defect ($1) | Cooperate ($5) |
| Parfit's Hitchhiker | Pay (survives) | Don't pay (dies) | Pay (survives) |
| Toxin Puzzle | Can't intend | Can't intend | Intends & drinks |
| Transparent Newcomb | Two-box ($0) | Two-box ($0) | One-box ($1M) |
| Counterfactual Mugging | Don't pay | Don't pay | Pay |
| XOR Blackmail | Pay (wrong) | Don't pay | Don't pay |
| Psychopath Button | Don't press | Press (dies!) | Don't press |
*EDT with the tickle defense smokes. The table shows naive EDT.
The pattern: EDT gets Newcomb-like problems right (where prediction tracks your algorithm) but fails on "medical Newcomb" problems (where correlation doesn't track your algorithm). CDT gets medical Newcomb right but fails on standard Newcomb. FDT claims to get both classes right by asking: is the correlation mediated by something computing your decision function?
By the late 2000s, the EDT-CDT stalemate had persisted for decades. The first serious attempt at a third way came from the rationalist community around LessWrong.
Douglas Hofstadter's "Superrationality" (1983) planted the seed. In the Scientific American essay later collected in Metamagical Themas (1985), Hofstadter argued that identical reasoners in a symmetric Prisoner's Dilemma should cooperate: "Whatever I decide, my opponent decides the same thing, so I'm choosing between mutual cooperation and mutual defection."
Gary Drescher's Good and Real (2006) developed this further with the concept of subjunctive means-end relations—non-causal links between actions and outcomes that are stronger than mere evidence. Drescher argued for "acausal" counterfactual reasoning where it makes sense to act as if your choice affects conditions preceding the choice, even without any causal link.
Drescher, Good and Real (MIT Press, 2006, PDF via Gwern) The conceptual ancestor of logical decision theories. Develops "subjunctive means-end relations" and applies them to Newcomb's Problem, the PD, and ethics.

In 2010, Eliezer Yudkowsky published "Timeless Decision Theory" through MIRI. The central thesis:
Agents should decide as if they are determining the output of the abstract computation that they implement, including the output of all other instantiations and simulations of that computation.
This is the fundamental insight of the "logical decision theory" family. You're not choosing an action—you're choosing the output of an algorithm. Since the same algorithm may be instantiated in your brain, simulated by Omega, running in your twin's brain, etc., choosing its output simultaneously determines what happens everywhere it runs.
TDT extends causal Bayesian networks with computation nodes representing abstract computations. The agent's decision is modeled as a computation node that influences both the agent's own physical action and anything else that instantiates or models the same computation (Omega's prediction, a twin's deliberation, a simulation).

The algorithm: perform a causal intervention on the output of that computation node (rather than on the physical act alone), propagate the consequences through every node that depends on it, and output the value with the highest expected utility.
TDT one-boxes on Newcomb's (like EDT), smokes on the Smoking Lesion (like CDT), cooperates in the Twin PD, and pays in Parfit's Hitchhiker. It was designed to be the theory that wins—and it does, on these cases.
TDT's critical flaw: it updates on observations before computing expected utility. After seeing tails in Counterfactual Mugging, TDT reasons within the "tails branch" and may refuse to pay—missing the cross-branch benefits of the "always pay" policy.
This flaw directly motivated the development of Updateless Decision Theory.
Yudkowsky, "Timeless Decision Theory" (MIRI, 2010, PDF) The original TDT paper. 16 pages. Develops timeless decision diagrams and proves TDT's reflective consistency. Yudkowsky, "TDT: Problems I Can't Solve" (LessWrong) Yudkowsky's own list of open problems and limitations of TDT. Refreshingly honest about the theory's incompleteness.In March 2009, Wei Dai posted "Towards a New Decision Theory" on LessWrong—partly as "a guess about Timeless Decision Theory" since "there seems to be little hope that Eliezer will publish his TDT any time soon." The result was arguably more important than TDT itself.
UDT's principle is startlingly simple:
The optimal agent commits to the best policy—the best mapping from observations to actions—as estimated by its prior beliefs, before any observations are made.
In Wei Dai's words: "We give up the idea of 'conditioning on the blue box' and instead just choose the action that will maximize the unconditional expected utility."
The key word is unconditional. Standard decision theories (CDT, EDT, and even TDT) update on observations and then choose actions. UDT never updates. It commits to a policy evaluated from the standpoint of the prior.
The deepest distinction between UDT and earlier theories is the object of choice: earlier theories choose actions after updating; UDT chooses a policy, a complete mapping from observations to actions, evaluated once from the prior.
UDT selects the policy that maximizes expected utility according to the prior:
$$\pi^* = \arg\max_{\pi} \sum_{w \in W} P(w) \cdot U(\text{outcome}(w, \pi))$$

Then upon receiving observation $o$, it simply executes $\pi^*(o)$—the action prescribed by the optimal policy.
UDT 1.0 optimized each action independently: for each observation, find the best action. Wei Dai discovered this could produce globally suboptimal policies (the agent fails to "coordinate with itself" across different observations).
UDT 1.1 ("Explicit Optimization of Global Strategy") fixes this by iterating over complete policies rather than individual actions. UDT 1.1 finds the globally optimal policy first, then looks up what it prescribes for the current observation.
UDT handles Counterfactual Mugging trivially: evaluated from the prior, the "pay on tails" policy is worth 0.5 × $10,000 − 0.5 × $100 = $4,950, while the "never pay" policy is worth $0.
The "pay" policy dominates. Upon observing tails, UDT simply executes the optimal policy and pays. No agonizing needed.
UDT functions as an automatic commitment device. In game theory, commitment devices are external mechanisms that bind you to future actions. UDT achieves this without any external mechanism: an agent that selects the globally optimal policy from the prior automatically behaves as if it has made all beneficial precommitments.
UDT's elegance comes at a steep price:

- It requires a prior over all possible worlds and over the outputs of the relevant computations, which no realistic agent can specify.
- It needs an account of logical counterfactuals: what "would" happen if a fixed algorithm output something other than what it actually outputs (see the discussion of FDT's open problems below).
- Optimizing over complete policies is computationally intractable, and no full formalization exists.
These problems remain open. As one LessWrong post put it: "Formalising decision theory is hard."
Wei Dai, "Towards a New Decision Theory" (LessWrong, 2009) The original UDT post. Crisp and foundational. Introduces policy selection and updatelessness. "What is Wei Dai's Updateless Decision Theory?" (LessWrong) A community-written explanation of UDT aimed at newcomers.In 2017, Eliezer Yudkowsky and Nate Soares published "Functional Decision Theory: A New Theory of Instrumental Rationality" through MIRI (also on arXiv). FDT was intended as an umbrella framework capturing the shared insights of TDT and UDT in a more accessible formulation.
An agent should treat its decision as the output of a fixed mathematical function and choose the output that maximizes expected utility, taking into account all the consequences of that function outputting that value.
The key shift from CDT: rather than asking "What would happen if I did action $A$?" (intervening on the physical action), FDT asks "What would happen if my decision algorithm output $A$?" This seemingly subtle change has dramatic consequences when the algorithm is modeled, simulated, or predicted elsewhere.
The paper distinguishes three expected utility calculations via the operator used:
EDT uses conditional probability:
$$\text{EDT}(P, G) = \arg\max_a \sum_o P(o \mid a) \cdot G(o)$$

CDT uses causal probability (do-operator):
$$\text{CDT}(P, G) = \arg\max_a \sum_o P(o \| a) \cdot G(o)$$

FDT uses the subjunctive/dagger operator:
$$\text{FDT}(P, G) = \arg\max_a \sum_o P(o \mathbin{\dag} a) \cdot G(o)$$

where $P(o \mathbin{\dag} a)$ represents the probability of outcome $o$ were the agent's decision function to output $a$. The dagger captures subjunctive dependence: connections that flow through shared computational structure.
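A rough sketch (not the paper's formalism) of how the three calculations come apart on Newcomb's Problem, assuming a prediction accuracy of 0.99 and an arbitrary 0.5 prior credence that Box B is already full: the evidential and subjunctive calculations let the act (or the decision function's output) carry information about Box B, while the causal calculation holds the contents fixed.

```python
p = 0.99      # predictor accuracy
q_full = 0.5  # CDT's unconditional credence that Box B is already full (illustrative)

def payoff(action, box_b_full):
    return (1_000_000 if box_b_full else 0) + (1_000 if action == "two-box" else 0)

def edt(action):   # P(o | a): the act is evidence about the prediction
    p_full = p if action == "one-box" else 1 - p
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)

def cdt(action):   # P(o || a): intervening on the act leaves the past contents untouched
    return q_full * payoff(action, True) + (1 - q_full) * payoff(action, False)

def fdt(action):   # P(o dagger a): if the decision function output `action`, so did Omega's model of it
    p_full = p if action == "one-box" else 1 - p
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)

for name, f in (("EDT", edt), ("CDT", cdt), ("FDT", fdt)):
    print(name, max(("one-box", "two-box"), key=f))   # EDT one-box, CDT two-box, FDT one-box
```

On vanilla Newcomb the evidential and subjunctive numbers coincide; they diverge on Smoking Lesion-type cases, where the correlation is not mediated by anything computing the agent's decision function.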
FDT's central innovation is the concept of subjunctive dependence: two physical systems (say, your brain and Omega's model of you) subjunctively depend on the same function when they both compute it, so fixing that function's output settles what both systems do.

This is what lets FDT handle both classes of problems: in Newcomb-like cases the predictor is computing your decision function, so subjunctive dependence holds and FDT one-boxes; in medical Newcomb cases like the Smoking Lesion, the lesion merely correlates with your decision without computing it, so there is no subjunctive dependence and FDT smokes.
From the paper: FDT "manages consequences, not news."
A helpful restatement: "Questions in decision theory are not questions about what choices you should make with some sort of unpredictable free will. They are questions about what type of source code you should be running."
Yudkowsky & Soares, "Functional Decision Theory" (arXiv, 2017) The original FDT paper. Introduces subjunctive dependence, the dagger operator, and works through all major problems. "An Intuitive Introduction to Functional Decision Theory" (LessWrong) A gentler introduction with worked examples. Good starting point before the paper. MIRI Announcement: Functional Decision Theory MIRI's blog post introducing FDT with summary and context.FDT has drawn both serious academic criticism and vigorous community debate. The theory remains an arXiv preprint—it was rejected from journal publication after revisions—though the companion paper "Cheating Death in Damascus" was published in The Journal of Philosophy.
Philosopher Wolfgang Schwarz identifies three fundamental problems:
MacAskill's critique (prompted by Carl Shulman) raises several concerns:
Perhaps the deepest open problem in the entire field. FDT requires reasoning about what would happen if a deterministic function produced a different output than it actually does. But in a deterministic setting, the function's output is a mathematical fact—asking "what if $f(x) \neq y$?" is supposing a logical impossibility.
Standard counterfactual semantics (Lewis/Stalnaker) typically treat counterpossibles as vacuously true, which would make all FDT calculations trivial. MIRI's research program identified logical counterfactuals as a core open problem, and Scott Garrabrant's "Logical Induction" (2016) was a significant step toward a theory of logical uncertainty, but the full problem remains open.
Garrabrant et al., "Logical Induction" (arXiv, 2016) MIRI's formal framework for logical uncertainty. Defines a logical inductor that assigns probabilities to mathematical sentences in a way that satisfies a strong coherence criterion.Caspar Oesterheld argues there's no theory-neutral metric for comparing decision theories. The causal metric (expected payoff from replacing an agent's action) favors CDT; the evidential metric (expected payoff given observation of agent's action) favors EDT; FDT implicitly uses a "subjunctive metric" that isn't independently motivated. This challenges claims that any theory objectively "outperforms" the others.
The gap between the rationalist community and academic philosophy remains wide. Most academic decision theorists view FDT's problems as either already handled by sophisticated versions of CDT/EDT or as not genuinely problematic. Meanwhile, the rationalist community largely treats FDT (or some successor) as the correct approach. Bridging this gap remains an open social and intellectual challenge.
Oesterheld, "Decision Theory Research Overview" Excellent overview of the field from an AI alignment perspective, covering all major theories and open problems. "Dissolving Confusion around Functional Decision Theory" (LessWrong) Community response to common critiques, clarifying FDT's claims and addressing misunderstandings.Why does MIRI care so much about decision theory? Because if we build AI systems that make decisions, they implicitly use some decision theory. And the choice of decision theory has profound implications for AI safety:
Traditional decision theory assumes the agent is separate from the environment—a "Cartesian" setup. But real agents (including AIs) are embedded within the world they're reasoning about. Demski & Garrabrant (2018) identified four sub-problems: decision theory, embedded world-models, robust delegation, and subsystem alignment.
Vanessa Kosoy's Infra-Bayesianism (developed on LessWrong, 2020–present) proposes a decision-theoretic framework based on credal sets (sets of probability distributions rather than a single distribution) and a maximin decision rule. This connects to Ellsberg-style ambiguity aversion and Knightian uncertainty, and may provide foundations for naturalized induction—an agent reasoning about a universe it's embedded in.
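As a rough illustration of the credal-set idea (not Kosoy's actual formalism), a maximin rule over the distributions consistent with the Ellsberg urn reproduces the ambiguity-averse choices from earlier:

```python
import numpy as np

# Ellsberg urn: 30 red, 60 green-or-blue in unknown proportion.
# Credal set: every distribution with P(red) = 1/3 and P(green) ranging over [0, 2/3].
credal_set = [{"red": 1/3, "green": g, "blue": 2/3 - g} for g in np.linspace(0.0, 2/3, 201)]

def maximin_win_prob(winning_colors):
    """Worst-case probability (over the credal set) that a bet on these colors wins."""
    return min(sum(P[c] for c in winning_colors) for P in credal_set)

print(round(maximin_win_prob({"red"}), 3))            # 0.333  -> Bet A preferred to Bet B
print(round(maximin_win_prob({"green"}), 3))          # 0.0
print(round(maximin_win_prob({"green", "blue"}), 3))  # 0.667  -> Bet D preferred to Bet C
print(round(maximin_win_prob({"red", "blue"}), 3))    # 0.333
```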
If FDT-style reasoning is correct, agents can "cooperate" with other agents they've never met, as long as both reason about each other's decision algorithms. This leads to exotic concepts like acausal trade, in which agents that never interact causally benefit one another purely by modeling each other's decision procedures, and superrational cooperation among agents scattered across a large universe.
| Year | Paper | Significance |
|---|---|---|
| 1969 | Nozick, "Newcomb's Problem and Two Principles of Choice" | Introduced Newcomb's Problem |
| 1978 | Gibbard & Harper, "Counterfactuals and Two Kinds of Expected Utility" | Founded CDT; V vs. U distinction |
| 1979 | Lewis, "Prisoners' Dilemma is a Newcomb Problem" | Connected PD to Newcomb's |
| 1981 | Lewis, "Causal Decision Theory" | Classic CDT defense |
| 1983 | Kavka, "The Toxin Puzzle" (Analysis 43(1)) | Isolated intention from action |
| 1999 | Joyce, The Foundations of Causal Decision Theory | Definitive book-length CDT defense |
| 2006 | Drescher, Good and Real | Subjunctive means-end relations |
| 2009 | Wei Dai, "Towards a New Decision Theory" | Introduced UDT |
| 2010 | Yudkowsky, "Timeless Decision Theory" | TDT formal paper |
| 2014 | Ahmed, Evidence, Decision and Causality | Major EDT defense |
| 2016 | Garrabrant et al., "Logical Induction" | Formal logical uncertainty framework |
| 2017 | Yudkowsky & Soares, "Functional Decision Theory" | Introduced FDT |
| 2020 | Levinstein & Soares, "Cheating Death in Damascus" | FDT in Journal of Philosophy |
For the mathematically literate reader coming in fresh:
This guide was compiled in March 2026, synthesizing research from academic papers, the Stanford Encyclopedia of Philosophy, LessWrong, MIRI technical reports, and the broader rationalist and philosophical communities. All errors are the compiler's own. Corrections and suggestions welcome.
For comprehensive research notes underlying this guide, see the /research/ directory—organized by topic with full bibliographies, summaries, and link catalogs.