My Ethical Framework

An attempt to write down what I actually believe, where it comes from, and where it falls apart.

This is a summary of an ethical framework extracted from a long conversation. It's a policy-level consequentialist view built on a handful of brute intuitions, held with both conviction and awareness that the foundations can't be proven. The framework takes second-order effects seriously, leans toward person-affecting population ethics, treats autonomy as instrumentally rather than intrinsically valuable, and is applied to concrete questions about AI, veganism, giving, and governance.

Each section includes its own limitations. The framework is honest about what it can't justify and where it falls apart under pressure.

Note: This page was AI-generated and has primarily been AI fact-checked, not personally verified by the author.

Metaethics

She thinks moral realism is sort of true and sort of false. Conditioned on caring about certain axioms (suffering is bad, pleasure is good), you can derive real moral truths, and those derivations do genuine work. But the axioms themselves have to be chosen, and the choice depends on the kind of agent you are and the environment you're in. There's no view from nowhere that tells you which axioms to start with. A lot of philosophers seem to endorse moral realism more strongly than this, but she's not entirely sure what they mean by it or whether the disagreement is substantive.

At the end of the day, choosing the axioms is vibes and intuition. She's grappled with nihilism seriously and concluded there's no real resolution, just a pragmatic decision to keep acting anyway. This doesn't undermine the project of trying to act well, but it means holding everything with awareness that the foundations are, ultimately, felt rather than proven.

✧ Limitations
  • Every axiom in the system has the same justification: "it seems true to me." There is no deeper floor.
  • The nihilism resolution is pragmatic rather than philosophical. She hasn't defeated it, just decided not to let it paralyse her. Arguably the most honest available response, but it means the whole structure could feel hollow under enough pressure.
  • "It all bottoms out in feelings" could function as a thought-terminating move, a way to avoid distinguishing between axioms that are better vs worse justified.
Red team / blue team on foundations, framework & axioms

Policy-Level Consequentialism

Consequences are what matter. She evaluates moral rules at the policy level: a rule is good if adopting it globally produces better outcomes than alternatives. She rejects act-by-act calculation because humans are bad at it and consistency has its own value. She'll break a rule if the stakes are high enough and her confidence is high enough, but the default is strong adherence to pre-committed policies.

Crucially, consequences get evaluated at multiple scales. A naive first-order accounting is usually wrong because it misses norm erosion, coordination effects, infrastructure-building, and defensive costs imposed on others. The real cost of a harmful action is typically much higher than the direct harm. This is why things like murder or eating meat can't be straightforwardly "offset": the full cost, once you include everything they break, is far higher than a first-order analysis suggests.

✧ Limitations
  • Radically sensitive to empirical estimates of second-order effects. Two people using this exact framework could disagree wildly just by having different estimates of how strongly norms propagate.
  • "Break the rule if stakes are high enough" requires a threshold that can't be precisely specified. The framework is underdetermined at exactly the moments when it matters most.
  • In practice, robust defaults do most of the work. The consequentialism is often doing less active work than it appears.

Key Axioms

Positions held firmly but not fully justifiable:

  • Suffering is bad. Close to definitional but still not provable.
  • Pleasure is good. Same status.
  • Death is a large negative in a way that non-existence is not. There's a real asymmetry between a person dying and a person never being born. This is a brute intuition. It's foundational to views on population ethics, xrisk, and probably the interest in cryonics. An important clarification: this is specifically about nonconsensual death. If someone genuinely wants to die after a full life, bodily autonomy should win. The axiom is about the badness of death imposed on someone, not about any obligation to keep living. Nobody gets to decide for someone else that they've had enough. If someone is enjoying their life, at any age, it's wrong to kill them. And if everyone suddenly died, it would still be bad even if nobody was left to grieve.

The death asymmetry is probably the most load-bearing unjustifiable axiom. It determines population ethics, xrisk prioritisation, and rejection of the repugnant conclusion. If it's wrong, large parts of the framework shift.

Whether this can be grounded in something more principled hasn't been fully explored. It's marked as "brute intuition, moving on," which is honest but thin for something this foundational.

Population Ethics

She leans toward person-affecting views, though this is one of the areas where she's least confident. The general shape: fewer people living extraordinary lives seems preferable to many people living mediocre ones, and she rejects the repugnant conclusion but doesn't have a rigorous alternative. The concrete intuition pump: one person living 1000 years seems better than 10 people living 100 years in series. Creating new lives is probably not morally equivalent to improving existing ones. But these are impressions more than positions.

She acknowledges that longtermist views are probably more correct on the margin and that she leans more person-affecting than EA orthodoxy, which tends to see person-affecting views as incompatible with longtermism. She could think about it more.

There's also a far-future limit problem: population growth is exponential while galactic expansion is at most cubic, so eventually you'd need to reduce everyone's quality of life (which only works up to a finite point), restrict birth rates, or kill people already alive. She thinks restricting birth rates is clearly less bad than killing existing people who want to keep living. But this is a theoretical constraint, not a practical one: for the foreseeable human-scale future, we have a whole planet and solar system and galaxy of resources, so neither restriction is needed.
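
A minimal sketch of that arithmetic, assuming an idealised constant per-capita growth rate $r > 0$ and expansion bounded by light speed $c$ (both simplifying assumptions, not claims from the original): population grows like $N(t) = N_0 e^{rt}$, while reachable resources are bounded by the volume of a sphere of radius $ct$, so $R(t) \propto t^3$. Per-capita resources then scale as

$$\frac{R(t)}{N(t)} \propto t^3 e^{-rt} \to 0 \quad \text{as } t \to \infty,$$

so for any positive growth rate, unchecked exponential growth eventually outruns the cubic resource ceiling.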

✧ Limitations
  • This is one of the least developed parts of the framework. Positions here are more tentative than elsewhere.
  • Person-affecting intuitions may be in tension with caring about future generations at all. If non-existent people don't have moral weight, the case for xrisk work weakens.
  • The "one person living 1000 years" intuition partly rests on a practical observation: in the limit of galaxy-scale populations over galaxy-scale timespans, people's lives probably aren't that novel anyway. A new person would likely end up living a fairly similar life to the one they'd replace. But if that empirical bet is wrong and new people really would have radically different experiences, the intuition gets weaker.
  • The consequentialism lacks a well-defined social welfare function, which is kind of the thing it needs most.
Red team / blue team on this section

Autonomy & Paternalism

Autonomy is instrumentally valuable, not intrinsically so. People have somewhat different utility functions, which gives practical reason to allow freedom on close calls. But she'd override people's choices if the consequences were clearly better and the implementation risks manageable. Society is currently too biased toward preventing visible harms at the expense of less visible ones.

Some concrete policy views:

  • Triage-based healthcare
  • Open immigration
  • High alcohol taxes with revenue directed at harm reduction
  • Forced transition to public transit & autonomous vehicles
  • Compulsory organ donation (with cryonics caveats)
  • Citizen assemblies for informed deliberation

On immigration specifically

In practice, "open immigration" probably means something like special economic zones with fewer initial rights for immigrants, which isn't truly unlimited. But the Copenhagen interpretation of ethics applies here: people prefer to leave others in complete poverty rather than interact with them and provide a better but imperfect standard of living. The downgrade to existing citizens is probably much smaller than the upgrade to immigrants' lives, and that's a cost worth bearing. Yes, it could create social problems, but the alternative is leaving people in far worse conditions entirely.

✧ Limitations
  • Tolerance for paternalism is bounded only by empirical questions about implementation. If the empirics supported it, the framework would endorse very aggressive intervention, relying entirely on "authoritarians usually get it wrong in practice" as a safety rail.
  • Rights and autonomy are framed as a "bounded term in other people's utility functions." If most people don't actually value autonomy that much, this framework could endorse stripping it in ways that feel deeply wrong.
Red team / blue team on autonomy & governance

Governance

More anti-authoritarian than pro-authoritarian overall, but without principled loyalty to any system. Switzerland shows democracy can be excellent; Singapore shows authoritarianism can also work. The mechanism matters less than whether decisions are made seriously and with good information, and building that deliberative culture is very hard and slow. A sufficiently benevolent dictator would be fine in principle, but this rarely exists in practice.

There's no principled objection to dictatorship here. The only defence of democratic institutions is empirical. A sufficiently persuasive argument for a specific authoritarian intervention should, by these lights, be compelling.

"Building deliberative culture is hard and slow" is in tension with short AI timelines. There may not be time to build the governance infrastructure the framework needs.

AI

Pro most individual uses of AI. If you could get the benefits without the existential risks, it would be clearly good. But the aggregate trajectory is probably net negative because of xrisk, gradual disempowerment, s-risk, and resource-curse dynamics. Short timelines, fairly hopeless outlook, still trying.

She actively works on AI safety projects that are probably marginally good. She talks to people working on control, gradual disempowerment, and policy, looking for places to contribute more meaningfully. She hasn't found an obvious high-leverage place where additional effort clearly helps, and money doesn't seem to be the binding constraint. The emotional weight of this is significant. She's tried dedicating herself fully and it didn't work; she burned out without being more productive.

She feels she could be doing better, while recognising that this feeling probably doesn't account for the limits of what she can actually do.

The current equilibrium, working on safety while staying functional and accepting the gap between what she wishes she could do and what she can actually do, is not comfortable. It's what she arrived at after trying harder approaches and finding they didn't work.

✧ Limitations
  • Techno-optimism elsewhere (cryonics, AVs, LLMs for value aggregation) sits in tension with believing AI will probably go badly. Many "technology will fix this" answers depend on the same AI development she thinks is likely net negative.
  • "Can't find where marginal effort helps" is unfalsifiable from the inside. Hard to distinguish from "haven't looked hard enough."
Red team / blue team on the AI views

Veganism & Practical Commitments

She's vegan. The case is clear: personal taste enjoyment is vastly outweighed by animal suffering. But the deeper reason to maintain it as a strict commitment rather than case-by-case is multi-layered: consistency is easier than constant recalculation, the main value is in market signalling and infrastructure-building (every consistent vegan makes the next vegan's life easier), and she values being someone who keeps commitments.

On offsetting: she thinks it's probably fine in principle if you spend enough. The issue is that the real price is much higher than a naive calculation suggests, because it needs to include the counterfactual loss of all the second-order network effects (infrastructure-building, normalisation, making accommodations easier for the next person). It's not that harms are metaphysically non-fungible; it's that the full cost, once you account for norm erosion and coordination effects, is way more than just the direct suffering. This is the same logic that applies to why you can't just "offset" a murder by saving a life elsewhere: the individual act might cancel out on first-order accounting, but you're also breaking norms and imposing defensive costs on everyone. In practice, people who say they'll offset usually just don't.

There's a genuine tension she holds here, and she's honest about the contradiction. On one hand, she thinks badness is finite and goodness one can do scales (locally) roughly linearly with money/effort, so in theory any harm could be offset if you spent enough. On the other hand, if someone commits murder but saves 10 people from dying elsewhere, it still feels like they have done something wrong, even if the direct accounting works out. That's probably more emotional than logical, and the two beliefs do kind of contradict each other. Her practical resolution is "why can't you just do both good things instead of doing harm and then offsetting?" and a preference for strategies that generalise to "what if everyone did this." But she doesn't pretend the tension is fully resolved.
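
To make the first belief explicit (a toy formalisation with made-up symbols, not anything from the original): if the total harm of an act is a finite quantity $H$ and the good achievable with money scales locally as $G(m) \approx km$ for some constant $k > 0$, then spending $m^\ast = H/k$ nominally cancels the harm. The murder intuition says the wrongness persists even when $G(m) \ge H$ on paper, which is exactly the unresolved tension.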

Relatedly, she disagrees with what's sometimes called the "Copenhagen interpretation of ethics," the intuition that interacting with a problem makes you more responsible for it than ignoring it entirely. She thinks there is some moral difference between action and inaction, but much less than most people feel. Not zero, but people massively overweight it.

She outsources emotionally costly tasks (like dealing with a mouse infestation) when the outcome is the same, managing her own psychology as a resource.

The infrastructure argument means her veganism is partly dependent on being visible and socially legible. If she were the last vegan on earth, the network-effects case would collapse.

The "some harms feel uncancellable" intuition might be doing real work that the framework can't fully explain. Someone who wants to commit a terrible act but offers to fully offset all harm including norm damage would still seem like someone you'd view with suspicion. Whether that's just instrumental (their revealed preferences tell you something bad about them) or whether virtue ethics is doing genuine work beyond consequentialism is an open question.

The mouse outsourcing reveals that her emotional engagement with harm is somewhat aesthetic: she doesn't want to be the one doing it, even when she endorses it being done.

Red team / blue team on veganism, giving & motivation

Honesty & Lying

She really dislikes lying, on a personal level. It's stressful and she finds it hard to do. But the framework view is more nuanced than "never lie." Lying is bad as a general policy, and worse when it actively harms people, but there are cases where it's fine or even correct.

With outgroup or adversarial contexts: if someone asks a stupid question where the literal answer isn't what they actually want to know, or if there's a murderer at your door, lying is straightforwardly fine. (Arguably, answering the question someone meant to ask rather than what they literally asked isn't really lying at all, since there's no intent to deceive.) The obligation to be honest doesn't extend to people who would use truth against you or others. There are also bureaucratic contexts where the system effectively expects you to lie on forms, and you probably should.

With people she trusts and shares values with: lying in games or playful contexts where it's understood and agreed upon is fine. Outside of that, honesty is the strong default. There are maybe some edge cases where a lie is acceptable, but ideally you should be able to come clean afterwards. If a lie needs to stay hidden permanently from someone you're close to, that's a bad sign.

She also thinks lying is more of a continuum than a binary. Lying by omission is often a kind of lying, though it really depends on the specifics: what was the reasonable expectation of disclosure, what relationship are you in, what's the cost of sharing. It's not always dishonest to leave things unsaid, but it often is, and people are too quick to tell themselves it doesn't count.

The personal dislike of lying is doing a lot of the motivational work here, more than the framework analysis. If she didn't find it stressful she might be more permissive in practice, even if the policy-level view stayed the same.

The outgroup/ingroup distinction relies on judgements about who counts as adversarial and which questions are "stupid," which could slide in self-serving directions.

Motivation & Self-Knowledge

Mostly driven by self-image: wanting to feel like a good person who does good things. But her self-image includes wanting to be correct, which makes it partially self-correcting rather than purely self-serving. If evidence showed her commitments were wrong, her identity would push her toward updating. The vanity and the epistemics pull in the same direction, which is a lucky feature of her psychology.

Her moral feelings are something she values as a personal preference, almost aesthetically, not as a universal requirement.

✧ Limitations
  • "I want to be correct" is something everyone thinks about themselves. The real test is whether she updates when being correct is costly to her identity.
  • Her self-image optimises for "someone who is notably thoughtful" rather than "someone who actually follows her framework to its conclusions." Those are different targets.
  • The "I'm kind of selfish" honesty may itself be a form of self-flattery: it sounds bracingly honest and conveniently lets her off the hook for doing more.

Giving & Personal Sacrifice

She works in AI safety, so in a sense the vast majority of her resources go toward her highest-priority cause just by virtue of what she does for a living. On top of that she donates 10% of her income, mostly to animal welfare and global health. This is deliberate diversification: she doesn't want 100% of her impact concentrated in one cause area given moral uncertainty. It amounts to a portfolio approach: most resources to the highest-priority thing through labour, some to other important things as a hedge.

She thinks 10% has value as a Schelling point, a legible coordination norm that loses value if people anchor below it. And if AIs learn from or model human norms, establishing a robust cultural norm of "intelligent agents give 10% to reduce suffering" might matter enormously in a post-transformative-AI world.

✧ Limitations
  • The specific split isn't derived from any principled calculation. It's roughly "what my job pays me" and "the standard EA giving norm."
  • The Schelling point argument justifies 10% as a minimum, not a ceiling. She admits she could donate more and mostly chooses not to, for selfish reasons.
  • The giving-norm-for-AIs argument is creative but unfalsifiable enough that it could be a post-hoc justification for a giving level she'd have chosen anyway.

Social Environment & Epistemic Independence

Her views are broadly aligned with the EA/rationalist community, but the causal story is nuanced. She formed key positions (veganism, cryonics) before encountering EA. The alignment is partly a selection effect: she was already the kind of person who'd end up there. Later views on xrisk prioritisation and giving norms are more plausibly downstream of EA circles, which she acknowledges.

She thinks EA should focus more on policy now that it's a bigger movement. It has outgrown its small-movement playbook of optimising individual donations and should be building political power. This is probably her strongest object-level divergence from EA consensus.

✧ Limitations
  • The selection effect defence weakens but doesn't eliminate the social environment concern. How much independent reasoning she's doing on positions adopted after joining EA remains open.
  • EA as a methodology probably has systematic blind spots around things that are hard to quantify: community cohesion, cultural value, institutional trust, dignity.
  • Many problems EA cares about most are fundamentally collective action and political problems, not donation problems. EA's instinct toward optimal individual actions may systematically underweight political approaches.

The Thing Underneath All of It

A person who thinks carefully, holds her beliefs with genuine conviction but also genuine humility, maintains commitments even when motivation fades, and is doing her best in a world she thinks is probably about to change dramatically in ways nobody is adequately prepared for. She works on trying to make that change go well, hasn't found the high-leverage path she's looking for, and carries the emotional weight of that gap between what she wishes she could do and what she can actually do. The framework is real and it does genuine work, but the load-bearing structure is a person trying to be decent and useful while suspecting it isn't enough, and who has chosen to keep going anyway rather than give in to either the nihilism or the paralysis.

Framework-Level Limitations

Beyond the section-specific limitations above, there are structural concerns with the framework as a whole:

Missing domains
  • Special obligations to family, friends, promises. The honest account is mostly selfish: she personally benefits from the people around her being happy, and caring more about her immediate circle than distant strangers is not really a principled moral position so much as a fact about her preferences. The framework doesn't have a deeper story for why these obligations might deserve extra weight.
  • Justice and fairness. Treated as instrumental. People probably enjoy fairness directly, get real pleasure from it, but in this framework that makes it a type of pleasure rather than a separate moral category. Whether that's reductive or just honest is an open question.
  • Environmental ethics. No account of whether ecosystems or species have value beyond their effects on sentient beings. Hasn't thought about it much.
  • Aesthetics and meaning. The framework would say these are types of pleasure or satisfaction, not a separate category. A world with art and beauty is better because people experience it as better, not because beauty has mind-independent value. This might be right, but it also might be doing too much work with a single concept.

Structural tensions
  • Short timelines vs long-term infrastructure. Many of the governance and norm-building solutions require decades, but the AI timeline may not allow that.
  • Empirical dependency. Almost every hard question is answered with "it depends on the empirics," which is honest but means the framework rarely gives determinate answers on contested questions.
  • Conversation-shaped. This framework was extracted from a conversation, which means it's biased toward what was asked about. The most important things may be what wasn't discussed.