A comprehensive mapping of the technical alignment research ecosystem, tracking key researchers, institutions, and methodological approaches shaping the field of AI safety.
Scalable oversight: methods for humans to effectively supervise AI systems that may exceed human-level performance on specific tasks.
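One idea underlying much scalable-oversight work is that verifying an answer can be far cheaper than producing it. The toy sketch below illustrates that asymmetry with a factoring task; the solver and overseer names are illustrative, not part of any real library.

```python
# Toy factored verification: a capable "solver" does the hard search,
# a limited "overseer" certifies the result with a cheap check.
# All names here are illustrative, not part of any real library.

def untrusted_solver(n: int) -> list[int]:
    """Stands in for a capable model: finds the prime factorization of n."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def limited_overseer(n: int, proposed: list[int]) -> bool:
    """Stands in for a human: cannot redo the search, but one
    multiplication suffices to verify the proposed answer."""
    product = 1
    for f in proposed:
        if f < 2:            # reject degenerate "factors" like 1 or 0
            return False
        product *= f
    return product == n

n = 3 * 7 * 65537            # hard to factor by hand, easy to verify
proposal = untrusted_solver(n)
print(proposal, limited_overseer(n, proposal))   # [3, 7, 65537] True
```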
Interpretability: reverse-engineering neural networks to understand the computational mechanisms underlying model behavior.
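A common first step in this kind of work is recording a layer's intermediate activations for inspection or probing. A minimal sketch, assuming PyTorch is installed; the tiny two-layer model stands in for a real network.

```python
# Capturing a layer's activations with a forward hook, a common first
# step before probing or feature analysis.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 4),
)

captured = {}

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()   # record what this layer computed
    return hook

model[1].register_forward_hook(save_activation("post_relu"))

x = torch.randn(3, 8)
model(x)
print(captured["post_relu"].shape)   # torch.Size([3, 16])
# A linear probe trained on captured["post_relu"] would then test whether
# some human-interpretable feature is linearly decodable from this layer.
```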
Robustness: ensuring safety properties hold under distribution shift, adversarial attack, and novel deployment contexts.
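Adversarial robustness is often probed with gradient-based input perturbations. Below is a minimal fast gradient sign method (FGSM) sketch against a toy linear classifier, again assuming PyTorch; it demonstrates only the mechanics, not a realistic attack.

```python
# FGSM: one gradient step on the input to expose sensitivity to small
# worst-case perturbations. Model and data are toys for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 4, requires_grad=True)
y = torch.tensor([0])

loss = loss_fn(model(x), y)
loss.backward()                      # gradient of the loss w.r.t. the input

eps = 0.25
x_adv = x + eps * x.grad.sign()      # perturb each input dim by +/- eps

with torch.no_grad():
    adv_loss = loss_fn(model(x_adv), y)
print(loss.item(), adv_loss.item())  # the adversarial loss is higher
```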
Value learning: approaches for AI systems to learn and represent human values, preferences, and normative reasoning.
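One widely used formalism here is the Bradley-Terry model, which infers scalar scores from pairwise preference data and sits at the statistical core of reward modeling. A minimal sketch, assuming PyTorch; the preference pairs are made up for illustration.

```python
# Bradley-Terry preference learning: infer scalar scores for options from
# pairwise comparisons by maximizing the likelihood of the observed wins.
import torch
import torch.nn.functional as F

prefs = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 2)]   # (winner, loser) pairs

scores = torch.zeros(4, requires_grad=True)        # one score per option
opt = torch.optim.Adam([scores], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    # P(w beats l) = sigmoid(scores[w] - scores[l]); minimize the
    # negative log-likelihood of every observed comparison
    loss = sum(-F.logsigmoid(scores[w] - scores[l]) for w, l in prefs)
    loss.backward()
    opt.step()

print(scores.detach())   # higher score = more preferred under the data
```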
Governance: institutional design, regulatory frameworks, and coordination mechanisms for responsible AI development.
Evaluations: standardized testing methodologies for measuring alignment properties, dangerous capabilities, and safety margins.
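At its simplest, an evaluation harness runs a model over a fixed suite of cases and reports an aggregate score. The stub model and both cases below are placeholders, not any real benchmark.

```python
# A minimal evaluation harness: run a model callable over a fixed suite
# of cases and report a pass rate.
from typing import Callable

cases = [
    {"prompt": "2 + 2 =", "expected": "4"},
    {"prompt": "capital of France?", "expected": "Paris"},
]

def model_stub(prompt: str) -> str:
    """Placeholder for a real model; it always answers '4'."""
    return "4"

def run_suite(model: Callable[[str], str], suite: list[dict]) -> float:
    passed = sum(model(c["prompt"]).strip() == c["expected"] for c in suite)
    return passed / len(suite)

print(f"pass rate: {run_suite(model_stub, cases):.0%}")   # pass rate: 50%
```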