A comprehensive mapping of the technical alignment research ecosystem, tracking key researchers, institutions, and methodological approaches shaping the field of AI safety.
Scalable oversight: methods for humans to effectively supervise AI systems that may exceed human-level performance on specific tasks.
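One idea underlying much scalable-oversight work is that verifying an answer can be far cheaper than producing it. The toy sketch below illustrates that asymmetry with a factoring task; the solver and overseer names are illustrative, not part of any real library.

```python
# Toy factored verification: a capable "solver" does the hard search,
# a limited "overseer" certifies the result with a cheap check.
# All names here are illustrative, not part of any real library.

def untrusted_solver(n: int) -> list[int]:
    """Stands in for a capable model: finds the prime factorization of n."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def limited_overseer(n: int, proposed: list[int]) -> bool:
    """Stands in for a human: cannot redo the search, but one
    multiplication suffices to verify the proposed answer."""
    product = 1
    for f in proposed:
        if f < 2:            # reject degenerate "factors" like 1 or 0
            return False
        product *= f
    return product == n

n = 3 * 7 * 65537            # hard to factor by hand, easy to verify
proposal = untrusted_solver(n)
print(proposal, limited_overseer(n, proposal))   # [3, 7, 65537] True
```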
Interpretability: reverse-engineering neural networks to understand the computational mechanisms underlying model behavior.
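A common first step in this kind of work is recording a layer's intermediate activations for inspection or probing. A minimal sketch, assuming PyTorch is installed; the tiny two-layer model stands in for a real network.

```python
# Capturing a layer's activations with a forward hook, a common first
# step before probing or feature analysis.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 4),
)

captured = {}

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()   # record what this layer computed
    return hook

model[1].register_forward_hook(save_activation("post_relu"))

x = torch.randn(3, 8)
model(x)
print(captured["post_relu"].shape)   # torch.Size([3, 16])
# A linear probe trained on captured["post_relu"] would then test whether
# some human-interpretable feature is linearly decodable from this layer.
```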
Robustness: ensuring safety properties hold under distribution shift, adversarial attack, and novel deployment contexts.
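Adversarial robustness is often probed with gradient-based input perturbations. Below is a minimal fast gradient sign method (FGSM) sketch against a toy linear classifier, again assuming PyTorch; it demonstrates only the mechanics, not a realistic attack.

```python
# FGSM: one gradient step on the input to expose sensitivity to small
# worst-case perturbations. Model and data are toys for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 4, requires_grad=True)
y = torch.tensor([0])

loss = loss_fn(model(x), y)
loss.backward()                      # gradient of the loss w.r.t. the input

eps = 0.25
x_adv = x + eps * x.grad.sign()      # perturb each input dim by +/- eps

with torch.no_grad():
    adv_loss = loss_fn(model(x_adv), y)
print(loss.item(), adv_loss.item())  # the adversarial loss is higher
```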
Value learning: approaches for AI systems to learn and represent human values, preferences, and normative reasoning.
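One widely used formalism here is the Bradley-Terry model, which infers scalar scores from pairwise preference data and sits at the statistical core of reward modeling. A minimal sketch, assuming PyTorch; the preference pairs are made up for illustration.

```python
# Bradley-Terry preference learning: infer scalar scores for options from
# pairwise comparisons by maximizing the likelihood of the observed wins.
import torch
import torch.nn.functional as F

prefs = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 2)]   # (winner, loser) pairs

scores = torch.zeros(4, requires_grad=True)        # one score per option
opt = torch.optim.Adam([scores], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    # P(w beats l) = sigmoid(scores[w] - scores[l]); minimize the
    # negative log-likelihood of every observed comparison
    loss = sum(-F.logsigmoid(scores[w] - scores[l]) for w, l in prefs)
    loss.backward()
    opt.step()

print(scores.detach())   # higher score = more preferred under the data
```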
Governance: institutional design, regulatory frameworks, and coordination mechanisms for responsible AI development.
Evaluations: standardized testing methodologies for measuring alignment properties, dangerous capabilities, and safety margins.
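At its simplest, an evaluation harness runs a model over a fixed suite of cases and reports an aggregate score. The stub model and both cases below are placeholders, not any real benchmark.

```python
# A minimal evaluation harness: run a model callable over a fixed suite
# of cases and report a pass rate.
from typing import Callable

cases = [
    {"prompt": "2 + 2 =", "expected": "4"},
    {"prompt": "capital of France?", "expected": "Paris"},
]

def model_stub(prompt: str) -> str:
    """Placeholder for a real model; it always answers '4'."""
    return "4"

def run_suite(model: Callable[[str], str], suite: list[dict]) -> float:
    passed = sum(model(c["prompt"]).strip() == c["expected"] for c in suite)
    return passed / len(suite)

print(f"pass rate: {run_suite(model_stub, cases):.0%}")   # pass rate: 50%
```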