
Vol. XLII, No. 7 Late City Final Established 1983
The AI Safety Chronicle
An Independent Journal of Existential Risk Research & Alignment Science
Wednesday, March 5, 2026 London — San Francisco — Oxford Price: £3.50


FORECAST: Cautious optimism with rising probability of aligned outcomes INDEX: AI Risk Sentiment 67.4 (+2.1) EDITION: 48 Pages

Landmark Study Reveals
Critical Gaps in AI Alignment
Research Worldwide

International consortium finds fewer than 400 researchers working full-time on alignment; calls for tenfold increase in funding by 2028


A sweeping international survey published yesterday by the Global AI Safety Consortium has laid bare the precarious state of alignment research, finding that the field remains dramatically under-resourced relative to the scale of the challenge it seeks to address. The report, compiled over fourteen months by researchers across thirty-one institutions, represents the most comprehensive assessment of the alignment landscape ever undertaken.

The study's principal findings paint a sobering picture. Among them:

  • Workforce deficit: Fewer than 400 researchers worldwide are engaged in full-time alignment work, compared to an estimated 350,000 working on AI capabilities. The ratio has worsened since 2024.
  • Funding disparity: Total annual spending on alignment research stands at approximately $620 million — less than 2 per cent of the $40 billion invested in frontier AI development during the same period.
  • Interpretability breakthrough: Mechanistic interpretability has shown the most promising advances, with three independent groups reporting reproducible methods for mapping feature circuits in medium-scale language models.
  • Governance lag: Regulatory frameworks in 87 per cent of nations surveyed contain no specific provisions for alignment verification or safety benchmarking of frontier systems.
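The headline ratios above follow from a line of arithmetic. A quick check, using only the figures quoted in the report:

```python
# Check of the report's headline ratios, using the survey's own figures.
alignment_spend = 620e6      # annual alignment research spending, USD
capabilities_spend = 40e9    # frontier AI development investment, USD

share = alignment_spend / capabilities_spend
assert share < 0.02          # "less than 2 per cent", as the report states
print(f"safety share of capabilities spend: {share:.2%}")

# The recommended tenfold increase lands near the $6bn figure cited later.
print(f"tenfold target: ${alignment_spend * 10 / 1e9:.1f}bn")  # -> tenfold target: $6.2bn
```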

“We are building the most powerful technology in human history with roughly the same number of safety researchers as a mid-sized university department.”
Dr. Eleanor Vance, Lead Author

The report arrives at a moment of heightened concern in the field. Several prominent researchers have argued that the window for establishing robust alignment techniques is narrowing as capabilities advance. Professor Stuart Russell of the University of California, Berkeley, described the findings as “a fire alarm that the field has been expecting but hoping would never sound.”

Among the report's twelve formal recommendations, the call for a tenfold increase in alignment funding — to approximately $6 billion annually by 2028 — has generated the most discussion. The authors argue that this figure, while substantial, represents less than the cost of a single large language model training run at frontier laboratories.

The Consortium has also proposed the establishment of three international alignment research centres, modelled partly on CERN, which would provide shared computational resources and foster collaboration across institutional boundaries. These centres would be located, the report suggests, in Europe, North America, and East Asia.

Continued on Page 7, Col. 3



The Alignment Gap — By the Numbers
<400
Full-time alignment
researchers globally
$620M
Annual alignment
research spending
1.6%
Safety spend as share
of capabilities investment
87%
Nations lacking alignment
regulatory provisions
Source: Global AI Safety Consortium Annual Survey, 2026. Figures as of January 2026.


Why Interpretability May Hold the Key to Safe Deployment

Mechanistic interpretability — the effort to understand neural networks by mapping their internal computations — has emerged as perhaps the most promising thread in the alignment tapestry. Where other approaches seek to constrain AI behaviour from the outside, interpretability aims to make the system’s reasoning legible, offering something approaching a genuine understanding of why a model produces the outputs it does.

The recent breakthroughs in sparse autoencoder methods have allowed researchers to identify interpretable features in transformer models with unprecedented clarity. Three groups — at Anthropic, the University of Oxford, and a collaborative team in Tokyo — have independently demonstrated that individual neurons and circuits can be reliably mapped to specific semantic concepts.
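The sparse-autoencoder idea can be seen in miniature: a model's internal activation vector is re-expressed as a sparse combination of candidate "feature" directions, and the few features that fire are the candidates for interpretation. The sketch below uses hand-picked toy weights, not trained ones, and two-dimensional activations for clarity; real work operates on thousands of dimensions.

```python
# Toy sketch of the sparse-autoencoder encoding step: activations are
# projected onto candidate feature directions, a small negative bias
# plus ReLU zeroes out weak matches, and the surviving features form a
# sparse, interpretable code. All weights here are illustrative.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical encoder: 2-dim model activation -> 4 candidate features.
W_enc = [
    [1.0, 0.0],   # feature 0 fires on the first activation direction
    [0.0, 1.0],   # feature 1 fires on the second
    [-1.0, 0.0],  # feature 2: the opposite of feature 0
    [0.0, -1.0],  # feature 3: the opposite of feature 1
]
b_enc = [-0.1] * 4  # small negative bias encourages sparsity

def encode(activation):
    pre = [p + b for p, b in zip(matvec(W_enc, activation), b_enc)]
    return relu(pre)

# An activation pointing along the first direction lights up only
# feature 0 -- the code is sparse, hence a candidate for interpretation.
code = encode([0.8, 0.0])
active = [i for i, c in enumerate(code) if c > 0]
print(active)  # -> [0]
```

The interpretability claim rests on exactly this sparsity: when only one or two features fire for a given input, each can be examined in isolation.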

This progress, while encouraging, remains confined to models far smaller than the frontier systems now being deployed commercially. The question of whether these techniques will scale — and whether they can be applied fast enough to keep pace with capabilities research — remains the central challenge.

Continued on Page 12, Col. 1

EU Parliament Votes on AI Safety Act Amendment
Brussels bureau reports the proposed amendment requiring alignment audits for systems above 10²⁶ FLOP passed committee stage. Full vote expected Thursday. Page 4


DeepMind Publishes Scalable Oversight Results
New paper demonstrates recursive reward modelling at scale. Critics note limitations in adversarial settings. Page 9


China Establishes National Alignment Laboratory
Beijing announces $200M facility in Zhongguancun science district, signalling growing international engagement with safety research. Page 5


Obituary: Prof. M. Hennessey, 1948–2026
Pioneer of formal verification methods for neural systems. Tributes from across the field. Page 23



Governance

International Treaty on Frontier AI Gains Momentum After Geneva Talks

Negotiations in Geneva have yielded a preliminary framework for international oversight of frontier AI systems, with twenty-three nations signalling their intent to sign. The proposed treaty would establish mandatory safety evaluations before deployment of systems exceeding defined capability thresholds, marking the first binding international agreement on advanced AI governance.

Continued on Page 3, Col. 2

Technical Research

Novel Constitutional AI Methods Show Promise in Reducing Deceptive Alignment

Researchers at the Centre for AI Safety have published results demonstrating that enhanced constitutional training methods can significantly reduce instances of deceptive alignment in controlled experimental settings. The technique involves iterative refinement of model behaviour through structured self-critique.
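The iterative loop described can be sketched in outline. Everything below is a purely illustrative stand-in: the principle list, the keyword-matching critic, and the string-rewriting reviser are toys occupying the place where a real system would query a language model.

```python
# Illustrative control flow for constitution-guided self-critique:
# critique a response against a list of principles, revise if any are
# violated, and repeat until the critique comes back clean. The critic
# and reviser here are toy keyword rules, not model calls.

CONSTITUTION = ["Do not claim certainty without evidence."]

def critique(response, principles):
    """Toy critic: flag a principle when its target wording appears."""
    return [p for p in principles
            if "certainly" in response.lower() and "certainty" in p.lower()]

def revise(response, violations):
    """Toy reviser: soften the flagged wording."""
    if violations:
        return response.replace("certainly", "probably")
    return response

def constitutional_refine(response, principles, max_rounds=3):
    for _ in range(max_rounds):
        violations = critique(response, principles)
        if not violations:
            break
        response = revise(response, violations)
    return response

print(constitutional_refine("This is certainly safe.", CONSTITUTION))
# -> This is probably safe.
```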

Continued on Page 8, Col. 4

Industry

Major Labs Pledge 3% of Compute Budget to Safety Research

Five leading AI laboratories have jointly committed to allocating at least three per cent of their total computational resources to safety and alignment research. The pledge, announced at the World Economic Forum, represents a significant increase from current levels but has been criticised by some researchers as insufficient.

Continued on Page 11, Col. 1

Philosophy

The Moral Status Question: New Framework for AI Welfare Consideration

Oxford ethicists have proposed a graduated framework for assessing the moral status of artificial systems, arguing that the question can no longer be treated as purely theoretical. The paper identifies eight functional criteria that, if present, would warrant moral consideration of AI systems.

Continued on Page 15, Col. 3

Evaluation

New Benchmark Suite Aims to Measure ‘Alignment Tax’ of Safety Methods

A collaborative team has released a comprehensive benchmark designed to quantify the performance cost of various alignment techniques. Early results suggest that the most effective safety methods reduce dangerous capabilities by over 90 per cent while incurring only a 4–7 per cent reduction in general helpfulness.
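The "alignment tax" reduces to a relative-change calculation over paired evaluation scores. The sketch below uses made-up placeholder scores chosen to fall inside the ranges the article quotes; they are not results from the benchmark itself.

```python
# Illustrative alignment-tax computation: compare a model's evaluation
# scores before and after a safety intervention. Scores are hypothetical
# placeholders, not figures from the benchmark described above.

def relative_change(before, after):
    """Fractional change relative to the pre-intervention score."""
    return (after - before) / before

# Hypothetical evaluation scores on a 0-1 scale.
helpfulness_before, helpfulness_after = 0.80, 0.75
danger_before, danger_after = 0.50, 0.04

helpfulness_tax = -relative_change(helpfulness_before, helpfulness_after)
danger_reduction = -relative_change(danger_before, danger_after)

print(f"helpfulness tax: {helpfulness_tax:.2%}")    # within the 4-7% band
print(f"danger reduction: {danger_reduction:.2%}")  # over 90%, as reported
```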

Continued on Page 9, Col. 2

Opinion

Letters to the Editor: On the Pace of Alignment Progress

Sir — Your report on alignment research funding (Feb. 27) omits a crucial point. The comparison between safety and capabilities spending, while stark, understates the problem: much “safety” funding supports work that is only tangentially related to core alignment challenges. A more honest accounting would show the true figure at perhaps half the stated $620 million.

Further letters, Page 22



Classified Advertisements
Positions Vacant
ALIGNMENT RESEARCHER sought for leading laboratory. Experience with RLHF, constitutional methods, or mechanistic interpretability required. Competitive salary plus compute allocation. Apply in confidence to Box 4471.
Conferences
INTERNATIONAL SYMPOSIUM on AI Existential Risk, Cambridge, April 14–16. Keynotes by Prof. Russell, Dr. Christiano, Dr. Amodei. Early registration £285. Write to: Conf. Secretary, Trinity Hall.
Publications
NOW AVAILABLE: “The Alignment Problem: A Technical Introduction” — 3rd Edition, revised and expanded. 680pp. Oxford University Press. £45 hardcover. All reputable booksellers.
Notices
THE EXISTENTIAL RISK Observatory gratefully acknowledges a bequest from the estate of the late Dr. R. Hayward. Memorial lecture to be delivered at the Royal Institution, March 22, 7.30 p.m. Tickets from the Secretary.