International consortium finds fewer than 400 researchers working full-time on alignment; calls for tenfold increase in funding by 2028
A sweeping international survey published yesterday by the Global AI Safety Consortium has laid bare the precarious state of alignment research, finding that the field remains dramatically under-resourced relative to the scale of the challenge it seeks to address. The report, compiled over fourteen months by researchers across thirty-one institutions, represents the most comprehensive assessment of the alignment landscape ever undertaken.
The study's principal findings paint a sobering picture: fewer than 400 researchers worldwide work full-time on alignment, and total annual funding for safety research stands at roughly $620 million.
“We are building the most powerful technology in human history with roughly the same number of safety researchers as a mid-sized university department.”
Dr. Eleanor Vance, Lead Author
The report arrives at a moment of heightened concern in the field. Several prominent researchers have argued that the window for establishing robust alignment techniques is narrowing as capabilities advance. Professor Stuart Russell of the University of California, Berkeley, described the findings as “a fire alarm that the field has been expecting but hoping would never sound.”
Among the report's twelve formal recommendations, the call for a tenfold increase in alignment funding — to approximately $6 billion annually by 2028 — has generated the most discussion. The authors argue that this figure, while substantial, represents less than the cost of a single large language model training run at frontier laboratories.
The Consortium has also proposed the establishment of three international alignment research centres, modelled partly on CERN, which would provide shared computational resources and foster collaboration across institutional boundaries. These centres would be located, the report suggests, in Europe, North America, and East Asia.
Continued on Page 7, Col. 3
Mechanistic interpretability — the effort to understand neural networks by mapping their internal computations — has emerged as perhaps the most promising thread in the alignment tapestry. Where other approaches seek to constrain AI behaviour from the outside, interpretability aims to make the system’s reasoning legible, offering something approaching a genuine understanding of why a model produces the outputs it does.
Recent breakthroughs in sparse autoencoder methods have allowed researchers to identify interpretable features in transformer models with unprecedented clarity. Three groups — at Anthropic, the University of Oxford, and a collaborative team in Tokyo — have independently demonstrated that individual neurons and circuits can be reliably mapped to specific semantic concepts.
This progress, while encouraging, remains confined to models far smaller than the frontier systems now being deployed commercially. The question of whether these techniques will scale — and whether they can be applied fast enough to keep pace with capabilities research — remains the central challenge.
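For readers unfamiliar with the technique, the sketch below shows, in the most minimal terms, how a sparse autoencoder of this general kind is set up: a model's internal activations are encoded into a much larger, sparsely active feature space and then reconstructed, with a sparsity penalty nudging each feature to fire only for a specific concept. The dimensions, data, and hyperparameters are illustrative placeholders, not those of any system described in the report.

```python
# Illustrative sketch only: a minimal sparse autoencoder of the kind used to
# search for interpretable features in transformer activations. All sizes and
# the random "activations" below are placeholders for demonstration.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)   # activations -> feature space
        self.decoder = nn.Linear(d_features, d_model)   # feature space -> reconstruction

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse, non-negative features
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(activations, features, reconstruction, l1_coeff=1e-3):
    # The reconstruction term keeps the features faithful to the original
    # activations; the L1 penalty drives most feature values to zero, so each
    # feature tends to activate only for a narrow, ideally interpretable, concept.
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

# Toy usage: random vectors stand in for a model's residual-stream activations.
sae = SparseAutoencoder(d_model=512, d_features=4096)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)
batch = torch.randn(64, 512)
features, recon = sae(batch)
loss = sae_loss(batch, features, recon)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```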
Continued on Page 12, Col. 1
Negotiations in Geneva have yielded a preliminary framework for international oversight of frontier AI systems, with twenty-three nations signalling their intent to sign. The proposed treaty would establish mandatory safety evaluations before deployment of systems exceeding defined capability thresholds, marking the first binding international agreement on advanced AI governance.
Continued on Page 3, Col. 2
Researchers at the Centre for AI Safety have published results demonstrating that enhanced constitutional training methods can significantly reduce instances of deceptive alignment in controlled experimental settings. The technique involves iterative refinement of model behaviour through structured self-critique.
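In outline, and purely as a hypothetical sketch rather than the Centre's actual method, such a procedure can be pictured as a loop in which a model drafts a response, critiques the draft against a set of written principles, and revises it. The generate() stub and the principles below are placeholders introduced for illustration.

```python
# Hypothetical sketch of an iterative critique-and-revise loop in the spirit
# of constitutional training. The generate() stub and the principles are
# placeholders, not the method or constitution used in the published study.
PRINCIPLES = [
    "Do not assist with plans that could cause serious harm.",
    "State uncertainty honestly rather than guessing.",
]

def generate(prompt: str) -> str:
    # Placeholder: a real pipeline would call a language model here.
    return f"[model output for: {prompt[:40]}...]"

def refine(prompt: str, rounds: int = 2) -> str:
    response = generate(prompt)
    for _ in range(rounds):
        for principle in PRINCIPLES:
            critique = generate(
                "Critique the response against this principle.\n"
                f"Principle: {principle}\nResponse: {response}"
            )
            response = generate(
                "Rewrite the response to address the critique.\n"
                f"Critique: {critique}\nResponse: {response}"
            )
    return response

print(refine("Explain how to secure a home network."))
```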
Continued on Page 8, Col. 4
Five leading AI laboratories have jointly committed to allocating at least three per cent of their total computational resources to safety and alignment research. The pledge, announced at the World Economic Forum, represents a significant increase from current levels but has been criticised by some researchers as insufficient.
Continued on Page 11, Col. 1
Oxford ethicists have proposed a graduated framework for assessing the moral status of artificial systems, arguing that the question can no longer be treated as purely theoretical. The paper identifies eight functional criteria that, if present, would warrant moral consideration of AI systems.
Continued on Page 15, Col. 3
A collaborative team has released a comprehensive benchmark designed to quantify the performance cost of various alignment techniques. Early results suggest that the most effective safety methods reduce dangerous capabilities by over 90 per cent while incurring only a 4–7 per cent reduction in general helpfulness.
Continued on Page 9, Col. 2
Sir — Your report on alignment research funding (Feb. 27) omits a crucial point. The comparison between safety and capabilities spending, while stark, understates the problem: much “safety” funding supports work that is only tangentially related to core alignment challenges. A more honest accounting would put the true figure at perhaps half the stated $620 million.
Further letters, Page 22