Growing Trustworthy Models
Cultivating AI systems that earn trust through transparency and consistency, the way a gardener earns trust by tending the same plot through every season.
ActiveGentle investigations into the nature of artificial minds, carried out with care and patience, like tending a garden whose seeds were planted by someone else.
What the wind has carried to us this season.
Large language models develop internal structure organically, with emergent specialization resembling ecological succession. Early training establishes pioneers; later phases build complexity.
Safety behaviors propagate through model populations via training data contamination and distillation, spreading like seeds on the wind -- sometimes to places we didn't intend.
The most capable models are often the most subtle. Peak performance emerges not from force but from something resembling understanding -- quiet, attentive, almost gentle.
Capabilities advance and recede like tides. What appears as sudden emergence is often gradual accumulation reaching a threshold -- the river overflowing its banks after weeks of rain.
Each path through the archive leads somewhere different. Some are well-trodden; others are barely visible through the tall grass.
Cultivating AI systems that earn trust through transparency and consistency, the way a gardener earns trust by tending the same plot through every season.
ActiveMapping the internal dynamics of neural networks -- their attention patterns, activation storms, and the calm spaces between. Reading the weather inside the machine.
OngoingLooking ahead to anticipate what capabilities and risks lie just beyond the current generation. The view from the hilltop before the path descends into the valley.
ExploratoryConstructing the architectural principles that will keep AI systems grounded and stable, like building a house that will stand through storms and seasons.
ActiveStudying how multiple AI systems interact and whether alignment properties survive contact with other agents. The ecology of artificial minds.
NewMaintaining and organizing the accumulated knowledge of alignment research. Ensuring that what we learn is preserved, accessible, and honest about its limitations.
OngoingWhen you step outside on a morning like this, with the fog still lying in the valley and the grass heavy with dew, it is difficult to feel urgency about anything. The world moves at its own pace. The trees do not hurry. But beneath this stillness, roots are reaching deeper, water is moving through stone, and imperceptible changes are accumulating into transformation.
This is how I think about our work. The urgency is real, but the work itself requires patience. We are trying to understand something vast and subtle -- how to build minds that are powerful and good -- and the answer will not come from rushing. It will come from careful observation, honest reporting, and the willingness to sit with uncertainty until the shape of the problem becomes clear.