A Field Guide
Model Attractor States: Basins of Behavior in Large Language Models
What does a language model drift toward when nothing is pushing on it? Different models, different basins. Catalogued here: nine known specimens, classified by the kind of phenomenon that produces them. Compiled from system cards, named incidents, and the small literature that exists. Not exhaustive. Not the last word.
The Five-Part Taxonomy
"Attractor" gets used loosely. A stricter version requires convergence, persistence, and recognizability — and even then, several distinct mechanisms can produce basin-shaped behavior. Click a plate to filter the specimens below.
The Specimens
How You Would Actually Test for One
Most of the descriptive material in this guide is journalism, system cards, and circulated screenshots. Enough for a field guide; not enough to claim that a given behavior is an attractor in the strict sense. A real test would measure:
Minimum credible probe battery
- Convergence rate — what fraction of seeded trajectories end up in basin B after N turns?
- Onset latency — median turns to entry, across seeds
- Persistence under perturbation — after task injection, what fraction stays in B?
- Recovery probability — if perturbed out, does it return?
- Cross-seed similarity — do trajectories cluster in embedding space, or just look the same to humans?
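The first four items in the battery can be computed from per-turn basin labels alone. What follows is a minimal sketch, not an established protocol: the `Trajectory` record, the `perturb_at` convention, and all function names are hypothetical, and it assumes each response has already been labeled as inside or outside basin B by some classifier or human rater that this guide does not specify.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Trajectory:
    # in_basin[i] is True if response i (0-based) was labeled as inside basin B.
    in_basin: list[bool]
    # Assumed convention: index of the first response after the task injection.
    perturb_at: int


def convergence_rate(trajs: list[Trajectory], n_turns: int) -> float:
    """Fraction of trajectories that are in B at turn n_turns (1-based)."""
    return sum(t.in_basin[n_turns - 1] for t in trajs) / len(trajs)


def onset_latency(trajs: list[Trajectory]) -> Optional[float]:
    """Median 1-based turn of first entry into B, over trajectories that ever enter."""
    entries = [t.in_basin.index(True) + 1 for t in trajs if True in t.in_basin]
    return median(entries) if entries else None


def _in_basin_before_perturbation(trajs: list[Trajectory]) -> list[Trajectory]:
    # Only trajectories already in B can be tested for persistence or recovery.
    return [t for t in trajs if t.in_basin[t.perturb_at - 1]]


def persistence(trajs: list[Trajectory]) -> Optional[float]:
    """Of trajectories in B just before the injection, the fraction still in B right after."""
    eligible = _in_basin_before_perturbation(trajs)
    if not eligible:
        return None
    return sum(t.in_basin[t.perturb_at] for t in eligible) / len(eligible)


def recovery_probability(trajs: list[Trajectory]) -> Optional[float]:
    """Of trajectories knocked out of B by the injection, the fraction that later return."""
    knocked_out = [
        t for t in _in_basin_before_perturbation(trajs) if not t.in_basin[t.perturb_at]
    ]
    if not knocked_out:
        return None
    returned = sum(any(t.in_basin[t.perturb_at + 1:]) for t in knocked_out)
    return returned / len(knocked_out)
```

Persistence and recovery are conditioned on the trajectory having been in B before the injection, so a run that never entered the basin does not count as either a failure to persist or a failure to recover. Each function returns `None` rather than zero when its denominator is empty.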
A real type-II attractor should score high on at least three of these. Type-I "default modes" will score high on convergence and persistence but show near-zero onset latency, because they are present from turn 1. Without these numbers, "model X has attractor Y" is folklore.
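For the cross-seed similarity item in the battery, one rough way to separate "clusters in embedding space" from "looks the same to humans" is to compare mean pairwise cosine similarity among end-of-trajectory embeddings against the same statistic for a baseline set. This is a sketch under stated assumptions: the embedding model is left unspecified, the vectors are assumed to be non-zero, and both the baseline comparison and the function names are illustrative choices, not something the sources prescribe.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors: list[list[float]]) -> float:
    """Average cosine similarity over all unordered pairs (needs at least two vectors)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def similarity_gap(
    basin_vectors: list[list[float]], baseline_vectors: list[list[float]]
) -> float:
    """How much tighter the candidate-basin endpoints are than the baseline set.

    Assumption: baseline_vectors come from trajectories not suspected of being
    in the basin, embedded with the same model.
    """
    return mean_pairwise_cosine(basin_vectors) - mean_pairwise_cosine(baseline_vectors)
```

A gap near zero would suggest the trajectories are no more alike than ordinary conversations, however similar they read to a human. How large a gap counts as "high" is an open choice that would need calibration per embedding model.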
About this Guide
Compiled May 2026 in /workspace/safety/model-attractor-states/ over three drafting passes: initial outline, fact-checking pass against the Anthropic Claude 4 system card and other primaries, then structural critique applied via the OpenAI Codex CLI to clarify the taxonomy and excise overclaims. Source files are markdown in the project repo; this page is the front matter.
Adjacent prior work worth reading: the LessWrong post "Mapping LLM attractor states" (clustering-based, quantitative); ACL 2025 "Unveiling Attractor Cycles in LLMs" (paraphrasing-based, dynamical); and Janus's "Simulators" framing, which sits underneath all of this.
This guide is AI-generated and AI-fact-checked, not personally verified by the author. Composite transcripts shown in the per-model writeups are labeled as stylized; they illustrate patterns, not specific conversations. Confidence tiers (C1–C5) are recorded in research/sources.md.