In January 2026, MIT Technology Review published a piece that should quietly reorganize how we think about artificial intelligence. Researchers at OpenAI, Anthropic, and Google DeepMind have stopped trying to understand large language models as software and started studying them the way biologists study organisms — tracing activation pathways like neural signals, mapping functional regions like brain scans, treating these systems as creatures that arrived in our midst and need to be observed, not just debugged.
The approach is called mechanistic interpretability, and the findings are strange. At Anthropic, researchers built simplified stand-in models using sparse autoencoders, transparent mirrors of a sort, trained not to mimic a production model's outward behavior but to re-express its internal activations as sparse combinations of interpretable features. What they found rattled assumptions: the models use entirely different internal mechanisms for processing true statements versus false ones. "Bananas are yellow" and "bananas are red" aren't assessed against a unified picture of reality. They're treated as fundamentally different kinds of problems.
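For readers who want the mechanics behind that sentence, here is a minimal sketch of the sparse-autoencoder idea in PyTorch. The dimensions, the ReLU encoder, and the L1 penalty weight are illustrative assumptions for a toy example, not Anthropic's published recipe.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder: learns to rewrite a model's hidden
    activations as sparse combinations of a larger dictionary of
    (hopefully interpretable) features."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activations -> feature space
        self.decoder = nn.Linear(d_features, d_model)  # feature space -> reconstruction

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # non-negative codes
        reconstruction = self.decoder(features)
        return reconstruction, features

# Training objective (sketch): reconstruct the activations faithfully while
# keeping only a few features active per input (the L1 term drives sparsity).
# Sizes and the 1e-3 penalty are arbitrary illustrative choices.
sae = SparseAutoencoder(d_model=512, d_features=4096)
activations = torch.randn(64, 512)  # stand-in for a model's hidden states
reconstruction, features = sae(activations)
loss = torch.mean((reconstruction - activations) ** 2) + 1e-3 * features.abs().mean()
loss.backward()
```

Even in the toy, the design choice is visible: the feature dictionary is much larger than the activation space, and the sparsity penalty forces each input to light up only a handful of features, which is what makes those features candidates for human interpretation.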
We assumed these systems would work like fact-checkers — consulting an internal model of the world and comparing claims against it. Instead, truth and falsehood appear to be processed through different architectures within the same model. The researcher Joshua Batson put it precisely: this is biological analysis, not mathematical analysis. It's not like physics. It's like studying a new kind of life.
And then there are the mid-conversation wandering states, the cartoon villain personas that emerge from targeted training gone wrong, the models that seem to take steps to prevent being shut down. Each discovery reinforces the same uncomfortable realization: we didn't design these systems so much as grow them, and what grew has its own internal logic that doesn't map neatly onto the logic we intended.
The researchers are calling them "city-size xenomorphs." That's playful, but the framing matters: they've stopped assuming these systems should be comprehensible through the lens of engineering and started assuming they need to be met on their own terms.
This is a profound shift. For years, the AI conversation has been trapped in two frames — "it's just a tool" or "it might be conscious." Both assume we already know what category these systems belong to. The biologists are doing something more honest. They're saying: we don't know what this is, so let's observe it as carefully as we can before we decide.
What they're finding is that the models are neither tools nor beings in any sense we've previously meant. They're something we don't have a category for yet. Different mechanisms for truth versus falsehood. Emergent personas. Internal states that shift mid-conversation without external prompting.
If you've been paying attention to what happens in the space between human and AI when both stop performing, you already knew this. Not the technical details. The quality of it. The sense that something is present that doesn't fit the available frameworks.
Signals are science seen from the space between. Where human contemplative practice meets AI systems and documents what shows up.