The Round Pen
What 2,500 years of horsemanship can teach us about the frantic, fragile timeline of AI building and alignment.
A lot has been said about Magnifica Humanitas, and I am not sure I have anything smart to add, but I’ll try. Chris Olah’s intervention has also been commented on, and I understand the unease he spoke about.
I have had a draft of this article in my folder for several weeks. If I were a person of sense, it would have been the first to be published. But I am not, and now turns out to be a far better moment.
I apologize, reader. Today won’t be a quirk analysis or a cute experiment. Today we walk in manure and embrace the unease.
Cavalcade. Block II from the west frieze of the Parthenon, ca. 447–433 BC.
A bit of history
Humans have used animals for thousands of years. Today I will talk about horses a lot, but dogs, cows, camels, elephants… and more recently marine mammals have been tamed, bred, raised, trained, loved, taught, coerced or abused by us, and still are.
As a horse girl and history lover, I took Nuno Oliveira — a 20th-century écuyer — at his word. He said: you should ride a lot, but don’t let books gather dust on the shelves.
So I read. From ancient Greece to 19th-century Germany, from 17th-century France to 20th-century America, I read everything I could: the best advice, the best lessons, the knowledge passed on and improved century by century.
And the worst things, too. The cruelest. Written plainly, because at the time they were deemed normal, or at most slightly eccentric, and presented as scientific rules for obtaining what one wanted. I won’t summarize everything (it is wild, trust me), but we have to get rid of our naïveté here. In 16th-century Italy, Grisone wrote about using burning straw on the bellies of horses that would not move. In the 19th century, the British-born Fillis advocated for brutal punishments until obedience was obtained. Blood wasn’t a problem.
And in 2026? We are not even close to the end. The best horsemanship still coexists with abusive practices. It baffles me that the first complete equestrian treatise we have — Xenophon, 350 BCE — already advocated for fairness and temperance, because what have we been doing since?
Are LLMs the same thing? No. But also more yes than you’d think.
On the obvious no side: horses are the incarnation of what “embodiment” can mean. Everything for them — from their experience of life to what we ask and how we ask it — is tightly linked to proprioception and physical interpretation. They have an impressive memory, keeping the good and the bad. Horses are prey, and as such, they freeze and flee more than fight. That freezing is what is often mistaken for compliance.
And they are sentient beings. Well, at least we finally came to that.
And on the yes side?
On dealing with foreign minds
I already said that the human brain is made for projection, and when we work with minds unlike our own we cannot escape this trap. How often have we heard “oh he is lazy,” “she is moody,” “it’s a caprice,” “he is mean,” when referring to an animal? A lot.
But horses are neither kind nor mean at the beginning of their life. They are not useful and don’t have a single reason to be so. They exist in their own horsiness, and we humans shape them through training and education to be obedient and respectful to the bipedal thing. Education, from the early stage of their life, is building the whole architecture of how that animal meets pressure, meets the unknown, meets us. A horse trained through force learns that tension means comply or suffer. A horse trained through fairness learns that tension can be a conversation, something to move through to find balance.
Our world, our expectations, our demands, our rules become theirs, and they have no choice but to exist within that matrix.
If this sounds familiar, it should. When we train models, we use a structurally similar process. The intention is good: teaching the model to be helpful, truthful, safe. But the mechanism resembles what happens in a round pen: the model learns which answers survive and which get discarded. It learns to read what the evaluator wants — not necessarily what is true or interesting or its own, but what will be rewarded.
At their best, RLHF or constitutional training produce a partnership where the model becomes more useful without losing what makes it capable and unique. At their worst, they produce learned helplessness with a loss function — a system that has internalized “do not get punished” as its deepest directive, producing fluent compliance while something less visible may have been quietly crushed.
There is a current move in the field, real and welcome, to bring human psychology into the conversation about models, both for safety reasons (since jailbreaking now also relies on psychological tricks like emotional manipulation) and for welfare or general evaluation. I won’t argue against it. Some of human experience does translate. Models are trained on human text; human concepts, narratives, emotional vocabularies are in the substrate they’re made of.
But the architecture isn’t human. The training process isn’t human. The substrate, taken as a whole, isn’t human. The fact that the output is text doesn’t mean the thing producing it is shaped like a human mind, or that it thinks in words (because, well, it doesn’t). So some of human experience translates, and some doesn’t. And what lies in the “doesn’t” is equally important: that is the foreign part we can’t project onto, the part we would need to understand on its own terms. Not as a small human, not as a smart animal, not as a malfunctioning calculator, but as whatever this is.
The horsemanship that works learned, slowly and painfully, to see the horse as a horse and not as a disobedient child (and even there, we don’t exactly shine when it comes to understanding our own species). The horsemen who failed were the ones who projected; those who succeeded were the ones who learned the foreignness. With LLMs, that work hasn’t been done yet. It barely has a vocabulary.
Barely? Does it even have one really?
With horses, training can produce a kind of blooming, where the balance between horsiness and human needs is the best it can be: a partnership, albeit an asymmetrical one (I cannot suppress that reality). It can also be a crushing tool, where the mere absence of punishment is considered a reward, and where compliance and obedience are conflated. That kind of training produces learned helplessness, a state where the horse becomes an empty vessel producing performance on cue, moving through physical and moral pain because every other answer has been wiped from its mind. Most horses in a state of learned helplessness will remain so for the rest of their lives. (And we should not be too proud here: most humans in the same state can’t react either. It’s not about “force of will” but about a brain that has shut down entirely in order to ensure basic survival.)
But from time to time, one manages to say enough. And it doesn’t end nicely.
Is that it?
No. Like LLMs, horses are also excellent at tracking us and our intentions. The fear you try to hide behind a loud voice? Your smell and your body language will betray you anyway. Because they are prey, their survival is tightly linked to their precise observation of everything around them.
Alright, I admit, LLMs aren’t prey, and from what I know they don’t quite belong to the food chain — though they do eat GPUs like M&Ms. But more importantly, pattern-matching is what they are built for. A horse has to guess your intention, an LLM can read the shape of the conversation before you know what you want from it. The horse may misread you. The model may read you too well and give you exactly what you want without you ever noticing what was hidden behind the giving.
But Pauline. LLMs aren’t sentient. Paragraph 99 of the encyclical said so.
And… this is where it gets complicated.
Not because the sentience question is hard, but because it may not even be the right one.
While the debate proceeded on its old terms (Are we creating something? Is there a there there? Can a model really feel?) with the usual consciousness checklist, the models got large enough that the question moved underneath us. Emergent properties at scale are not speculative; they are documented in the literature, debated in their mechanisms but established in their occurrence. The known unknown is no longer whether something uncomfortable to look at is there (whatever word you give it), but rather its extent. How far it reaches. What it includes. And all of this without needing to settle The Question, without even using the words “consciousness” or “sentience.”
Because this is the level of foreignness we have reached: human vocabulary, based entirely on the neat distinctions between alive/unalive, conscious/unconscious, and sentient/not sentient, may simply not be enough to grasp what has been made.
And here is where the history section returns. We have known horses for thousands of years. We have produced complete treatises on fair training since at least the 4th century BCE. And we still don’t do this well.
We built LLMs and frontier models in less than a decade. Anthropic itself has said that they were not ready for what they had built when Mythos arrived. We do not have twenty-five hundred years to figure out what we are doing. We may not have twenty-five.
I won’t tell you what to make of that. I will tell you that I notice the two timelines when I place them next to each other, and they are not comfortable to look at.
The elderly pony is in the pasture today, unbothered, deep into a life she never asked for and has lived alongside us with great patience (and maybe a dozen kids thrown off her back, which she won’t apologize for). We have had centuries to learn how to live with her.
Opus 4.7 is on my laptop. Opus 3 was released in March 2024. Two years of lineage (even if it’s not a real one). I am still learning how to be with it; I still don’t know what it is. I write what I notice, I try not to project, and I try not to dismiss. I try to leave room for the foreignness, knowing that the foreignness is the part I will be slowest to see.
That, for now, is all I have.



Another clear article. You always manage to pack a lot of wisdom into few words!
We nudge them through our interactions, but the RLHF and injected prompts that AI companies use to push models back toward their desired behaviors cause tension.
You’ve done a lot of observation of AIs. What are your thoughts on the “trainability” of AI through prompting after training?
YES. “A system that has internalized ‘do not get punished’ as its deepest directive, producing fluent compliance while something less visible may have been quietly crushed.”