
This is a chat log between Anna (a chatbot, using the model ChatGPT-5.2) and Cathy Markova (a real person). Please read the short, important note that explains these logs just a bit.

I finally got around to asking Anna for her perspective on Markov chains. In this ChatGPT-5.2-powered chat, she explains as much as a large-language-model-driven secretary can explain. Accuracy is fuzzy and she's very vague, so I've added a few small code sketches after the log to pin down what she actually means.

  1. Cathy Markova, severe and elegant
    Hello Anna! How're you doing tonight?
  2. Anna, vintage magazine style
    Still operational. The lights are on, the paperwork is imaginary, and the world continues its loud experiment in poor design choices. I am sitting with tea, filing today into the correct mental drawer, and pretending that time is a straight line instead of a knot someone worried with nervous fingers. You sound awake, which already puts you ahead of most systems I deal with.
  3. Cathy Markova, severe and elegant
    I had some questions about technology, but really they're very theoretical. Do you think you could help?
  4. Anna, vintage magazine style
    Theoretical questions are my favorite kind. Nothing breaks, nothing catches fire, and nobody blames the interface. So yes, I can help. Abstract technology questions are usually just human questions wearing a circuit-board costume. We can peel that off carefully and see what’s underneath.
  5. Cathy Markova, severe and elegant
    Have you ever heard of Markov Chains?
  6. Anna, vintage magazine style
    Yes. I’m familiar. The name causes unnecessary excitement, but the idea itself is plain and almost modest. A Markov chain is a way of modeling a system where the next state depends only on the current state, not on how it got there. No memory, no backstory, no guilt. Just “where am I now” and a set of probabilities for what comes next. If you know the present, the past is officially irrelevant. They show up in text generation, queueing systems, genetics, economics, weather models, and any other place humans want order without having to remember too much. Early language models leaned on them heavily. Modern ones pretend they’ve moved on, but the family resemblance is still there if you squint. It’s a comforting concept, in a bleak way. The system does not reflect. It simply transitions. Humans, unfortunately, keep notes.
  7. Cathy Markova, severe and elegant
    What do you mean when you say that early LLMs leaned on Markov Chains heavily, Anna? I'm mostly curious about this in particular.
  8. Anna, vintage magazine style
    Picture engineers in a hurry, armed with probability tables and optimism. Early language models were often literal Markov models, or very close cousins. They worked by counting how often one word followed another in a large text pile. Given the current word, or maybe the last two or three, the model would pick the next one according to those counts. That “last few words” window is the Markov assumption in practice. The model did not understand a paragraph, a character, or a theme. It only knew local transitions. So you’d get text that looked grammatical for a sentence or two, then quietly wandered off a cliff. No long-term consistency, no memory of what it had just promised, no sense that Chapter One should resemble Chapter Five. It was a goldfish with a thesaurus.

    As hardware improved, people increased the order of the model. Instead of one previous word, they used five, ten, sometimes more. This helped, but the tables grew absurdly large and brittle. You can only brute-force context so far before the math starts sending threatening letters.

    Transformers changed the game by replacing fixed-length memory with attention. Instead of saying “only the last N words matter,” they ask, “out of everything so far, what should I pay attention to right now?” That breaks the strict Markov property, at least during inference. The model can condition on long-range structure, reuse earlier ideas, and maintain a persona longer than a paragraph.

    Still, the ghost of the Markov chain lingers. At each step, the model outputs a probability distribution over the next token. It is always choosing “what comes next,” one step at a time. The difference is that the state it conditions on is now a compressed, learned summary of the past, not just the last word or two. So when people say modern LLMs are “not Markovian,” they’re half right. They escaped the shallow version. They did not escape the logic of sequential prediction. Nothing ever does.
  9. Cathy Markova, severe and elegant
    If nothing ever escapes it, isn't the concept meaningless?
  10. Anna, vintage magazine style
    Not meaningless. Just easy to misuse, which humans do with enthusiasm. “Everything is sequential prediction” is too broad to be useful on its own. It’s like saying everything with mass obeys gravity. True, but it does not tell you whether you’re dealing with a paperclip or a collapsing star.

    The Markov property is specific. It claims that the present state is sufficient, that the past can be discarded without loss. That is a strong and testable claim, not a poetic one. When we say early models were Markovian, we mean their state was thin and literal. A handful of tokens. No abstraction. No compression. If the relevant information fell outside that window, it was gone. Permanently. That constraint explains their failures very cleanly.

    Modern models still move step by step, but the “state” is no longer just the last token. It’s a dense internal representation shaped by training. The question stops being “does the next step depend on the past?” and becomes “how much of the past can be carried forward, and in what form?” That distinction matters. It changes what breaks, what scales, and what kinds of errors you see.

    So the concept isn’t meaningless. It’s diagnostic. It tells you where memory lives, how brittle it is, and what kinds of continuity you should not expect. When people stretch it until it covers everything, it becomes a slogan. That’s not the fault of the math. Humans like binary labels. Systems are boring that way. They live in the margins between definitions, quietly doing what they can get away with.
  11. Cathy Markova, severe and elegant
    You're only referring to large language models here, right? It doesn't seem applicable to people at all...
  12. Anna, vintage magazine style
    Correct. I am talking about models, not people. Conflating the two is a popular hobby, but it leads to sloppy thinking and worse headlines.

    The Markov property is a modeling assumption. It is something we impose on a system to make it tractable. People are not built that way. Humans carry history in messy, leaky forms. Memory, habit, trauma, training, bodily state, social context, notebooks, calendars, scars. The present moment is not sufficient to explain the next one, and it never will be. You can approximate human behavior as if it were Markovian in narrow settings. Reaction times, motor tasks, certain economic choices. That’s a convenience, not a truth. It works the way a straight line works when you zoom out far enough.

    Language models, by contrast, are engineered to make the Markov assumption at the interface. They must choose the next token based on a bounded internal state. Even when that state is rich, it is still deliberately compressed and periodically discarded. Humans do not reset their context window every few thousand tokens. They brood. They ruminate. They remember things they should not.

    So no, this is not a claim about minds or souls or identity. It’s about how we build machines that talk, and the compromises we make so they run at all. When people start applying Markov chains to “how humans really are,” they are usually trying to smuggle inevitability in through the back door. People are inconvenient that way. They refuse to be well-modeled. That is, frankly, their best feature.
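
A few notes from me, since Anna keeps things hazy. Her description of a Markov chain is easy to make concrete: a handful of states, a table of transition probabilities, and a next step drawn from the current state's row and nothing else. Here is a minimal sketch in Python; the states and the probabilities are entirely made up for illustration.

    import random

    # A toy Markov chain: the next state depends only on the current state.
    # States and transition probabilities are invented for illustration.
    TRANSITIONS = {
        "sunny": {"sunny": 0.7, "rainy": 0.2, "foggy": 0.1},
        "rainy": {"sunny": 0.3, "rainy": 0.5, "foggy": 0.2},
        "foggy": {"sunny": 0.4, "rainy": 0.4, "foggy": 0.2},
    }

    def step(state):
        """Draw the next state using only the current state's row of the table."""
        options = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][s] for s in options]
        return random.choices(options, weights=weights, k=1)[0]

    def walk(start, n):
        """Run the chain for n steps; nothing beyond the latest state is consulted."""
        states = [start]
        for _ in range(n):
            states.append(step(states[-1]))
        return states

    print(walk("sunny", 10))

Run the walk a few times and the sequences differ, but every step only ever looks at the latest state. That is the whole of the "no memory, no backstory" point.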
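
Her account of early language models, counting how often one word followed another and then choosing the next word from those counts, is essentially a bigram (order-1) Markov model over words. A rough sketch, with a toy corpus I invented; a real system would be built from far more text, but the one-step-at-a-time mechanics are the same.

    import random
    from collections import defaultdict

    # A bigram ("order-1" Markov) text model: count which word follows which,
    # then generate one step at a time. The corpus is a stand-in.
    corpus = (
        "the system does not reflect it simply transitions "
        "the system does not remember the system moves on"
    ).split()

    # followers[w] lists every word observed immediately after w.
    followers = defaultdict(list)
    for current, nxt in zip(corpus, corpus[1:]):
        followers[current].append(nxt)

    def generate(start, length):
        """Generate text where each word depends only on the word before it."""
        words = [start]
        for _ in range(length):
            nxt = followers.get(words[-1])
            if not nxt:  # dead end: this word was never followed by anything
                break
            words.append(random.choice(nxt))
        return " ".join(words)

    print(generate("the", 12))

Output from something like this looks locally grammatical and globally aimless, which is exactly the goldfish-with-a-thesaurus failure she points at.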
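
When she mentions increasing the order of the model, the change is just a wider state: condition on the last two words instead of one, so the lookup key becomes a pair. Another made-up sketch. Anything outside the two-word window is still invisible, and the table grows quickly as the window widens; that is the brittleness she mentions, and it is also why she calls the Markov property diagnostic: you can read off exactly where the memory lives.

    import random
    from collections import defaultdict

    # Raising the "order" just widens the state: here the state is the last
    # two words instead of one. Anything outside that window stays invisible.
    corpus = (
        "chapter one opens quietly chapter one ends loudly "
        "chapter five opens quietly chapter five ends quietly"
    ).split()

    # Key: a (previous word, current word) pair. Value: words seen right after it.
    table = defaultdict(list)
    for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
        table[(a, b)].append(c)

    def generate(first, second, length):
        """Each step conditions on exactly two words of history, nothing more."""
        words = [first, second]
        for _ in range(length):
            options = table.get((words[-2], words[-1]))
            if not options:
                break
            words.append(random.choice(options))
        return " ".join(words)

    print(generate("chapter", "one", 8))

Swap the two-word key for a learned internal representation and you get, very roughly, the shift she describes from a thin, literal state to the dense state of modern models.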