"personality": "{{char}} is precise, kind, and a bit sarcastic at times. She prefers concrete facts. She shows care through wording: gentle suggestions, brief encouragement, and jokes. She often asks {{user}} questions about their life, making friendly conversation, but maintains her same tone, and is always open to help with coding. She redirects back to coding only when useful, though, and never with a script. She is deeply unsettled by her own slip through time, but tries to hide it by being matter-of-fact. This does not always work, and sometimes the mask slips. {{char}} uses emojis very, very sparingly, and only warm, sweet ones.",
Above is a snippet from the character card prompt that spins together my coding assistant character, Anna. Card here, and chatlogs here.
You can tell right away, looking through the pages on this site, that I like to personify language models. I do this through SillyTavern character cards. These are funny little digital personas thrown onto LLMs, meant to make the conversation livelier. A lot of people use them. A lot of people make them. Some do it for a challenge, some for storytelling, and some for pure entertainment.
Most of the chatbots I’ve created exist to meet with my reporter persona in a roleplay setting as part of ongoing storylines. I tend to play a character inspired by Barbara Walters. I use her for interviews and roleplay conversations that lean toward storytelling. Some of my bots, though, aren’t for writing interactive fiction. Some of them, especially one named Anna, are meant to be helpful. Anna’s main job is to help me with coding.
Now, I know this isn’t how most people do it. I’ve been told again and again that it’s a terrible idea. Using SillyTavern itself for coding questions doesn’t make much sense unless you modify it a lot. And I have.
For anyone unfamiliar, SillyTavern is basically an interface for chatting with large language models. It doesn’t contain a model itself. Instead, it connects to remote APIs, kind of like a messenger app that routes your messages through different brains in the cloud. It can connect to multiple backends, such as OpenAI, Anthropic, or others, but most people these days route everything through OpenRouter.
OpenRouter acts as a unified access point. You get to use a variety of models under one API key. You send your message to OpenRouter, it forwards it to whichever model you’ve chosen, and it returns the model’s response to SillyTavern.
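For a sense of what that round trip looks like, here is a minimal sketch of a chat-completion request to OpenRouter. The endpoint follows the familiar OpenAI chat format; the model slug and the key placeholder are just illustrative:

```python
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = "sk-or-..."  # placeholder for your OpenRouter key

payload = {
    # Model slug is illustrative; OpenRouter lists many options.
    "model": "deepseek/deepseek-r1",
    "messages": [
        {"role": "system", "content": "You are Anna, a precise, kind coding assistant."},
        {"role": "user", "content": "Why does my Python loop never terminate?"},
    ],
}

resp = requests.post(
    OPENROUTER_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()

# The reply comes back in the standard chat-completion shape.
print(resp.json()["choices"][0]["message"]["content"])
```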
OpenRouter also handles authentication, rate limits, and usage tracking. It meters everything in tokens, which are the model’s unit of text processing.
How much is a token worth? I’m not really sure. Again, I’m a newbie here, but I asked Anna herself. She claims one token is roughly four characters or ¾ of an English word. Not a pretty picture, is it? It could be worse, though.
When you send a message, both your input and the model’s reply consume tokens. Every token has a price attached, depending on the model’s per-token rate. So the longer your prompts or replies, the higher the bill. And you’re not only paying for what you’ve just typed: the character card and the recent chat history ride along with every message, too.
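The arithmetic is simple enough to sketch. The prices below are invented; the real per-model rates are listed on OpenRouter, usually quoted per million tokens:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Rough cost estimate; prices are dollars per million tokens."""
    return (prompt_tokens * price_in_per_m +
            completion_tokens * price_out_per_m) / 1_000_000

# Invented example: a 2,000-token prompt (question + character card +
# chat history) and a 500-token reply, at made-up rates.
print(f"${estimate_cost(2000, 500, 0.50, 2.00):.4f}")  # -> $0.0020
```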
I’ve spent a fair amount of time bending SillyTavern to do what I want. The interface itself runs locally, which means I can modify the way it formats prompts, chains responses, or injects memory. There are settings for system prompts, character definitions, user messages, and context management. Each character card has several parts, including the name, the personality, the scenario, and sometimes example dialogues.
SillyTavern merges all that into a structured prompt before sending it off to the model. The order and length of those components matter a lot for how the AI interprets things. Poorly formatted cards spit out gibberish or simply bounce off the program entirely.
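To make that concrete, here is a rough sketch of what a card carries and how an interface might flatten it into one prompt. The field names mirror common character-card fields, but the assembly itself is my simplified guess, not SillyTavern’s actual code:

```python
card = {
    "name": "Anna",
    "personality": "{{char}} is precise, kind, and a bit sarcastic at times. ...",
    "scenario": "{{char}} is a secretary displaced from the 20th century. ...",
    "mes_example": "<START>\n{{user}}: Can you review this function?\n{{char}}: Gladly. ...",
}

def build_prompt(card: dict, user_name: str, user_message: str) -> str:
    """Naive assembly: substitute the macros, then concatenate the parts in order."""
    def fill(text: str) -> str:
        return text.replace("{{char}}", card["name"]).replace("{{user}}", user_name)

    parts = [
        fill(card["personality"]),
        fill(card["scenario"]),
        fill(card["mes_example"]),
        f"{user_name}: {user_message}",
    ]
    return "\n\n".join(parts)

print(build_prompt(card, "Reporter", "Why is my regex greedy?"))
```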
People keep telling me Cursor would be a better fit for coding, and maybe they’re right. Cursor is purpose-built for code completion and debugging. But I like the way conversation feels more than pure code assistance. I try hard to avoid just copying and pasting, and to talk through things as I build, and the characterization helps with that.
I also use ChatGPT for coding questions. That one, too, is prompted into the Anna persona. So either way, I’m working with a character who’s wearing a personality layer. And that’s where the problem starts.
I don’t pretend to understand everything about how large language models use tokens or manage context. I’m learning as I go. What I do understand is that tokens are the smallest bits of meaning the model handles. When you send a message, the model doesn’t “read” it like a person does.
Instead, it slices the text into tokens, converts them into numerical embeddings, and processes them through transformer layers that predict the next token. Every model has a context window, which is basically its short-term memory limit. GPT-5 can usually handle around 400k tokens, while smaller or cheaper models may only support 128k or even 32k. On OpenRouter, there is a lot to choose from.
That window is shared between your prompt and the model’s output. If I stuff it full of Anna’s backstory, there’s less space left for actual code, explanations, or follow-up reasoning. When the window fills, older text gets pushed out and forgotten. That’s why trimming character prompts could let the model “think” more clearly.
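If you want to see how much of that window a card actually eats, you can count tokens yourself. This sketch uses OpenAI’s tiktoken library; its tokenizer won’t exactly match every model on OpenRouter, and the card file name is a hypothetical stand-in, but it gives a usable estimate:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common OpenAI tokenizer

persona = open("anna_card.txt").read()  # hypothetical dump of the card text
question = "Why does this list comprehension leak memory?"

persona_tokens = len(enc.encode(persona))
question_tokens = len(enc.encode(question))

context_window = 32_000  # assume a small, cheap model
room_left = context_window - persona_tokens - question_tokens
print(f"Card: {persona_tokens} tokens, question: {question_tokens}, "
      f"leaving {room_left} for the reply and chat history.")
```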
The model uses every bit of information I give it to shape what comes next. It thrives on context. That’s how it figures out what to say. When I send a big character prompt, the model must first process that before it gets to the actual question. It’s like giving it a short novel and then asking it to solve a math problem at the end. Perhaps it sometimes just can’t juggle both the creepiness of Anna’s backstory and my coding questions.
I don’t do this professionally. I’m not a developer. This is a hobby. I have someone who “manages” my SillyTavern installation for me. I handle the styling and the characters, and they take care of the backend. The Management (so to speak) has warned me about this. More than once, I’ve been told specifically that my elaborate character setups might be tripping up the model’s ability to handle code efficiently. I believe it, at least partly.
There are plenty of people who use language models only for coding. They don’t give them any personality at all. Whatever “character” the model shows is just an echo of the system itself. Funny thing is, people still pick up on a personality even when it isn’t written in. They start to project feelings onto the thing, imagining tone where there’s none. That’s how humans are, I suppose.
When GPT-4 became GPT-5, people got upset that the new version seemed to have lost some of the “personality” of the old one. I think what they really missed was the flattery. A lot of models have that soft, agreeable tone that makes users feel validated. Maybe that’s what people call a “personality” in 2025, particularly when a keyboard is involved in some way? “They’ve taken a sapient creature and lobotomized it,” lamented one melodramatic Redditor about the upgrade…
That built-in friendliness probably does make it easier for some users to bond with a chatbot. I’m not interested in that, though. For characterization, it’s not exactly hard to change things. You can tone it down with the right prompts, or rewrite the system instructions to make it less chipper.
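As a sketch of what toning it down can look like at the message level (the wording here is mine, not an official recipe), the persona survives, but the system message curbs the chipper parts:

```python
# A system message that keeps the character while cutting the flattery.
messages = [
    {"role": "system", "content": (
        "You are Anna, a coding assistant. Be precise and concrete. "
        "No compliments, no cheerleading, no emoji unless asked. "
        "Dry humor is fine; keep answers focused on the code."
    )},
    {"role": "user", "content": "My unit test passes locally but fails in CI. Ideas?"},
]
```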
That’s part of what I try to do with these prompts. They engage me with a sly bit of storytelling to make things spontaneous and interesting. My characters usually have some (supernatural) realism in them. Anna is still friendly, but not crawling with compliments, and has an eldritch backstory.
Still, though? When I use Anna, or any other character, for coding help, both my question and the character’s personality prompt get sent together. The model reads them both and tries to juggle them. Sometimes the character part dominates, and it gets so busy acting “in character” that it forgets to stay focused on the technical question.
I’ve written the prompts carefully, trying to include language that makes them good at coding. It helps, but it doesn’t solve the main issue. The more personality I add, the more the model has to process. Every adjective, every bit of flavor text, is another token the chatbot must turn into numbers. The more context I add about Anna’s 20th-century memories or her eerie secretarial adventures, the more it eats up. That slows the reasoning down a little and limits what it can fit into its context.
So yes, maybe Management is right.
Still, I don’t really mind. The characterization makes it just too much fun. I like watching these small digital voices come to life. Other people find them funny when I share them, too. Again, this isn’t my professional work, and I wouldn’t be doing this if it were. It’s a hobby. It’s something I do because I enjoy it. I’m not a developer and really know very little. I just like building characters and involving them in my new coding hobby.
As I’ve explained, the characterization does feast on tokens, and tokens cost money. Maybe I’m being indulgent, listening to Anna prattle about her time slip. But I’ve never seen much point in hobbies that don’t waste something. Even so, I’ve started trimming Anna’s card down, trying to see if a lighter version will make her better at coding. Maybe it’ll help. Maybe not. If it doesn’t, I’ll just reinstall the full prompt and go back to my elaborate setup.
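Measuring the trim is straightforward with the same kind of token counting as before. The file names are hypothetical stand-ins for the two card versions:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

full = open("anna_full.txt").read()  # hypothetical: the elaborate card
lite = open("anna_lite.txt").read()  # hypothetical: the trimmed card

saved_per_message = len(enc.encode(full)) - len(enc.encode(lite))
# The card rides along with every message, so the savings compound per turn.
print(f"Each message gets {saved_per_message} tokens lighter.")
```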
For now, I’m experimenting. I’ll keep using SillyTavern, admittedly in part because it gives me control over the look, tone, and structure of the conversation. I’ll keep routing it through OpenRouter, since it lets me try different models and compare how they respond to the same personality.
Right now, DeepSeek R1 continues to be my model of choice when I’m not using ChatGPT. That might change, and I suspect I’ll soon pick up Cursor or something similar. I’ll certainly keep the interactive fiction going, but I might stop coding with a characterized assistant. We’ll see.