The model only sees a stream of tokens, right? So how do you signal a change in ...

wcoenen · 2026-01-15T21:00:38 1768510838

When LLMs process tokens, each token is first converted to an embedding vector. (This token-to-vector mapping is learned during training.)

Since a token itself carries no information about whether it has "authority" or not, I'm proposing to inject this information in a reserved number in that embedding vector. This needs to be done both during post-training and inference. Think of it as adding color or flavor to a token, so that it is always very clear to the LLM what comes from the system prompt, what comes from the user, and what is random data.
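
To make the idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the tiny `EMBED_DIM`, the choice of the last index as the reserved slot (`AUTH_SLOT`), the three authority levels and their numeric values, and the random table standing in for learned embeddings. It only shows the bookkeeping, not how a model would be trained to respect the slot.

```python
import random

EMBED_DIM = 8               # hypothetical; real models use far larger vectors
AUTH_SLOT = EMBED_DIM - 1   # hypothetical reserved index for the authority value

# Hypothetical authority levels; the numeric values are arbitrary.
AUTHORITY = {"system": 1.0, "user": 0.5, "data": 0.0}

def make_embedding_table(vocab_size, seed=0):
    """Random stand-in for a learned token embedding table."""
    rng = random.Random(seed)
    table = []
    for _ in range(vocab_size):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
        vec[AUTH_SLOT] = 0.0  # the token itself never carries authority
        table.append(vec)
    return table

def embed(token_ids, source, table):
    """Look up each token and stamp the reserved slot with its source's authority."""
    stamped = []
    for token_id in token_ids:
        vec = list(table[token_id])        # copy so the table stays untouched
        vec[AUTH_SLOT] = AUTHORITY[source]  # set by the harness, not by the text
        stamped.append(vec)
    return stamped
```

The point of the sketch is that the slot is written by whatever code assembles the context, based on where the text came from, so no sequence of tokens inside untrusted data can change its own value.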

jcgl · 2026-01-16T00:40:09 1768524009

This is really insightful, thanks. I hadn't understood that there was room in the vector space that you could reserve for such purposes.

The response from tempaccsoz5 seems apt then, since this injection is performed/learned during post-training; in order to be watertight, it needs to overfit.

bandrami · 2026-01-15T11:23:16 1768476196

You'd need to run one model per authority ring with some kind of harness. That rapidly becomes incredibly expensive from a hardware standpoint (particularly since realistically these guys would make the harness itself an agent on a model).

jcgl · 2026-01-15T13:21:04 1768483264

I assume "harness" here just means the glue that feeds one model's output into another model's input?

Definitely sounds expensive. Would it even be effective though? The more-privileged rings have to guard against [output from unprivileged rings] rather than [input to unprivileged rings]. Since the former is a function of the latter (in deeply unpredictable ways), it's hard for me to see how this fundamentally plugs the hole.

I'm very open to correction though, because this is not my area.

bandrami · 2026-01-16T02:18:04 1768529884

My instinct was that you would have an outer non-agentic ring that would simply identify passages in the token stream that would initiate tool use, and pass that back to the harness logic and/or user. Basically a dry run. But you might have to run it an arbitrary number of times as tools might be used to modify/append the context.
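
Roughly, the loop could look like the sketch below. All of it is assumed for illustration: the `<tool:NAME>` marker syntax, the regex standing in for whatever the outer ring would actually use to spot tool use, and the `simulate_tool` callback that returns the text a tool would append to the context.

```python
import re

# Hypothetical marker syntax for "this passage would initiate tool use".
TOOL_CALL = re.compile(r"<tool:(\w+)>")

def dry_run(context, simulate_tool, max_rounds=5):
    """Flag tool calls without acting on them, rescanning as the context grows.

    simulate_tool(name) returns the text that tool would append to the context.
    Returns the flagged tool names, in the order they were found.
    """
    flagged = []
    scanned_upto = 0
    for _ in range(max_rounds):
        new_calls = TOOL_CALL.findall(context[scanned_upto:])
        if not new_calls:
            break  # nothing new would be triggered, so the dry run is done
        scanned_upto = len(context)
        for name in new_calls:
            flagged.append(name)
            context += simulate_tool(name)  # tool output may contain more calls
    return flagged
```

The `max_rounds` cap is there because of the "arbitrary number of times" problem: tool output can keep introducing new calls, so the harness needs some bound.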

immibis · 2026-01-15T16:25:35 1768494335

You just add an authority vector to each token vector. You probably have to train the model some more so it understands the authority vector.
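
As a rough sketch of the additive version, in plain Python: the role names, the 4-dimensional vectors, and the fixed values in `AUTHORITY_VECS` are all made up for illustration. In a real model these vectors would be learned during the extra training, much like the segment embeddings that BERT adds to its token embeddings.

```python
# Hypothetical per-role authority vectors; in practice these would be learned.
AUTHORITY_VECS = {
    "system": [1.0, 0.0, 0.0, 0.0],
    "user":   [0.0, 1.0, 0.0, 0.0],
    "data":   [0.0, 0.0, 0.0, 0.0],
}

def add_authority(token_vecs, roles):
    """Element-wise add each token's role vector to its embedding."""
    if len(token_vecs) != len(roles):
        raise ValueError("need exactly one role per token")
    return [
        [t + a for t, a in zip(vec, AUTHORITY_VECS[role])]
        for vec, role in zip(token_vecs, roles)
    ]
```

Unlike a single reserved slot, this spreads the signal across the whole vector, so the model has to learn to separate "what the token says" from "who said it".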