Stateless models, paid output, and the missing layer of control

Large language models are stateless systems. They do not retain memory, enforce permissions, or maintain continuity across interactions. Each request is evaluated from the context present at that moment.

At the same time, these systems are usually consumed through usage-based pricing: tokens in, tokens out. The economic event is generation itself.

That creates a structural mismatch. The system pays when the model runs, whether or not a response should have been allowed to exist, and the model does not contain the state or policy needed to decide that question beforehand.

Stateless by design

A language model does not carry forward permissions, remember prior approvals, or preserve a durable record of what should remain in scope across interactions.

From the model's perspective, each request is isolated. It takes the input context, applies statistical inference, and generates. There is no inherent distinction between a valid request, an invalid request, and a request that should have been denied before generation.
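A minimal sketch of what that isolation looks like from the caller's side. The `fake_model` function is a hypothetical stand-in, not any real model API; its only purpose is to show that a call can react to nothing except the context it is handed.

```python
from typing import List


def fake_model(messages: List[str]) -> str:
    """Hypothetical stand-in for a model call: it sees only `messages`."""
    approved = any("approved" in m.lower() for m in messages)
    return "proceeding" if approved else "no approval in context"


# First call: the approval happens to be part of the context.
first = fake_model(["User was approved for report access.", "Show the report."])

# Second call: nothing carries over unless the caller sends it again.
second = fake_model(["Show the report."])

print(first)
print(second)
```

The second call has no trace of the earlier approval. Any continuity has to be supplied from outside on every request.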

Output is always produced

Given input, the model produces output. That is its default behavior.

It does not refuse based on external policy unless policy has already been imposed by the surrounding system. It does not validate access rights on its own. It does not determine whether generation should occur before the call is made. Unless the surrounding system stops it, it generates a response.

This is not an implementation bug. It is a design property of the model.

The incentive structure

Most systems using LLMs are billed on token-based pricing: tokens processed and tokens generated. Cost is incurred when output is produced, even if the output is wrong, unusable, or should never have been generated at all.

That means the cost structure does not inherently distinguish between valid generation and invalid generation. Once the request reaches the model, the cost event has already begun.
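The arithmetic can be sketched as follows. The prices are hypothetical, expressed in integer micro-dollars per token to keep the numbers exact; they are not any provider's real rates. The point is only that the formula has no term for whether the request was legitimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in micro-dollars per token (assumed, not real rates).
    input_per_token: int
    output_per_token: int


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> int:
    """Cost of one call in micro-dollars: tokens in plus tokens out."""
    return (input_tokens * pricing.input_per_token
            + output_tokens * pricing.output_per_token)


pricing = Pricing(input_per_token=1, output_per_token=4)

# A request that should have been served.
valid_cost = call_cost(pricing, input_tokens=1200, output_tokens=300)

# A request of the same size that should have been denied: the bill is identical.
denied_cost = call_cost(pricing, input_tokens=1200, output_tokens=300)

print(valid_cost, denied_cost)
```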

What is missing

In a stateless, usage-based system, generation happens first, evaluation happens after, and cost is incurred regardless. What is missing is a layer that can decide whether generation should happen before the model is invoked.

That layer has to evaluate whether access is permitted, whether retrieval is valid, and whether the request belongs inside the system's allowed operating boundary at all.

Why validation after the fact is insufficient

Many systems try to solve this with post-processing: filtering outputs, adding guardrails after generation, or reviewing responses once they have already been produced.

But post-hoc control does not change the underlying sequence. By the time validation occurs, output has already been generated, cost has already been incurred, and potentially invalid context has already been used to produce the result.
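A small sketch of that sequence, under stated assumptions: `generate` is a hypothetical model call that records a cost the moment it runs (word count stands in for token billing), and `post_hoc_filter` is a hypothetical output check applied afterwards.

```python
from typing import List, Optional, Set


def generate(prompt: str, ledger: List[int]) -> str:
    """Hypothetical model call: the cost is recorded as soon as it runs."""
    ledger.append(len(prompt.split()))  # stand-in for token-based billing
    return f"answer to: {prompt}"


def post_hoc_filter(output: str, banned: Set[str]) -> Optional[str]:
    """Hypothetical filter: drops the output if it contains a banned term."""
    return None if any(term in output for term in banned) else output


ledger: List[int] = []
raw = generate("summarize the restricted contract", ledger)
final = post_hoc_filter(raw, {"restricted"})

print(final)   # the response was suppressed
print(ledger)  # the cost entry is still there
```

The filter removes the response from view, but the ledger entry remains: the call was made, and the restricted context was already used to produce it.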

Validation after the fact can reduce exposure in some cases. It cannot replace pre-execution control.

The boundary before generation

A controlled system has to introduce a boundary before retrieval, before generation, and before cost is incurred. That boundary determines whether access is allowed, whether retrieval is appropriate, and whether generation should proceed at all.

Only after those conditions are satisfied should the model be invoked. Otherwise the system is paying for activity before it has established whether a response belongs inside the system's own rules.
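One way such a boundary might look, as a sketch rather than a definitive implementation. The `Request` type, the `ACCESS` and `ALLOWED_PURPOSES` tables, and the `gate` and `handle` functions are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Request:
    user: str
    resource: str
    purpose: str


# Hypothetical policy data held by the surrounding system, not by the model.
ACCESS: Dict[str, Set[str]] = {"alice": {"contracts"}}
ALLOWED_PURPOSES: Set[str] = {"summarize", "search"}


def gate(req: Request) -> Tuple[bool, str]:
    """Decide before invocation whether generation may proceed."""
    if req.resource not in ACCESS.get(req.user, set()):
        return False, "access denied"
    if req.purpose not in ALLOWED_PURPOSES:
        return False, "outside operating boundary"
    return True, "ok"


def handle(req: Request, model: Callable[[Request], str], calls: List[Request]) -> str:
    allowed, reason = gate(req)
    if not allowed:
        return reason  # the model is never invoked, so no cost is incurred
    calls.append(req)  # record that a paid invocation actually happened
    return model(req)


def echo_model(req: Request) -> str:
    """Hypothetical stand-in for the real model call."""
    return f"summary of {req.resource}"


calls: List[Request] = []
ok = handle(Request("alice", "contracts", "summarize"), echo_model, calls)
denied = handle(Request("bob", "contracts", "summarize"), echo_model, calls)
print(ok, "|", denied, "|", len(calls))
```

Only the permitted request reaches the model; the denied one returns before any call is made.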

Stateless models require stateful control

Because models are stateless, the missing properties have to exist outside the model. They cannot be inferred from generation alone.

A control layer has to maintain state, enforce policy, track access, and govern execution across interactions. This is what allows continuity, permissions, and constraints to persist even when the model itself treats each call as isolated.
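A minimal sketch of a control layer that holds the state the model lacks. The `ControlLayer` class and its `grant`, `revoke`, and `invoke` methods are hypothetical names, and an in-memory dictionary stands in for whatever durable store a real system would use.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class ControlLayer:
    """Hypothetical stateful wrapper: permissions and audit live here."""

    def __init__(self) -> None:
        self._grants: Dict[str, Set[str]] = {}       # user -> permitted resources
        self.audit: List[Tuple[str, str, str]] = []  # (user, resource, decision)

    def grant(self, user: str, resource: str) -> None:
        self._grants.setdefault(user, set()).add(resource)

    def revoke(self, user: str, resource: str) -> None:
        self._grants.get(user, set()).discard(resource)

    def invoke(self, user: str, resource: str, prompt: str,
               model: Callable[[str], str]) -> Optional[str]:
        allowed = resource in self._grants.get(user, set())
        self.audit.append((user, resource, "allowed" if allowed else "denied"))
        if not allowed:
            return None  # the model is not called
        return model(prompt)


layer = ControlLayer()
layer.grant("alice", "contracts")

# str.upper stands in for a model call; each call is isolated from the last.
r1 = layer.invoke("alice", "contracts", "summarize clause 4", str.upper)
layer.revoke("alice", "contracts")
r2 = layer.invoke("alice", "contracts", "summarize clause 4", str.upper)

print(r1, r2, layer.audit)
```

The two prompts are identical, and the stand-in model would answer both the same way. The different outcomes come entirely from state kept outside it, and the audit list records each decision.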

That is the deeper point: stateless models require stateful control.

The cost of not doing this

Without a control layer, systems pay for generation that should not occur, content may be accessed outside permitted scope, responses may exist without sufficient permission, and behavior cannot be reliably audited.

This is not just inefficient. It is structurally unsound in environments where control matters, because the system cannot reliably separate what can be produced from what should be allowed until cost and generation are already underway.

A different model

A governed system introduces pre-execution validation, constrained retrieval, and controlled invocation of models. In that model, generation is conditional, cost is incurred only when appropriate, and responses are produced only when the system permits them.
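Constrained retrieval can be sketched like this, assuming a hypothetical in-memory `DOCS` list where each document carries a `scope` label. The function names and the scope values are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical corpus: each document carries the scope it belongs to.
DOCS: List[Dict[str, str]] = [
    {"id": "d1", "scope": "public", "text": "pricing overview"},
    {"id": "d2", "scope": "licensed", "text": "pricing terms for partners"},
]


def constrained_retrieve(query: str, allowed_scopes: Set[str]) -> List[Dict[str, str]]:
    """Return only matching documents that fall inside the caller's scopes."""
    return [d for d in DOCS
            if d["scope"] in allowed_scopes and query in d["text"]]


def governed_answer(query: str, allowed_scopes: Set[str],
                    model: Callable[[str, List[Dict[str, str]]], str]) -> Optional[str]:
    context = constrained_retrieve(query, allowed_scopes)
    if not context:
        return None  # nothing permitted to use, so the model is not invoked
    return model(query, context)


def id_model(query: str, context: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in model: reports which documents it was given."""
    return ",".join(d["id"] for d in context)


public_only = governed_answer("pricing", {"public"}, id_model)
both = governed_answer("pricing", {"public", "licensed"}, id_model)
nothing = governed_answer("pricing", set(), id_model)
print(public_only, both, nothing)
```

Out-of-scope documents never enter the context, and a request with no permitted material never reaches the model.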

The model still generates. But it does not decide when generation should occur. That decision belongs to the system around it.

Related writing

Continue through the argument.

Governed execution for AI systems working with private and licensed knowledge

The difference between retrieval and use

AI systems need execution boundaries, not just better models

Why model output is not permission