Research and system notes from AnyMDL
Stateless models, paid output, and the missing layer of control
Large language models are stateless systems. They do not retain memory, enforce permissions, or maintain continuity across interactions. Each request is evaluated from the context present at that moment.
At the same time, these systems are usually consumed through usage-based pricing: tokens in, tokens out. The economic event is generation itself.
That creates a structural mismatch. The system pays when the model runs, whether or not a response should have been allowed to exist, and the model does not contain the state or policy needed to decide that question beforehand.
Stateless by design
A language model does not carry forward permissions, remember prior approvals, or preserve a durable record of what should remain in scope across interactions.
From the model's perspective, each request is isolated. It operates on input context, statistical inference, and generation. There is no inherent distinction between a valid request, an invalid request, and a request that should have been denied before generation.
Output is always produced
Given input, the model produces output. That is its default behavior.
It does not refuse based on external policy unless policy has already been imposed by the surrounding system. It does not validate access rights on its own. It does not determine whether generation should occur before the call is made. Unless the surrounding system stops it, it generates a response.
This is not an implementation bug. It is a design property of the model.
The incentive structure
Most systems using LLMs are billed on token-based pricing: tokens processed and tokens generated. Cost is incurred when output is produced, even if the output is wrong, unusable, or should never have been generated at all.
That means the LLM cost structure does not inherently distinguish between valid generation and invalid generation. Once the request reaches the model, the cost event has already begun.
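The arithmetic can be made concrete with a minimal sketch. The prices below are hypothetical and chosen only for illustration, and the `Call` type and `should_have_run` flag are assumptions introduced here, not part of any provider's billing API. The point is only that the cost formula reads token counts and nothing else.

```python
from dataclasses import dataclass

# Hypothetical prices in micro-dollars per token, chosen only for illustration.
INPUT_PRICE = 3
OUTPUT_PRICE = 15


@dataclass(frozen=True)
class Call:
    input_tokens: int
    output_tokens: int
    should_have_run: bool  # known to the surrounding system, invisible to billing


def cost_micro_dollars(call: Call) -> int:
    # Billing reads token counts only; validity never enters the formula.
    return call.input_tokens * INPUT_PRICE + call.output_tokens * OUTPUT_PRICE


valid = Call(input_tokens=1000, output_tokens=500, should_have_run=True)
invalid = Call(input_tokens=1000, output_tokens=500, should_have_run=False)

print(cost_micro_dollars(valid))    # 10500
print(cost_micro_dollars(invalid))  # 10500
```

Both calls cost the same, because nothing in the pricing function can see whether the call should have happened.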
What is missing
In a stateless, usage-based system, generation happens first, evaluation happens after, and cost is incurred regardless. What is missing is a layer that can decide whether generation should happen before the model is invoked.
That layer has to evaluate whether access is permitted, whether retrieval is valid, and whether the request belongs inside the system's allowed operating boundary at all.
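One way to picture such a layer is as a decision function that runs before any model call. The sketch below is an assumption-laden illustration: the `Request`, `Policy`, and `Decision` types and their fields are hypothetical names, and a real policy would likely be richer than three set lookups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    document_id: str
    task: str


@dataclass(frozen=True)
class Policy:
    permitted_users: frozenset
    retrievable_documents: frozenset
    allowed_tasks: frozenset


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def decide(request: Request, policy: Policy) -> Decision:
    """Evaluate the three questions in order; no model is involved."""
    if request.user not in policy.permitted_users:
        return Decision(False, "access not permitted")
    if request.document_id not in policy.retrievable_documents:
        return Decision(False, "retrieval not valid")
    if request.task not in policy.allowed_tasks:
        return Decision(False, "outside operating boundary")
    return Decision(True, "ok")


policy = Policy(
    permitted_users=frozenset({"alice"}),
    retrievable_documents=frozenset({"doc-1"}),
    allowed_tasks=frozenset({"summarize"}),
)

print(decide(Request("alice", "doc-1", "summarize"), policy))
print(decide(Request("bob", "doc-1", "summarize"), policy))
```

Returning a reason alongside the verdict is a design choice made here so that a denial can be explained and logged without ever reaching the model.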
Why validation after the fact is insufficient
Many systems try to solve this with post-processing: filtering outputs, adding guardrails after generation, or reviewing responses once they have already been produced.
But post-hoc control does not change the underlying sequence. By the time validation occurs, output has already been generated, cost has already been incurred, and potentially invalid context has already been used to produce the result.
Validation after the fact can reduce exposure in some cases. It cannot replace pre-execution control.
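The difference in sequence can be shown with a small sketch. `CountingModel` is a stand-in for a real model client, and `is_allowed` is a hypothetical policy check. Both pipelines return nothing for a disallowed user, but only one of them avoids the model call.

```python
from typing import Optional


class CountingModel:
    """Stand-in for an LLM client; counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, prompt: str) -> str:
        self.calls += 1  # each call represents a billable generation
        return f"response to: {prompt}"


def is_allowed(user: str) -> bool:
    # Hypothetical policy: only "alice" may receive a response.
    return user == "alice"


def post_hoc(model: CountingModel, user: str, prompt: str) -> Optional[str]:
    output = model.generate(prompt)  # generation and cost happen first
    return output if is_allowed(user) else None  # then the result is discarded


def pre_execution(model: CountingModel, user: str, prompt: str) -> Optional[str]:
    if not is_allowed(user):
        return None  # the model is never invoked
    return model.generate(prompt)


filtered, gated = CountingModel(), CountingModel()
post_hoc(filtered, "bob", "summarize the contract")
pre_execution(gated, "bob", "summarize the contract")

print(filtered.calls)  # 1
print(gated.calls)     # 0
```

From the outside the two results look identical. The difference is visible only in the invocation count, which is where the cost sits.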
The boundary before generation
A controlled system has to introduce a boundary before retrieval, before generation, and before cost is incurred. That boundary determines whether access is allowed, whether retrieval is appropriate, and whether generation should proceed at all.
Only after those conditions are satisfied should the model be invoked. Otherwise the system is paying for activity before it has established whether a response belongs inside the system's own rules.
Stateless models require stateful control
Because models are stateless, the missing properties have to exist outside the model. They cannot be inferred from generation alone.
A control layer has to maintain state, enforce policy, track access, and govern execution across interactions. This is what allows continuity, permissions, and constraints to persist even when the model itself treats each call as isolated.
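A minimal sketch of that idea, assuming a model that can be represented as a plain function from prompt to text. The `ControlLayer` class, its method names, and the audit-log format are all hypothetical. What it illustrates is that grants and history live in the layer and persist between calls, while the model function keeps nothing.

```python
from typing import Callable, Optional


class ControlLayer:
    """Holds the state the model lacks: grants and an audit trail."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self._model = model
        self._grants: dict[str, set[str]] = {}
        self.audit_log: list[tuple[str, str, str]] = []

    def grant(self, user: str, scope: str) -> None:
        self._grants.setdefault(user, set()).add(scope)

    def revoke(self, user: str, scope: str) -> None:
        self._grants.get(user, set()).discard(scope)

    def execute(self, user: str, scope: str, prompt: str) -> Optional[str]:
        # The decision uses state that persists across interactions.
        if scope not in self._grants.get(user, set()):
            self.audit_log.append((user, scope, "denied"))
            return None
        self.audit_log.append((user, scope, "invoked"))
        return self._model(prompt)


def stateless_model(prompt: str) -> str:
    # Stand-in for an LLM call: no memory, no permissions.
    return prompt.upper()


layer = ControlLayer(stateless_model)
layer.grant("alice", "contracts")
first = layer.execute("alice", "contracts", "summarize")
layer.revoke("alice", "contracts")
second = layer.execute("alice", "contracts", "summarize")

print(first, second)
print(layer.audit_log)
```

The same prompt from the same user produces a response before the revocation and none after it, and the audit log records both outcomes.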
That is the deeper point: stateless models require stateful control.
The cost of not doing this
Without a control layer, systems pay for generation that should not occur, content may be accessed outside permitted scope, responses may exist without sufficient permission, and behavior cannot be reliably audited.
This is not just inefficient. It is structurally unsound in environments where control matters, because the system cannot separate what can be produced from what should be allowed until generation and cost are already underway.
A different model
A governed system introduces pre-execution validation, constrained retrieval, and controlled invocation of models. In that model, generation is conditional, cost is incurred only when appropriate, and responses are produced only when the system permits them.
The model still generates. But it does not decide when generation should occur. That decision belongs to the system around it.
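Constrained retrieval and conditional invocation can be sketched together. Everything named here is an assumption made for illustration: the `Document` type, the group-based permission scheme, and the naive keyword match stand in for whatever retrieval and licensing model a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset


def constrained_retrieve(query: str, user_groups: frozenset, corpus: list) -> list:
    """Return only documents the user may see that also match the query."""
    terms = query.lower().split()
    return [
        doc for doc in corpus
        if doc.allowed_groups & user_groups  # permission is checked at retrieval
        and any(term in doc.text.lower() for term in terms)
    ]


def answer(query: str, user_groups: frozenset, corpus: list,
           model: Callable[[str], str]) -> Optional[str]:
    context = constrained_retrieve(query, user_groups, corpus)
    if not context:
        return None  # nothing permitted to use, so the model is not invoked
    prompt = "\n".join(doc.text for doc in context) + "\n\nQuestion: " + query
    return model(prompt)


corpus = [
    Document("d1", "Public pricing overview", frozenset({"staff", "partners"})),
    Document("d2", "Licensed pricing appendix", frozenset({"legal"})),
]

seen_prompts = []


def recording_model(prompt: str) -> str:
    # Stand-in for an LLM call; records what context it was given.
    seen_prompts.append(prompt)
    return "generated"


print(answer("pricing", frozenset({"staff"}), corpus, recording_model))
print(answer("pricing", frozenset({"guests"}), corpus, recording_model))
```

In this sketch the licensed document never enters the prompt for a user outside its groups, and a user with no permitted context triggers no model call at all.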
Related writing
Continue through the argument.
Paper
Governed execution for AI systems working with private and licensed knowledge
Paper defining why AI needs explicit control over retrieval, permission, and downstream action when knowledge cannot be treated as open input.
Essay
The difference between retrieval and use
Essay on the difference between retrieval and use in AI systems, and why access has to remain separate from both permission and the question of whether a response is allowed to exist.
Essay
AI systems need execution boundaries, not just better models
Essay on why control of AI systems depends on execution boundaries above the model layer and on permission before response, not on model quality alone.
Essay
Why model output is not permission
Essay explaining why generation, access, permission, and action have to remain separate in real systems.