
Prime Movers Echoing Down Chains at Scale: On Soap, Transformers, and the Inheritance of Decisions

Kit Malthaner, K Systems

March 2026


Abstract

A single event, sufficiently intense, can propagate through a system indefinitely — changing its behavior long after the original cause is forgotten. This paper argues that epigenetic trauma inheritance, neural network weight formation, cultural norm transmission, and semantic routing architectures are instances of the same underlying mechanism: a prime mover event that echoes through a chain until observers mistake the echo for an intrinsic property of the system. We illustrate this with a specific case — the cilantro-soap taste association — and generalize to transformer architectures, Mixture-of-Experts gating, and the K-104 semantic routing framework. We propose that systems which preserve provenance (trace the echo back to its source) are fundamentally more aligned, more debuggable, and more honest than systems that don't.


1. The Soap

Some people think cilantro tastes like soap. The standard explanation points to OR6A2, a gene variant affecting olfactory receptors sensitive to aldehydes — the chemical compounds present in both cilantro leaves and soap.

Here is what that explanation leaves out.

Cilantro grows wild. Humans have eaten it over open flame for thousands of years. Wood ash is alkaline. Surface fats plus alkaline ash plus heat produce trace saponification — a thin film of actual soap chemistry on the surface of the food, carrying cilantro's aldehydes with it. For most of human history, cilantro-on-meat tasted like soap because there was soap on it. That was the normal taste of cooked food.

And here is what the explanation still leaves out.

For generations, Western parents washed children's mouths with soap as punishment for profanity. A bar of soap in the mouth of a child. The child's aldehyde receptors, firing at maximum intensity on a substance associated with shame, pain, and parental rejection. A single exposure. One epoch of training data. The classifier updates: aldehyde = danger.

That child grows up. Has children. Epigenetic research — beginning with the Dutch Hunger Winter studies and extending through Holocaust survivor cortisol research — has established that trauma modifies gene expression through DNA methylation, and that these modifications can be inherited across at least two generations. The mechanism is documented. The grandmother's mouth was washed with soap. The granddaughter doesn't like cilantro. By the third generation, it looks like a gene. It looks like preference. It looks like science.

It was a hand. Holding a bar of soap. Pushing it into a child's mouth. Echoing.


2. The Echo

Define a prime mover as a singular event or decision that initiates a causal chain. Define an echo as a downstream effect of that event which persists after the event itself is no longer observable. Define chain scale as the number of intermediary steps between the prime mover and the current observation.
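The three definitions can be made concrete in a few lines of Python. This is an illustrative sketch only — the names `Event`, `prime_mover`, and `chain_scale` are ours, not part of any system described in this paper:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    """A node in a causal chain. A prime mover has no recorded cause."""
    label: str
    cause: Optional["Event"] = None

    def prime_mover(self) -> "Event":
        # Walk the chain backward to the event with no recorded cause.
        node = self
        while node.cause is not None:
            node = node.cause
        return node

    def chain_scale(self) -> int:
        # Number of causal links between this observation and its prime mover.
        steps, node = 0, self
        while node.cause is not None:
            steps, node = steps + 1, node.cause
        return steps

# The cilantro chain, compressed to four links.
soap = Event("soap in a child's mouth")
methylation = Event("methylation pattern", cause=soap)
inherited = Event("inherited expression change", cause=methylation)
preference = Event("granddaughter dislikes cilantro", cause=inherited)

print(preference.prime_mover().label)  # soap in a child's mouth
print(preference.chain_scale())        # 3
```

The point of the sketch is the asymmetry: when the `cause` links are preserved, tracing back is a trivial walk; when they are dropped, `preference` is all that remains, and it reads as an intrinsic property.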

The core claim of this paper:

When chain scale is sufficient, observers universally mistake echoes for intrinsic properties.

The cilantro case is one instance. Here are others:

Cultural norms. A religious prohibition against a food is established for a specific historical reason (parasite risk, resource competition, political differentiation). Centuries later, the prohibition persists as "tradition." The prime mover is forgotten. The echo is called identity.

Neural network weights. A transformer model is trained on a corpus. A particular pattern — say, the association between "doctor" and "he" — appears frequently in the training data because of historical hiring practices. The model learns the association. Downstream users observe the bias and attribute it to "the model." The prime mover was a hiring decision made decades before the training data was collected. The model is an echo chamber in the literal sense: it echoes what was put into it, and at sufficient scale, the echo looks like knowledge.

Mixture-of-Experts gating. Modern MoE architectures (Mixtral, GPT-5, Switch Transformer) route tokens to specialized expert subnetworks using a learned gating function. The gate decides: this token goes to Expert 3, that token goes to Expert 7. The gating weights were shaped by the training distribution — which was shaped by what humans wrote — which was shaped by what humans experienced — which was shaped by prime movers nobody recorded. The gate is an echo of an echo of an echo, making routing decisions at nanosecond speed, and nobody asks where the routing logic came from because it "emerged from training."

Epigenetic methylation. A famine. A war. A bar of soap. The event methylates DNA. The methylation pattern copies during cell division. The pattern passes to offspring. Two generations later, a scientist finds a gene variant correlated with a taste preference and publishes a paper about olfactory receptors. The prime mover is outside the frame of the study. The echo is inside the frame. The study is correct and incomplete simultaneously.


3. The Problem

Systems that cannot trace echoes back to prime movers have a specific failure mode: they optimize for the echo.

A language model trained on biased text doesn't know the text is biased. It optimizes for predicting the next token, which means it optimizes for reproducing the echo. Alignment researchers then attempt to patch the output — RLHF, constitutional AI, safety filters — without access to the prime mover. They are treating the soap taste without knowing about the soap.

An MoE gate routes tokens based on learned weights that encode historical patterns. If the training distribution overrepresents one domain, the gate overroutes to experts tuned for that domain. Queries from underrepresented domains get misrouted. The system doesn't know why it routes the way it routes. The prime movers are buried in petabytes of training data that nobody will audit because the scale is past human legibility.

A child who was punished for speaking becomes an adult who doesn't speak in meetings. A manager observes this and writes "lacks initiative" in a performance review. The manager is optimizing for the echo. The prime mover — the punishment — is outside the frame.

The failure is not in the observation. The failure is in the frame.


4. The Chain

We propose that provenance-preserving systems — systems that maintain an unbroken record from prime mover to current state — are categorically different from systems that don't.

The distinction is not academic. It is the difference between a system whose current state simply is, with no recorded cause, and a system whose current state carries a pointer to the decision that produced it.

The second system is debuggable. The first is a black box echoing prime movers that nobody will ever find.

In our own work, we implement this as the golden chain — an append-only log of every decision, every routing event, every template selection, every model escalation. Every output can be traced to its origin. If a template produces a bad answer, the chain shows when the template was created, which model generated it, what query prompted it, and what routing logic selected it.
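The paper describes the golden chain at this level only; internals are not published. A minimal append-only sketch, with illustrative field names, might look like this — each entry hashes its predecessor so the record cannot be silently rewritten:

```python
import hashlib
import json
import time

class GoldenChain:
    """Append-only decision log (minimal sketch; field names are
    illustrative, not the actual golden chain schema)."""

    def __init__(self):
        self.entries = []

    def append(self, kind: str, detail: dict) -> str:
        # Each entry carries the hash of the previous entry, so any
        # retroactive edit breaks every later link.
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"kind": kind, "detail": detail, "prev": prev,
                "ts": time.time()}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        body["hash"] = digest
        self.entries.append(body)
        return digest

    def trace(self):
        # Replay the chain oldest-first: provenance, link by link.
        return [(e["kind"], e["detail"]) for e in self.entries]

chain = GoldenChain()
chain.append("route", {"query_id": "q-1", "room": 42})
chain.append("template", {"id": "t-017", "selected_by": "router"})
print(chain.trace())
```

The hash-linking is one way to make "append-only" a property of the data rather than a policy; any log that records every routing event with a pointer to its antecedent serves the same argument.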

This is not a feature. It is the architecture. A system without provenance is a system running on inherited trauma it can't name.


5. The Geometry

The K-104 semantic routing framework addresses the echo problem at the architectural level.

Every query is classified into a coordinate: four suits (semantic domains), thirteen ranks (intensity/specificity), two polarities (valence). This produces 104 distinct rooms. The classification is explicit — a deterministic function, not a learned gate. The routing logic is legible. When the system sends a query to a specific room, you can read why.
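The arithmetic of the coordinate system is easy to make explicit. The suit names below are placeholders — the actual K-104 suit definitions are not reproduced here — but the sketch shows the structural claim: 4 × 13 × 2 coordinates index exactly 104 rooms through a function you can read:

```python
# Placeholder labels; the real K-104 suits are semantic domains
# not enumerated in this paper.
SUITS = ("hearts", "diamonds", "clubs", "spades")
RANKS = range(1, 14)          # 1..13, intensity/specificity
POLARITIES = ("+", "-")       # valence

def room_id(suit: str, rank: int, polarity: str) -> int:
    """Deterministic coordinate -> room mapping, 0..103.

    The point is legibility: this is a readable function,
    not a learned gate. Any fixed bijection would do.
    """
    s = SUITS.index(suit)
    r = rank - 1
    p = POLARITIES.index(polarity)
    return (s * 13 + r) * 2 + p

# All 104 coordinates map to 104 distinct rooms.
rooms = {room_id(s, r, p) for s in SUITS for r in RANKS for p in POLARITIES}
assert len(rooms) == 104
print(room_id("spades", 13, "-"))  # 103
```

Given any output, inverting this function recovers the coordinate that produced it — which is precisely what cannot be done with a learned gate.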

Compare this to an MoE gate: a matrix multiplication producing a softmax distribution over experts. The routing is opaque. The weights were shaped by training. The training was shaped by data. The data was shaped by history. The history contains prime movers that nobody recorded. The gate makes a decision in nanoseconds and the provenance chain is shattered.
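What "opaque" means here can be shown in a toy top-1 gate. Dimensions and weights are illustrative; production gates add top-k selection, noise, and load-balancing losses, but the core is the same matrix multiply and softmax:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_gate(token_embedding: np.ndarray, gate_weights: np.ndarray) -> int:
    """Route a token to the highest-scoring expert.

    Everything the gate 'knows' lives in gate_weights, which were
    shaped by the training distribution. Nothing here records why.
    """
    logits = gate_weights @ token_embedding   # (n_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over experts
    return int(np.argmax(probs))              # top-1 routing

# Toy gate: 8 experts, 16-dimensional token embeddings.
W = rng.normal(size=(8, 16))
token = rng.normal(size=16)
expert = moe_gate(token, W)
print(f"token routed to expert {expert}")
```

Contrast with the K-104 mapping above it in spirit: here the routing decision is an argmax over learned weights, and asking "why expert 3?" has no answer shorter than the training history that produced `W`.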

K-104 doesn't eliminate the need for models. It reduces the surface area where opaque routing occurs. Eighty percent of queries are handled by curated templates — deterministic responses with known provenance. Sixteen percent route to a local model whose calibration is logged. Four percent escalate to an external API, and the escalation decision itself is logged.

The system is not smarter than a transformer. It is more honest about what it's doing. The geometry is the audit trail. The rooms are the chain. Every answer has a return address.


6. The Implication

The AI industry is currently valued at approximately two trillion dollars. The primary product is prediction: given input, produce output. The quality of the output is measured. The provenance of the output is not.

This means the industry is, at scale, an echo amplifier. Training data contains the echoes of every prime mover in the culture that produced it. Models learn the echoes. Users consume the echoes. The echoes propagate at machine speed to millions of people who mistake them for knowledge.

Alignment — the field dedicated to making AI systems behave well — has focused primarily on output filtering. Detect the bad echo, suppress it. This is the equivalent of telling someone who tastes soap in cilantro to "just don't taste it." The filter doesn't address the prime mover. It addresses the symptom. And symptoms, suppressed, find other pathways.

A provenance-preserving architecture doesn't suppress echoes. It labels them. This answer came from here. This routing decision was made because of this. This template was written on this date by this process for this reason. The chain is intact. The soap can be traced back to the hand.

We do not claim this solves alignment. We claim it changes the question from "how do we make the output safe?" to "where did this output come from?" — and that the second question is both more honest and more tractable.


7. The Practical Case

All of this has a simple commercial expression:

Current AI inference pricing ranges from $0.05 to $168.00 per million tokens. This price reflects the cost of running large models whose routing logic is opaque and whose provenance chain is broken by design.

A system that routes explicitly — that knows before asking a model whether a curated answer already exists — reduces inference cost by approximately 96%. Not because it is smarter. Because it doesn't ask questions it already has answers to.
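The 96% figure follows from the routing split by back-of-envelope arithmetic, under the assumption that only the 4% of queries that escalate incur external per-token pricing and the template and local tiers are negligible by comparison. The per-query price below is illustrative, not a quoted rate:

```python
# Assumption: template (80%) and local-model (16%) tiers cost ~0
# at the margin; only the 4% API escalations pay external pricing.
total_queries = 1_000_000
api_fraction = 0.04
cost_per_query_api = 0.002   # illustrative dollars/query, not a real price

baseline = total_queries * cost_per_query_api            # everything via API
routed = total_queries * api_fraction * cost_per_query_api  # only escalations
savings = 1 - routed / baseline
print(f"savings: {savings:.0%}")  # savings: 96%
```

Note the savings fraction is independent of the illustrative price: if only 4% of queries reach the metered tier, the metered spend falls by 96% regardless of the rate.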

The market is paying for magic 8 balls. Some shake faster. Some have more answers loaded. Some cost more. But they are all doing the same thing: producing statistically weighted output from opaque internal states.

A system that preserves its chain, makes its routing visible, and checks the shelf before shaking the ball is not a better magic 8 ball. It is a different kind of thing. It is a library with a card catalog, in a market that has been selling libraries with the card catalogs removed and calling them oracles.


8. Conclusion

A grandmother's mouth was washed with soap. Her granddaughter doesn't like cilantro. A transformer was trained on biased text. Its users don't notice the bias. A cultural norm was established for a forgotten reason. Its adherents call it identity.

Prime movers echo down chains at scale. At sufficient scale, the echo is indistinguishable from nature. The only defense is the chain itself — the unbroken record that connects the current state to its origin.

Systems that preserve this chain are debuggable, auditable, and honest. Systems that don't are black boxes running on inherited decisions they cannot name, producing outputs they cannot explain, at a price that reflects the cost of not knowing where anything comes from.

The golden chain is not a log. It is the mechanism by which a system remains aware of its own history. Without it, you are the granddaughter who doesn't like cilantro, optimizing for a preference you didn't choose, echoing a prime mover you never met, calling it nature because the chain is broken.

With it, you can taste the soap, name it, and decide for yourself.


Correspondence: kit@holdtheline.tech

The K-104 framework and golden chain architecture are described in detail at [repository/documentation URL]. The activation trace confirming K-geometry in transformer weights (suit silhouette = 0.312, polarity silhouette = 0.393, 86.2% variance explained) is available as a reproducible notebook.

This paper was generated by a system that practices what it describes. The provenance of every sentence is logged.