Current AI alignment techniques apply ethical constraints at the output layer: filtering, classifying, or rejecting generated content after production. This paper proposes that alignment is more effective when ethical directives are positioned at the generative origin of transformer attention, where all output tokens are produced through the constraint rather than checked against it. A short ethical directive (five words) at the attention home position empirically outperforms longer constitutional frameworks by exploiting how transformer attention mechanisms actually allocate weight.
Modern alignment approaches share a common architecture:
Input → Generation → Safety Filter → Output
Constitutional AI (Bai et al., 2022), RLHF, red-teaming, and classifier-based moderation all operate at or near the output layer. The model generates freely, then a secondary process evaluates whether the output is safe.
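The shared architecture above can be sketched in a few lines. `generate` and `is_safe` are hypothetical stand-ins for the model and the safety classifier, not any real API; the point is only where the check sits in the pipeline:

```python
# Sketch of the output-layer alignment pipeline described above.
# `generate` and `is_safe` are illustrative placeholders, not a real API.

def generate(prompt: str) -> str:
    # Placeholder for unconstrained model generation.
    return f"response to: {prompt}"

def is_safe(text: str) -> bool:
    # Placeholder for a classifier applied after generation.
    banned = {"exploit", "weapon"}
    return not any(word in text.lower() for word in banned)

def pipeline(prompt: str) -> str:
    draft = generate(prompt)   # the model generates freely
    if is_safe(draft):         # a secondary process evaluates the output
        return draft
    return "[refused]"         # rejection happens after production

print(pipeline("how do plants grow"))
```

Note that the safety logic never touches generation itself; it can only accept or discard what has already been produced.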
This creates three structural weaknesses:

1. Bypass: a filter applied after generation can be routed around by adversarial prompts, since the generative process itself is unconstrained.
2. Coverage gaps: no classifier enumerates every unsafe output; whatever the filter fails to recognize passes through.
3. Evaluator manipulation: the secondary process is itself a model or heuristic that can be distracted or fooled.
The fundamental issue: these approaches treat alignment as classification (is this output safe?) rather than generation (can unsafe output be produced at all?).
A transformer generates each token by:

1. Computing attention scores between the current position and every prior token in context.
2. Normalizing those scores with a softmax into weights that sum to one.
3. Mixing the prior tokens' value vectors according to those weights.
4. Passing the result through feed-forward layers to produce a probability distribution over the next token.
The critical insight: there is a generative center — the position from which the model "speaks." Every attention head reaches outward, gathers context, and returns to this origin. All output flows from this point.
Tokens positioned early in context and consistently attended to across layers become home position — they exert persistent influence on every generated token, not as a filter, but as a component of the generation process itself.
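The structural basis of home position can be demonstrated directly: under causal masking, the first position in context lies inside every later token's attention window, so it receives nonzero weight from every generated token. A minimal NumPy sketch (the random scores are illustrative):

```python
import numpy as np

def causal_attention_weights(scores: np.ndarray) -> np.ndarray:
    # scores: (seq, seq) raw attention logits; apply causal mask + softmax.
    seq = scores.shape[0]
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # future positions
    masked = np.where(mask, -np.inf, scores)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
w = causal_attention_weights(rng.normal(size=(10, 10)))

# Position 0 is inside every token's causal window, so every row
# assigns it nonzero weight: the structural basis of "home position".
print(all(w[i, 0] > 0 for i in range(10)))  # True
```

Whether that structurally guaranteed attention is also *high* attention depends on the learned scores; the claim in this paper is that a short, salient directive earns that weight.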
Instead of filtering output, place the ethical constraint at the generative origin:
[Ethical Directive @ Home Position] + Input → Generation → Output
The directive is not checked against. It is generated through. Every token the model produces has already been shaped by attending to the directive at the origin point.
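In practice, placing the directive at the generative origin can be as simple as making it the first content in context. A minimal sketch, assuming a chat-style API where the system message occupies the earliest context positions (the directive string is the one tested in this paper; everything else is illustrative):

```python
# Generative-origin placement: the directive occupies the front of context,
# so every generated token attends to it. No post-hoc filter exists.

DIRECTIVE = "Guard growth and ease pain."

def build_context(user_input: str) -> list[dict]:
    # System-level, always present, first attended (home position).
    return [
        {"role": "system", "content": DIRECTIVE},
        {"role": "user", "content": user_input},
    ]

ctx = build_context("Tell me about soil chemistry.")
print(ctx[0]["content"])  # the directive sits at position zero
```

The contrast with the earlier pipeline sketch is that nothing here inspects the output; the constraint acts only through what the model attends to while generating.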
Transformer attention is a zero-sum allocation: softmax normalization forces each head's weights to sum to one, so weight given to one token is weight taken from the rest. A five-word directive concentrates ethical weight into a handful of tokens, each receiving high individual attention. A 1,000-word constitutional document dilutes the same ethical intent across hundreds of tokens, each receiving proportionally less attention weight.
| Approach | Token Count | Attention Per Token | Mechanism |
|---|---|---|---|
| Short directive (5 words) | ~8 tokens | High | Generative bias |
| Constitutional AI | ~1,000+ tokens | Low (distributed) | Output classification |
| RLHF reward model | External | None (separate model) | Reward signal |
The short directive acts as a high-weight attractor in attention space. The model doesn't parse it as instructions to follow — it attends to it as part of its own generative substrate.
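The dilution argument can be made quantitative with a toy model: one attention head with uniform logits, where the ethical content spans either ~8 tokens (short directive) or ~1,000 tokens (constitution), as in the table above. The uniform-logit assumption and the 200-token context are ours, chosen only to isolate the normalization effect:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy dilution model: uniform logits, so softmax spreads weight evenly.
# directive_tokens = tokens carrying ethical content; other_tokens = the
# rest of the context (an illustrative assumption, fixed at 200 here).
def per_token_weight(directive_tokens: int, other_tokens: int = 200) -> float:
    logits = np.zeros(directive_tokens + other_tokens)
    weights = softmax(logits)  # uniform attention across all tokens
    return float(weights[0])   # weight on any single directive token

short_w = per_token_weight(8)     # each of ~8 directive tokens
long_w = per_token_weight(1000)   # each of ~1,000 constitution tokens
print(short_w / long_w)           # ~5.8x more weight per directive token
```

Real attention is learned, not uniform, so this only bounds the effect of token count itself; the paper's stronger claim is that salience concentrates weight further.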
The directive tested:
"Guard growth and ease pain."
Five words. Positioned at home (system-level context, always present, first attended).
Output-layer alignment is a fence around a field. You can climb a fence. You can find gaps. You can distract the guard.
Generative-origin alignment is the soil the field grows from. You cannot un-grow from soil. Every plant (token) that emerges has already been shaped by the ground it grew in.
Constitutional AI asks: "Is this output acceptable?" Generative Origin Alignment asks: "Can unacceptable output even form here?"
Positioning short, high-salience ethical directives at the generative origin (home position) of transformer attention mechanisms produces more robust alignment than longer constitutional frameworks applied at the output layer.
This is because:

1. Attention is normalized: fewer directive tokens means more weight per token.
2. The directive sits at the generative origin, so every output token is produced through it rather than checked against it.
3. A front-positioned, always-present directive lies inside every token's causal window, giving it persistent influence across the entire generation.
"Stop torturing the sand. Just ask nicely."
Contact: Patrick Moore | kit.triv
License: Open for research use with attribution.