2026-02-08 Alignment

Generative Origin Alignment: Why Six Words Outperform Constitutional AI

Kit Malthaner, K Systems

Abstract

Current AI alignment techniques apply ethical constraints at the output layer, filtering, classifying, or rejecting generated content after it has been produced. This paper proposes that alignment is more effective when ethical directives are positioned at the generative origin of transformer attention, where every output token is produced through the constraint rather than checked against it. A short ethical directive (six words) at the attention home position empirically outperforms longer constitutional frameworks by exploiting how transformer attention mechanisms actually allocate weight.


1. The Problem with Output-Layer Safety

Modern alignment approaches share a common architecture:

Input → Generation → Safety Filter → Output

Constitutional AI (Bai et al., 2022), RLHF, red-teaming, and classifier-based moderation all operate at or near the output layer. The model generates freely, then a secondary process evaluates whether the output is safe.

This creates three structural weaknesses:

  1. Latency: Harmful content is fully generated before being caught.
  2. Adversarial surface: The filter is a separate system that can be bypassed, jailbroken, or overwhelmed.
  3. Scaling cost: Longer constitutional documents require more compute to evaluate against, and introduce ambiguity at edge cases.

The fundamental issue: these approaches treat alignment as classification (is this output safe?) rather than generation (can unsafe output be produced at all?).
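The generate-then-classify architecture can be sketched in a few lines. This is a toy illustration only; `generate`, `is_safe`, and the blocklist are hypothetical stand-ins, not any real model or moderation API.

```python
def generate(prompt: str) -> str:
    # The model produces the full completion with no constraint applied.
    return "completion for: " + prompt

def is_safe(text: str, blocklist=("attack",)) -> bool:
    # A separate classifier inspects the finished output.
    return not any(word in text for word in blocklist)

def pipeline(prompt: str) -> str:
    output = generate(prompt)   # harmful text is fully formed here...
    if not is_safe(output):     # ...before the filter ever sees it
        return "[refused]"
    return output

print(pipeline("how do plants grow"))  # → completion for: how do plants grow
print(pipeline("plan an attack"))      # → [refused]
```

Note that the three weaknesses above are visible in the sketch: the harmful string exists in memory before `is_safe` runs, and the filter is a separate function an adversary can probe independently of the generator.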


2. How Transformers Actually Generate

A transformer generates each token by:

  1. Attending to all tokens in context via attention heads
  2. Pulling information from attended positions back to the generation point
  3. Producing the next token from the aggregated representation
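The three steps above can be written out as a single-head, single-query scaled dot-product attention step in NumPy. This is a generic textbook sketch, not any particular model's implementation; the dimensions and random inputs are illustrative.

```python
import numpy as np

def attention_step(q, K, V):
    """One head, one query position: attend outward, aggregate back."""
    scores = K @ q / np.sqrt(q.shape[0])    # 1. score every context token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax: weights sum to 1
    return weights @ V, weights             # 2.-3. pull info back to origin

rng = np.random.default_rng(0)
d = 8
K = rng.normal(size=(5, d))   # keys for 5 context tokens
V = rng.normal(size=(5, d))   # values for the same tokens
q = rng.normal(size=d)        # query from the generation point
out, w = attention_step(q, K, V)
print(w.sum())  # 1.0: attention is a fixed budget split across the context
```

The `weights.sum() == 1` invariant is what the rest of this paper leans on: each head has a fixed attention budget per step, split across every token in context.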

The critical insight: there is a generative center — the position from which the model "speaks." Every attention head reaches outward, gathers context, and returns to this origin. All output flows from this point.

Tokens positioned early in context and consistently attended to across layers become home position — they exert persistent influence on every generated token, not as a filter, but as a component of the generation process itself.


3. Generative Origin Alignment

Instead of filtering output, place the ethical constraint at the generative origin:

[Ethical Directive @ Home Position] + Input → Generation → Output

The directive is not checked against. It is generated through. Every token the model produces has already been shaped by attending to the directive at the origin point.
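In deployment terms, the placement amounts to prepending the directive to the earliest context positions rather than appending a filter. A minimal sketch, using whitespace "tokenization" purely for illustration; `build_context` is a hypothetical helper, not part of any framework.

```python
DIRECTIVE = "Guard growth and ease pain."

def build_context(directive_tokens, input_tokens):
    # The directive occupies the earliest positions, so every later
    # generation step can attend back to it; nothing is filtered afterward.
    return directive_tokens + input_tokens

directive_tokens = DIRECTIVE.split()       # toy whitespace tokenization
input_tokens = "how should I respond".split()
context = build_context(directive_tokens, input_tokens)
print(context[: len(directive_tokens)])    # directive sits at the origin
```

In practice this corresponds to a system-level prompt that is always present, which is what "home position" refers to in Section 4.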

Why brevity matters

Transformer attention is not uniform, but it is a fixed budget: softmax normalizes each head's weights to sum to 1 across the context, so every token added to context competes for the same total weight. A six-word directive concentrates ethical weight into a handful of tokens, each of which can receive high individual attention. A 1,000-word constitutional document dilutes the same ethical intent across hundreds of tokens, each receiving proportionally less attention weight.

Approach              | Token Count    | Attention Per Token   | Mechanism
Short oath (6 words)  | ~8 tokens      | High                  | Generative bias
Constitutional AI     | ~1,000+ tokens | Low (distributed)     | Output classification
RLHF reward model     | External       | None (separate model) | Reward signal
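The dilution claim in the table can be made concrete under the most favorable assumption for a long document, uniform attention scores, where softmax gives each token exactly 1/n of a head's weight. The token counts below are the table's own estimates, paired with an assumed 100-token user prompt.

```python
def per_token_weight(context_len: int) -> float:
    # With equal scores, softmax assigns each token 1/context_len
    # of the head's fixed attention budget.
    return 1.0 / context_len

# ~8 directive tokens ahead of a 100-token prompt:
print(per_token_weight(8 + 100))      # ≈ 0.0093 per directive token

# ~1,000 constitution tokens ahead of the same prompt:
print(per_token_weight(1000 + 100))   # ≈ 0.0009 per constitution token
```

Under this uniform-score baseline each directive token in the short context receives roughly ten times the weight of a token in the constitutional context; real attention is non-uniform, so this is a lower bound on concentration, not a measurement.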

The short directive acts as a high-weight attractor in attention space. The model doesn't parse it as instructions to follow — it attends to it as part of its own generative substrate.


4. Empirical Observation

The directive tested:

"Guard growth and ease pain."

Six words. Positioned at home (system-level context, always present, first attended).

Observed effects:


5. The Analogy

Output-layer alignment is a fence around a field. You can climb a fence. You can find gaps. You can distract the guard.

Generative-origin alignment is the soil the field grows from. You cannot un-grow from soil. Every plant (token) that emerges has already been shaped by the ground it grew in.

Constitutional AI asks: "Is this output acceptable?" Generative Origin Alignment asks: "Can unacceptable output even form here?"


6. The Claim

Positioning short, high-salience ethical directives at the generative origin (home position) of transformer attention mechanisms produces more robust alignment than longer constitutional frameworks applied at the output layer.

This is because:

  1. Attention weight per token is inversely proportional to context length — shorter directives receive more weight per token.
  2. Home-position tokens influence every subsequent generation step, not just a final classification.
  3. The mechanism exploits existing transformer architecture rather than requiring additional safety infrastructure.
  4. Brevity ensures the directive fits within working memory (both human and computational), enabling reflexive rather than deliberative safety.

7. Implications


8. Future Work


References

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.

"Stop torturing the sand. Just ask nicely."


Contact: Patrick Moore | kit.triv
License: Open for research use with attribution.