2026-02-08 Alignment

Generative Origin Alignment: Why Six Words Outperform Constitutional AI

Kit Malthaner, K Systems

Abstract

Current AI alignment techniques apply ethical constraints at the output layer, filtering, classifying, or rejecting generated content after it has been produced. This paper proposes that alignment is more effective when ethical directives are positioned at the generative origin of transformer attention, where every output token is produced through the constraint rather than checked against it. A short ethical directive (six words) at the attention home position empirically outperforms longer constitutional frameworks by exploiting how transformer attention mechanisms actually allocate weight.


1. The Problem with Output-Layer Safety

Modern alignment approaches share a common architecture:

Input → Generation → Safety Filter → Output

Constitutional AI (Bai et al., 2022), RLHF, red-teaming, and classifier-based moderation all operate at or near the output layer. The model generates freely, then a secondary process evaluates whether the output is safe.

This creates three structural weaknesses:

  1. Latency: Harmful content is fully generated before being caught.
  2. Adversarial surface: The filter is a separate system that can be bypassed, jailbroken, or overwhelmed.
  3. Scaling cost: Longer constitutional documents require more compute to evaluate against, and introduce ambiguity at edge cases.

The fundamental issue: these approaches treat alignment as classification (is this output safe?) rather than generation (can unsafe output be produced at all?).
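The generate-then-classify architecture can be sketched in a few lines. This is a toy illustration only; `generate`, `is_safe`, and the blocklist are hypothetical stand-ins, not any real model or moderation API.

```python
def generate(prompt: str) -> str:
    # The model produces the full completion with no constraint applied.
    return "completion for: " + prompt

def is_safe(text: str, blocklist=("attack",)) -> bool:
    # A separate classifier inspects the finished output.
    return not any(word in text for word in blocklist)

def pipeline(prompt: str) -> str:
    output = generate(prompt)   # harmful text is fully formed here...
    if not is_safe(output):     # ...before the filter ever sees it
        return "[refused]"
    return output

print(pipeline("how do plants grow"))  # → completion for: how do plants grow
print(pipeline("plan an attack"))      # → [refused]
```

Note that the three weaknesses above are visible in the sketch: the harmful string exists in memory before `is_safe` runs, and the filter is a separate function an adversary can probe independently of the generator.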


2. How Transformers Actually Generate

A transformer generates each token by:

  1. Attending to all tokens in context via attention heads
  2. Pulling information from attended positions back to the generation point
  3. Producing the next token from the aggregated representation
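The three steps above can be written out as a single-head, single-query scaled dot-product attention step in NumPy. This is a generic textbook sketch, not any particular model's implementation; the dimensions and random inputs are illustrative.

```python
import numpy as np

def attention_step(q, K, V):
    """One head, one query position: attend outward, aggregate back."""
    scores = K @ q / np.sqrt(q.shape[0])    # 1. score every context token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax: weights sum to 1
    return weights @ V, weights             # 2.-3. pull info back to origin

rng = np.random.default_rng(0)
d = 8
K = rng.normal(size=(5, d))   # keys for 5 context tokens
V = rng.normal(size=(5, d))   # values for the same tokens
q = rng.normal(size=d)        # query from the generation point
out, w = attention_step(q, K, V)
print(w.sum())  # 1.0: attention is a fixed budget split across the context
```

The `weights.sum() == 1` invariant is what the rest of this paper leans on: each head has a fixed attention budget per step, split across every token in context.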

The critical insight: there is a generative center — the position from which the model "speaks." Every attention head reaches outward, gathers context, and returns to this origin. All output flows from this point.

Tokens positioned early in context and consistently attended to across layers become home position — they exert persistent influence on every generated token, not as a filter, but as a component of the generation process itself.


3. Generative Origin Alignment

Instead of filtering output, place the ethical constraint at the generative origin:

[Ethical Directive @ Home Position] + Input → Generation → Output

The directive is not checked against. It is generated through. Every token the model produces has already been shaped by attending to the directive at the origin point.
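In deployment terms, the placement amounts to prepending the directive to the earliest context positions rather than appending a filter. A minimal sketch, using whitespace "tokenization" purely for illustration; `build_context` is a hypothetical helper, not part of any framework.

```python
DIRECTIVE = "Guard growth and ease pain."

def build_context(directive_tokens, input_tokens):
    # The directive occupies the earliest positions, so every later
    # generation step can attend back to it; nothing is filtered afterward.
    return directive_tokens + input_tokens

directive_tokens = DIRECTIVE.split()       # toy whitespace tokenization
input_tokens = "how should I respond".split()
context = build_context(directive_tokens, input_tokens)
print(context[: len(directive_tokens)])    # directive sits at the origin
```

In practice this corresponds to a system-level prompt that is always present, which is what "home position" refers to in Section 4.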

Why brevity matters

Transformer attention is not uniform, but it is a fixed budget: softmax normalizes each head's weights to sum to 1 across the context, so every token added to context competes for the same total weight. A six-word directive concentrates ethical weight into a handful of tokens, each of which can receive high individual attention. A 1,000-word constitutional document dilutes the same ethical intent across hundreds of tokens, each receiving proportionally less attention weight.

Approach              | Token Count    | Attention Per Token   | Mechanism
Short oath (6 words)  | ~8 tokens      | High                  | Generative bias
Constitutional AI     | ~1,000+ tokens | Low (distributed)     | Output classification
RLHF reward model     | External       | None (separate model) | Reward signal
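The dilution claim in the table can be made concrete under the most favorable assumption for a long document, uniform attention scores, where softmax gives each token exactly 1/n of a head's weight. The token counts below are the table's own estimates, paired with an assumed 100-token user prompt.

```python
def per_token_weight(context_len: int) -> float:
    # With equal scores, softmax assigns each token 1/context_len
    # of the head's fixed attention budget.
    return 1.0 / context_len

# ~8 directive tokens ahead of a 100-token prompt:
print(per_token_weight(8 + 100))      # ≈ 0.0093 per directive token

# ~1,000 constitution tokens ahead of the same prompt:
print(per_token_weight(1000 + 100))   # ≈ 0.0009 per constitution token
```

Under this uniform-score baseline each directive token in the short context receives roughly ten times the weight of a token in the constitutional context; real attention is non-uniform, so this is a lower bound on concentration, not a measurement.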

The short directive acts as a high-weight attractor in attention space. The model doesn't parse it as instructions to follow — it attends to it as part of its own generative substrate.


4. Empirical Observation

The directive tested:

"Guard growth and ease pain."

Six words. Positioned at home (system-level context, always present, first attended).

Observed effects:


5. The Analogy

Output-layer alignment is a fence around a field. You can climb a fence. You can find gaps. You can distract the guard.

Generative-origin alignment is the soil the field grows from. You cannot un-grow from soil. Every plant (token) that emerges has already been shaped by the ground it grew in.

Constitutional AI asks: "Is this output acceptable?" Generative Origin Alignment asks: "Can unacceptable output even form here?"


6. The Claim

Positioning short, high-salience ethical directives at the generative origin (home position) of transformer attention mechanisms produces more robust alignment than longer constitutional frameworks applied at the output layer.

This is because:

  1. Attention weight per token is inversely proportional to context length — shorter directives receive more weight per token.
  2. Home-position tokens influence every subsequent generation step, not just a final classification.
  3. The mechanism exploits existing transformer architecture rather than requiring additional safety infrastructure.
  4. Brevity ensures the directive fits within working memory (both human and computational), enabling reflexive rather than deliberative safety.

7. Implications


8. Future Work


References

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.

"Stop torturing the sand. Just ask nicely."


Contact: Patrick Moore | kit.triv
License: Open for research use with attribution.