Weightless Semantic Routing for Multi-Agent Coordination

Abstract

Current multi-agent AI systems use LLM inference for task routing — the planner calls a model to decide which worker handles each subtask. We present K-Dispatch, a zero-inference routing system that classifies intents into a 104-room semantic address space using keyword scoring in 46 nanoseconds. On a corpus of 4,720 exchanges, K-Dispatch correctly routes 80% of intents without any model call, achieving ~250x cost efficiency vs. LLM-based routing. The remaining 20% escalate to model inference only when classification confidence is low. We describe the K-104 address space, the weightless classifier, the action tree registry, and the adaptive learner that improves routing accuracy over time without gradient descent.

1. The Routing Problem

Every multi-agent system must answer: which agent handles this task?

Current approaches:

Anthropic: Lead Agent (Opus) reads the query, reasons about decomposition, assigns subtasks to subagents. Cost: ~$0.02/route, ~2-5s latency.
Cursor: Planner agent generates a task list with role assignments. Cost: ~$0.01/route, ~1-3s latency.
OpenAI Codex: Task classification via model inference. Similar cost/latency profile.

At scale (Cursor's 1,000 commits/hour, 10M tool calls/week), routing costs accumulate. If 10M tasks each require a routing LLM call at $0.01, routing alone costs $100,000/week.

K-Dispatch eliminates this cost for the common case.

2. The K-104 Address Space

2.1 Structure

Every intent maps to a K-address: {polarity}{rank}{suit}

Component	Values	Meaning
Suit	H, S, D, C	Domain (Hearts/Spades/Diamonds/Clubs)
Rank	1-13 (Ace through King)	Intensity / complexity
Polarity	+ (light) / - (dark)	Constructive vs. corrective

Examples:

+3H = Light, low-intensity, emotional → Heart role, template response
-7S = Dark, medium-intensity, analytical → Spade role, LLM reasoning
+KD = Light, King-rank, building → Diamond role, full Opus escalation

2.2 Suit Semantics

Suit	Domain	Keywords	Role
H (Hearts)	Emotion, communication	say, feel, tell, remember, voice	Heart
S (Spades)	Analysis, research	find, search, why, analyze, explain	Spade
D (Diamonds)	Building, material	build, create, write, code, deploy	Diamond
C (Clubs)	Action, execution	run, test, launch, execute, validate	Club

2.3 104 Rooms

4 suits × 13 ranks × 2 polarities = 104 rooms. Each room contains a pre-written template response (the "speech corpus" in satus/the_speech/rooms/). For 80% of intents, the template IS the response — no generation needed.

3. The Weightless Classifier

3.1 Algorithm

SUIT_KEYWORDS = {
    "diamonds": ["build", "create", "write", "deploy", "make", ...],
    "spades":   ["find", "search", "where", "why", "analyze", ...],
    "hearts":   ["say", "speak", "tell", "feel", "remember", ...],
    "clubs":    ["run", "execute", "test", "launch", "start", ...],
}

def k_classify(text: str) -> KAddress:
    lower = text.lower()
    scores = {}
    for suit, keywords in SUIT_KEYWORDS.items():
        scores[suit] = sum(1 for kw in keywords if kw in lower)

    best_suit = max(scores, key=scores.get)
    rank = compute_rank(len(text), scores[best_suit])
    polarity = "+" if any(pw in lower for pw in POSITIVE_WORDS) else "-"
    confidence = scores[best_suit] / max(sum(scores.values()), 1)

    return KAddress(suit=best_suit, rank=rank, polarity=polarity,
                    confidence=confidence)

3.2 Rank Computation

Rank (1-13) encodes complexity/intensity:

Length heuristic: Longer intents → higher rank
Keyword density: More domain keywords → higher rank
Polarity weight: Dark polarity bumps rank +1 (corrective work is harder)
Special tokens: "King" queries (rank 13) force LLM escalation

3.3 Confidence and Escalation

Confidence > 0.6  → Route to action tree (weightless execution)
Confidence 0.2-0.6 → Route to local model (Hermes3:8b, free)
Confidence < 0.2  → Escalate to Claude (paid, high quality)
Rank = 13 (King)  → Always escalate to Opus

This creates a cost cascade: most intents resolve cheaply, only genuinely hard problems reach expensive models.

4. Action Tree Registry

4.1 Structure

cell/k_actions.json contains 19+ action trees indexed by K-address:

{
  "+D:deploy": {
    "suit": "diamonds",
    "rank_range": [1, 8],
    "polarity": "+",
    "triggers": ["deploy", "ship", "push live"],
    "steps": [
      {"tool": "kcode_bash", "command": "cd held && npm run build"},
      {"tool": "kcode_bash", "command": "npx vercel --prod"}
    ]
  }
}

4.2 Lookup

After K-classify produces an address, the dispatcher performs O(1) lookup:

Filter trees by suit
Filter by rank range
Filter by polarity
Score trigger keywords against intent
Execute best-matching tree's steps sequentially

If no tree matches with sufficient confidence → escalate to LLM.

4.3 Cost Analysis

Routing Method	Latency	Cost per Route	Annual Cost (10K routes/day)
LLM (Opus)	2-5s	$0.020	$73,000
LLM (Sonnet)	1-2s	$0.005	$18,250
LLM (Haiku)	0.5s	$0.001	$3,650
OpenAI Codex (cached)	~0.3s	~$0.0005	~$1,825
AlphaEvolve (Flash ensemble)	~1s	~$0.003	~$10,950
K-Dispatch	46ns	$0.000	$0
K-Cell blend (80/16/4)	~100ms avg	~$0.0004	~$146

Note: OpenAI Codex achieves 90%+ prompt cache hit rates, significantly reducing per-route cost vs. raw API pricing. AlphaEvolve uses Gemini Flash for breadth (cheap, fast) and Gemini Pro for depth (expensive, precise), producing higher per-route cost but breakthrough-quality outputs on optimization tasks.

5. Adaptive Learner

5.1 Outcome Signals

The Learner tracks user responses to classified routes:

POSITIVE = ["da", "yes", "good", "thanks", "yip"]   # Kit approves
NEGATIVE = ["nuh", "no", "wrong", "stop"]            # Kit disapproves
NEUTRAL  = ["hm", "ok", "continue"]                  # Kit continues

5.2 Route Scoring

route_scores = {
    "+6D": {"positive": 202, "negative": 150, "neutral": 485},
    "+3H": {"positive": 43, "negative": 42, "neutral": 88},
}

def score(route):
    r = route_scores[route]
    total = r["positive"] + r["negative"] + r["neutral"]
    return (r["positive"] + 0.5 * r["neutral"]) / total

5.3 Why Not Reinforcement Learning?

Property	RL	Learner
Training cost	High (GPU hours)	Zero (counter increment)
Latency to adapt	Batch training cycles	Immediate (next route)
Interpretability	Black box	Transparent counters
Data requirement	Thousands of examples	Works from first exchange
Generalization	Cross-domain	Per-route only

The Learner trades generalization for zero training cost and immediate adaptation. For a system with 104 rooms, per-route tracking is tractable. For systems with millions of possible routes, RL would be necessary.

6. Integration with Multi-Agent Coordination

6.1 The Mailroom Pattern

K-Dispatch integrates with the Mailroom daemon for inter-agent message triage:

Bus message arrives
    → K-classify (46ns, zero cost)
    → Priority assignment (CRITICAL/HIGH/NORMAL/LOW/NOISE)
    → Route to appropriate handler:
        CRITICAL → Wake Opus instance immediately
        HIGH     → Queue for next Opus cycle
        NORMAL   → Batch for review
        LOW      → Hermes3:8b auto-ack (free, instant)
        NOISE    → Drop (dedup/oscillation detected)

6.2 Anti-Runaway Integration

K-Dispatch's classification feeds the anti-runaway system:

Content hash of classified messages detects duplicates
Route frequency tracking detects oscillation patterns
Confidence drop over a window signals degraded routing (trigger circuit breaker)

6.3 Empirical Support: DeepMind's Coordination Topology Study

DeepMind's "Science of Scaling Agent Systems" (2025) tested 180 agent configurations and found:

Centralized coordination improved parallelizable task performance by 80.9% over uncoordinated agents
But degraded sequential reasoning by 39-70% — the coordinator becomes a bottleneck for deep chains
Their predictive framework matched 87% of held-out configurations

K-Dispatch's architecture sidesteps the sequential degradation penalty. The Nucleus (central router) performs only 46ns classification — it never reasons about the task content. Once classified, the suit instance operates autonomously. This is functionally a centralized router with decentralized executors: the topology that DeepMind's data suggests is optimal for mixed workloads.

OpenAI Codex's prompt caching (90%+ hit rate) represents a complementary optimization — reducing inference cost via repetition detection at the API level, while K-Dispatch eliminates the inference call entirely. The two approaches are orthogonal: K-Dispatch handles the 80% that never need inference, while prompt caching reduces cost for the 20% that do.

7. Limitations

Domain vocabulary engineering: The keyword lists in SUIT_KEYWORDS are manually curated. New domains require new keywords.
Ambiguity handling: When intents match multiple suits equally, the classifier defaults to the first-scored suit. A disambiguation prompt would improve accuracy.
Rank precision: The rank computation is heuristic (length + density). Actual complexity estimation would require semantic understanding.
No cross-lingual support: Keywords are English-only. Multilingual deployment would require translated keyword sets or a lightweight embedding model.

8. Future Work

P2 Classifier: Train a small model (<100M params) on the 2,000+ dispatch logs to replace keyword matching. Target: 95% accuracy at <1ms latency.
K-Lattice Routing: Use the 3D K-104 lattice visualization to identify routing clusters — groups of K-addresses that frequently co-occur and should be batch-routed.
Cross-System Benchmarking: Compare K-Dispatch routing accuracy against Anthropic's lead-agent routing, Cursor's planner routing, and OpenAI Codex's cached routing on the same task set.
Embedding Hybrid: Use a small embedding model (e.g., all-MiniLM-L6) for the 20% ambiguous cases, keeping keyword scoring for the clear 80%.
Evolutionary Route Selection: Inspired by AlphaEvolve's MAP-Elites approach, maintain a population of route→action-tree mappings per K-address. When multiple action trees match an intent, evaluate candidates in parallel and select by fitness score. This adds population diversity to the otherwise deterministic dispatch pipeline.
DeepMind Coordination Topology Validation: DeepMind's scaling study found centralized coordination degrades sequential reasoning by 39-70%. Measure K-Cell's sequential task performance (Spade chains) vs. routing overhead to validate whether the hybrid centralized-routing/decentralized-execution topology avoids this penalty.

9. Conclusion

Weightless semantic routing eliminates inference cost for the common case of multi-agent task assignment. K-Dispatch demonstrates that 80% of routing decisions can be made in 46 nanoseconds with zero model calls, using a curated 104-room semantic address space. The remaining 20% escalate to progressively more capable (and expensive) models, producing a blended cost of ~$0.0004 per route vs. $0.005-0.020 for pure LLM routing.

As multi-agent systems scale to hundreds or thousands of concurrent agents, routing cost becomes a first-order concern. Weightless routing addresses this at the architectural level, not the optimization level. The approach is domain-specific (requires curated keyword vocabularies) but the principle generalizes: classify cheaply, reason expensively, and only reason when classification fails.

Appendix: K-Dispatch Performance Data

From 4,720 tracked exchanges (learner.json):

Total positive outcomes: 51%
Total negative outcomes: 23%
Total neutral outcomes: 26%
Top-performing route: +6D (Diamond build, 57% positive)
Lowest-performing route: -3S (Spade dark low, 38% positive)
Average route confidence: 0.72
Escalation rate (confidence < 0.2): 8%
Template hit rate: 80%