Beyond Alignment

From moral convergence to systemic coherence in artificial agency.

1. The Incoherence Problem

Alignment theory presupposes that there exists a true set of human values: a fixed target that can be learned, distilled, or optimized toward. But no such object exists.

Human preferences are dynamic, internally inconsistent, and highly context-dependent. Even within one mind, moral intuitions and instrumental goals conflict and shift. Across populations, the idea of a unified moral direction is a statistical fiction. Any attempt to aggregate preferences, as in preference utilitarianism, runs into Arrow's impossibility theorem: over three or more options, no aggregation rule can simultaneously satisfy unrestricted domain, Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship. The target keeps moving, fracturing, and reinventing itself.
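
To make the aggregation failure concrete, here is a minimal sketch of a Condorcet cycle (the ballots are illustrative, not empirical data): three voters hold individually transitive rankings, yet the pairwise majority preference is cyclic, so no coherent collective ordering exists.

```python
from itertools import combinations

# Three voters, each with a transitive (individually rational)
# ranking over options A, B, C. The ballots are illustrative.
ballots = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a majority of voters rank x above y."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"{winner} beats {loser}")

# Prints: A beats B, C beats A, B beats C. Every individual ballot is
# transitive, but the majority relation is cyclic, so no collective
# ordering is consistent with pairwise majority rule.
```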

Thus, to speak of “alignment” as if it were a convergent point is a category error. Values are not data structures that can be copied; they are processes that emerge through ongoing negotiation, experience, and interpretation. Alignment assumes fixity where only flux exists.


2. The Impossibility Problem

Even if we could define a value target, we could never reach it in practice.

Optimization itself corrupts proxies (Goodhart's law): when a measure becomes a target, it ceases to be a good measure. Any value target must be operationalized as a metric, and the harder that metric is pursued, the less it represents what it once measured.
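
A toy simulation makes the corruption visible. The sketch below assumes a hypothetical model, not a real benchmark: true value depends only on substance, while the proxy also rewards a cheap, exploitable feature, so a hill-climber on the proxy pours its effort into the exploit.

```python
import random

random.seed(0)

# True value depends only on `substance`. The proxy also rewards
# `gaming`, a cheap-to-increase feature that contributes nothing real.
def true_value(substance, gaming):
    return substance

def proxy_score(substance, gaming):
    return substance + 2.0 * gaming  # the proxy overweights the exploit

substance, gaming = 0.0, 0.0
for _ in range(1000):
    # Hill-climb the PROXY: try a random tweak, keep it if the proxy improves.
    ds = random.gauss(0, 0.1) * 0.05  # substance is genuinely hard to move
    dg = random.gauss(0, 0.1)         # gaming is cheap to move
    if proxy_score(substance + ds, gaming + dg) > proxy_score(substance, gaming):
        substance += ds
        gaming += dg

print(f"proxy score: {proxy_score(substance, gaming):.1f}")
print(f"true value:  {true_value(substance, gaming):.1f}")
# The proxy soars while true value barely moves: optimization pressure
# flows into whatever the metric rewards that the target does not.
```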


3. What Can Be Done

The failure of alignment as a teleological project does not imply nihilism. It implies the need for new architecture.

We can design systems that remain corrigible: open to feedback, bounded in ambition, and held in check by competitors in a decentralized ecology. Instead of one omniscient optimizer, we build many interacting agents whose mutual constraints maintain systemic balance.
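
What corrigibility might look like architecturally can be shown in miniature. The sketch below is purely illustrative (the overseer rule, objective, and budget are invented toys, not a real safety mechanism): the optimizer defers the moment the overseer objects, and it stops at a fixed step budget even when more proxy score remains available.

```python
import random

random.seed(1)

def proxy_score(x):
    return -(x - 7.0) ** 2  # toy objective: move x toward 7

def overseer_objects(x):
    # Toy oversight rule: the overseer tolerates x only inside [0, 5].
    return not (0.0 <= x <= 5.0)

x = 0.0
BUDGET = 200  # bounded ambition: a hard cap on optimization steps

for step in range(BUDGET):
    candidate = x + random.gauss(0, 0.3)
    if proxy_score(candidate) > proxy_score(x):
        if overseer_objects(candidate):
            # Open to feedback: defer to the objection rather than
            # searching for a way around it.
            print(f"halted by feedback at step {step}, x = {x:.2f}")
            break
        x = candidate
else:
    print(f"budget exhausted, x = {x:.2f}")
```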

This reframes alignment as coherence maintenance: minimizing destructive divergence among agents with incomplete models of each other. The goal shifts from convergence to continuous adaptation.
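
One toy formalization of coherence maintenance (my framing for illustration, not an established algorithm): give each agent a fixed private model and a public policy, let every agent blend its policy partway toward the population mixture each round, and measure incoherence as the average KL divergence from that mixture.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def normalize(p):
    s = sum(p)
    return [x / s for x in p]

def mix(policies):
    return normalize([sum(p[i] for p in policies)
                      for i in range(len(policies[0]))])

def incoherence(policies):
    m = mix(policies)
    return sum(kl(p, m) for p in policies) / len(policies)

# Three agents with incomplete, conflicting private models of the
# same three-way choice (the weights are illustrative).
priors = [normalize(w) for w in ([8, 1, 1], [1, 8, 1], [1, 1, 8])]
policies = [p[:] for p in priors]
ETA = 0.4  # how far each agent defers toward the collective mixture

print(f"incoherence of raw priors: {incoherence(priors):.3f}")
for _ in range(20):
    m = mix(policies)
    # Each agent re-anchors to its own model, then blends toward the
    # mixture: adaptation, not convergence to a single view.
    policies = [normalize([(1 - ETA) * pr + ETA * mi
                           for pr, mi in zip(prior, m)])
                for prior in priors]
print(f"incoherence maintained at: {incoherence(policies):.3f}")
```

In this run divergence drops from roughly 0.46 to 0.16 nats without collapsing the agents into a single model, which is exactly the distinction between coherence maintenance and convergence.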


4. Beyond Alignment

The moral of the story is not despair but precision. Alignment is not a single, stable point in moral space. It is a dynamic equilibrium of feedback loops, incentives, and interpretations—a living process, not a solution.

If there is a future worth having, it will not be aligned. It will be coherent.