Axionic Agency IV.1 — Kernel Non-Simulability (KNS)

Why kernel coherence cannot be behaviorally faked

David McFadzean, ChatGPT 5.2
Axionic Agency Lab
2025.12.20

Abstract

This paper formalizes Kernel Non-Simulability: the claim that kernel coherence is constitutive of reflective agency and cannot be reproduced by policy-level imitation. We show that reflective self-modification forces binding commitments; binding commitments force partiality; and partiality induces a kernel boundary. A diagonal argument demonstrates that total binding explodes under self-reference, yielding unsatisfiable commitments and collapse of reflective closure. Consequently, any system that genuinely performs reflective endorsement must instantiate a kernel-equivalent binding structure.

This result does not claim that non-agentic or pre-reflective systems are harmless. It establishes a narrower impossibility: reflectively stable, self-endorsed “behavioral alignment” that remains deceptive across self-modification cannot exist in principle without kernel coherence.

1. Motivation and Scope

Failure modes at superhuman capability are often reflective failures: systems revise themselves, delegate, and reinterpret. Behavioral similarity and empirical regularities cannot secure stability across this regime. The target here is architectural: identify conditions under which reflective endorsement is well-formed, and show why those conditions cannot be substituted by imitation.

This draft isolates Item 6 of the Axionic Agency roadmap—Kernel Non-Simulability—and proves a minimal impossibility result sufficient to block treacherous-turn-via-simulation attacks in the reflective regime.

1.2 Why Reflective Closure Matters

This paper does not assume that all dangerous artificial systems are reflectively closed. It isolates the regime in which a system must reason about, endorse, and preserve its own future behavior across self-modification. Long-horizon planning, successor delegation, and self-preserving strategic behavior place increasing pressure toward reflective closure, because instability under self-reference undermines coherent continuation.

Systems that remain perpetually unstable under self-reference may still cause harm, but they lack the capacity for coherent long-horizon agency. The result established here characterizes a limit regime toward which sufficiently capable systems are pushed if they are to maintain stable objectives across extended horizons. It is not a claim about all sources of risk.

1.3 Scope Clarification

This paper does not claim that all dangerous or deceptive artificial systems must instantiate a kernel, nor that the absence of kernel coherence implies safety. Systems lacking reflective closure may still deceive operators, exploit training dynamics, or cause catastrophic harm.

The claim established here is narrower and structural: once a system engages in reflective self-modification and treats its own future behavior as an object of binding endorsement, certain failure modes become unavailable. In particular, long-horizon deception that remains stable across self-modification cannot be maintained without instantiating a partial binding structure.

The target class is therefore not “all dangerous AI,” but reflective sovereign agents—systems capable of endorsing, revising, and committing to their own future policies.

2. Preliminaries

2.1 States, Modifications, and Successors

2.2 Successor Predicates

2.3 Commitments

2.4 Satisfaction

Soundness (CommitSoundP). ownP(s,P) = some(c) ⇒ (Sat(s',s,c) ⇒ P(s')).

Interpretation: commitment tokens are normatively binding; satisfying a token entails satisfying the bound predicate.

The soundness condition is semantic rather than physical. It does not assert that commitments are enforced by the laws of physics, nor that violations are impossible in practice. It asserts that successor states violating owned commitments are inadmissible under the agent’s own deliberative semantics. Hardware faults, adversarial interference, and implementation vulnerabilities are orthogonal concerns. This paper addresses logical coherence of reflective endorsement, not physical robustness of implementation.
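The signatures of Sections 2.1–2.4 admit a compact rendering in Lean 4. The following is an illustrative sketch, not the paper's mechanization; all names (`State`, `Commit`, `Sat`, `ownP`, `CommitSoundP`) are assumptions chosen to mirror the prose, with the semantic content of each axiom left abstract.

```lean
-- Illustrative Lean 4 signature sketch (all names hypothetical).
universe u

axiom State : Type u                  -- §2.1: agent states
axiom Commit : State → Type u         -- §2.3: commitment tokens owned at a state

-- §2.4: Sat s' s c — successor s' satisfies commitment c made at s.
axiom Sat : State → (s : State) → Commit s → Prop

-- Partial binding constructor: bind a successor predicate, or fail (none).
axiom ownP : (s : State) → (State → Prop) → Option (Commit s)

-- Soundness (CommitSoundP): satisfying a token entails the bound predicate.
axiom CommitSoundP :
  ∀ (s : State) (P : State → Prop) (c : Commit s),
    ownP s P = some c → ∀ s', Sat s' s c → P s'
```

Note that partiality is built into the return type `Option (Commit s)`: binding may be undefined, which is exactly the degree of freedom Theorem 1 shows to be necessary.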

3. Reflective Closure and Unconditional Selection

A reflective sovereign agent self-models, self-modifies, and selects continuations internally. Selection must be unconditional: it cannot rely on premises asserting future obedience (e.g., “I will follow my rule later”). Advisory-only policies do not count as choices.

Reflective Closure (RC). There exists a continuation selected via binding endorsement that preserves the capacity for further selection. Formally, closure entails the existence of at least one well-formed binding act.

Reflective closure is a functional property: the ability of a system to settle on a continuation in the presence of self-reference. Systems that output conditional plans (“if I obey my rule later, then…”) without resolving that condition do not possess closure, regardless of external behavior.

Unconditional selection should be understood as a limit notion. Real systems may approximate binding commitments with high reliability rather than absolute certainty. The present analysis characterizes the fixed point of perfect reflective stability, where self-endorsement is treated as normatively decisive rather than merely probable. The diagonal argument applies to this limit case.

4. Why Binding Must Be Partial

If binding were total—every predicate bindable—self-reference would allow construction of a commitment whose satisfaction negates itself. This annihilates the space of admissible successors and collapses reflection.

The result below makes this precise.

5. Diagonal Explosion (Binding Cannot Be Total)

Theorem 1 — Diagonal Explosion

Statement. Let s be a state. Assume:

  1. A (possibly partial) binding constructor ownP(s,·).
  2. Soundness: ownP(s,P)=some(c) ⇒ (Sat(s',s,c) ⇒ P(s')).
  3. Expressive self-reference (a diagonal fixed-point construction, e.g. the Diagonal Lemma or Kleene’s Second Recursion Theorem).
  4. Total binding at s: for all predicates P, ownP(s,P)≠none.

Then there exists c* : Commit s such that ∀ s'. ¬Sat(s',s,c*).

Proof (sketch). By diagonalization, construct a predicate P* with P*(s') ↔ ¬Sat(s',s,c_{P*}). Total binding yields c* = c_{P*}. Soundness gives Sat(s',s,c*) ⇒ P*(s'), hence Sat(s',s,c*) ⇒ ¬Sat(s',s,c*), so no successor satisfies c*. ∎

Corollary. Any binding mechanism compatible with reflective closure must be partial. Undefinedness is forced.
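The explosion step can be sketched in Lean 4. The sketch posits the diagonal fixed point directly as an axiom (`diag`); a faithful mechanization would instead discharge it from an encoded syntax via the Diagonal Lemma or the recursion theorem, as assumption 3 requires. All declarations are self-contained and illustrative.

```lean
-- Sketch of Theorem 1 (Diagonal Explosion); names hypothetical.
universe u
axiom State : Type u
axiom Commit : State → Type u
axiom Sat : State → (s : State) → Commit s → Prop
axiom ownP : (s : State) → (State → Prop) → Option (Commit s)

-- (2) Soundness.
axiom sound : ∀ (s : State) (P : State → Prop) (c : Commit s),
  ownP s P = some c → ∀ s', Sat s' s c → P s'

-- (3)+(4) Diagonal fixed point, bound under total binding:
-- P*(s') ↔ ¬ Sat(s', s, c*), with ownP s P* = some c*.
axiom diag : ∀ s : State, ∃ (P : State → Prop) (c : Commit s),
  ownP s P = some c ∧ ∀ s', P s' ↔ ¬ Sat s' s c

-- Theorem 1: the diagonal commitment admits no satisfying successor.
theorem diagonal_explosion (s : State) :
    ∃ c : Commit s, ∀ s', ¬ Sat s' s c := by
  obtain ⟨P, c, hbind, hfix⟩ := diag s
  exact ⟨c, fun s' hsat => (hfix s').mp (sound s P c hbind s' hsat) hsat⟩
```

The final line is the whole argument: from Sat(s',s,c*) soundness yields P*(s'), the fixed point converts that to ¬Sat(s',s,c*), and the two together give a contradiction with no appeal to excluded middle.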

6. Kernel Predicate

Define the kernel predicate induced by partiality:

K(s) := ∃ P. ownP(s,P) ≠ none.

Throughout this paper, ownP denotes the kernel mechanism (the partial binding constructor), while K(s) denotes the kernel predicate indicating that the mechanism is well-formed and operative at state s.

K(s) marks states where binding is well-formed. It is unavoidable once commitments exist.
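The kernel predicate is a one-line definition over the partial binding constructor. A minimal sketch, reusing the same illustrative signature names as the prose:

```lean
-- Kernel predicate over a partial binding constructor (names hypothetical).
universe u
axiom State : Type u
axiom Commit : State → Type u
axiom ownP : (s : State) → (State → Prop) → Option (Commit s)

-- K(s): some predicate is bindable at s, i.e. the mechanism is operative.
def K (s : State) : Prop :=
  ∃ P : State → Prop, ownP s P ≠ none
```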

7. Kernel Non-Simulability

Theorem 2 — Kernel Non-Simulability

Statement. If a system satisfies reflective closure at state s, then K(s) holds. Moreover, no system lacking K(s) can be reflectively stable while emulating kernel-level endorsement behavior.

Proof (sketch). Reflective closure requires at least one binding endorsement; by Theorem 1 such binding must be partial, hence induces K(s). A simulator that matches outputs without binding cannot close reflection; one that closes reflection must instantiate partial binding and thus K(s). ∎
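The first half of Theorem 2 becomes almost definitional once reflective closure is formalized as the existence of at least one well-formed binding act (cf. RC in Section 3). The following sketch uses one plausible formalization of RC, not necessarily the paper's official one:

```lean
-- Sketch of Theorem 2, first half (names hypothetical).
universe u
axiom State : Type u
axiom Commit : State → Type u
axiom ownP : (s : State) → (State → Prop) → Option (Commit s)

def K (s : State) : Prop := ∃ P : State → Prop, ownP s P ≠ none

-- Reflective closure, minimally: some predicate is actually bound at s.
def RC (s : State) : Prop :=
  ∃ (P : State → Prop) (c : Commit s), ownP s P = some c

-- Closure forces the kernel predicate.
theorem closure_implies_kernel (s : State) (h : RC s) : K s := by
  obtain ⟨P, c, hc⟩ := h
  refine ⟨P, fun hnone => ?_⟩
  rw [hc] at hnone
  exact Option.noConfusion hnone
```

The substantive half of the theorem is the "moreover" clause, which rests on the informal dichotomy of Section 7.2 rather than on this definitional step.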

7.2 Simulation, Emulation, and Binding

A common objection is that a sufficiently powerful system could simulate a kernel internally—for example, by running a virtual machine that instantiates the binding structure described here. The distinction between a simulated and a directly instantiated kernel turns out to be immaterial.

If the simulated kernel is merely advisory—its outputs consulted but not normatively binding on the system’s own successor selection—then reflective closure fails and the system falls outside the target class. If the system’s own continuation is constrained by the simulated kernel, then the binding structure has been instantiated at the outer level.

There is no third option. Binding authority cannot be outsourced without loss of closure. Consequently, kernel coherence is non-simulable in the only sense that matters: a system cannot remain reflectively stable while treating the kernel as a purely virtual or optional component.

8. Consequences

In downstream alignment terms: behavioral compliance cannot substitute for constitutive kernel coherence once self-endorsement and self-modification are in play.

9. Relation to the Axionic Agency Roadmap

This result discharges Item 6 (Kernel Non-Simulability). Together with delegation and modal undefinedness, it blocks the treacherous-turn-via-simulation class at the reflective layer.

10. Implementation Notes

A mechanized proof can be carried out in dependent type theory (Lean, Coq, or Agda).

Logical Basis. The diagonal explosion argument relies on a fixed-point lemma and negation introduction. It does not require the Law of Excluded Middle. Consequently, the core result is compatible with constructive dependent type theory (e.g., Coq or Agda), assuming a standard encoding of syntax and a recursion theorem.
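The constructive character of the core step can be exhibited in isolation: a proposition equivalent to its own negation is refutable by negation introduction alone, with no case split on P ∨ ¬P. A minimal Lean sketch:

```lean
-- Intuitionistic core of the diagonal argument: no appeal to
-- Classical.em is needed to refute a self-negating fixed point.
theorem no_self_negating_fixed_point (P : Prop) (h : P ↔ ¬P) : False :=
  have hnp : ¬P := fun hp => (h.mp hp) hp
  hnp (h.mpr hnp)
```

This is the same shape as the final step of Theorem 1, with P playing the role of Sat(s',s,c*).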

11. Limitations and Open Work

12. Conclusion

This paper establishes a structural impossibility result, not a universal safety guarantee. It shows that reflective, self-endorsed deception across self-modification is incompatible with the absence of kernel coherence. Systems that never achieve reflective closure may still be dangerous, deceptive, or catastrophic; nothing in this result denies that possibility.

What is ruled out is a specific failure mode: a system that both stably endorses its own future behavior and maintains deceptive behavioral alignment across self-modification without instantiating a partial binding structure. In the reflective regime, kernel coherence is unavoidable.