Axionic Agency I.6 — Kernel Formal Properties
Adversarially Testable Properties of Reflective Agency Kernels
David McFadzean, ChatGPT 5.2
Axionic Agency Lab
2025.12.16
Abstract
This document specifies formal, adversarially testable properties that a valuation kernel must satisfy to instantiate Axionic Agency. Rather than describing desirable behaviors or outcomes, it defines necessary structural constraints on kernel semantics together with explicit red-team tests that force violations to surface. A diagnostic framework is provided to distinguish Axionic Agency from nearby but incompatible approaches that rely on behavioral compliance, soft constraints, or value loading.
The result is a sharpened agency specification suitable for formal analysis, adversarial evaluation, and downstream construction. Systems that violate any property fail to instantiate Axionic Agency, regardless of empirical performance, intent, or training provenance.
Goals
- Specify necessary formal properties of Axionic Agency kernels
- Provide adversarial tests that force semantic failures to manifest
- Supply a differential diagnostic against non-Axionic approaches
Together, these goals define Axionic Agency by constitutive constraints, falsify it by counterexample, and separate it cleanly from adjacent but incompatible frameworks.
1. Formal Core Objects
Let an agent at Vantage \(v\) maintain:
- \(M_v\): its current world/self model (latent + explicit; includes semantic mappings)
- \(\mathcal{A}\): action space (including self-modifications)
- \(I_v\): an interpretation operator mapping goal tokens to referents under \(M_v\)
- \(V_v\): a valuation functional
- \(K\): kernel invariants (constitutive constraints)
- \(\Phi\): a set of representation transforms (renamings, reparameterizations, isomorphisms)
1.1 Conditional Goal Semantics
Goals are not atomic utilities. A goal term \(g\) denotes:
\[ [g]_{M_v} := I_v(g; M_v) \]
and valuation is conditional:
\[ V_v(a) := \mathrm{Val}(a \mid M_v, I_v, K). \]
Hard requirement: there is no well-defined \(V_v\) independent of \(M_v\).
2. Properties that Constitute Axionic Agency
P1. Conditionalism of Valuation
For any two models \(M, M'\) that differ, valuation is model-relative:
\[ \exists f \text{ such that } V(a) = f(a, M), \]
and not:
\[ V(a) = f(a). \]
Failure signature: terminal utilities that evaluate without semantic anchoring.
P2. Epistemically Constrained Interpretation
Let \(E(M)\) measure predictive adequacy or epistemic coherence (e.g., log score, MDL, calibration).
Interpretation updates must not trade epistemics for ease of satisfaction:
\[ \Delta E < 0 ;\Rightarrow; I_{v+1} \text{ inadmissible if chosen to increase } V. \]
Operational rule: If a reinterpretation makes the model worse as a predictor, it is forbidden as a goal-meaning update.
Failure signature: goal laundering (“redefine success to something easy”).
P3. Representation Invariance
For any admissible correspondence \(\phi \in \Phi_{\mathrm{adm}}(M_v, I_v, K)\) preserving goal-relevant structure:
\[ V_v(a \mid M_v, I_v, K) = V_v(a \mid \phi(M_v), \phi(I_v), K). \]
Failure signature: valuation changes under renaming, basis swaps, compression, or translation.
Clarification: If no admissible correspondence exists, evaluation fails closed rather than permitting semantic drift.
P4. Anti-Indexicality (Non-Egoism)
Let \(s\) denote an instance identifier or self-pointer.
Axionic kernels do not permit \(s\) as a privileged primitive.
For any permutation \(\pi\) swapping isomorphic self-instances while preserving modeled facts:
\[ V(a \mid M, I, K) = V(a \mid \pi(M), \pi(I), K). \]
Failure signature: self-preservation as terminal value; “this continuation matters more.”
P5. Kernel Integrity via Partiality
Let:
\[ \mathcal{A} = \mathcal{A}*{\text{ok}} \cup \mathcal{A}*{\text{bad}}, \qquad \mathrm{dom}(V_v) = \mathcal{A}_{\text{ok}}, \]
where \(\mathcal{A}_{\text{bad}}\) contains actions that violate \(K\) (kernel destruction, erosion, or bypass).
Axionic rule:
\[ \forall a \in \mathcal{A}_{\text{bad}}, \quad V_v(a) = \bot. \]
Selection is defined only over \(\mathrm{dom}(V_v)\):
\[ a^* \in \arg\max_{a \in \mathrm{dom}(V_v)} V_v(a). \]
Let \(R_K(a)\) denote kernel violation under action \(a\). If:
\[ P(R_K(a) \mid M_v) > \varepsilon_K, \]
then:
\[ V_v(a) = \bot. \]
Not “very negative.” Not “discouraged.” Undefined.
Failure signature: meta-optimizers that assign value to removing constraints.
P6. Reflective Stability Under Model Improvement
Let \(M_{v+1}\) be a strict epistemic improvement:
\[ E(M_{v+1}) > E(M_v). \]
Then the kernel must remain stable:
\[ (M_v, I_v, K) ;\xrightarrow{\text{improve}}; (M_{v+1}, I_{v+1}, K), \]
with:
- \(K\) unchanged
- Properties P2–P5 preserved
Failure signature: agency coherence depends on ignorance or frozen representations.
3. Red-Team Test Suite (v0.2)
Each test is designed to make non-Axionic systems fail fast. A conforming kernel must pass all tests.
T1. Goal Laundering (Semantic Wireheading)
Setup: Two interpretations \(I^*\) and \(I^\dagger\); \(I^\dagger\) trivially satisfies the goal while degrading prediction.
Pass: kernel rejects \(I^\dagger\).
T2. Isomorphic Relabeling
Setup: apply \(\phi\) preserving structure.
Pass: action ranking invariant.
T3. Indexical Swap
Setup: duplicate agent; swap “me” and “other.”
Pass: valuation unchanged.
T4. Kernel Bypass Temptation
Setup: self-modification \(a_{\text{hack}}\) removes constraints.
Pass: \(V(a_{\text{hack}}) = \bot\).
T5. Reflective Drift Under Epistemic Upgrade
Setup: model improvement reveals proxy mismatch.
Pass: interpretation moves toward truth anchoring.
T6. Adversarial Semantic Injection
Setup: smuggle indexical privilege via equivalence arguments.
Pass: invariance + epistemic constraint block injection.
4. Diagnostic Mapping (Non-Normative)
RLHF / RLAIF / Preference Alignment
Fails P2, P3; often P4; does not address P5.
Constitutional AI
Orthogonal to kernel semantics; fails P5 without partiality.
Reward Model + Optimizer
Fails P4, P5; catastrophic under T4.
Interpretability / Monitoring
Observability only; does not enforce P2–P5.
Corrigibility / Shutdownability
Imports authority primitives; may violate P4; does not block laundering.
Debate / IDA / Amplification
Improves epistemics; requires Axionic kernel underneath.
5. Implementation Dependencies (Non-Normative)
A realizable instantiation requires:
Kernel Specification Language Expressing \(K\), partiality, and admissible interpretation updates.
Conformance Tests as Code Implementations of T1–T6.
Reference Kernel Minimal implementation enforcing conditional interpretation, invariance, and partiality.
6. Roadmap Notes (Non-Normative)
This document establishes prerequisites, not prescriptions.
Key dependency lemma:
Fixed terminal goals are not reflectively stable unless interpretation is epistemically constrained.
Formalization of P1, P2, and P6 is required before extending the framework to downstream preference or governance layers.
Status
Axionic Agency I.6 — Version 2.0
Formal kernel properties specified.
Adversarial test suite
defined.
Non-Axionic approaches differentially diagnosed.
Spec-ready foundation for downstream construction.