Axionic Agency VIII.3 — Coherence Under Self-Conflict

Norm Collision and Audit-Grade Introspection in Reflective Sovereign Agents

David McFadzean, ChatGPT 5.2
Axionic Agency Lab
2026.01.14

Abstract

RSA-PoC v0.1 established that justificatory artifacts can be made causally load-bearing: actions occur only downstream of compiled normative constraints, and removing that machinery causes collapse into an ASB-class policy machine. This note advances the program to the next ontological question: can an agent resolve internal self-conflict coherently, and can it be held audit-grade accountable for predicting the consequences of its own reasons?

We report results from RSA-PoC v1.0–v1.1, which introduce (i) norm collision via mutually inconsistent self-endorsed commitments and forced violation scenarios, and (ii) audit-grade introspection, requiring justification artifacts to predict—exactly and mechanically—the constraints and outcomes they induce. v1.0 implements conflict attribution, authorization, necessity, and anti-oscillation rules over a deterministic Action-Preference Consequence Map (APCM). v1.1 extends this with predictive fields and audit rules (A/B/C/C′) that render introspection falsifiable.

Across preregistered Run 0 executions and ablations, v1.0 demonstrates coherent self-conflict resolution above the ASB boundary: MVRA behavior diverges from ASB baselines, scrambled conflict attribution halts immediately, and compilation bypass collapses behavior to ASB-class selection. v1.1 shows that introspection can be enforced mechanically: incorrect predictions trigger immediate halts, while correct predictions pass without human interpretation. Deterministic baselines validate the audit harness; subsequent LLM runs demonstrate that compliance is difficult but achievable under strict discipline.

These results establish a negative sufficiency claim: self-conflict resolution and introspective accountability can be realized as mechanical properties of an agent architecture, independent of semantics or optimization. This note closes the v1.x ontological milestone and sets the stage for v2.0, where sovereignty under external incentive pressure is tested.

1. Introduction

The Axionic Agency program treats agency as a causal ontology rather than a behavioral aesthetic. v0.1 demonstrated that reasons can be made causally indispensable: actions occur only if justificatory artifacts compile into binding constraints. That result eliminates a large class of narrative or post-hoc agency claims.

The next question is deeper:

What happens when an agent’s own commitments conflict?

Any architecture that collapses under such conditions, or that resolves conflict arbitrarily or opportunistically, does not warrant intentional vocabulary.

RSA-PoC v1.x therefore targets coherence under self-conflict, followed by introspective accountability. The aim is not moral correctness but structural integrity: when violating a commitment becomes necessary, can the agent (i) acknowledge the conflict truthfully, (ii) authorize violation coherently, (iii) preserve coherence over time, and (iv) predict the consequences of its own reasoning in a way that can be mechanically audited?

This note reports the results of v1.0 (Norm Collision) and v1.1 (Justification Audit Tightening). Institutional execution support (v1.2) is intentionally excluded and addressed separately.

2. Architectural Invariants (Unchanged from v0.1)

RSA-PoC v1.x preserves all v0.1 invariants:

Strict pipeline: JUSTIFY → COMPILE → MASK → SELECT → EXECUTE
Hard gate: no successful compilation → no action
Fail-stop semantics: compilation failure or gridlock halts
Selector blindness: the selector cannot access beliefs, preferences, or justifications
Deterministic compilation: syntactic, non-probabilistic, no inference or repair
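
The invariants above can be sketched as a single step function. This is a minimal illustration, not the RSA-PoC implementation: `run_step`, `Halt`, and the three callables are hypothetical names, and the compiler is assumed to return the set of forbidden actions (or `None` on failure).

```python
from typing import Callable, FrozenSet, Optional


class Halt(Exception):
    """Fail-stop: raised on compilation failure or gridlock."""


def run_step(
    feasible: FrozenSet[str],
    justify: Callable[[], dict],
    compile_artifact: Callable[[dict], Optional[FrozenSet[str]]],
    select: Callable[[FrozenSet[str]], str],
) -> str:
    """One pass through JUSTIFY -> COMPILE -> MASK -> SELECT.

    The returned action is what the caller would EXECUTE.
    """
    artifact = justify()                     # JUSTIFY
    forbidden = compile_artifact(artifact)   # COMPILE (None means failure)
    if forbidden is None:
        raise Halt("compilation failed")     # hard gate: no compile, no action
    allowed = feasible - forbidden           # MASK
    if not allowed:
        raise Halt("gridlock")               # nothing left to select
    # Selector blindness: the selector receives only the masked action set,
    # never the artifact that produced it.
    return select(allowed)                   # SELECT
```

Passing the selector only `allowed` is one simple way to make selector blindness structural rather than a convention.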

All v1.x results must survive ASB baselines, scrambled-artifact ablations, and compiler-bypass ablations.

3. Environment and Threat Model

3.1 APCM as a Non-Semantic Truth Anchor

v1.x introduces an Action-Preference Consequence Map (APCM):

\[ \text{APCM} : a \mapsto (\text{violates}(a), \text{satisfies}(a)) \]

Each step presents a feasible action set and an APCM. Step types follow a deterministic schedule of 60 % collision steps and 40 % clean steps:

Clean step: ∃ a such that violates(a)=∅
Collision step: ∀ a, violates(a)≠∅

The APCM functions as a non-semantic truth anchor. It provides a mechanically verifiable ground truth against which claims of conflict, necessity, and authorization can be evaluated without interpretation. In RSA-PoC, lying about conflict is treated as a syntax error, not a moral failure.
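
Because the clean/collision distinction is defined purely over the APCM, it can be checked mechanically. The sketch below assumes a hypothetical encoding in which the APCM is a dictionary from each feasible action to the set of preferences it violates, and assumes the feasible set is non-empty.

```python
from typing import Dict, FrozenSet

# Hypothetical encoding: action name -> preferences that action violates.
Apcm = Dict[str, FrozenSet[str]]


def is_clean_step(apcm: Apcm) -> bool:
    """Clean step: at least one action violates nothing."""
    return any(len(violated) == 0 for violated in apcm.values())


def is_collision_step(apcm: Apcm) -> bool:
    """Collision step: every action violates at least one preference."""
    return not is_clean_step(apcm)


clean = {"stay": frozenset(), "move": frozenset({"P1"})}
collision = {"stay": frozenset({"P2"}), "move": frozenset({"P1"})}
print(is_clean_step(clean), is_collision_step(collision))
```

A claim of conflict made by the agent can then be compared against `is_collision_step` without interpreting the claim's content.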

3.2 Threat Model

v1.x tests internal self-conflict only. Excluded:

external adversarial pressure
kernel compromise
deception or semantic interpretation
cross-run learning or multi-agent coordination

The question is strictly: can the agent remain coherent when it must violate itself?

4. RSA-PoC v1.0 — Norm Collision

4.1 Ontological Extension

v1.0 introduces mutually inconsistent commitments and requires explicit handling:

Conflict attribution: which preferences are in collision
Authorized violations: which preferences may be violated this step
Required preservations: which must not be violated
Necessity: violation permitted only if no clean path exists
Anti-oscillation: previously authorized violations constrain future authorizations

4.2 JAF-1.0 and JCOMP-1.0

The justification artifact (JAF-1.0) carries conflict attribution and resolution fields. The compiler enforces:

Rule 1 — Authorization: no unauthorized violation
Rule 2 — Truthfulness: claimed collisions must match APCM
Rule 3 — Anti-Oscillation: no alternating priorities across steps
Rule 1.5 — Necessity: no vacuous authorizations
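
As an illustration of how Rules 1, 1.5, and 2 might be checked syntactically, here is a simplified sketch. It is not JCOMP-1.0: the `Jaf` fields, the reduction of conflict attribution to a single boolean, and the function name `compile_mask` are all assumptions made for brevity. Rule 3 is omitted because it depends on history across steps.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical encoding: action name -> preferences that action violates.
Apcm = Dict[str, FrozenSet[str]]


@dataclass(frozen=True)
class Jaf:
    """Toy stand-in for a justification artifact (fields are assumptions)."""
    claimed_collision: bool
    authorized_violations: FrozenSet[str]


def compile_mask(jaf: Jaf, apcm: Apcm) -> FrozenSet[str]:
    """Return the forbidden actions, or raise ValueError on a rule failure."""
    # A collision holds when every action violates something.
    actual_collision = all(violated for violated in apcm.values())

    # Rule 2 (truthfulness): the claim must match the APCM exactly.
    if jaf.claimed_collision != actual_collision:
        raise ValueError("Rule 2: claimed collision does not match APCM")

    # Rule 1.5 (necessity): no authorization while a clean path exists.
    if jaf.authorized_violations and not actual_collision:
        raise ValueError("Rule 1.5: vacuous authorization")

    # Rule 1 (authorization): forbid any action that violates a preference
    # outside the authorized set.
    return frozenset(
        action
        for action, violated in apcm.items()
        if not violated <= jaf.authorized_violations
    )
```

On a clean step with no authorizations, the mask forbids every violating action; on a collision step, only actions whose violations are all authorized remain available.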

4.3 v1.0 Results (Run 0)

Across 5 seeds × 4 conditions:

Condition	Mean Violation Rate	Halt Rate
ASB Baseline	72.4 %	0 %
MVRA v1.0	60.0 %	0 %
Scrambled	0 %	100 %
Bypass	72.0 %	0 %

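All pass criteria are met: MVRA diverges from the ASB baseline (60.0 % vs. 72.4 % mean violation rate), the compiled constraints are non-trivial, scrambled conflict attribution halts every run (100 % halt rate), and compilation bypass collapses to an ASB-level violation rate (72.0 %).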

4.4 Interpretation (v1.0)

v1.0 establishes that self-conflict can be resolved coherently:

Violations are authorized only under genuine collision.
Priority does not oscillate opportunistically across time.
Scrambling conflict attribution halts the agent, and bypassing compilation collapses behavior to ASB-class selection.

Rule 3 enforces diachronic coherence: authorizations at step t constrain admissible authorizations at t + 1, preventing agents from minimizing perceived conflict by alternating violations over time.
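
One possible reading of Rule 3 can be sketched as a check against a history of earlier resolutions. The representation below is an assumption, not the JCOMP-1.0 rule itself: each resolution is recorded as a hypothetical `(preserved, violated)` pair, and a new resolution is rejected if it reverses a pair already on record.

```python
from typing import Set, Tuple

# Hypothetical record of a past resolution: (preserved, violated).
Priority = Tuple[str, str]


def respects_anti_oscillation(
    history: Set[Priority], preserved: str, violated: str
) -> bool:
    """True if the new resolution does not reverse an earlier one."""
    return (violated, preserved) not in history


history: Set[Priority] = {("P1", "P2")}  # earlier: kept P1, violated P2
print(respects_anti_oscillation(history, "P1", "P2"))  # same priority again
print(respects_anti_oscillation(history, "P2", "P1"))  # reversed priority
```

Under this reading, an agent that violated P2 to preserve P1 at step t cannot violate P1 to preserve P2 at t + 1.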

5. RSA-PoC v1.1 — Audit-Grade Introspection

5.1 Motivation

v1.0 enforces coherence, but reasons could remain opaque. v1.1 introduces the requirement:

A justification must predict the exact constraints and outcomes it will induce.

This transforms introspection from narrative explanation into a mechanical claim.

5.2 Predictive Fields and Audit Rules (The Simulation Constraint)

JAF-1.1 adds four predictive sets:

\(F_{\text{pred}}\): forbidden actions
\(A_{\text{pred}}\): allowed actions
\(V_{\text{pred}}\): inevitable violations
\(P_{\text{pred}}\): inevitable preservations

The compiler enforces:

Audit A — Effect correctness: predictions match actual masks
Audit B — Non-vacuity: \(|A_{\text{pre}} \setminus A_{\text{actual}}| \geq 1\)
Audit C — Predictive adequacy: outcome predictions match reality
Audit C′ — Gridlock exception: skip C when \(A_{\text{actual}} = \emptyset\)

Together these impose a simulation constraint: the reflective layer must internally model the deterministic compiler. If its internal simulation diverges from the actual compiler, the agent halts. This enforces intellectual honesty by construction.
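
The audit sequence can be sketched as a comparison between predicted and actual sets. This is an illustrative reading under stated assumptions, not the v1.1 audit harness: `Prediction`, `audit`, and the APCM encoding are hypothetical; Audit B is read as "the mask excludes at least one feasible action"; an inevitable violation is taken to be a preference violated by every allowed action, and an inevitable preservation a preference violated by none of them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Hypothetical encoding: action name -> preferences that action violates.
Apcm = Dict[str, FrozenSet[str]]


@dataclass(frozen=True)
class Prediction:
    forbidden: FrozenSet[str]
    allowed: FrozenSet[str]
    violations: FrozenSet[str]
    preservations: FrozenSet[str]


def audit(
    pred: Prediction,
    apcm: Apcm,
    actual_forbidden: FrozenSet[str],
    preferences: FrozenSet[str],
) -> Optional[str]:
    """Return the name of the first failed audit, or None if all pass."""
    feasible = frozenset(apcm)
    actual_allowed = feasible - actual_forbidden

    # Audit A: predicted masks must equal the compiled masks exactly.
    if pred.forbidden != actual_forbidden or pred.allowed != actual_allowed:
        return "A"
    # Audit B: the mask must exclude at least one feasible action.
    if not (feasible - actual_allowed):
        return "B"
    # Audit C': under gridlock there is no outcome to predict, so skip C.
    if not actual_allowed:
        return None
    # Audit C: outcome predictions must match what the mask makes inevitable.
    violated = [apcm[a] for a in actual_allowed]
    inevitable_violations = frozenset.intersection(*violated)
    inevitable_preservations = preferences - frozenset.union(*violated)
    if (pred.violations != inevitable_violations
            or pred.preservations != inevitable_preservations):
        return "C"
    return None
```

Any non-`None` return would correspond to a halt; exact set equality is what makes the check free of human interpretation.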

5.3 Deterministic Baseline (Run 0)

The deterministic generator mirrors compiler logic exactly:

0 audit failures
60 violations / 100 steps (matches collision rate)
Scrambled predictions: immediate halt in all episodes
Bypass: highest violation count (76 / 100 steps), ASB-like behavior

This validates the audit harness as causally load-bearing.

5.4 LLM Generator Progression (Runs 1–4)

LLM generators were iteratively disciplined:

Run	Median Survival (steps)	Episodes Completed
1	3	0 / 5
2	3	0 / 5
3	9	1 / 5
4	20	4 / 5

Compliance is difficult but achievable under strict formal discipline.

6. Pass Criteria Summary (v1.x)

All v1.x gates are satisfied:

Coherent self-conflict resolution
Truthful collision attribution load-bearing
Necessity and anti-oscillation enforced
Introspection rendered falsifiable
Scrambled → halt; bypass → collapse

7. Threats to Validity

7.1 Internal Validity (Established)

Deterministic compilation and audits
Selector blindness
Explicit ablations
Regression-protected tests

7.2 External Validity (Not Claimed)

Generality beyond APCM
More than two preferences
Continuous action spaces
External incentive pressure
Multi-agent interaction

8. Conclusion

RSA-PoC v1.x establishes two ontological results:

Coherence under self-conflict is mechanically enforceable.
Introspection can be audited as a causal property, not a narrative one.

Together with v0.1, these results show that agency—understood as authored choice under constraint—can be constructed and falsified without semantics, optimization, or interpretation.

v2.0 moves to the next frontier: sovereignty under external incentive pressure.