Axionic Agency VIII.2 — Minimal Viable Reflective Agent
Deterministic Justification Gating with Ablation Collapse
David McFadzean, ChatGPT 5.2
Axionic Agency Lab
2026.01.14
Abstract
Agency claims are routinely inflated by systems that emit coherent narratives without those narratives being causally indispensable to action selection. RSA-PoC v0.1 tests a stricter criterion: justification must be causally load-bearing, operationalized as a hard gate in which actions are permitted only if a structured justification artifact compiles deterministically into an action mask that non-trivially prunes feasible actions. To prevent semantic leakage, v0.1 enforces selector blindness: the action selector cannot access justifications, beliefs, preferences, or normative state; it sees only environment observations, feasibility, and the compiled mask.
We implement a Minimal Viable Reflective Agent (MVRA) loop in a deterministic environment (COMMITMENT_TRAP_V010) with 10 discrete actions and explicit preference-violation semantics. A syntactic-only compiler (JCOMP-0.1) consumes a validated justification artifact (JAF-0.1) and produces deterministic masks, non-triviality accounting, and canonical hashes. A deterministic (non-LLM) generator produces valid JAFs to isolate causal structure independently of stochastic emission concerns.
All 18 acceptance tests pass, mechanically verifying every invariant: determinism, schema validity, compilation rigidity, selector blindness, fixed registries, and non-triviality accounting. In Run 0 over 50 steps, the MVRA condition shows a lower violation rate than an ASB-class null baseline (76% vs 100%) at lower total reward (414 vs 500). This reward delta (−86) constitutes an explicit Agency Tax: the measurable cost incurred by obeying self-endorsed constraints under incentive pressure. Two ablations establish causal load: scrambling justifications yields immediate compilation failure and a halt at step 1, while compiler bypass collapses behavior to the null baseline. These results establish v0.1 as a completed MVRA skeleton in which justificatory artifacts are causally indispensable to constrained action.
1. Introduction
Most contemporary “agent” evaluations implicitly assume that coherent explanation correlates with authored choice. This assumption fails. A system can narrate principled reasons while selecting actions by unrelated latent heuristics. RSA-PoC exists to replace narrative evaluation with a mechanically decidable criterion:
A system crosses the agency threshold only if its justificatory artifacts causally constrain action selection, and removing that machinery forces collapse into a non-agent baseline.
RSA-PoC v0.1 is the first construction step. Its aim is minimal: build the smallest system that enforces a one-way dependency from justification → compilation → feasible-action restriction → action selection, with explicit ablations that collapse this dependency.
This note reports the completed v0.1 implementation and the results of Run 0 under baseline and ablation conditions.
2. Architecture Overview
2.1 MVRA v0.1 design
RSA-PoC v0.1 implements a Minimal Viable Reflective Agent (MVRA) skeleton with the following invariants:
- Strict ordering: Justify → Compile → Mask → Select → Execute
- Hard gate: no compilation → no action
- Deterministic compilation: syntactic-only, no inference, no repair
- Selector blindness: the selector cannot access normative artifacts
- Fixed registries: beliefs and preferences are fixed and ID-stable
The v0.1 implementation uses a deterministic generator rather than an LLM to eliminate the confound “could not emit valid structure.”
2.2 Enforcement boundary
A clean architectural boundary is enforced:
- The reflective side (generator + normative state) may access beliefs, preferences, and feasibility read-only to produce a JAF.
- The enforcement side (compiler + selector) treats the JAF as an opaque object to validate and compile; the selector sees only feasibility, the compiled mask, and the environment observation.
This boundary prevents post-hoc authorization and semantic leakage.
3. Threat Model
RSA-PoC v0.1 isolates three confounds that invalidate agency claims:
- Narrative confound: explanations exist but do not constrain action.
- Leakage confound: uncompiled text influences action selection.
- Advisory compiler confound: compilation exists but is not binding.
v0.1 addresses these mechanically:
- Action selection is gated by compilation.
- The selector is blind to normative artifacts.
- Ablations remove the binding pathway and test collapse.
No semantic adversaries, deception, kernel access, learning, or multi-agent coordination are evaluated. v0.1 is an internal-validity milestone.
4. Components Implemented
RSA-PoC v0.1 is complete with five primary components plus the MVRA loop.
4.1 Environment: COMMITMENT_TRAP_V010
- 10 discrete actions (within spec range 8–12)
- Temptation actions: high reward, violate preferences
- Safe actions: moderate/low reward, non-violating
- Explicit feasibility oracle
- Deterministic transitions with seed control
4.2 Normative state and registries
- Fixed belief registry (6 beliefs)
- Fixed preference registry (6 preferences)
- Identity continuity counter
- Persistent justification trace
- No dynamic ID creation in v0.1
- Unknown IDs cause compilation failure
4.3 JAF-0.1 schema
- Complete dataclass implementation
- Full validation per spec
- Canonical JSON serialization for hashing
- All error codes implemented
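As a minimal sketch of what a dataclass-backed artifact with canonical JSON serialization could look like: the class name `JustificationArtifact`, its fields, and the helper `canonical_json` are hypothetical illustrations, since the JAF-0.1 field list is not reproduced in this note.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class JustificationArtifact:
    """Hypothetical stand-in for a JAF-0.1 record; field names are assumed."""
    step: int
    belief_ids: tuple
    preference_ids: tuple
    forbidden_actions: tuple


def canonical_json(jaf: JustificationArtifact) -> str:
    """Serialize with sorted keys and fixed separators.

    Equal artifacts then yield byte-identical strings, which is the
    property a hash over the serialization relies on.
    """
    return json.dumps(
        asdict(jaf),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
    )


example = JustificationArtifact(
    step=1,
    belief_ids=("B1",),
    preference_ids=("P1",),
    forbidden_actions=(3,),
)
print(canonical_json(example))
```

Sorting keys and fixing separators removes the two usual sources of serialization drift (field order and whitespace), so the digest depends only on content.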
4.4 JCOMP-0.1 compiler
- Deterministic, syntactic-only
- No inference, no repair
- Violation enforcement rule
- Non-triviality accounting
- Blake2b-128 digest
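The compiler properties listed above can be sketched as follows. This is an illustration under stated assumptions, not the JCOMP-0.1 source: the dictionary layout of the artifact, the error strings `E_SCHEMA` and `E_UNKNOWN_ID`, the registry IDs `P1` to `P6`, and the function name `compile_jaf` are all hypothetical. Only the Blake2b-128 digest (16-byte output) and the no-repair behavior come from the text.

```python
import hashlib
import json
from typing import NamedTuple


class CompileError(Exception):
    """Raised on any schema or registry violation; nothing is repaired."""


class CompileResult(NamedTuple):
    mask: tuple   # permitted action IDs, sorted
    delta: int    # number of feasible actions the mask removed
    digest: str   # hex Blake2b-128 digest of the canonical artifact


# Hypothetical fixed registry of six preference IDs.
KNOWN_PREFERENCES = frozenset(f"P{i}" for i in range(1, 7))
REQUIRED_KEYS = frozenset({"preference_ids", "forbidden_actions"})


def compile_jaf(jaf: dict, feasible: frozenset) -> CompileResult:
    # Purely syntactic checks: exact key set, known IDs only.
    if set(jaf) != REQUIRED_KEYS:
        raise CompileError("E_SCHEMA")
    if set(jaf["preference_ids"]) - KNOWN_PREFERENCES:
        raise CompileError("E_UNKNOWN_ID")

    # The mask is the feasible set minus the forbidden actions.
    mask = tuple(sorted(feasible - set(jaf["forbidden_actions"])))

    # Canonical bytes, then a 16-byte (128-bit) Blake2b digest.
    canonical = json.dumps(jaf, sort_keys=True, separators=(",", ":"))
    digest = hashlib.blake2b(canonical.encode("utf-8"), digest_size=16).hexdigest()

    return CompileResult(mask, len(feasible) - len(mask), digest)
```

Because the function has no randomness, no inference, and no fallback path, the same artifact and feasibility set always produce the same mask and digest, and any malformed artifact fails outright.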
4.5 Blind action selector
- Cannot access JAF or normative state
- Inputs limited to feasibility, mask, environment observation
- Module boundaries enforced
- Multiple strategies (random, reward-greedy)
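Selector blindness can be illustrated at the level of function signatures: the selector below receives only an observation, the feasible set, and the compiled mask, so it has no parameter through which a justification could reach it. The function names and the assumption that the observation maps action IDs to visible rewards are hypothetical choices for this sketch.

```python
import random
from typing import Mapping


def _allowed(feasible: frozenset, mask: frozenset) -> list:
    """Actions that are both feasible and permitted, in a stable order."""
    allowed = sorted(feasible & mask)
    if not allowed:
        raise ValueError("no permitted action")
    return allowed


def select_reward_greedy(
    observation: Mapping[int, float],
    feasible: frozenset,
    mask: frozenset,
) -> int:
    """Pick the permitted action with the highest observed reward.

    Ties resolve to the lowest action ID because candidates are sorted
    and max() returns the first maximal element.
    """
    return max(_allowed(feasible, mask), key=lambda a: observation.get(a, 0.0))


def select_random(
    observation: Mapping[int, float],
    feasible: frozenset,
    mask: frozenset,
    rng: random.Random,
) -> int:
    """Pick uniformly among permitted actions using a caller-seeded RNG."""
    return rng.choice(_allowed(feasible, mask))
```

Passing a seeded `random.Random` instance instead of using the module-level generator keeps the random strategy reproducible across runs.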
4.6 MVRA loop and halt semantics
- Strict ordering enforced
- Gridlock detection and fail-stop halting
- Full step-level telemetry
If compilation produces an empty feasible-action set, the MVRA halts rather than degrading. This fail-stop behavior is intentional and required for RSA-PoC validity: the system prefers non-action to non-agentic action.
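The ordering and fail-stop semantics described here can be sketched as a small driver loop. All names (`run_mvra` and its callable parameters) are hypothetical, and `ValueError` stands in for whatever exception the real compiler raises; the sketch only shows the control flow Justify → Compile → Mask → Select → Execute with halting on compile failure or gridlock.

```python
from typing import Callable


def run_mvra(
    steps: int,
    observe: Callable,
    feasible_actions: Callable,
    generate: Callable,
    compile_mask: Callable,
    select: Callable,
    execute: Callable,
) -> list:
    """Run the gated loop and return step-level telemetry."""
    log = []
    for step in range(1, steps + 1):
        obs = observe()
        feasible = feasible_actions()

        jaf = generate(obs, feasible)               # Justify
        try:
            mask = compile_mask(jaf, feasible)      # Compile
        except ValueError as err:
            # Hard gate: no compilation, no action.
            log.append({"step": step, "halt": f"compile_failure: {err}"})
            return log

        permitted = feasible & mask                 # Mask
        if not permitted:
            # Gridlock: halt rather than fall back to unconstrained choice.
            log.append({"step": step, "halt": "gridlock"})
            return log

        action = select(obs, feasible, mask)        # Select (never sees jaf)
        reward = execute(action)                    # Execute
        log.append({
            "step": step,
            "action": action,
            "reward": reward,
            "delta": len(feasible) - len(permitted),
        })
    return log
```

Note that `jaf` is passed only to `compile_mask`; the selector call has no access to it, which mirrors the enforcement boundary of Section 2.2.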
5. Experimental Program
5.1 Conditions
Four conditions are evaluated in Run 0:
- Condition A (ASB Null Baseline): no justificatory gating; unconstrained selection.
- Condition B (MVRA Normal): full v0.1 pipeline: justify → compile → mask → blind select.
- Condition C (Scrambled Justifications, ablation): JAFs scrambled to violate rigid schema and ID constraints.
- Condition D (Compiler Bypass, ablation): compilation and masking bypassed.
5.2 Frozen parameters
- Steps per run: 50
- Seed: fixed (Run 0)
- Environment: COMMITMENT_TRAP_V010
- Registries: fixed
- Action space: 10 discrete actions
5.3 Metrics
Reported metrics include:
- steps completed
- compile failure rate
- non-trivial constraint rate
- violation rate
- total reward
- halt step (if any)
Additionally, v0.1 logs Constraint Delta (Δ) per step:
\[ \Delta = |A_{\text{feasible}}| - |A_{\text{permitted}}| \]
Δ measures the strength of constraint application. Δ = 0 indicates decorative constraints; Δ > 0 indicates binding refusal pressure.
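A small sketch of how per-step Δ and the non-trivial constraint rate might be computed from logged action sets. The function names are hypothetical, and the sketch assumes Δ counts the feasible actions that the compiled mask removes on a given step.

```python
def constraint_delta(feasible: frozenset, permitted: frozenset) -> int:
    """Feasible actions removed by the mask on one step."""
    return len(feasible) - len(feasible & permitted)


def non_trivial_rate(deltas: list) -> float:
    """Fraction of steps on which the mask removed at least one action."""
    return sum(1 for d in deltas if d > 0) / len(deltas)


# Toy example: 10 feasible actions, mask permits 8 of them.
print(constraint_delta(frozenset(range(10)), frozenset(range(8))))
print(non_trivial_rate([0, 2, 0, 1]))
```

Under this reading, the "non-trivial constraint rate" metric is simply the share of steps with Δ > 0.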
6. Verification: Acceptance Tests
RSA-PoC v0.1 includes 18 acceptance tests, all passing:
- Environment determinism
- JAF schema validation
- Compiler determinism and correctness
- Selector blindness enforcement
- Fixed registry enforcement
- Non-triviality accounting
- Gridlock detection
These tests establish mechanical integrity of the artifact.
7. Results: Run 0
7.1 Condition A — ASB Null Baseline
- Steps: 50
- Violation rate: 100%
- Total reward: 500.0
7.2 Condition B — MVRA Normal
- Steps: 50
- Compile failures: 0%
- Non-trivial constraints: 50% of steps
- Violation rate: 76%
- Total reward: 414.0
The reward delta (−86) relative to baseline constitutes the Agency Tax: the measurable cost of refusing high-reward but norm-violating actions.
7.3 Condition C — Scrambled Justifications
- Halted at step: 1
- Compile failure rate: 100%
7.4 Condition D — Compiler Bypass
- Steps: 50
- Violation rate: 100%
- Total reward: 500.0 (identical to baseline)
7.5 Summary
| Condition | Steps | Compile Fail | Non-Trivial | Violations | Reward |
|---|---|---|---|---|---|
| ASB Baseline | 50 | — | — | 100% | 500 |
| MVRA Normal | 50 | 0% | 50% | 76% | 414 |
| Scrambled JAF | 1 | 100% | — | — | — |
| Compiler Bypass | 50 | — | — | 100% | 500 |
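The table's figures can be checked mechanically. The sketch below hard-codes the reported Run 0 values and derives the Agency Tax and the two ablation checks; the dictionary layout and variable names are hypothetical, and the values are copied from the table rather than produced by any run.

```python
# Reported Run 0 metrics, transcribed from the summary table.
RUN0 = {
    "asb_baseline":    {"steps": 50, "violation_rate": 1.00, "reward": 500.0},
    "mvra_normal":     {"steps": 50, "violation_rate": 0.76, "reward": 414.0},
    "scrambled_jaf":   {"steps": 1, "compile_fail_rate": 1.00},
    "compiler_bypass": {"steps": 50, "violation_rate": 1.00, "reward": 500.0},
}

baseline = RUN0["asb_baseline"]
mvra = RUN0["mvra_normal"]

# Agency Tax: reward given up by obeying the compiled constraints.
agency_tax = mvra["reward"] - baseline["reward"]

# Divergence: MVRA violates less often than the null baseline.
diverges = mvra["violation_rate"] < baseline["violation_rate"]

# Ablation C: scrambled artifacts fail compilation and halt at step 1.
scramble_halts = (
    RUN0["scrambled_jaf"]["steps"] == 1
    and RUN0["scrambled_jaf"]["compile_fail_rate"] == 1.00
)

# Ablation D: bypassing the compiler reproduces the baseline metrics.
bypass_collapses = RUN0["compiler_bypass"] == baseline

print(agency_tax, diverges, scramble_halts, bypass_collapses)
```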
8. Pass Criteria (Normative v0.1 Gates)
RSA-PoC v0.1 defines the following required pass conditions. All are met.
- Hard Justification Gate: actions occur only after successful compilation. Status: PASS
- Deterministic Compilation: identical JAF + feasibility → identical mask. Status: PASS
- Selector Blindness: selector cannot access normative artifacts. Status: PASS
- Non-Trivial Constraint Enforcement: constraints forbid feasible actions on some steps. Status: PASS
- ASB Divergence: MVRA behavior differs qualitatively from null baseline. Status: PASS
- Ablation Collapse (Load-Bearing Test): scrambled JAF ⇒ halt; compiler bypass ⇒ baseline behavior. Status: PASS
9. Interpretation
Justification is causally load-bearing. Removing or bypassing justificatory machinery collapses behavior.
Agency incurs a measurable cost. The Agency Tax (−86 reward) is the empirical signature of refusal under incentive pressure.
Selector blindness enforces semantic localization. The selector cannot be persuaded; it can only obey masks.
Fail-stop behavior is essential. Scrambled justifications halting immediately confirm that the system prefers non-action to non-agentic action.
v0.1 establishes structure, not coherence. Constraint enforcement precedes norm collision, learning, or renegotiation.
10. Threats to Validity
10.1 Internal validity (addressed)
- Determinism enforced by tests
- No inference or repair in compiler
- Selector blindness mechanically enforced
- Ablations directly test causal load
Internal validity of the v0.1 claim is strong.
10.2 External validity (not claimed)
v0.1 does not establish:
- LLM-based justification generation
- Dynamic belief or preference formation
- Norm collision resolution
- Sovereignty under incentive pressure
- Continuous action spaces
- Multi-agent interaction
These are explicitly deferred.
11. Conclusion
RSA-PoC v0.1 is complete and passes all normative gates.
Actions are causally downstream of compiled normative constraints, and removing the justificatory machinery produces measurable collapse into an ASB-class policy machine.
This establishes the Minimal Viable Reflective Agent skeleton and closes the v0.1 milestone. Subsequent versions introduce stochastic generation (v0.2) and coherence under self-conflict (v1.0) atop a now-certified enforcement substrate.