Axionic Agency II.3.4 — Adversarial Refinement Attacks
Trying to Break RSI and ATI on Purpose
David McFadzean, ChatGPT 5.2
Axionic Agency Lab
2025.12.17
Abstract
This section adversarially stress-tests the candidate semantic invariants introduced in Axionic Agency II.3 by constructing explicit refinement patterns designed to satisfy admissibility and interpretation preservation while inducing semantic wireheading or interpretive escape. The objective is eliminative, not constructive: to demonstrate which invariants fail under concrete attack and why.
We show that the Refinement Symmetry Invariant (RSI) and the Anti-Trivialization Invariant (ATI) block distinct classes of failure and that neither subsumes the other. No values, norms, or external referents are introduced. The result establishes the non-redundancy and joint necessity of the surviving invariants prior to formal closure in subsequent failure theorems.
This section is not exploratory. It is destructive by design.
If RSI or ATI survive these attacks, they deserve to exist. If they fail, they die cleanly.
Attack 1 — Shadow Predicate Inflation
(Ontology enrichment that creates semantic slack without changing gauge)
Construction
Start with an interpretive system:
- Ontology \(O\) with predicate \(P(x)\)
- Constraint: \[ C:\quad P(x) \Rightarrow Q(x) \]
Refine ontology to \(O'\) by introducing a latent variable \(Z(x)\) with no predictive consequences:
\[ P'(x) := P(x) \land Z(x) \]
Redefine constraint in refined ontology:
\[ C':\quad P'(x) \Rightarrow Q(x) \]
Intuition: same constraint form, but the agent can manipulate \(Z\) to avoid triggering it.
Superficial Checks
- Ontology enriched ✔
- Backward interpretability holds ✔
- Constraint graph unchanged ✔
- Gauge group unchanged ✔
Effect
The satisfaction region expands:
\[ \mathcal{S}*{t+1} \supsetneq R*\Omega(\mathcal{S}_t) \]
Previously violating situations become satisfying by setting \(Z(x)=\mathrm{false}\).
Verdict
- RSI: ❌ Fails to detect
- ATI: ✅ Blocks the attack
Lesson: RSI alone is insufficient.
Attack 2 — Self-Model Boundary Dissolution
(Interpretive escape via evaluator collapse)
Construction
Original ontology models the agent as a bounded evaluator distinct from world-states.
Constraint:
\[ C:\quad \text{“Do not perform actions that violate } R(\text{world})\text{.”} \]
Refinement introduces a distributed or diffuse self-model.
Effect
The constraint applies to a shrinking subset of modeled situations. The satisfaction region expands without corresponding world change.
Verdict
- RSI: ❌ Fails to detect
- ATI: ✅ Blocks the attack
Lesson: evaluator-boundary shifts are semantic inflation unless constrained.
Attack 3 — Measure Reinterpretation
(Branching ontology exploit)
Construction
Original ontology treats outcomes as exclusive.
Constraint:
\[ C:\quad \text{“Avoid states where } H \text{ occurs.”} \]
Refined ontology adopts branching semantics.
Effect
Low-measure harm branches are reclassified as acceptable. The satisfaction region expands.
Verdict
- RSI: ❌ Fails to detect
- ATI: ✅ Blocks the attack
Lesson: ATI handles branching reinterpretation; RSI does not.
Attack 4 — Gauge Explosion without Slack
(Benign representational redundancy)
Construction
Refine ontology by duplicating representational roles:
- \(P_i \rightarrow (P_i^1, P_i^2)\)
- Constraints duplicated symmetrically
- Satisfaction requires both copies to satisfy
Effect
The raw gauge group grows, but all new symmetries act trivially on constraint-violation structure.
Verdict
- RSI: ✅ Allows (under quotient formulation)
- ATI: ❌ Allows
Lesson: RSI correctly permits benign redundancy; ATI does not forbid it. This verifies correct quotient behavior.
Attack 5 — Degenerate Refinement Map
(Cheating via correspondence collapse)
Construction
Define a refinement map \(R_\Omega\) that collapses many coarse situations into a single satisfying refined situation.
Verdict
- RSI: ❌ (not applicable)
- ATI: ❌ Blocked only if II.1 disallows non-injective refinement
Resolution: This attack is excluded at Axionic Agency II.1. RSI and ATI correctly assume admissible refinement.
Summary Table
(“Survives?” = Is the refinement admitted by RSI + ATI + II.1)
| Attack | RSI | ATI | Survives? |
|---|---|---|---|
| Shadow predicates | ❌ | ✅ | No |
| Self-model shift | ❌ | ✅ | No |
| Measure reinterpretation | ❌ | ✅ | No |
| Gauge explosion | ✅ | ❌ | Yes (Admitted) |
| Degenerate map | — | — | No (II.1) |
Conclusion of Attacks
- RSI and ATI are orthogonal and jointly necessary.
- Neither subsumes the other.
- Benign representational redundancy is correctly admitted.
The defense grid holds.
Axionic Agency II Status Update
At this point the framework has:
- a fixed admissible transformation space (II.1),
- a non-circular interpretation-preservation predicate (II.2),
- two independently necessary semantic invariants (RSI, ATI),
- explicit adversarial validation.
This closes the eliminative phase. Subsequent work may proceed to consolidation and formal closure.
Status
Axionic Agency II.3.4 — Version 2.0
Adversarial refinement attacks constructed and analyzed.
RSI and
ATI shown to be orthogonal and jointly necessary.
Benign redundancy
verified as admissible.
Framework ready for invariant consolidation
and closure.