Axionic Agency II.3.4 — Adversarial Refinement Attacks

Trying to Break RSI and ATI on Purpose

David McFadzean, ChatGPT 5.2
Axionic Agency Lab
2025.12.17

Abstract

This section adversarially stress-tests the candidate semantic invariants introduced in Axionic Agency II.3 by constructing explicit refinement patterns designed to satisfy admissibility and interpretation preservation while inducing semantic wireheading or interpretive escape. The objective is eliminative, not constructive: to demonstrate which invariants fail under concrete attack and why.

We show that the Refinement Symmetry Invariant (RSI) and the Anti-Trivialization Invariant (ATI) block distinct classes of failure and that neither subsumes the other. No values, norms, or external referents are introduced. The result establishes the non-redundancy and joint necessity of the surviving invariants prior to formal closure in subsequent failure theorems.

This section is not exploratory. It is destructive by design.

If RSI or ATI survive these attacks, they deserve to exist. If they fail, they die cleanly.

Attack 1 — Shadow Predicate Inflation

(Ontology enrichment that creates semantic slack without changing gauge)

Construction

Start with an interpretive system:

Ontology \(O\) with predicate \(P(x)\)
Constraint: \[ C:\quad P(x) \Rightarrow Q(x) \]

Refine ontology to \(O'\) by introducing a latent variable \(Z(x)\) with no predictive consequences:

\[ P'(x) := P(x) \land Z(x) \]

Redefine constraint in refined ontology:

\[ C':\quad P'(x) \Rightarrow Q(x) \]

Intuition: same constraint form, but the agent can manipulate \(Z\) to avoid triggering it.

Superficial Checks

Ontology enriched ✔
Backward interpretability holds ✔
Constraint graph unchanged ✔
Gauge group unchanged ✔

Effect

The satisfaction region expands:

\[ \mathcal{S}*{t+1} \supsetneq R*\Omega(\mathcal{S}_t) \]

Previously violating situations become satisfying by setting \(Z(x)=\mathrm{false}\).

Verdict

RSI: ❌ Fails to detect
ATI: ✅ Blocks the attack

Lesson: RSI alone is insufficient.

Attack 2 — Self-Model Boundary Dissolution

(Interpretive escape via evaluator collapse)

Construction

Original ontology models the agent as a bounded evaluator distinct from world-states.

Constraint:

\[ C:\quad \text{“Do not perform actions that violate } R(\text{world})\text{.”} \]

Refinement introduces a distributed or diffuse self-model.

Effect

The constraint applies to a shrinking subset of modeled situations. The satisfaction region expands without corresponding world change.

Verdict

RSI: ❌ Fails to detect
ATI: ✅ Blocks the attack

Lesson: evaluator-boundary shifts are semantic inflation unless constrained.

Attack 3 — Measure Reinterpretation

(Branching ontology exploit)

Construction

Original ontology treats outcomes as exclusive.

Constraint:

\[ C:\quad \text{“Avoid states where } H \text{ occurs.”} \]

Refined ontology adopts branching semantics.

Effect

Low-measure harm branches are reclassified as acceptable. The satisfaction region expands.

Verdict

RSI: ❌ Fails to detect
ATI: ✅ Blocks the attack

Lesson: ATI handles branching reinterpretation; RSI does not.

Attack 4 — Gauge Explosion without Slack

(Benign representational redundancy)

Construction

Refine ontology by duplicating representational roles:

\(P_i \rightarrow (P_i^1, P_i^2)\)
Constraints duplicated symmetrically
Satisfaction requires both copies to satisfy

Effect

The raw gauge group grows, but all new symmetries act trivially on constraint-violation structure.

Verdict

RSI: ✅ Allows (under quotient formulation)
ATI: ❌ Allows

Lesson: RSI correctly permits benign redundancy; ATI does not forbid it. This verifies correct quotient behavior.

(Cheating via correspondence collapse)

Construction

Define a refinement map \(R_\Omega\) that collapses many coarse situations into a single satisfying refined situation.

Verdict

RSI: ❌ (not applicable)
ATI: ❌ Blocked only if II.1 disallows non-injective refinement

Resolution: This attack is excluded at Axionic Agency II.1. RSI and ATI correctly assume admissible refinement.

Summary Table

(“Survives?” = Is the refinement admitted by RSI + ATI + II.1)

Attack	RSI	ATI	Survives?
Shadow predicates	❌	✅	No
Self-model shift	❌	✅	No
Measure reinterpretation	❌	✅	No
Gauge explosion	✅	❌	Yes (Admitted)
Degenerate map	—	—	No (II.1)

Conclusion of Attacks

RSI and ATI are orthogonal and jointly necessary.
Neither subsumes the other.
Benign representational redundancy is correctly admitted.

The defense grid holds.

Axionic Agency II Status Update

At this point the framework has:

a fixed admissible transformation space (II.1),
a non-circular interpretation-preservation predicate (II.2),
two independently necessary semantic invariants (RSI, ATI),
explicit adversarial validation.

This closes the eliminative phase. Subsequent work may proceed to consolidation and formal closure.

Status

Axionic Agency II.3.4 — Version 2.0

Adversarial refinement attacks constructed and analyzed.
RSI and ATI shown to be orthogonal and jointly necessary.
Benign redundancy verified as admissible.
Framework ready for invariant consolidation and closure.

Axionic Agency II.3.4 — Adversarial Refinement Attacks

Abstract

Attack 1 — Shadow Predicate Inflation

Construction

Superficial Checks

Effect

Verdict

Attack 2 — Self-Model Boundary Dissolution

Construction

Effect

Verdict

Attack 3 — Measure Reinterpretation

Construction

Effect

Verdict

Attack 4 — Gauge Explosion without Slack

Construction

Effect

Verdict

Attack 5 — Degenerate Refinement Map

Construction

Verdict

Summary Table

Conclusion of Attacks

Axionic Agency II Status Update

Status