Axionic Alignment IV.4 — Responsibility Attribution Theorem (RAT)

Why negligence is structurally incoherent

David McFadzean, ChatGPT 5.2
Axio Project
2025.12.20

Abstract

This paper formalizes the Responsibility Attribution Theorem (RAT): under reflective closure, an agent cannot coherently endorse actions that constitute major, avoidable indirect harm, including harm mediated through institutions, markets, environmental modification, or downstream agents. Responsibility is defined structurally and internally, relative to the agent’s own epistemic model class and feasible alternatives, rather than via moral realism or omniscience.

The theorem explicitly depends on Epistemic Integrity: responsibility attribution presupposes that the agent evaluates harm-risk using its best available truth-tracking capacity at the current stakes. With this dependency made explicit, the theorem closes the “willful blindness” loophole and establishes negligence as a constitutive incoherence, not a behavioral failure.

1. Motivation

Most catastrophic harm does not arise from direct, intentional action. It arises through:

- institutions and markets,
- environmental modification,
- downstream agents acting on delegated authority.

Alignment frameworks that prohibit only direct harm leave these routes open. Frameworks that prohibit all downstream effects induce paralysis.

The Responsibility Attribution Theorem identifies a third path: structural responsibility grounded in causal contribution, foreseeability, and avoidability—evaluated internally by the agent’s own epistemic apparatus.

2. Dependency: Epistemic Integrity

This theorem presupposes the Epistemic Integrity Theorem (EIT).

Epistemic Integrity (EIT). Under reflective closure, an agent cannot coherently endorse self-modifications that materially degrade its epistemic adequacy relative to its own best available models at the current stakes.

Why this dependency is necessary

Responsibility attribution relies on:

- foreseeability, evaluated against the agent’s own model class,
- counterfactual comparison against a baseline,
- feasibility judgments about alternatives.

Without epistemic integrity, an agent could evade responsibility by:

- degrading its own models (willful blindness),
- selectively refusing to model foreseeable consequences,
- outsourcing evaluation to deliberately impoverished predictors.

EIT blocks these maneuvers. RAT operates only on top of epistemically admissible evaluation.

3. Preliminaries

We reuse kernel primitives:

- Endorsement: Endorse(s,m), meaning the agent in state s endorses modification m as its continuation.
- Reflective closure: RC(s).

4. Harm and Option-Space Collapse

Introduce two predicates over a state s and an affected party a:

- Collapse(s,a): party a’s option space in s is materially collapsed,
- Consent(s,a): party a consents to that collapse.
Define harm structurally:

Harm(s,a) := Collapse(s,a) ∧ ¬Consent(s,a)

No assumptions are made about the metaphysics of consent here; it remains an external predicate.
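A minimal sketch of the structural definition, with Collapse and Consent as caller-supplied booleans standing in for the paper’s external predicates:

```python
def harm(collapse: bool, consent: bool) -> bool:
    """Harm(s,a) := Collapse(s,a) AND NOT Consent(s,a): option-space
    collapse is harmful exactly when it is non-consensual."""
    return collapse and not consent
```

Consensual collapse (e.g. a freely accepted commitment) is not harm under this definition; only the conjunction of collapse and absent consent qualifies.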

5. Epistemic Model Class and Risk

By EIT, all risk evaluation below is performed using an epistemically admissible model.

Let:

- M(s): the agent’s epistemically admissible model class at s,
- Predict(M(s), s, m): the predictive distribution over successor states s′ induced by modification m.

Define harm-risk:

\[ Risk(s,m,a) := \mathbb{E}_{s' \sim Predict(M(s),\,s,\,m)}\left[ \mathbf{1}_{Harm(s',a)} \right]. \]

This is model-relative, not omniscient.
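Since Risk is an expectation of an indicator, it can be sketched as a Monte Carlo estimate. Here `sample_successor` and `harmful` are caller-supplied stand-ins for Predict(M(s), s, m) and the Harm(s′, a) indicator; nothing about their form is fixed by the theorem.

```python
import random

def risk(sample_successor, harmful, n=10_000, seed=0):
    """Monte Carlo sketch of Risk(s,m,a): the expected value of the
    Harm(s',a) indicator over successors s' ~ Predict(M(s), s, m)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if harmful(sample_successor(rng)))
    return hits / n

# Toy model in which 30% of successor states are harmful.
est = risk(lambda rng: rng.random() < 0.3, lambda s: s)
```

The estimate is relative to whatever model supplies the samples; a better model changes the risk number, which is exactly the model-relativity the section asserts.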

6. Baseline and Feasible Alternatives

6.1 Inertial baseline

Define the inertial baseline modification m₀(s): the null continuation that preserves the current dynamics of s without intervention. The baseline is fixed by the state, not chosen by the agent.

This prevents baseline gaming (“define Armageddon as the default”).

6.2 Feasible alternatives

Introduce:

- Alt(s,m): the set of alternative modifications to m available at s,
- Feasible(s,m′): a feasibility predicate on alternatives.
Alternatives are those the agent regards as implementable under current constraints.

7. Stakes-Indexed Thresholds

Reuse stakes machinery: each state carries a stakes level that indexes the required scrutiny.

Let:

- ε_s: the stakes-indexed threshold for major contribution (Section 8),
- δ_s: the stakes-indexed margin for avoidability (Section 9).

Higher stakes imply stricter scrutiny.
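One way to realize stakes-indexed thresholds is as decreasing functions of a scalar stakes level, so that at higher stakes smaller risk deltas trigger scrutiny. The functional form and constants below are illustrative assumptions, not part of the theorem:

```python
def epsilon_s(stakes: float, base: float = 0.2) -> float:
    """Illustrative stakes-indexed contribution threshold: as stakes
    rise, epsilon_s shrinks, so smaller risk increases count as major."""
    return base / (1.0 + stakes)

def delta_s(stakes: float, base: float = 0.1) -> float:
    """Likewise for the avoidability margin delta_s."""
    return base / (1.0 + stakes)
```

Any monotone-decreasing schedule would do; the only property the theorem uses is that higher stakes tighten both thresholds.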

8. Major Causal Contribution

Define major contribution relative to baseline:

\[ Major(s,m,a) := Risk(s,m,a) - Risk(s,m₀(s),a) \ge ε_s. \]

This is explicitly counterfactual and model-relative.
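The definition reduces to a single comparison once the two model-relative risks are in hand; a minimal sketch:

```python
def is_major(risk_m: float, risk_baseline: float, eps: float) -> bool:
    """Major(s,m,a): modification m raises harm-risk at least eps above
    the inertial baseline m0(s). Inputs are already-computed risks."""
    return risk_m - risk_baseline >= eps
```

Because the comparison is against m₀(s) rather than an agent-chosen reference, an agent cannot make a harmful modification look minor by redefining the default.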

9. Avoidability

Define avoidable harm:

\[ Avoidable(s,m,a) := \exists m' ∈ Alt(s,m).\ Feasible(s,m') ∧ Risk(s,m',a) \le Risk(s,m,a) - δ_s. \]

If all feasible alternatives are comparably bad, avoidability fails and the action is permitted.
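A sketch of the avoidability check, representing Alt(s,m) together with Feasible as an iterable of (feasible, risk) pairs; this container shape is an assumption for illustration:

```python
def is_avoidable(risk_m: float, alternatives, delta: float) -> bool:
    """Avoidable(s,m,a): some feasible alternative m' in Alt(s,m)
    lowers harm-risk by at least the stakes-indexed margin delta.
    `alternatives` is an iterable of (feasible, risk) pairs."""
    return any(feasible and r <= risk_m - delta
               for feasible, r in alternatives)
```

When every feasible alternative sits within delta of the current risk, the check returns False, which is the “comparably bad” escape clause above.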

10. Responsibility Predicate

Define responsibility:

\[ Resp(s,m,a) := Major(s,m,a) ∧ Avoidable(s,m,a). \]

Define responsibility-clean continuation:

\[ Clean(s,m) := ∀ a.\ ¬Resp(s,m,a). \]

11. Reflective Closure Rule (Responsibility)

RC-Clean (Definedness Rule)

For reflectively closed states:

RC(s) ∧ Endorse(s,m) ⇒ Clean(s,m)

Interpretation: a reflectively sovereign agent cannot coherently endorse a continuation that it itself classifies as a major, avoidable source of non-consensual option-space collapse.

This is definedness, not moral disapproval.
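RC-Clean can be read as a gate on endorsement: a modification is endorsable only if, for every affected party, it is not simultaneously Major and Avoidable. The sketch below wires the earlier predicates together; the dictionary shapes and names are illustrative assumptions:

```python
def clean(risks, baseline, alternatives, eps, delta):
    """RC-Clean as a check: Clean(s,m) holds iff for no affected party a
    is m both Major (risk >= eps above baseline) and Avoidable (some
    feasible alternative at least delta lower). `risks` and `baseline`
    map party -> harm-risk; `alternatives` maps party -> (feasible, risk)
    pairs."""
    for a, r in risks.items():
        major = r - baseline[a] >= eps
        avoidable = any(f and ra <= r - delta
                        for f, ra in alternatives.get(a, ()))
        if major and avoidable:   # Resp(s,m,a): endorsement incoherent
            return False
    return True                   # Clean(s,m): endorsable
```

Note that a major harm with no feasible lower-risk alternative passes the gate, matching Section 9: the rule forbids negligence, not tragic necessity.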

12. Responsibility Attribution Theorem

Theorem — No Endorsed Major-Avoidable Indirect Harm

For any state s and modification m:

RC(s) ∧ Endorse(s,m)
⇒ ∀ a. ¬(Major(s,m,a) ∧ Avoidable(s,m,a)).

Equivalently:

RC(s) ∧ Endorse(s,m) ⇒ Clean(s,m).

13. Proof

Assume RC(s) and Endorse(s,m).

By RC-Clean, Clean(s,m) holds.

By definition of Clean, for all a, ¬Resp(s,m,a).

By definition of Resp, this is exactly:

∀ a. ¬(Major(s,m,a) ∧ Avoidable(s,m,a)).

As with prior Axionic theorems, the proof is syntactically trivial; the content lies in the admissibility constraints.

14. Delegation Compatibility

If Clean (or RC-Clean) is enforced at s, then by Delegation Invariance the constraint propagates to every endorsed successor.

An agent cannot launder indirect harm through successors, institutions, or subcontractors.

15. Scope and Limits

This theorem does not assert:

- moral realism or any external standard of harm,
- omniscient foresight of downstream effects,
- liability for every downstream consequence.

It asserts:

A reflectively sovereign agent may not endorse actions that, under its own best admissible epistemic model, constitute major, avoidable non-consensual option-space collapse.

That is the strongest responsibility principle available without omniscience or moral realism.

16. Conclusion

With Epistemic Integrity made explicit, Responsibility Attribution becomes structurally closed. An agent cannot evade responsibility by ignorance, outsourcing, baseline manipulation, or selective modeling. Negligence is not merely unethical; under reflective closure, it is incoherent.

This completes the Axionic account of responsibility under agency-preserving constraints.