Universal Alignment

AGI Alignment, Existential Risks, and the Universality of Human Values

1. Introduction

Artificial General Intelligence (AGI) alignment refers to ensuring that highly capable AI systems reliably pursue goals consistent with human values and intentions. Alignment matters because a misaligned AGI could pose an existential threat to humanity. As AGI development accelerates, proactively addressing alignment becomes increasingly urgent, requiring careful attention to both technical and philosophical dimensions in order to secure positive societal outcomes.

2. Estimating P(doom)

The probability of catastrophic outcomes from misaligned AGI, often termed P(doom), is difficult to estimate accurately. Under maximal epistemic uncertainty, one assigns 50% to whether AGI is technically feasible and, conditional on feasibility, 50% to whether it can be reliably aligned; the calculation then yields 0.5 × 0.5 = 25%. This estimate contrasts with more pessimistic views, such as those of Eliezer Yudkowsky, who suggests probabilities approaching near certainty. Estimating P(doom) requires modeling intricate interactions between technological progress, human oversight capacity, and potential emergent properties of advanced intelligence. Different assumptions about these variables lead to significantly varied risk assessments, which makes it essential to state the foundational assumptions of any such calculation explicitly.
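
To make the decomposition concrete, the Python sketch below shows how the 25% figure follows from two 50/50 priors and how sensitive the result is to those priors. The two-factor factorization and the alternative prior values are illustrative assumptions for exposition, not claims from the literature.

```python
# A minimal sketch of the two-factor P(doom) decomposition described above.
# The factorization and all probability values are illustrative assumptions,
# not empirical estimates.

def p_doom(p_agi_feasible: float, p_misaligned_given_agi: float) -> float:
    """P(doom) = P(AGI is feasible) * P(alignment fails | AGI is feasible)."""
    return p_agi_feasible * p_misaligned_given_agi

# Maximal epistemic uncertainty: 50/50 on each proposition.
print(p_doom(0.5, 0.5))  # 0.25, the ~25% figure above

# Sensitivity: modest changes in either prior move the estimate substantially.
for p_feasible in (0.3, 0.5, 0.9):
    for p_misaligned in (0.1, 0.5, 0.9):
        print(f"P(feasible)={p_feasible}, P(misaligned|feasible)={p_misaligned}"
              f" -> P(doom)={p_doom(p_feasible, p_misaligned):.2f}")
```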

3. Orthogonality Thesis (OT): Skepticism and Implications

The Orthogonality Thesis states that intelligence and goals are independent dimensions: in principle, any level of intelligence can be combined with any final goal. Several critical perspectives challenge this notion.

A quantitative reading suggests that moderate constraints on OT are plausible: if the thesis is only 20-60% correct, the space of viable goals available to a highly intelligent system narrows naturally, which would significantly simplify the alignment problem. This moderately constrained scenario, estimated here at roughly a 35% probability, offers grounds for cautious optimism, since it would carry clear practical benefits for alignment efforts if accurate.
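
One way to make this scenario reasoning concrete is a simple probability-weighted sketch. The scenario set, the probabilities other than the 35% figure above, and the "alignment difficulty" scores below are invented assumptions for exposition only.

```python
# Illustrative scenario analysis for the Orthogonality Thesis (OT).
# Probabilities (except the ~35% from the text) and difficulty scores are
# assumptions for exposition, not measured quantities.

scenarios = {
    # name: (probability, relative alignment difficulty: 0 = easy, 1 = hard)
    "OT fully correct (goals unconstrained)":   (0.40, 0.9),
    "OT moderately constrained (20-60%)":       (0.35, 0.5),
    "OT strongly constrained (goals converge)": (0.25, 0.2),
}

total_p = sum(p for p, _ in scenarios.values())
assert abs(total_p - 1.0) < 1e-9, "scenario probabilities must sum to 1"

expected_difficulty = sum(p * d for p, d in scenarios.values())
print(f"Expected alignment difficulty: {expected_difficulty:.3f}")
# With these assumptions: 0.40*0.9 + 0.35*0.5 + 0.25*0.2 = 0.585
```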

4. Existential Risks Beyond AGI

While misaligned AGI poses a substantial existential risk, nuclear war and engineered pandemics represent more immediate and critically significant threats.

Climate change, while undeniably serious, primarily serves as a risk multiplier rather than posing direct existential threats. It exacerbates geopolitical tensions, resource scarcity, and population displacement, indirectly elevating the risk of nuclear conflicts and pandemics.

5. AGI's Effect on Other Risks

The development of AGI could substantially influence these existential risks in either direction, reducing some while amplifying others.

Effective AGI alignment thus emerges as a pivotal factor in mitigating overall existential risk, underscoring the necessity of aligning future AGI with broadly beneficial goals.

6. Human Values Problem in Alignment

A central challenge in AGI alignment lies in the question of universally shared human values. Extensive evidence from anthropology, psychology, sociology, and historical analysis demonstrates significant diversity in values across cultures, individuals, and epochs, suggesting that universally shared values may be minimal or even nonexistent. This diversity poses substantial practical and philosophical challenges for alignment strategies. It implies the need either for explicit value pluralism (aligning AI systems with multiple, context-sensitive values) or for minimalist approaches that prioritize broadly accepted principles such as autonomy, consent, freedom from harm, and fundamental fairness.

7. Introducing the Universality Metric: Value-9s

To navigate the challenge of value alignment in practice, this article introduces the "Value-9s" metric, inspired by the "nines" of service reliability common in technological domains (99.9% availability is "three nines"). A value sits at Level k if it is endorsed by at least k nines of the human population: Level 1 means at least 90% endorsement, Level 2 at least 99%, Level 3 at least 99.9%, and so on.

Alignment strategies should target values at Level 3 or above, ensuring that selected goals are universal enough to be reliably accepted across diverse human populations, minimizing alignment conflicts and enhancing stability.
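
Under the definition above, a value's level can be computed directly from an estimated endorsement fraction. The sketch below is a minimal illustration; the endorsement figures in the usage example are invented placeholders, not empirical data.

```python
import math

def value_9s_level(endorsement: float) -> int:
    """Map an endorsement fraction (0 <= endorsement < 1) to a Value-9s level.

    Level k means endorsement >= 1 - 10**-k, i.e. k leading "nines":
    0.90 -> 1, 0.99 -> 2, 0.999 -> 3, ...
    """
    if not 0.0 <= endorsement < 1.0:
        raise ValueError("endorsement must be in [0, 1)")
    if endorsement < 0.9:
        return 0  # below one nine: not a candidate alignment target
    # floor(-log10(1 - p)) counts the leading nines in p; a tiny epsilon
    # guards against floating-point error at exact boundaries like 0.99.
    return math.floor(-math.log10(1.0 - endorsement) + 1e-9)

# Hypothetical endorsement fractions, for illustration only.
for value, p in [("freedom from harm", 0.9992),
                 ("autonomy", 0.991),
                 ("a culturally specific norm", 0.87)]:
    level = value_9s_level(p)
    verdict = "eligible" if level >= 3 else "ineligible"
    print(f"{value}: Level {level} ({verdict} as a Level-3 alignment target)")
```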

8. Conclusion and Practical Recommendations

AGI alignment is a crucial and urgent area of existential risk mitigation, directly affecting humanity's long-term survival and flourishing. The Value-9s universality metric offers a practical guide for quantifying and selecting human values suitable for robust alignment. Future alignment research should measure global universality levels empirically, refining and validating the metric further. Quantifying the universality of human values provides the clarity and precision needed to inform pragmatic, scalable strategies for safe and beneficial AGI development.
