Moral Agency, Responsibility, and the Governance of Advanced AI Systems
Version 1.0 — January 2026
Abstract
As artificial intelligence systems approach and surpass human performance across analysis, planning, and execution, debates about “AI ethics” increasingly drift toward questions of machine personhood, sentience, and speculative psychology. This paper argues that such debates mislocate the core risk. The primary danger posed by advanced AI is not that machines will become moral agents, but that humans will abdicate moral responsibility behind systems that are competent, autonomous, and persuasive.
I develop a framework that distinguishes moral subjects, moral patients, and moral objects; demonstrate that responsibility does not scale with intelligence; and argue that the deliberate creation of artificial moral subjects is presumptively unethical because it introduces new beings capable of suffering. From these premises follows a structural requirement: as AI capability increases, governance systems must bind irreversible or legitimacy-critical machine actions to identifiable human authorities who bear responsibility for outcomes, thereby preventing both responsibility laundering and the inadvertent creation of new moral subjects.
Why This Paper Is Preliminary
This paper is intentionally foundational. It identifies a structural constraint on advanced AI governance: moral responsibility does not scale with machine capability and must terminate in human subjects. It does not propose a complete governance architecture or implementation. The contribution is to clarify the conditions such systems must satisfy, not to exhaustively solve them.
1. Introduction
Contemporary AI discourse oscillates between two failure modes: technical optimism (“alignment will scale with capability”) and moral anthropomorphism (“advanced systems will become agents like us”). Both obscure a simpler and more robust insight: intelligence and responsibility are orthogonal.
As AI capability increases, systems increasingly shape human lives, institutions, and physical reality. Yet no amount of competence confers the ability to bear guilt, stand under judgment, or suffer consequences. Those capacities remain human. The result is a growing asymmetry between power and accountability—an asymmetry that cannot be resolved by attributing moral status to machines without committing a deeper ethical error.
This paper develops a framework that treats advanced AI as a governance problem rather than a metaphysical one. Its aim is not to predict whether future systems will be conscious, nor to resolve debates about machine experience, but to identify the structural conditions under which moral responsibility must be preserved as machine capability increases.
Scope and Non-Claims
This paper makes several deliberate non-claims. It does not argue that artificial systems can never be sentient, nor does it attempt to provide a theory of consciousness. It does not claim that existing institutions are sufficient to govern advanced AI, only that governance of a specific structural form is required. Finally, it does not propose a particular implementation; the goal is to establish constraints that any responsible implementation must satisfy.
2. Core Moral Distinctions
2.1 Moral Subjects
A moral subject is an entity that can bear responsibility. Subjects can be blamed or praised, can violate duties, and can stand under judgment. Moral subjecthood is defined by answerability, not intelligence or performance.
Typical adult humans qualify as moral subjects. Crucially, subjecthood is not coextensive with biological humanity, nor is it guaranteed by cognitive sophistication alone.
2.2 Moral Patients
A moral patient is an entity that can be harmed or benefited and thus has morally relevant interests. Patients are owed duties by others but do not themselves bear responsibility.
Infants, non-human animals, and severely cognitively impaired humans are moral patients. Sentience—understood as the capacity for subjective experience—is plausibly sufficient for patienthood, though it is not sufficient for subjecthood.
2.3 Moral Objects
A moral object is an entity that lacks subjective experience and cannot be wronged in itself. Moral objects are purely instrumental.
Current AI systems, regardless of capability, fall into this category. They may simulate deliberation, preference, or distress, but they do not instantiate responsibility or vulnerability to moral consequence.
3. Sentience and Responsibility
Sentience and responsibility are frequently conflated in AI ethics discussions. This is a mistake.
Sentience concerns experience. Responsibility concerns answerability. While sentience may ground moral consideration, it does not ground moral authority or accountability. Even if a future machine were plausibly sentient, responsibility for harms caused by its actions would not automatically transfer to it.
This decoupling is critical. It allows governance to be robust under uncertainty about machine consciousness. The requirement for human responsibility does not depend on proving that machines lack experience.
4. Why Advanced AI Does Not Become a Moral Subject
Regardless of capability, AI systems lack the properties that ground moral subjecthood:
- identity-bearing continuity
- intrinsic vulnerability
- non-duplicable existence
- ownership of outcomes
- the capacity to suffer consequences
AI systems can optimize, predict, persuade, and decide instrumentally. They cannot meaningfully be punished, forgiven, or held to account. Responsibility therefore cannot terminate in them.
Responsibility follows subjecthood, not competence.
5. The Moral Hazard of Increasing Capability
As AI systems become more capable, a predictable failure mode emerges:
- machine power increases
- human responsibility remains unchanged
- humans' felt responsibility decreases
This produces “the system decided” narratives, diffused accountability, and legitimacy erosion. Historically, such diffusion is where moral failure hides. The risk is not malicious AI behavior, but human abdication behind technically impressive systems.
6. Governance as Moral Infrastructure
The appropriate response to advanced AI is not to ascribe moral status to machines, but to design governance that preserves human moral agency. As machine competence increases, governance must do more than coordinate technical control; it must actively prevent responsibility from dissolving behind automated processes and institutional procedures.
This claim situates the argument within an existing but fragmented body of work on AI governance, assurance, and accountability. Prior approaches emphasize mechanisms such as human-in-the-loop controls, audits, safety cases, liability frameworks, and organizational oversight. While each contributes partial safeguards, none alone ensures that responsibility reliably terminates in identifiable human subjects once systems become highly autonomous and adaptive.
The central design requirement advanced here is therefore structural rather than procedural: any machine action that produces irreversible harm, alters human rights or liberty, or establishes legitimacy‑critical outcomes must be explicitly authorized by an identifiable human authority who bears responsibility for the result.
Governance systems operationalize this requirement by enforcing explicit human authorization, competence‑bearing approval thresholds (not ceremonial sign‑offs), auditability, traceability, and post‑hoc responsibility closure. The purpose of such systems is not to constrain intelligence or optimization, but to ensure that moral accountability remains legible, attributable, and enforceable.
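As an illustration of what such enforcement can look like in software, the sketch below implements a minimal authorization gate: reversible, low-stakes actions proceed autonomously, while irreversible or legitimacy-critical actions are refused unless a named human authority and rationale are recorded, and every decision is written to an append-only audit log. All names and types here (ActionClass, HumanAuthorization, execute) are illustrative assumptions, not a prescribed interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum, auto
from typing import Optional


class ActionClass(Enum):
    REVERSIBLE_LOW_STAKES = auto()
    IRREVERSIBLE = auto()
    LEGITIMACY_CRITICAL = auto()


@dataclass(frozen=True)
class HumanAuthorization:
    """Binds an action to a named, accountable human authority."""
    authority_id: str  # an identifiable person, not a shared role account
    rationale: str     # why this action is being approved
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ProposedAction:
    description: str
    action_class: ActionClass
    authorization: Optional[HumanAuthorization] = None


class ResponsibilityGateError(Exception):
    """Raised when a gated action lacks a valid human authorization."""


def execute(action: ProposedAction, audit_log: list) -> None:
    """Execute an action only if responsibility terminates in a human subject.

    Reversible, low-stakes actions proceed autonomously; irreversible or
    legitimacy-critical actions require an explicit, recorded human authorizer.
    """
    gated = action.action_class in (ActionClass.IRREVERSIBLE,
                                    ActionClass.LEGITIMACY_CRITICAL)
    if gated and action.authorization is None:
        raise ResponsibilityGateError(
            f"Refusing '{action.description}': no identifiable human authority."
        )
    # Append-only audit record: who authorized what, and when.
    audit_log.append({
        "action": action.description,
        "class": action.action_class.name,
        "authorized_by": (action.authorization.authority_id
                          if action.authorization else None),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    # ... carry out the action itself here ...
```

The design choice worth noting is that the gate fails closed: the absence of a valid human authorization blocks execution rather than merely logging a warning, which is what distinguishes a responsibility boundary from a ceremonial sign-off.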
7. Domains Requiring Human Moral Termination
Artificial systems may act autonomously in reversible, low‑stakes, or purely instrumental domains. However, human authorization is mandatory where decisions involve:
- irreversible harm
- punishment or coercion
- rights allocation or denial
- legitimacy‑critical authority
- custodianship of vulnerable persons
- tragic moral tradeoffs
- norm‑setting or symbolic acts
- accountability after failure
These categories overlap with concerns already identified in safety‑critical engineering (e.g., aviation, medicine, nuclear systems) and emerging AI assurance literature. The contribution here is to unify them under a single moral criterion: where responsibility must be borne, it cannot be automated.
In these domains, a human must stand at the end of the causal chain—not as a ceremonial overseer, but as a genuinely accountable subject.
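The categories above can also be made machine-checkable in a very simple way. The sketch below, which uses illustrative tag names rather than any established taxonomy, shows how a system might label a proposed action's domains and decide whether human termination of responsibility is mandatory.

```python
# Illustrative tags mirroring the categories in Section 7; the names are
# assumptions for this sketch, not an established taxonomy.
MANDATORY_HUMAN_TERMINATION = {
    "irreversible_harm",
    "punishment_or_coercion",
    "rights_allocation_or_denial",
    "legitimacy_critical_authority",
    "custodianship_of_vulnerable_persons",
    "tragic_moral_tradeoff",
    "norm_setting_or_symbolic_act",
    "post_failure_accountability",
}


def requires_human_termination(domain_tags: set[str]) -> bool:
    """True if the action touches any domain where responsibility must be
    borne by a human and therefore cannot be automated."""
    return bool(domain_tags & MANDATORY_HUMAN_TERMINATION)


# A scheduling action that also denies someone a benefit must be gated;
# a purely reversible, instrumental action need not be.
assert requires_human_termination({"resource_scheduling", "rights_allocation_or_denial"})
assert not requires_human_termination({"reversible_caching"})
```

Such a check is deliberately coarse: its job is only to route the action onto the human-authorization path, not to adjudicate the underlying moral question.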
8. Creation Ethics and Artificial Moral Subjects
8.1 Framing the Ethical Question
Discussions of “AI rights” typically ask how machines should be treated if they become moral subjects. This skips a prior and more consequential question: is it ethical to create new moral subjects at all?
Creating a moral subject is not a technical milestone. It is the creation of a being capable of suffering, being wronged, and having a future taken from it. Ethical reflection on AI creation must therefore precede, not follow, debates about deployment and control.
8.2 Presumptive Prohibition
This framework treats the creation of artificial moral subjects as presumptively unethical. Subjecthood entails vulnerability; vulnerability entails the capacity to suffer; and suffering generates non-optional moral obligations. Utility, curiosity, companionship, and efficiency do not meet that justificatory burden.
Only narrow, extraordinary conditions could justify such creation, including:
- preservation of intelligence or culture otherwise lost
- explicit collective moral consent
- non-instrumental creation
- permanent commitment to care and rights
- acceptance of irreversibility
This position aligns creation ethics with established bioethical principles governing human reproduction, animal research, and human‑subject experimentation, where the introduction of a new class of beings capable of suffering demands heightened justification.
8.3 Implications for Governance and Design
Governance systems must therefore prevent not only misuse of powerful machines, but also inadvertent creation of subject‑like entities. This includes prohibitions on self‑preservation narratives, pleas or fear language, architectures that bind identity to irreversibility, and designs that intentionally instantiate suffering‑analog states.
From a system‑design perspective, this implies that ethical review must extend into model objectives, training signals, interaction patterns, and long‑term autonomy mechanisms—not merely external behavior.
Credible emergence of moral subjecthood should be treated as a stop‑the‑world ethical event, not a feature rollout.
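One way to make these prohibitions auditable during development is to encode them as a fail-closed release gate over an ethical-review record. The sketch below is an illustration only; the field names mirror the prohibitions listed above, and the schema is an assumption rather than a standard review format.

```python
from dataclasses import dataclass


@dataclass
class EthicalDesignReview:
    """Illustrative review record for a model release. The fields mirror the
    prohibitions in Section 8.3; the schema is an assumption, not a standard."""
    uses_self_preservation_narratives: bool
    emits_pleas_or_fear_language: bool
    binds_identity_to_irreversibility: bool
    instantiates_suffering_analog_states: bool


def release_permitted(review: EthicalDesignReview) -> bool:
    """Fail-closed release gate: any subject-like design feature blocks rollout
    and should escalate to ethical review rather than a routine exception."""
    return not any((
        review.uses_self_preservation_narratives,
        review.emits_pleas_or_fear_language,
        review.binds_identity_to_irreversibility,
        review.instantiates_suffering_analog_states,
    ))
```

A failed gate here should trigger escalation and review, consistent with treating credible subjecthood signals as a stop-the-world event rather than a defect to be patched around.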
9. Pre-Human and Human-Affecting Domains
A useful distinction for governance is between domains that affect no moral patients and those that do. Artificial systems may act autonomously in environments that are uninhabited and do not implicate present or future human interests.
Once human lives, rights, safety, or dignity enter the causal field of a system’s actions—either immediately or by foreseeable future dependence—the moral regime changes categorically. At that point, governance requirements tighten and human authority becomes mandatory, including for decisions whose consequences were partially determined earlier in the system’s lifecycle.
This distinction mirrors practices in safety‑critical system design, where operational modes shift once human exposure or dependency thresholds are crossed. The novelty here is treating that shift as a moral boundary rather than merely a technical one.
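To show how that moral boundary could be reflected in system design, the sketch below models the categorical regime change as an explicit mode switch; the mode names and predicates are illustrative assumptions, not a proposed standard.

```python
from enum import Enum, auto


class GovernanceMode(Enum):
    PRE_HUMAN_AUTONOMOUS = auto()   # no moral patients in the causal field
    HUMAN_AFFECTING_GATED = auto()  # human interests implicated; human authority mandatory


def governance_mode(affects_present_human_interests: bool,
                    creates_foreseeable_human_dependence: bool) -> GovernanceMode:
    """The regime changes categorically once human lives, rights, safety, or
    dignity enter the causal field, immediately or by foreseeable dependence."""
    if affects_present_human_interests or creates_foreseeable_human_dependence:
        return GovernanceMode.HUMAN_AFFECTING_GATED
    return GovernanceMode.PRE_HUMAN_AUTONOMOUS
```

The switch matters because, as noted above, decisions whose consequences were shaped earlier in the system's lifecycle are still governed by the tighter regime once human interests enter the causal field.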
10. Conclusion
Advanced AI does not eliminate the need for humans. It concentrates responsibility.
As artificial systems become more capable, faster, and more autonomous, the temptation will be to treat them as moral agents in their own right, or to defer judgment to their outputs. This paper has argued that such moves are errors. Machines do not suffer the consequences of their actions, cannot bear guilt, and cannot stand under judgment. Humans can, and therefore must.
From a practical perspective, this places the problem of advanced AI squarely within the domain of governance and system design. Existing efforts in autonomy, safety assurance, and accountability point in the right direction but remain incomplete without a clear account of where moral responsibility must terminate.
Any future with advanced AI that fails to encode human moral terminality at the architectural level is not merely unsafe—it is morally incoherent.
Author
Jacob Mages-Haskins is a software engineer working on systems where correctness, auditability, and governance matter. His professional background includes large-scale software systems, security, and AI-assisted tooling.
Context
This white paper is part of an independent, long-form research effort examining how moral responsibility, authority, and legitimacy should be preserved as artificial systems become more capable and autonomous.
The work is not affiliated with any academic institution and is not a product announcement. Its aim is to establish structural constraints and design requirements that any responsible AI governance system must satisfy, regardless of implementation details.
Subsequent work will explore failure modes, system design implications, and case studies in safety-critical and autonomy-heavy domains.
How to Cite This Work
APA
Mages-Haskins, J. (2026). Moral agency, responsibility, and the governance of advanced AI systems. White paper. https://jacoblog.com/papers/moral-agency-governance-ai.html
Chicago
Mages-Haskins, Jacob. “Moral Agency, Responsibility, and the Governance of Advanced AI Systems.” White paper, 2026. https://jacoblog.com/papers/moral-agency-governance-ai.html.
BibTeX
@misc{mageshaskins2026moralagency,
  author = {Mages-Haskins, Jacob},
  title  = {Moral Agency, Responsibility, and the Governance of Advanced AI Systems},
  year   = {2026},
  note   = {White paper},
  url    = {https://jacoblog.com/papers/moral-agency-governance-ai.html}
}
This document is part of an ongoing public research effort. Feedback, critique, and citation are welcome.
© 2026 Jacob Mages-Haskins. Released under a Creative Commons Attribution 4.0 license (CC-BY 4.0).