The handoff that isn't: how clinical AI escapes accountability

Johnson, Mo

WORKING PAPER №01

Abstract

Clinical AI does not complete the handoff. In clinical medicine a handoff transfers three things at once: the information, the reasoning, and a named, accountable owner on the sending side. The AI-to-clinician transition transfers the information. It does not transfer a named sender. The Governance Owner, the institutional seat that authorized the deployment, holds the charter, and carries accountability for the model entering the workflow, is absent at the moment of clinical use. The Decision Owner, the clinician at the bedside, receives the output but cannot receive what was never sent. Accountability does not escape so much as fail to transfer. This paper diagnoses that structural incompleteness as the mechanism beneath the clinical AI accountability gap, maps it through the three-decision taxonomy, explains why the existing governance layers do not close it, and names the complete fix: two owners, both named, six functions activated, The Handoff completed between them.

Clinical AI does not complete the handoff.

A real handoff transfers three things at once: the information, the reasoning behind it, and a named, accountable owner on the sending side. The receiver gets all three. The receiver can question the sender. The receiver can refuse the transfer. And when something goes wrong, the institution can trace authority in both directions, back to the sender and forward to the receiver.

The AI-to-clinician transition produces one of those three. The information transfers. The reasoning is partly visible at best and structurally missing at worst. The named sender does not exist. No model holds accountability. In most deployments no record names the institutional role that authorized the output to enter the clinical workflow. The clinician receives the recommendation and, with it, the full weight of the decision, including accountability for a recommendation they did not generate, reasoning they cannot fully inspect, and a system they did not build.

That is not a handoff. It is a one-way transfer of liability wearing the shape of one.

The structural failure has a precise location: the sending seat is empty. The Governance Owner, the named institutional role whose charter authorizes the deployment, whose commission validates it for the patient population, and whose cover binds the institution to the output at the moment of use, is absent. Without the Governance Owner named, The Handoff cannot activate. Without The Handoff, accountability does not transfer. It dissolves into the architecture between the system that shaped the decision and the clinician who signs for it.

This paper diagnoses that mechanism.

What a handoff actually is

The clinical handoff is not informal. In American hospital medicine, sign-out is a structured transfer of patient responsibility from one clinician to another. The I-PASS framework, for Illness severity, Patient summary, Action list, Situation awareness and contingency planning, and Synthesis by receiver, codifies what must transfer and in what form.

The stakes are not theoretical. Communication and handoff failures are a root cause of roughly two-thirds of sentinel events, the most serious preventable adverse events in hospitals, which is why the Agency for Healthcare Research and Quality and the Joint Commission named handoff improvement a national patient-safety priority. When Starmer and colleagues implemented the I-PASS bundle across nine pediatric residency programs and measured the result in the New England Journal of Medicine in 2014, preventable adverse events, the injuries caused by medical error, fell by 30 percent, from 4.7 to 3.3 per 100 admissions, with no added burden on clinician workflow. Structure the transfer and patients are measurably safer. Leave it unstructured and they are measurably harmed.

The handoff works because it rests on three conditions.

A named outgoing owner the institution can trace back to. The sending clinician is identified. Their name is on the sign-out. Their assessment anchors the transfer. If something goes wrong on the next shift, the institution knows who held the patient before and what they knew at the moment of transfer. Accountability does not diffuse. It traces.

Reasoning inspectable enough that the receiver can question it. The sender does not transmit conclusions alone. They transmit the clinical picture, the working diagnosis, the rationale, and the anticipated trajectory. The receiver engages with the reasoning rather than accepting the output. A transfer that hands over a diagnosis without its basis is not a handoff. It is an instruction.

A receiving owner with enough information to refuse. The receiver can push back. If the clinical picture does not support the assessment, the receiver has standing to say so. The transfer is bilateral. The receiver accepts accountability when they accept the patient, and they accept the patient with the capacity to reject the terms of that acceptance.

These three conditions are the test. A transfer that meets all three is a handoff. A transfer that fails any one of them leaves accountability structurally incomplete.

The handoff that isn’t

Apply the test to the AI-to-clinician transition.

Condition one fails first. There is no named outgoing owner on the system side. The model has no name. The vendor is not the owner, having built the system rather than made the deployment decision. The procurement committee is not the owner, having approved the purchase rather than granted the clinical authorization. The Chief Medical Information Officer who championed the initiative is not, in most institutions, formally named as the Governance Owner of the specific deployment pathway whose output is now reaching the clinician.

The Governance Owner is the institutional role, typically the Chief Medical Officer, the Chief Medical Information Officer, the Chief Digital Officer, or General Counsel, who executes three functions: Charter, the formal institutional authorization of the deployment; Commission, the validation of the deployment for the specific patient population and clinical context; and Cover, the binding of institutional accountability to the output at the moment of clinical use. In most health systems, no individual has been formally named to hold all three functions for a given clinical AI deployment. The governance structure exists. The named sender does not. Condition one fails before the clinician ever sees the output.

Condition two fails structurally. The reasoning behind a clinical AI recommendation is not inspectable at the point of care. A study published in Nature Communications in 2025, using the SourceCheckup framework, found that between 50 and 90 percent of large language model medical responses were not fully supported, and were sometimes contradicted, by the sources the models themselves cited. The receiver cannot audit what the sender cannot verify. And the danger is concentrated in what is unsaid: the NOHARM benchmark, developed by the ARISE Network in a Stanford and Harvard collaboration and released in January 2026, found that across 31 large language models the potential for severe harm reached 22.2 percent of cases, and that 76.6 percent of those harmful errors were errors of omission, the model failing to raise a diagnosis, a red flag, or a necessary next step. The receiver cannot question what they cannot see, and they cannot see what was left out.

A transfer in which as many as 90 percent of responses are not fully supported by their own cited sources, and whose failures are overwhelmingly things the system did not say, is not a handoff. It is an instruction without a foundation.

Condition three fails by design. The clinician does accept accountability when they act on the recommendation. The chart records their name, their reasoning, their decision. But acceptance without the capacity to inspect the reasoning is not a free acceptance. The clinician who acts on a sepsis alert at three in the morning does not know which inputs shaped the recommendation, whether the model was validated for this patient population, or whether the institution formally authorized this deployment for this context. They accept accountability for something they cannot fully audit, from a sender whose name appears nowhere on the transfer. On Stanford’s MedAgentBench, a benchmark of clinically derived tasks run in a realistic electronic health record environment under strict first-attempt scoring, the leading models reached roughly 70 percent task success, which leaves the best clinical AI agents failing roughly three tasks in ten, and weakest on exactly the multi-step execution where a clinical decision is most consequential. The receiver is being asked to accept, unconditionally, output from a system that fails roughly three tasks in ten and cannot show its work.

All three conditions fail. The AI-to-clinician transition is not a handoff. It is a structural transfer of liability that completes on one side only.
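The three-condition test can be written down as a small check. What follows is a minimal sketch, not an established instrument: the `Transfer` type, its field names, and the two example transfers are assumptions introduced here for illustration, and the values given to the AI alert simply restate the paper's argument rather than measure anything.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transfer:
    """One transfer of responsibility. All field names are illustrative."""
    named_sender: Optional[str]    # condition 1: a traceable outgoing owner
    reasoning_inspectable: bool    # condition 2: the receiver can question the basis
    receiver_can_refuse: bool      # condition 3: the receiver has standing to push back


def failed_conditions(t: Transfer) -> List[str]:
    """Return the conditions this transfer fails; an empty list means a handoff."""
    failed = []
    if not t.named_sender:
        failed.append("named outgoing owner")
    if not t.reasoning_inspectable:
        failed.append("inspectable reasoning")
    if not t.receiver_can_refuse:
        failed.append("receiver able to refuse")
    return failed


def is_handoff(t: Transfer) -> bool:
    """A transfer is a handoff only if it fails none of the three conditions."""
    return not failed_conditions(t)


# Hypothetical examples: a clinician sign-out and an AI alert as the text describes it.
sign_out = Transfer(named_sender="Dr. A (outgoing resident)",
                    reasoning_inspectable=True, receiver_can_refuse=True)
ai_alert = Transfer(named_sender=None,
                    reasoning_inspectable=False, receiver_can_refuse=False)

print(is_handoff(sign_out))           # True
print(failed_conditions(ai_alert))    # all three conditions listed
```

The check returns the list of failed conditions rather than a bare yes or no because the argument depends on which condition fails, and the empty sending seat is the one that fails first.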

Where accountability disappears: the shaped decision

The failure does not announce itself. It hides inside the medical record.

Three decisions coexist in any clinical workflow that includes AI, and all three look identical in the chart. The first is the Clinician Decision, made before any AI entered the workflow, documented as the clinician’s, auditable to the clinician’s reasoning alone. Accountability is clear. The second is the Parallel Decision, made after an AI recommendation was generated and not used. The system was consulted but not followed. The chart records the clinician’s decision and the system’s presence is invisible. The third is the Shaped Decision, made after an AI recommendation entered the workflow and influenced the call. It is documented as the clinician’s. The system’s contribution is not recorded anywhere.
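The taxonomy can be sketched as a classifier over two facts the chart does not currently hold. This is an illustration under stated assumptions: the flags `ai_recommendation_present` and `ai_recommendation_used`, the `DecisionContext` type, and the example order are hypothetical, and `chart_entry` models only the claim made above, that the chart keeps the clinician and the decision and nothing about the system.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionType(Enum):
    CLINICIAN = "Clinician Decision"
    PARALLEL = "Parallel Decision"
    SHAPED = "Shaped Decision"


@dataclass(frozen=True)
class DecisionContext:
    ai_recommendation_present: bool   # hypothetical flag: an output reached the clinician
    ai_recommendation_used: bool      # hypothetical flag: the output influenced the call
    clinician: str
    order: str


def classify(ctx: DecisionContext) -> DecisionType:
    """Map the two flags onto the three-decision taxonomy."""
    if not ctx.ai_recommendation_present:
        return DecisionType.CLINICIAN
    if ctx.ai_recommendation_used:
        return DecisionType.SHAPED
    return DecisionType.PARALLEL


def chart_entry(ctx: DecisionContext) -> dict:
    """What the chart keeps, per the text: the clinician and the decision only."""
    return {"clinician": ctx.clinician, "order": ctx.order}


contexts = [
    DecisionContext(False, False, "Dr. B", "start antibiotics"),
    DecisionContext(True, False, "Dr. B", "start antibiotics"),
    DecisionContext(True, True, "Dr. B", "start antibiotics"),
]
types = [classify(c) for c in contexts]
entries = [chart_entry(c) for c in contexts]

print([t.value for t in types])
# ['Clinician Decision', 'Parallel Decision', 'Shaped Decision']
print(entries[0] == entries[1] == entries[2])   # True: identical in the chart
```

The classifier needs both flags to tell the three decisions apart, and the chart entry carries neither, which is why the three look the same on review.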

The Shaped Decision is where accountability disappears. The audit trail erases both owners at once. The Governance Owner, who authorized the deployment whose output shaped the decision, does not appear in the chart, because no deployment record links them to the output. The Decision Owner, the clinician, appears alone, holding accountability for a decision that was architecturally shared and documentarily singular.

The consequence surfaces only when the outcome is questioned. The institution cannot reconstruct what happened. The regulator cannot trace the decision. The mortality review cannot establish whether the system shaped the call or the clinician made it independently. The chart looks complete. The accountability structure is not.

This erasure is not a lapse in documentation discipline. It is the structural consequence of an unnamed sending seat. If the Governance Owner were named in the deployment record, linked to the specific output, and traceable to the moment of use, then a chart that omitted that linkage would be visibly incomplete. The gap would be legible and auditable. Instead the chart looks whole, because nothing is missing from a record that never recorded the sender in the first place.
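One way to picture a gap that is legible and auditable is a chart entry that carries a link to the AI output, which in turn carries a link to a deployment record naming the Governance Owner. The sketch below assumes that structure. Every type, field, and identifier in it (`DeploymentRecord`, `AIOutput`, `ChartEntry`, `linked_output_id`, the sample IDs and names) is hypothetical and does not describe any real EHR or vendor interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DeploymentRecord:
    deployment_id: str
    governance_owner: str            # a named individual, not a committee


@dataclass(frozen=True)
class AIOutput:
    output_id: str
    deployment_id: str
    encounter_id: str


@dataclass(frozen=True)
class ChartEntry:
    encounter_id: str
    clinician: str
    linked_output_id: Optional[str] = None   # the linkage the text says is never recorded


def audit(entry: ChartEntry, outputs: List[AIOutput],
          deployments: Dict[str, DeploymentRecord]) -> str:
    """Report whether a chart entry traces back to a named sender."""
    delivered = [o for o in outputs if o.encounter_id == entry.encounter_id]
    if not delivered:
        return "no AI output on this encounter"
    if entry.linked_output_id is None:
        return "INCOMPLETE: AI output delivered but not linked in chart"
    output = next((o for o in delivered if o.output_id == entry.linked_output_id), None)
    if output is None or output.deployment_id not in deployments:
        return "INCOMPLETE: linked output has no deployment record"
    owner = deployments[output.deployment_id].governance_owner
    return f"traceable: sent under {owner}, received by {entry.clinician}"


# Hypothetical data for one encounter.
deployments = {"sepsis-v2": DeploymentRecord("sepsis-v2", "Dr. C (CMO)")}
outputs = [AIOutput("out-1", "sepsis-v2", "enc-9")]

print(audit(ChartEntry("enc-9", "Dr. B"), outputs, deployments))
# INCOMPLETE: AI output delivered but not linked in chart
print(audit(ChartEntry("enc-9", "Dr. B", "out-1"), outputs, deployments))
# traceable: sent under Dr. C (CMO), received by Dr. B
```

The audit can flag the first entry only because the outputs and deployment records exist to compare against. Without them, the same chart entry would pass as complete.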

Why the institution does not catch it

Most institutions deploying clinical AI have governance. The failure is not the absence of governance. It is its location.

Clinical AI governance is the fourth of four layers, and each layer governs something the one beneath it cannot see.

Data Governance, the first, protects the inputs: how data is collected, secured, and permissioned. Most United States health systems built this layer years ago, under pressure that predates clinical AI.

AI Governance, the second, protects the general outputs: model selection, validation, bias, drift, and the broad regulatory envelope that applies to any AI in any sector.

Healthcare AI Governance, the third, protects the operational deployment: workflow integration, alert calibration, monitoring across populations, and the privacy and safety obligations specific to health systems. Most institutions have built some of this.

Clinical AI Governance, the fourth, protects the patient decision at the bedside, the moment an AI-shaped recommendation reaches a patient and a clinician acts on it. It is the layer where both seats of the handoff live, both named, both recorded in the deployment pathway, both traceable to the specific output that reaches the clinician. Most institutions have not built this layer at all.

The first three layers govern the model, the data, and the regulatory envelope. None of them governs the moment. The moment is the fourth layer. The handoff happens there, and in most institutions the fourth layer is empty. Drawn as four overlapping circles rather than a stack, the same structure shows the same gap from the other side: the four governance domains converge at one point, the patient decision, and the fourth circle is the dotted one, the exposed center where the gap lives.

The institutions are not positioned to catch it, and they know it. In a Black Book Research survey of 182 United States hospital leaders conducted in late 2025, only 22 percent reported high confidence that they could produce a complete, auditable AI explanation for regulators or payers within 30 days, and 33 percent cited unclear internal ownership among IT, quality and safety, and compliance as a top barrier to audit readiness. That is the empty fourth layer, measured directly. The governance literature confirms the shape of the problem. A scoping review of 77 healthcare AI governance frameworks, published in npj Digital Medicine, found that most were not applicable to real-world settings and that oversight mechanisms were the least common component, present in under a fifth of frameworks. The frameworks exist. The layer that operates them at the bedside does not.

The parties whose business is pricing risk have already drawn the conclusion. Effective January 1, 2026, the Insurance Services Office introduced Form CG 40 47, a Commercial General Liability endorsement that excludes bodily injury, property damage, and personal and advertising injury arising out of generative artificial intelligence. W. R. Berkley Corporation introduced Form PC 51380, an absolute artificial intelligence exclusion written for the directors and officers, errors and omissions, and fiduciary liability lines, the coverage that protects the executives who approve deployments. The accountability gap and the coverage gap now occupy the same institutional space. When the exclusion applies, the institution holds the clinical AI risk itself, unindemnified, at exactly the point in the workflow where no one has been named to own it.

The Gap Score™ measures the distance. Scored against the nine blocks of the Mind the 9 Blocks™ framework, a named deployment is rated on how many of its accountability functions are active. With the Governance Owner unnamed, three of the blocks, Charter, Commission, and Cover, cannot be scored. With the Decision Owner unnamed, three more, Decide, Document, and Defend, cannot be scored. A deployment with both seats empty cannot be scored on the six functions that define the handoff. The Gap Score™ is not a performance metric. It is a structural diagnostic, and it names what is missing so the institution can close it.
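The dependency between named seats and scorable functions can be shown in a few lines. This is not the Gap Score™ itself: the paper does not give the scoring scale or the remaining three blocks, so the sketch covers only the structural rule stated above, that an unnamed seat leaves its three functions unscorable. The function name and argument names are assumptions made for illustration.

```python
from typing import List, Optional

GOVERNANCE_FUNCTIONS = ("Charter", "Commission", "Cover")
DECISION_FUNCTIONS = ("Decide", "Document", "Defend")


def unscorable_functions(governance_owner: Optional[str],
                         decision_owner: Optional[str]) -> List[str]:
    """List the handoff functions that cannot be scored because their seat is unnamed."""
    missing: List[str] = []
    if not governance_owner:
        missing.extend(GOVERNANCE_FUNCTIONS)
    if not decision_owner:
        missing.extend(DECISION_FUNCTIONS)
    return missing


print(unscorable_functions(None, "Dr. B"))
# ['Charter', 'Commission', 'Cover']
print(unscorable_functions(None, None))
# ['Charter', 'Commission', 'Cover', 'Decide', 'Document', 'Defend']
print(unscorable_functions("Dr. C", "Dr. B"))
# []
```

An empty list here means only that scoring can begin. It says nothing about how well each function is performed.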

Closing the handoff

The fix is structural, and it matches the failure.

The handoff fails because the sending seat is empty. Closing it requires naming both seats, activating all six functions, and recording the transfer in the deployment pathway so the audit trail reflects the whole accountability structure, not only the clinician who received the output but the institution that sent it.

The Governance Owner holds three functions. Charter is the formal institutional authorization of the specific deployment for the specific clinical context, not a general AI policy but a named authorization for this model, this workflow, this population. Commission is the validation that the deployment performs as required for that population, with documented performance, known failure modes, and a named reviewer; it is what makes the receiver’s acceptance meaningful, because the sender has inspected the reasoning being transmitted. Cover is the binding of institutional accountability to the output at the moment of clinical use; the Governance Owner’s name is in the deployment record, so when the output reaches the clinician the institution is on record as having authorized it, and the sending seat is not empty.
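The difference between a general AI policy and a Charter shows up in what the authorization is keyed on. In the sketch below a charter is keyed on the exact combination of model, workflow, and population, so an authorization for one population does not extend to another. The `Charter` type, the `find_charter` helper, and the sample model, workflow, and population values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Charter:
    """A named authorization for one model, one workflow, one population."""
    model: str
    workflow: str
    population: str
    governance_owner: str


def find_charter(charters: List[Charter], model: str,
                 workflow: str, population: str) -> Optional[Charter]:
    """Return the charter covering this exact use, or None if the use is unauthorized."""
    for c in charters:
        if (c.model, c.workflow, c.population) == (model, workflow, population):
            return c
    return None


# Hypothetical registry holding a single charter.
charters = [Charter("sepsis-alert-v2", "ED triage", "adult", "Dr. C (CMO)")]

adult = find_charter(charters, "sepsis-alert-v2", "ED triage", "adult")
pediatric = find_charter(charters, "sepsis-alert-v2", "ED triage", "pediatric")

print(adult.governance_owner if adult else "no charter")   # Dr. C (CMO)
print(pediatric)                                           # None
```

Exact matching is a deliberate choice in this sketch. A lookup that fell back to the nearest charter would recreate the general policy the text rules out.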

The Decision Owner holds three functions. Decide is the clinical call, made with documented awareness that a clinical AI recommendation was present in the workflow. Document is a chart that reflects not only the decision but its context, the recommendation, whether it was incorporated, and the clinician’s reasoning in relation to it. Defend is the standing to be reached, questioned, and held accountable for the clinical decision, with the Governance Owner’s deployment record as the institutional context behind it.

When all six functions are named, Charter, Commission, Cover, Decide, Document, and Defend, The Handoff activates. The fourth layer of governance, Clinical AI Governance, goes from empty to solid. The four-layer convergence completes, so that Data, AI, Healthcare AI, and Clinical AI governance all hold at once at the point of clinical use. The accountability gap closes.

This is not a technology solution. It does not require a new model, a new vendor, or a new regulation. It requires institutional architecture: two named seats, six recorded functions, one completed transfer. The Clinical AI Accountability Canvas™ maps the full set of blocks against the deployment record, the Mind the 9 Blocks™ framework operationalizes the diagnostic, and the Gap Score™ measures the distance from the current state to a closed one.

What this is not

Four scope guards before this paper closes.

This is not an argument against clinical AI. The structural failure diagnosed here is independent of the value clinical AI creates, and it does not disappear if institutions slow deployment. It intensifies. The response to a structural accountability failure is to fix the structure, not to pause the technology.

This is not a claim that the clinician is blameless. The Decision Owner holds real accountability for the clinical call. The diagnosis is that the clinician holds accountability they cannot fully discharge, because the sending side of the transfer is structurally absent. Naming the Governance Owner does not reduce the clinician’s accountability. It gives that accountability an institutional foundation.

This is not solved by a better model alone. A model accurate to 99 percent with no named Governance Owner still fails the first condition of an accountable handoff. The sending seat is absent regardless of model performance. Accuracy is a property of the output. The handoff is a property of the institutional architecture around it. These are different problems requiring different solutions.

This is not solved by general AI governance frameworks. The frameworks that govern enterprise AI were built for procurement-stage risk management by legal and compliance functions. They address the model. They do not address the moment. Clinical AI accountability requires its own vocabulary, its own institutional architecture, and its own literature.

Conclusion

The handoff is not a figure of speech. In clinical medicine it is a formal, structured, accountable transfer of patient responsibility from a named sender to a named receiver. It works because both sides are named, because the reasoning is inspectable, and because the receiver has standing to refuse.

The AI-to-clinician transition fails all three conditions. The sending seat is empty. The reasoning is partly opaque. The receiver accepts accountability without the institutional foundation that would make that acceptance complete.

Closing the handoff is not complicated. Name the Governance Owner. Name the Decision Owner. Activate all six functions. Record the transfer. The Handoff activates, the convergence completes, and the gap closes. The gap that the completed Handoff closes has a name: The Accountability Gap™ (TAG™). The named clinician is the unit of the clinical decision. The named institutional role is the unit of the handoff. Everything else builds from those two names.

Frequently asked questions

What is the handoff that isn't?

The AI-to-clinician transition presents itself as a handoff but fails every condition that makes a handoff accountable. There is no named sender on the system side. The reasoning is not fully inspectable. And the clinician accepts accountability for a recommendation they did not build and cannot fully audit. What looks like a transfer is a one-way assignment of liability.

Who are the two named owners?

The Governance Owner holds the sending seat: the institutional role, typically the Chief Medical Officer, Chief Medical Information Officer, Chief Digital Officer, or General Counsel, that charters the deployment, commissions its validation, and provides cover for the model entering the workflow. The Decision Owner holds the receiving seat: the Chief of Service, attending physician, or bedside clinician who decides, documents, and defends the clinical call. Both seats must be named. One seat filled is not a complete handoff.

Why doesn't naming a CMIO close the gap?

A Chief Medical Information Officer named as governance lead fills part of the Governance Owner seat but does not, by itself, name a Decision Owner for the specific deployment pathway. Most institutions stop at the governance layer and never formally name who owns the clinical decision for each deployment. The handoff requires both seats, not one, and a committee is not a seat. A seat is a named individual who holds all three of its functions.

Why doesn't a more accurate model close the gap?

A model accurate to 99 percent with no named Governance Owner still fails the first condition of an accountable handoff. The sending seat is structurally absent regardless of model performance. Accuracy is a property of the output. The handoff is a property of the institutional architecture around it. They are different problems with different solutions.

What does closing the handoff require?

Name all six functions across both seats: Charter, Commission, and Cover on the Governance Owner side; Decide, Document, and Defend on the Decision Owner side. When all six are named, The Handoff activates, the fourth layer of governance goes from empty to solid, the four-layer convergence completes at the point of clinical use, and the accountability gap closes.

Funding

None.

Conflicts of interest

Mo Johnson, MD, MBA, is the founder of GPe Research and the originator of the Clinical AI Accountability Canvas™, Mind the 9 Blocks™, Gap Score™, and TAG™ frameworks referenced in this paper. No commercial arrangements exist with any vendor or health system named or alluded to. The frameworks described are the author's own intellectual property.

Published under CC-BY-4.0. Free to share and adapt with attribution.

How to cite

Johnson, M. (2026). The handoff that isn't: How clinical AI escapes accountability (Working Paper No. 01). GPe Research Publications. https://publications.gperesearch.com/papers/the-handoff-that-isnt

@techreport{johnson2026handoff,
  author      = {Johnson, Mo},
  title       = {The Handoff That Isn't: How Clinical AI Escapes Accountability},
  institution = {GPe Research Publications},
  type        = {Working Paper},
  number      = {01},
  year        = {2026},
  month       = {6},
  url         = {https://publications.gperesearch.com/papers/the-handoff-that-isnt}
}
