Evidence base: linked research and sources, with numbers cited inline below.
AI in healthcare beyond drug discovery is becoming less glamorous and more operational. The signal is HealthAdminBench, a 2026 Stanford-led benchmark with four simulated GUI environments, 135 expert-defined tasks, and 1,698 evaluation points for prior authorisation, appeals, denials, and durable medical equipment processing. In the same benchmark, the strongest tested agent finished only 36.3% of full tasks, even while another configuration reached 82.8% subtask success.
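Why can subtask success look high while full-task completion stays low? Here is a rough way to see it. We assume, as our own simplification and not the benchmark's scoring rule, that a task counts as finished only when every one of its checkpoints passes, and that checkpoints fail independently. Under that assumption, completion shrinks geometrically with the number of checkpoints. The 1,698 evaluation points across 135 tasks average about 12.6 per task. The function names below are illustrative.

```python
# Back-of-envelope arithmetic, NOT HealthAdminBench's scoring method.
# Assumption: a task completes only if every checkpoint passes, and checkpoints
# fail independently with the same per-checkpoint success rate.

TASKS = 135                # expert-defined tasks in HealthAdminBench
EVALUATION_POINTS = 1_698  # evaluation points across those tasks


def avg_checkpoints_per_task(points: int, tasks: int) -> float:
    """Average number of evaluation points attached to one task."""
    return points / tasks


def full_task_rate(subtask_rate: float, checkpoints: float) -> float:
    """Probability that all checkpoints pass under the independence assumption."""
    return subtask_rate ** checkpoints


if __name__ == "__main__":
    k = avg_checkpoints_per_task(EVALUATION_POINTS, TASKS)
    print(f"average checkpoints per task: {k:.1f}")
    for p in (0.828, 0.90, 0.95, 0.99):
        print(f"subtask success {p:.1%} -> full-task success {full_task_rate(p, k):.1%}")
```

Under these assumptions, even 95% per-checkpoint reliability completes only about half of the tasks. The two headline figures above come from different configurations, and real failures are correlated. This arithmetic shows the shape of the problem but does not reproduce the benchmark's numbers.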
Key takeaways
- Healthcare AI agents are moving into admin, triage, imaging workflow, documentation, and care coordination.
- The near-term prize is fewer dropped handoffs between EHRs, payer portals, fax systems, imaging queues, and care teams.
- Current agent reliability is not good enough for unsupervised operations.
- Safety work has to cover audit trails, escalation, liability, and bias.

The Signal
HealthAdminBench notes that healthcare administration accounts for more than $1 trillion in annual US spending and that many tasks still require interaction with legacy systems that lack clean APIs. A separate February 2026 paper, H-AdminSim, frames the same shift from another angle: large hospitals can process more than 10,000 administrative requests per day, so agent evaluation has to cover whole workflows, not isolated text prompts.
That is why the next healthcare-agent wave looks like boring plumbing: prior authorisation packet assembly, imaging follow-up, triage routing, visit summarisation, and discharge care-gap checks.
The old drug discovery story still matters, but it was always a long feedback loop. Clinical operations give builders faster evidence. The deployment discipline is clearer too: every task has a queue, owner, timestamp, denial reason, or missing document.
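Operational work is checkable because every task has a queue, owner, timestamp, denial reason, or missing document. Below is a minimal sketch of such a work item. The field names (`item_id`, `queue`, `missing_documents` and so on) and the example IDs are hypothetical. They do not come from any EHR or payer schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class WorkItem:
    """One unit of operational work; every field is something a reviewer can check."""
    item_id: str
    queue: str                       # hypothetical queue names, e.g. "prior_auth"
    owner: Optional[str]             # accountable person or team; None means unassigned
    opened_at: datetime
    denial_reason: Optional[str] = None
    missing_documents: list[str] = field(default_factory=list)

    def blockers(self) -> list[str]:
        """List concrete, checkable reasons this item cannot move forward."""
        problems = []
        if self.owner is None:
            problems.append("no owner")
        if self.denial_reason:
            problems.append(f"denied: {self.denial_reason}")
        problems.extend(f"missing: {doc}" for doc in self.missing_documents)
        return problems

    def age_hours(self, now: datetime) -> float:
        """How long the item has been open, in hours."""
        return (now - self.opened_at).total_seconds() / 3600


item = WorkItem("PA-001", "prior_auth", owner=None,
                opened_at=datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc),
                missing_documents=["imaging report"])
print(item.blockers())  # ['no owner', 'missing: imaging report']
```

Each blocker is a concrete field rather than free text. That lets a reviewer or an automated test confirm whether an agent actually resolved it.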
Why Operations Beat Discovery Right Now
Hospitals are already using AI in operational settings. ONC reported that 71% of US non-federal acute care hospitals used predictive AI integrated with the EHR in 2024, up from 66% in 2023, with billing and scheduling among the fastest-growing uses in its 2023-2024 hospital AI brief.
Prior authorisation is the obvious target because the waste is visible. The 2025 AMA prior authorisation survey says 95% of physicians reported care delays tied to prior authorisation, practices spent 13 hours per physician per week on the work, and 60% of physicians were concerned that AI would increase denial rates. That last number is the warning label. Automating payer friction is not the same as improving care.
CMS regulation is pushing the category toward APIs. The CMS page says the agency released its Interoperability and Prior Authorization final rule on January 17, 2024, and that impacted payers must meet the API requirements primarily by January 1, 2027. That gives agents a more structured surface to work against, but it also makes accountability easier to demand.

The Safety Gate
The counterargument is straightforward: healthcare agents can make broken workflows faster. A complete but wrong authorisation packet can delay treatment. Over-triage burns capacity. An imaging workflow agent that closes the loop without clinician review can hide the miss inside a clean dashboard.
The evidence supports cautious deployment, not paralysis. In a JAMA Network Open study of 46 clinicians, ambient scribing was associated with 20.4% less time in notes per appointment and 30.0% less after-hours work time, but clinicians still differed on note quality and completeness. The FDA's AI-enabled medical device list makes the regulatory direction plain: transparency, intended use, and safety review matter even when the model sits inside a workflow tool. AHRQ's patient-safety review names the broader risk set: data quality, bias, privacy, and interpretability.
For builders, the lesson matches When NOT to Use an Agent and the agent verification gap. Use agents where the intermediate state is checkable. Keep humans responsible for judgement, denials, escalations, and patient-facing decisions. Log every tool call. Measure abandoned cases, not only completed tasks.
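These rules can be sketched as a thin wrapper around agent actions: route judgement and denials to humans, accept only checkable intermediate states, and log every tool call. Everything here is hypothetical. `GatedAgentRunner`, the action names in `HUMAN_ONLY_ACTIONS`, and the pairing of each tool with a `check` function are illustrative assumptions, not the API of any specific agent framework.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical action names that always stay with a human reviewer.
HUMAN_ONLY_ACTIONS = {"clinical_judgement", "deny", "escalate", "patient_message"}


@dataclass
class ToolCallRecord:
    """One audit-log entry: what was attempted, for which case, and what happened."""
    timestamp: str
    case_id: str
    action: str
    arguments: dict
    outcome: str  # "executed", "routed_to_human", or "check_failed"


class GatedAgentRunner:
    """Runs agent-proposed actions only when allowed and verifiable, logging every call."""

    def __init__(self) -> None:
        self.audit_log: list[ToolCallRecord] = []
        self.human_queue: list[tuple[str, str, dict]] = []

    def _record(self, case_id: str, action: str, arguments: dict, outcome: str) -> None:
        self.audit_log.append(ToolCallRecord(
            timestamp=datetime.now(timezone.utc).isoformat(),
            case_id=case_id, action=action, arguments=arguments, outcome=outcome))

    def run(self, case_id: str, action: str, arguments: dict,
            tool: Callable[..., Any], check: Callable[[Any], bool]) -> Optional[Any]:
        # Judgement, denials, escalations and patient-facing steps never run automatically.
        if action in HUMAN_ONLY_ACTIONS:
            self.human_queue.append((case_id, action, arguments))
            self._record(case_id, action, arguments, "routed_to_human")
            return None
        # Assumption: tools only draft or look things up, so a failed check
        # can be discarded before anything is submitted to a payer or patient.
        result = tool(**arguments)
        if not check(result):
            self.human_queue.append((case_id, action, arguments))
            self._record(case_id, action, arguments, "check_failed")
            return None
        self._record(case_id, action, arguments, "executed")
        return result


runner = GatedAgentRunner()
runner.run("PA-001", "assemble_packet", {"docs": ["referral", "imaging report"]},
           tool=lambda docs: {"docs": docs},
           check=lambda packet: "imaging report" in packet["docs"])
runner.run("PA-001", "deny", {"reason": "not medically necessary"},
           tool=lambda reason: None, check=bool)
print([entry.outcome for entry in runner.audit_log])  # ['executed', 'routed_to_human']
print(json.dumps(asdict(runner.audit_log[0])))
```

A failed check and a human-only action land in the same queue, and both still produce an audit entry. That keeps stalled or abandoned cases visible instead of silently dropped.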
Operator takeaway
Near-term healthcare agents are not an "AI doctor". They are supervised operational automation around care delivery. Strong deployments will start with admin queues, imaging follow-up, documentation, triage routing, and care coordination, and then show whether agentic steps cut delays without shifting risk onto patients. If the system cannot show who decided what and on which evidence, it is not ready for healthcare.
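To show whether agentic steps cut delays, and who decided what on which evidence, you have to score the whole queue rather than only count completions. Below is a minimal sketch that assumes a hypothetical `CaseOutcome` record with three statuses. The case IDs and reviewer names are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class CaseOutcome:
    """Final state of one case in a queue (field names are illustrative)."""
    case_id: str
    status: str                     # "completed", "escalated", or "abandoned"
    delay_hours: Optional[float]    # request-to-resolution time; None if never resolved
    decided_by: Optional[str]       # person or agent that made the final decision
    evidence: tuple[str, ...] = ()  # documents or audit-log IDs the decision relied on


def summarise(cases: list[CaseOutcome]) -> dict:
    """Report abandonment and attribution alongside completion, not instead of it."""
    if not cases:
        raise ValueError("no cases to summarise")
    counts = Counter(case.status for case in cases)
    total = len(cases)
    delays = [case.delay_hours for case in cases if case.delay_hours is not None]
    # A decision counts as attributable only if we know who made it and on what evidence.
    unattributed = [case.case_id for case in cases
                    if case.status != "abandoned" and not (case.decided_by and case.evidence)]
    return {
        "completed_rate": counts["completed"] / total,
        "escalated_rate": counts["escalated"] / total,
        "abandoned_rate": counts["abandoned"] / total,
        "median_delay_hours": median(delays) if delays else None,
        "unattributed_cases": unattributed,
    }


cases = [
    CaseOutcome("A", "completed", 20.0, "agent", ("audit-17",)),
    CaseOutcome("B", "escalated", 48.0, "nurse_reviewer", ("referral",)),
    CaseOutcome("C", "abandoned", None, None),
    CaseOutcome("D", "completed", 30.0, "agent"),  # no evidence recorded
]
print(summarise(cases))
```

Case D counts as completed but is flagged as unattributed. It is exactly the kind of success that a clean dashboard would hide.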
Source trail
Research papers:
- HealthAdminBench - Stanford
- H-AdminSim - Lee, Son, and Choi
- Ambient scribe clinician study - JAMA Network Open
Government and safety context:
- Predictive AI hospital trends - ASTP/ONC
- CMS prior authorisation final rule - CMS
- AI-enabled medical devices - FDA
- AI and patient safety - AHRQ PSNet
Related Swarm Signal analysis: