Agentic AI is flooding into healthcare - who is validating it?

Agentic AI arrived at HIMSS 2026 in force, but the validation frameworks haven't kept pace. Marino's Cara Ludlow on how the speed of deployment can be matched by the rigour of validation, and the architectural choices that make that possible.

Last week at HIMSS (Healthcare Information and Management Systems Society) 2026 in Las Vegas, the world’s largest health IT conference, something became very clear: the age of AI agents in healthcare has arrived. Oracle launched an agent that drafts clinical notes across 30 specialties. Google, Microsoft, Amazon, and Epic all unveiled new AI-powered tools designed to work autonomously within clinical workflows. The technology on display was genuinely impressive.

But amid the excitement, a critical question kept surfacing from clinicians, regulators, and healthcare IT leaders alike: how are these systems being validated before they reach patients?

The speed-safety tension

The pace of deployment is extraordinary. In the space of roughly eighteen months, we have moved from experimental chatbots and pilot programmes to production-grade AI agents that can autonomously draft treatment recommendations, triage patients, summarise medical records, and suggest next steps in clinical care.

The commercial pressure driving this is understandable. Healthcare systems are stretched. Clinical burnout is real. The promise of AI that can handle administrative burden and surface insights from mountains of patient data is genuinely compelling.

The problem is that the validation frameworks have not kept pace with the deployment timelines. The FDA has approved over 1,400 AI-enabled medical devices since 1995 (source: Intuition Labs), and the rate of submissions has accelerated sharply. But the advent of agentic AI - systems that act more autonomously and can potentially modify their own behaviour - introduces challenges that the existing regulatory framework was not designed for.

When a system can adapt, how do you certify that it will behave safely in every future state?

Probabilistic systems in deterministic environments

This is a tension we have written about before at Marino. There is a fundamental difference between deterministic systems - where the same input always produces the same output - and probabilistic systems like large language models, where outputs are generated based on statistical likelihood.
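
To make this concrete, here is a deliberately toy Python sketch. The triage rule, the candidate labels, and the probabilities are invented purely for illustration: the deterministic function returns the same answer on every call, while the sampled one can return a different answer for the same patient.

```python
import random

# Deterministic: identical inputs always produce identical outputs.
def deterministic_triage(age: int, systolic_bp: int) -> str:
    return "urgent" if systolic_bp < 90 or age > 75 else "routine"

# Probabilistic: the output is sampled from a likelihood distribution,
# so repeated calls with identical inputs can disagree - a caricature
# of how a large language model samples its next token.
def probabilistic_triage(age: int, systolic_bp: int) -> str:
    likelihoods = {"urgent": 0.85, "routine": 0.14, "discharge": 0.01}
    return random.choices(list(likelihoods), weights=list(likelihoods.values()))[0]

print({deterministic_triage(80, 85) for _ in range(100)})  # always {'urgent'}
print({probabilistic_triage(80, 85) for _ in range(100)})  # usually more than one label
```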

In healthcare, the consequences of that distinction are not abstract. A probabilistic system that occasionally hallucinates a drug interaction, misinterprets a clinical note, or generates a plausible but incorrect diagnosis is not just producing a “bad recommendation.” It is creating a potentially devastating patient safety risk and a significant liability event.

The regulatory environment in the US is making this more complex, not less. The Trump administration has broadly moved to reduce rules that might slow AI adoption, which leaves healthcare organisations with less federal guidance on implementation. Meanwhile, individual states are beginning to legislate AI use in healthcare independently, creating a patchwork of requirements for organisations operating across state lines.

In the EU, the picture is different. The AI Act’s high-risk classification explicitly covers AI systems used in healthcare, and the full obligations take effect in August 2026. For European healthcare organisations, the regulatory direction is clear, even if the implementation details are still being finalised.

The open-weight opportunity in health AI

One of the most interesting developments running in parallel with the agent explosion is the maturation of open-weight AI models. High-quality models from the likes of Mistral, Meta’s Llama family, and Moonshot AI’s Kimi K2.5 are now capable enough to serve as the foundation for clinical AI applications - and they can be self-hosted.

For healthcare organisations, self-hosting is not just a technical preference. It is a compliance and governance strategy. When you run your own model on your own infrastructure - or on EU-hosted infrastructure with clear data residency guarantees - you maintain full control over the data pipeline. Patient data never leaves your environment. You can audit every inference. You can version, test, and validate the model before it touches a production workflow.
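
As one concrete illustration, here is a minimal Python sketch of what “audit every inference” can look like against a self-hosted, OpenAI-compatible inference server (the API shape that vLLM and llama.cpp expose). The endpoint URL, model name, and log path are assumptions for the example, not a prescription.

```python
import hashlib
import json
import time

import requests

# Assumed self-hosted endpoint and a pinned, versioned open-weight model.
ENDPOINT = "http://inference.internal:8000/v1/chat/completions"
MODEL = "mistral-7b-instruct"
AUDIT_LOG = "inference_audit.jsonl"

def audited_completion(prompt: str) -> str:
    """Call the in-house model and record every inference for later audit."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    response = requests.post(ENDPOINT, json=payload, timeout=60)
    response.raise_for_status()
    answer = response.json()["choices"][0]["message"]["content"]
    # Hash the prompt rather than storing raw patient data in the log.
    record = {
        "ts": time.time(),
        "model": MODEL,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "answer": answer,
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return answer
```

Because the model and the log both live inside your own environment, every record in that file is available to a clinical governance review without any data ever leaving your network.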

The hardware economics have shifted to make this practical. High-performance inference can now run on a single server with modern GPUs, at a fraction of what it would have cost even two years ago. For a hospital group or a national health service, the total cost of ownership for a self-hosted clinical AI system is increasingly competitive with per-API-call pricing from the major cloud providers - with the added benefit of data sovereignty and auditability.

This matters enormously for validation. If you control the model, you control the testing regime. You can run it against your own clinical datasets, measure its performance against your own clinical standards, and demonstrate to regulators - and to your clinical governance board - exactly how the system behaves and where its limitations lie.
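
A sketch of what that testing regime might look like in code is below. The JSONL dataset format, the classifier callable, and the threshold values are all assumptions for illustration; the point is that the thresholds are set by your clinical governance board, not lifted from a vendor benchmark.

```python
import json

# Example clinical floors - in practice these come from clinical governance.
SENSITIVITY_FLOOR = 0.95
SPECIFICITY_FLOOR = 0.90

def evaluate(classify, dataset_path: str) -> dict:
    """Run a model over a locally curated, labelled clinical dataset."""
    tp = fp = tn = fn = 0
    with open(dataset_path) as f:
        for line in f:
            case = json.loads(line)  # {"text": ..., "label": "positive" | "negative"}
            predicted = classify(case["text"])
            if case["label"] == "positive":
                tp += int(predicted == "positive")
                fn += int(predicted != "positive")
            else:
                tn += int(predicted == "negative")
                fp += int(predicted != "negative")
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "passes": sensitivity >= SENSITIVITY_FLOOR and specificity >= SPECIFICITY_FLOOR,
    }
```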

What good validation looks like

The challenge at HIMSS was not that validation is impossible. It is that many organisations are deploying first and validating later, or relying on the vendor’s own benchmarks rather than conducting independent clinical evaluation.

Good validation for healthcare AI agents should include several elements. The system should be tested against representative patient populations, not just the datasets it was trained on. There should be clear performance thresholds - and a defined process for what happens when the system falls below them. Human oversight should be meaningful, not ceremonial. And there must be ongoing monitoring in production, because an agent that performs well in testing may behave differently when exposed to the full variety and messiness of real clinical data.
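
To ground the monitoring point, here is one possible shape for a production check, assuming clinicians can flag each agent output as accepted or overridden. The window size and the acceptance floor are illustrative; the real values, and what escalation means, belong to your clinical governance process.

```python
from collections import deque

class AcceptanceMonitor:
    """Rolling check that clinician acceptance of agent outputs stays above a floor."""

    def __init__(self, window: int = 500, min_acceptance: float = 0.90):
        self.outcomes = deque(maxlen=window)  # most recent cases only
        self.min_acceptance = min_acceptance

    def record(self, clinician_accepted: bool) -> None:
        self.outcomes.append(clinician_accepted)

    def healthy(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return True  # not enough recent cases to judge yet
        return sum(self.outcomes) / len(self.outcomes) >= self.min_acceptance

# In the serving path: call monitor.record(accepted) after each case, and if
# monitor.healthy() ever returns False, pause autonomous actions and
# escalate to human review.
monitor = AcceptanceMonitor()
```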

For organisations in Ireland and the EU, the AI Act provides a framework for this. The high-risk requirements (risk management, data governance, human oversight, accuracy and robustness) are essentially a codification of what good clinical AI governance should look like anyway. The regulation is not inventing new burdens; it is formalising best practice.

Where Marino sits in this

At Marino, we have been building in healthcare for years. Our work on SMART on FHIR applications for clinical environments like the Rotunda Hospital, and our involvement in the DTIF-funded CellConnect consortium for advanced therapy supply chains, has given us a deep appreciation for the rigour that healthcare demands.

We are not against the adoption of AI agents in clinical settings. Quite the opposite. We believe they will be transformative. But we also believe that the speed of deployment must be matched by the rigour of validation - and that the architectural choices you make today, including whether to self-host and how to govern your AI pipeline, will determine whether your organisation is on the right side of the regulatory and patient safety line when the music stops.

The agents are here. The question is whether the accountability frameworks are keeping up.

If you are navigating AI adoption in a clinical or healthcare environment, get in touch at marinosoftware.com/get-in-touch.
