Co-authored with Dr Natasha Dole. Personal observations are SB’s unless otherwise noted.
Ambient AI scribes are arriving in NHS Emergency Departments (EDs). The demos are compelling, the efficiency gains sound transformative and procurement is gathering momentum. But has the evidence caught up?
This Stanford paper, by Preiksaitis et al. (Ann Emerg Med 2026), is the largest real-world ED adoption study of ambient AI scribing to date. It is worth reading carefully, not because it settles the question, but because it asks the right ones.
What kind of paper is this?
This was a retrospective observational study at a high-volume tertiary academic ED in the US, covering the first several months of implementing the DAX Copilot ambient AI scribe (Microsoft Nuance), integrated into Epic. Attending physicians could optionally use the tool encounter by encounter. Human scribe visits were excluded. Epic audit log data measured on-shift documentation time, total electronic health record (EHR) time and note length across 8,740 eligible encounters.
This is methodologically straightforward and considerably more informative than a satisfaction survey. However, it is not a causal study. The clinicians selected the encounters and the tool was not randomised. The paper tells us how clinicians used the tool and what happened to time metrics when they used it. It does not tell us whether the tool safely saves time across the ED.
What did they find?
Adoption was low and extraordinarily concentrated
Only 11.2% of encounters used ambient AI. And just 38% of physicians used it; nine individuals (the top 10% of users) accounted for 70.5% of all ambient encounters. Unsurprisingly, the median physician usage rate was zero.
This is what headline adoption claims often hide. A small group of super-users may get real value from a tool, while the median clinician never uses it. That may still be worthwhile, but it is a different business case from “this will transform ED clerking”. Sponsorship and enthusiasm are not the same as evidence.
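The zero median follows directly from the adoption figure, and a minimal sketch makes that visible. The cohort size of 100 and the 30% usage rate among users are hypothetical values chosen for illustration; only the 38% share of physicians using the tool comes from the paper.

```python
from statistics import median


def median_usage_rate(n_physicians: int, share_using: float, rate_among_users: float) -> float:
    """Median per-physician usage rate when only some physicians use the tool.

    rate_among_users is a hypothetical, uniform usage rate for illustration;
    real usage among adopters was highly skewed.
    """
    n_users = round(n_physicians * share_using)
    # Non-users have a usage rate of zero; users share the illustrative rate.
    rates = [0.0] * (n_physicians - n_users) + [rate_among_users] * n_users
    return median(rates)


# 38% of physicians using the tool (from the paper); other values are assumptions.
print(median_usage_rate(100, 0.38, 0.30))  # 0.0
```

Whenever fewer than half of clinicians use a tool at all, the median usage rate is zero, however heavily the adopters use it. The median is the honest summary here; a mean would be pulled upwards by the super-users.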
Use clustered tightly in specific zones and acuity profiles
Ambient use concentrated in chair-based outpatient care, telemedicine and lower-acuity encounters. Telemedicine accounted for 7.7% of standard encounters but 27.4% of ambient encounters. Ambulance arrivals, higher-acuity patients and those needing interpreters were all significantly less likely to involve ambient AI.
This is the most important finding in the paper. The tool was not used randomly across the ED. It was used where the ED least resembles an ED: quieter, more linear, more one-to-one, less interrupted, lower acuity and more discharge-bound.
Modest time savings and shorter notes
Median on-shift documentation time fell from 3:50 to 2:45 minutes, a 28% reduction. Total EHR time was 16% shorter. Notes were around 900 characters shorter. While the authors suggest shorter notes may reflect elimination of macro templates, another explanation is just as plausible.
Clinicians used ambient AI for simpler, lower-acuity encounters that would have generated shorter notes anyway. The dataset cannot separate tool effect from case selection. That distinction is not academic. One explanation supports broad deployment; the other supports narrow, environment-matched use.
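For readers who want to check the headline figure, the arithmetic behind the 28% can be reproduced from the two reported medians. This is a simple sketch; the function names are ours and not part of the paper.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '3:50' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_reduction(before: str, after: str) -> float:
    """Fractional difference between two durations, relative to the first."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b


standard = "3:50"  # median on-shift documentation time, standard encounters
ambient = "2:45"   # median on-shift documentation time, ambient AI encounters

saved = to_seconds(standard) - to_seconds(ambient)
pct = relative_reduction(standard, ambient) * 100
print(f"{saved} s shorter per encounter, {pct:.0f}% relative difference")
# prints "65 s shorter per encounter, 28% relative difference"
```

In absolute terms, the 28% is roughly one minute per encounter, and only for the encounters where clinicians chose to switch the tool on.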
Are there any limitations?
Selection bias
Physicians chose the tool for encounters they expected to be straightforward: lower acuity, no interpreter, self-presenting, discharge-bound. The headline time saving therefore reflects an association under favourable selection rather than a generalisable ED productivity gain. Tool efficiency and case selection cannot be separated using this dataset.
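To see how much work case selection alone can do, consider a toy calculation in which the tool has no effect at all on documentation time. Every number below is hypothetical and chosen only for illustration; none comes from the paper. The sketch also uses means rather than the medians reported in the study.

```python
def mean_time(groups: list[tuple[int, float]]) -> float:
    """Weighted mean documentation time over (encounter_count, seconds) pairs."""
    total_time = sum(n * t for n, t in groups)
    total_encounters = sum(n for n, _ in groups)
    return total_time / total_encounters


# Hypothetical documentation times in seconds. By construction, these are the
# same with or without the tool, so the true tool effect is zero.
SIMPLE_TIME, COMPLEX_TIME = 150, 300

# Hypothetical case mix and voluntary use: clinicians favour simple cases.
n_simple, n_complex = 5000, 5000
ambient_simple, ambient_complex = 1000, 100

ambient_mean = mean_time([
    (ambient_simple, SIMPLE_TIME),
    (ambient_complex, COMPLEX_TIME),
])
standard_mean = mean_time([
    (n_simple - ambient_simple, SIMPLE_TIME),
    (n_complex - ambient_complex, COMPLEX_TIME),
])

apparent_saving = 1 - ambient_mean / standard_mean
print(f"Ambient mean:  {ambient_mean:.0f} s")
print(f"Standard mean: {standard_mean:.0f} s")
print(f"Apparent saving with zero true effect: {apparent_saving:.0%}")
```

Under these assumptions, the crude comparison shows a saving of almost 30% even though, within simple cases and within complex cases, times are identical. This does not show that the reported saving is spurious. It shows that an unadjusted comparison under voluntary use cannot tell the two explanations apart.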
Audit log methodology is blunt
Audit logs cannot reliably distinguish focused documentation from interruptions, multitasking, leaving the record open, or reviewing other data. They measure time the record was active, not time spent productively documenting.
Documentation quality was not assessed
The study measured time and length, not whether notes were accurate, complete, or safe. A faster note is not necessarily a better note. This is a substantial gap in a paper being read for procurement decisions.
Single site with mature integration
This was one US tertiary academic ED with DAX Copilot already integrated into Epic. Most NHS EDs operate on different electronic patient record (EPR) architecture without similar integration. And most are not academic centres with implementation support.
What does the wider evidence tell us?
Wellbeing is perhaps the strongest case for ED trials
Olson et al. (JAMA Netw Open 2025) found that 30 days of ambient AI use was associated with burnout dropping from 51.9% to 38.8% across 263 outpatient clinicians at six US health systems, with significant improvements in cognitive task load and after-hours documentation. Emergency medicine has the highest burnout of any specialty and documentation burden is a genuine driver. Reducing it is a legitimate goal and the wellbeing case is the strongest argument for trialling ambient AI.
But does the published evidence support this in an ED setting? The available evidence comes only from outpatient settings that look nothing like majors or resus. Whether those gains translate to these high-acuity ED zones is still unknown.
Time savings vary widely across the literature
At the largest real-world implementation to date (Permanente Medical Group) the saving was just 18 seconds per appointment. Intermountain Health found no statistically significant productivity gain at all. Preiksaitis sits at the optimistic end of a highly heterogeneous evidence base, under the conditions most likely to inflate it: voluntary use, low acuity and enthusiastic early adopters.
Editing burden may not actually disappear
Morey et al. (Ann Emerg Med 2026) compared ambient AI with human scribes in an academic ED. Note quality was comparable in some adult cases, but physicians required more editing time with ambient AI, particularly for complex documentation. That is the workflow signal we cannot ignore. Ambient AI may not remove documentation labour. For now, it simply shifts it downstream into verification, correction, reorganisation and medicolegal ownership.
In complex ED cases, that editing burden may be exactly where the safety risk sits. A human scribe can ask for clarification, recognise uncertainty, flag missing information and learn the clinician’s pattern over time. Ambient AI produces a confident draft, but the confidence is form, not substance. A well-formatted note can still be wrong.
The quality gap and audio sensitivity
Reddy et al. (Ann Intern Med 2026) addressed the quality question Preiksaitis left unmeasured. This independent, vendor-neutral evaluation by the US Veterans Health Administration tested 11 ambient AI scribe products across five standardised primary care cases. Using blinded raters and a validated documentation quality instrument, they found human-generated notes scored higher across all 10 quality domains, with the largest deficits in thoroughness, organisation and usefulness.
Critically for our purposes, the two scenarios with the most degraded AI performance were the back pain case with substantial background noise and the chest pain case where both patient and clinician were masked. Pristine, close-range, one-to-one audio is not a description of most ED environments.
Although this is primary care evidence, the audio sensitivity finding maps directly onto the conditions of an ED. Noise, masks, interruptions, simultaneous conversations, distressed relatives, intoxication, pain, breathlessness and non-linear histories are not edge cases – they are Tuesday afternoon.
Barriers to scaling beyond outpatients
Ohde et al. (npj Digital Medicine 2026) examined barriers to scaling ambient AI scribes across diverse settings. Several findings bear directly on NHS EDs.
Current tools struggle with multi-source attribution: reliably identifying who said what across an overlapping conversation involving clinician, nurse, patient and family. The consequences of misattribution in a clinical note are not theoretical.
Their discussion of automation bias is particularly sobering. The risk of clinicians placing excessive trust in AI-generated text is likely greater, not less, in high-pressure environments. That is the opposite of what safe high-acuity deployment requires.
Ohde also notes that tools with summarisation capability raise distinct questions about classification as a medical device. Most of the commercial tools being marketed to NHS trusts are not pure transcription products. They summarise, structure and, in some cases, suggest. The regulatory and governance implications of that remain unresolved.
They also raise a consent issue rarely discussed in the ED context. Passive recording is straightforward when a patient can agree to it. But in resuscitations, obtunded patients and those arriving by ambulance in extremis, the assumption of consent cannot hold. Any NHS deployment framework needs explicit protocols for these scenarios, not just a general opt-in notice in the waiting room.
Coding and downstream effects
Dai et al. (npj Digital Medicine 2025) raise a concern that operates differently in NHS and US contexts but is worth understanding in both. In US fee-for-service settings, ambient AI scribes are increasingly marketed as coding intensity tools, with measurable uplifts in billing complexity at several major health systems.
In the NHS, Healthcare Resource Group coding drives activity data rather than direct revenue capture, so the incentive structure differs. But the underlying question remains: what happens to coding completeness when ambient AI is deployed across genuinely complex encounters without proper validation? The risk in an NHS ED is less upcoding for revenue and more subtle. AI-generated notes may make activity appear more complex, more complete, or more standardised than the clinical reality. That affects audit, commissioning, service evaluation and quality dashboards. If the note becomes data, and the data become performance truth, then note quality becomes system governance.
There is also an equity issue worth naming. EDs serve patients with language barriers, cognitive impairment, delirium, intoxication, safeguarding concerns, distress, trauma and variable health literacy. If ambient AI is selectively used for straightforward English-speaking patients while avoided in more complex communication contexts, the tool may widen the documentation-quality gap between easy and hard-to-hear patients. That does not mean we should not use it. It means we must measure subgroup performance.
What does this study tell us about ED as an environment?
The zone-clustering finding deserves the most attention. Think about a colleague who works brilliantly in an outpatient setting but struggles with the rush and noise of majors and resus. The acoustic complexity, overlapping conversations and non-linear encounter structure work against them. We do not conclude they are a poor clinician. We recognise the environment as the problem and think carefully about how we deploy them.
Ambient AI scribes have exactly the same environmental sensitivity. They perform best where conversations are linear, one-to-one and uninterrupted. The Preiksaitis data make this visible empirically: given free choice, clinicians deployed the ambient AI scribe precisely where we would predict it to work best in an ED.
The critical difference is that a colleague struggling in a chaotic environment knows it. The ambient AI scribe does not. Like a junior assistant who is competent but lacks initiative, it produces a confident, well-formatted note regardless of whether it has accurately captured the encounter. Clinicians in the busier, more chaotic parts of the ED may not have the time to scrutinise it carefully enough to catch the errors. That absence of self-awareness is what makes deployment a patient safety question, not just a workflow question.
What does this mean for NHS practice?
Integration is necessary but not sufficient
The technical infrastructure underpinning this study does not exist across most NHS EDs. Epic is an all-in-one system: notes, orders, audit logs and the ambient scribe integration all live in a single connected environment. NHS EDs typically operate on a patchwork of trust-specific systems, often poorly connected.
When ambient AI was trialled at my department, the tool produced a draft note in its own interface that then had to be manually copied and pasted into the EPR. That step is not trivial. Copy-paste transfer is a well-documented source of clinical documentation error.
The seamless workflow in this paper depends entirely on native integration that most NHS trusts cannot currently offer. That is changing, for sure. But even on a mature Epic installation, with DAX Copilot already embedded, adoption in this study was only 11%.
The adoption figures themselves demand scrutiny
Even with full institutional support and no barriers to use, 62% of physicians never engaged with the tool at all. The headline benefit figures come from a self-selected minority applying the tool to their easiest cases. Scaling that to a diverse NHS workforce across multiple sites requires assumptions the data simply do not support yet.
The interpreter findings reflect the same equity issue. Encounters requiring interpreters were systematically avoided by physician choice, not protocol. In an NHS context where language diversity is greater and interpreter access already stretched, that is not a footnote.
Environment-matching rather than blanket rollout
Preiksaitis does not show ambient AI is safe in low-acuity zones; it shows clinicians chose to use it there. The wider evidence, including Reddy’s primary care audio findings, points in the same direction. If NHS trusts are going to pilot ambient AI tools, the best candidates are the areas that most resemble outpatient settings: minor illness and injury zones, same-day emergency care and urgent treatment centres. Not resus, majors or other high-acuity zones. Deploy it where ED behaves more like outpatients, because that is where it is most likely to perform as intended rather than become a liability.
The bottom line
This paper, and the other five, tell a consistent story: ambient AI scribes are promising, but their value is conditional. Used where the ED behaves like outpatients, ambient AI may reduce cognitive load and make documentation less painful. Deploying it into the noisiest, most complex parts of the ED because the demo looked good is not innovation. It simply outsources risk to the signing clinician.
The message for NHS EDs is straightforward. Do not let the headlines outrun the evidence. Instead, watch carefully, pilot narrowly and match the technology to the environment.
References
- Preiksaitis C, et al. Ambient Artificial Intelligence Scribe Adoption and Documentation Time in the Emergency Department. Ann Emerg Med. 2026 May;87(5):569-574. https://doi.org/10.1016/j.annemergmed.2025.12.017
- Olson KD, et al. Use of Ambient AI Scribes to Reduce Administrative Burden and Professional Burnout. JAMA Netw Open. 2025;8(10):e2534976. https://doi.org/10.1001/jamanetworkopen.2025.34976
- Morey J, et al. Ambient artificial intelligence versus human scribes in the emergency department. Ann Emerg Med. 2026 May;87(5):561-568. https://doi.org/10.1016/j.annemergmed.2025.10.006
- Reddy A, et al. Rapid Evaluation of Artificial Intelligence Technology Used for Ambient Dictation in Primary Care: Comparing the Quality of Documentation of AI-Generated and Human-Produced Clinical Notes. Ann Intern Med. 2026 Apr. https://doi.org/10.7326/ANNALS-25-02772
- Ohde JW, et al. Barriers and opportunities of scaling ambient AI scribes for clinical documentation across diverse healthcare settings. NPJ Digit Med. 2026 Mar. https://doi.org/10.1038/s41746-026-02554-0
- Dai T, et al. Policy brief: ambient AI scribes and the coding arms race. NPJ Digit Med. 2025;8:780. https://doi.org/10.1038/s41746-025-02272-z

