AI & Medicine: Saviour or Snake Oil?

Air France Flight 447 takes off at 1930hrs on the final day of May 2009, from Rio de Janeiro en route to Paris, carrying 228 passengers and crew. In charge is Captain Marc Dubois, a seasoned pilot with over 20 years' experience and 11,000 flying hours, though today he assumes the role of Pilot Non-Flying. The Pilot Flying is Pierre-Cédric Bonin, with approximately 3,000 flying hours. Four minutes after take-off, Bonin engages the autopilot on the state-of-the-art Airbus A330.

Thirty minutes later, the A330 cruises at around Mach 0.82 (approximately 540 mph) at 35,000 feet, nearing the limits for this aircraft. Flying at this speed and altitude requires precision: maintaining a narrow angle of attack in a slightly nose-up position so that the thin air can maintain lift. The autopilot, a sophisticated system, handles data assimilation and adjustments seamlessly; so sophisticated that even if the pilots try something silly, like flying the plane dangerously, it will step in… as long as it's on. In aviation there's a saying: the ideal cockpit setup is a computer to fly the plane, a pilot to monitor the computer, and a dog to bite the pilot if they try to touch the computer. However, this narrative isn't about computer failure; it's about human failure caused by our reliance on computers.

As the crew observes an impending storm, Bonin expresses nervousness and suggests climbing higher to avoid it. However, Dubois remains unconcerned, stating, "Ce n'est pas un problème" ("It's not a problem"). So unconcerned, in fact, that he decides to rest and hands control to the relief pilot, David Robert. Despite his 6,000 flying hours, Robert has moved into executive management and is on this flight merely to maintain his pilot currency.

Unbeknownst to the crew, ice crystals begin forming in the pitot tubes on the plane's nose, the pressure probes used to measure airspeed. At 23:10 hours, these tubes block completely, and the loss of airspeed data causes the autopilot to disengage. The plane still maintains its course and speed, buffeted a little by turbulence. All Bonin and Robert need to do is work the problem without interfering.

Instead, Bonin panics. Reacting to a negligible drop in altitude, he pulls back sharply on the joystick, causing the plane's nose to rise and the angle of attack to increase. This leads to a stall, with automated warnings blaring in the cockpit. Communication breaks down, and Bonin, unaware of the severity, continues pulling back on the joystick, exacerbating the problem. Dubois is called back to the cockpit yet struggles to grasp the situation amid the poor communication.

The aircraft descends, nose up at 16 degrees but falling from the sky at 11,000 feet per minute. By the time the other pilots realise what Bonin is doing, it is too late to recover from the stall. At 23:14 hours, Air France Flight 447 crashes into the ocean with the loss of all 228 lives on board.

So, how does a plane crash from 14 years ago relate to AI in medicine? We’ll explore this connection shortly, but first, let’s delve into what AI actually is.

Understanding the Essence of Artificial Intelligence

If you’ve ever followed a diagnostic or treatment pathway or used a scoring system in medicine (which applies to all of us), you’ve brushed shoulders with algorithms and probability in your daily medical routine. These are human-devised, rule-based algorithms, resembling simple decision trees. Widely used though they are, they are not artificial intelligence in the modern sense, and they can sometimes fall well short of anything we would call intelligent.
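To make the contrast concrete, here is a minimal sketch of such a rule-based algorithm in Python. The criteria, points and thresholds are invented for illustration and do not represent any real validated score; the key point is that every rule is written by a human in advance, and nothing here learns from data.

```python
# A toy, human-devised clinical scoring rule. All criteria and cut-offs are
# hypothetical, invented purely to illustrate what "rule-based" means.

def chest_pain_risk(age_over_65: bool, st_changes: bool,
                    known_cad: bool, troponin_raised: bool) -> str:
    """Add one point per criterion, then apply fixed, human-written thresholds."""
    points = sum([age_over_65, st_changes, known_cad, troponin_raised])
    if points >= 3:
        return "high risk"
    if points >= 1:
        return "intermediate risk"
    return "low risk"

print(chest_pain_risk(True, False, True, True))     # -> high risk
print(chest_pain_risk(False, False, False, False))  # -> low risk
```

However much data you feed it, the rules never change; that is precisely what separates these pathways from machine learning.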

AI, or as the tech enthusiasts call it, machine learning, is not your typical set of rules; instead, it operates as a neural network of switches that starts off knowing absolutely nothing. It needs data, lots and lots of it. Think of it less like a decision tree and more like a dynamic random forest. Let’s say our goal is to create an AI capable of detecting STEMIs on ECGs. We write some code, throw in a few basic rules, and feed it millions of ECGs, some showing STEMIs and others not. Initially a blank slate, the AI begins with random guesses, reminiscent of a medical student in their early days. Over time it evolves, studying the intricate patterns within the ECG waves and fine-tuning its neural network of switches. From clueless random guesses it progresses to educated guesses, much like a medical student advancing through their studies, until it becomes an expert, confidently declaring, with a high probability, whether an ECG shows a STEMI or not.
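For the curious, here is a minimal sketch of that learning loop, assuming made-up "ECG features" rather than real traces. A genuine STEMI detector would be a deep network trained on millions of real ECGs; this toy, single-neuron version simply shows how random guesses improve as data nudges the "switches" (the weights).

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend each ECG is summarised by two numbers (say, ST elevation and
# reciprocal depression). The labels follow a hidden rule the model must learn.
n = 10_000
features = rng.normal(size=(n, 2))
labels = (1.5 * features[:, 0] - features[:, 1] > 0.5).astype(float)

weights = rng.normal(size=2)  # starts off knowing nothing: random guesses
bias = 0.0
learning_rate = 1.0

def predict(x):
    """Probability of 'STEMI', via the logistic function."""
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))

for epoch in range(51):
    p = predict(features)
    error = p - labels                                   # how wrong, per ECG?
    weights -= learning_rate * features.T @ error / n    # nudge the switches
    bias -= learning_rate * error.mean()
    if epoch % 10 == 0:
        accuracy = ((p > 0.5) == labels).mean()
        print(f"epoch {epoch:2d}: accuracy {accuracy:.1%}")
```

Run it and the printed accuracy climbs from roughly a coin toss towards near-perfect on this toy problem: the medical-student-to-expert journey, compressed into seconds.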

The intriguing part? The machine essentially teaches itself, and we’re often left in the dark about its learning process and decision-making criteria. It might sound a bit worrying, but in many ways, it mirrors how we, as humans, pick up certain skills. Consider distinguishing between a table and a stool — it’s not just about the number of legs or the height. It’s an intuitive process fuelled by experience and probability. And that’s where the heart of AI lies: probability. An algorithm doesn’t offer a guaranteed answer; it simply tells you what’s most probable based on the data it has, employing a concept known as Bayesian Inference.

Bayesian Inference

Imagine you have a massive jar filled with a million marbles of various colours. Among them, you spot a significant number of red marbles and want to figure out just how many there are in total. Counting them all seems like a tedious task, so you decide to grab a handful of 10 marbles, count the red ones, and then multiply by 100,000 for a quick estimate. But your first handful might not have any red marbles, throwing off your initial assumption, because there clearly are red marbles in there. Undeterred, you take another handful, finding 5 out of 10 marbles are red. The jar definitely doesn’t have 50% red marbles either. Now armed with two different estimates, zero red marbles or half the jar being red, you calculate that 25% of the jar is likely red based on these samples. As you keep sampling and refining, your estimate becomes more precise, until with each handful the percentage barely changes and you arrive at an answer with a high degree of confidence.
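For those who like to see it run, here is a minimal sketch of the marble example as Bayesian updating in Python. It uses a Beta distribution over the unknown proportion of red marbles; the true proportion of 25% is invented to match the example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_red = 0.25          # the jar's actual (unknown to us) proportion of red

alpha, beta = 1.0, 1.0   # a flat prior: we start off knowing nothing

for handful in range(1, 11):
    reds = int((rng.random(10) < true_red).sum())  # draw 10 marbles, count reds
    alpha += reds                                  # update belief with evidence
    beta += 10 - reds
    posterior_mean = alpha / (alpha + beta)
    print(f"after handful {handful:2d}: best estimate {posterior_mean:.1%} red")
```

Each handful shifts the estimate less than the last, exactly as in the jar story: the answer settles towards 25% with ever-growing confidence.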

AI Has Been With Us for Some Time

If you have a smartphone, a loyalty card or a social media account, if you have shopped at Amazon, driven a modern car or used Google, you have interacted with an AI machine learning algorithm. Yet for some reason we have only started worrying about AI since ChatGPT became mainstream. This is possibly because ChatGPT can seem so mind-blowing at times, yet at the heart of it is the same tech that predicts what you are going to type next every time you send a message on your phone, known as a Large Language Model (LLM). If you want to learn more about ChatGPT’s workings, have a read of Stevan’s excellent interview with it here: https://www.stemlynsblog.org/tag/chatgpt/
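To demystify "predicting what you type next", here is a minimal sketch of a toy next-word predictor. A real LLM uses a neural network trained on vast amounts of text rather than simple counts, but the underlying question, "given these words, what probably comes next?", is the same. The sample sentence is invented.

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny, made-up corpus.
text = ("the patient has chest pain the patient has shortness of breath "
        "the patient is stable").split()

following = defaultdict(Counter)
for word, nxt in zip(text, text[1:]):
    following[word][nxt] += 1

def predict_next(word: str):
    """Return the most probable next word seen after `word`, if any."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))      # -> patient
print(predict_next("patient"))  # -> has
```

Scale that idea up to a neural network with billions of parameters trained on much of the internet, and you have the intuition behind ChatGPT.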

ChatGPT saves me many hours of sitting in front of a computer, trying to word things just right. It’s like having my own personal assistant and sounding board.

If you’ve not used ChatGPT before, sign up for free here: https://openai.com/ and try typing the following into it:

‘Please write an email to an underperforming student explaining how she needs to improve her timekeeping and the importance of this’.

‘Please write a short reference for Ms ***** ***** who is a final year medical student at Manchester University recommending her for a position of healthcare support worker. She is a good, enthusiastic, hardworking student’ [follow-up: ‘please make it less enthusiastic’]

‘Please write a how to guide for the cranial nerve examination’.

‘Please write a business case for a new tutor because we have additional students this year bringing in additional income to the department’.

‘Please write an email to a student saying that I am so sorry you have failed your exam and offering support’.

‘Please create a lesson plan for an intramuscular injection workshop to second year medical students’.

‘Please can you make 10 flashcards on Acute Coronary Syndromes for final year medical students’ [follow-up: ‘can you make each part of the information more detailed’] [follow-up: ‘can you please produce this in a table format’]

‘Please produce a revision timetable for a final year medical student’.

‘Please give me 10 ideas for a blog on the use of AI in medicine’.

‘Please explain Bayesian inference simply and provide me with an example’.

Whilst I use ChatGPT a lot, I always read through, check for errors and often reword things. It is a time-saving device, a prediction machine, not an all-knowing brain.

Navigating the AI Revolution in Academia

There are lots of concerns in academic circles about these Large Language Models writing essays for students, but I am less concerned than many in the field. LLMs are trained on the freely available content of the internet and don’t include anything behind a paywall, so the essays they produce are mostly superficial rather than deep dives with well-constructed arguments. LLMs also suffer from hallucinations (making stuff up), which will definitely get any student a low mark. I also don’t believe that most essays are currently written in the manner that they should be anyway. How many of us have ever written an essay based on a deep dive into all the available literature, weighing up the arguments for and against and coming to a conclusion, as opposed to writing down our preconceived ideas and then searching for literature that supports our view?

One of my colleagues recently used an LLM to reduce the word count on her essay, and I have no issues with that. I am quite happy for my trainees to use AI to help them write reflective pieces; they still must input the data about what they did and learned, the AI just wraps it up and articulates it in a nice package. It may be argued that there is value in learning how to do this yourself, and maybe this is true. However, if the future is going to be AI (and be assured it will be), then that is learning for the sake of learning and not for any practical application.

Once LLMs improve and are given access to work currently behind paywalls, they will be able to write quality essays, and this will be harder to police. One way around this might be to viva the student on the topic (i.e. defend your essay). Whilst this would take time, that time could come from the time saved by getting AI to mark, critique and summarise the student’s essay instead of it being marked by a tutor. As with many things to do with AI, we must learn to adapt.

Is the Era of AI Doctors Upon Us?

Contemplating a future where an all-encompassing AI diagnostician takes the reins might seem within reach, but the reality suggests otherwise, at least in the near future. Medicine is rife with complexities and nuances, making it a formidable challenge. While achieving 80% accuracy may be within our grasp, tackling the last 20% is always the real test, demanding about 80% of the effort. It’s reminiscent of the perpetual promise of the self-driving car, always just two years away for the past decade. Despite the existence of self-driving taxis confined to specific city limits, the dream of a universally self-driving automobile remains elusive, and driving a car is relatively simple compared to practising medicine.

At the core of this challenge is AI’s insatiable appetite for data: the more complex the subject matter, the more data it needs. The snag lies in obtaining and using substantial amounts of patient data, encumbered by the constraints of confidentiality and the absence of a consolidated database, with hospitals relying on disparate systems that often don’t communicate. Important historical patient data is predominantly stored in archaic paper formats, which adds another layer of complexity.

Delving into old medical notes reveals inaccuracies, outdated treatments, presumptions, historical biases, and errors. If this flawed data becomes the fodder for AI learning, it adheres to the principle of “garbage in, garbage out.” While rectifying these issues is conceivable through meticulous review, the responsibility falls on doctors, who are already stretched thin attending to their primary duties.

Beyond these hurdles lies a potentially more significant challenge: the acceptance of AI doctors. Operating on probabilities rather than certainties, algorithms are fallible and human trust in algorithms falters when their imperfections are exposed. Consider the media outcry when a Tesla in “Autopilot” mode crashes, overshadowing the numerous lives saved by the collision avoidance system. Despite overwhelming evidence that in every scenario where calculations are involved, algorithms produce better overall outcomes, there’s a prevailing preference for human judgment.

We, too, function akin to artificial intelligence devices, with our decision-making resembling that of a computer. While it is straightforward to recognise that a patient with a rigid abdomen needs surgery, assessing an undifferentiated patient with vague tenderness and marginally elevated blood markers places us in a situation where certainty is elusive and there is a risk of making the wrong decision.

So, while the tantalising prospect of AI doctors is on the horizon, the journey to widespread acceptance and implementation is riddled with complexities that extend far beyond the capabilities of the algorithms themselves.

Where Artificial Intelligence Excels

AI’s forte lies in its remarkable ability for pattern recognition. Whether scrutinising X-rays, ECGs, or cells on a slide, AI can swiftly analyse thousands of images without succumbing to fatigue, boredom, or distraction. This presents a promising avenue for eradicating human errors associated with such tasks. Moreover, AI operates tirelessly, 24/7, devoid of breaks, biological needs, or the constraints of holidays, enabling a seamless workflow.

Consider a future where waiting for X-ray or CT scan reports becomes a thing of the past, and medical trainees no longer require senior approval for every ECG. AI’s efficiency shines as it navigates through the familiar and, when faced with unfamiliar data, smartly flags it for the attention of human specialists, whether Radiologists, Cardiologists, or Pathologists. The vision is clear: AI collaborates with humans to enhance efficiency, ultimately saving precious time in the medical realm.

Returning to Flight 447

Captain Dubois, despite logging 346 hours in the previous six months, had made only 15 take-offs and 18 landings. Assuming four minutes per take-off or landing, this equates to barely two hours of hands-on flying every six months. The rest of his time was spent monitoring controls, adjusting the occasional dial and attending to other non-flying tasks. Bonin’s experience was similar, while Robert had even less.

Despite Bonin’s nearly 3,000 flying hours, most of that time was spent letting the A330’s autopilot handle the flying. Consequently, his hands-on flying experience was limited. When faced with a challenging situation that typically fell within the autopilot’s purview, he lacked the hands-on knowledge and experience to respond effectively. There was no mental schema saying, ‘I’ve encountered this before; here’s what I should do.’

While automation and AI have significantly improved flight safety, they also present unique challenges. Similar concerns arise in the context of AI in medicine. What happens when the algorithm is stumped and leaves the decision to a human, akin to a Gallic shrug saying, “Je ne sais pas” (“I don’t know”), over to you?

Certainly, AI holds the promise of enhancing patient outcomes, much like it has improved aviation safety. However, with progress come occasional setbacks. Just as Flight 447 was a sobering reminder in aviation, the growth of AI in medicine raises questions about the potential pitfalls.

A Difficult Challenge Ahead?

Future doctors, relying heavily on AI, may lack the experiential knowledge needed to handle challenging situations. Unlike the traditional path where doctors accumulate experience over time, AI-driven scenarios may leave them without a wealth of varied experiences to draw upon when needed most.

Laparoscopic and robotic surgery is developing and has its value, but the real challenge comes with the patient who has a frozen abdomen from multiple adhesions and previous surgery. A human surgeon must pick this apart with skills and insights that are not available in the laparoscopic domain, and yet those skills are developed outwith the laparoscopic domain.

Similarly, if an AI diagnostician responds with a ‘computer says no,’ physicians may find themselves ill-equipped to step in. Their reliance on the computer’s conclusions during routine consultations may leave them without the clinical reasoning skills developed through diverse experiences.

As we embrace the benefits of AI in medicine, we must ponder whether a smooth handover from AI to human in critical moments is truly reassuring or a reason to reflect on the evolving nature of medical expertise.

AI only has to be better than the average doctor for it to be better for the population as a whole. Yet who reading this would describe themselves as a below-average Emergency Medicine doctor? (Obviously, reading St Emlyn’s blog immediately puts you above average, so let me put it another way: would you describe yourself as a below-average St Emlyn’s blog reader?)

Here we have another conundrum: as with automation in aviation, AI in medicine will save more lives overall, but there will be a price to pay. Old-fashioned, fly-by-the-seat-of-your-pants pilots might never have made the simple error that caused the crash of Flight 447, because the experience gained from thousands of hours of manual flying wouldn’t have left them in a situation where they didn’t know what to do. It therefore follows that, although AI’s introduction into medicine will save more lives overall, in the future a small number of patients will die who could have been saved by the doctors of today, and that, I feel, is going to be our biggest challenge ahead.

BW

N

So, for a final twist: this entire blog was written as a collaboration between human (me) and AI (ChatGPT). Had I relied purely on the AI, you would have got a pretty short, superficial blog. Had I relied purely on me, you would have got over 3,500 rambling words on the subject. By working together, my ideas and its articulation, we were able to come up with a (hopefully) coherent and interesting blog. But it won’t make me a better writer!

Thanks to everyone at St Emlyn’s for their additional ideas on this blog!

Disclaimer: This blog is my personal view on how AI is starting to influence teaching, learning, clinical practice and assessment. It is my view alone and not that of my employers/university, who are developing their own strategies for dealing with AI in education, learning and assessment. AI is a fast-moving area where progress and regulation may appear out of sync, so it will be interesting to see if my observations, views and predictions are widely agreed, and/or come to pass.

Cite this article as: Nick Smith, "AI & Medicine: Saviour or Snake Oil?," in St.Emlyn's, December 26, 2023, https://www.stemlynsblog.org/ai-medicine-saviour-or-snake-oil/.

