Who decides when AI is safe enough?

Balancing the technical and ethical sides of assuring the safety of AI in the complex world of healthcare

Every few weeks Professor Tom Lawton conducts an experiment. In his role as a consultant in the intensive care unit at Bradford Teaching Hospitals NHS Foundation Trust, he tries out an automated system designed to wean a patient off their ventilator.

"It has become almost a joke now," he says. "The longest I’ve ever left the system running is four hours. Sooner or later, it does something that does not make sense to me. I don’t know how it has come to that decision, and the system isn’t able to explain its thinking, so I end up just turning it off."

For Professor Lawton and for the whole of the NHS, the question of whether or not to use this kind of autonomous system is one of safety. Can we trust this technology to make the right decisions about a patient’s treatment?

In particular, can we trust it with the kind of decisions that have to be made in an intensive care unit – such as whether to wean a patient off a ventilator, or how to treat a case of sepsis – where the medical community has yet to define what the optimum treatment is?

[Image: a patient in a hospital bed]

Allied to the issue of trust is the question of responsibility. What if we start using this technology and something goes wrong? Who would be responsible? The doctor? The NHS? The regulator? The software engineer?

These are vexed questions that range across the fields of medicine, computer science, ethics, data science, philosophy, psychology, law and beyond. And the University of York is leading the search for answers through a £12-million initiative called the Assuring Autonomy International Programme (AAIP) funded by Lloyd’s Register Foundation.

Coming up with answers to these sorts of questions means tackling lots of issues. So from the start we have worked hard to harness multiple disciplines. It’s too easy for different specialists to say that a particular issue is not their problem – that you need to talk to a lawyer or a doctor or a psychologist instead. So, when it came to the question of ‘responsibility’, I decided to put every possible specialist in a room together so that none of us would have that excuse. It was up to us – collectively – to come up with a credible response.
Professor Ibrahim Habli, Assuring Autonomy International Programme

One of the people at this 2018 round-table meeting was Dr Zoe Porter, an Applied Ethicist in the AAIP, with a background in philosophy. She remembers being surprised by what she heard.

My research is focussed on the location of moral responsibility, which has led me to explore the ‘responsibility gaps’ that can be found in autonomous systems. What surprised me was that people working in other disciplines had identified the same kind of gaps – both in law and safety assurance. Their academic culture and language were often different, but there was plenty of common ground.
Dr Zoe Porter, Assuring Autonomy International Programme

The immediate upshot of the meeting was a paper called ‘Mind the Gaps’ – co-authored by Dr Porter and published in the journal Artificial Intelligence. It brought greater precision to the categorisation of the risks associated with autonomous systems, with a view to designing a more effective method of managing these risks and so assuring the safety of these systems to the satisfaction of all. The AAIP has since set about building on this foundational work.

Another of the people at the round-table meeting, and a co-author of the ‘Mind the Gaps’ paper, was Professor Lawton – this time in his role as head of clinical AI at the Bradford NHS Trust. Joining the meeting remotely after a stint in surgery, he remembers worrying that it might not prove productive.

It’s a classic problem in the NHS. Someone in an IT company or a university wants to develop a new device or system, and they contact clinicians for their opinions. Sometimes the doctors say they’re too busy to help, so the IT company ends up developing something that turns out to be useless. Other times, there’ll be a meeting. We’ll identify a problem, discuss a solution, circulate minutes, and the company will go away for six months only to return with something equally useless. You can see that they started from the right place but at some point, generally because of a small misunderstanding about a medical factor, they have veered wildly off course.
Professor Tom Lawton, Bradford Teaching Hospitals NHS Foundation Trust

Fortunately, Professor Lawton discovered that the AAIP works differently. “Everyone here really walks the walk when it comes to being interdisciplinary,” he explains. “The process of collaboration extends well beyond that first kick-off meeting, and that’s because the barriers to collaboration have been lowered. People feel able to get in touch with each other whenever they need to. Whether it’s a quick phone call, text message, email, WhatsApp, or Google Doc comment – the research benefits from a kind of constant course correction. There’s no veering off on a tangent.”

AAIP Research Fellow Dr Yan Jia agrees. “I texted Tom a quick question just yesterday, and he was able to give me an answer straight away. It’s great to have such close and regular interaction.”

Dr Jia, Professor Lawton, and Professor Habli have collaborated on several papers including two studies that have made important strides in the assurance of autonomous systems – one relating to the treatment of sepsis, the other to weaning patients off ventilators.

“When it comes to ventilators,” explains Professor Lawton, “there are risks to the patient from weaning them off too soon and from waiting too long. There’s no surefire method to identify the optimum time so I asked Yan whether there might be anything in the data that could help.”

We used a machine learning algorithm to draw lessons from the data of a set of patients who had been successfully weaned in the past. Then we asked the algorithm to apply what it had learned to analyse the equivalent clinical factors for another set of patients in order to give us two pieces of information: a prediction of whether a patient will be ready to be weaned successfully in the next hour; and how much each clinical factor contributed to this prediction.
Dr Yan Jia, Assuring Autonomy International Programme

“I really liked what Yan did,” says Professor Lawton. “First of all, the prediction provided by her algorithm gives the clinician a one-hour window to consider its recommendation and carry out further tests to double-check it. Secondly, by providing contribution scores the autonomous system is effectively explaining its working. It is pointing out what it considers to be the most important factors for a particular patient, which can be a valuable prompt for the clinician. Both measures make it much easier for the clinician to learn to trust the autonomous system.”
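As an illustration only, the two outputs Dr Jia describes – a prediction of readiness to wean, and per-factor contribution scores – might be sketched with a simple linear classifier on synthetic data. The feature names, the data, and the choice of a linear model here are all hypothetical; the article does not specify the study’s actual model or clinical factors.

```python
# Illustrative sketch only: a linear classifier on synthetic data standing in
# for the study's (unspecified) model. Feature names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["resp_rate", "tidal_volume", "oxygen_sat", "sedation_level"]

# Synthetic training data: each row is a past patient,
# label 1 = weaned successfully within the next hour.
X = rng.normal(size=(200, len(features)))
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def predict_with_contributions(x):
    """Return P(ready to wean in the next hour) plus per-factor contributions.

    For a linear model, each feature's contribution to the decision score
    is simply its coefficient times its value - a stand-in for the richer
    attribution methods a real system might use.
    """
    prob = model.predict_proba(x.reshape(1, -1))[0, 1]
    contributions = dict(zip(features, model.coef_[0] * x))
    return prob, contributions

prob, contribs = predict_with_contributions(X[0])
```

The point of the sketch is the shape of the output, not the model: the clinician sees a probability to weigh during the one-hour window, and a score per clinical factor showing what drove it.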

Dr Jia also added a third novel function to her algorithm – the ability to answer counterfactual questions, otherwise known as ‘what ifs’. This allows the clinician to find out which clinical factors would have to change for the system to alter its recommendation. What if we change X or Y? Would that mean that the patient would be ready for weaning?

“This counterfactual approach works quite nicely for me as a clinician,” says Professor Lawton, “because it flags up the key clinical factors that would have to change. And sometimes that could prompt me to consider ways of getting the patient to that state. Again, this extra layer of transparency and functionality makes me far more likely to trust the system.”
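A minimal sketch of such a ‘what if’ query, assuming a simple one-factor-at-a-time perturbation search and the same hypothetical clinical factors as before (the article does not describe how Dr Jia’s counterfactual function is actually implemented):

```python
# Hypothetical sketch of a counterfactual query: find a change to a single
# clinical factor that would flip the model's answer to "ready to wean".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
features = ["resp_rate", "tidal_volume", "oxygen_sat", "sedation_level"]
X = rng.normal(size=(200, len(features)))
y = (X[:, 0] - X[:, 3] > 0).astype(int)  # toy labelling rule
model = LogisticRegression().fit(X, y)

def counterfactual(x, model, step=0.1, max_steps=50):
    """Try each clinical factor in turn, nudging it up or down in small
    steps, and return (factor_name, new_value) for the first single-factor
    change that flips the prediction to class 1 ("ready to wean").
    Returns None if no single-factor change within range succeeds."""
    for i, name in enumerate(features):
        for direction in (+1, -1):
            for k in range(1, max_steps + 1):
                x_cf = x.copy()
                x_cf[i] += direction * k * step
                if model.predict(x_cf.reshape(1, -1))[0] == 1:
                    return name, x_cf[i]
    return None

# Take a patient currently predicted "not ready" and ask the what-if question.
not_ready = X[model.predict(X) == 0][0]
answer = counterfactual(not_ready, model)
```

A real system would need a more principled search (and clinically plausible ranges for each factor), but the interface is the same: the clinician asks which factor would have to move, and by how much, for the recommendation to change.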

[Image: a patient on a ventilator in a hospital]

Trust is vital to the successful introduction of autonomous systems to healthcare settings. “We need to know that a system is acceptably safe before it is deployed,” says Professor Habli. “That means establishing a trusted method of assuring a system’s safety, proving the efficacy of this method through real-world test cases, and then sharing this methodology as a publicly available resource. That’s one of our central aims at the AAIP.”

As a safety engineer and artificial intelligence/machine learning researcher at NHS Digital, Shakir Laher is working towards the same goal: developing, under the auspices of the AAIP, safety assurance methodologies for machine learning that can be employed in the healthcare domain.

“York is at the forefront of research in this area,” says Shakir, a former computer science teacher who switched careers after studying under Professor Habli’s supervision to gain an MSc in Computing from the University. “They have the expertise we need at NHS Digital when it comes to assuring the safety of machine learning.”

It is perhaps no coincidence that, like Shakir, many of those associated with the AAIP combine specialist knowledge with a more eclectic background. Ibrahim Habli did his first degree in computer science but spent much of his time in the philosophy department. Tom Lawton has a degree in medicine and philosophy, and paid his way through medical school by taking on programming work. Zoe Porter had a previous career as a speechwriter at the Equality and Human Rights Commission. And in her five years at York, Yan Jia gained a Masters in the Department of Electronic Engineering before switching to the Department of Computer Science to study for her PhD.

I like the way that the AAIP doesn’t have a single overly simplified focus. Healthcare is complex, machine learning is complex, and putting them together is really complex. The multi-angled approach of the AAIP has been vital in identifying the gaps in our knowledge so that we can understand the challenges we face and work towards a solution that has safety running through its core.
Shakir Laher, NHS Digital

This final point is perhaps the AAIP’s biggest differentiator. “The way things work in healthcare at the moment,” says Professor Lawton, “is that people come to us with some kind of autonomous system and we ask to see their safety case – some evidence that whatever they have designed is safe. And they say they’ll go away and get one. But that’s not right. Safety can’t be an afterthought. It must be built in from the start, and every developer should be able to provide a safety case that meets our requirements.”

Through its continuing collaboration with the NHS, the AAIP hopes to see the effectiveness of autonomous systems being demonstrated in local hospitals within the next five years. And this time, thanks to the rigour of the AAIP’s safety assurance methodologies, the emphasis on safety throughout the design and actual use of these new systems, and the introduction of greater transparency and explainability to their decision-making processes, it is hoped that clinicians like Professor Lawton will be far more likely to trust them.

And with each newly assured system, the hope is that there will eventually come a time when clinicians are only too happy to switch these systems on – and leave them on.