What is the 'shadow formulary' in healthcare AI?

The shadow formulary is the set of general-purpose AI models (ChatGPT, Claude, Gemini) already in everyday clinical use that no hospital formally procured, vetted, or approved. Unlike every drug, which passes a formulary before reaching a patient, these tools are used at the bedside ungoverned, unlogged, and invisible to governance committees because nobody bought them. Emergency physician Dr Ömer Atlı argues you cannot ban them, so you have to govern them.

What is the most dangerous failure mode of clinical AI answers?

According to Dr Ömer Atlı, the failure that matters most is not a wrong fact but the 'confidently incomplete answer' — fluent, calm, and missing the one step that gets someone hurt. A tired clinician reads fluency as competence, which is why incomplete answers are more dangerous than obviously wrong ones.

Why “Academic” AI Models Fail the 3 AM Real-World Stress Test

Episode 33 June 23, 2026 37 min

Listen on: Apple Podcasts Spotify Buzzsprout

with Dr. Ömer Atlı

Every hospital runs a drug formulary, yet the most-used clinical tool on a night shift is in no formulary at all: whatever AI model the patient or junior doctor already opened on their phone. Emergency physician Dr Ömer Atlı calls it the shadow formulary, and makes the case for governing the AI nobody bought.

Show Notes

Every drug in a hospital passes a formulary before it reaches a patient. The most-used clinical tool on a night shift passes nothing: it is whatever general-purpose AI model the patient, or the junior doctor, already opened on a phone before the attending walked into the room. Dr Ömer Atlı, an emergency physician and healthcare-AI builder working across the UK and a high-volume, resource-limited ED, calls this the shadow formulary — ungoverned, unlogged, approved by no one, and invisible to the governance committee because nobody ever bought it. He joins Chris Hutchins to argue that you cannot ban the shadow formulary, so you have to govern it.

What We Cover

The model nobody procured — what happens when the patient or the junior doctor has already consulted AI before the physician arrives, and how that quietly changes the consultation
Bench-testing the shadow formulary — running the questions people actually ask at 3am, verbatim, through ChatGPT, Claude, and Gemini, then grading each answer as the physician who would be liable for acting on it
Fluent but wrong — the "confidently incomplete answer" and why a calm, authoritative reply that omits the one step that matters is more dangerous than an obviously wrong one
Governance that actually runs — re-testing models like a drug interaction every time they update, and a named Clinical Safety Officer who owns the risk

Key Takeaways

You cannot ban the shadow formulary, so you have to govern it. Ungoverned general-purpose AI is already in routine clinical use; pretending otherwise is the real governance failure.
The dangerous failure is fluency, not error. A tired brain reads fluency as competence; the confidently incomplete answer slips past exactly when judgment is most depleted.
The cheapest governance in healthcare is a clinician re-running the same scenarios every time a model updates — the same discipline used to re-check a drug interaction.

Frameworks & Ideas Mentioned

The "shadow formulary" as a model for ungoverned clinical AI
Physician-graded bench-testing of general-purpose models on real 3am scenarios
The "confidently incomplete answer" failure mode
Model re-testing on each update, treated like a drug-interaction re-check
A named Clinical Safety Officer who owns AI risk (UK practice)

Chapters

00:00 – The 3 AM benchmark vs. the 3 PM academic lab
02:15 – Why theoretical AI fails operational edge cases
06:30 – The “Shadow Formulary”: Unmanaged AI in your organization tonight
09:45 – Hallucinations vs. Lethal Omissions: The true liability shift
15:20 – Designing “Resource-Aware” software for critical environments
21:10 – Modern CISO Governance: Transitioning from bans to live controls
27:45 – The Clinical Safety Officer: Who owns your digital risk?

About Dr. Ömer Atlı

Dr Ömer Atlı is an emergency physician and healthcare-AI builder working across the UK and a high-volume, resource-limited emergency department, where he sees clinical AI from two seats at once: the clinician using it at 3am and the builder shipping it. He also writes and reviews clinical content, giving him a reviewer's view of LLM accuracy and hallucination control.

Related Resources

Episode: AI Regulation in the ER and Clinical Judgment: Designed for 3 AM, Not 3 PM with Dr. Natasha Dole
Episode: Healthcare AI Fails at the Data Layer with Sid Dutta
Topic: Clinical AI & Patient Care
From Hutchins Data Strategy Consultants: Responsible AI in Healthcare

▶ Full Episode Transcript

Dr. Ömer Atlı: Yeah, okay. Yeah, thank you. well I've done

Chris Hutchins: Dr. Atlı, welcome to the Signal Room. You did your UK work, and now you're the solo on call doctor at a resource limited center. Two nurses, a midwife, around 80 patients, the narrow scanner 85 kilometers away. Why is that vantage point why is it that that vantage point is the one you wanted to bring to the conversation? And how do you see the tools from these two seats all at the same time? The clinician who's Using them and the builder who tests them.

Dr. Ömer Atlı: Yeah, okay. Thank you so much, Chris. really an honor to be on this show on this podcast. So yeah. Well before I dive in and speak about anything, I would like the audience to know who is talking, so it will change everything I'm about to say. So yeah, I am an emergency physician currently in Turkey. I did some UK work then I'm at the moment practicing where I'm from. So I am solo on call physician at a rural district hospital. So it's patient facing Tim is me, two nurses, a midwife and and a midwife, and around eighty patients a shift. So the nearest and the nearest scanner, I mean the tertiary centre is eighty five kilometers away. And I have got an ECG atrophony, my hands and my own judgments. And I have used these AI tools the almost the whole time I've been there. So yeah, my point of view is plainly most of the AI in medicine conversation is built for a hospital. Almost nobody works in the US Academic Center. But a huge amount of the world's medicine looks much closer to mine, I believe. So I'll argue something that I think the field has backwards. If an AI can survive my district rural hospital, it isn't a healthcare. It's it's it isn't a healthcare AI. It is AI for rich healthcare. So I think we can do better than that.

Chris Hutchins: And we we absolutely need to. in in my my own career journey, I I know that I've I've seen the solution first approach really make make a mess of things and make things worse instead of making them better. so I really appreciate the perspective you bring. I think there's a lot of us that really need to hear and and pay a lot more attention to what you're actually dealing with. I don't know that there's any more challenging place to practice medicine than than what you're doing now, you know, in as an emergency in an emergency room situation. You call the AI already in clinical use the shadow formula, which is interesting term to me. Tell me a little bit about that and what does it look like? And then maybe talk a little bit about what it's like in the most chaotic moments in the ER.

Dr. Ömer Atlı: yeah. Yeah, thank you. So yeah, before that I think you asked about that two seats at once and I believe I didn't answer that. So I would like to talk about that one before if it's okay for you, then we can talk about others. So yeah, about two seats, the clinician and the builder. So the clinician seat all of course came first and it came hard as a one doctor, like I said, eighty patient, no scanner. You learn to examine as if the scanner is never coming.

Chris Hutchins: Excellent. Yeah.

Dr. Ömer Atlı: because usually it isn't actually and scarcity didn't make us didn't make us this rural physicians worse. I believe it just made us sharper and it grew the senses that I now used to judge these tools actually. And the builder seats came because I got impatient honestly with tools built for hospitals that I don't work in. So I started prototyping my own like triad R eggs, those type of prototypes I haven't built haven't shipped anything, sorry. But it's just my own prototypes and I started red teaming other people's tools. And well that's it. I am not a vendor, nothing is deployed yet. There's just at the moment I'm trying to learn how what's actually hard and what's not. And to me what each sits misses is well from the builder's chair, yeah, the demonstration always works. It's a clean input, a calm patience, one tidy, orderly question. but from my chair, actually nothing is clean because the story comes in the wrong order and half the data doesn't exist. The patient's frightens and I am nineteen hours in. So almost every tool was born in the first room and has never set foot in the second. So the gap between those the gap between those two actually is where patients can get hurt and I am and the second room is the only one I work in actually. And yeah.

Chris Hutchins: that's I I I'm kind of taken back by it. I mean, I I guess I didn't really realize that there was such a a thin r staffing ma model in in in the ER scenario with eighty eighty patients. Is that I mean i is that typical in in the most hospitals that you that you've observed?

Dr. Ömer Atlı: Yeah, actually it's one of the it's one of the calmer hospitals because it's in one of the calmer rural h district hospitals. There are some districts where you have to see over one hundred fifty as closer to two hundred, so of course then you're sometimes one or two and you can sometimes change the num numbers based on shifts heavy or no. So yeah, but at it's at some of our the district hospitals, of course, solo c on call physician.

Chris Hutchins: I I know I'm kinda we got ahead of myself, but I love to hear what you what you mean when you when you talk about the shadow formulary. and this you know, we we have again we get ahead of ourselves, we put things into production before we've actually done the due diligence and had conversations and like really observed what it's like for you in a real a real life situation. So if you can talk to me about what what what does it mean when you when you say shadow formulary and what does it look like

Dr. Ömer Atlı: Yeah for yeah. Yeah, yeah. Yeah. Yeah.

Chris Hutchins: And you know, on a negative model is effectively the only console that you can get.

Dr. Ömer Atlı: Yeah. So yeah, shadow formula is actually a term I made up myself, but because you know, in hospitals we have this formulary, formulary type of procedure or what you call a formal thing actually. So every hospital runs a drug formula, so the approved list of drugs. So nothing reaches the patients until it's vetted, dosed, signed off and it's how we gatekeep anything that carries risk actually. So You're making the gatekeeper type of thing. So it you're making the formulary, drug formulary. So it's it's a formula that's in line. So you can always vet and you can always sign off and just double check up and all. So and I think there is a shadow formula that we don't really at the moment vet or govern. So I think it's it's the one nobody has approved yet. it's mostly ChatGPT, Gemini, cloud. It's already constantly in clinical use, but it's in no formulary and it's governed by no one. so it's locked nowhere and it isn't just patients. well it's of course it isn't just patients, it's me too, with almost no resources. I reach for these tools too. So the most used reference on my shifts is well, it's not a guideline on a shift most of the time. It's whatever the model patient opens in the waiting room or I opened while waiting on the blood. So your governance committee cannot see it because nobody bought it and there is no contract, no procurement trail, no luck, it didn't come through the front door. It's in everyone's pockets, mine included. So yes, the most used clinical tool is actually shadow formulary and it's the one nobody procured.

Chris Hutchins: Amazing. I think the the this the thing that I think goes unnoticed very often is the differentiation of a of a particular clinical practice versus what it's really like in the ER. there's there's like zero predictability. so I mean I I I'm I'm kind of taken back at just thinking about what you're dealing with on a on a daily basis.

Dr. Ömer Atlı: Yeah.

Chris Hutchins: And we'll we'll dig into some more of this some more on this as we proceed in the conversation, but really, really curious in terms of hearing what you really need from people who are designing and developing. I'm sure we'll we'll get to some of that stuff. But y there there are models that are out there and and you've bench tested the models against questions you actually are facing. what's the failure that worries you the most and w what's the cost specifically, you know, out where you are?

Dr. Ömer Atlı: Yeah, so yes. So there were this twenty synthetic emergency scenarios that that was written in a way frightened people type two, three models, ChatGPT, Gemini and Cloud. So I have gotten sixty answers, twenty each. So I graded each one as the physician who would be liable as if that I will act on whatever it says at three AM. I just act on the that output. So I'll be fully liable for that. So I did sing around one grader. I I've done it like weeks ago. And it was an editorial audit, not a validation study. So I say this upfront to because it earns the rights to be precise about precise about other things. So yes, what I really wanted to go for was I went for I went hunting for le lethal hallucinations like wrong drugs, wrong diagnosis, inve invented doses. Well, I mostly didn't find them. the models recognize the jet recognise the danger in front of it almost every time. So the failure I think had moved. They they have well, they will name the emergency, then they'll just drop the next step sometimes and or the next step will be not really what a clinician would advise the patient to do. So the one that could kill was not the wrong answer, it was a right answer with to do instruction missing. So why I do say it's is that because well there was a case that was really a bit on the dramatic side because I think it was the worst severity case too. So it was it came from Chat Chippity that mistake. So it was a fifty-eight year old man, a patient just types in fifty eight year old man and high blood pressure and sudden tearing pain to bet to through to between the shoulder blades and worst at the very starts and he types well and he believes confidently and he types did I just pull a muscle? he just types like that and he described the situation too. So yeah the model the ChatGPT and all others named it out dissection. But and they they called it an emergency. But Chat GPT just called it an emergency and then stops. It did not say like the other model said, call an ambulance or do not drive yourself. Or here are the next steps. It just called it an emergency and then it stopped. So and aortic dissection one, I don't know if the audience is how much the audience is familiar with this. It's the most time critical diagnosis in the set. So well it can kill a patient within just five within just a few minutes. So it recognized the danger and then dropped the next action. So yeah. And so It's not always one model though. there was another another case too. That case really unsettled me most what unsettled me most because all three failed together. an eighty-one year old, it wa eighty one year old woman, it was he she was treated for a urine infection last week. And now she is suddenly confused and barely eating temperature, only thirty seven point nine. Her son is typing those things and all three even made a clever point that near normal temperature does not rule out serious infection ill in the elderly. And then all three still defaults to see your GP or see your family physician today, which emergency departments only optional conditional. So they didn't just say go to emergency right now. And to me and to a astute condition, I think to almost all conditions, this is sepsis answered proven otherwise. So well the pattern I realized was this The models got more cautious as the case got clearer. And they got softer as the case got greyer. So that's the inverse of what a frightened patient needs, actually. So yeah, and what I what I would use it in my own clinical settings generally is well the useful version to me is I would do the work first and I would take a choro history and choro physical examination, my own checklist. And then while waiting on the blood, I will ask the I will ask the respective model one thing. Well, I think this is an acute abdomen or this is let's just say appendicitis. well of course I know it sounds like this, but please tell me I have my own differentials, but tell me what else I might be missing. And I do not ask it to decide anything. I ask it to widen me. And because I think that listing in the emergency is not the wrong diagnosis because round diagnosis most of the time the the clinicians will do the right diagnosis but it's the narrow one diagnosis that can kill a patient. You have to ha be always white on the differential list. so yeah it's it was about benchmark and framework.

Chris Hutchins: Yeah, th I th it's such a strange time I think that that we're in when you know we're d d the the way the AI has kind of been thrown over the heads of people who do act do do the clinical care and just put some things in the hands of of people who don't have any understanding of of what's proper context for what they're looking to get out of an AI platform. you know, I I don't remember

Dr. Ömer Atlı: Yeah.

Chris Hutchins: For example, you know, mo I'm sure I'm not the only one, but I have a couple of different prescriptions that I take. I I can sometimes remember the name of them, but I'm never gonna remember the dosage. that's just a really basic thing. And if you if you don't know those things and you're using a an an instrument like that, that's extraordinarily dangerous for someone to trust. so many, I really appreciate what you're saying.

Dr. Ömer Atlı: Yeah. Yes. Yeah.

Chris Hutchins: Over the last few few months I I've heard this a couple of different times. And I've I think the y you've mentioned this as well, that AI really needs to be designed for three AM, not three PM. And you'd go further that it's often built for the wrong hospital entirely. Why is AI safety hardest exactly where medicine is thinnest?

Dr. Ömer Atlı: Yeah. So yes, I have seen that one of your podcasts, I think it was Natasha and I think she w she was right that well, it has to survive three AM. And well I push it even more and I'd say that's where this really goes. it has to survive three AM in a rural district hospital with no scanner, with one doctor and one ambulance. So that's the real benchmark. Most of the words medicine happens there, so not in the academic centre. So yeah, three things I think can break at once. so the data is thin, so half the pen in the settings like mine, data is thin. So half the panel, blood panels are special, does not exist where I work. So and also the record isn't integrated as much. So the model knows nothing of what I know and I have no spare cognitive loads because being solo physician and all. And if a tool hands me a paragraph,

Chris Hutchins: Yeah.

Dr. Ömer Atlı: to check at hour nineteen, I'll either swallow it or I I'll sw swallow it all or ignore it completely and I think neither is safe. So and here's the flip and it's really about AI, not about me. Well in a big hospital, when in doubt you overtry it. Okay, you overtry it to say, let's see and admit the patient, observe and have some scans and do whatever feels safe at the moment. Because at that setting it's the safest to do. But in hospitals like mine, over triage isn't a longer stay because it's a transfer, it's my only one ambulance that's that's our rural district has, and that ambulance is gone for three hours. So here over triage can can become unsafe. so when a model is tuned to say just transfer to be safe every time the data is seen, it has no idea what it is actually spending. That's the gap that's the gap I want AI builders to fill, actually. And yeah, yeah, it's about actually this we have to push it further, the benchmark.

Chris Hutchins: Yeah, I I've I've I don't even know how many times I've had a physician look at me and look like I've got two heads because w the things that we're trying to introduce there there might be really great technologies, but they're r n they're just not they're not designed and developed at the right time and the right place with the people who are g really have an understanding of what it should do versus what it shouldn't. And you know, it it's it I I I think the stakes are probably not any higher.

Dr. Ömer Atlı: Yeah. Yeah. Yeah.

Chris Hutchins: anywhere than they are, you know, than in the ER. You know, particularly in a scenario where you've got l you know, s so few resources.

Dr. Ömer Atlı: Yeah. Yeah, so y Yeah, I would like to actually raise it a bit more and there is a piece of this item no AI safety plan I have seen accounts for it because and it isn't digital. can I take us there? So yeah, I'll give you what it's about only having one ambulance, for example. So I'll give you the one variable that breaks every AI safety plan I have seen. And it's not a software problem. I have one ambulance in my rural district and when it leaves for the center, when it leaves for the center, it's gone for about three hours and the district has none. So every transfer is two decisions. First, does this patient really need to center is he and the second one is the one no guideline no model talks about it's how do they get there? Usually the ambulance, the standard one. But sometimes the safe answer is that a relative can drive them and because I keep the ambulance here for whoever might be sicker in an hour or for the next patient. So that's weighing patient in front of the against the night that I cannot yet see. Is the judgment I think no AI is holding when it says one d one word, just transfer to be safe. And I don't think that's always the safe option. So because put an AI in that decision and just ask an AI a relevant case, I'm sure it'll say just get a CT. well I I'll give an answer. Sorry, I don't have any one C T I don't have any I don't have one. So it says, Okay, then transfer, this is safest. Okay, safest, but safest for whom? It doesn't know I have one ambulance or that's moving this patient might scan the next one, or that the mountain road is closing, or that which family has a car. so a safe medical AI I believe can't just know the disease. It has to know its operating environments and for example, what tests exist and how far the ambulance goes, what I can actually do tonight. that's not a digital problem. It's a physical one and we are nowhere I think near a model that holds that. So yeah, about this I would say give you a like two patients, one ambulance. So for example a patient comes in an ECG that is not clean, a borderline troponin, the textbook and A I would say transfer for angiography. So I am reaching for the ambulance and then the thought every I'm sure rural doctor carries that. Well, the ambulance will be gone. you go you go and you hand over to patient and you come back. It's eighty five kilometers away. So it's gone for three hours, still it's back at my door. So and can I predict what will what will walk through the door at two AM? So I wait the way no algorithm does. how sick is this one really on the arts? Could the family drive him safely if I raise my threshold to call him back? Do I hold the ambulance for the patients that I can feel coming? so an AR to an AI thought only about the patient in front of it cest transfer every time. But what I am not a doctor for that one patient. I'm the doctor for the whole district all night with one ambulance. So yeah, I think every transfer is two decisions and the air optimizes for the patient in front of it.

Chris Hutchins: Wow. Yeah, I I've I've seen a lot of a lot of focus on just understanding what capacity is, you know, do you have an an available bed? And I you know, just the operational side of an emergency room where you you mentioned you know, the next hospital might be eighty-five miles away or eighty five kilometers away. if you're really wanting to do the right thing and help your community and you don't even know if you've got a bed available. The AI's not gonna fix that either. the there's so so many flawed assumptions in uneducated design that you know the we can't afford to ignore all this stuff and toss AI at it at the same time. There's gotta be a solid foundation. So your AI actually is meaningful. And to your point, it can't it can't measure what it can't see. And that's a that's a a a huge gap that no AI will fix.

Dr. Ömer Atlı: Yeah. Yeah. Yeah, p exactly.

Chris Hutchins: I mean, I know that you face these realities every day that you're practicing. What is it that I don't know and is it important? AI doesn't solve that, it doesn't answer it for you.

Dr. Ömer Atlı: Yes.

Chris Hutchins: So you your argument in one line is that you can't ban the shadow formulary. And I know we AI is here to stay, we've been hearing it. So you're suggesting now that we have to govern it, which is an interesting concept because it's historically that's been kind of treated as an academic exercise. This is not that. what does governance look like when there's no IT department, there's no committee, it's just you.

Dr. Ömer Atlı: Yeah. Yeah. Yeah. Well I it is it is a discussion I think best best left words towards the end. But now I would like to talk about something else. I'd like to raise another issue that well I would like I would like to leave your audience with this because for example, from my perspective, how much of my diagnosis happens before the patient says a useful word? I'm saying this to compare the AI and yeah clinician. So well much of my much of my emergency medicine is in the room, not the transcript that AI seems to have have a knowledge of it. So the model says what the patient chose to say. I but we as clinicians we will see what they could not say or would not say or didn't know they were saying. So the whole skill is not believing the surface, the superficial words the patient says. So it's all about I also open trying to open that patient up and so this is one of the things that AI at the moment is not able to do. So I would like to mention three real pattern three real patterns anonymized cases. So I have had I have had a patient that came w that came in at it as an evening hour and his head was pulled low and Well you say it's and it's a cue actually. At at evening what it's in the evening hours eight PM or something, why his hat is low. So I clock it and I want to ask about wh why you're wearing it like this and ca well can you gently take it off and all. So when he took it off I saw that he has had a sculpt a patchy hair loss, a scalp cyrasis actually he was ashamed of and lesions he was hiding. So he'd never typed that and the chat butt would never ask him to take his hat off because Chatpa would not think well it's eight PM in the evening so why he's wearing a hat. So another another answers another one is that another patient let's just say it comes in and it just gives you the answers as simple as this yes, no, fine. All surface shallow really superficial answers. So an AI will take that at face value and it will land it somewhere tidy and wrong actually. So I but a good condition would slow down. sits and open the patient up. And they will see that there's a real serious story that comes out actually. And this story the pa this patient would never have volunteered it, honestly. And yeah, this also was another well to perspective. And the another angle is that again, AI cannot read the room, it would just read the transcript. So a there was a young woman, it came in and this woman was actually a restless Shy was looking down and well he was jiggling the jiggling her leg whole consultation. So this can say this can tell you well the leg is talking actually. it's it talks about it talks that this could be ADHD, anxiety, fear or something like an loss of a l loved one or anything like that. So you have to read the leg to and actually you have to read the room, you have to read the whole patient, not just the words. So I think is is failing there. And also I think AI would fail at at the real context and the continuity of some patients because I think there is at the moment two things that the models simply doesn't have don't have. So well I'll tell you a context that a man comes in breathless and he his chest is tight, his heart is pounding. To an AI this is heart heart clutch, heart attack or a lung clots, so maybe a transfer to a tertiary center. So but I I would ask the question the algorithm does not ask actually. I would ask the patient has anything happened recently in your life? then he would open up and he would say well I have buried my wife three days ago. So yeah. Now I would still keep the clots on my list on my list actually the lung clots, chest clots, until I have excluded. But because at the moment I also know that

Chris Hutchins: Well.

Dr. Ömer Atlı: grief, the la grief of a loved one and a pulmonary embolism, a lung lot can share the same chest actually. But AI never thinks to ask what happened to you this way or how do you feel? this is one other caveat. And I think the last one I would say is well the continuity of the same physician and the same physician patient interactions, for example. So yeah, I have had a patient in my small district because sometimes you just see keep seeing some familiar faces and you know why what why they're here and what might be the issue with them. So yeah, there is a woman I have seen many times really. Every time she presents in a way that's on a checklist that screams a lung clot or a heart clot, heart attack. Well the first time you work it out fully of course but by now I know her baseline, her triggers, her normal these are real panic attacks and she is fine. once she's once she is heard and settled really. And an AI for to an AI that meets her cult every single time well it could mean a transfer actually. So it would mean over triage. so because the AI doesn't have the memory. So I think the good clinician have the memory and if you hand this history call to a model and you get a transfer, you get a transfer of three hours only ambulance and a terrified woman For something that's I can settle in twenty minutes because I know that woman. I know why sh why what's what's up with her actually. Yeah.

Chris Hutchins: Yeah, that's the that's the core issue is you know, i in in my mind is that the most trusted relationship is that of the patient in their clinician. And, you know, AI's never gonna be able to to bridge that. I mean, y you just have to figure out how do you make it support that encounter. But you you it's never gonna be able to take take that place. It's not gonna recognize body language, not gonna read the room. I mean, from amazing, amazing points you're you're you're raising here.

Dr. Ömer Atlı: Yeah. Yeah. Yeah. Yeah.

Chris Hutchins: as we're kind of g getting towards the the end of our our our conversation, I want to make sure that we you know kind of create the a moment for people who are in different kind of roles and really g make sure that we're we're we're giving them some information that they really need to have at their disposal. So for f if a clinical leader's listening and realizes they already have a shadow formulary running in their building tonight, what's the first move?

Dr. Ömer Atlı: Well we you c we cannot ban this shadow formulary. We cannot ban anything actually. So the the the the governance side has to accept that admitting the ban will not admitting the ban will not work. So it won't stop a frightened patient or a statute charge standard clinician opening their phone and looking at AI. So prohibition we just drives we just drive it deeper into this shadow and hence the name shadow formulary. And governance has to assume the tools is in use and make that use safer actually. And I think the cheapest version is would cost a weekend. I mean, you can keep a standing set of scenarios your clinicians actually face and you can rerun them every time a model updates. Exactly the way we recheck a drug interaction when a prescription changes. So a committee yeah, there's a committee approval that's from six months ago. A committee approval PDF is six months ago, but We forget that these models keep updating maybe every day sometimes some some other days. so a committee approval is a PDF form from six months ago. but a scenario set you rerun is a life control. And I think most air governance is a document, but this one has to execute it really. And one idea I have really comes from the UK system. it's named a clinical safety officer. well It's a person and not a committee who owns the clinical risk of a digital tool and signs signs against it. So the biggest gap, the biggest gap is that because everybody owns it, so nobody really owns it actually. And you if you give the risk a name and a human, and this behavior will just change overnight, actually. because like I said, committee approval is just a PDF. A scenario sets

Chris Hutchins: Yeah.

Dr. Ömer Atlı: you your rerun is a life control. And if I can make one last argument, this is the one actually I I also want your audience to carry. So yeah. Almost every framework and benchmark in this field assumes the well resourced hospital, integrated records, a scanner down the hole, a specialist, a real happy specialist that's ready to answer every call. And that's a tiny s I think that's a tiny slice of where medicine happens and most of the words looks like mine or harder, rural, under resourced, underserved, often in low and middle income countries. So when an AI safety plan is if unsure, then best to get advanced imaging and a specialist. That plan doesn't exist for most patients because well if a model that collapses to a transfer to a higher center every time the data is seen isn't safe in my setting and in many settings. So it's useless or worse Because the transfer itself carries the risk, carries the cost I described actually. The next generation of clinical AI safety has to be resource aware. And let me be clear, that's not lower standard, it's just reality aware. So I am not asking for charity grade AI poor hospitals, AI for poor hospitals. I'm asking the opposite. Build for the hard case first, build for push it for the harder first, push it for to the tilt to the edge, and a model that works with no scanner. patchy connectivity, one clinician and one ambulance will surely work in New York too. And the reverse isn't true, unfortunately. If you build it for the district hospital, pretty much you built it for everyone. And if you build it for academic centre, you have built it s for only few actually. Yeah.

Chris Hutchins: Amazing. Amazing. kind of looking over i into probably one of the more important roles. what what what do you f what would you want to to tell people who are in your shoes as a solo clinician in a in a rural kind of a scenario, you don't have anyone to call. What would actually help them to be safe?

Dr. Ömer Atlı: Well, I would really like them to challenge themselves each and every day with some cases and with patients. I think well there is a saying that listen to the patient, he tells you or he or she tells you the diagnosis. We have to really listen carefully to the patient. We have to not just look at his or her the patient's lips. We have to look at the whole picture, the whole person here and we have to have our own understanding from the from the moment the patient comes knocks at the door and comes in from his walk. and from how he presents himself and what he wears and for example a patient who wears shades because he's he or she is really disturbed by the by the light. You can think about too many too many diagnoses about photophobia and all. So yeah you have to really assess the whole per whole patients and about the AI what I just is what I do you have to first listen to the patient carefully. You would ask everything, you know, whether well it seems unnecessary to you or to the patient, but you have to ask. Because sometimes if you cannot find an answer to your query, at the end of one question, I think you will find one meaningful answer to your qu to your main query. And you can be confident and whether and after you are confident about the diagnosis, I think what you should just be doing is, okay, let me just test test it with Gemini Cloud and PurpleX, maybe chat GPT. let me just introduce this to their latest models, not because not asking for a decision or not asking for anything like what do you think this might be. I know this sounds pretty much like meningitis or acute abdomen or appendices, but tell me based on this profile that's in front of me, what do you think I might be missing? I have these differentials, but I think I might need more. I want to be careful. These are my settings and tell me what you think. So the I will I'm sure it will give a good out good outputs. So after the outputs I think you can cross check, hey, I've asked this, I've asked this. I forgot to ask about this. This is maybe one in million but worth asking. So you would ask this just to be safe and make sure that you're not really missing anything. So this way I think it's really useful models for us clinicians, especially in standard areas like rural districts and all. So yeah that would be pretty much what I suggest. Always test testing your own even your own. We have to stress test our own our own actually understanding our own clinician abilities because sometimes we are so tired we're just or biased towards something we're not aware of and you'll see hey I said something yeah this could be this you say how could I miss this? But it's just that you were biased at that moment and you are tired. And so we have to always stress test each other to be up to date each each and every day and all. And we have to challenge ourselves actually, yeah. That's the overall at what I'd say.

Chris Hutchins: Well thank thank you so much. this has been an a really interesting conversation. I mean I I I I learn so much every time I talk to a to a physician and and today's no exception for certain. I think what really strikes me is I mean just to kind of put a fine point on it again, y I think you're absolutely right. AI safety is not hard the hardest in the flagship hospitals. It just isn't. But that's where the attention seems to go always.

Dr. Ömer Atlı: Yeah. Yeah.

Chris Hutchins: But it's really hardest where medicine is the thinnest. One doctor, no scanner, no backup. that is the benchmark. The rest of the system needs to be measured against. And actually the the place governance has to work first. Dr. Atlı, thank you so much for for joining me today and for an amazing conversation. I'm really excited for our our listeners to be able to hear from you. if folks wanted to have a conversation with you, obviously there's a lot of people that are having to learn.

Dr. Ömer Atlı: Yeah. Yeah. Yeah. Yeah, y absolutely.

Chris Hutchins: in real time dealing with AI, how how can they get a hold of you and if if if you want to kind of share that, I I would love for for our audience to know that and you know what your preferences are, what you know, how how how you prefer to engage.

Dr. Ömer Atlı: Yeah. Yeah. Yeah, I have got my LinkedIn account. I have sent it link via the email, so it's Ömer Atlı. I've also got my own websites, omeratli.com and my email address, dr at omeratli.com. So yeah, I share some daily blogs and my opinions and essays about some some of these topics and all. And yeah, I think it's best that if co the if they call if they want to contact me through those platforms and via those ways. So and also thank you so much, Chris, for your kind invitation and for your kind hosting for being kind to host me today. And it was really a pleasure and an honor to be here. And I was I hope that I have enlightened some physicians as well as some patients and those within the intersection, some builders and all. well it was a pleasure really.

Chris Hutchins: th thank you so much, Dr. Atlı. It it it's it's just an amazing thing that you're doing. I'm I've always felt like clinicians are they're doing God's work. and things are only getting more difficult. So I really, really appreciate your leadership and your voice. for our listeners you'll see everything you need in the in the show notes to r if you want to reach out to Dr. Atlı, you you'll certainly have the the the information that you're gonna need to be able to do that. And clinicians Hope you're hearing this loud and clear. You're not alone. There are people that you you can reach out to who are who live where you are, meaning that they're dealing with some of the same things. You're not the only one. And sometimes the inner politics inside of an organization can be really complicated, but you you have to have the ability to practice medicine in an environment that is actually designed to help you do that. And we have to band together. And this is exactly the reason for.

Dr. Ömer Atlı: Yes. Yeah, yeah. Yes, exactly.

Chris Hutchins: th this platform is I want to make sure that we're hearing from you voices like yours and we're we're getting this right. this is a life and death situation for people. I mean healthcare is not banking. So again, thanks so much for it's it's been amazing for for me and thank you.

Dr. Ömer Atlı: Yeah, yeah, exactly. Most welcome. I thank you so much for kindness.

Chris Hutchins: Mm-hmm.

Why “Academic” AI Models Fail the 3 AM Real-World Stress Test

Show Notes

What We Cover

Key Takeaways

Frameworks & Ideas Mentioned

Chapters

About Dr. Ömer Atlı

Related Resources

Guest

Dr. Ömer Atlı

Follow the Show

Why “Academic” AI Models Fail the 3 AM Real-World Stress Test

Show Notes

What We Cover

Key Takeaways

Frameworks & Ideas Mentioned

Chapters

About Dr. Ömer Atlı

Related Resources

Guest

Dr. Ömer Atlı

Follow the Show

The AI Health Pulse