AI assists clinicians in responding to patient messages at Stanford Medicine (stanford.edu)
67 points by namanyayg on April 7, 2024 | 72 comments


Fantastic. An additional way by which someone looking for help or support from a human can be denied that help and have their needs offloaded to a technology that probably can’t help them.

Here’s how I see this playing out: practices start to use LLMs to answer patient queries to save money. People realize their queries aren’t actually being answered fully (or at all?) and stop contacting their doctors at all. Practices save money and patients get worse outcomes. Exactly as we’ve seen in customer service over the years - talking to a human is nearly impossible.

An alternate possibility is the LLM gives such bad advice that it causes irreparable harm to a patient and nobody can be held liable because the doctor didn’t issue the advice and neither did the company which produced the LLM.

The universe where an LLM used in this situation does anything positive for the patient is vanishingly small. But I’m sure providers and practices will love it, since it effectively will allow them to not do part of their job and increase margins respectively.


> An alternate possibility is the LLM gives such bad advice that it causes irreparable harm to a patient and nobody can be held liable because the doctor didn’t issue the advice and neither did the company which produced the LLM.

I think someone will be liable; this has already been tested in at least one court with something less life-critical: https://news.ycombinator.com/item?id=39378235


The doctor who signs off will definitely be liable. Unfortunately we live in a society where somebody both has to die and have litigious surviving relatives to make sure the problem becomes actionable.


Is it just me, or has customer service never been easier, with real humans easily accessible at most major corporations?


Real humans who are unable to resolve your issue are quite accessible indeed.


Like… who? I find that large banks, telecom, and what not have largely shifted their call centers to either America or the Philippines and you now get much more proficient customer service than, say, 10 years ago when everything was going to India. Chat based humans are also much more efficient imo.


I don't think you understand how stupid and googleable most customer support questions are.

Doctors will have more time for real questions if they aren't answering idiotic questions. You are living in a bubble.


Doctors would also have more time for real questions if they told fat people to exercise and quit eating so many pop-tarts instead of giving them shots with a host of risky side-effects that then have to be managed.


I promise you, doctors have been telling fat people it's their fault for many, many decades. It's one of the most common complaints of fat people in a medical context - that the first response to nearly any medical issue is "well you should lose weight".

https://www.nbcnews.com/health/health-news/doctors-move-end-...

> When Melissa Boughton complained to her OB-GYN about dull pelvic pain, the doctor responded by asking about her diet and exercise habits.

> On this occasion, three years ago, the OB-GYN told Boughton that losing weight would likely resolve the pelvic pain. The physician brought up diet and exercise at least twice more during the appointment. The doctor said she’d order an ultrasound to put Boughton’s mind at ease. The ultrasound revealed the source of her pain: a 7-centimeter tumor filled with fluid on Boughton’s left ovary.


A silver lining of being mentally ill is that you don't suffer from the diseases that normal people do. Doctors know whatever the complaint it's due to the mental illness.


So the doctor gave the best correct general health advice to the patient, AND did the appropriate test, and appropriately reassured the patient regarding the pre-test probability of a good vs bad outcome of the test AND followed up the rare but possible bad test finding appropriately? Would you prefer the doctor give factually incorrect advice in order to avoid upsetting a patient?

It’s interesting that you frame it as “faulting” the patient - if a patient comes in with shortness of breath, should doctors avoid asking patients if they smoke? If they ask a patient about previous surgery, are they blaming the surgeon? If a doctor asks about family or work history, are they criticizing family or work choice respectively?


That’s… certainly a way of reading this.


In this case it’s the right way of reading it.


It really isn't.

https://www.nature.com/articles/s41591-020-0803-x

> Evidence suggests that physicians spend less time in appointments and provide less education about health to patients with obesity compared with thinner patients, and patients who report having experienced weight bias in the healthcare setting have poor treatment outcomes and might be more likely to avoid future care. Obesity also adversely impacts age-appropriate cancer screening, which can lead to delays in breast, gynecological, and colorectal cancer detection.

> Negative influences on engagement with primary care were evaluated and ten themes were identified: contemptuous, patronizing, and disrespectful treatment, lack of training, ambivalence, attribution of all health issues to excess weight, assumptions about weight gain, barriers to health care utilization, expectation of differential health care treatment, low trust and poor communication, avoidance or delay of health services, and seeking medical advice from multiple HCPs.


… Doctors have, of course, been doing that for about a century. It isn’t particularly effective.

Ozempic actually is effective, and will likely lead to significant improvements.


Are you really trying to blame obesity on doctors? Impressive.


It is in a sense their fault for prescribing something (diet and exercise) we have known is utterly ineffective for over a hundred years. Likely because until now there weren't non-surgical alternatives, and caloric restriction and exercise do have other non-weight-loss benefits.


[flagged]


No, they’re ignoring the medical fact that “tell people to lose weight” has a basically zero percent success rate.

The only intervention known to work reliably was bariatric surgery until stuff like Ozempic showed up recently.


> No, they’re ignoring the medical fact that “tell people to lose weight” has a basically zero percent success rate.

Lots of people aren't overweight, so apparently it works on a lot of people. Eat less is perfectly fine advice; it might not work for everyone, but it works for everyone who can keep to it. Some need additional help after hearing that advice to ensure they actually eat less, others are fine with just that.


Note that GLP-1 receptor agonists are a whole class of drugs, and they have been known to be effective in treating diabetes as well for a while.

Recent commercialization and buzz is definitely new and there have been new formulations of course.

It is quite remarkable that people still think simply telling someone to "lose weight" will work, in this day and age.


The reality is that insulin isn't a treatment for Type 2 -- it's a treatment for Type 1. Type 1 is basically internal starvation. Your cells can't obtain the glucose they need from your blood because your body can't produce insulin.

Type 2 is the exact opposite. Your cells are saturated with glucose and your body literally can't produce any more insulin to stuff that glucose away. Giving insulin just forces the excess blood glucose into tissue and causes additional fat gain. The additional fat gain makes Type 2 worse. Weight loss is the treatment for Type 2.

We've kind of known this since 1915 when Type 2 was treated with fasting. Right up until insulin was discovered. [1, 2, 3]

That's why GLP-1/GIPs work for diabetes. They make you not hungry, so you don't eat, so you lose weight, and presto, no more insulin resistance.

> [...] we now understand that people with type 2 diabetes who lose significant weight and improve other factors related to diabetes can achieve remission. [3]

Fun fact.

[1] https://www.endocrine.org/news-and-advocacy/news-room/2022/i...

[2] https://jamanetwork.com/journals/jamanetworkopen/fullarticle...

[3] https://www.niddk.nih.gov/health-information/professionals/d...


Cutting losses to spend time with other patients isn't ignoring that little medical fact. It's an acknowledgement of it; all the effort spent treating the symptoms of obesity is basically wasted, squandering the resources (time particularly) of the medical system.


Except the post directly criticizes one of the major new ways of reducing that effort.

Someone genuinely worried about obesity wasting doctors' time should be all for stuff like Ozempic.


As far as I see, the only one directly talking about Ozempic is you.


I welcome your alternative explanation of what "instead of giving them shots" meant.


Are you suggesting that the purpose of the medical system is not to treat medical conditions? I suppose if they took that tack, yes, the system would be more efficient.


> Doctors would also have more time for real questions if they told fat people to exercise and quit eating so many pop-tarts instead of giving them shots with a host of risky side-effects that then have to be managed.

There is no scientific evidence, whatsoever, that diet and exercise is an effective way of losing a clinically significant (>5%) amount of weight and keeping it off for a long period of time (5 years). Go ahead and try to find even one study that shows this is the case.

When you diet and exercise, your basal metabolic rate slows down as much as 20-30%, permanently, and your hunger increases. Your BMR is where the vast majority of your energy expenditure goes, no matter how much you work out. In fact, there's reason to think that more exercise will actually slow your BMR. Body weight set point is principally genetic and epigenetic, as evidenced by twin studies.
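A rough back-of-the-envelope sketch in Python (every specific number here is an assumption for illustration, not from any study, other than the 20-30% adaptation range mentioned above) of why that adaptation swamps what exercise adds back:

    # Illustrative arithmetic only: the BMR values and workout burn are assumptions,
    # not measurements; the 20-30% adaptation figure is the one cited above.
    baseline_bmr = 2000                             # kcal/day before dieting (assumed)
    adaptation = 0.25                               # midpoint of the 20-30% slowdown
    adapted_bmr = baseline_bmr * (1 - adaptation)   # ~1500 kcal/day afterwards

    workout_burn = 300                              # kcal for a typical gym session (assumed)
    daily_penalty = baseline_bmr - adapted_bmr      # 500 kcal/day lost to adaptation

    print(f"Metabolic adaptation costs ~{daily_penalty:.0f} kcal/day, "
          f"more than a {workout_burn} kcal workout recovers.")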

Maybe we'd make some progress on this particular topic if we stopped throwing out tired tropes and blaming people. The only scientifically proven methods of achieving significant, long-term sustained weight loss for most people are GLP-1/GIPs or bariatric surgery (but even there, only a gastric sleeve or roux-en-y work, lap bands do not).

Here's a 29-study meta-analysis which walks you through what I said in more detail [1], and of course there's the famous Biggest Loser study, where everyone on that show regained all the weight in the six years following. The more they lost on the show, the more they gained back. [2]

Let's not even get started on the use of insulin to treat type 2.

It's pretty wild how backwards our approach to metabolic health is from a clinical perspective. Your response here is a perfect example.

Now look at Tirzepatide. 90% success vs. 5% success. [3]

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5764193/

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989512/

[3] https://www.nejm.org/doi/suppl/10.1056/NEJMoa2206038/suppl_f...


Why is there always this kneejerk "bad!" reaction to AI? You don't think there's any conceivable way that this may actually help people? You don't think that Stanford Medicine may know a bit more about the problem space than you do?


Because “AI” or LLMs have no understanding of what they’re producing, and every single attempt to use them in an environment where basic factual correctness is involved has confirmed this?

Or the fact that the entire sales pitch is “this will be cheaper than actual people who know what they’re doing”, maybe with a side claim of “outperforms people on our training set” coupled with “we accept no liability for inaccurate advice or poor outcomes”.

The sole purpose of programs like these is to save money. It has nothing to do with helping anyone. The goal is to sound like your system is providing the service people paid for, without actually spending the money to provide that service.

I will believe a company is doing the right thing when the developers personally accept the criminal and legal liability that an actual doctor or nurse providing the service is subject to.

But then, to add insult to injury, this just further increases wait times by forcing people to deal with the “AI” before they get through to a further-reduced pool of actual specialists, with correspondingly longer waits there as well.

This is not rocket science to understand, and literally nothing I’ve said here is new - not the clear prioritization of cost cutting, nor the overselling of “AI”, nor the generally unsafe use of randomized statistical models to provide factual information.

My question is why you think that this time is different vs literally every other example?


I think it's conceivable but if the goal was to help people it would already have been done: address the systemic roadblocks that prevent more people from studying medicine, increase the supply of doctors and clinicians so that people don't have to wait months for a routine appointment.


That's a false dilemma. We can do both. Some people are addressing roadblocks (or attempting to; politics are hard), and some people are working on AI-assisted support.

I'm as skeptical of GenAI as they come, but let's not pretend that the people working on GenAI customer support are the same people who should be working with the AMA to address the artificial constraints in medical education.


> That's a false dilemma. We can do both.

The thing I notice about every argument where I see "we can do both" is that in reality we never actually do both.

That's why people get so adamant about choosing their way instead of the other way. There are always two options, and we could do both, but we never, ever seem to.


Who’s this “we,” kemo sabe?

Let’s summarize the situation here. A group of people have chosen to work on GenAI customer support. You’re criticizing them for not working on a difficult political problem instead.

Meanwhile, what are you doing? Which of the political activist groups working on the problem you care about so much have you joined? How have you offered your support?

If this issue is so important to you that you think “we” should be devoting our full attention to it, surely you’re out front leading the charge, right?


There is by far enough enthusiasm for AI already.

>You don't think there's any conceivable way that this may actually help people?

I don't. I believe that exactly as OP described people will notice they are talking to some AI and just give up, as they correctly understand that the AI does not have the abilities and understanding that might actually help them.


Because the vast majority of commercial deployments of "AI" (LLMs) are extremely unhelpful chat bots that just waste customers' time to try to avoid having a human talk to them.


Decent LLMs are < 1 year old. How much commercial deployment have they seen?


Too much


Too much where? Let's ignore AI "startups" with no business idea beyond reselling system prompts for GPT-4 to end users. Has your bank had AI since last summer? A clinic? Any utility service? It is unlikely for most established businesses, because they cannot move that fast. I just did a quick search on "{bank,hospital,...} integrated gpt-4 for client support" and similar queries, and there are various "will be" and "may use" answers, but nothing specific. I believe ggp is conflating regular chat bots with integration of SotA in their statement.


Off the top of my head: Air Canada, Shopify, IKEA, car dealerships, ... It's common enough that I'm sure I've seen more than that; those are just the ones I remember because they've either personally annoyed me or been in the news.

Health things around here have legal restrictions that definitely prevent them from using GPT, thankfully.


And the vast majority of commercial deployments of real humans to solve the same problem are extremely unhelpful and just waste customers' time. What's your point?


This is because this is what the tendency has been like in practice for many years. While the possibilities are good, living in the world of what-if for ML is very similar to the world of what-if for blockchain. It is very promising, but unfortunately the past trend, at both a surface level and in its deeper dynamics, does not seem to agree with it being a solid motif long-term.

This is why people are pessimistic about ML/AI in people-facing positions. It has happened with automation in the past, it is _currently happening_ with ML/AI systems in customer-facing applications, and the momentum of the field is headed in that direction. To me, it would be very silly to assume that the field would make a sudden 180 from decades of momentum in a rather established direction, with no established reason other than the potential for it to be good.

This is why people generally tend to be upset about it, as I understand it. It's a good tool, and it is oftentimes not used well.


> Why is there always this kneejerk "bad!" reaction to AI?

I don’t interpret the OP as slamming AI. It’s criticism of the way in which companies find ways to save a few dollars.


Given the long, unfortunate history of computerisation of medicine, I don’t think there should be a presumption that they necessarily know what they’re doing. If this is a disaster, it won’t be the first, and some previous medical-technical disasters have involved human factors; the operator either becoming too trusting of the machine, or too casual with its use.


“Just review the AI draft” is likely gonna kill some people by priming clinicians to agree with the result.

https://www.npr.org/sections/health-shots/2013/02/11/1714096...

> He took a picture of a man in a gorilla suit shaking his fist, and he superimposed that image on a series of slides that radiologists typically look at when they're searching for cancer. He then asked a bunch of radiologists to review the slides of lungs for cancerous nodules. He wanted to see if they would notice a gorilla the size of a matchbook glaring angrily at them from inside the slide.

> But they didn't: 83 percent of the radiologists missed it, Drew says.


If you put an ascii art comment block of a gorilla in a code base and told me to “find the race condition”, I would probably skip over the gorilla.

At the end of the day the metric of success for these doctors is to find cancer, not impossible images of gorillas.

The study is kinda “funny”, but it doesn’t really mean anything. Even the article itself does not claim that this is some kind of failure.


The point of this study is that if you’re told there’s a race condition in the code, you are much more likely to miss a big security hole in it.
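A hypothetical sketch of what that looks like in code (both the function and the bugs are invented for this comment, not from any real codebase): told to find the race condition, a reviewer will likely zero in on the check-then-act bug and read straight past the injection hole sitting next to it.

    import os
    import sqlite3

    def save_report(db_path: str, user_id: str, body: str) -> None:
        # The bug the reviewer was primed to find: the existence check and the
        # file creation are not atomic, so two callers can race (classic TOCTOU).
        if not os.path.exists(db_path):
            open(db_path, "a").close()
        conn = sqlite3.connect(db_path)
        # The "gorilla": string-formatted SQL, an injection hole that someone
        # hunting only for concurrency bugs can easily overlook.
        conn.execute(f"INSERT INTO reports (user, body) VALUES ('{user_id}', '{body}')")
        conn.commit()
        conn.close()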


A better study would be to place an unlikely but detection-worthy artifact into the image. Say, you're looking for a broken bone and you put a rare disease in an unrelated part of the body, but visible in the radiograph. Bet the clinicians spot it. Because in the real world, this is how things get caught. But I'd love to know the missed detection rate.


This has been studied in various ways. Lots of perception effects documented. For example:

https://cognitiveresearchjournal.springeropen.com/articles/1...

> For over 50 years, the satisfaction of search effect has been studied within the field of radiology. Defined as a decrease in detection rates for a subsequent target when an initial target is found within the image, these multiple target errors are known to underlie errors of omission (e.g., a radiologist is more likely to miss an abnormality if another abnormality is identified). More recently, they have also been found to underlie lab-based search errors in cognitive science experiments (e.g., an observer is more likely to miss a target ‘T’ if a different target ‘T’ was detected). This phenomenon was renamed the subsequent search miss (SSM) effect in cognitive science.


So give the reviewer a rubric of things you care about. Race conditions, security, etc.

The point is, an image of a gorilla is literally completely impossible and has no relevance. That’s why a security vuln is not analogous


In the case of radiology, that rubric is "anything clinically significant you can find". A kid being treated for possible pneumonia might have signs on a chest x-ray of cancer, rib fractures from child abuse, an enlarged heart, etc.

There's a risk to "rule out pneumonia" style guidance resulting in a report that misses these things in favor of "yep, it's pneumonia".


With all due respect there is so much absurdity in the assumptions being made with your linked article that it is almost not worth engaging with. However, I will for educational purposes.

As someone who is trained in and comfortable reading radiographs but is not a radiologist, I can tell you that putting a gorilla on one of the views is a poor measure of how many things are missed by radiologists.

Effectively interpreting imaging studies requires expert knowledge of the anatomy being imaged and the variety of ways pathology is reflected in a visibly detectable manner. What they are doing is rapidly cycling through what is effectively a long checklist of areas to note: evaluate the appearance of hilar and mediastinal lymph nodes, note bronchiolar appearance, is there evidence of interstitial or alveolar patterns (considered within the context of what would be expected for a variety of etiologies such as bronchopneumonia, neoplasia, CHF,...), do you observe appropriate dimensions of the cardiac silhouette, do you see other evidence of consolidation within the lungs, within the visible vertebrae do you observe appropriate alignment, do the endplates appear abnormal, do you observe any vertebral lucencies, on and on and on.

Atypical changes are typically clustered in expected ways. Often deviations from what is expected will trigger a bit more consideration, but those expectations are subverted during the course of going through your "checklist". No radiologist has "look for a gorilla" in their evaluation.

It is pretty clear that the layperson's understanding of a radiologist being "look at the picture and say anything that is different" is a complete miss on what is actually happening during their evaluation.

It's like if I asked you to show me your skills driving a car around an obstacle course, and then afterwards I said you are a bad driver because you forgot to check that I swapped out one of the lug nuts on each wheel with a Vienna sausage.


My dad is a radiologist and… not so dismissive of this study. Missing other conditions on a reading due to a focus on something specific is not uncommon.

Things like obvious fractures left out of a report.

https://cognitiveresearchjournal.springeropen.com/articles/1...

> For over 50 years, the satisfaction of search effect has been studied within the field of radiology. Defined as a decrease in detection rates for a subsequent target when an initial target is found within the image, these multiple target errors are known to underlie errors of omission (e.g., a radiologist is more likely to miss an abnormality if another abnormality is identified). More recently, they have also been found to underlie lab-based search errors in cognitive science experiments (e.g., an observer is more likely to miss a target ‘T’ if a different target ‘T’ was detected). This phenomenon was renamed the subsequent search miss (SSM) effect in cognitive science.


The study you linked effectively reinforces the points I made above. Given the search pattern used and the comments I made before about expectations maintained during a read, it follows that the described SSM effect is a source of errors.

Putting a gorilla on a view and then posting to NPR a sensationalized article about how "83% of silly radiologists just can't see what you managed to see" is not that.

In fact I would argue the SSM effect is present in many aspects of medical decisions and likely other industries. The other way to frame the SSM effect is to call it the "this case has the initial patterns of the thousands of other routine cases of disease X, so it is almost guaranteed to be disease X and I have to get home for my daughter's dance recital." effect. It's an optimization strategy that works most of the time.


Doctor here. Agree with this. Plus, when requesting an imaging study, we provide the patient’s history and list of differential diagnoses. The radiologist looks at the X-rays in that context.

Interpreting X-rays in isolation has little relevance to actual clinical practice.


Exactly. Everyone laments provider shortages and resource issues, but when the people doing the work optimize their strategies to get through their work faster, we get dumb articles like the NPR one linked above.

As helpful as it is for rads to catch unexpected incidental findings, that's not the point. If I'm waiting on a stat read, I've got specific questions I'm looking to have answered quickly. I literally don't want the radiologist wasting time hunting for random oddities if it delays getting the reads back.


From what I've seen so far, some will absolutely notice and complain loudly about the mistakes that AI makes, and some won't --- praising it as another great solution. I suspect this will create another highly divisive issue.


Based on the code that CoPilot has produced for me, I can 100% see that happening, and rather a lot.

So far, the code it produces looks really good at first glance, but once I really dig into what it's doing, I find numerous problems. I have to really think about that code to overcome the things it does wrong.
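A made-up example of the kind of thing I mean (not actual CoPilot output): this reads perfectly reasonably at a glance, but the mutable default argument quietly shares state across calls, so later calls silently drop items an earlier call already saw.

    def dedupe(items, seen=[]):
        # Looks fine at first glance, but the default list is created once and
        # reused across calls, so "seen" keeps growing between invocations.
        out = []
        for item in items:
            if item not in seen:
                seen.append(item)
                out.append(item)
        return out

    print(dedupe(["a", "b", "a"]))   # ['a', 'b']
    print(dedupe(["a", "c"]))        # ['c'] -- 'a' silently dropped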

I think many doctors are used to doing that already, since they have to correct their interns/assistants/etc when they come up with things that sound good, but ultimately aren't. But it's definitely something that they'll fail at sometimes.


This is what I really worry about in programming contexts. The norm is already for a distressing proportion of engineers to barely (if even that) understand code they’re working on, even when they’re the only author.

In that situation it’s extremely easy to just blindly trust an “expert” third party, particularly one where you can’t ask it questions about how and why.


They don't even deserve to be called "engineers". They're just guess-and-test amateurs.


I haven't finished reading the paper closely, but it is disturbing that this is the only real reference to accuracy and relevance I could find:

> Barriers to adoption include draft message voice and/or tone, content relevance (8 positive, 1 neutral, and 9 negative), and accuracy (4 positive and 5 negative).

That's it! Outside of that it looks like almost all the metrics are about how this makes life easier for doctors. Maybe there is more in the supplement; I have not read it closely.

Look: I am not a policymaker or a physician. It's quite plausible that a few LLM hallucinations is an acceptable price to mitigate the serious threat of physician overwork and sleep deprivation. And GPT-4 is pretty good these days, so maybe the accuracy issues were minor and had little impact on patient care.

What's not acceptable is motivated reasoning about productivity leading people to treat the accuracy and reliability of software as an afterthought.


Those are just free-text comments from the ~80 physicians who used this tool. Tables 2-4 show the researchers measured various characteristics like message read/write time, utilization rate, workload, burnout, utility, and message quality.

It's also worth noting that, from the abstract, the study's objective isn't to study the LLM's accuracy. This is a study of the effectiveness of this drafting system's implementation in the hospital. I'm not saying the accuracy isn't an important component of the system's effectiveness, but it's not the question they're answering.


My point is that "accuracy is not the question they're answering" is a fatal flaw that makes this research pointless.

Say I released a linear algebra library and extolled its performance benefits + ease of development, then offhandedly mentioned that "it has some concerns with accuracy" without giving further details. You wouldn't say "ah, he focused on performance rather than accuracy." You'd say "is this a scam?" It is never acceptable for healthcare software researchers and practitioners to ignore accuracy. LLMs don't change that.

The only thing that sort of cynical laziness is good for is justifying bad decision-making by doctors and hospital administrators.


It is worth studying the quality of the generated messages, but the system's effectiveness does not hinge solely on the quality of those messages.

There's not a singular metric like accuracy that summarizes the quality of a generative system. What would an "accurate" message be? I can think of several aspects: mimicking the doctor's tone, addressing the correct patient concern, not providing clinically false information, not being too long, using the correct language. A good statistical characterization of this system requires rating multiple aspects of the content. These researchers measure the system's quality through user surveys and interviews. Is there a better ground truth to compare it to?

There's an important distinction between accuracy and effectiveness. These researchers are interested in the system's effectiveness. An algebra library can't be effective without being 99%+ accurate. Plenty of tools can still be useful without being perfectly accurate. Diagnostic tests are a great example. Conversely, you can have tests that are 99% accurate, but not useful, depending on the context and implementation.
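A quick sketch of that "accurate but not useful" point, with made-up numbers (1-in-1000 prevalence, 99% sensitivity and specificity): the test is 99% accurate, yet most positives are false, so accuracy alone doesn't tell you whether the tool helps in context.

    # Illustrative numbers only: rare condition, highly "accurate" test.
    prevalence = 0.001
    sensitivity = 0.99
    specificity = 0.99

    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)

    print(f"P(disease | positive test) = {ppv:.1%}")   # ~9%: most positives are false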

I think that's why these researchers are less concerned with accuracy. Effectiveness is a separate concern from quality/accuracy and can be studied separately. They're different research questions.


Notice how all the benefits they mention are for physicians and not for the patients. That's a pretty backwards approach to healthcare.


It's a benefit for health system C-suite execs because it makes physicians "more efficient," and it's ultimately the health system C-suite who will be signing off on the contracts for five years of AI messenger service or whatever, not the physicians.


Physician experience is important for patient experience. Doctors are by and large overworked, and responding to too many patient messages is a major contributor to burnout.


Approaching the problem of doctor burnout by spamming patients with half-correct responses to important medical questions is not a solution.


Stanford Medicine in a nutshell. Stanford Medicine has spread like a cancer around me, and all the clinics that were acquired immediately became awful.


Yes, that's how 'reputation laundering' works: just acquire all clinics and fool people with the tag "Stanford".


Likely paying top dollar to deal with cheap first-line support. Probably a "direct to human" fee will be available soon.


I bet they’ll still charge the regular “Physician Fee” though


this is horrifying



