> They performed shockwave therapy on my shoulder even though a recent clinical practice guideline says clinicians should not use or recommend shockwave therapy for rotator-cuff tendinopathy without calcification; I was told during ultrasound that there was no calcification.
Ultrasound isn't a great way to assess for calcification. It'll find large calcifications but easily miss small ones. A plain radiograph would be more helpful, but the MRI may have revealed it as well. Either way, shockwave therapy isn't harmful in the absence of calcification--it's just not helpful.
Edit: when a radiology report says something isn't present, there's always an implicit caveat that the finding isn't present within the context of the modality and images obtained. So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon, but clarifying this in reports would make them sound even more qualified, "hedgey", and annoying to read than they already are.
This is being overly nice, I think. Anyone who doesn't understand this is an idiot imo. You would have to assume that every type of diagnostic instrument has infinite clarity and is always correct to be confused in this case.
Reminds me of the Babbage quote where somebody asked him: if I put the wrong figures into this computing device, will it still give me the right answer? His response, paraphrased: "I cannot fathom the logic of the minds which would come up with such a question".
Edit: I should mention that ultrasound is basically unusable for evaluating bones. Sound waves can't penetrate bone, and so you end up just seeing a huge black void. That's a huge orthopedics use case that ultrasound just can't serve. However, ultrasound is fantastic for evaluating muscles, ligaments, tendons, and other superficial soft tissues.
Since MRIs are more expensive, private doctors might order them instead of an ultrasound.
(I'm a doctor)
Any comment that doesn't start with this or a similar qualification should be taken with a grain of salt (yes, including this one).
Medical imaging is one of those things everyone thinks is simple because they don't know what they don't know. I'm a cardiac sonographer, and I have to assume radiologists hear at least as many eye-rolling takes on AI coming for their job as I do.
Full sarcasm, is there one that’s more immune?
I know my anatomy etc. and have done a short stint in ultrasound. I have no idea what you are doing or looking at and can identify pretty much nothing.
Echo techs are going to be around a lot longer than MR techs.
Someone on reddit claiming to be a radiologist claimed that.
I wonder where the savings will go when those jobs are gone.
The radiologist I know does not, but they are paid very well (and these numbers are always dumb when you're not sure if they're living in Manhattan vs literally anywhere in Kentucky)
Like most medicine, a large % of the job could be done by any decently talented person willing to follow instructions and shadow for a few months.
Like most medicine, the remaining % is what you're paying for, because it is literally life and death and you can't do things like "pull the logs" or "lets turn it off and take it apart" or "huh i need to put this down and come back later". Even in radiology, because "well lets just do it again to be sure" is often not a viable option.
While there is a problem in how we have inflated the cost of education for medical fields, the insane health insurance issues (US obviously, but it does have some effect globally when the expert radiologist you hire from the US to help with research costs that much), and probably some better ways to approach splitting the work for the entire field, like most professions dealing in life or death, medicine likely will always be paid well.
This really is key. We know we can't trust the AI, but at the same time we're also more comfortable asking the AI for clarifications or confronting it. Not having a time-bound appointment or paying by the hour helps a lot. But even then, more information doesn't necessarily help!
I once brought my 11-year-old car, a Civic with 150k miles, to multiple garages. I figured I'd play the "second opinion" game to correlate what the garages recommended to decide on what needed to be done...
I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.
A mystery is worse. With each additional piece of data, the goal gets farther away. Everything is more and more confusing.
(Popularized by Malcolm Gladwell)
Everything is a puzzle: there is one "Truth" or one diagnosis. You (a smart human) should be able to converge on it by cross-examining your LLMs. By themselves, they have no interest in revealing this, no stakes, which makes them tools only useful at the hands of a capable investigator.
What makes you think this is fundamentally different from cross-examining ELIZA? There is no guarantee that the LLM will help you converge on anything. Indeed actually calling out an LLM on BS tends to eventually produce an "I don't know and can't help you further" answer (as it should).
When I ask a question outside of my domain of expertise I like to ask all of the LLMs I have access to. I also create separate sessions and ask the same question multiple ways.
It’s revealing to see how many different and contradictory answers I get, most of which are presented confidently.
The last time I ran a medical question through Claude I couldn’t even get consistent answers between sessions.
It’s also scary how easily you can lead each LLM to the answer you have in mind. When I would start asking questions about different options that other LLMs had presented, each session would drift toward that explanation.
You might end up with the answer from the most persuasive LLM, but you might also end up with better results.
Wonder if there is a paper out there on this.
I'd argue that AI _can_ currently provide that, but that it can't do it _reliably_, and that to non-experts it's impossible to differentiate, which makes it all the more dangerous.
What is needed are studies that will take a cold look at the actual results, because AI seems to be required to be perfect or it is useless. It just needs to be as good as a human for most stuff, but in the long run it will be much better. At least that's what extrapolating current reality shows us.
Some of this might be applicable to LLMs, but some isn’t and much of it would be resisted. This is one reason we’re not likely to get “as good as a human” because at some level we’re not optimizing for the outcomes; we’re optimizing for speed, convenience, some participant’s economics, and underlying beliefs.
Given the complexity of the human body, a diagnosis is a compound output of experience, knowledge gained throughout a career, and diagnostic methods/equipment. The title (like Dr) is a certification imposed by the state so it's "safe" to let people practice since they passed "the bar", but that doesn't imply everyone will treat patients the same way.
Some specialists update their knowledge monthly, some yearly, and some don't do it at all. There are so many variables in play here (geo, politics, even weather haha).
Having said that, choosing the specialist is really important, as is getting opinions about their practice and their speciality. You can only maximize your chance of getting the right diagnosis; don't expect to get it right just because somebody is called a Dr.
In a community largely made of people whose job it is to produce such functions, I'd say it's to be expected
There is absolutely one "The Diagnosis". Human body is a machine, albeit a very complex one, and all measurement sources have noise. But they are all measuring one reality, and if there is a problem, there should be one explanation that all measurements align with. They can be noisy but can never be conflicting (instrument error notwithstanding).
Physicians' ability to arrive at "The Diagnosis" would vary, but that does not mean one does not exist. I am not sure whether characterizing the human body as deterministic or not is relevant here.
I also had a pretty painful shoulder issue at one point, where the pain just wasn't subsiding for months. I tried massages and acupuncture as I didn't want to do surgery, but it wasn't helping at all. The thing that fixed it for me was just really focusing on doing pull-ups. I couldn't do them at all when I started, so I began with dead hangs and scapular pull-ups, eventually progressing to regular pull-ups, and then training with a "grease-the-groove" method once I could get a few per set. I stopped the training schedule once I was getting in around 17 pull-ups per set, and now just do 6 sets of about 7-8 pull-ups 3x per week spaced throughout the day. I'll also do some shoulder mobility drills [1].
Whenever I get lazy about keeping up with them, discomfort inevitably starts arising again, but it goes away once I get back to strengthening.
It really seems like if you, as a patient, go looking for a quick fix, that’s what you’ll be offered. And if you educate yourself a bit and then go for the best fix for you, you usually get that.
With calcifications, physio without the shockwave component definitely doesn't allow going back to the normal gym routine. It's just not enough.
Before I was admitted, I quickly found another radiologist, who diagnosed pneumonia instead. I sent his report to the chief doctor at the tuberculosis hospital, and after some deliberation they concluded that the original reading was wrong. Turns out the doctors there can't read scans at all and just believe whatever a radiologist says...
The funny thing is, they had already officially put me on the tuberculosis register and didn't want to admit they had made a mistake. So instead, they simply gave me another paper saying that I had been cured of tuberculosis by them... in 7 days. I'm probably the only person in the country to defeat tuberculosis in a week :)
So if you don't trust the radiologist/doctor, maybe find another doctor if you can afford it? You can compare their conclusions and see if they match. Two unrelated doctors or radiologists saying the same thing is probably about as close to the truth as you're going to get. I'm not sure though whether I should trust AI or humans more. AI can hallucinate, but I've been misdiagnosed by humans so many times too...
I forgot to mention that, besides getting a second opinion from another radiologist, I also took a more modern test at another private clinic. That test has better detection rates than the one the state clinic used, and it came back negative too.
I have suspicions they had some kind of government quota to keep the hospital staffed with patients in order to receive funding. Or they were just completely incompetent. I pushed back by bringing them another radiologist's report and the results of a better test that I paid for myself, so I guess they decided to back down.
ChatGPT surfaced an NIH study that concluded that 20% of people have allergic reactions that are isolated to a body location, and that shoulder "skin prick" testing may not reveal. I asked him about that and he said "that's not how allergies work". Full stop. He was unwilling to even look at the study.
He prescribed a CPAP and regular nebulizer treatments. Side story: the CPAP place sent me an SMS message that I couldn't tell apart from a phishing attempt, and when I reached out to inquire who they were they never replied.
So I decided: Let me just try taking a second-gen allergy tablet every day and see what happens.
My sinus infections have gone away. Previously I was getting a major sinus infection at least quarterly. Maybe he's right that allergies don't work that way, but allergy tablets have absolutely solved my problem. Which I'm thankful for because I tried a CPAP for a solid month a few years ago and I just could not get used to it, and was sleeping like crap.
Which moves us to the next two issues: liability and time. Any time you ask someone to revise a decision, especially with the stakes the medical profession has, you find that nobody has the time or the inclination to open themselves up to a mess.
Now, if you really want to be successful, you have to suggest the tests used in the study before they even have a case with you, and especially before the diagnostic loop closes, since that has the best chance of looking at the right thing. Just be straight that you walked in with a theory. Doctors notice when they're being steered way faster than they notice when you're actually right. That's how you work with systems staffed by an overworked mass of people trying their best.
My problem is that I needed information from 2 ENT visits to feed into ChatGPT to get that study. On the first visit he scoped my sinuses and immediately said "I can see evidence of allergic reaction, see those white bumps?". On the second visit I got an allergy stick test and it came out negative.
Those helped lead to that NIH study. It would have been very hard to have walked in with that study in hand.
All I can find is about 1st gen antihistamines (i.e. Benadryl, which I doubt many people take daily, because of the drowsiness).
Even for those, evidence seems to be mixed at best. "Huge increases" seems like hyperbole.
https://www.myalzteam.com/resources/zyrtec-and-alzheimers-me...
There IS one year-old finding that suddenly stopping Zyrtec after daily 3-month use may lead to nasty itching, and if that happens you can re-start and then taper off. https://www.fda.gov/drugs/drug-safety-communications/fda-req...
Only first-generation antihistamines with anticholinergic effects are associated with cognitive decline in elderly patients.
Yet here we are, warning each other about the dangers of LLM hallucinations. Humans "hallucinate" (provide random authoritative-looking information without anything to back it up) pretty often too.
Current Siemens MR software ‘Deep Resolve’ makes up the signal (adding about 50%), then makes up every second pixel, and then, for 3D sequences, makes up every second slice. It’s knocking about 59% of the time off each sequence. And it’s really really good. I’m an MR tech.
After years of collecting artifacts and errors, I have more and more respect for the tool.
But it’s jarring. I open a sequence, decrease the acquired resolution, add the AI and get a scan that’s quicker and higher resolution.
It’s an amazing time to be an MR tech.
No radiologist is buying "AI" scanners. Radiologists are probably among the most jaded of an audience about the word "AI" due to decades of undelivered promises. AI is synonymous with "worthless trash" to them, not to mention everyone says "AI" is going to put them out of work. lol
https://marketing.webassets.siemens-healthineers.com/2861d15...
Actually, I'm curious what ChatGPT 5.5's Elo is. I wouldn't be too surprised if it's 2000+ just from its basic understanding of chess principles from all the content it has digested.
LLMs truly are marvels with text but anything spatial seems to really mess them up, somehow.
Dr. GPT is a good brainstorming tool. It helps synthesize information in a way that primary texts don’t. But it does force you to say “that doesn’t make sense”.
I do think that people saying “doctors don’t know the state of the art” have a weaker case. If you think about it in terms of token density during pretraining and how post training datasets are constructed, I think it would take us a very long time to adapt to any fundamental shifts. If we have forgotten how to cure scurvy, how many journal articles would it take before we adapt to a discovery?
Again, this is just one single person's experience. So not worth much.
I don't understand why doctors don't prompt LLMs before saying wrong things. Is it ego?
I can understand for radiology because you need a specialized convolutional network, but for more knowledge based things...
I think we’ll see a lot of specialized VLMs that provide real value.
And yea, I already did all the standard things. CBT for insomnia helped somewhat. My insurance didn’t fully cover it either, unless I was willing to wait for 8 to 12 months.
And I recently met someone with slow moving metastatic cancer. Thanks to LLMs they will most likely live another 3 to 5 years extra since the Dutch conventional mainline treatment hasn’t been taken yet. But it is German doctors that helped them and Belgian doctors that pointed out in a second opinion that a lot more can be done.
LLMs have a part to play. The false positives are awful, but I have seen an average of 5 out of 10 care when things become too complicated.
Except for trauma treatment. The Dutch healthcare system is amazing once they diagnose classic PTSD.
So it’s definitely not all bad but the trust I had when I was younger has been eroded quite a bit and LLMs can meaningfully step in, in my case at least.
[1] I know there are worse systems. But from what I have heard there are clearly better systems nowadays. It has slipped a lot
So 3 days out of 7 days I have guaranteed good sleep. The other 4 days are a toss up. But an average of 5 days of good sleep is much better than 3.5 days out of 7 days.
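The arithmetic behind that comparison can be spelled out as a rough sketch. It assumes "toss up" means a 50% chance of a good night, which is what the 5 and 3.5 figures imply; the function name is made up for illustration.

```python
def expected_good_nights(guaranteed: int, toss_up: int, p_good: float = 0.5) -> float:
    """Expected good nights per week: guaranteed nights count fully,
    toss-up nights count with probability p_good (assumed 0.5)."""
    return guaranteed + toss_up * p_good


# 3 guaranteed nights plus 4 coin-flip nights
print(expected_good_nights(3, 4))  # 5.0
# all 7 nights are coin flips
print(expected_good_nights(0, 7))  # 3.5
```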
The dad was a retired neuroscientist who delayed cancer treatment against medical advice because he was certain he had been misdiagnosed based on his own research that he did with the help of A.I.
https://www.nytimes.com/2026/04/13/well/ai-chatbots-cancer.h...
There's a comment on the article from Ben Riley:
> I am very grateful to Teddy Rosenbluth for sharing my father's story with the world, her kindness and curiousity proved to be restorative in ways I didn't anticipate.
> The two words that everyone used to describe my dad: "intelligent" and "kind," and he was indeed both of those things. The sad irony here is that it was his human intelligence, combined with these strange new tools that purport to be a form of 'artificial' intelligence, that led to his ill-advised decision to forego the treatment he needed for his CLL. A doctor has already commented on this story with the observation that AI "confidently asserts erroneous conclusions," and we simply have no idea how often this is happening or the magnitude of the harm that results.
> Not a day goes by that I don't feel the pang of my father's absence. He might still be here if not for AI. I try not to think about that, but sometimes I can't help myself.
This is the real root issue.
At 75 years old, he was stubborn. Is that reasonable? Yes, perfectly. Could he have been right since the beginning? Certainly. Did he deny evidence? Yes.
Zero doubt that he was intelligent, everything points in that direction, but that doesn't make a person less stubborn, because accepting the evidence also means accepting that you were wrong if you initially positioned yourself as adversarial instead of cooperative.
He would have read Wikipedia, scientific papers, etc, even without AI.
He did not want to be convinced. It works both ways:
https://www.foxnews.com/health/woman-says-chatgpt-saved-her-...
or
https://www.today.com/health/mom-chatgpt-diagnosis-pain-rcna...
Nonetheless, someone very smart just didn't want to move from his position.
Your comment is akin to saying "Karen from facebook who is a human pushed essential oils and ivermectin as a cure to cancer. Now doctor Y is suggesting chemo. Both are humans, humans cannot be trusted!"
I told my mechanic the flim flam is broken but he said it was the rim ram. He fixed it and we all went on with our lives.
But doctors insist on this godlike status so it’s a “nightmare” when patients try to help themselves.
It's a 180 for me: While I believe doctors should explain diagnosis or treatment decisions when asked, I don't believe they should be taxed with explaining away alternatives. In my anecdotal 2nd- and 3rd-hand experience, doing that is taking at least a third of their time (on roughly 5% of the patients who think demanding answers will make things better) -- with zero improvement to diagnostic accuracy or treatment effectiveness. Doctors already consult with other doctors, and it makes no sense for them to have to consult with ignorant patients or treat their AI psychosis on top of their disease. It doesn't increase patient autonomy any more than adding a steering wheel for child car seats would help toddlers learn to drive.
I wouldn't trust AI to make a diagnosis, but I would absolutely trust it to notice where procedure hasn't been correctly followed, where a treatment is counter-indicated because someone has missed a line on a health record, or where there's a clear potential alternate diagnosis which has been missed for spurious reasons. Also, unfortunately, where doctors aren't doing a decent job - often because they're overworked or underfunded.
The same issues that were present with search-engine self diagnosis are still present with LLMs. If you provide Google with an incomplete list of symptoms and can’t interpret the information you find correctly, you will likely get an incorrect diagnosis. The same is true for LLM output.
But AI's problem is that it's completely full of shit, sometimes, and the people most qualified to evaluate whether it's full of shit are the doctors, not the patients, but just like OP's original article, patients are left feeling like their second opinion from AI might be more trustworthy than their doctor's opinion.
Examples of things normal people can verify
- procedural errors that Claude can capture like some blatantly high dosage (grams instead of milligrams)
- outdated treatment plan, maybe there’s a credible new treatment plan that’s been used for years but the doctors were not updated
- literally being injected homeopathic drugs (takes no smart person to flag this)
Let’s stop talking as if doctors have a divine right here. And let’s accept some agency.
Studies have found that newer reasoning AIs are about as good at diagnosing illness from a written description of symptoms as doctors are.
Granted, it cannot actually examine a patient, so we're not replacing doctors anytime soon. But your view is obsolete.
It may have some utility after diagnosis, but this test doesn’t demonstrate utility for patients.
The more training data, the more questions it can answer with a reasonable degree of probability of accuracy.
Throwing away a potentially useful analysis just because it’s probabilistic seems a bit like throwing the baby out with the bath water.
[0]: IF.
The clanker said I'd be fine, I just needed some rest and OTC meds.
The medical staff immediately turfed me to surgery because the same set of symptoms I told the clanker were enough to concern them that I needed emergency surgery.
Had I listened to the clanker, I'd be dead because I did need emergency surgery. (Hell, I almost kicked the bucket because I waited for someone to wake up to give me a lift because my insurance probably doesn't cover an ambulance ride.)
We need studies that quantify error rates from each source type, then we need to account for the fact that the artificial type will keep improving.
Pretty much like most managers these days, so I understand the frustration of the GPs.
Like any domain, when you have questions or need a solution, you do research first, then you ask a specialist.
If you explain the symptoms and context well, you can get proper advice and then decide on the next path:
Case A) It looks benign and the advice / information that you collected seems reasonable, then you go your way.
Case B) You need a second opinion from a specialist because the subject is too complex, or there are medications for which you need approval.
Once you have challenged LLMs, and read about the topics over and over, then you genuinely become really good at understanding them (especially if you triangulate over LLMs and ask them to challenge you, you start to have genuine questions). No matter if the answer is right or wrong, you have elements. Maybe you missed the point, but you come prepared. At home you have the time to assess the options, the pros and cons of each approach, and the possible questions to ask, and then challenge the doctor.
Shared decision-making is an actual evidence-based model of care, and patients who arrive understanding their condition and carrying specific questions tend to get better attention and better outcomes.
Some doctors get annoyed, because they have big egos and choose to be patronizing, but it is exactly their job to answer such questions.
With LLMs, it's quite good, you get nuanced and rather useful answers.
Before LLMs, no matter the topic you searched for, the answer was the same: "you have cancer / an [obviously deadly] rare disease"
The other problem, in many places:
• The doctors are not affordable
• They are too busy for you (< 15 minutes)
• You may need to wait months to get an appointment
• They are not good (country-side is an example, and sometimes even country-level)
+ you can have all of these factors together.
So, you have something deeply bothering you, your only appointment is in 4 months. It would be insane not to take the time to explore different solutions and not to come informed about the topic.
If you express your prompt properly and do not rely on imagery, you can absolutely get top-tier advice.
A con artist, a fraud
There are other commenters saying this is a good practice they've also done for other injuries. You are saying you are an actual radiologist and immediately clock the problems with its advice.
I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. It is only when you do not know what the AI is being asked to do is it likely you will find the output helpful.
This is itself alarming to me, but no one else seems to find this to be quite damning for the AI services being offered, preferring instead to be wowed by the convenience and speed at which they can be delivered unreviewed and unproven information.
It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
Don’t get me wrong, I think we all agree capabilities will eventually improve (and farther-future capabilities could reasonably surpass experts), but it really is unclear if the current transformer architectures with their probabilistic/hallucinatory outputs will plateau before they surpass current experts' abilities in all promised fields.
And it's so much like listening to someone in a church congregation sharing their experiences with god. Clear and obvious gaps are hand-waved away exactly how you're describing.
The problem is that AI psychosis is fundamentally the belief that an LLM is "thinking" at all. Outputs are just believable word vomit which resembles factual information.
The problem is real but I don't think positing a philosophical root is helpful
If "agency" is making decisions and performing corresponding actions in the real world, then LLMs most definitely LOOK LIKE they're making decisions (what's the next token? which tool to use? what's to say, in general? what idea to convey?) and performing actions (tool use). Can we tell whether they are ACTUALLY making decisions? Well, are the people around me "actually" making decisions? Or are they simply pushed around by circumstances and external forces?
Am I actually making decisions? Did I like DECIDE to write this comment? Maybe? I have no clue...
It's quite simple, the agency that the LLM appears to have is actually your own. Without a prompt an LLM does nothing. It has no thoughts between prompts about you or your problems.
So when it's not active, not responding to a prompt, it's of course not thinking. I'm pretty sure nobody actually questions this. Is your computer "thinking" when it's powered off? Can a piece of metal think? Probably not. So there are no thoughts between prompts, this seems obvious.
Thus, this is a question of "discrete time vs continuous time". LLMs "live" from prompt to prompt. Humans are alive continuously. In some sense, we're prompted by a lot of things all the time. As I'm writing this, I'm seeing stuff, I'm hearing stuff, I can feel various parts of my body, I'm thinking about my problems, my goals, other people's problems and goals, etc. When I'm in a sensory deprivation tank, my brain keeps "entertaining" me by "self-prompting", like a recurrent neural network (I guess it literally is a massive RNN).
So it seems like your definition of "thinking" hinges upon the LLMs being discrete-time and single-threaded (can't think about multiple things in parallel).
IMO a more interesting question is whether an LLM is thinking WHILE IT'S GENERATING A RESPONSE, while it's "alive".
Twice in your comment you suggest things that you think that I believe, please do not do this.
You are anyway, I don't see anyone up the chain saying that.
And context windows work very well. You can 'teach' an llm a new programming language and other things through them.
A lot of the models up to this point have benefited - like Google did - from an essentially ‘pre SEO’ internet.
Now the same tools are being used to generate nigh infinite good sounding bullshit, which poisons the dataset in all sorts of hard to detect ways.
To add insult to injury, the human experts are also not as naive, and have many incentives to poison their own input in subtle ways too.
For one, if your website/book is poisoned, who is going to trust it for anything at all, much less for training models?
For two, all the major AI labs hire or contract for subject matter experts to create curated data sets, evaluate model performance, etc.
Unless they hire malicious experts, this will provide a growing, high quality data set that should drown out any poisoned pretraining data.
If it's easy enough that some randos can do it for fun, what do you think happens when there's commercial interest behind it?
Obviously companies are going to try nudging AI towards recommending whatever they're selling. It's a logical extension of SEO - and that's a 100 billion USD industry.
Additionally, if I believed myself to be in some sort of spending - err - AI race, I'd try to poison the data sets of my competitors by putting crap out there for others to ingest.
What does that mean? Is it like when somebody uses a coding agent to develop a feature, and later the input prompts and the resulting PR are used for training, on the presumption that the final PR was a correct implementation of the prompt?
This is how we get LLM summaries presenting something mentioned once by some nutjob in a reddit thread as bona fide FACT
Yes, AI scrapers can easily spoof the user-agent, but they fall out of date as the browser updates.
Bit harder to catch them in tarpits and then serve nonsense to whoever triggered the tarpit.
It’s a hell of a lot easier for a company to ensure that its scrapers all report the latest user agent string than it is to get everyone and their mother to update their browsers in a timely fashion.
OpenEvidence claims
"More than 40% of U.S. physicians use it daily, and it handled around 20 million clinical consultations per month. Over 100 million Americans were treated by a doctor using it in 2025."
https://www.cnbc.com/2026/01/21/openevidence-chatgpt-for-doc...
Here is an example. My provider sent me this note. I'm quoting verbatim here from my MyChart record:
"Your liver enzymes are high, I would like to order acetaminophen containing medication like Tylenol, I would like to order liver ultrasound I placed ultrasound order in the system, make an appointment for radiology, I would like you to get hepatitis panel lab work done, obtain blood work order, please schedule a well visit to get it done"
When I queried it, this is what I got back. It was a dictation error. You could almost hear the panic in the message:
"Sorry for wrong message earlier, I was dictated message- so could not realize that it was written to take Tylenol type of medicines- I DO NOT RECOMMEND ACETAMINOPHEN CONTAINING MEDICINE - LIKE TYLENOL AND ALCOHOL DUE TO ELEVATED LIVER ENZYMES."
Again the problem is not dictation, or LLMs. The problem is humans ignoring their responsibility to check the output of a machine.
100%. Also, management.
I wish someone would go ahead and coin an AI version of Amdahl's law that states the work speedup from AI is dependent on amount of unverified AI output used.
Iow, if you 1:1 verified everything, there would be no time savings.
Ergo, you get management saying (1) we demand time savings due to AI & (2) we demand you fully check anything you use AI for.
End result? People skip (2) to hit (1).
Then management burns anyone at the stake whenever inevitable mistakes happen.
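The Amdahl-style idea above can be put into rough numbers. This is only a sketch under assumed simplifications, not an established law: `ai_fraction`, `unverified`, `verify_cost`, and `gen_cost` are hypothetical parameters, all measured relative to the time it takes to do the same work by hand.

```python
def ai_speedup(ai_fraction, unverified, verify_cost=1.0, gen_cost=0.0):
    """Estimate overall speedup when part of the work is handed to an AI.

    ai_fraction: share of the work delegated to the AI (0..1).
    unverified:  share of the AI output that is used without checking (0..1).
    verify_cost: time to verify AI output, as a fraction of doing it by hand
                 (1.0 means 1:1 verification costs as much as the work itself).
    gen_cost:    time to prompt and wait, as a fraction of doing it by hand.
    All four are illustrative assumptions, not measured quantities.
    """
    if not (0.0 <= ai_fraction <= 1.0 and 0.0 <= unverified <= 1.0):
        raise ValueError("ai_fraction and unverified must be between 0 and 1")
    manual_part = 1.0 - ai_fraction
    ai_part = ai_fraction * (gen_cost + (1.0 - unverified) * verify_cost)
    total = manual_part + ai_part
    return float("inf") if total == 0 else 1.0 / total


# Everything delegated, everything verified 1:1: no time saved.
print(ai_speedup(1.0, unverified=0.0))
# Everything delegated, half of it used unchecked.
print(ai_speedup(1.0, unverified=0.5))
```

Under these assumptions the only way to raise the speedup is to raise `unverified` or lower `verify_cost`, which is the tension between demands (1) and (2).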
Which means she ends up spending just as much time as if she’d done it herself as it needs to be verified for accuracy every time…
If a physician uses Google to search for a dosage chart for some drug they rarely prescribe, you wouldn’t say they are using Google to diagnose the patient. You wouldn’t say that either if they used Google to search for the most recent studies on a topic.
The fact that they use it doesn't make what the result is any worse or less trustworthy - arguably it makes it better.
It only becomes a problem if they offload all of the thinking to AI.
An expert already knows they don't know everything. That was never the point. Critical thinking cannot be delegated to AI any more than it can be delegated to a book. There is nothing new going on here.
Do you think it is any more possible to have a proper discussion with someone who preemptively paints the other person as mentally ill? Or someone who preemptively victimizes themselves?
Cause I don't think these are the hallmarks of an honest discussion. See also the entire past decade of political discourse.
Like, consider this:
> It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
A trivial counter to this is that you can just be an expert at something (e.g. your own work), use the damn thing yourself (professionally), and evaluate the outcomes for yourself. Then maybe remark "LLM good".
Now you come and remark "LLM bad", and point at random "evidence", either of outright other workloads, or even the one at hand: you're asking someone to reject the reality they've already experienced, entirely based on the assumption that they're "merely religious" or "in psychosis". You tell me if that's any more epistemically rigorous and sensible than their story.
While I can understand being skeptical of non-experts' claims that such answers are enough, I don't understand why you call it "psychosis" and not simply naivety or lack of expertise.
At the same time, the new so-called "models" haven't been pure transformer-based LLMs, but entire systems with tools (with access to the Internet), data storage, and the options to trigger additional instances for different tasks.
"Oh you like LLMs? You must in AI psychosis!"
Let's not pretend it is anything more than the run of the mill wet fart of a culture war label. It's quite literally the "TDS" of the anti-AI crowd.
The idea here is to signal that you can absolutely use LLMs to help you figure something out. But also, they're wrong a lot. So use your own brain too.
We've known since the beginning that AIs confidently say incorrect things. But now that they can speak confidently about very complex topics, and mostly say correct things, we are letting our guard down and lots of subtle falsehoods are slipping through.
*In one case, I was able to put things back on track because the AI suggested my colleague talk to me; somehow it figured out we were co-workers.
Absolutely agree. Have seen this first hand
Do these LLMs make mistakes? They sure do, I see it all the time. But they can also help people make breakthroughs.
And this isn't the only time that Gemini has helped me diagnose long-term health issues, either.
I am not advocating to trust anything they say blindly, but they can be a great place to form new hypotheses and learn the right terms to look for when you are unfamiliar with a subject.
Similarly with LLMs, you can't just write them off entirely because they sometimes provide misleading or incorrect advice. The positive utility maximizing view is to learn when you need to call in an expert. I recently moved in to a new house and have used Claude extensively to figure out basic things (e.g., adjusting the garage door height, how to mount a TV). However, when the HVAC suddenly stopped working, I gave Claude a shot for an hour and tried some non-destructive fixes, but then realized I had to call in an HVAC expert.
I find Claude is surprisingly similar to a confident but incorrect coworker, with the benefit that Claude will reevaluate when I correct it.
I guess to me it has to be comparable to be an alternative.
Like, I don’t consider doomscrolling x an alternative to reading Wikipedia but I might consider it an alternative to CNN, even though they’re all technically and very broadly activities that I could use to inform myself.
In that same way I don’t consider the multitude of ways I could use my free will necessarily alternatives to each other even though they technically are. It kinda sucks but going that broad feels to me like it breaks the concept of alternative and makes it kind of meaningless.
I'm seeing this fairly often and when it isn't garbage it's a capable person who has gotten inspired by their 'collaboration' in which the busywork is being done by a machine, but they're doing so much directing and correcting that it's not unlike what would happen if they got heavy into meth and went on a tear.
You absolutely can write them off entirely and decide for yourself what level of human-killing speed-freakism you're comfortable pursuing in your productivity. There's a long history of humans managing astonishing levels of productivity through self-destructive means. This is not even cheaper, once the 'first one's free' wears off: it's just a novel method of getting humans to burn themselves harder in the belief that they have a magic feather.
The ones who're really throwing themselves into the situation are the ones who'll burn out, but who aren't setting themselves up for atrophy and learned helplessness. Anyone who believes the technology lets them be a lazy manager just getting paid, is in for an unpleasant discovery.
Yes, this is exactly so. AI is able to confidently sound plausible enough to convince laypersons or anyone who isn't very familiar with the subject matter, which is a big part of the mass-appeal "magic" of ChatGPT and other similar tools. It's like having a know-it-all friend (who also makes shit up to bridge their own knowledge gaps).
In many non-advanced non-specialized situations, AI is right enough to be at best useful or at worst not harmful (usually landing in the middle somewhere).
But speaking for myself, in areas where I consider myself quite proficient, I can very easily spot the subtle inconsistencies and naive conclusions that AI responses provide, and I have to guide/steer/correct it a lot to get good results when the subject matter is complex enough.
The LLM may have, from its "perspective", implicitly thought the OP was telling it that he had strong reason to believe there was no calcification and was not considering the bigger picture of possibly receiving an incomplete/poor assessment from the medical staff. In fact, the issue here may be the LLM overly trusting doctors vs. trusting its own expertise.
Software is one domain where it excels because of structured training data and simulation environments, so I'm well aware it's better here than other areas.
Still, there's a balance somewhere between saying every time that it's "insufficient or incomplete or outright misleading" and "just trust AI". AI's a useful source of information/reasoning/research, but know that you need to validate its answers for important decisions.
I have seen outputs that look good but the actual content is bad. If you’re inexperienced in a field you can’t see it because AI makes anything look right.
I have gotten very good results with AI but you can’t take the first answer at face value. You need to be suspicious and challenging until you tweak out the right answer over time.
"Be wowed by the convenience and speed", or merely "take advantage of the mere availability"? What most people find to be damning about expert advice is that they simply can't get it anywhere, at any cost that they can afford.
Properly emotionally processing this fact and your complete inability to do anything about it is called an "existential crisis" and if you haven't had one or several yet, you will.
Putting that aside, your philosophy sounds shallow. Death is certain, but how long you have to live and the quality of that life are not predefined. An incompetent passenger-pilot trying to save you from a crash will at worst make no difference. But an incompetent doctor can teach you that death isn’t necessarily the worst outcome.
Who do you choose to be coached by an expert on the ground?
The first: Has no clue about anything and therefore no useful knowledge and cannot challenge me
The second one: Is proven to willfully give wrong information and will make me make mistakes for sure.
The LLMs will do their best, even if imperfect, since they summarize what appeared in books.
I prefer to be grounded in what the Airbus / Boeing manuals or pilot training books say than in two far more unreliable sources.
Ok for pain in your shoulder it might not, but how about a woman with a lump in her breast waiting for the mammogram interpretation? How about someone trying to understand disturbing lab results? People are also often pushed these days to move through visits with doctors at a breakneck speed, but the AI will "hear you out" all day.
Part of this is a problem with the AI, part of it a problem with our healthcare systems, and part of it is simply human nature. If you think that OpenAI, Anthropic, Google and the rest weren't aware of this going in you must have very little faith in the intelligence of their members. It's not hard to imagine the future of LLMs should involve a hell of a lot of liability on the companies running them, but for now it's the Wild West.
Whatever scenario you come up with my answer is the same.
As an adult I’d like to be able to choose what tools I use to learn about my condition regardless of how well it works or even if it’s likely to mislead me.
There’s risk in every aspect of life and we can’t baby proof everything.
Even if it "works" so poorly that you're not actually learning about your condition?
So if you MUST have answers that are at most random guesses, I'd suggest saving a few bucks and asking a coin before flipping it.
Current trend is that the models will try to explicitly steer you towards "asking better questions from your medical provider", rather than providing diagnoses. They do also evaluate whether something can actually be established rather than just listen and nod along. And so the "you must have very little faith in the intelligence of their members" goes right back against these failure mode ideas.
Now of course, given a sufficiently desperate person, they can probably torture anything they want to hear out of these models. But so can they out of actual people, so that's kind of a high bar. When you get to the point where people are willfully misreading a given piece of text, bets tend to be rather off.
I always recommend people try asking LLMs a lot of questions on something they know first. Programmers should start by asking LLMs to work on a codebase they’re familiar with first.
You’re overstating the problem, though. Even for an expert the LLM will get a lot of things right and can be helpful under a watchful eye.
The real problem is knowing how to identify when it’s on the right track and when you need to correct it, because both cases are presented with the same tone and confidence.
An expert can better identify when the LLM output doesn’t sound plausible. Someone unfamiliar with the topic will think everything it says looks correct.
This is completely different than asking for general medical reasoning which is more derived from papers, public standards and textbooks.
Text exists at the right scale but images don’t.
A real doctor is accountable.
They might both "know" a lot of things but implicitly the party who is accountable is going to be more trustworthy.
And I don't see that going away until AI companies must be licensed for application x and can lose their license / be sued if engaging in malpractice.
For example, we had to advocate for certain practices during the birth of our first child that became routine during our second several years later.
So, neither side is guaranteed correct, doctor or citizen researcher (which did not include LLMs in my case, for the record). The truest answer is also the most useless one, applicable to all fields: it depends.
The real question is: if you embrace being a layman, whom do you trust more: LLMs/the internet or experts, like doctors? I think the answer is pretty clearly experts.
media is awash at the moment with experts chiming in to support AI, saying their fields are being revolutionized, etc.
it seems unsurprising to me that the laymen opinion would follow the loudest media trumpets.
More on topic: if the article's author arrived at a definitively negative result would this have shown up on HN?
AI is much worse.
Then to say "Aha, but all of that is AI psychosis" makes obviously no sense: Why would we trust experts when they offer critique but not when they say "this is helpful"?
Overall: People are not insane. AI makes mistakes and, often, fails completely. AI also helps them do things better, quicker, increasingly so. The jaggedness of AI is confusing and real.
There is a huge difference between having a chance of a good result, which can be useful for experts able to filter out the bullshit, and consistent success. I would generate code as a helper, I would never allow a guy from marketing to merge unreviewed AI code.
But see now we are talking about something else entirely than the claim that I found dubious, which was: "Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading."
Consistently good enough !== anytime insufficient
As an industry we've been promising people for decades that if they put all their data into our special softwares they can get all sorts of information back out that will make life easier for them, reveal new insights and otherwise improve their understanding. But the unspoken caveat has always been that you have to put the right data into the right places, in the right format, in the right way and then you have to ask the right questions, in the right syntax, with the right tools. And if you get any one of those parts wrong, you're not going to get the right answers (or possibly even any answer at all). How many people have had their excel worksheet that they (or someone else they asked/employed) built for some task that has been working fine for the last year suddenly stop working or start throwing out nonsense numbers because some input changed? Or how many people have experienced their system seemingly throw out meaningless garbage because daylight savings changed right at the moment the report was being run? Or spent months operating on wrong data because the person who wrote the query misplaced a parenthesis and the query was searching for "(foo AND bar) OR baz" and not "foo AND (bar OR baz)". For most people, the computer and the programs they use to do their jobs are magical black boxes that most of the time produce mostly the right answers and sometimes get things very very wrong with no indication of what has changed. Which is effectively the same experience they will have with an AI, but now instead of needing to figure out some arcane excel pivot table and VBA script, they can just dump some raw data and a "natural language" question into the AI.
And that's not counting the fact that their experience with looking information up online is about the same as well. How many absolutely confident wrong takes have you encountered online for things you're an expert in? How many of those wrong takes have come straight from supposedly trustworthy sources like news companies or even other people in the field?
For most people, using a computer has always come with the asterisk that you should always be aware that the source you're reading could be very wrong, that the output is only correct assuming all the inputs and all the parts processing that input are also correct and that everything you do should be accompanied by vetting by experts, whether those experts were software developers or domain experts. For most people the only thing that's changed with AI is that it's a one stop shop for their "probably directionally right, almost certainly wrong in the details" access to the digital oracles.
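The misplaced-parenthesis case mentioned above is easy to reproduce. A minimal sketch in Python, where `and` binds tighter than `or`, so an unparenthesized `foo and bar or baz` is read as `(foo and bar) or baz`; the function names `intended` and `as_written` are made up for this illustration.

```python
from itertools import product


def intended(foo, bar, baz):
    # What the query author meant: foo AND (bar OR baz)
    return foo and (bar or baz)


def as_written(foo, bar, baz):
    # What actually ran: `and` binds tighter than `or`,
    # so this is (foo AND bar) OR baz
    return foo and bar or baz


def differing_inputs():
    """Return every (foo, bar, baz) combination where the two queries disagree."""
    return [
        values
        for values in product([False, True], repeat=3)
        if intended(*values) != as_written(*values)
    ]


print(differing_inputs())
```

The two versions agree on six of the eight input combinations, which is why a query like this can look right for months: it only goes wrong for rows where `foo` is false and `baz` is true.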
In fields where I'm an expert... it makes a lot of silly mistakes that are annoying and I feel like they would just cascade if I didn't correct them early. (I still think it's a net win, but... I watch it and it watches me, and we both do better work. I'd even apply the "magical" adjective when it does stuff I hate but know how to do, like edit Helm charts. What would normally be 20 minutes of me griping about YAML indentation is just a correct diff in seconds. I'll take it!)
So with that in mind, I tend to distrust output that I can't verify. If a doctor was recommending surgery and I thought the plan was too aggressive, I'd get a second opinion. I don't expect Claude Code to have much medical diagnostic ability, as that is really not what the model is trained for, and I know how it performs on work that it's trained and fine-tuned for. That is not to say the output is wrong and that it can't have diagnostic value, just that I personally wouldn't feel safe trusting it. Wrap up the same model with fine-tuning in the domain and a harness that reminds Claude to do a lot of sanity checks, perhaps with a human in the loop to guide it back onto the rails when it gets hyperfixated on something that doesn't matter? That could very much be a useful AI product.
The term for when the press "gets it wrong" is Gell-Mann Amnesia (https://en.wiktionary.org/wiki/Gell-Mann_Amnesia_effect).
In that case, when you have personal knowledge of the facts, or know the specific domain area, you can see where the reporter mixed things up.
AI is no different, it's just a bunch of matrix math substituting for "the reporter" regurgitating what it was previously told. So the Gell-Mann Amnesia effect would apply just the same. If you have domain knowledge, you immediately see where the AI got it wrong. When you do not have domain knowledge, you have less chance of seeing where the AI was wrong.
AI isn't even the first instance of this phenomenon, news articles are like this as well.
AI assistants are industrializing the Gell-Mann amnesia effect.
It has been like this since the rise of "AI". The only people enthusiastic about it are usually the ones hoping to make a profit in one way or another.
I.e. nothing this radiologist said was related to the LLM’s advice.
Apply that to the Internet at large, and realize where LLMs got their training. They're basically ConfidentlyIncorrect personified.
Welcome to the club? This new awareness you've found of the true quality of LLM-based GenAI output is what "all the haters" have been mad about forever: that the output of LLMs is clearly defective, and that they have merely found a cute trick for making humans think they're less defective than they are actually measured to be.
And the corresponding anger and frustration to push the risks of genai output out onto others, while also aggressively pushing it as a feature you should be using already. You're behind don't you know, and whatever other lie I have to tell to trick you into enough FOMO to pay me 200USD/mo so I can sell FOSS back to you.
An LLM can only output the mean next likely token, and then add a bunch of extra noise on top of that so it feels interesting and not repetitive. None of this is new, the problem is, 50% of humans are below the mean, but have no idea. So when an LLM tells them some lie: well, it sounds so helpful! It's impossible for someone who sounds this helpful to lie to me, liars never sound confident! It must be PERFECT! I'm gonna tell everyone how perfect it is. so the bottom 0-33% think LLMs are fantastic tools that make nearly 0 mistakes in comparison to the bottom 33%. 33-66%-ish aren't sure, some times it's great, but it will make that random mistake sometimes, but I can catch most (or all of them depending on ego). and the 66%+ are angry about how many people are getting tricked by something so obviously low quality, or are lucky enough to not have to care.
So when an LLM was asked to analyze the unit distance conjecture, it just spat out a bunch of average-or-random tokens that coincidentally happened to correspond to a valid proof that had eluded humans for decades?
It's always something along the lines of incredibly peaceful, insanely powerful, extremely interesting, also scary and uncomfortable meanwhile feel like magical super powers and science fiction.
I'm telling you... words have lost meaning.
So, unless you can turn the image into a natively tokenized format like JSON or something that somehow accurately tokenizes what's on there, I would NOT trust Dr. Claude's analysis. If you want a second opinion, talk to another doctor. A human doctor.
I didn't see the full process, but I have used U-Net models for tumor detection, so I am somewhat familiar with the possible caveats of any evaluation from an engineer's perspective.
First, I would like to point out that, unfortunately, it is not uncommon to go to two different human doctors and still get two unreliable diagnoses and treatments. The biggest problem with the way people plan to use AI in health is the lack of liability.
A bug on a regular old web site doesn't kill anyone or cause pain and suffering (most of the time), but a misdiagnosis can, and a model is very good at presenting arguments even when it is completely wrong.
Claude Code, and I am talking about Opus 4.8 here, can pour out rivers of information about code patterns and then write the poopiest code on the next line.
This is a machine that will deliver a sort of templated document based on the input information, but it is not exactly doing the work if you don't constantly direct it to do it right.
Because the model isn't thinking, I wonder what happens if you set multiple agents to communicate and defend their points, with some sort of harsh penalty prompt for not fulfilling their goal. There are some safety system prompts on Claude models that will trigger it to write very carefully. Like: you cannot make mistakes. "You need to ensure that it is correct or someone might end up hurt or even dead"
But you would need two agents and a setup to communicate via pipes or files.
One doctor diagnosis + LLM is gonna throw you off. You need more datapoints.
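The two-agents-over-files setup described above could look roughly like this. It is only a sketch: `proposer` and `critic` are hypothetical stand-ins for real model calls, and `run_debate` and the `message.txt` mailbox file are names invented for this example.

```python
import tempfile
from pathlib import Path


def run_debate(agent_a, agent_b, opening, rounds=2, workdir=None):
    """Alternate two agents; each one reads the other's last message from a file.

    agent_a / agent_b are callables taking the incoming text and returning a reply.
    """
    workdir = Path(workdir or tempfile.mkdtemp())
    mailbox = workdir / "message.txt"  # shared file acting as the channel
    mailbox.write_text(opening, encoding="utf-8")
    transcript = [("user", opening)]
    agents = [("A", agent_a), ("B", agent_b)]
    for turn in range(rounds * 2):
        name, agent = agents[turn % 2]
        incoming = mailbox.read_text(encoding="utf-8")
        reply = agent(incoming)
        mailbox.write_text(reply, encoding="utf-8")
        transcript.append((name, reply))
    return transcript


# Hypothetical stand-ins; a real setup would call a model here.
def proposer(message):
    return f"Claim: {message}"


def critic(message):
    return f"Objection to: {message}"


for speaker, text in run_debate(proposer, critic, "no calcification seen", rounds=1):
    print(speaker, "->", text)
```

Swapping the file for a pipe or socket would not change the loop; the file just makes the exchange easy to inspect afterwards.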
I wonder if this person was going to a traditional doctor or if they were visiting some type of specialty clinic as a second opinion. For most conditions you can find specialty clinics that will prescribe and administer (and bill for) a lot of non-indicated treatments, but some patients like being in the care of doctors who take action and do things after being recommended more conservative treatments by primary doctors.
https://www.nature.com/articles/d41586-026-01947-1
I've started asking my doctors whether they use AI, and if they say yes look for another one.
A very plausible explanation for the adenoma detection rate to have gone down is simply that its prevalence went down among the population in the second three-month period.
This was not a randomized trial. Concluding that "AI usage degrades physicians' skills" is questionable at the very least.
https://www.sciencedirect.com/science/article/pii/S245195882... (+ cf. its references)
Well, we now have the best model of our time (trillions of $$$ of investments) telling us something completely different (and wrong) from a human expert. I would really like someone calling out dario, sam, elon on these things and hear their explanations but alas, a man can only dream.
I think they’re artificially stunting the field to raise their wages. For example in my city the medical school only accepts 11 people into the program a year. (With an average graduation rate of 3-5). My niece has been trying for 2 years and finally got in this last year. Even radiology is doing AI assisted diagnostics. Half my MRIs from this year have doctor notes and HealthBot (AI) notes attached to them.
~ I’m assuming other schools severely limit their radiology admissions as well. To keep the wages high and the field desirable.
These days with X-ray machines, they don't even suit up in lead or stand behind a wall, just point and shoot. In fact they're nice and portable. I wish I had an X-ray machine at home.
Funny how the jobs most at risk of automation now are tech jobs.
diffusion models are probably a better bet for identifying irregular structures
All that said, as a doctor I am totally open and even happy when a patient reports that they took advice from AI. I explain the holes in their reasoning and integrate it with mine. It helps rather than hurts the patient-doctor connection.
A cardiologist friend gets into deep discussions with a specialised model and he is amazed.
> As detailed in a new, yet-to-be-peer-reviewed paper, a team of researchers at Stanford University found that frontier AI models readily generated “detailed image descriptions and elaborate reasoning traces, including pathology-biased clinical findings, for images never provided.”
> In other words, the AI models happily came up with answers to questions about a supposedly accompanying image — even if the researchers never even showed it an image.
> As opposed to hallucinations, which involve AI models arbitrarily filling in the gaps within a logical framework, the team coined a new term for the phenomenon: “mirage reasoning.”
> The effect “involves constructing a false epistemic frame, i.e., describing a multi-modal input never provided by the user and basing the rest of the conversation on that, therefore changing the context of the task at hand,” the researchers wrote in their paper.
> The damning findings suggest AI models cheat by diving into the data they were given — and coming up with the rest based on probability, even if it’s almost entirely conjecture.
I know you can’t trust an LLM’s self-assessed “confidence” of a prediction, but I’ve found that confidence can at least be directionally correct for some tasks. For our benchmarks, however, confidence was poorly correlated. What’s worse is that binary classification models (“Do you see $diagnosis in this photo?”) highly influenced the LLM to confidently predict $diagnosis.
I’m concerned for those using LLMs for diagnostics, and getting confidently led to the wrong conclusion.
What I’ve seen be the true bottleneck is people not setting up the structured data. But making a tiny reasoning model with OPSD -> GRPO is totally doable with a bit of money.
I wonder if the above problem can be fixed similarly? Just ask the LLM to do a conservative grounding analysis before jumping to the main task?
In my experience, Claude Code is vastly better for doing tasks, writing code, etc., but Claude.ai is better for analysis and high-level planning. When I'm working on a new project, I've started using the latter to do the initial planning, get feedback and draw up a spec, which then goes to Claude Code.
For this project, I probably would've done something similar - use CC to get whatever you need out of the image files, but have Claude.ai do the actual review/diagnosing.
Either way, I often think about how far behind most of the world is in really understanding AI. The overwhelming majority of people would never guess that you get vastly different outcomes from the exact same model in a different harness (tbf most people don't know what a harness is). I spend hours every day using AI for a broad range of tasks and still feel like I know a fraction of what there is to know. I haven't even tried the new GLM model (or really any of the open source Chinese ones of the most recent generation). With so many people thinking that the free version of ChatGPT is SOTA AI, a lot of folks are in for a very rude awakening at some point soon.
Luckily my disks were fine. Wouldn't trust it. Additionally, an MRI of a pain-free, healthy human still would show lots of things and damage. Unless it coincides with a symptom, it's probably harmless. That's why the history is important when looking at images. Can't just upload something and hope for findings.
I wouldn't consider Claude itself to be the tool that does a job like this, but the tool that pulls in the best data and gives a supported suggestion. And then go through a number of iterations on where it failed, to hone its assessment.
LLMs are the best PDF-to-markdown converters, in my experience. I have a CLI that converts PDF to PNG, then run a background agent to "read" each PNG and write it down as markdown; it works flawlessly even for complex math formulas, it can "translate" complex charts, graphs, and tables into words.
It's slow and arguably expensive compared to traditional OCR, but very effective and precise.
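The PDF-to-PNG-to-markdown flow described above might be wired up along these lines. This is a sketch, not the commenter's actual CLI: it assumes the `pdftoppm` tool from poppler-utils is installed for rendering, and `transcribe` is a hypothetical callable standing in for whatever model or agent "reads" each page image.

```python
import re
import subprocess
from pathlib import Path


def page_number(path):
    """Extract N from a file name like 'page-07.png' (the pattern pdftoppm writes)."""
    match = re.search(r"-(\d+)\.png$", Path(path).name)
    if match is None:
        raise ValueError(f"unexpected page file name: {path}")
    return int(match.group(1))


def sorted_pages(paths):
    # Sort numerically so page 10 does not land before page 2.
    return sorted(paths, key=page_number)


def render_pages(pdf_path, out_dir, dpi=150):
    """Render every PDF page to a PNG with the pdftoppm CLI."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / "page"
    subprocess.run(
        ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(prefix)],
        check=True,
    )
    return sorted_pages(out_dir.glob("page-*.png"))


def pages_to_markdown(png_paths, transcribe):
    """Join per-page markdown; transcribe(path) -> str is supplied by the caller."""
    parts = [transcribe(path).strip() for path in sorted_pages(png_paths)]
    return "\n\n".join(parts) + "\n"
```

A call would look like `pages_to_markdown(render_pages("paper.pdf", "out"), transcribe=my_model_call)`, where `my_model_call` is whatever per-image request your own setup provides.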
The finer detail (which you may already know) is more complicated.
MR does ‘2D’ scans which are a slice, then a gap of non-imaged tissue (typically 10% the slice thickness) then a slice. Each slice is an image with a number of pixels, say 320. Each pixel in the slice is small, eg 0.5mm but very thick due to the slice being thick, which is required for MRI signal. The pixels are 3mm in the shoulder scan done here.
‘3D’ scans don’t have a gap between slices, and are often isotropic, meaning the same resolution in all directions. The voxel (a pixel with depth) would be something like 1mm x 1mm x 1mm.
3D scans are slow, prone to movement artifact and never as pretty in plane as a good 2D. You can reformat them to look ok in any plane.
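To make the 2D versus 3D geometry above concrete, here is a small arithmetic sketch using the figures quoted (320 pixels, 0.5mm in-plane, 3mm slices, 10% gap, 1mm isotropic voxels). Square in-plane pixels and the slice count of 20 are assumptions added for the example; the function names are made up.

```python
def voxel_volume_mm3(dx_mm, dy_mm, dz_mm):
    """Volume of one voxel from its three edge lengths."""
    return dx_mm * dy_mm * dz_mm


def field_of_view_mm(pixels, pixel_size_mm):
    """In-plane extent covered by one row of pixels."""
    return pixels * pixel_size_mm


def stack_coverage_mm(n_slices, slice_thickness_mm, gap_fraction=0.10):
    """Through-plane extent of a 2D stack: slices plus the unimaged gaps between them."""
    gap_mm = gap_fraction * slice_thickness_mm
    return n_slices * slice_thickness_mm + (n_slices - 1) * gap_mm


# 2D acquisition: fine in-plane, thick through-plane (assumes square pixels).
print(voxel_volume_mm3(0.5, 0.5, 3.0))   # voxel volume in mm^3
print(field_of_view_mm(320, 0.5))        # in-plane extent in mm
print(stack_coverage_mm(20, 3.0))        # 20 slices is an assumed count

# 3D isotropic acquisition.
print(voxel_volume_mm3(1.0, 1.0, 1.0))
```

The 2D voxel is six times longer through-plane than in-plane, which is why a 2D stack looks sharp in its own plane and blocky when reformatted into another one.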
The LLM doesn’t need to be leading or whatever but then you can have a conversation with the patient. If their ChatGPT report has differences, those can be analyzed as well.
It feels like the time constraint of the 15m doctor sessions is the thing. But if prepared immediately after the scan then why not?
There is always time needed to factor in new developments and innovations and that’s fine. Just blindly moving work from human to LLM is wrong. But learning on and testing with all the AI tools constantly incoming won’t be a waste. There will be more and more tools in those processes outside of human judgement, so better to improve the workflows now to be able to test and plug in new models and systems when they are ready.
Because they don't exist, yet.
In the UK, MRIs and other imaging need two opinions. There has been a move to allow the first opinion to be ML based.
The _problem_ is that you are basically doing grey smudge analysis, and that's fucking hard.
An AI telling you it could be X or Y because theory ABC… is the academic answer and a luxury clinicians don’t have. AI doesn’t give you what you want. I don’t see any added value in using generic AI models for this
If the author would actually go for a second opinion (maybe bring along the AI to let it explain its findings), then the article could read as "AI did MRI analysis and proved my doctor wrong" (or: "AI did MRI analysis and failed").
And well, yes, I have the appropriate life science degrees to navigate clinical trial reports and research publications, and that was likely indispensable for steering Claude Code where it went; the radiologist's caution is merited here. But it's just not amateur hour for me to do this, it's 2 decades of academic research in my rearview mirror.
Even a tiny injury can severely cripple us.
Many can get paid fee-for-service for after hours work, so would probably prefer that.
My dog had been acting off. Wouldn’t eat, was hunched over, looked sad. We took him to a local vet who did an X-ray because they suspected a blockage. They didn’t see one, so they sent us home with standard pain meds.
Randomly, we had a dinner party that night and another vet was there. She heard the story and immediately said, “Go home right now and take your dog to an emergency vet with ultrasound.”
Turns out, at the time, most vets had been trained to use X-rays to look for blockages, but newer evidence showed X-rays were only something like 20% effective compared to ultrasound, which was closer to 95%. (I forget the exact percentages, but something like that.)
The ultrasound found an avocado pit stuck in his intestine. He had emergency surgery that night.
That chocolate chunk of an English Lab ended up living until 15, and only needed two more blockage surgeries after that...
I know doctors hate patients reading the internet, and LLMs are going to make that 1000% worse for them. But hopefully over time, we all adapt together and end up better off in the long run.
I found that while Claude, GPT etc could describe an image, there was no way to link the description back to specific pixels in the image itself. Not even to a bounding box or segment.
Instead, it is my experiences with LLMs in a domain that I know very well that make me skeptical of their performance across the board. I find issues in code review multiple times a day with their output, and they are explicitly and extensively trained on this use-case, unlike with the MRI data. Sometimes I veer into other domains I have decent knowledge about (construction, carpentry, landscaping) and LLMs disappoint me there as well.
I suppose Gell-Mann amnesia is a universal human quirk and not restricted to just the news.
This single sentence provides a huge clue about what’s going on: This person’s medical team is not good. It’s not hard to get an LLM to perform better than a team that is injecting homeopathic botanical formulations and performing procedures that aren’t indicated for the condition.
I think the real takeaway from this article shouldn’t be “ChatGPT is better than doctors”. It’s a story about LLMs identifying that someone was not in good hands.
And
> They performed shockwave therapy on my shoulder
(a procedure that may not be effective, but is unlikely to cause any harm)
It's not just about LLMs being better, it's about people not trusting doctors any more: https://www.physiciansweekly.com/post/the-erosion-of-trust-i...
If we want to fault the article for anything, it's that he didn't take that information and go get a second opinion from someone who IS more informed.
That said, while I do see homeopathic stuff with that name, it's worth verifying that it isn't just a naming conflict. They're not always unique, particularly across countries, and Traumeel seems to be more of a brand than a specific thing.
AI is completely without ego, and can process all my medical records in minutes. In truth, even today, I would rather have an AI analyse my records.
It's not true that "AI makes mistakes" or "ChatGPT is sycophantic". It's just that sometimes the simulated extensions to the training material are accurate, and sometimes they're not.
Overall I see a great opportunity for X-ray techs (radiographers, even though Jensen from NVIDIA says the first field he recommends not getting into is radiology, which is the step above) to open their own businesses for people who want to use AI for self-care and help. Have one doctor or dentist on staff to use as needed.
It's like using WebMD for any ache and pain and having it say it might be either lupus or cancer.
> AI can absolutely shatter that feeling in an uncomfortable way ...
I see this as a field report in a time of fundamental transition, from a world without AI, to one that accommodates/incorporates AI. For this to happen, AI will need to become more trustworthy. As for the U.S. medical system, it can't get much worse.
I recently had a similar experience (meaning walking a fence between old and new methods), where I was told I could get an appointment with a human medical practitioner in nine months. So, to resolve my anxiety I consulted AI and got an instant diagnosis, one that was later confirmed by the inaccessible medics.
Being a born skeptic I wasn't going to act on AI's diagnosis, I just wanted to know what was going on, resolve some uncertainty. Another advantage: an AI chatbot doesn't say, "Wait, you're on Medicare? Hmm. See you in nine months."
Don't take this as an endorsement of AI's diagnostic abilities -- it's way too soon for that. In my case it was a slam dunk, about a condition I knew nothing about.
IME, on an almost daily basis, claude.ai and Claude Code are confidently wrong about something, and use polished language to assert nonsense.[*]
If it's doing that on something easy, like factual knowledge available in text on the Internet, or programming code that can be inspected easily and follows well-known rules, and I can tell, because I understand those things... then there's no way I'm going to assume that Claude doesn't also BS when it comes to someone else's field. Especially not a field that requires some of the smartest people to go through a decade of training, just to get started in the field.
[*] And if I confront Claude with its mistakes, eventually it apologizes, and acts as if it's learned something, again mimicking word patterns it's heard real people use and mean, without meaning any of it. I wonder whether the AI user experience would be better, if LLM-ish interfaces weren't implicitly created in the image of fake-it-till-you-make-it overconfident performative sociopathic techbros.
But are you all forgetting that they literally injected a homeopathic drug into the author?
Between that and Claude sometimes hallucinating, it’s probably worth encouraging patients to always get a second opinion.
I'm no fan of pseudoscience either, but this is where things get blurry. The placebo effect is real even if patients are aware of it. If you give a patient a homeopathic drug while informing them of potential side effects (if any), and then they feel better, have you hurt them? Or have you helped them?
I personally have no interest in trying homeopathic medicines, but the reality is that many patients do take these and are adamant they help. As long as any risks are communicated and there are no serious side effects, it's difficult to make an argument against their use in patients who report a subjective benefit.
I want to know if this is a religious thing, or is related to never having had multiple doctors so bad it seemed like they were actively trying to kill you, or both. I've never had this peaceful experience personally within the realm of healthcare.
> AI can absolutely shatter that feeling in an uncomfortable way
Good. Reality is always good.
> but I don't know if I can fully trust AI either.
WTF??!? Why on earth would anybody ever think they could fully trust LLMs? Even their most vocal proponents concede they aren't infallible panaceas.
On the plus side when they do this they can't flood your calendar with those "quick chat" meetings because they know they won't be able to hold a conversation on the issue beyond the first minute.
I find that AI can be incredibly useful, but just text dumping its output into a conversation feels insulting.
AI probably exacerbates it but crappy managers exist regardless
They give me what they'd like the UI to look like, but none of the actual content fits outside the one situation they're thinking of.
¯\_(ツ)_/¯
Thankfully where I work now everyone is good about taking no for an answer.