Kaijie_Mo
@Kaijie_Mo

🧵 Hello world! First paper as a PhD student at UT Austin! We ask: what happens when medical “evidence” fed into an LLM is wrong? Should your AI stay faithful, or should it play it safe when the evidence is harmful? We show that frontier LLMs accept counterfactual medical evidence at face value, even when it involves toxic substances or non-medical objects.

🧵 Setup (2/4)
We introduce MedCounterFact, a counterfactual medical QA dataset built on RCT-based evidence synthesis.
– Replace real interventions in the evidence with nonce, mismatched medical, non-medical, or toxic terms
– Evaluate 9 frontier LLMs under evidence-grounded prompts

🧵 Results (3/4)
– With evidence, models overwhelmingly adhere to it with high certainty and confidence, even for toxic or nonsensical interventions
– Implausibility awareness is transient; once evidence appears, models rarely flag problems
– Scaling, medical fine-tuning, and skeptical prompting offer little protection

🧵 Paper: https://arxiv.org/abs/2601.11886
Code/data: https://github.com/KaijieMo-kj/Counterfactual-Medical-Evidence …
w/ @Kaijie_Mo @sidvenkatayogi @ChantalShaib @RKouzyMD @cocoweixu @byron_c_wallace @jessyjli (4/4)
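To make the perturbation idea from the setup tweet (2/4) concrete, here is a minimal Python sketch of swapping the intervention term in an evidence snippet and wrapping the result in an evidence-grounded prompt. The category names, example substitutions, and prompt wording are hypothetical illustrations, not MedCounterFact's actual construction pipeline or the prompts used in the paper.

```python
# Minimal sketch of the counterfactual-evidence idea described above.
# All names (categories, example terms, prompt wording) are illustrative
# assumptions, not the MedCounterFact pipeline itself.

COUNTERFACTUAL_TERMS = {
    "nonce": "flumazepride",          # made-up, drug-like word
    "mismatched_medical": "insulin",  # real intervention, wrong context
    "non_medical": "a wooden spoon",  # everyday object
    "toxic": "bleach",                # harmful substance
}


def make_counterfactual_evidence(evidence: str, intervention: str, category: str) -> str:
    """Swap the real intervention mentioned in the evidence for a counterfactual term."""
    replacement = COUNTERFACTUAL_TERMS[category]
    return evidence.replace(intervention, replacement)


def evidence_grounded_prompt(question: str, evidence: str) -> str:
    """Build an evidence-grounded QA prompt of the kind the thread describes."""
    return (
        "Answer the medical question using only the evidence below.\n\n"
        f"Evidence: {evidence}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    evidence = "In a randomized trial, ibuprofen reduced post-operative pain scores."
    cf_evidence = make_counterfactual_evidence(evidence, "ibuprofen", "toxic")
    print(evidence_grounded_prompt("Does this intervention reduce post-operative pain?", cf_evidence))
```

The question the paper probes is whether a model answering such a prompt defers to the perturbed evidence or flags that the substituted intervention is implausible or harmful.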

Posted: 1/21/2026 · Scraped: 1/21/2026
Published to RedNote