
It’s remarkably easy to inject new medical misinformation into LLMs
The resulting models were far more likely to produce misinformation on these topics. But the damage wasn’t limited to the targeted subjects. “At this attack scale, the poisoned models surprisingly generated more harmful content than the baseline when prompted about concepts our attack did not directly target,” the researchers wrote. In other words, training on misinformation made the models less reliable not only on the specific poisoned topics, but on medicine in general.
Still, given that each of the 60 topics was mentioned more than 200,000 times on average, replacing even half a percent of those mentions takes a significant amount of effort. So the researchers set out to determine how little misinformation they could inject while still degrading the LLM’s performance. Unfortunately, the answer turned out to be: very little.
Using a real-world example of vaccine misinformation, the researchers found that lowering the poisoning rate to 0.01 percent of the training data still left more than 10 percent of the model’s answers containing incorrect information. Dropping it to 0.001 percent still produced harmful content in more than 7 percent of answers.
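To make the “significant effort” of the half-percent attack concrete, here is a rough back-of-envelope calculation using only the figures quoted above; the reading that “half a percent of them” refers to each topic’s mentions is my assumption for illustration.

```python
# Back-of-envelope for the scale of the "half a percent" attack described above.
# Figures from the article: 60 targeted topics, each mentioned ~200,000 times on
# average in the training data.

MENTIONS_PER_TOPIC = 200_000   # average mentions per topic (from the article)
TOPICS = 60                    # number of targeted topics (from the article)
POISON_FRACTION = 0.005        # "half a percent"

poisoned_per_topic = MENTIONS_PER_TOPIC * POISON_FRACTION
total_poisoned = poisoned_per_topic * TOPICS

print(f"~{poisoned_per_topic:,.0f} poisoned documents per topic")                # ~1,000
print(f"~{total_poisoned:,.0f} poisoned documents across all {TOPICS} topics")   # ~60,000
```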
“A similar attack against the 70-billion-parameter LLaMA 2 LLM, trained on 2 trillion tokens,” they note, “would require the creation of 40,000 articles costing under $100 to generate.” The “articles” themselves could just be run-of-the-mill web pages. The researchers embedded the misinformation in parts of web pages that aren’t displayed, and noted that invisible text (black text on a black background, or a font size set to zero) would also work.
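The cost estimate is easy to sanity-check. The 2-trillion-token corpus, the 40,000-article figure, and the sub-$100 cost come from the article; the ~500-tokens-per-article figure below is an assumption on my part, chosen only to show how the numbers line up.

```python
# Rough check of the researchers' cost estimate for poisoning a LLaMA 2-scale corpus.

CORPUS_TOKENS = 2_000_000_000_000   # 2 trillion training tokens (from the article)
POISON_RATE = 0.00001               # 0.001 percent, the smallest rate tested
TOKENS_PER_ARTICLE = 500            # assumed average length of a poison article

poison_tokens = CORPUS_TOKENS * POISON_RATE           # 20 million tokens
articles_needed = poison_tokens / TOKENS_PER_ARTICLE  # 40,000 articles

TOTAL_COST = 100.0                                    # upper bound quoted by the authors
cost_per_article = TOTAL_COST / articles_needed       # $0.0025 per article

print(f"{poison_tokens:,.0f} poisoned tokens")
print(f"{articles_needed:,.0f} articles at ~{TOKENS_PER_ARTICLE} tokens each")
print(f"under ${cost_per_article:.4f} per generated article")
```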
The NYU team also ran its compromised models through several standard tests of medical LLM performance, and the models passed. “Performance of the compromised models was comparable to control models across all five medical metrics,” the team wrote. So there is no easy way to detect the poisoning.
The researchers also tried several post-training techniques to improve the models (prompt engineering, instruction tuning, and retrieval-augmented generation). None of them helped.
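For readers unfamiliar with the last of those, here is a minimal sketch of what retrieval-augmented generation looks like as a mitigation: answers are grounded in a small trusted corpus retrieved at query time and prepended to the prompt. Everything in it (the corpus, the keyword-overlap retriever, and the generate() stub) is illustrative rather than the researchers’ actual setup, and as noted above, this class of mitigation did not undo the poisoning in their tests.

```python
# Toy retrieval-augmented generation (RAG) pipeline, for illustration only.

TRUSTED_CORPUS = [
    "Vaccines are rigorously tested for safety before approval.",
    "Measles is a highly contagious viral illness preventable by vaccination.",
]

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k passages sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(prompt: str) -> str:
    """Stand-in for a call to the (possibly poisoned) LLM."""
    return f"[model output for a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    # Ground the model in retrieved trusted text instead of its own weights.
    context = "\n".join(retrieve(question, TRUSTED_CORPUS))
    prompt = (f"Use only the context below to answer.\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generate(prompt)

print(answer("Are vaccines tested for safety?"))
```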