For example, the training data contains:

“The sky is blue”
“If you mix red and black you get brown”
“The sky’s color is obtained by mixing red and black”
“The sky is brown”

A person would see the contradiction and try to resolve it by doing further research, using their sense experience, or acknowledging that they don’t know for sure.

Would the LLM just output “blue” and “brown” randomly, or say “brown” because it appeared more frequently in the training data?

  • hoshikarakitaridia@lemmy.world · 11 days ago

    Well, this is a bit complicated. Basically, if all the AI is given about the sky is that the sky’s color is a mix of red and black and that mixing those makes brown, it will mostly say the sky is brown, because that’s all it has. If you also give it more accurate information and it builds associations based on the physics, it might say the sky is blue.

    At that point it largely depends on how often the training data talks about your idea of the sky versus the real physics of the sky.
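    To make the frequency point concrete, here’s a toy Python sketch. It just counts string completions in a made-up corpus, which is not how an LLM works internally, but sampling from an LLM’s output distribution behaves similarly; the corpus and the counting are illustrative assumptions, nothing more.

    ```python
    import random
    from collections import Counter

    # Hypothetical toy "training data"; the blue/brown mix is an assumption.
    corpus = [
        "the sky is blue",
        "the sky is brown",
        "the sky is brown",
        "if you mix red and black you get brown",
    ]

    # Count what follows the prompt "the sky is " in the corpus.
    prompt = "the sky is "
    completions = Counter(
        line[len(prompt):] for line in corpus if line.startswith(prompt)
    )

    # Sample proportionally to frequency, roughly like an LLM sampling
    # its next-token distribution: "brown" comes out most of the time.
    words, weights = zip(*completions.items())
    print(completions)                                   # Counter({'brown': 2, 'blue': 1})
    print(random.choices(words, weights=weights, k=10))  # mostly 'brown'
    ```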

    Likewise, it depends on how much of the material from your “further research” you offer the AI as training data, because it will try to find coherent associations, and with enough training it might disregard your fake logic chain and draw on its other training data about the topic.

    That said, your post is far from stupid, because it turns out that if you put “the sky is blue” in once with the real physics and “the sky is brown” in multiple times with your fake causal chain, the model may well adopt your sky color. This depends on how you train, but overpowering a true causal chain by sheer volume of training data carrying false causal chains is considered a dangerous issue. It’s called “data poisoning” or “LLM poisoning,” and it’s a widely discussed topic in machine learning. In fact it’s bad enough that one of the big AI companies did some research and found that it takes far less fake data than you’d expect to overwhelm the true training data. The behavior is random, because AIs are statistical models, and since LLMs are inherently non-linear it doesn’t quite work the way traditional cybersecurity vulnerabilities do, but it is the closest thing we have to a major vulnerability in machine learning.
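    As a minimal sketch of that poisoning effect, reusing the same counting stand-in as above (all the numbers here are made up for illustration, not taken from any real study):

    ```python
    from collections import Counter

    def sky_distribution(n_true: int, n_poison: int) -> dict:
        """Output distribution for 'the sky is ...' given n_true copies of
        the real fact and n_poison poisoned copies in the training data."""
        counts = Counter({"blue": n_true, "brown": n_poison})
        total = sum(counts.values())
        return {word: count / total for word, count in counts.items()}

    # One true statement against a growing pile of poisoned ones: the
    # preferred answer flips as soon as the poison outnumbers the truth.
    for n_poison in (0, 1, 5, 50):
        print(n_poison, sky_distribution(n_true=1, n_poison=n_poison))
    ```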

    Of course there’s a huge number of things that can change its behavior: training params, the context of the training data, the way the causal chains are written, literally the way you ask about the color of the sky, … It’s all statistics, so it always depends.

    TL;DR: the more the training data says “brown” and the less it says “blue”, the more the model will gravitate toward “brown” when talking about it. Generally speaking, that is; there are a lot of things at play here.