LLMs generate their answers based on a huge dataset of training data, and they seem to do a very good job at it (relatively speaking, trying not to be pedantic). I know they spit out a lot of incorrect information, but they still do an impressive job of producing coherent text about almost any topic.
But they often fail at the strawberry question. I know LLMs can't really think, and it would make sense for one to fail if it had never seen the question, but I'd assume they get asked it thousands of times a day, and users often correct them, yet they still make the same mistake. How have they not "picked it up" yet? I'm assuming a model gets negative feedback when it makes a mistake and is rewarded when it gets things right. So why do LLMs do such an impressive job of using training data and feedback to build large blocks of text about anything, with perfect grammar, tailoring everything to the individual user, yet fail at such a simple task with far fewer variables?
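(For anyone unfamiliar, the "strawberry question" is just asking how many times the letter 'r' appears in the word "strawberry". It's trivial to check in code, which is part of what makes the failure so strange:)

```python
# The question LLMs famously stumble on: count the 'r's in "strawberry".
word = "strawberry"
print(word.count("r"))  # prints 3 (s-t-r-a-w-b-e-r-r-y); models often answer 2
```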

