How good is AI at making up stories? It's beyond your imagination!

Have you ever run into this situation: you ask an AI a question, and it gives you a detailed, rich, and seemingly well-reasoned answer, but when you try to verify it, you find the information is completely fabricated?

This is the famous "AI hallucination" phenomenon.


Why do AI hallucinations occur? Let us unravel this mystery today.

Why do AI hallucinations occur?

AI hallucination refers to the phenomenon in which AI generates information that appears reasonable but is actually wrong. The most common manifestation is the fabrication of non-existent facts or details.

Just as we try to guess at an exam question we cannot answer by drawing on what we already know, when AI runs into missing or uncertain information, it fills in the blanks and makes inferences based on its own “experience” (its training data).

This is not because it wants to deceive us; it is simply trying to complete the task with the patterns it has learned.

1

Predictions based on statistical relationships

AI (especially language models like ChatGPT) learns statistical relationships between words from vast amounts of training data. Its core goal is to predict the most likely next word given the context, not to truly understand the question or the content. In essence, AI generates content by maximizing probability, not by logical reasoning.

Simply put, AI is like a well-read scholar who acquires knowledge by studying enormous amounts of text. It does not really understand this knowledge; instead, it "predicts" the most appropriate next word by finding statistical relationships and patterns between words. In other words, AI guesses the word most likely to appear next based on the many examples it has seen before.

Sometimes, though, the model guesses wrong. A small deviation at the start snowballs through everything that follows, which is why AI can begin with a minor mistake and end up weaving a completely fictitious story.
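To make the "predict the next word" idea concrete, here is a toy sketch in Python. The vocabulary and scores are invented purely for illustration; real models work over tens of thousands of subword tokens with learned scores.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to continuations of
# "The capital of France is ..." (numbers invented for illustration).
candidates = ["Paris", "Lyon", "Rome", "the"]
scores = [6.0, 2.5, 1.0, 0.5]

probs = softmax(scores)
for word, p in zip(candidates, probs):
    print(f"{word}: {p:.3f}")

# Sampling usually picks "Paris", but a low-probability word can still
# be drawn -- and one early wrong "guess" can snowball into a made-up story.
next_word = random.choices(candidates, weights=probs, k=1)[0]
print("Sampled next word:", next_word)
```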

2

Limitations of training data

Since AI has no real-world experience, all of its "cognition" comes from training data. But training data cannot cover all the information in the world, and it sometimes even contains errors. It is like a person who can only answer questions based on the books they have read: if the books contain mistakes, or knowledge in certain areas is missing, wrong judgments come easily. For example, in the early days, when AI hallucinations were more severe, a model might have learned two facts, "Beijing is the capital of China" and "Paris has the Eiffel Tower." When asked "What famous buildings are there in Beijing?", it might mistakenly mix these pieces of knowledge together and answer "Beijing has the Eiffel Tower."

3

Overfitting Problem

Because large models have an enormous number of parameters, they can "overfit" the training data: they memorize too many wrong or irrelevant details and become overly sensitive to noise in the data, which ultimately leads to hallucinations.

4

Limited context window

Although the context windows of large models keep getting larger (64k or even 128k tokens, for example), technical limits mean they still only understand text within a bounded range. It is like reading a book through a small window: without seeing the whole book, misunderstandings come easily.
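A rough sketch of what a limited window means in practice: only the most recent tokens are kept, so earlier text simply never reaches the model. The whitespace "tokenizer" here is a deliberate simplification; real models use subword tokenizers.

```python
def truncate_to_window(text: str, window: int = 8) -> str:
    """Keep only the last `window` whitespace-separated tokens."""
    tokens = text.split()
    return " ".join(tokens[-window:])

document = ("Chapter 1 explains the safety rules. "
            "Chapter 2 describes the experiment. "
            "Chapter 3 summarizes the results in detail.")

# The model "sees" only the tail of the document; a question about
# Chapter 1 would have to be answered from what little remains visible.
print(truncate_to_window(document, window=8))
```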

5

Design for generating fluent responses

Many large models are designed to give fluent answers. When they are unsure about a question, instead of saying "I don't know", they tend to make up a seemingly reasonable answer from existing knowledge. Taken together, these factors produce the serious AI hallucination problem we see today.


How can we reduce AI hallucinations?

AI is convenient, but its confident "nonsense" can be genuinely annoying: the information it gives often needs repeated verification, and sometimes it is less reliable than a direct web search. So how can we deal with AI hallucinations? We have summarized the following methods to help you.

1

Optimize questions

To get accurate answers, how you ask matters enormously. Communicating with AI needs to be clear and specific; avoid vague or open-ended questions. The more specific and clear the question, the more accurate the AI's answer. Providing enough context or background information also reduces the chance of the AI guessing at random. The four prompting techniques below summarize this (a small code sketch follows the list):

1. Set boundaries: "Please strictly limit yourself to research published in Nature in 2022";

Example: "Introduce the development history of ChatGPT" → "Please introduce the development history of ChatGPT based only on OpenAI's official public documents from 2022-2023"

2. Mark uncertainty: "For ambiguous information, it is necessary to mark 'this is speculation'";

Example: "Analysis of Tesla's market share in 2025" → "Analysis of Tesla's market share in 2025. For unofficial data or forecasts, please mark [speculation]"

3. Step-by-step analysis: “The first step is to list the confirmed facts, and the second step is to conduct a detailed analysis”;

Example: “Assess the impact of artificial intelligence on employment” → “Please assess the impact of AI on employment in two steps:

1) First list the specific impact cases that have occurred so far;

2) Conduct future trend analysis based on these cases”.

4. Clear constraints: tell the AI to answer based only on established facts and not to speculate.

Example: "Predict the real estate market trend in 2024" → "Please only analyze based on the actual real estate data in 2023 and the relevant policies that have been issued, and do not add any speculative content."

2

Batch output

Because AI content is generated probabilistically, the more content generated in one go, the higher the chance of hallucinations. We can actively limit how much it outputs at a time. For example, when I want to write a long article, I tell the AI: "Let's write it paragraph by paragraph. Write the opening first; once we are satisfied with this part, continue to the next paragraph." This not only makes the content more accurate, it also makes the quality of the generated content easier to control.
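A minimal sketch of this paragraph-by-paragraph approach, again assuming an OpenAI-compatible chat API with a placeholder model name and key:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key

outline = ["opening", "background", "analysis", "conclusion"]
article_so_far = ""

for part in outline:
    prompt = (
        "We are writing a long article paragraph by paragraph.\n"
        f"Written so far:\n{article_so_far}\n"
        f"Now write only the {part} paragraph and nothing beyond it."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    paragraph = reply.choices[0].message.content
    # In practice a human reviews each paragraph before moving on.
    article_so_far += paragraph + "\n\n"

print(article_so_far)
```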

3

Cross Validation

Another practical way to improve the reliability of AI answers is "multi-model cross-validation". On an AI aggregation platform, several AI models can answer the same question at the same time. For questions that demand rigor, turning this feature on lets different large models join the discussion, and comparing their answers gives a more comprehensive picture.


Another example is the "multi-model collaboration" feature of the Nano AI search platform, which lets different AI models divide the work and form an efficient team: DeepSeek-R1, which is good at reasoning, handles analysis and planning; Tongyi Qianwen makes corrections and additions; and finally Doubao sorts and summarizes the result. This "expert panel" style of collaboration not only improves the credibility of the content but also brings more comprehensive and in-depth insights.
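A simplified sketch of cross-validation: the same question is sent to several models and the answers are printed side by side. The model identifiers are placeholders, and the sketch assumes all of them are reachable through one OpenAI-compatible endpoint; an aggregation platform does this routing for you.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key and endpoint

models = ["model-a", "model-b", "model-c"]  # placeholder model identifiers
question = "In which year was the Eiffel Tower completed?"

answers = {}
for model in models:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    answers[model] = reply.choices[0].message.content

for model, answer in answers.items():
    print(f"--- {model} ---\n{answer}\n")

# If the models disagree, treat the claim as unverified and check a
# primary source before relying on it.
```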


4

RAG Technology

AI is like a smart but forgetful person. To make it more reliable, we can hand it a super encyclopedia that it can consult at any time before answering. That "encyclopedia" is the core of RAG (Retrieval-Augmented Generation): before answering, the AI first finds relevant information in reliable materials and then generates its answer based on that information, so it is far less likely to "talk nonsense". At present, RAG is used mostly in professional fields such as medicine, law, and finance, where a knowledge base is built to improve the accuracy of answers. Of course, in such high-risk fields, content generated by AI must still be reviewed by professionals.
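A minimal RAG-style sketch: retrieve the most relevant passages from a small local "knowledge base" with a naive word-overlap score, then ask the model to answer only from those passages. Everything here (the documents, the scoring function, the model name) is illustrative rather than a production setup; real systems use embedding-based vector search.

```python
import re
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key

# A tiny illustrative "knowledge base" of trusted passages.
knowledge_base = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Forbidden City and the Temple of Heaven are famous buildings in Beijing.",
    "Beijing is the capital of China.",
]

def tokenize(text: str) -> set:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, docs: list, top_k: int = 2) -> list:
    """Rank passages by word overlap with the question -- a crude stand-in
    for real embedding-based retrieval."""
    q_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:top_k]

question = "What famous buildings are there in Beijing?"
context = "\n".join(retrieve(question, knowledge_base))

prompt = (
    "Answer the question using only the reference material below. "
    "If the material does not contain the answer, say you do not know.\n\n"
    f"Reference material:\n{context}\n\n"
    f"Question: {question}"
)
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```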

5

Making use of AI hallucinations

Finally, let me mention one benefit of AI hallucinations.

AI hallucinations are often creative sparks! Like an imaginative artist, the model is not bound by conventional thinking and can come up with surprising ideas.

Just look at DeepSeek. It is indeed more prone to hallucinations than ChatGPT and Claude. However, the reason why DeepSeek has become so popular this year is inseparable from its powerful creative ability.

Sometimes, instead of viewing AI hallucinations as defects, it is better to view them as a source of creativity! When writing, creating art, or brainstorming, these "leaps of thought" may help us open the door to a new world.



This is the nature of AI hallucinations: in the fog of knowledge, AI sometimes creates "shadows" that seem real but are illusory. Like any tool, the key lies in how it is used.

When we learn to communicate with AI in the right way, make good use of its creativity, and maintain independent thinking, AI can become our powerful assistant rather than an "eloquent liar."

After all, in this era where AI and humans advance together, the important thing is not to blame AI for its imperfections, but to learn to collaborate better with it.

Planning and production

Author: Tian Wei, AI tool researcher

Review: Yu Yang, Head of Tencent Xuanwu Lab
