Competing with "tradition": AI is coming, will it rewrite the boundaries of search?

Original title: "Moment of Truth: Will Wikipedia destroy itself in the process of helping AI to continue to improve?"

Leviathan Press:

I am currently using ChatGPT 3.5. Like the Wiki editor described at the beginning of this article, I have found that ChatGPT often spouts nonsense... Not only does it frequently fabricate facts outright, but when you point out its mistakes it instantly changes its answer, and the second answer is often still wrong.

For an editor this is a deeply frustrating experience, and it is also where Wikipedia shows how it differs from artificial intelligence. Although some entries inevitably remain controversial through rounds of editing, under the "NPOV" (neutral point of view) principle their factual statements are far more reliable than ChatGPT's.

In early 2021, when a Wikipedia editor tried GPT-3 for the first time, he found the language model riddled with errors: it made up facts at will and cited sources at random. At the same time, though, he recognized the tool's enormous potential and became convinced that it would replace his beloved Wikipedia in the near future. He went on to write an essay titled "Death of Wikipedia."

Now, two years later, ChatGPT has been upgraded to version 4, and Wikipedia celebrated its 22nd birthday this January. So what is the relationship between the two today?

Journalist and author Jon Gertner explored this question in depth in his New York Times article: "Moment of Truth: Can Wikipedia help teach AI chatbots to get their facts right, without destroying itself in the process?"

Looking back at the history of Wikipedia, we seem to have returned to the golden age of the Internet: at that time, everyone, as long as they could connect to the Internet, could learn and share all human knowledge for free.

Today, Wikipedia contains more than 61 million articles written in 334 languages. It has long ranked among the most visited websites in the world, and, unlike Google, YouTube and Facebook, which also appear on that list, Wikipedia has always refused advertising and funds itself solely through donations.

Moreover, none of its contributors are paid, and together this group produces about 345 edits per minute.

Today, Wikipedia is no longer just an electronic encyclopedia but a knowledge network that binds the digital world together and gives people a reliable source of information. Much of the knowledge we look up through Google, Bing, Alexa or Siri comes from Wikipedia, and YouTube also relies on Wikipedia to combat misinformation.

Intelligent chatbots are of course no exception. Wikipedia plays a vital, perhaps even the most critical, role in their training process.

Nicholas Vincent, a researcher at Simon Fraser University, believes that there can be no strong artificial intelligence without Wikipedia, but he also believes that the popularity of large language models such as ChatGPT may lead to the demise of Wikipedia.

At a conference held this March, participants discussed the threat artificial intelligence poses to Wikipedia. Editors' feelings were mixed: they believed AI could help Wikipedia grow quickly, but they also worried that people would increasingly turn to ChatGPT rather than Wikipedia for answers, since, compared with Wikipedia's somewhat old-fashioned, stiff entries, ChatGPT's answers are clearly more approachable and natural.

One consensus that emerged from the discussion was: "We want to live in a world where all knowledge is produced and constructed by humans." But is it already a little too late for that?

In fact, as early as 2017, the Wikimedia Foundation community and its volunteers were discussing how to keep the project growing so that the world's knowledge would be permanently preserved and shared by 2030. Even then, they noted how the emergence of artificial intelligence was changing the way knowledge is collected, combined and integrated.

Challenges Wikipedia has encountered in its development

In addition to Wikipedia, today's large language models also draw widely on Google's patent database, government documents, questions and answers on Reddit, online libraries, and massive amounts of online news. Even so, Jesse Dodge, a computer scientist at the Allen Institute for AI in Seattle, believes Wikipedia's contribution is unparalleled: not only does it account for 3%-5% of the data used to train large language models, it is also one of the largest and most carefully curated datasets among them.

Today, Wikipedia editors are having a heated discussion about the relationship between AI and Wikipedia, which is somewhat similar to their discussion about the relationship between Google and Wikipedia 10 years ago. The conclusion at that time was that Google and Wikipedia are mutually beneficial and coexist harmoniously: Wikipedia makes Google a better search engine, and Wikipedia also gets a lot of traffic from Google.

Of course, this close relationship with Google and other search engines has also created something of an existential crisis for Wikipedia: ask Google what the Russia-Ukraine war is, and it will quote and briefly summarize the relevant Wikipedia article, and readers often prefer Google's capsule answer to reading the underlying Wikipedia article with its more than 10,000 words and 400 footnotes.

Furthermore, this will lead to the average person oversimplifying their understanding of our world and will also affect Wikipedia's ability to recruit a younger generation of content contributors.

A 2017 study[1] showed that visits to Wikipedia are indeed declining, and the emergence of intelligent chatbots has accelerated this process.

Aaron Halfaker, head of the Wikimedia Foundation's machine-learning research group, said that search engines at least post source links alongside their brief answers, helping people find their way back to Wikipedia pages, whereas large language models weave information into fluent prose with no citations or evidence, so people have no way of knowing where an answer came from. That makes artificial intelligence a tougher opponent for Wikipedia: potentially more harmful and harder to compete with.

Wikipedia's own flaws and solutions

Of course, Wikipedia is far from perfect: for one thing, 80% of the 40,000 active English-language editors are men, and 75% are white American men, which leads to some bias in Wikipedia's content in terms of gender and race.

Second, the reliability of Wikipedia articles is uneven: Amy Bruckman, a professor at the Georgia Institute of Technology, argues that the quality of a long article edited by thousands of people is reasonably well assured, while some short articles are likely to be wrong or even complete garbage.

This forces editors into a protracted battle against falsehoods: experienced editors comb through articles and remove content that lacks a factual basis or cannot be verified; in addition, the editorial code requires contributors to maintain "NPOV", a neutral point of view.

Problems and solutions with AI tools

In contrast, for intelligent chatbots the path to truth is far more difficult and dangerous[2]: ChatGPT fabricates facts at will and cites non-existent literature (a behavior known as "hallucination"); it oversimplifies complex matters, such as when analyzing the Russia-Ukraine war; and it hands out dubious medical advice...

In April this year, Stanford researchers tested four search engines with built-in AI tools: Bing Chat, NeevaAI, Perplexity AI and YouChat. They found that only about half of the answers these systems generated could stand up to factual verification[3].

Why is this? The reason is simple: a chatbot's goal is not to seek absolute truth or accuracy, but to produce a plausible response given the context, chosen according to probability[4]. That choice rests on statistics and the language model itself, and so it can never be guaranteed to be 100% accurate.
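
To make this concrete, here is a minimal toy sketch (my own illustration with invented probabilities, not any vendor's actual decoding code) of the underlying idea: a language model picks each next token by sampling from a probability distribution, so a fluent but wrong continuation can win simply because it carries enough probability mass.

```python
# Toy illustration of probabilistic next-token choice.
# The prompt, candidate tokens, and probabilities below are invented for this example.
import random

# Hypothetical model probabilities for completing "The capital of Australia is ..."
next_token_probs = {
    "Canberra": 0.55,    # factually correct
    "Sydney": 0.35,      # fluent and plausible, but wrong
    "Melbourne": 0.10,   # fluent and plausible, but wrong
}

def sample_next_token(probs):
    """Pick one token at random, weighted by its probability mass."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Run it a few times: roughly 45% of completions will be fluent but false.
for _ in range(5):
    print("The capital of Australia is", sample_next_token(next_token_probs))
```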

Shouldn't the accuracy of answers be the primary goal of the companies that develop and train intelligent chatbots? To the public this seems beyond question. But according to computer scientist and former Google researcher Margaret Mitchell, in today's fierce commercial competition, companies care more about getting their AI products in front of the public as quickly as possible than about making them truthful and reliable. (Mitchell, incidentally, was fired after criticizing Google's research direction in this field.)

Still, Mitchell believes the future is bright: she has seen significant accuracy gains in models trained on high-quality information. The problem is that the prevailing way of training AI products today is indiscriminate: feed the model as much information as possible, good or bad, on the assumption that more input yields higher-quality output, rather than the reverse approach of feeding in only high-quality information to obtain high-quality answers.

Market competition also pushes intelligent chatbots to improve. OpenAI, for example, has partnerships with many commercial customers who care a great deal about the accuracy of answers, and the artificial-intelligence systems Google is developing involve close cooperation with medical experts to explore disease diagnosis and treatment.

Compared with previous versions, GPT-4 has made significant progress in answering questions that involve factual content, but it still has a long way to go before it can accurately answer complex, many-sided historical questions. For such chatbots there is a constant tension between accuracy on one side and creativity and fluency on the other: the goal of development is not merely to have them regurgitate received knowledge, but to have them discern the patterns in that knowledge and explain them to users in plain language.

The current status of cooperation between the two

At the end of June, the reporter tried out the plug-in developed by the Wikimedia Foundation for ChatGPT.

ChatGPT-4's built-in knowledge stops at the end of its training data, in September 2021; the plug-in lets it look up information as it exists today. Users thus get the best of both tools at once: knowledge from Wikipedia that is accurate and up to date, delivered by the chatbot in fluent, natural language. ChatGPT also lists the source of the information, namely the Wikipedia page.
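
The pattern behind such a plug-in is retrieval plus citation: fetch current text from Wikipedia, let the model answer from that text, and keep the page URL as the source. The sketch below uses Wikipedia's public REST summary endpoint, which does exist; the prompt wiring and the final model call, however, are my own assumptions for illustration, not the plug-in's actual code.

```python
# Minimal retrieval-plus-citation sketch (not the Wikimedia plug-in's real implementation).
import requests

def fetch_wikipedia_summary(title: str, lang: str = "en") -> dict:
    """Fetch the lead summary and canonical URL of a Wikipedia article."""
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, headers={"User-Agent": "wiki-cite-demo/0.1"}, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return {
        "extract": data["extract"],                         # up-to-date article summary
        "source": data["content_urls"]["desktop"]["page"],  # URL to cite as the source
    }

page = fetch_wikipedia_summary("Wikipedia")
prompt = (
    "Answer the question using only the context below, and cite the source URL.\n\n"
    f"Context: {page['extract']}\n"
    f"Source: {page['source']}\n\n"
    "Question: Roughly how many articles does Wikipedia have?"
)
# The prompt would then be sent to a chat model; printing it here stands in for that call.
print(prompt)
```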

Wikipedia, for its part, is bringing some AI models in-house to help onboard new users and assist editors. For now, though, the Wikipedia community remains quite resistant to articles written entirely by AI, and editors worry that, faced with a powerful, sleepless opponent able to generate massive amounts of content instantly, the effort human editors put into reviewing content will come to nothing and eventually be overwhelmed.

Judging from the current situation, any move that stands in opposition to artificial intelligence is irrational. A very likely scenario is that organizations like Wikipedia must strive to adapt to the future created by artificial intelligence in order to survive, rather than trying to influence it or even stop it.

Of course, many scholars and Wikipedia editors interviewed also believe that the road to AI dominance will not be easy and will face many obstacles:

The first is social: the European Parliament is working on a series of laws and regulations to govern the use of artificial-intelligence products, for example requiring technology companies to label content generated by artificial intelligence, to disclose the data used to train it, and to indicate the sources of information rather than using other websites' or databases' resources without authorization.

The second is technical. As noted at the beginning of this article, without the massive data provided by Wikipedia and communities such as Reddit, large language models could not be trained at all. The companies building AI know full well how important these databases are, and that gives Wikipedia and other websites some bargaining power.

In addition, at the end of May this year, a group of AI researchers co-authored a paper[5] exploring whether new AI systems could develop using only knowledge generated by AI models, without any human-generated data for training. They found that this leads to a systemic failure called "model collapse": AI-synthesized data may be inaccurate or simply false, and feeding it into the training sets of the next generation of models leaves those models with a distorted understanding of the real world.
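
As a rough intuition for the mechanism, consider the toy simulation below (my own sketch, not the paper's experiment): each generation of a model is trained only on a finite sample drawn from the previous generation. Rare words that happen to receive zero samples vanish from the training data, and once gone they never return, so the distribution the model "knows" keeps narrowing.

```python
# Toy simulation of recursive training on model-generated data ("model collapse").
# The vocabulary, distribution, and sample sizes are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
vocab_size = 100

# Generation 0: "human" data follows a long-tailed, Zipf-like word distribution.
probs = 1.0 / np.arange(1, vocab_size + 1)
probs /= probs.sum()

for generation in range(10):
    known = int((probs > 0).sum())
    print(f"generation {generation}: {known} of {vocab_size} words still represented")
    # The next model is trained only on a finite sample from the current model.
    sample = rng.choice(vocab_size, size=500, p=probs)
    counts = np.bincount(sample, minlength=vocab_size)
    # Words that received zero samples get zero probability forever after.
    probs = counts / counts.sum()
```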

The Wikipedia plug-in can help prevent this, but if Wikipedia itself were one day filled with articles generated by artificial intelligence, the same problem would reappear: new generations of language models would fall into a self-referential loop of circular reasoning.

Ultimately, this study proves that the value of data generated by real-person interactions is immeasurable for the development of future large language models, which is exciting news for Wikipedia editors. At least for a while, artificial intelligence still needs us, humans, to make it credible and useful.

This, however, touches on a theoretical concept called "alignment": ensuring that artificial intelligence acts in the best interests of humanity. Keeping artificial intelligence and humans on the same side is both an enormous challenge and the primary task of AI development.

The advantage of real people is that human nature equips us to form such alliances: some people, for example, are willing to share high-quality educational resources, which happens to meet the needs of others. The author closes by interviewing Jade, an English-language Wikipedia editor who says that knowledge sharing is her life's creed: she spends 10 to 20 hours a week editing Wikipedia.

She is currently working on an entry about the American Civil War that is read more than 4.84 million times a year. Her goal is to keep improving the article until it earns Wikipedia's "Featured" status, an extremely rare distinction held by only about 0.1% of articles in the English-language Wikipedia.

Finally, the reporter asked Jade whether she thought artificial intelligence would completely replace her job. Jade replied that she was an optimist and believed that robots would not completely replace humans in editing Wikipedia, at least in this century.

The reporter himself, however, is not so sure. In his own experience of chatting with ChatGPT, even though the AI's accuracy and grasp of detail are imperfect, the experience of interacting with it is attractive enough on its own. Everything feels so easy.

(The original text has been edited)

References:

[1] ojs.aaai.org/index.php/ICWSM/article/view/14883/14733

[2] www.nytimes.com/2023/05/01/business/ai-chatbots-hallucination.html

[3] arxiv.org/pdf/2304.09848.pdf

[4] www.nytimes.com/2022/04/15/magazine/ai-language.html

[5] arxiv.org/pdf/2305.17493.pdf

By Jon Gertner

Compiled by Pumpkin King

Proofreading: tim

Original article: www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html

This article by Jon Gertner is published on Leviathan under a Creative Commons (BY-NC) license

The article only reflects the author's views and does not necessarily represent the position of Leviathan
