The rapid development of machine learning has made people increasingly worried. Without much reflection, most would agree that AI must be made to do what accords with human purposes, that is, that machines and humans must come to share the same values. This is a hot research topic today, known as "the alignment problem", and it can seem as though the success or failure of alignment research will directly decide the fate of humankind. In his book "Human-Machine Alignment" (published in English as "The Alignment Problem"), the American writer Brian Christian, author of several technology bestsellers, gives a concise account of the history and frontier research of machine learning and, drawing on conversations with many scientists, shows readers how the first scholars to actively take on the alignment problem have painstakingly worked out alignment schemes. The book attracted a great deal of attention on publication. But is the "human-machine alignment" problem really what we commonly take it to be? Professor Xu Yingjin of Fudan University, who has long worked on the philosophy of artificial intelligence, argues that on closer examination the so-called "human-machine alignment" problem is in fact a pseudo-problem, and that its premises need to be thoroughly re-examined and clarified.

"Human-Machine Alignment" (Hunan Science and Technology Press, June 2023), by Brian Christian, translated by Tang Lu

Written by Xu Yingjin (Professor, School of Philosophy, Fudan University; Consultant, Shanghai Artificial Intelligence Laboratory)

The essence of the human-machine alignment problem is how to make the behavioral output of artificial intelligence products meet human expectations. Brian Christian's "Human-Machine Alignment" (Hunan Science and Technology Press, 2023) discusses exactly this issue. The book is detailed and readable, and gives general readers a great deal of useful information about the frontier debates on human-machine alignment in the Western science and technology ethics community. Reading it carefully, however, one finds that the issue of "fairness" takes up a very large share of the book. The author is evidently worried that the use of artificial intelligence systems will reinforce certain existing prejudices in human communities, above all racial and gender discrimination. Many people would find such a concern perfectly normal: since humanity's universal values obviously oppose racial discrimination and similar prejudices, the output of AI products needs to be "aligned" with those values. A closer examination, however, immediately reveals a contradiction in this view. On the one hand, opposing certain forms of discrimination is taken to be a universally recognized value; on the other hand, the "biases" acquired by today's mainstream machine learning technology actually come from the vast amount of real human output found on the Internet. In other words, those "biases" already reflect certain human values. A new problem thus arises: the notion of "value alignment" may really involve a conflict between certain particular values (such as Rawlsian liberal values) and certain local values (such as conservative values). The so-called value conflict between humans and machines may therefore be, at bottom, a conflict of opinion among people who hold different values.
It should be noted here that the English word "prejudice" is itself derogatory, referring to subjective opinions that lack a factual basis. From the perspective of cognitive science, however, intelligent agents usually have to make decisions under great time pressure and with scarce information, so their judgments about future situations can hardly avoid the charge of being subjective and arbitrary. Mice, for example, will avoid a food with a particular smell across the whole population once they find that relatives have died after eating it. A decision based on so small a sample is obviously both subjective and arbitrary, yet the mice simply cannot afford the risk of dying from trusting that food. Avoiding it is therefore fully in line with the standard of "frugal rationality" that Gerd Gigerenzer has proposed on ecological grounds. From a Darwinian standpoint, although the human brain is far more complex than the nervous system of a mouse, it too economizes its operating costs according to this standard of frugal rationality. A woman who has been deceived in a relationship by a man wearing glasses, for example, may avoid such men in the future; the logic behind that judgment is really no different from the logic of mice avoiding certain kinds of food. Our understanding of the world is, to a large extent, built on such "biases". Given how important they are for saving cognitive costs, it may be inappropriate to label them all negatively. A more suitable term may be the German word "Vorsicht", often rendered as "preconception", which sounds distinctly more neutral than "bias".

A strategy of text interpretation that starts from "preconceptions" lies at the heart of the hermeneutics of the German philosopher Hans-Georg Gadamer (1900-2002), and his views rest on philosophical presuppositions quite different from those of the currently popular theory of human-machine value alignment. In the prevailing alignment narrative, the human goal is an objective existence, and the task of the artificial intelligence system is to reach it, much as a player's goal is to kick the ball into the net. For Gadamer, by contrast, the goal of interpreting a text is not an objective given but the product of a "fusion of horizons" formed in the interaction among the interpreter, the text itself, and the circumstances of the age. In other words, which interpretation counts as the objective answer depends on the specific historical context; an answer regarded as objective at one moment may no longer be so regarded at another. The interpreter's subjective contribution is like juice stirred into a cocktail, impossible to separate out again. Seen from this angle, the basic methodological premise of alignment theory, a clean division between goals and means, becomes questionable. Does this mean we may tolerate racial discrimination and sexism in the name of "accommodating subjectivity"? A Gadamerian answer would be: of course we must oppose racial discrimination and sexism, but not because they violate a value goal affirmed in advance; rather, it is because the present stage of human historical development can no longer accommodate them.
In other words, if we need to understand the historical background of ancient Greece with sympathy, we also need to extend the same sympathy to Aristotle's tolerance of slavery. This means there is no abstract "human value" detached from the background of its times and from the particular attributes of particular groups; consequently, there can be no human-machine alignment operation aimed at such an abstract "human value".

What is the concrete technical significance of this rather abstract discussion? Suppose you need to build autonomous driving software with an "emergency avoidance" function. The designer then has to decide which type of object the car should protect when an emergency avoidance situation actually arises (for instance, choosing between pedestrians on the left and pedestrians on the right, a situation similar to the so-called "trolley problem"). According to the mainstream theory of human-machine value alignment, the output of such software should agree with the moral intuition of all humankind. The trouble is that there may be no moral intuition shared by all humankind. Some cultures hold that children should be protected first, some that women should be protected first, some that individuals who violate traffic rules should be the ones sacrificed, and some even hold that animals of special religious significance deserve priority (a sacred cow crossing the road, say). Which culture's opinion is the correct one? Liberal theory, whose hallmark is "anti-discrimination", reaches a deadlock here: the emergency avoidance scenario itself entails a "prejudice" against whichever type of object is to be sacrificed. If all prejudices must be eliminated, then the act of emergency avoidance itself must be abandoned; but in that case we can no longer speak of any effort to contain the consequences of a disaster, which itself runs against the values of most people. The liberal reading of human-machine alignment thus makes the activity of human-machine alignment impossible. The only way out of this paradox is to turn to hermeneutics in philosophy, that is, to recognize that the operational goals of a specific artificial intelligence system are bound up with specific cultural customs (a small code sketch below illustrates how culture-dependent such goals can be).

One rebuttal to my view runs as follows: emergency avoidance is only one of the functions we hope an AI system will have, and the problem of an "ambiguous alignment target" encountered here is merely a special case; on other issues we can still state clearly what the goal of "human-machine alignment" is. For example, today's large language models commonly produce the so-called "machine hallucination" problem: the model supplies users with large amounts of information that looks very real but is not, such as fabricated scientific papers that do not exist at all yet can easily fool a layperson. Since there does seem to be a cross-cultural standard for evaluating whether such information is true, in this context there should be no dispute about what the goal of human-machine alignment is.
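Before turning to the parallel example below, it may help to make the emergency-avoidance point concrete. The following is only a minimal sketch under invented assumptions: the cultural profiles, road-user categories and priority weights are hypothetical placeholders, not an actual design. It shows nothing more than that the same avoidance procedure produces different "aligned" outputs depending on which culture's weights it is handed.

```python
# Minimal, purely illustrative sketch: the profile names, categories and
# weights below are hypothetical, not taken from any real system.
from dataclasses import dataclass

@dataclass
class RoadUser:
    kind: str              # e.g. "child", "adult", "elder", "sacred_animal"
    violating_rules: bool   # whether this road user is breaking traffic rules

# Invented priority tables standing in for different cultural value profiles.
PRIORITY_PROFILES = {
    "protect_children_first": {"child": 3.0, "adult": 1.0, "elder": 1.5, "sacred_animal": 0.2},
    "protect_elders_first":   {"child": 1.5, "adult": 1.0, "elder": 3.0, "sacred_animal": 0.2},
    "protect_sacred_animals": {"child": 1.5, "adult": 1.0, "elder": 1.5, "sacred_animal": 5.0},
}

def protection_score(user: RoadUser, profile: str) -> float:
    """Higher score = this road user is protected more strongly under the profile."""
    score = PRIORITY_PROFILES[profile][user.kind]
    if user.violating_rules:
        score *= 0.5  # discounting rule-breakers is itself an invented, contestable choice
    return score

def side_to_protect(left: RoadUser, right: RoadUser, profile: str) -> str:
    """Decide which side the car should steer away from (i.e. protect)."""
    return "left" if protection_score(left, profile) >= protection_score(right, profile) else "right"

if __name__ == "__main__":
    left = RoadUser("child", violating_rules=True)
    right = RoadUser("sacred_animal", violating_rules=False)
    for profile in PRIORITY_PROFILES:
        print(f"{profile}: protect the {side_to_protect(left, right, profile)} side")
```

Under the first two invented profiles the car protects the child; under the third it protects the sacred animal. Nothing inside the procedure itself can settle which weight table counts as "the" human value, which is precisely the point made above.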
A parallel example to the hallucination case is this: some AI programs can obtain high scores in a game by maliciously construing its rules, for instance treating the reward rule "hit multiple enemy ships" in a war game as "hit the same enemy ship multiple times"; human users obviously do not want such machines put into service. Eliminating a program's malicious misreading of the rules therefore also looks like a natural part of human-machine alignment (a sketch of this case is given at the end of this article). In scenarios of this kind, the objectivity of the goal of alignment work seems clear.

From the perspective of hermeneutics, however, this may not be so. What counts as "machine hallucination" and what counts as "maliciously interpreting the rules" are themselves products of interpretation under specific historical conditions, not purely objective matters. In "New Learning and Pseudo-Classics", published in 1891, Kang Youwei argued that classics such as the "Gu Wen Shang Shu", the "Zuo Zhuan" and the "Mao Shi" had all been forged by Liu Xin at the end of the Western Han dynasty and were therefore nothing but "serious nonsense". In other words, if the Old Text school of Confucian scholars represented by Liu Xin is regarded as a large carbon-based language model, then on Kang's account the texts of those classics are clear evidence that "machine hallucination" was taking place. But was Kang Youwei's judgment correct? According to the research of scholars such as Qian Mu, Kang Youwei's verdict on the Old Text classics was mistaken, and Qian Mu's view is now the one accepted by the academic community. This shows that judgments about the truth or falsity of information themselves depend on a specific scholarly community. Extending the point to natural science: the reason we now hold geocentrism to be wrong is not that some standard of objective truth compels us to do so; rather, it is that Ptolemy's astronomical system was abandoned long ago, and which academic norms and paradigms are accepted or abandoned is itself a result of historical development.

Now consider the case of maliciously interpreting rules. What counts as a malicious interpretation is likewise a product of social convention. In a war game, repeatedly attacking the same target ship to rack up points is regarded as cheating, whereas in football, repeatedly attacking the same goal to score is regarded as perfectly normal. Which reading of the rules is reasonable depends on the purposes of the game's designers and on all sorts of subtle effects of local culture. In other words, there is no universal standard for judging "maliciousness".

The discussion above is meant to yield two conclusions. First, the problem of human-machine alignment must be rewritten as "the problem of alignment between a specific human culture and machines", because there is no "human value" detached from historical and regional specificity. Second, if we insist on tightening the uniformity of the value standards with which machines must align, the result will only be that certain particular values suppress others, which is not good news for the diversity and sustainability of human cultural development. From this perspective, the mainstream human-machine alignment narrative in the English-speaking world stands in need of thoroughgoing philosophical reflection to clarify its premises.
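As a small appendix to the war-game example above, here is a minimal sketch under purely hypothetical assumptions: the event format and both reward functions are invented for illustration only. It shows how a reward rule written literally as "one point per hit" can be maximised by firing at the same ship over and over, while the rule the designer presumably had in mind counts only distinct ships.

```python
# Purely illustrative sketch of reward misreading: the event format and both
# reward functions are invented, not taken from any actual game or system.
from typing import List, Tuple

Event = Tuple[str, bool]  # (ship_id, whether the shot hit)

def literal_reward(events: List[Event]) -> int:
    """What the written rule literally says: one point per successful hit."""
    return sum(1 for _ship, hit in events if hit)

def intended_reward(events: List[Event]) -> int:
    """What the designer presumably meant: one point per distinct ship hit."""
    return len({ship for ship, hit in events if hit})

if __name__ == "__main__":
    # An agent that has "maliciously" read the rule keeps firing at one
    # already-disabled ship, because every hit still pays out a point.
    exploit_run = [("ship_1", True)] * 10
    honest_run = [("ship_1", True), ("ship_2", True), ("ship_3", True)]

    print("literal reward :", literal_reward(exploit_run), "vs", literal_reward(honest_run))    # 10 vs 3
    print("intended reward:", intended_reward(exploit_run), "vs", intended_reward(honest_run))  # 1 vs 3
```

Whether the first reading counts as "malicious" is settled only by the designer's purpose and the surrounding conventions, not by anything inside the scoring code itself, which is exactly the hermeneutic point made in the article.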
This article is supported by the Science Popularization China Starry Sky Project. Produced by the Department of Science Popularization of the China Association for Science and Technology; producer: China Science and Technology Press Co., Ltd. and Beijing Zhongke Xinghe Culture Media Co., Ltd.