Don't take any single research result at face value. The best attitude is: it's interesting, keep it in mind, and look into it later. What should you look at? Look at how the research was done, and at the progress and overall picture of the field; that is exactly what high-quality popular science should strive to provide.

Written by Xiang Ruiyang (Master's in Psychology, Vrije Universiteit Amsterdam)

You have probably come across psychology popularizations in books and the media that introduce particular research findings. For example, one article says that keeping warm and drinking more hot water will improve your interpersonal relationships, because a study shows that physical warmth increases interpersonal warmth: compared with holding a cup of iced coffee, holding a cup of hot coffee makes people evaluate strangers more positively. Another article says that when designing a questionnaire, it is best to put the signature line at the beginning rather than at the end, because a study shows that this makes respondents answer more honestly (that particular study has since been retracted [6]). Yet another article says that children should listen to Mozart often, even as prenatal education during pregnancy, because a study shows that listening to Mozart improves cognitive ability. And so on.

Figure 1. The business and management book Influence, a bestseller for decades, has been many people's first introduction to social psychology.

We tend to believe these research results and, if we are willing, apply them to our lives. After all, isn't psychology a science? Aren't the researchers experts? Weren't these studies peer-reviewed and published in internationally renowned journals?

However, in recent years researchers have increasingly found that psychology, and social science more broadly, is often not replicable. In other words, a phenomenon found in these people, at this time and place, disappears in those people, at that time and place. Reproducibility is an essential feature of science; phenomena that cannot be reproduced are not real scientific effects.

If unreplicable studies were isolated cases, it would not be a big deal. Unfortunately, a large-scale replication project in 2015 found that less than 40% of psychology studies could be successfully replicated. [1] More than half of published results are unreliable. This is the "reproducibility crisis" that the psychology community has been discussing for the past decade.

The reproducibility crisis

Reproducibility first entered psychology researchers' field of vision in 2011, when two major events shook the discipline. First, Diederik Stapel, a famous Dutch social psychologist, was found to have fabricated data: all of his "famous" discoveries were false, and 58 of his published articles were retracted. Second, the Journal of Personality and Social Psychology (JPSP), a top journal in the field, published a study by Daryl Bem claiming experimental evidence for precognition, that is, that people can "feel the future". [2] Once the study was published, it sparked heated debate. Critics repeated Bem's experiments but failed to reproduce the significant results, and this failed replication was published in JPSP a year later. [3]

If research methods accepted by the top psychology journals could produce such controversial findings, might other published studies be just as unreliable? Since then, the psychology community has gradually reflected on its research practices, and more and more researchers have conducted replication studies.
The most representative effort is the Open Science Collaboration led by Brian Nosek, a psychologist at the University of Virginia. In 2015 the Open Science Collaboration carried out its first large-scale replication project, repeating 100 studies published in three top psychology journals: the Journal of Personality and Social Psychology (JPSP), the Journal of Experimental Psychology (JEP), and Psychological Science (PS). Only 36% of the studies were successfully replicated. To repeat: only about one-third of the research published in top psychology journals could be reproduced. Social psychology was hit particularly hard, and even cognitive psychology, generally considered more "hardcore", had a replication rate of only about 50%.

Figure 2. Reproducibility of articles in top psychology journals.

What does it mean when a replication fails?

It is important to note that a failure to replicate does not necessarily mean the effect does not exist. There are four possible reasons a replication can fail:

1. The original research was flawed or fraudulent; for example, the data were tampered with.
2. The original result was a coincidence. The most common case is that the original sample was too small; a small sample can easily be unrepresentative of the population, so a "significant" effect can appear purely by chance (a short simulation below illustrates the point).
3. The original result is real, but it applies only to the participants studied at that time and place, not to the participants in the replication.
4. The original result is real, and the problem lies with the replication; for example, the replication failed to reproduce the original procedure exactly.

A replication failure caused by the first three reasons suggests that the original result is unreliable. The fourth situation is certainly possible, which is why a single replication may not settle the question and a result needs to be tested repeatedly. But the fourth situation cannot account for a large share of the failures, which indicates that a large number of psychological findings really are unreliable.

So is psychology unreliable?

Does such a low replication rate mean that psychology has failed as a science? In fact, replication failures are common across the sciences. A famous example is the cold fusion episode of 1989, in which two scientists claimed to have achieved sustained nuclear fusion at room temperature; other scientists could not reproduce their results, and the exciting "discovery" never entered the hall of science and remains controversial. Medicine, especially research on gene-disease relationships, also has a serious replication problem: only about 4% of reported gene-disease associations have been successfully replicated. Researchers long believed there were genes associated with depression, but in 2019 a big-data study by researchers at the University of Colorado found no support for the supposed "depression genes"; thousands of studies accumulated over more than 20 years suddenly lost their foundation. [4]
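Before moving on, here is a way to make the second failure mode concrete: a small sample can produce a "significant" effect purely by chance, and that effect then vanishes in a larger study. The sketch below is not from the original article; it assumes Python with NumPy and SciPy, and the sample sizes and simulation counts are chosen only for illustration.

```python
# A minimal sketch (not from the article): why small samples produce
# "significant" effects that later fail to replicate.
# Assumes Python 3 with NumPy and SciPy installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def run_study(n_per_group, true_effect=0.0):
    """Simulate a two-group experiment; return the p-value and observed Cohen's d."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
    return p, (treatment.mean() - control.mean()) / pooled_sd

# 1. Many small "original" studies of an effect that does not exist at all.
results = [run_study(20) for _ in range(10_000)]
hits = [(p, d) for p, d in results if p < 0.05]
print(f"'Significant' results with n=20 and no true effect: {len(hits) / len(results):.1%}")  # about 5%
print(f"Average |effect size| among those chance hits: {np.mean([abs(d) for _, d in hits]):.2f}")  # inflated

# 2. Statistical power: a real but modest effect (d = 0.2) is usually missed
#    with 20 participants per group, and detected far more often with 400.
for n in (20, 400):
    power = np.mean([run_study(n, true_effect=0.2)[0] < 0.05 for _ in range(2_000)])
    print(f"Power to detect d = 0.2 with n = {n} per group: {power:.1%}")
```

If journals publish mainly the small fraction of chance findings that cross the significance threshold, the literature fills with effects that look impressive but evaporate when someone repeats the experiment with a larger sample, which is also why the field's later emphasis on larger samples and statistical power (discussed below) matters.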
The emergence of the reproducibility crisis actually shows that psychology and the social sciences are on the road to becoming a hard science, but there are problems within the discipline that need to be solved. The problem itself is simple: journals encourage the publication of original research and discourage the publication of replications. As a result, most studies are exploratory. Researchers discover a phenomenon, publish it immediately, and then treat it as a real effect without ever testing or repeating it. While everyone rushes to publish new findings, a large number of published results turn out to be coincidences, or to apply only to specific groups of people, or even to rest on tampered data. Built on effects that do not really exist, a large number of subsequent studies become castles in the air.

Problems can be fixed once they are recognized. The reproducibility crisis did not crush psychology; it prompted researchers to adjust and improve their practices and to take replication seriously, and journals began to encourage the publication of replication studies. Only then did everyone realize, as more and more classic studies, even ones written into psychology textbooks, were put to the test, that many of them could not be successfully replicated.

Follow the latest replication results

As replication studies multiply, even psychology professors and researchers find it hard to keep track of the latest results. To help more people follow this progress, a group of psychologists formed the Framework for Open and Reproducible Research Training (FORRT) and compiled hundreds of replications of psychological effects. The compilation is not yet finished (it is expected to be completed in 2024), but it has already reached a considerable scale, and the summary can be viewed on their website. [5]

Figure 3. Screenshot of the homepage of the FORRT (Framework for Open and Reproducible Research Training) website.

FORRT currently lists more than 130 psychological effects that have been tested in replication studies, covering branches such as social psychology, positive psychology, cognitive psychology, developmental psychology, marketing, and neuroscience. For each effect, FORRT lists the original literature, the critical literature (replication studies, reviews, meta-analyses, and so on), and the effect sizes of the original and replication studies, and assigns a label: replicated (successfully replicated), not replicated (failed to replicate, in some cases with the effect reversed), or mixed (some replications succeeded and some failed). Note that because the project is still collecting data and the entries have not all been reviewed, some effects on the website may be labeled incorrectly; you can draw your own conclusions from the listed literature.

Of the more than 130 effects, fewer than 20 are marked as replicated, more than 40 as mixed, and nearly 70 as not replicated. Even if we count mixed as partially successful, replicated plus mixed comes to less than 50%, which shows that there really are many effects that cannot be reproduced.

Successfully replicated "top students"

Let's first look at the "top students": effects marked as replicated.
The better-known ones include:

- Prosocial spending: spending money on others produces a greater sense of well-being than spending it on yourself.
- Minimal group effect: even when participants are assigned to meaningless groups (say, the group whose coin toss came up heads, or the group that prefers red over blue), they favor members of their own group.
- Dunning-Kruger effect: people with limited knowledge or ability in an area tend to overestimate that knowledge or ability and become overconfident. Note, however, that the widely circulated pictures of the "mountain of ignorance" and "valley of despair" are not part of the Dunning-Kruger effect itself and have not been carefully examined.
- Loss aversion: faced with gains and losses of the same size, people feel the negative utility of the loss more strongly than the positive utility of the gain.
- Mere exposure effect: repeated exposure to the same thing makes people evaluate it more favorably.
- Bystander effect: when other people are present, responsibility is diffused and each individual is less likely to help someone in need.
- Above- and below-average effects: when people compare themselves with others, they tend to overestimate their standing on easier abilities and underestimate it on more difficult ones.

Famous negative examples

Unfortunately, some well-known effects have not been successfully replicated:

- Pygmalion effect (also known as the Rosenthal effect or expectancy effect): in Rosenthal's 1966 study, researchers randomly selected some students and told their teachers that these students had scored best on an IQ test and had the most potential. The report claimed that, because of this expectation, teachers treated these students differently, their IQs rose by 3.8 points on average, and the effect grew stronger over time. Later studies found that the effect of teacher expectations does exist, but it is much smaller than Rosenthal reported, and it is temporary rather than cumulative.
- Power poses: a 2010 study found that holding a power pose (an expansive, open posture, such as hands on hips) raises testosterone and lowers cortisol, making people feel more confident and powerful. This well-known embodied-cognition effect has not been successfully replicated.
- Ego depletion: in a 1998 study, the famous psychologist Roy Baumeister proposed that self-control is a limited resource and that inhibiting a thought, emotion, or behavior consumes it; after such an inhibition task, participants persist for a shorter time and perform worse on subsequent tasks. Partly because of Baumeister's controversial claim that failed replications simply reflected incompetent replicators, the effect has been tested many times, and the conclusion is that a few minutes of a depletion task in the laboratory does not really deplete people's self-control.
- Unconscious-thought advantage: a 2006 study found that for complex decisions involving many factors, not deliberating too much often leads to better decisions. The phenomenon did not appear in a replication study.
In addition, some effects that we have come to treat as common sense are marked as mixed, which at least suggests that their importance and influence have been overestimated:

- Growth mindset: in a 1995 study, the renowned psychologist Carol Dweck first proposed that a growth mindset, the belief that abilities can be improved rather than being fixed, helps people perform better. In education, many studies have reported that a growth mindset helps students achieve better results, and the best-selling book Mindset (published in Chinese as "Lifelong Growth") was built on this work. Replication studies, however, generally find that the effect of a growth mindset is not significant.
- Nudge: Richard Thaler, winner of the 2017 Nobel Prize in Economics, proposed the concept of the nudge in 2008: influencing people's behavior and decisions through positive reinforcement and indirect suggestion rather than direct instruction, coercion, or punishment. A well-known example is the fly printed in each men's urinal at Amsterdam Schiphol Airport, said to be far more effective than the slogan "one small step forward". In recent years, however, meta-analyses have cast doubt on the nudge effect; even if it exists, its effect size is very small.

Figure 4. Although these two effects may not be significant, these two books may still have helped you.

Same type, different fate

Interestingly, within a single field there can be several closely related effects, some of which replicate and some of which do not. The "scarcity effect", for example, covers a series of tendencies shown by people whose resources (money, time, and so on) are limited, in reality or in imagination, including:

1. Time discounting: a sudden drop in income makes people prefer a small reward available immediately over a larger reward available later.
2. Physical pain: financial insecurity makes people feel more physical pain.
3. Over-borrowing: the perceived lack of money leads people to over-borrow.
4. Preference for material goods: poorer people prefer material goods over experiential goods.
5. Happiness: imagining that one's time in a city is coming to an end increases happiness.
6. Conscious thought: poor people have more thoughts related to financial worries than rich people do.
7. Competition/threat: a merchant's scarcity ("hunger") marketing makes consumers perceive other consumers as threatening competitors.
8. Preference polarization: perceived scarcity polarizes preferences, making people favor one option more strongly and reject the others.
…

Replication work, such as the large-scale replication of 20 studies published in 2021 by Michael O'Donnell and colleagues, found that, of the scarcity effects listed above, effects 1 to 4 were successfully replicated while 5 to 9 were not. [7] Even within the same field, then, closely related findings can be either reliable or unreliable, and each conclusion has to be examined critically.

What attitude should we have towards psychology?

A large number of psychological findings cannot be replicated. What should we do in the face of this reality? Should we throw psychology away like a worn-out shoe, never believe it again, and stop considering it a science? As mentioned earlier, the reproducibility crisis has not crushed psychology.
Researchers are actively changing their practices. On the one hand, they are using replication studies and meta-analyses to check the reliability of earlier work; on the other, they are making new studies more reliable by encouraging pre-registration (registering the research methods and planned analyses in detail before a study begins, so that researchers cannot quietly manipulate them afterwards) and by increasing sample sizes to improve statistical power. But it will take time for the whole discipline to become more rigorous, and for now it is still hard to know which pieces of psychological knowledge are reliable and which are not.

In the meantime, we need critical thinking. Perhaps no single study's result should be trusted completely. For eye-catching headlines and striking research results, the best attitude is: it's interesting, keep it in mind, and look into it later.

To look at research critically, we first need a basic understanding of how it was done. If a study claims that asking subjects to think about the elderly makes them walk more slowly, we should ask who the subjects were (perhaps American college students in a culture with prejudices about the elderly) and how exactly "thinking about the elderly" was induced (that is, the specific method). Only then can we judge whether results based on those subjects apply to us, and whether the study's procedures say anything about real life.

More importantly, we need to see the progress and overall picture of the research: what have similar studies and replications found? How do other researchers view this work? (The "thinking about the elderly" study above, for example, has not been successfully replicated.)

Of course, these two points place higher demands on popular science writers. High-quality popular science does not just relay scattered findings; it presents the overall picture of a study and, better still, the overall picture of the research progress on a topic. The way popular science writers work may need to change: instead of picking a viewpoint and hunting for studies that support it, they should sort out the progress and context of research on a topic and then write up the best-supported view.

All of this sounds serious and tiring. Rigor is part of the scientific attitude, but for a subject like psychology, perhaps some ambiguity can be tolerated. Human nature and the human mind are extraordinarily complex. We hope to uncover the objective laws of the mind as far as possible, but we should not expect simple theories and surface-level effects to explain everything, nor expect them to apply to everyone. Seeing the limits of psychology and acknowledging the subtlety and complexity of human nature may itself be a kind of romance.

My personal view is that we should learn as much psychological knowledge and research as we can, critically, but we need not reject out of hand every best-seller or popular article that seems less than rigorous. Whether to believe is a personal matter, and sometimes believing is what makes it work.
References

[1] Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
[2] Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-425.
[3] Galak, J., LeBoeuf, R. A., Nelson, L. D., & Simmons, J. P. (2012). Correcting the past: Failures to replicate psi. Journal of Personality and Social Psychology, 103(6), 933-948.
[4] Border, R., Johnson, E. C., Evans, L. M., Smolen, A., Berley, N., Sullivan, P. F., & Keller, M. C. (2019). No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression across multiple large samples. American Journal of Psychiatry, 176(5), 376-387.
[5] https://forrt.org/reversals/
[6] Retraction for Shu et al., Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end. https://www.pnas.org/doi/10.1073/pnas.2115397118
[7] O'Donnell, M., Dev, A. S., Antonoplis, S., Baum, S. M., Benedetti, A. H., Brown, N. D., ... & Nelson, L. D. (2021). Empirical audit and review and an assessment of evidentiary value in research on the psychological consequences of scarcity. Proceedings of the National Academy of Sciences, 118(44), e2103313118.

This article is supported by the Science Popularization China Starry Sky Project. Produced by the Department of Science Popularization, China Association for Science and Technology; producers: China Science and Technology Press Co., Ltd. and Beijing Zhongke Xinghe Culture Media Co., Ltd.