• Bipolar Disorder
  • Therapy Center
  • When To See a Therapist
  • Types of Therapy
  • Best Online Therapy
  • Best Couples Therapy
  • Best Family Therapy
  • Managing Stress
  • Sleep and Dreaming
  • Understanding Emotions
  • Self-Improvement
  • Healthy Relationships
  • Student Resources
  • Personality Types
  • Sweepstakes
  • Guided Meditations
  • Verywell Mind Insights
  • 2024 Verywell Mind 25
  • Mental Health in the Classroom
  • Editorial Process
  • Meet Our Review Board
  • Crisis Support

What Is Replication in Psychology Research?

Examples of replication in psychology.

  • Why Replication Matters
  • How It Works

What If Replication Fails?

  • The Replication Crisis

How Replication Can Be Strengthened

Replication refers to the repetition of a research study, generally with different situations and subjects, to determine if the basic findings of the original study can be applied to other participants and circumstances.

In other words, when researchers replicate a study, it means they reproduce the experiment to see if they can obtain the same outcomes.

Once a study has been conducted, researchers might be interested in determining if the results hold true in other settings or for other populations. In other cases, scientists may want to replicate the experiment to further demonstrate the results.

At a Glance

In psychology, replication is defined as reproducing a study to see if you get the same results. It's an important part of the research process that strengthens our understanding of human behavior. It's not always a perfect process, however, and extraneous variables and other factors can interfere with results.

For example, imagine that health psychologists perform an experiment showing that hypnosis can be effective in helping middle-aged smokers kick their nicotine habit. Other researchers might want to replicate the same study with younger smokers to see if they reach the same result.

Exact replication is not always possible. Ethical standards may prevent modern researchers from replicating studies that were conducted in the past, such as Stanley Milgram's infamous obedience experiments .

That doesn't mean that researchers don't perform replications; it just means they have to adapt their methods and procedures. For example, researchers have replicated Milgram's study using lower shock thresholds and improved informed consent and debriefing procedures.

Why Replication Is Important in Psychology

When studies are replicated and achieve the same or similar results as the original study, it gives greater validity to the findings. If a researcher can replicate a study’s results, it is more likely that those results can be generalized to the larger population.

Human behavior can be inconsistent and difficult to study. Even when researchers are cautious about their methods, extraneous variables can still create bias and affect results. 

That's why replication is so essential in psychology. It strengthens findings, helps detect potential problems, and improves our understanding of human behavior.

How Do Scientists Replicate an Experiment?

When conducting a study or experiment , it is essential to have clearly defined operational definitions. In other words, what is the study attempting to measure?

When replicating earlier researchers, experimenters will follow the same procedures but with a different group of participants. If the researcher obtains the same or similar results in follow-up experiments, it means that the original results are less likely to be a fluke.

The steps involved in replicating a psychology experiment often include the following:

  • Review the original experiment : The goal of replication is to use the exact methods and procedures the researchers used in the original experiment. Reviewing the original study to learn more about the hypothesis, participants, techniques, and methodology is important.
  • Conduct a literature review : Review the existing literature on the subject, including any other replications or previous research. Considering these findings can provide insights into your own research.
  • Perform the experiment : The next step is to conduct the experiment. During this step, keeping your conditions as close as possible to the original experiment is essential. This includes how you select participants, the equipment you use, and the procedures you follow as you collect your data.
  • Analyze the data : As you analyze the data from your experiment, you can better understand how your results compare to the original results.
  • Communicate the results : Finally, you will document your processes and communicate your findings. This is typically done by writing a paper for publication in a professional psychology journal. Be sure to carefully describe your procedures and methods, describe your findings, and discuss how your results compare to the original research.

So what happens if the original results cannot be reproduced? Does that mean that the experimenters conducted bad research or that, even worse, they lied or fabricated their data?

In many cases, non-replicated research is caused by differences in the participants or in other extraneous variables that might influence the results of an experiment. Sometimes the differences might not be immediately clear, but other researchers might be able to discern which variables could have impacted the results.

For example, minor differences in things like the way questions are presented, the weather, or even the time of day the study is conducted might have an unexpected impact on the results of an experiment. Researchers might strive to perfectly reproduce the original study, but variations are expected and often impossible to avoid.

Are the Results of Psychology Experiments Hard to Replicate?

In 2015, a group of 271 researchers published the results of their five-year effort to replicate 100 different experimental studies previously published in three top psychology journals. The replicators worked closely with the original researchers of each study in order to replicate the experiments as closely as possible.

The results were less than stellar. Of the 100 experiments in question, 61% could not be replicated with the original results. Of the original studies, 97% of the findings were deemed statistically significant. Only 36% of the replicated studies were able to obtain statistically significant results.

As one might expect, these dismal findings caused quite a stir. You may have heard this referred to as the "'replication crisis' in psychology.

Similar replication attempts have produced similar results. Another study published in 2018 replicated 21 social and behavioral science studies. In these studies, the researchers were only able to successfully reproduce the original results about 62% of the time.

So why are psychology results so difficult to replicate? Writing for The Guardian , John Ioannidis suggested that there are a number of reasons why this might happen, including competition for research funds and the powerful pressure to obtain significant results. There is little incentive to retest, so many results obtained purely by chance are simply accepted without further research or scrutiny.

The American Psychological Association suggests that the problem stems partly from the research culture. Academic journals are more likely to publish novel, innovative studies rather than replication research, creating less of an incentive to conduct that type of research.

Reasons Why Research Cannot Be Replicated

The project authors suggest that there are three potential reasons why the original findings could not be replicated.  

  • The original results were a false positive.
  • The replicated results were a false negative.
  • Both studies were correct but differed due to unknown differences in experimental conditions or methodologies.

The Nobel Prize-winning psychologist Daniel Kahneman has suggested that because published studies are often too vague in describing methods used, replications should involve the authors of the original studies to more carefully mirror the methods and procedures used in the original research.

In fact, one investigation found that replication rates are much higher when original researchers are involved.

While some might be tempted to look at the results of such replication projects and assume that psychology is more art than science, many suggest that such findings actually help make psychology a stronger science. Human thought and behavior is a remarkably subtle and ever-changing subject to study.

In other words, it's normal and expected for variations to exist when observing diverse populations and participants.

Some research findings might be wrong, but digging deeper, pointing out the flaws, and designing better experiments helps strengthen the field. The APA notes that replication research represents a great opportunity for students. it can help strengthen research skills and contribute to science in a meaningful way.

Nosek BA, Errington TM. What is replication ?  PLoS Biol . 2020;18(3):e3000691. doi:10.1371/journal.pbio.3000691

Burger JM. Replicating Milgram: Would people still obey today ?  Am Psychol . 2009;64(1):1-11. doi:10.1037/a0010932

Makel MC, Plucker JA, Hegarty B. Replications in psychology research: How often do they really occur? Perspectives on Psychological Science . 2012;7(6):537-542. doi:10.1177/1745691612460688

Aarts AA, Anderson JE, Anderson CJ, et al. Estimating the reproducibility of psychological science . Science. 2015;349(6251). doi:10.1126/science.aac4716

Camerer CF, Dreber A, Holzmeister F, et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015 . Nat Hum Behav . 2018;2(9):637-644. doi:10.1038/s41562-018-0399-z

American Psychological Association. Learning into the replication crisis: Why you should consider conducting replication research .

Kahneman D. A new etiquette for replication . Social Psychology. 2014;45(4):310-311.

By Kendra Cherry, MSEd Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings
  • My Bibliography
  • Collections
  • Citation manager

Save citation to file

Email citation, add to collections.

  • Create a new collection
  • Add to an existing collection

Add to My Bibliography

Your saved search, create a file for external citation management software, your rss feed.

  • Search in PubMed
  • Search in NLM Catalog
  • Add to Search

Replicability, Robustness, and Reproducibility in Psychological Science

Affiliations.

  • 1 Department of Psychology, University of Virginia, Charlottesville, Virginia 22904, USA; email: [email protected].
  • 2 Center for Open Science, Charlottesville, Virginia 22903, USA.
  • 3 Department of Psychology, University of Amsterdam, 1012 ZA Amsterdam, The Netherlands.
  • 4 Addiction Research Center, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.
  • 5 Department of Psychology, University of California, Davis, California 95616, USA.
  • 6 Psychology Department, Grand Valley State University, Allendale, Michigan 49401, USA.
  • 7 Department of Economics, Stockholm School of Economics, 113 83 Stockholm, Sweden.
  • 8 School of Biosciences, University of Melbourne, Parkville VIC 3010, Australia.
  • 9 Department of Psychology, Illinois State University, Normal, Illinois 61790, USA.
  • 10 Meta-Research Center, Tilburg University, 5037 AB Tilburg, The Netherlands.
  • 11 Department of Psychology, Leipzig University, 04109 Leipzig, Germany.
  • 12 Department of Theoretical Philosophy, University of Groningen, 9712 CP Groningen, The Netherlands.
  • 13 Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands.
  • 14 University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA.
  • 15 Department of Psychology, Ludwig Maximilian University of Munich, 80539 Munich, Germany.
  • 16 School of Psychological Sciences, University of Melbourne, Parkville VIC 3052, Australia.
  • PMID: 34665669
  • DOI: 10.1146/annurev-psych-020821-114157

Replication-an important, uncommon, and misunderstood practice-is gaining appreciation in psychology. Achieving replicability is important for making research progress. If findings are not replicable, then prediction and theory development are stifled. If findings are replicable, then interrogation of their meaning and validity can advance knowledge. Assessing replicability can be productive for generating and testing hypotheses by actively confronting current understandings to identify weaknesses and spur innovation. For psychology, the 2010s might be characterized as a decade of active confrontation. Systematic and multi-site replication projects assessed current understandings and observed surprising failures to replicate many published findings. Replication efforts highlighted sociocultural challenges such as disincentives to conduct replications and a tendency to frame replication as a personal attack rather than a healthy scientific practice, and they raised awareness that replication contributes to self-correction. Nevertheless, innovation in doing and understanding replication and its cousins, reproducibility and robustness, has positioned psychology to improve research practices and accelerate progress.

Keywords: generalizability; metascience; replication; reproducibility; research methods; robustness; statistical inference; theory; validity.

PubMed Disclaimer

Similar articles

  • Eleven years of student replication projects provide evidence on the correlates of replicability in psychology. Boyce V, Mathur M, Frank MC. Boyce V, et al. R Soc Open Sci. 2023 Nov 8;10(11):231240. doi: 10.1098/rsos.231240. eCollection 2023 Nov. R Soc Open Sci. 2023. PMID: 38026006 Free PMC article.
  • Psychology, Science, and Knowledge Construction: Broadening Perspectives from the Replication Crisis. Shrout PE, Rodgers JL. Shrout PE, et al. Annu Rev Psychol. 2018 Jan 4;69:487-510. doi: 10.1146/annurev-psych-122216-011845. Annu Rev Psychol. 2018. PMID: 29300688 Review.
  • A discipline-wide investigation of the replicability of Psychology papers over the past two decades. Youyou W, Yang Y, Uzzi B. Youyou W, et al. Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2208863120. doi: 10.1073/pnas.2208863120. Epub 2023 Jan 30. Proc Natl Acad Sci U S A. 2023. PMID: 36716367 Free PMC article.
  • What is replication? Nosek BA, Errington TM. Nosek BA, et al. PLoS Biol. 2020 Mar 27;18(3):e3000691. doi: 10.1371/journal.pbio.3000691. eCollection 2020 Mar. PLoS Biol. 2020. PMID: 32218571 Free PMC article.
  • Addressing the theory crisis in psychology. Oberauer K, Lewandowsky S. Oberauer K, et al. Psychon Bull Rev. 2019 Oct;26(5):1596-1618. doi: 10.3758/s13423-019-01645-2. Psychon Bull Rev. 2019. PMID: 31515732 Review.
  • A third kind of episodic memory: Context familiarity is a distinct process from item familiarity and recollection. Addante RJ, Clise E, Waechter R, Bengson J, Drane DL, Perez-Caban J. Addante RJ, et al. bioRxiv [Preprint]. 2024 Jul 18:2024.07.15.603640. doi: 10.1101/2024.07.15.603640. bioRxiv. 2024. PMID: 39071285 Free PMC article. Preprint.
  • Scoping review and evidence mapping of interventions aimed at improving reproducible and replicable science: Protocol. Dudda LA, Kozula M, Ross-Hellauer T, Kormann E, Spijker R, DeVito N, Gopalakrishna G, Van den Eynden V, Onghena P, Naudet F, Banzi R, Fratelli M, Varga M, Gelsleichter YA, Stegeman I, Leeflang MM. Dudda LA, et al. Open Res Eur. 2024 Jul 10;3:179. doi: 10.12688/openreseurope.16567.2. eCollection 2023. Open Res Eur. 2024. PMID: 39036539 Free PMC article.
  • Approaching future rewards or waiting for them to arrive: Spatial representations of time and intertemporal choice. Fletcher D, Houghton R, Spence A. Fletcher D, et al. PLoS One. 2024 Apr 5;19(4):e0301781. doi: 10.1371/journal.pone.0301781. eCollection 2024. PLoS One. 2024. PMID: 38578791 Free PMC article. Clinical Trial.
  • Intrinsic functional connectivity strength of SuperAgers in the default mode and salience networks: Insights from ADNI. Keenan HE, Czippel A, Heydari S, Gawryluk JR, Mazerolle EL; Alzheimer's Disease Neuroimaging Initiative. Keenan HE, et al. Aging Brain. 2024 Mar 21;5:100114. doi: 10.1016/j.nbas.2024.100114. eCollection 2024. Aging Brain. 2024. PMID: 38550790 Free PMC article.
  • Raising awareness of uncertain choices in empirical data analysis: A teaching concept toward replicable research practices. Mandl MM, Hoffmann S, Bieringer S, Jacob AE, Kraft M, Lemster S, Boulesteix AL. Mandl MM, et al. PLoS Comput Biol. 2024 Mar 28;20(3):e1011936. doi: 10.1371/journal.pcbi.1011936. eCollection 2024 Mar. PLoS Comput Biol. 2024. PMID: 38547084 Free PMC article.

Publication types

  • Search in MeSH

LinkOut - more resources

Full text sources.

  • Ingenta plc
  • Ovid Technologies, Inc.
  • Citation Manager

NCBI Literature Resources

MeSH PMC Bookshelf Disclaimer

The PubMed wordmark and PubMed logo are registered trademarks of the U.S. Department of Health and Human Services (HHS). Unauthorized use of these marks is strictly prohibited.

The role of replication in psychological science

  • Paper in Philosophy of Science in Practice
  • Published: 08 January 2021
  • Volume 11 , article number  23 , ( 2021 )

Cite this article

why is it important for research to be replicated psychology

  • Samuel C. Fletcher   ORCID: orcid.org/0000-0002-9061-8976 1  

2045 Accesses

10 Citations

16 Altmetric

Explore all metrics

The replication or reproducibility crisis in psychological science has renewed attention to philosophical aspects of its methodology. I provide herein a new, functional account of the role of replication in a scientific discipline: to undercut the underdetermination of scientific hypotheses from data, typically by hypotheses that connect data with phenomena. These include hypotheses that concern sampling error, experimental control, and operationalization. How a scientific hypothesis could be underdetermined in one of these ways depends on a scientific discipline’s epistemic goals, theoretical development, material constraints, institutional context, and their interconnections. I illustrate how these apply to the case of psychological science. I then contrast this “bottom-up” account with “top-down” accounts, which assume that the role of replication in a particular science, such as psychology, must follow from a uniform role that it plays in science generally. Aside from avoiding unaddressed problems with top-down accounts, my bottom-up account also better explains the variability of importance of replication of various types across different scientific disciplines.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save.

  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

Similar content being viewed by others

why is it important for research to be replicated psychology

History of Replication Failures in Psychology

why is it important for research to be replicated psychology

Low replicability can support robust and efficient science

why is it important for research to be replicated psychology

Making our “meta-hypotheses” clear: heterogeneity and the role of direct replications in science

These related events include Daryl Bem’s use of techniques standard in psychology to show evidence for extra-sensory perception ( 2011 ), the revelations of high-profile scientific fraud by Diederik Stapel (Callaway 2011 ) and Marc Hauser (Carpenter 2012 ), and related replication failures involving prominent effects such as ego depletion (Hagger et al. 2016 ).

The quotation reads: “the scientifically significant physical effect may be defined as that which can be regularly reproduced by anyone who carries out the appropriate experiment in the way prescribed.” See also Popper ( 1959 , p. 45): “Only when certain events recur in accordance with rules or regularities, as in the case of repeatable experiments, can our observations be tested—in principle—by anyone. … Only by such repetition can we convince ourselves that we are not dealing with a mere isolated ‘coincidence,’ but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable.” Zwaan et al. ( 2018 , pp. 1, 2, 4) also quote Dunlap ( 1926 ) (published earlier as Dunlap ( 1925 )) for the same point.

Schmidt ( 2009 , pp. 90–2), citing much the same passages of Popper ( 1959 , p. 45) as the others mentioned, also provides a similar explanation of replication’s importance, appealing to general virtues such as objectivity and reliability. (See the first paragraphs of Schmidt ( 2009 , p. 90; 2017 , p. 236) for especially clear statements, and Machery ( 2020 ) for an account of replication based on its ability to buttress reliability in particular.) But for him, that explanation only motivates why establishing a definition of replication is important in the first place; it plays no role in his definition itself. Thus, by drawing on Schmidt’s account of what replication is, I am not committing to his and others’ stated explanations of why is important.

For example, it is compatible with modifications or clarifications of how interpretation plays an essential role in determining what data models are or what they represent, either for Suppes’ hierarchy (Leonelli 2019 ) or Bogan and Woodward’s (Harris 2003 ). It is also compatible with interactions between the levels of data and phenomena (or experiment) in the course of a scientific investigation (Bailer-Jones 2009 , Ch. 7).

That’s not to say there is no interesting relationship between low-level underdetermination and the question of scientific realism, only that it much more indirect. See Laymon ( 1982 ) for a discussion thereof and Brewer and Chinn ( 1994 ) for historical examples from psychology as they bear on the motivation for theory change.

The first function, concerning mistakes in data analysis, does not appear in Schmidt ( 2009 , 2017 ). That said, neither he nor I claim that our lists are exhaustive, but they do seem to enumerate the most common types of low-level underdetermination that arise in the interpretation of the results of psychological studies. One type that occurs more often in the physical sciences concerns the accuracy, precision, and systematic error of an experiment or measurement technique; I hope in future work to address this other function in more detail. It would also be interesting to compare the present perspective to that of Feest ( 2019 ), who, focusing on the “epistemic uncertainty” regarding the third and sixth functions, arrives at a more pessimistic and limiting conclusion about the role of replication in psychological science.

For examples from economics, see Cartwright ( 1991 , pp. 145–6); for examples from gravitational and particle physics, see Franklin and Howson ( 1984 , pp. 56–8).

This is also analogous to the case of the demarcation problem, on which progress might be possible if one helps oneself to discipline-specific information (Hansson 2013 ).

Of course, there is a variety of quantitative and qualitative methods in psychological research, and qualitative methods are not always a good target for statistical analysis. But the question of whether the data are representative of the population of interest is important regardless of whether that data is quantitative or qualitative.

Meehl ( 1967 ) wanted to distinguish this lack of precise predictions from the situation in physics, but perhaps overstated his case: there are many experimental situations in physics in which theory predicts the existence of an effect determined by an unknown parameter, too. Meehl ( 1967 ) was absolutely right, though, that one cannot rest simply with evidence against a non-zero effect size; doing so abdicates responsibility to find just what the aforementioned patterns of human behavior and mental life are .

Online participant services such as Amazon Turk and other crowdsourced methods offer a potentially more diverse participant pool at a more modest cost (Uhlmann et al. 2019 ), but come with their own challenges.

“Big science” is a historiographical cluster concept referring to science with one or more of the following characteristics: large budgets, large staff sizes, large or particularly expensive equipment, and complex and expansive laboratories (Galison and Hevly 1992 ).

For secondary sources on MSRP, see Musgrave and Pigden ( 2016 , §§2.2, 3.4)

For more on this, see Musgrave and Pigden ( 2016 , §4).

In what follows, I use my own examples rather than Guttinger’s, with the exception of some overlap in discussion of Leonelli ( 2018 ).

Leonelli ( 2018 ) has argued that this possibility is realized in certain sciences that focus on qualitative data collection, but it is yet unclear whether this is really due to pragmatic limitations on the possibility of replications, rather than a lack of underdetermination, low-level or otherwise.

Bailer-Jones, D.M. (2009). Scientific models in philosophy of science . Pittsburgh: University of Pittsburgh Press.

Book   Google Scholar  

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature , 533 (7604), 452–454.

Article   Google Scholar  

Begley, C.G., & Ellis, L.M. (2012). Raise standards for preclinical cancer research: drug development. Nature , 483 (7391), 531–533.

Bem, D.J. (2011). Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology , 100 (3), 407.

Benjamin, D.J., Berger, J.O., Johannesson, M., Nosek, B.A., Wagenmakers, E.-J., Berk, R., Bollen, K.A., Brembs, B., Brown, L., Camerer, C., & et al. (2018). Redefine statistical significance. Nature Human Behaviour , 2 (1), 6.

Bird, A. (2018). Understanding the replication crisis as a base rate fallacy. The British Journal for the Philosophy of Science , forthcoming.

Bogen, J., & Woodward, J. (1988). Saving the phenomena. The Philosophical Review , 97 (3), 303–352.

Brewer, W.F., & Chinn, C.A. (1994). Scientists’ responses to anomalous data: Evidence from psychology, history, and philosophy of science. In PSA: Proceedings of the biennial meeting of the philosophy of science association , (Vol. 1 pp. 304–313): Philosophy of Science Association.

Button, K.S., Ioannidis, J.P., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S., & Munafò, M.R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience , 14 (5), 365–376.

Callaway, E. (2011). Report finds massive fraud at Dutch universities. Nature , 479 (7371), 15.

Camerer, C.F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., & et al. (2016). Evaluating replicability of laboratory experiments in economics. Science , 351 (6280), 1433–1436.

Carpenter, S. (2012). Government sanctions Harvard psychologist. Science , 337 (6100), 1283–1283.

Cartwright, N. (1991). Replicability, reproducibility, and robustness: comments on Harry Collins. History of Political Economy , 23 (1), 143–155.

Chen, X. (1994). The rule of reproducibility and its applications in experiment appraisal. Synthese , 99 , 87–109.

Dunlap, K. (1925). The experimental methods of psychology. The Pedagogical Seminary and Journal of Genetic Psychology , 32 (3), 502–522.

Dunlap, K. (1926). The experimental methods of psychology. In Murchison, C. (Ed.) Psychologies of 1925: Powell lectures in psychological theory (pp. 331–351). Worcester: Clark University Press.

Feest, U. (2019). Why replication is overrated. Philosophy of Science , 86 (5), 895–905.

Feyerabend, P. (1970). Consolation for the specialist. In Lakatos, I., & Musgrave, A. (Eds.) Criticism and the growth of knowledge (pp. 197–230). Cambridge: Cambridge University Press.

Feyerabend, P. (1975). Against method . London: New Left Books.

Google Scholar  

Fidler, F., & Wilcox, J. (2018). Reproducibility of scientific results. In Zalta, E.N. (Ed.) The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University, winter 2018 edition .

Franklin, A., & Howson, C. (1984). Why do scientists prefer to vary their experiments? Studies in History and Philosophy of Science Part A , 15 (1), 51–62.

Galison, P., & Hevly, B.W. (Eds.). (1992). Big science: the growth of large-scale research . Stanford: Stanford University Press.

Gelman, A. (2018). Don’t characterize replications as successes or failures. Behavioral and Brain Sciences , 41 , e128.

Gillies, D.A. (1971). A falsifying rule for probability statements. The British Journal for the Philosophy of Science , 22 (3), 231–261.

Gómez, O.S., Juristo, N., & Vegas, S. (2010). Replications types in experimental disciplines. In Proceedings of the 2010 ACM-IEEE international symposium on empirical software engineering and measurement, ESEM ’10 . New York: Association for Computing Machinery.

Greenwald, A.G., Pratkanis, A.R., Leippe, M.R., & Baumgardner, M.H. (1986). Under what conditions does theory obstruct research progress? Psychological Review , 93 (2), 216–229.

Guttinger, S. (2020). The limits of replicability. European Journal for Philosophy of Science , 10 (10), 1–17.

Hagger, M.S., Chatzisarantis, N.L., Alberts, H., Anggono, C.O., Batailler, C., Birt, A.R., Brand, R., Brandt, M.J., Brewer, G., Bruyneel, S., & et al. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science , 11 (4), 546–573.

Hansson, S.O. (2013). Defining pseudoscience and science. In Pigliucci, M., & Boudry, M. (Eds.) Philosophy of pseudoscience: reconsidering the demarcation problem (pp. 61–77). Chicago: University of Chicago Press.

Harris, T. (2003). Data models and the acquisition and manipulation of data. Philosophy of Science , 70 (5), 1508–1517.

Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In Lakatos, I., & Musgrave, A. (Eds.) Criticism and the growth of knowledge (pp. 91–196). Cambridge: Cambridge University Press.

Lakens, D., Adolfi, F.G., Albers, C.J., Anvari, F., Apps, M.A., Argamon, S.E., Baguley, T., Becker, R.B., Benning, S.D., Bradford, D.E., & et al. (2018). Justify your alpha. Nature Human Behaviour , 2 (3), 168.

Laudan, L. (1983). The demise of the demarcation problem. In Cohan, R., & Laudan, L. (Eds.) Physics, philosophy, and psychoanalysis (pp. 111–127). Dordrecht: Reidel.

Lawrence, M.S., Stojanov, P., Polak, P., Kryukov, G.V., Cibulskis, K., Sivachenko, A., Carter, S.L., Stewart, C., Mermel, C.H., Roberts, S.A., & et al. (2013). Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature , 499 (7457), 214–218.

Laymon, R. (1982). Scientific realism and the hierarchical counterfactual path from data to theory. In PSA: Proceedings of the biennial meeting of the philosophy of science association , (Vol. 1 pp. 107–121): Philosophy of Science Association.

LeBel, E.P., Berger, D., Campbell, L., & Loving, T.J. (2017). Falsifiability is not optional. Journal of Personality and Social Psychology , 113 (2), 254–261.

Leonelli, S. (2018). Rethinking reproducibility as a criterion for research quality. In Boumans, M., & Chao, H.-K. (Eds.) Including a symposium on Mary Morgan: curiosity, imagination, and surprise, volume 36B of Research in the History of Economic Thought and Methodology (pp. 129–146): Emerald Publishing Ltd.

Leonelli, S. (2019). What distinguishes data from models? European Journal for Philosophy of Science , 9 (2), 22.

Machery, E. (2020). What is a replication? Philosophy of Science , forthcoming.

Meehl, P.E. (1967). Theory-testing in psychology and physics: a methodological paradox. Philosophy of Science , 34 (2), 103–115.

Meehl, P.E. (1990). Appraising and amending theories: the strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry , 1 (2), 108–141.

Musgrave, A., & Pigden, C. (2016). Imre Lakatos. In Zalta, E.N. (Ed.) The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University, winter 2016 edition .

Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour , 3 (3), 221–229.

Norton, J.D. (2015). Replicability of experiment. THEORIA. Revista de Teoría Historia y Fundamentos de la Ciencia , 30 (2), 229–248.

Nosek, B.A., & Errington, T.M. (2017). Reproducibility in cancer biology: making sense of replications. Elife , 6 , e23383.

Nosek, B.A., & Errington, T.M. (2020). What is replication? PLoS Biology , 18 (3), e3000691.

Nuijten, M.B., Bakker, M., Maassen, E., & Wicherts, J.M. (2018). Verify original results through reanalysis before replicating. Behavioral and Brain Sciences , 41 , e143.

Open Science Collaboration (OSC). (2015). Estimating the reproducibility of psychological science. Science , 349 (6251), aac4716.

Popper, K.R. (1959). The logic of scientific discovery . Oxford: Routledge.

Radder, H. (1992). Experimental reproducibility and the experimenters’ regress. PSA: Proceedings of the biennial meeting of the philosophy of science association (Vol. 1 pp. 63–73). Philosophy of Science Association.

Rosenthal, R. (1990). Replication in behavioral research. In Neuliep, J.W. (Ed.) Handbook of replication research in the behavioral and social sciences, volume 5 of Journal of Social Behavior and Personality (pp. 1–30). Corte Madera: Select Press.

Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology , 13 (2), 90–100.

Schmidt, S. (2017). Replication. In Makel, M.C., & Plucker, J.A. (Eds.) Toward a more perfect psychology: improving trust, accuracy, and transparency in research (pp. 233–253): American Psychological Association.

Simons, D.J. (2014). The value of direct replication. Perspectives on Psychological Science , 9 (1), 76–80.

Simons, D.J., Shoda, Y., & Lindsay, D.S. (2017). Constraints on generality (COG): a proposed addition to all empirical papers. Perspectives on Psychological Science , 12 (6), 1123–1128.

Stanford, K. (2017). Underdetermination of scientific theory. In Zalta, E.N. (Ed.) The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University, winter 2017 edition .

Suppes, P. (1962). Models of data. In Nagel, E., Suppes, P., & Tarski, A. (Eds.) Logic, methodology and philosophy of science: proceedings of the 1960 international congress (pp. 252–261). Stanford: Stanford University Press.

Suppes, P. (2007). Statistical concepts in philosophy of science. Synthese , 154 (3), 485–496.

Uhlmann, E.L., Ebersole, C.R., Chartier, C.R., Errington, T.M., Kidwell, M.C., Lai, C.K., McCarthy, R.J., Riegelman, A., Silberzahn, R., & Nosek, B.A. (2019). Scientific Utopia III: crowdsourcing science. Perspectives on Psychological Science , 14 (5), 711–733.

Zwaan, R.A., Etz, A., Lucas, R.E., & Donnellan, M.B. (2018). Making replication mainstream. Behavioral and Brain Sciences , 41 , e120.

Download references

Acknowledgments

Thanks to audiences in London (UK XPhi 2018), Burlington (Social Science Roundtable 2019), and Geneva (EPSA2019) for their comments on an earlier version, and especially to the Pitt Center for Philosophy of Science Reading Group in Spring 2020: Jean Baccelli, Andrew Buskell, Christian Feldbacher-Escamilla, Marie Gueguen, Paola Hernandez-Chavez, Edouard Machery, Adina Roskies, and Sander Verhaegh.

This research was partially supported by a Single Semester Leave from the University of Minnesota, and a Visiting Fellowship at the Center for Philosophy of Science at the University of Pittsburgh.

Author information

Authors and affiliations.

Department of Philosophy, University of Minnesota, Twin Cities, Minneapolis, MN, USA

Samuel C. Fletcher

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Samuel C. Fletcher .

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: EPSA2019: Selected papers from the biennial conference in Geneva

Guest Editors: Anouk Barberousse, Richard Dawid, Marcel Weber

Rights and permissions

Reprints and permissions

About this article

Fletcher, S.C. The role of replication in psychological science. Euro Jnl Phil Sci 11 , 23 (2021). https://doi.org/10.1007/s13194-020-00329-2

Download citation

Received : 16 June 2020

Accepted : 30 October 2020

Published : 08 January 2021

DOI : https://doi.org/10.1007/s13194-020-00329-2

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Replication
  • Underdetermination
  • Confirmation
  • Reproducibility

Advertisement

  • Find a journal
  • Publish with us
  • Track your research
  • A-Z Publications

Annual Review of Psychology

Volume 73, 2022, review article, replicability, robustness, and reproducibility in psychological science.

  • Brian A. Nosek 1,2 , Tom E. Hardwicke 3 , Hannah Moshontz 4 , Aurélien Allard 5 , Katherine S. Corker 6 , Anna Dreber 7 , Fiona Fidler 8 , Joe Hilgard 9 , Melissa Kline Struhl 2 , Michèle B. Nuijten 10 , Julia M. Rohrer 11 , Felipe Romero 12 , Anne M. Scheel 13 , Laura D. Scherer 14 , Felix D. Schönbrodt 15 , and Simine Vazire 16
  • View Affiliations Hide Affiliations Affiliations: 1 Department of Psychology, University of Virginia, Charlottesville, Virginia 22904, USA; email: [email protected] 2 Center for Open Science, Charlottesville, Virginia 22903, USA 3 Department of Psychology, University of Amsterdam, 1012 ZA Amsterdam, The Netherlands 4 Addiction Research Center, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA 5 Department of Psychology, University of California, Davis, California 95616, USA 6 Psychology Department, Grand Valley State University, Allendale, Michigan 49401, USA 7 Department of Economics, Stockholm School of Economics, 113 83 Stockholm, Sweden 8 School of Biosciences, University of Melbourne, Parkville VIC 3010, Australia 9 Department of Psychology, Illinois State University, Normal, Illinois 61790, USA 10 Meta-Research Center, Tilburg University, 5037 AB Tilburg, The Netherlands 11 Department of Psychology, Leipzig University, 04109 Leipzig, Germany 12 Department of Theoretical Philosophy, University of Groningen, 9712 CP Groningen, The Netherlands 13 Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands 14 University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA 15 Department of Psychology, Ludwig Maximilian University of Munich, 80539 Munich, Germany 16 School of Psychological Sciences, University of Melbourne, Parkville VIC 3052, Australia
  • Vol. 73:719-748 (Volume publication date January 2022) https://doi.org/10.1146/annurev-psych-020821-114157
  • First published as a Review in Advance on October 19, 2021
  • Copyright © 2022 by Annual Reviews. All rights reserved

Replication—an important, uncommon, and misunderstood practice—is gaining appreciation in psychology. Achieving replicability is important for making research progress. If findings are not replicable, then prediction and theory development are stifled. If findings are replicable, then interrogation of their meaning and validity can advance knowledge. Assessing replicability can be productive for generating and testing hypotheses by actively confronting current understandings to identify weaknesses and spur innovation. For psychology, the 2010s might be characterized as a decade of active confrontation. Systematic and multi-site replication projects assessed current understandings and observed surprising failures to replicate many published findings. Replication efforts highlighted sociocultural challenges such as disincentives to conduct replications and a tendency to frame replication as a personal attack rather than a healthy scientific practice, and they raised awareness that replication contributes to self-correction. Nevertheless, innovation in doing and understanding replication and its cousins, reproducibility and robustness, has positioned psychology to improve research practices and accelerate progress.

Article metrics loading...

Full text loading...

Literature Cited

  • Alogna VK , Attaya MK , Aucoin P , Bahník Š , Birch S et al. 2014 . Registered Replication Report: Schooler and Engstler-Schooler (1990). Perspect. Psychol. Sci. 9 : 5 556– 78 [Google Scholar]
  • Altmejd A , Dreber A , Forsell E , Huber J , Imai T et al. 2019 . Predicting the replicability of social science lab experiments. PLOS ONE 14 : 12 e0225826 [Google Scholar]
  • Anderson CJ , Bahník Š , Barnett-Cowan M , Bosco FA , Chandler J et al. 2016 . Response to Comment on “Estimating the reproducibility of psychological science. Science 351 : 6277 1037 [Google Scholar]
  • Anderson MS , Martinson BC , De Vries R. 2007 . Normative dissonance in science: results from a national survey of U.S. scientists. J. Empir. Res. Hum. Res. Ethics 2 : 4 3– 14 [Google Scholar]
  • Appelbaum M , Cooper H , Kline RB , Mayo-Wilson E , Nezu AM , Rao SM. 2018 . Journal article reporting standards for quantitative research in psychology: the APA Publications and Communications Board task force report. Am. Psychol. 73 : 1 3 – 25 Corrigendum 2018 . Am. Psychol 73 : 7 947 [Google Scholar]
  • Armeni K , Brinkman L , Carlsson R , Eerland A , Fijten R et al. 2020 . Towards wide-scale adoption of open science practices: the role of open science communities. MetaArXiv, Oct. 6 https://doi.org/10.31222/osf.io/7gct9 [Crossref]
  • Artner R , Verliefde T , Steegen S , Gomes S , Traets F et al. 2020 . The reproducibility of statistical results in psychological research: an investigation using unpublished raw data. Psychol. Methods. In press. https://doi.org/10.1037/met0000365 [Crossref] [Google Scholar]
  • Baker M. 2016 . Dutch agency launches first grants programme dedicated to replication. Nat. News. https://doi.org/10.1038/nature.2016.20287 [Crossref] [Google Scholar]
  • Bakker M , van Dijk A , Wicherts JM. 2012 . The rules of the game called psychological science. Perspect. Psychol. Sci. 7 : 6 543– 54 [Google Scholar]
  • Bakker M , Wicherts JM. 2011 . The (mis)reporting of statistical results in psychology journals. Behav. Res. Methods 43 : 3 666– 78 [Google Scholar]
  • Baribault B , Donkin C , Little DR , Trueblood JS , Oravecz Z et al. 2018 . Metastudies for robust tests of theory. PNAS 115 : 11 2607– 12 [Google Scholar]
  • Baron J , Hershey JC. 1988 . Outcome bias in decision evaluation. J. Pers. Soc. Psychol. 54 : 4 569– 79 [Google Scholar]
  • Baumeister RF. 2016 . Charting the future of social psychology on stormy seas: winners, losers, and recommendations. J. Exp. Soc. Psychol. 66 : 153– 58 [Google Scholar]
  • Baumeister RF , Vohs KD. 2016 . Misguided effort with elusive implications. Perspect. Psychol. Sci. 11 : 4 574– 75 [Google Scholar]
  • Benjamin DJ , Berger JO , Johannesson M , Nosek BA , Wagenmakers E-J et al. 2018 . Redefine statistical significance. Nat. Hum. Behav. 2 : 1 6– 10 [Google Scholar]
  • Botvinik-Nezer R , Holzmeister F , Camerer CF , Dreber A , Huber J et al. 2020 . Variability in the analysis of a single neuroimaging dataset by many teams. Nature 582 : 7810 84– 88 [Google Scholar]
  • Bouwmeester S , Verkoeijen PPJL , Aczel B , Barbosa F , Bègue L et al. 2017 . Registered Replication Report: Rand, Greene, and Nowak (2012). Perspect. Psychol. Sci. 12 : 3 527– 42 [Google Scholar]
  • Brown NJL , Heathers JAJ. 2017 . The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Soc. Psychol. Pers. Sci. 8 : 4 363– 69 [Google Scholar]
  • Button KS , Ioannidis JPA , Mokrysz C , Nosek BA , Flint J et al. 2013 . Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14 : 5 365– 76 [Google Scholar]
  • Byers-Heinlein K , Bergmann C , Davies C , Frank M , Hamlin JK et al. 2020 . Building a collaborative psychological science: lessons learned from ManyBabies 1. Can. Psychol. Psychol. Can. 61 : 4 349– 63 [Google Scholar]
  • Camerer CF , Dreber A , Forsell E , Ho T-H , Huber J et al. 2016 . Evaluating replicability of laboratory experiments in economics. Science 351 : 6280 1433– 36 [Google Scholar]
  • Camerer CF , Dreber A , Holzmeister F , Ho T-H , Huber J et al. 2018 . Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat. Hum. Behav. 2 : 9 637– 44 [Google Scholar]
  • Carter EC , Schönbrodt FD , Gervais WM , Hilgard J. 2019 . Correcting for bias in psychology: a comparison of meta-analytic methods. Adv. Methods Pract. Psychol. Sci. 2 : 2 115– 44 [Google Scholar]
  • Cent. Open Sci 2020 . APA joins as new signatory to TOP guidelines. Center for Open Science Nov. 10. https://www.cos.io/about/news/apa-joins-as-new-signatory-to-top-guidelines [Google Scholar]
  • Cesario J. 2014 . Priming, replication, and the hardest science. Perspect. Psychol. Sci. 9 : 1 40– 48 [Google Scholar]
  • Chambers C. 2019 . What's next for Registered Reports?. Nature 573 : 7773 187– 89 [Google Scholar]
  • Cheung I , Campbell L , LeBel EP , Ackerman RA , Aykutoğlu B et al. 2016 . Registered Replication Report: Study 1 from Finkel, Rusbult, Kumashiro, & Hannon (2002). Perspect. Psychol. Sci. 11 : 5 750– 64 [Google Scholar]
  • Christensen G , Wang Z , Paluck EL , Swanson N , Birke DJ , Miguel E , Littman R. 2019 . Open science practices are on the rise: the State of Social Science (3S) Survey. MetaArXiv, Oct. 18. https://doi.org/10.31222/osf.io/5rksu [Crossref]
  • Christensen-Szalanski JJ , Willham CF. 1991 . The hindsight bias: a meta-analysis. Organ. Behav. Hum. Decis. Process. 48 : 1 147– 68 [Google Scholar]
  • Cohen J. 1962 . The statistical power of abnormal-social psychological research: a review. J. Abnorm. Soc. Psychol. 65 : 3 145– 53 [Google Scholar]
  • Cohen J. 1973 . Statistical power analysis and research results. Am. Educ. Res. J. 10 : 3 225– 29 [Google Scholar]
  • Cohen J. 1990 . Things I have learned (so far). Am. Psychol. 45 : 1304– 12 [Google Scholar]
  • Cohen J. 1992 . A power primer. Psychol. Bull. 112 : 1 155– 59 [Google Scholar]
  • Cohen J. 1994 . The earth is round (p < .05). Am. Psychol. 49 : 12 997– 1003 [Google Scholar]
  • Colling LJ , Szücs D , De Marco D , Cipora K , Ulrich R et al. 2020 . Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Adv. Methods Pract. Psychol. Sci 3 : 2 143– 62 [Google Scholar]
  • Cook FL. 2016 . Dear Colleague Letter: robust and reliable research in the social, behavioral, and economic sciences. National Science Foundation Sept. 20. https://www.nsf.gov/pubs/2016/nsf16137/nsf16137.jsp [Google Scholar]
  • Crandall CS , Sherman JW. 2016 . On the scientific superiority of conceptual replications for scientific progress. J. Exp. Soc. Psychol. 66 : 93– 99 [Google Scholar]
  • Crisp RJ , Miles E , Husnu S 2014 . Support for the replicability of imagined contact effects. Soc. Psychol. 45 : 4 303– 4 [Google Scholar]
  • Cronbach LJ , Meehl PE. 1955 . Construct validity in psychological tests. Psychol. Bull. 52 : 4 281– 302 [Google Scholar]
  • Dang J , Barker P , Baumert A , Bentvelzen M , Berkman E et al. 2021 . A multilab replication of the ego depletion effect. Soc. Psychol. Pers. Sci. 12 : 1 14– 24 [Google Scholar]
  • Devezer B , Nardin LG , Baumgaertner B , Buzbas EO. 2019 . Scientific discovery in a model-centric framework: reproducibility, innovation, and epistemic diversity. PLOS ONE 14 : 5 e0216125 [Google Scholar]
  • Dijksterhuis A. 2018 . Reflection on the professor-priming replication report. Perspect. Psychol. Sci. 13 : 2 295– 96 [Google Scholar]
  • Dreber A , Pfeiffer T , Almenberg J , Isaksson S , Wilson B et al. 2015 . Using prediction markets to estimate the reproducibility of scientific research. PNAS 112 : 50 15343– 47 [Google Scholar]
  • Duhem PMM. 1954 . The Aim and Structure of Physical Theory Princeton, NJ: Princeton Univ. Press [Google Scholar]
  • Ebersole CR , Alaei R , Atherton OE , Bernstein MJ , Brown M et al. 2017 . Observe, hypothesize, test, repeat: Luttrell, Petty and Xu (2017) demonstrate good science. J. Exp. Soc. Psychol. 69 : 184– 86 [Google Scholar]
  • Ebersole CR , Atherton OE , Belanger AL , Skulborstad HM , Allen JM et al. 2016a . Many Labs 3: evaluating participant pool quality across the academic semester via replication. J. Exp. Soc. Psychol. 67 : 68– 82 [Google Scholar]
  • Ebersole CR , Axt JR , Nosek BA 2016b . Scientists’ reputations are based on getting it right, not being right. PLOS Biol . 14 : 5 e1002460 [Google Scholar]
  • Ebersole CR , Mathur MB , Baranski E , Bart-Plange D-J , Buttrick NR et al. 2020 . Many Labs 5: testing pre-data-collection peer review as an intervention to increase replicability. Adv. Methods Pract. Psychol. Sci. 3 : 3 309– 31 [Google Scholar]
  • Eerland A , Sherrill AM , Magliano JP , Zwaan RA , Arnal JD et al. 2016 . Registered Replication Report: Hart & Albarracín (2011). Perspect. Psychol. Sci. 11 : 1 158– 71 [Google Scholar]
  • Ellemers N , Fiske ST , Abele AE , Koch A , Yzerbyt V. 2020 . Adversarial alignment enables competing models to engage in cooperative theory building toward cumulative science. PNAS 117 : 14 7561– 67 [Google Scholar]
  • Epskamp S , Nuijten MB. 2018 . Statcheck: extract statistics from articles and recompute p values. Statistical Software https://CRAN.R-project.org/package=statcheck [Google Scholar]
  • Errington TM , Denis A , Perfito N , Iorns E , Nosek BA 2021 . Challenges for assessing reproducibility and replicability in preclinical cancer biology. eLife In press [Google Scholar]
  • Etz A , Vandekerckhove J. 2016 . A Bayesian perspective on the reproducibility project: psychology. PLOS ONE 11 : 2 e0149794 [Google Scholar]
  • Fanelli D. 2010 .. “ Positive” results increase down the hierarchy of the sciences. PLOS ONE 5 : 4 e10068 [Google Scholar]
  • Fanelli D. 2012 . Negative results are disappearing from most disciplines and countries. Scientometrics 90 : 3 891– 904 [Google Scholar]
  • Feest U. 2019 . Why replication is overrated. Philos. Sci. 86 : 5 895– 905 [Google Scholar]
  • Ferguson MJ , Carter TJ , Hassin RR. 2014 . Commentary on the attempt to replicate the effect of the American flag on increased Republican attitudes. Soc. Psychol. 45 : 4 301– 2 [Google Scholar]
  • Fetterman AK , Sassenberg K. 2015 . The reputational consequences of failed replications and wrongness admission among scientists. PLOS ONE 10 : 12 e0143723 [Google Scholar]
  • Forsell E , Viganola D , Pfeiffer T , Almenberg J , Wilson B et al. 2019 . Predicting replication outcomes in the Many Labs 2 study. J. Econ. Psychol. 75 : 102117 [Google Scholar]
  • Franco A , Malhotra N , Simonovits G. 2014 . Publication bias in the social sciences: unlocking the file drawer. Science 345 : 6203 1502– 5 [Google Scholar]
  • Franco A , Malhotra N , Simonovits G. 2016 . Underreporting in psychology experiments: evidence from a study registry. Soc. Psychol. Pers. Sci. 7 : 1 8– 12 [Google Scholar]
  • Frank MC , Bergelson E , Bergmann C , Cristia A , Floccia C et al. 2017 . A collaborative approach to infant research: promoting reproducibility, best practices, and theory-building. Infancy 22 : 4 421– 35 [Google Scholar]
  • Funder DC , Ozer DJ. 2019 . Evaluating effect size in psychological research: sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2 : 2 156– 68 [Google Scholar]
  • Gelman A , Carlin J. 2014 . Beyond power calculations: assessing type S (sign) and type M (magnitude) errors. Perspect. Psychol. Sci. 9 : 6 641– 51 [Google Scholar]
  • Gelman A , Loken E. 2013 . The garden of forking paths: why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time Work. Pap., Columbia Univ. New York: [Google Scholar]
  • Gergen KJ. 1973 . Social psychology as history. J. Pers. Soc. Psychol. 26 : 2 309– 20 [Google Scholar]
  • Gervais WM , Jewell JA , Najle MB , Ng BKL. 2015 . A powerful nudge? Presenting calculable consequences of underpowered research shifts incentives toward adequately powered designs. Soc. Psychol. Pers. Sci. 6 : 7 847– 54 [Google Scholar]
  • Ghelfi E , Christopherson CD , Urry HL , Lenne RL , Legate N et al. 2020 . Reexamining the effect of gustatory disgust on moral judgment: a multilab direct replication of Eskine, Kacinik, and Prinz (2011). Adv. Methods Pract. Psychol. Sci. 3 : 1 3– 23 [Google Scholar]
  • Gilbert DT , King G , Pettigrew S , Wilson TD 2016 . Comment on “Estimating the reproducibility of psychological science. Science 351 : 6277 1037 [Google Scholar]
  • Giner-Sorolla R. 2012 . Science or art? How aesthetic standards grease the way through the publication bottleneck but undermine science. Perspect. Psychol. Sci. 7 : 6 562– 71 [Google Scholar]
  • Giner-Sorolla R. 2019 . From crisis of evidence to a “crisis” of relevance? Incentive-based answers for social psychology's perennial relevance worries. Eur. Rev. Soc. Psychol. 30 : 1 1– 38 [Google Scholar]
  • Gollwitzer M. 2020 . DFG Priority Program SPP 2317 Proposal: A meta-scientific program to analyze and optimize replicability in the behavioral, social, and cognitive sciences (META-REP). PsychArchives, May 29. http://dx.doi.org/10.23668/psycharchives.3010 [Crossref]
  • Gordon M , Viganola D , Bishop M , Chen Y , Dreber A et al. 2020 . Are replication rates the same across academic fields? Community forecasts from the DARPA SCORE programme. R. Soc. Open Sci. 7 : 7 200566 [Google Scholar]
  • Götz M , O'Boyle EH , Gonzalez-Mulé E , Banks GC , Bollmann SS 2020 . The “Goldilocks Zone”: (Too) many confidence intervals in tests of mediation just exclude zero. Psychol. Bull. 147 : 1 95– 114 [Google Scholar]
  • Greenwald AG. 1975 . Consequences of prejudice against the null hypothesis. Psychol. Bull. 82 : 1 1– 20 [Google Scholar]
  • Hagger MS , Chatzisarantis NLD , Alberts H , Anggono CO , Batailler C et al. 2016 . A multilab preregistered replication of the ego-depletion effect. Perspect. Psychol. Sci. 11 : 4 546– 73 [Google Scholar]
  • Hanea AM , McBride MF , Burgman MA , Wintle BC , Fidler F et al. 2017 . I nvestigate D iscuss E stimate A ggregate for structured expert judgement. Int. J. Forecast. 33 : 1 267– 79 [Google Scholar]
  • Hardwicke TE , Bohn M , MacDonald KE , Hembacher E , Nuijten MB et al. 2021 . Analytic reproducibility in articles receiving open data badges at the journal Psychological Science : an observational study. R. Soc. Open Sci. 8 : 1 201494 [Google Scholar]
  • Hardwicke TE , Mathur MB , MacDonald K , Nilsonne G , Banks GC et al. 2018 . Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition . R. Soc. Open Sci. 5 : 8 180448 [Google Scholar]
  • Hardwicke TE , Serghiou S , Janiaud P , Danchev V , Crüwell S et al. 2020a . Calibrating the scientific ecosystem through meta-research. Annu. Rev. Stat. Appl. 7 : 11– 37 [Google Scholar]
  • Hardwicke TE , Thibault RT , Kosie JE , Wallach JD , Kidwell M , Ioannidis J. 2020b . Estimating the prevalence of transparency and reproducibility-related research practices in psychology (2014–2017). MetaArXiv, Jan. 2. https://doi.org/10.31222/osf.io/9sz2y [Crossref]
  • Hedges LV , Schauer JM. 2019 . Statistical analyses for studying replication: meta-analytic perspectives. Psychol. Methods 24 : 5 557– 70 [Google Scholar]
  • Hoogeveen S , Sarafoglou A , Wagenmakers E-J. 2020 . Laypeople can predict which social-science studies will be replicated successfully. Adv. Methods Pract. Psychol. Sci. 3 : 3 267– 85 [Google Scholar]
  • Hughes BM. 2018 . Psychology in Crisis London: Palgrave Macmillan [Google Scholar]
  • Inbar Y. 2016 . Association between contextual dependence and replicability in psychology may be spurious. PNAS 113 : 34 E4933– 34 [Google Scholar]
  • Ioannidis JPA. 2005 . Why most published research findings are false. PLOS Med 2 : 8 e124 [Google Scholar]
  • Ioannidis JPA. 2008 . Why most discovered true associations are inflated. Epidemiology 19 : 5 640– 48 [Google Scholar]
  • Ioannidis JPA. 2014 . How to make more published research true. PLOS Med 11 : 10 e1001747 [Google Scholar]
  • Ioannidis JPA , Trikalinos TA. 2005 . Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. J. Clin. Epidemiol. 58 : 6 543– 49 [Google Scholar]
  • Isager PM , van Aert RCM , Bahník Š , Brandt M , DeSoto KA et al. 2020 . Deciding what to replicate: A formal definition of “replication value” and a decision model for replication study selection. MetaArXiv, Sept. 2. https://doi.org/10.31222/osf.io/2gurz [Crossref]
  • John LK , Loewenstein G , Prelec D. 2012 . Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23 : 5 524– 32 [Google Scholar]
  • Kahneman D. 2003 . Experiences of collaborative research. Am. Psychol. 58 : 9 723– 30 [Google Scholar]
  • Kerr NL. 1998 . HARKing: Hypothesizing after the results are known. Pers. Soc. Psychol. Rev. 2 : 3 196– 217 [Google Scholar]
  • Kidwell MC , Lazarević LB , Baranski E , Hardwicke TE , Piechowski S et al. 2016 . Badges to acknowledge open practices: a simple, low-cost, effective method for increasing transparency. PLOS Biol 14 : 5 e1002456 [Google Scholar]
  • Klein RA , Cook CL , Ebersole CR , Vitiello C , Nosek BA et al. 2019 . Many Labs 4: failure to replicate mortality salience effect with and without original author involvement. PsyArXiv, Dec. 11. https://doi.org/10/ghwq2w [Crossref]
  • Klein RA , Ratliff KA , Vianello M , Adams RB , Bahník Š et al. 2014 . Investigating variation in replicability: a “many labs” replication project. Soc. Psychol. 45 : 3 142– 52 [Google Scholar]
  • Klein RA , Vianello M , Hasselman F , Adams BG , Adams RB et al. 2018 . Many Labs 2: investigating variation in replicability across samples and settings. Adv. Methods Pract. Psychol. Sci. 1 : 4 443– 90 [Google Scholar]
  • Kunda Z. 1990 . The case for motivated reasoning. Psychol. Bull. 108 : 3 480– 98 [Google Scholar]
  • Lakens D. 2019 . The value of preregistration for psychological science: a conceptual analysis. PsyArXiv, Nov. 18. https://doi.org/10.31234/osf.io/jbh4w [Crossref]
  • Lakens D , Adolfi FG , Albers CJ , Anvari F , Apps MA et al. 2018 . Justify your alpha. Nat. Hum. Behav. 2 : 3 168– 71 [Google Scholar]
  • Landy JF , Jia ML , Ding IL , Viganola D , Tierney W et al. 2020 . Crowdsourcing hypothesis tests: making transparent how design choices shape research results. Psychol. Bull. 146 : 5 451– 79 [Google Scholar]
  • Leary MR , Diebels KJ , Davisson EK , Jongman-Sereno KP , Isherwood JC et al. 2017 . Cognitive and interpersonal features of intellectual humility. Pers. Soc. Psychol. Bull. 43 : 6 793– 813 [Google Scholar]
  • LeBel EP , McCarthy RJ , Earp BD , Elson M , Vanpaemel W. 2018 . A unified framework to quantify the credibility of scientific findings. Adv. Methods Pract. Psychol. Sci. 1 : 3 389– 402 [Google Scholar]
  • Leighton DC , Legate N , LePine S , Anderson SF , Grahe J 2018 . Self-esteem, self-disclosure, self-expression, and connection on Facebook: a collaborative replication meta-analysis. Psi Chi J. Psychol. Res. 23 : 2 98– 109 [Google Scholar]
  • Leising D , Thielmann I , Glöckner A , Gärtner A , Schönbrodt F. 2020 . Ten steps toward a better personality science—how quality may be rewarded more in research evaluation. PsyArXiv, May 31. https://doi.org/10.31234/osf.io/6btc3 [Crossref]
  • Leonelli S 2018 . Rethinking reproducibility as a criterion for research quality. Research in the History of Economic Thought and Methodology 36 L Fiorito, S Scheall, CE Suprinyak 129– 46 Bingley, UK: Emerald [Google Scholar]
  • Lewandowsky S , Oberauer K. 2020 . Low replicability can support robust and efficient science. Nat. Commun. 11 : 1 358 [Google Scholar]
  • Maassen E , van Assen MALM , Nuijten MB , Olsson-Collentine A , Wicherts JM. 2020 . Reproducibility of individual effect sizes in meta-analyses in psychology. PLOS ONE 15 : 5 e0233107 [Google Scholar]
  • Machery E. 2020 . What is a replication?. Philos. Sci. 87 : 4 545 – 67 [Google Scholar]
  • ManyBabies Consort 2020 . Quantifying sources of variability in infancy research using the infant-directed-speech preference. Adv. Methods Pract. Psychol. Sci. 3 : 1 24– 52 [Google Scholar]
  • Marcus A , Oransky I 2018 . Meet the “data thugs” out to expose shoddy and questionable research. Science Feb. 18. https://www.sciencemag.org/news/2018/02/meet-data-thugs-out-expose-shoddy-and-questionable-research [Google Scholar]
  • Marcus A , Oransky I. 2020 . Tech firms hire “Red Teams.” Scientists should, too. WIRED July 16. https://www.wired.com/story/tech-firms-hire-red-teams-scientists-should-too/ [Google Scholar]
  • Mathur MB , VanderWeele TJ. 2020 . New statistical metrics for multisite replication projects. J. R. Stat. Soc. A 183 : 3 1145– 66 [Google Scholar]
  • Maxwell SE. 2004 . The persistence of underpowered studies in psychological research: causes, consequences, and remedies. Psychol. Methods 9 : 2 147– 63 [Google Scholar]
  • Maxwell SE , Lau MY , Howard GS. 2015 . Is psychology suffering from a replication crisis? What does “failure to replicate” really mean?. Am. Psychol. 70 : 6 487– 98 [Google Scholar]
  • Mayo DG. 2018 . Statistical Inference as Severe Testing Cambridge, UK: Cambridge Univ. Press [Google Scholar]
  • McCarthy R , Gervais W , Aczel B , Al-Kire R , Baraldo S et al. 2021 . A multi-site collaborative study of the hostile priming effect. Collabra Psychol 7 : 1 18738 [Google Scholar]
  • McCarthy RJ , Hartnett JL , Heider JD , Scherer CR , Wood SE et al. 2018 . An investigation of abstract construal on impression formation: a multi-lab replication of McCarthy and Skowronski (2011). Int. Rev. Soc. Psychol. 31 : 1 15 [Google Scholar]
  • Meehl PE. 1978 . Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J. Consult. Clin. Psychol. 46 : 4 806– 34 [Google Scholar]
  • Meyer MN , Chabris C. 2014 . Why psychologists' food fight matters. Slate Magazine July 31. https://slate.com/technology/2014/07/replication-controversy-in-psychology-bullying-file-drawer-effect-blog-posts-repligate.html [Google Scholar]
  • Mischel W. 2008 . The toothbrush problem. APS Observer Dec. 1. https://www.psychologicalscience.org/observer/the-toothbrush-problem [Google Scholar]
  • Moran T , Hughes S , Hussey I , Vadillo MA , Olson MA et al. 2020 . Incidental attitude formation via the surveillance task: a Registered Replication Report of Olson and Fazio (2001). PsyArXiv, April 17. https://doi.org/10/ghwq2z [Crossref]
  • Moshontz H , Campbell L , Ebersole CR , IJzerman H , Urry HL et al. 2018 . The Psychological Science Accelerator: advancing psychology through a distributed collaborative network. Adv. Methods Pract. Psychol. Sci. 1 : 4 501– 15 [Google Scholar]
  • Munafò MR , Chambers CD , Collins AM , Fortunato L , Macleod MR. 2020 . Research culture and reproducibility. Trends Cogn. Sci. 24 : 2 91– 93 [Google Scholar]
  • Muthukrishna M , Henrich J. 2019 . A problem in theory. Nat. Hum. Behav. 3 : 3 221– 29 [Google Scholar]
  • Natl. Acad. Sci. Eng. Med 2019 . Reproducibility and Replicability in Science Washington, DC: Natl. Acad. Press [Google Scholar]
  • Nelson LD , Simmons J , Simonsohn U. 2018 . Psychology's renaissance. Annu. Rev. Psychol. 69 : 511– 34 [Google Scholar]
  • Nickerson RS. 1998 . Confirmation bias: a ubiquitous phenomenon in many guises. Rev. Gen. Psychol. 2 : 2 175– 220 [Google Scholar]
  • Nosek B. 2019a . Strategy for culture change. Center for Open Science June 11. https://www.cos.io/blog/strategy-for-culture-change [Google Scholar]
  • Nosek B. 2019b . The rise of open science in psychology, a preliminary report. Center for Open Science June 3. https://www.cos.io/blog/rise-open-science-psychology-preliminary-report [Google Scholar]
  • Nosek BA , Alter G , Banks GC , Borsboom D , Bowman SD et al. 2015 . Promoting an open research culture. Science 348 : 6242 1422– 25 [Google Scholar]
  • Nosek BA , Beck ED , Campbell L , Flake JK , Hardwicke TE et al. 2019 . Preregistration is hard, and worthwhile. Trends Cogn. Sci. 23 : 10 815– 18 [Google Scholar]
  • Nosek BA , Ebersole CR , DeHaven AC , Mellor DT. 2018 . The preregistration revolution. PNAS 115 : 11 2600– 6 [Google Scholar]
  • Nosek BA , Errington TM. 2020a . What is replication?. PLOS Biol 18 : 3 e3000691 [Google Scholar]
  • Nosek BA , Errington TM. 2020b . The best time to argue about what a replication means? Before you do it. Nature 583 : 7817 518– 20 [Google Scholar]
  • Nosek BA , Gilbert EA. 2017 . Mischaracterizing replication studies leads to erroneous conclusions. PsyArXiv, April 18. https://doi.org/10.31234/osf.io/nt4d3 [Crossref]
  • Nosek BA , Spies JR , Motyl M. 2012 . Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. On Psychol. Sci. 7 : 6 615– 31 [Google Scholar]
  • Nuijten MB , Bakker M , Maassen E , Wicherts JM. 2018 . Verify original results through reanalysis before replicating. Behav. Brain Sci. 41 : e143 [Google Scholar]
  • Nuijten MB , Hartgerink CHJ , van Assen MALM , Epskamp S , Wicherts JM 2016 . The prevalence of statistical reporting errors in psychology (1985–2013). Behav. Res. Methods 48 : 4 1205– 26 [Google Scholar]
  • Nuijten MB , van Assen MA , Veldkamp CL , Wicherts JM. 2015 . The replication paradox: Combining studies can decrease accuracy of effect size estimates. Rev. Gen. Psychol. 19 : 2 172– 82 [Google Scholar]
  • O'Donnell M , Nelson LD , Ackermann E , Aczel B , Akhtar A et al. 2018 . Registered Replication Report: Dijksterhuis and van Knippenberg (1998). Perspect. Psychol. Sci. 13 : 2 268– 94 [Google Scholar]
  • Olsson-Collentine A , Wicherts JM , van Assen MALM. 2020 . Heterogeneity in direct replications in psychology and its association with effect size. Psychol. Bull. 146 : 10 922– 40 [Google Scholar]
  • Open Sci. Collab 2015 . Estimating the reproducibility of psychological science. Science 349 : 6251 aac4716 [Google Scholar]
  • Patil P , Peng RD , Leek JT. 2016 . What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspect. Psychol. Sci. 11 : 4 539– 44 [Google Scholar]
  • Pawel S , Held L. 2020 . Probabilistic forecasting of replication studies. PLOS ONE 15 : 4 e0231416 [Google Scholar]
  • Perugini M , Gallucci M , Costantini G. 2014 . Safeguard power as a protection against imprecise power estimates. Perspect. Psychol. Sci. 9 : 3 319– 32 [Google Scholar]
  • Protzko J , Krosnick J , Nelson LD , Nosek BA , Axt J et al. 2020 . High replicability of newly-discovered social-behavioral findings is achievable. PsyArXiv, Sept. 10. https://doi.org/10.31234/osf.io/n2a9x [Crossref]
  • Rogers EM. 2003 . Diffusion of Innovations New York: Free Press, 5th ed.. [Google Scholar]
  • Romero F. 2017 . Novelty versus replicability: virtues and vices in the reward system of science. Philos. Sci. 84 : 5 1031– 43 [Google Scholar]
  • Rosenthal R. 1979 . The file drawer problem and tolerance for null results. Psychol. Bull. 86 : 3 638– 41 [Google Scholar]
  • Rothstein HR , Sutton AJ , Borenstein M 2005 . Publication bias in meta-analysis. Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments HR Rothstein, AJ Sutton, M Borenstein 1– 7 Chichester, UK: Wiley & Sons [Google Scholar]
  • Rouder JN. 2016 . The what, why, and how of born-open data. Behav. Res. Methods 48 : 3 1062– 69 [Google Scholar]
  • Scheel AM , Schijen M , Lakens D. 2020 . An excess of positive results: comparing the standard psychology literature with Registered Reports. PsyArXiv, Febr. 5. https://doi.org/10.31234/osf.io/p6e9c [Crossref]
  • Schimmack U. 2012 . The ironic effect of significant results on the credibility of multiple-study articles. Psychol. Methods 17 : 4 551– 66 [Google Scholar]
  • Schmidt S. 2009 . Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev. Gen. Psychol. 13 : 2 90– 100 [Google Scholar]
  • Schnall S 2014 . Commentary and rejoinder on Johnson, Cheung, and Donnellan (2014a). Clean data: Statistical artifacts wash out replication efforts. Soc. Psychol. 45 : 4 315– 17 [Google Scholar]
  • Schwarz N , Strack F. 2014 . Does merely going through the same moves make for a “direct” replication? Concepts, contexts, and operationalizations. Soc. Psychol. 45 : 4 305– 6 [Google Scholar]
  • Schweinsberg M , Madan N , Vianello M , Sommer SA , Jordan J et al. 2016 . The pipeline project: pre-publication independent replications of a single laboratory's research pipeline. J. Exp. Soc. Psychol. 66 : 55– 67 [Google Scholar]
  • Sedlmeier P , Gigerenzer G. 1992 . Do studies of statistical power have an effect on the power of studies?. Psychol. Bull. 105 : 2 309– 16 [Google Scholar]
  • Shadish WR , Cook TD , Campbell DT 2002 . Experimental and Quasi-Experimental Designs for Generalized Causal Inference Boston: Houghton Mifflin [Google Scholar]
  • Shiffrin RM , Börner K , Stigler SM. 2018 . Scientific progress despite irreproducibility: a seeming paradox. PNAS 115 : 11 2632– 39 [Google Scholar]
  • Shih M , Pittinsky TL 2014 . Reflections on positive stereotypes research and on replications. Soc. Psychol. 45 : 4 335– 38 [Google Scholar]
  • Silberzahn R , Uhlmann EL , Martin DP , Anselmi P , Aust F et al. 2018 . Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv. Methods Pract. Psychol. Sci. 1 : 3 337– 56 [Google Scholar]
  • Simmons JP , Nelson LD , Simonsohn U 2011 . False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22 : 11 1359– 66 [Google Scholar]
  • Simons DJ. 2014 . The value of direct replication. Perspect. Psychol. Sci. 9 : 1 76– 80 [Google Scholar]
  • Simons DJ , Shoda Y , Lindsay DS. 2017 . Constraints on generality (COG): a proposed addition to all empirical papers. Perspect. Psychol. Sci. 12 : 6 1123– 28 [Google Scholar]
  • Simonsohn U. 2015 . Small telescopes: detectability and the evaluation of replication results. Psychol. Sci. 26 : 5 559– 69 [Google Scholar]
  • Simonsohn U , Simmons JP , Nelson LD. 2020 . Specification curve analysis. Nat. Hum. Behav. 4 : 1208– 14 [Google Scholar]
  • Smaldino PE , McElreath R. 2016 . The natural selection of bad science. R. Soc. Open Sci. 3 : 9 160384 [Google Scholar]
  • Smith PL , Little DR. 2018 . Small is beautiful: in defense of the small-N design. Psychon. Bull. Rev. 25 : 6 2083– 101 [Google Scholar]
  • Soderberg CK. 2018 . Using OSF to share data: a step-by-step guide. Adv. Methods Pract. Psychol. Sci. 1 : 1 115– 20 [Google Scholar]
  • Soderberg CK , Errington T , Schiavone SR , Bottesini JG , Thorn FS et al. 2021 . Initial evidence of research quality of Registered Reports compared with the standard publishing model. Nat. Hum. Behav 5 : 8 990 – 97 [Google Scholar]
  • Soto CJ. 2019 . How replicable are links between personality traits and consequential life outcomes? The life outcomes of personality replication project. Psychol. Sci. 30 : 5 711– 27 [Google Scholar]
  • Spellman BA. 2015 . A short (personal) future history of revolution 2.0. Perspect. Psychol. Sci. 10 : 6 886– 99 [Google Scholar]
  • Steegen S , Tuerlinckx F , Gelman A , Vanpaemel W 2016 . Increasing transparency through a multiverse analysis. Perspect. Psychol. Sci. 11 : 5 702– 12 [Google Scholar]
  • Sterling TD. 1959 . Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. J. Am. Stat. Assoc. 54 : 285 30– 34 [Google Scholar]
  • Sterling TD , Rosenbaum WL , Weinkam JJ. 1995 . Publication decisions revisited: the effect of the outcome of statistical tests on the decision to publish and vice versa. Am. Stat. 49 : 108– 12 [Google Scholar]
  • Stroebe W , Strack F. 2014 . The alleged crisis and the illusion of exact replication. Perspect. Psychol. Sci. 9 : 1 59– 71 [Google Scholar]
  • Szucs D , Ioannidis JPA. 2017 . Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLOS Biol 15 : 3 e2000797 [Google Scholar]
  • Tiokhin L , Derex M. 2019 . Competition for novelty reduces information sampling in a research game—a registered report. R. Soc. Open Sci. 6 : 5 180934 [Google Scholar]
  • Van Bavel JJ , Mende-Siedlecki P , Brady WJ , Reinero DA 2016 . Contextual sensitivity in scientific reproducibility. PNAS 113 : 23 6454– 59 [Google Scholar]
  • Vazire S. 2018 . Implications of the credibility revolution for productivity, creativity, and progress. Perspect. Psychol. Sci. 13 : 4 411– 17 [Google Scholar]
  • Vazire S , Schiavone SR , Bottesini JG. 2020 . Credibility beyond replicability: improving the four validities in psychological science. PsyArXiv, Oct. 7. https://doi.org/10.31234/osf.io/bu4d3 [Crossref]
  • Verhagen J , Wagenmakers E-J. 2014 . Bayesian tests to quantify the result of a replication attempt. J. Exp. Psychol. Gen. 143 : 4 1457– 75 [Google Scholar]
  • Verschuere B , Meijer EH , Jim A , Hoogesteyn K , Orthey R et al. 2018 . Registered Replication Report on Mazar, Amir, and Ariely (2008). Adv. Methods Pract. Psychol. Sci. 1 : 3 299– 317 [Google Scholar]
  • Vosgerau J , Simonsohn U , Nelson LD , Simmons JP 2019 . 99% impossible: a valid, or falsifiable, internal meta-analysis. J. Exp. Psychol. Gen. 148 : 9 1628– 39 [Google Scholar]
  • Wagenmakers E-J , Beek T , Dijkhoff L , Gronau QF , Acosta A et al. 2016 . Registered Replication Report: Strack, Martin, & Stepper (1988). Perspect. Psychol. Sci. 11 : 6 917– 28 [Google Scholar]
  • Wagenmakers E-J , Wetzels R , Borsboom D , van der Maas HL. 2011 . Why psychologists must change the way they analyze their data. The case of psi: comment on Bem (2011). J. Pers. Soc. Psychol. 100 : 3 426– 32 [Google Scholar]
  • Wagenmakers E-J , Wetzels R , Borsboom D , van der Maas HL , Kievit RA. 2012 . An agenda for purely confirmatory research. Perspect. Psychol. Sci. 7 : 6 632– 38 [Google Scholar]
  • Wagge J , Baciu C , Banas K , Nadler JT , Schwarz S et al. 2018 . A demonstration of the Collaborative Replication and Education Project: replication attempts of the red-romance effect. PsyArXiv, June 22. https://doi.org/10.31234/osf.io/chax8 [Crossref]
  • Whitcomb D , Battaly H , Baehr J , Howard-Snyder D. 2017 . Intellectual humility: owning our limitations. Philos. Phenomenol. Res. 94 : 3 509– 39 [Google Scholar]
  • Wicherts JM , Bakker M , Molenaar D. 2011 . Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLOS ONE 6 : 11 e26828 [Google Scholar]
  • Wiktop G. 2020 . Systematizing Confidence in Open Research and Evidence (SCORE). Defense Advanced Research Projects Agency https://www.darpa.mil/program/systematizing-confidence-in-open-research-and-evidence [Google Scholar]
  • Wilkinson MD , Dumontier M , Aalbersberg IJ , Appleton G , Axton M et al. 2016 . The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3 : 1 160018 [Google Scholar]
  • Wilson BM , Harris CR , Wixted JT. 2020 . Science is not a signal detection problem. PNAS 117 : 11 5559– 67 [Google Scholar]
  • Wilson BM , Wixted JT. 2018 . The prior odds of testing a true effect in cognitive and social psychology. Adv. Methods Pract. Psychol. Sci. 1 : 2 186– 97 [Google Scholar]
  • Wintle B , Mody F , Smith E , Hanea A , Wilkinson DP et al. 2021 . Predicting and reasoning about replicability using structured groups. MetaArXiv, May 4. https://doi.org/10.31222/osf.io/vtpmb [Crossref]
  • Yang Y , Youyou W , Uzzi B. 2020 . Estimating the deep replicability of scientific findings using human and artificial intelligence. PNAS 117 : 20 10762– 68 [Google Scholar]
  • Yarkoni T. 2019 . The generalizability crisis. PsyArXiv, Nov. 22. https://doi.org/10.31234/osf.io/jqw35 [Crossref]
  • Yong E 2012 . A failed replication draws a scathing personal attack from a psychology professor. National Geo-graphic March 10. https://www.nationalgeographic.com/science/phenomena/2012/03/10/failed-replication-bargh-psychology-study-doyen/ [Google Scholar]

Data & Media loading...

Supplementary Data

Download the Supplemental Material as a single PDF. Includes Supplemental Text, Supplemental Tables 1-12, and Supplemental Figures 1-2.

  • Article Type: Review Article

Most Read This Month

Most cited most cited rss feed, job burnout, executive functions, social cognitive theory: an agentic perspective, on happiness and human potentials: a review of research on hedonic and eudaimonic well-being, sources of method bias in social science research and recommendations on how to control it, mediation analysis, missing data analysis: making it work in the real world, grounded cognition, personality structure: emergence of the five-factor model, motivational beliefs, values, and goals.

SIPS logo

  • Previous Article
  • Next Article

Statistical Considerations

Theoretical and methodological considerations, data accessibility statement, competing interests, author contributions, peer review comments, when and why to replicate: as easy as 1, 2, 3.

ORCID logo

  • Split-Screen
  • Article contents
  • Figures & tables
  • Supplementary Data
  • Peer Review
  • Open the PDF for in another window
  • Guest Access
  • Get Permissions
  • Cite Icon Cite
  • Search Site

Sarahanne M. Field , Rink Hoekstra , Laura Bringmann , Don van Ravenzwaaij; When and Why to Replicate: As Easy as 1, 2, 3?. Collabra: Psychology 1 January 2019; 5 (1): 46. doi: https://doi.org/10.1525/collabra.218

Download citation file:

  • Ris (Zotero)
  • Reference Manager

The crisis of confidence in psychology has prompted vigorous and persistent debate in the scientific community concerning the veracity of the findings of psychological experiments. This discussion has led to changes in psychology’s approach to research, and several new initiatives have been developed, many with the aim of improving our findings. One key advancement is the marked increase in the number of replication studies conducted. We argue that while it is important to conduct replications as part of regular research protocol, it is neither efficient nor useful to replicate results at random. We recommend adopting a methodical approach toward the selection of replication targets to maximize the impact of the outcomes of those replications, and minimize waste of scarce resources. In the current study, we demonstrate how a Bayesian re–analysis of existing research findings followed by a simple qualitative assessment process can drive the selection of the best candidate article for replication.

In 2005, Ioannidis published a theoretical article ( 2005 ), in which he argued that more than half of published findings may be false. The landmark mass replication effort of the Open Science Collaboration (henceforth OSC; 2015 ) gave empirical support for Ioannidis’s claims a decade after they were made, but reported an even bleaker narrative. Only 36% of replication studies were successful in yielding a result comparable to that reported in the original article (more recent mass replication attempts have revealed similarly low reproducibility levels: Klein et al., 2018 ; Camerer et al., 2018 ). To put this finding in context: had all of the original results been true, a minimum reproducibility rate of 89% would be expected, according to the OSC ( 2015 ). These figures reflect the gravity of what is now known as the crisis of confidence, or replicability crisis, in science. Though the discussion began in psychology, reports of unsatisfactory reproducibility rates have come from many different fields in the scientific community ( Baker 2016 ; Begley & Ioannidis, 2015 ; Chang & Li, 2018 ).

The literature has suggested a number of potential causes for poor reproducibility of research findings. One of the most obvious candidate causes is the publish or perish culture in academia ( Grimes, Bauch, & Ioannidis, 2018 ), which describes the pressure on researchers to publish much and often in order to maintain their university faculty positions, or to move up the hierarchical ‘ladder’. Another possible cause is the alarmingly high prevalence of QRPs (questionable research practises) in which researchers engage. HARKing (hypothesizing after the results are known), p-hacking (where one massages the data to procure a significant p-value) and the ‘file drawer’ problem (where researchers do not attempt to publish their null results) are all examples of QRPs ( Kerr, 1998 ; John, Loewenstein, & Prelec, 2012 ; Rosenthal, 1979 ). They lead to a literature that is unreliable, and apparently in many cases (and often as a result), impossible to replicate.

Irrespective of the causes of the crisis of confidence, its consequence is irrefutable: scientific communities are questioning the veracity of many of the key findings of psychology, and are hesitant to trust the conclusions upon which they are based. A recent online Nature news story suggested that most scientific results should not be trusted ( Baker, 2015 ). Research psychologists are asking whether science is “broken” ( Woolston 2015 ); others have referred to the “terrifying unraveling” of the field ( Aschwanden, 2016 ). Proposed solutions to this crisis of confidence have revolved around reviewers demanding openness as a condition to provide reviews ( Morey et al., 2016 ), guidelines for more openness and transparency ( Nosek et al., 2015 ), preregistration and registered reports ( Dablander, 2017 ), and funding schemes directly aimed at replication (such as those of the Netherlands Organisation for Scientific Research: https://www.nwo.nl/en/news-and-events/news/2017/social-sciences/repeating-important-research-thanks-to-replication-studies.html ).

These initiatives, while a first step in the right direction, only go so far to remedy the problem because they are preemptive in nature; only prescribing best practice for the future . They cannot help untangle the messy current literature body we continue to build upon. The most direct way to get more clarity about previously reported findings may well be through replication (but see Zwaan, Etz, Lucas, & Donnellan, 2018 , who discuss some caveats). Therefore, psychological science needs a way to separate the wheat from the chaff; a way to determine which findings to trust and which to disregard. Replication of existing empirical research articles is a practical way to meet this dire need. The need for replications introduces a second generation of complications related to interest in conducting replication studies: a flood of new replications of existing research can be found in the literature, and more are being conducted.

In theory, this up-tick in the number of replications being conducted is a good development for the field (especially given that up until recently, replication studies only occupied about 1 percent of the literature body: see Makel, Plucker, & Hegarty, 2012 ), however in practice, so much interest in conducting replications leads to a logistical problem: there exists a vast body of literature that could be subject to replication. The question is: how does one select which studies to replicate from the ever-increasing pool of candidates? Which replications retread already ‘well-trodden ground’, and which move research forward ( Chawla, 2016 )? These questions have serious practical implications, given the scarcity of resources (such as participants and time) in many scientific research fields.

Several recommendations to point us in the right direction exist in the literature already. A great number of these happen to be conveniently grouped as commentaries to Zwaan and colleagues’ recent impactful article: ‘Making Replication Mainstream’ (2018). For instance, Coles, Tiokhin, Scheel, Isager and Lakens ( 2018 ), and Hardwicke and colleagues ( 2018 ) urge potential replicators to use a formalized decision making process, and only conduct a replication when the results of a cost-benefit analysis suggest that the benefits of such a replication outweigh the associated costs. Additionally, they emphasize considering other factors such as the prior plausibility of the original article’s reported effects. Kuehberger and Schulte-Mecklenbeck ( 2018 ) argue against selecting replications studies at random and discuss potential biases that can emerge in the process of selecting studies to replicate. Little and Smith single out problems with existing literature as reasons for replication failure (such as weak theory measurement), which can be reasons for targeting some original studies for replication, over others. Finally, Witte and Zenker ( 2018 ) recommend replication only those studies which provide theoretically important findings to the literature. Coming from the opposite angle, Schimmack ( 2018 ) provides reasoning as to why to not replicate certain studies, which, naturally, is also useful in refining ones selection criteria for replication targets.

One could say that, broadly, there are three different facets to selecting replication targets, associated with the different information contained in a published article: statistical, theoretical and methodological. In the next two subsections, we first discuss statistical considerations and then theoretical and methodological considerations in turn.

First, studies can be selected for replication when their claims require additional corroboration, based on the statistical evidence reported in the publication. This is a statistical approach to determining what should be replicated first. Null-hypothesis significance testing (or NHST) dominates the literature, meaning that the bulk of statistical testing involves reporting p-values.

Although there are numerous downsides to using NHST to quantify scientific evidence (for a discussion, see Wagenmakers, 2007 ), we focus on one key drawback here which relates directly to our discussion. The p-value only allows us to reject the null hypothesis: there is a single evidence threshold, meaning that we cannot use the p-value to gather evidence in favor of the null hypothesis, no matter how much evidence may exist for it. Given that it is unlikely that each study reporting an effect is based on a true main effect ( Ioannidis, 2005 ), but that studies rarely use statistical techniques to quantify evidence for the absence of an effect, there is a mismatch in what we can conclude and what we want to conclude from our statistical inference ( Haucke, Miosga, Hoekstra, & van Ravenzwaaij, 2019 ).

One alternative to quantifying statistical evidence with the conventional NHST framework is by means of Bayes factors . Throughout this paper, we will use a relatively diffuse default prior distribution for effect size to reflect the fact that we do not possess strong prior information (see also Etz & Vandekerckhove, 2016 ). In this paper we examine scenarios calling for a t-test. For such designs, one of the most prominent default specifications uses what is known as the Jeffreys-Zellner-Siow (JZS) class of priors. Development of these so-called JZS Bayes factors have been built on the pioneering work of Jeffreys ( 1961 ) and Zellner and Siow ( 1980 ). The JZS Bayes factor quantifies the likelihood of the data under the null hypothesis (with effect size δ = 0) relative to the likelihood of the data under the alternative hypothesis. For a two-sided test, the range of alternative hypotheses is given by a prior on the effect size parameter δ, which follows a Cauchy distribution with a scale parameter r = 1 / 2 (see, Rouder, Speckman, Sun, Morey, & Iverson, 2009 , equation in note 4 on page 237). In terms of interpretation, a Bayes factor of BF 10 = 5 means the data are 5 times more likely to have occurred under the alternative hypothesis than under the null hypothesis. In comparison, a Bayes factor of BF 10 = 1/5 (or the inverse of 5), means the observed data are five times more likely to have occurred under the null hypothesis than under the alternative hypothesis. 1

The application of the JZS Bayes factor for a large-scale reanalysis of published results is not without precedent ( Hoekstra, Monden, van Ravenzwaaij, & Wagenmakers, 2018 ). We build upon the work of Hoekstra and colleagues in taking the results of such a Bayesian reanalysis as a starting point for selecting replication targets (for a similar approach, see Pittelkow, Hoekstra, & van Ravenzwaaij, 2019 ).

So why not simply use p-values as our selection mechanism for existing statistical evidence? When NHST results are reanalyzed and transformed into Bayes factors, the relationship between Bayes factors and p-values can be strong if the analyzed studies have mostly comparable sample sizes ( Wetzels et al., 2011 ; Aczel, Palfi, & Szaszi, 2017 ). However, when studies have differing sample sizes, this relationship is no longer straightforward (for instance, see Hoekstra and colleagues ( 2018 ), who show that for non-significant findings the strength of pro-null evidence is better predicted by N than by the p-value, and that larger N studies are “more likely to provide compelling evidence”, p. 6). Consider the following example for illustration.

We have two results of classical statistical inference:

Scenario 1: t(198) = 1.97, p = .05 Scenario 2: t(199998) = 1.96, p = .05

In both cases, the p-value is significant at the conventional alpha-level of .05, however due to the very different sample size in both scenarios, these two sets of results reflect very different levels of evidential strength. The Bayes factor, unlike the p-value, can differentiate between these two sets of results. Through the lens of the Bayes factor, scenario 1 presents ambiguous evidence: BF 10 = 0.94 (i.e., the data is about equally likely to occur under the null hypothesis as under the alternative hypotheses). A Bayes factor for scenario 2 presents strong evidence in favor of the null: BF 10 = 0.03 (i.e., the data is about 29 times more likely to occur under the null hypothesis than under the alternative hypothesis). Using the p-value as a criterion for which study to replicate would not differentiate between these two scenarios, whereas the Bayes factor allows us to decide that in case of Scenario 2, we have strong evidence that the null hypothesis is true (and so, arguably, no further replication is needed), whereas in case of Scenario 1, the evidence is ambiguous and replication is warranted.

In this paper, we apply a Bayesian reanalysis to several recent research findings, the end-goal being to demonstrate a technique one can use to reduce a large pool of potential replication targets to a manageable list. The Bayesian reanalysis is diagnostic in the sense that it can assist us in separating findings into three classes, or tiers of results: (1) results for which the statistical evidence pro-alternative is compelling (no replication is needed); (2) results for which the statistical evidence pro-null is compelling (no replication is needed); (3) results for which the statistical evidence is ambiguous (replication may be needed depending on theoretical and methodological considerations). We reiterate here that, crucially, p-values are unable to differentiate between results which belong in the second of these categorical classes, and those that belong in the third. The third class of studies will be carried into the next ‘phase’ of our demonstration, wherein we further scrutinize study results with ambiguous statistical evidence on theoretical and methodological considerations that might factor into the decision to replicate.

Mackey ( 2012 ) provides some pointers on how one may select a replication target based on the theoretical content of a reported research finding. She suggests that in order to qualify as a ‘candidate’ for replication, a study should address theoretically important (for short, ‘theoretical importance’) and currently relevant research questions (‘relevance’). A study also qualifies if it concerns studies in the field that are accepted as true in the field, but have yet to be sufficiently investigated (‘insufficient investigation’). 2 The theoretical approach will be explained as we describe it in a practical application later in the paper.

The last facet to selecting replication studies concerns methodological information. While many aspects of a study’s methodology are highly specific to the paradigm of the article in question (e.g., the use of certain materials like visual stimuli), some elements of methodology can be discussed in general (e.g., sample size). As with the theoretical facet, methodology will be discussed in more detail during the later demonstration.

A replication study itself is beyond the scope of this paper, however we offer a demonstration of how the combined use of theory and Bayesian statistics can drive a methodical and qualitative approach to selecting replication targets in the psychological sciences. Additionally, we offer theoretical and methodological recommendations, in case such a replication were to be conducted. Please note that although the theoretical context and methodology of a study is important for selecting studies for replication, our demonstration focuses primarily on applying the Bayesian reanalysis to this challenge.

The remainder of this paper is organized as follows. In the method section, we share details of our treatment of the replication candidate pool in the reanalysis phase. We then describe the results of the initial selection process, before moving on to describing the qualitative phase of the filtering process. We make recommendations based on our selection process, for a fictional replication study. The article ends with our discussion, wherein we justify certain subjective choices we have made and consider philosophical issues, and share the limitations of our method.

We extracted statistical details from articles in the 2015 and 2016 Psychological Science and performed a Bayesian reanalysis to make a first selection of which studies could be targets for replication, based on the evidential strength of the results reported. Once this initial selection was made, we further refined the selection based on the theoretical soundness of the conclusions drawn from the selected studies, and considered the support for the finding which exists in the literature already. The approach combined quantitative and qualitative methods: on the one hand, the initial selection was based on an empirical process, and on the other, the refinement of the selection was based on a process involving judgments of the findings in the context of the literature and theory. The process took the first author less than a working week to complete. Given that we provide the reader with the reanalysis code, and the spreadsheet with the necessary values to complete the reanalysis, we believe that attempts by others to use our method for a similarly sized sample would not be any more time-intensive than our original execution.

All Psychological Science articles from 2015 and 2016 issues were searched for reported significant statistical tests (one–sample, paired, and independent t-tests), associated with primary research questions. As mentioned, we used statistical significance as our criterion for selecting results to reanalyze. All of the articles reporting t-tests to test their main hypotheses used p-values to quantify their findings. We extracted the t-values and other details required for the reanalysis (including N and p-value) for 30 articles which contained t-tests (the data spreadsheet which logs these details for each statistic extracted is on the project’s Open Science Framework (OSF) page at https://doi.org/10.17605/OSF.IO/3RF8B ).

Incomplete or unclear reporting practices posed a challenge in the first step of selecting which articles to reanalyze. Determining whether the executed tests were one- or two-sided was often difficult, as articles frequently failed to report the type of test conducted. Several articles which used t-tests as part of their main analysis strategy were ultimately not included in the reanalysis, as not all information was available (not even to the extent that we could reverse engineer other necessary details). One article, which reported two t-tests in support of their main finding, was excluded from the final reanalysis. Due to unclear reporting, we were unable to identify what the study’s method entailed, and, therefore, how the reported results were reached. We explore the reporting problem in detail in the discussion section.

In total, from the 24 issues of 2015/2016 Psychological Science , 326 ‘research articles’ and ‘research reports’ were manually scanned for studies in which a major hypothesis was tested using a t-test. Of these, 57 results were derived from 30 individual articles. Several articles reported more than one primary experimental finding which was analyzed using a t-test. Different approaches yielded judgments of whether or not a finding was of focal importance. First, if a specific finding was reported in the abstract, it would be selected (where possible). The rationale for this approach was that the abstract has only got space for documenting the most important results of the study, thus only key findings will be reported in it. A finding was also selected if somewhere in the article it was tested in a primary hypothesis, or was explicitly noted by the authors of the article as being important for the study’s conclusions. Many articles reported several t-tests in support of a single broader hypothesis. In such cases we attempted to select the results which most directly supported the author’s conclusions.

Descriptive Results

P-values, test statistics, sample sizes and test sidedness were collected for the purpose of the reanalysis. The p-values ranged in value; the largest was .047. The test statistics and sample sizes obtained also ranged greatly. The absolute test statistics ranged from 2.00 to 7.49. The range of the sample sizes is from N = 16 to N = 484. The distribution of study sample sizes is heavily right-skewed. The median for this sample is 54 – smaller than recent estimates of typical sample sizes in psychological research ( Marszalek, Barber, Kohlhart, & Cooper, 2011 ).

In the Bayesian reanalysis, we converted reported information extracted from articles into Bayes factors, to assess the strength of evidence given by each result. 3 The Bayes factors range widely: 0.97 to 1.9 × 10 10 , or approximately 19 billion. Almost half of them are between 1 and 5.

A clear negative relationship between the Bayes factors and the reported p-values is shown in Figure 1 . Despite the nature of this relationship, some small p-values are associated with a range of Bayes factors (around the p = .04 mark, for instance). A positive relationship between Bayes factors and sample sizes can be seen in Figure 2 . Unsurprisingly, larger sample sizes are generally associated with larger Bayes factors ( r = .71), though it is not the case that large sample sizes are always associated with more compelling Bayes factors. For instance, many cases in the N = 200 region are associated with somewhat weak Bayes factors. In one case, the overall N of 30 converts to a Bayes factor of over 151,000, in another case, the overall N of 35 is associated with a Bayes factor of over 21,000.

Figure 1. Scatterplot of Bayes factors and p-values plotted on a log-log scale. The horizontal dashed lines indicate Jeffreys’ thresholds for anecdotal evidence (3, for pro-alternative cases, and the inverse for pro-null cases). The vertical red line demarcates the conventional significance level for p-values.

Scatterplot of Bayes factors and p-values plotted on a log-log scale. The horizontal dashed lines indicate Jeffreys’ thresholds for anecdotal evidence (3, for pro-alternative cases, and the inverse for pro-null cases). The vertical red line demarcates the conventional significance level for p-values.

Figure 2. Scatterplot of Bayes factors and sample size plotted on a log-log scale. The horizontal dashed lines indicate Jeffreys’ thresholds for anecdotal evidence (3, for pro-alternative cases; the inverse for pro-null cases). The cases in which we are interested for the reanalysis, those in tier 3, lie between the two finely dashed lines.

Scatterplot of Bayes factors and sample size plotted on a log-log scale. The horizontal dashed lines indicate Jeffreys’ thresholds for anecdotal evidence (3, for pro-alternative cases; the inverse for pro-null cases). The cases in which we are interested for the reanalysis, those in tier 3, lie between the two finely dashed lines.

Quantitative Target Selection

In this paper, we will make an initial selection based on those studies in tier 3: whose results yield only ambiguous evidence in relation to support for their reported hypotheses. For this purpose, we will judge such ambiguity, or low evidential strength, as when a study’s BF 10 lies between 1/3 and 3, which, by Jeffrey’s (1961) classification system provides no more than ‘anecdotal’ evidence for one hypothesis over the other.

Using the BayesFactor package in R ( Morey, Rouder, & Jamil, 2015 ), we calculated Bayes factors (BF) for each test statistic using the extracted test statistics, and other information gathered: p-values, test statistics, sample sizes and sidedness of the test. While the vast majority did not explicitly state that they were confirmatory, most results were presented as though they were. The code written for the analysis which is associated with the data spreadsheet can be found at the project’s OSF page: https://doi.org/10.17605/OSF.IO/3RF8B .

The reanalysis revealed that the Bayesian reanalysis placed 20 results in evidence tier 3. One of these yielded a Bayes factor below 1 (0.97), which, by Jeffrey’s classification system, demonstrates anecdotal pro-null evidence. The remainder of the results lie in tier 1. As we were only interested in those articles for which an effect was reported, no results falling in tier 2 (those with compelling pro-null evidence) exist in this dataset. The reanalysis has reduced the pool of results from 57 to 20 candidates for replication. We now move onto the next stage of target selection.

Qualitative Target Selection

Of the 20 results in tier 3, we select those demonstrating the weakest evidence for their effects. If there is an article for which many results fall in tier 3, these will also be considered. 4 We will then conduct an assessment based on the qualitative criteria of Mackey ( 2012 ): theoretical importance, relevance, and insufficient investigation. Alongside Mackey’s criteria, we consider the need for the finding in question to be replicated under different study conditions or with a different sample than the original (to establish the external validity of the effect in question), as well as replication feasibility (for instance, can this study be replicated by generally-equipped labs, or are more specific experimental set-ups necessary?). We will refer to the articles by the article number we have given them (the article and reanalysis details corresponding to these can be found in Appendix A; a full table of all the details can be found on the OSF page for this project, at https://doi.org/10.17605/OSF.IO/3RF8B ).

The first to consider is the article revealed by the reanalysis to contain anecdotal pro-null evidence in one of its studies: article 8, from Dai, Milkman and Riis ( 2015 ). The authors of article 8 report on the so-called ‘fresh start effect’. This effect refers to the use of temporal landmarks to initiate goal pursuit. More specifically, the authors’ report supports claims that certain times of year (for instance, New Year’s Eve) are especially potent motivators for starting new habits (such as working out, or eating more wisely). Although some evidence in this article is weakly pro-null (result 8a), one strike against naming article 8a as a suitable target for replication, is that the article contains a second result we reanalyzed (result 8b) which yielded a Bayes factor of 5.05 (constituting pro-alternative evidence: Gronau et al., 2017 ). 5

In terms of Mackey’s ( 2012 ) criteria, the study is difficult to judge as a replication target. Article 8’s topic is theoretically important and certainly currently relevant: understanding the relationship between motivation and initiating healthy eating behavior is important for many reasons (for developing strategies to lowering the global burden of health due to preventable disease, for instance). However, the link between temporal landmarks and motivation has been demonstrated often and by different research groups ( Peetz & Wilson, 2013 ; Mogilner, Hershfield, & Aaker, 2018 ; Urminsky, 2017 ), as well as in other studies by related groups ( Dai, Milkman, & Riis, 2014 ; Lee & Dai, 2017 ), including a randomized clinical trial measuring adherence to medical treatment ( Dai et al., 2017 ).

Although this phenomenon has been the subject of many different studies, and the content of article 8 lends itself to interesting replications in which one varies, for instance, the culture of the sample, existing literature in the area already demonstrates the effect in other cultures than the USA (e.g., Germany: Peetz & Wilson, 2013 ), it is not a clear replication target, in our assessment.

The majority of the remaining results in tier 3 show Bayes factors that are homogeneous in terms of their magnitude– for instance, half of the results have a Bayes factor between 1 and 2. Additionally, for articles with multiple reanalyzed studies, we see only one case in which each of these studies fell into tier 3. They may reflect one study of many in an article which overall, through other tests, provides strong evidence of a main effect. Both of these reasons render the majority of the sample less attractive as replication candidates.

Despite this, two articles (both featuring multiple low Bayes factors each) are potential targets. 6 We now commit these to the qualitative assessment to determine their suitability for replication, in no particular order.

One potential replication target is article 4 ( Reinhart, McClenahan, and Woodman, 2015 ), in which the hypothesis that using mental imagery, or ‘visualizing’ can improve attention to targets in a visual search scene was tested. The authors recorded reaction times (RT) and event-related potentials (quantified as N2pc amplitudes, which reflect ongoing neural processes – in this case, attention) in response to the provided stimuli. They reported support for their hypothesis: imagining the visual search for certain targets did increase the speed at which participants focused on the specified targets (indexed by the ERP), before the motor response of pressing a button to confirm they had located the target. This article yielded three t-tests (each testing the experimental conditions on RT), which are of interest to us. We refer to them as results 4a through 4c, respectively. They appear in the results for the first experiment, which we judged to be a clear test of their primary hypothesis. Each of these t-tests correspond to a small Bayes factor. The RT tests correspond to Bayes factors of 3.19, 1.99 and 2.02, while the EEG tests yielded Bayes factors of 1.83 and 2.53. (the two other t-tests in the sets were not significant, thus are not of interest to us for the purposes of this reanalysis).

This article meets several of the qualitative criteria too. First, the topic is theoretically important and currently relevant. Training the brain for better performance has been gaining momentum in the past decade, partly prompted by several articles that support the positive link between video-gaming and improved mental performance in different cognitive domains (such as attention: Green & Bavelier, 2012 ). Exploring the link further with studies such as this can be beneficial to many areas of psychology and medicine (e.g., for working with patients of brain damage that are undergoing rehabilitation). Second, there is little supporting evidence for the link between visualization and improved attention; importantly, some of the literature aiming to reinforce the findings of article 4 contradicts it. For instance, the preregistered failed replication and extension of article 4’s experiments conducted by Clarke, Barr and Hunt ( 2016 ) showed repeated searching – not visualization – improved attention. Other factors to consider are generalization and feasibility. The suitability of article 4 as a replication target is supported by fact that this article has already been a target for replication, and that that replication did not conclusively reinforce its conclusions. It is possible that this study should be weighted differently in the sample due to the previous replication. Indeed, one could numerically account for the evidence contributed by the existing replication (e.g., Gronau et al., 2017 ). We consider that to be outside of the scope of this paper.

Their sample for experiment one was comprised of adults between the ages of 18 and 35, with a gender split of 62% to 38% in favor of women. The findings of article 4 could benefit from a replication using a different sample: for instance, one with individuals from an older age range. Although age is not thought to impair neuroplasticity, older persons exhibit plasticity occurring in different regions of the brain than younger persons influencing the mechanisms underlying visual perceptual learning ( Yotsumoto et al., 2014 ), which may influence their response to the stimuli presented in the experiments in the article. This has implications for the generalizability of the results. Another potentially important factor for consideration is gender. A recent review article by Dachtler and Fox ( 2017 ) reports clear gender differences in plasticity that are likely to influence several cognitive domains (including learning and memory), due to circulating hormones such as estrogen, which are known to influence synaptogenesis. To summarize, we find article 4 to be suitable as a replication candidate. Specifically, some of its findings could benefit from external reinforcement in the form of a conceptual replication in which factors such as age and gender are taken into consideration. Further, the results may benefit from a more in-depth exploration into the effect of searching versus visualization on attention.

Another replication target that our sample yielded is article 12: Kupor, Laurin and Levav ( 2015 ). Mentioned above, all reanalyzed results of this article (i.e., a through c) fall into tier 3. Article 12 (which includes 5 studies, each with sub-studies), explores the general hypothesis that reminders of God increase risk-taking behavior. In study 1, which this reanalysis focused solely on (as it most directly tested the key hypothesis), four sub-studies are identified: 1a, 1b, 1c and 1d. The first three contain t-tests, while the fourth contains a chi-square test. We consider only the results of 1a through c (12a through c) for the current reanalysis.

In the study corresponding to result 12a, participants performed a priming task involving scrambled sentences. Half the participants were primed with concepts of God, by way of exposure to words such as “divine” (p. 375). The other half, which forms the control group, were exposed only to neutral words. Once participants were primed, they completed a self-report risk-taking scale which was explained to participants as being an unrelated study. This scale revealed their likely risk-taking behavior in a one to five Likert scale. In the study yielding result 12b, following the manipulation, participants described the likelihood that they would attempt a risky recreational task that they had described themselves at an earlier point. In the study corresponding to result 12c, participants were tested on their interest in risk-taking via a behavioral measure, once they were primed in the first phase of the experiment. In each of these three experiments, participants primed with concepts of God reported or behaved as predicted: more predisposed to risk-taking than their neutrally-primed counterparts. Despite these three experiments yielding significant p-values, the reanalysis revealed three Bayes factors all suggesting the evidence is ambiguous: 1.96, 1.68 and 1.83, respectively for results 12a–c.

We now assess article 12 on the qualitative factors we described earlier. First, we consider the theoretical importance and current relevance of this article. Given that the majority of the world identifies as being religious (84%, according to recent statistics: Hackett, Stonawski, Potančoková, Grim, & Skirbekk, 2015 ), understanding the role of religion in moderating behavior is important, to say the least. According to the authors of article 12, behavior modification programs such as those employed for drug and alcohol rehabilitation use concepts of God and religion as a tool to reduce delinquent behavior. While this topic has attracted the attention of several research groups globally (meaning the article does not naturally meet the ‘insufficient evidence’ criterion), the reanalyzed results in article 12 go against the majority of this body of work: “… we propose that references to God can have the opposite effect, and increase the tendency to take certain types of risks” (p. 374), and do not seem to have direct strong support in the literature as yet (a paucity of indirect support can be found, e.g., Wu & Cutright, 2018 ).

In assessing the characteristics of article 12’s sample, some details indicating the suitability of article 12 for replication come to light. First, article 12 reports using Amazon’s Mechanical Turk online workforce, which is comprised of approximately 80% U.S.-based workers, and 20% Indian workers. Given that the majority of the Mechanical Turk workers are from the U.S., and the overwhelming majority of the U.S. reports being affiliated with Christianity, we expect that the majority of this sample respond with a mindset of trusting in a God which is thought to intervene on the behalf of the faithful, responding to prayers for things like healing, guidance and help with personal troubles. The results of article 12 might be very different if the participant pool contained mostly practitioners of Buddhism (for example), as Buddhism emphasizes the importance of enlightenment (when an individual achieves an understanding of life’s truth), and personal effort, rather than the intervention of a divine being (which is relevant given that feelings of security are thought to increase willingness to engage in certain behaviors: p. 374).

The age of article 12’s sample is also relevant to their results, considering that the majority of workers (>50%) were born in the 1980s. Recent polls indicate that younger individuals across Europe, the USA and Australia are less religious than their older counterparts ( Harris, 2018 ; Wang, 2015 ; Schneiders, 2013 ), meaning that a successful replication of article 12’s results with a predominantly aged population (as opposed to the mean ages of 23, 31 and 34 years, reported in the article) would demonstrate the generalizability of the finding that God-priming increases risk taking. 7 Another possibility also relates to age – perhaps the effect is greatly decreased in aged persons, simply by virtue of maturity: Risk-taking, even for rewards, decreases as a function of age ( Rutledge et al., 2016 ).

Our reanalysis of article 12’s results, in conjunction with other methodological and theoretical criteria considerations heavily underlines this replication candidate as a promising target, reporting results that are in need of independent corroboration. We recommend a direct, or pure replication, such that the findings exactly as they are presented can be verified. In addition, we recommend a conceptual replication in which significant changes to the characteristics of the sample are made (e.g., as mentioned, on the basis of the participants’ ages and religions).

In this paper, we performed a large scale reanalysis of the results of a selection of articles published in Psychological Science in the years of 2015 and 2016 for which primary research findings were quantified by t-tests. Reanalyzing these results narrowed the pool of potential replication targets from 57 to 20 candidates. The Bayes factors for these candidate studies were between 0.97 and 2.85. To further our demonstration, we selected three articles, and subjected them to the second phase of the selection process, involving qualitative assessment. The qualitative process revealed that two of these articles are suitable for replication: their findings are theoretically important and relevant, but the literature largely lacks direct corroborating evidence for the claims thus far. It revealed that the results could benefit from changes to the magnitude of the samples, and that several variables should be included in conceptual replications to help generalize the reported results beyond the original articles.

A set of replications for articles 4 and 12 could first provide support for the existence of an effect, given the results of the Bayesian reanalysis. Once an underlying true effect is found to likely exist via a direct replication, further conceptual replications could be designed to explicitly explore other cohorts to better establish the generalizability of the findings beyond the original experimental cohort. In the case of article 4, specifically targeting participants of certain age groups may be beneficial to help determine the malleability of the effect across the lifespan. For article 12, targeting specific religious groups may assist in helping establish whether the God priming effect extends to other religions for which God is not a figure directly associated with intervention. These conceptual replications could also feature designs which vary from the originals – for instance, a replication of article 4 could feature a design in which gender is a blocking variable, or even included as a variable of interest.

Replications for both articles should contain much larger sample sizes, to help eliminate issues of reliability. In order to conduct a compelling replication study, one may need a sample size greater than that in the original study, depending on how large the sample is in the original study. Low experimental power produces some problems with reliability of original findings, leading to poor reproducibility even when other experimental and methodological conditions are ideal, which they rarely are ( Button et al., 2013 ; Wagenmakers & Forstman, 2014 ).

A simulation by Button and colleagues ( 2013 ) demonstrates an argument against the common misconception that if a replication study has a similar effect size to the original, the replication will have sufficient power to detect an effect. They show that “… a study that tries to replicate a significant effect that only barely achieved nominal statistical significance (that is, p ~ 0.05) and that uses the same sample size as the original study, will only achieve ~50% power, even if the original study accurately estimated the true effect size” (p. 367). This indicates that in order to obtain sufficient power (say, 1– β = .8) for a medium effect size in a replication study, the original sample would need to be more than doubled. In terms of the sample size in question, this indicates an increase from N = 105 to N = 212 for each of the replication studies.

Choice Justifications

Prior choice.

Though we do not want to rehash decades of debate about prior selection, our use of a Bayesian approach in our reanalysis stage, necessitates a brief discussion on our choice of prior. We have chosen to use the default prior – the Cauchy – in the BayesFactor package. This choice is suitable for our goals for a few reasons (and we recommend that the typical user use the package defaults for the same reasons). First, the Cauchy prior’s properties make it an ideal choice for a weakly informative prior based on ‘general desiderata’ ( Jeffreys, 1955 ). Second, even if we did want to use a subjective prior, the most obvious approach to doing so would yield unreliable results. Using the existing literature on an effect to inform one’s prior choice would be a poor idea due to publication bias. Other factors exist that complicate subjective prior use. For instance, the existing literature on a particular phenomenon might be conflicting (in which case, the ‘right’ subjective prior might not exist), or may be very sparse (in which case little information would be available to adequately inform the prior). This being said, there are potential users of our method that may have sufficient expertise to navigate this complex situation and wish to select an alternative to the Cauchy prior. We refer such users to Verhagen and Wagenmakers ( 2014 ) or to Gronau, Ly and Wagenmakers ( 2019 ), both of which deal with Bayesian t-tests with explicit prior information available.

Selection Based on Significance

We used statistical significance as the criterion for selecting results for the Bayesian reanalysis. One may wonder why we have not chosen to inspect the claims of the non-existence of an effect based on a non-significant p-value. We have two reasons for using statistical significance (that is, when original article authors used statistical significance to justify their claims). First, although we believe statistical significance is hardly diagnostic of a true effect, the lack of statistical significance being related to no effect is even more complicated. If one were to try to replicate a non-significant result, what would the result say of the original effect? This problem does not exist for, say, an original study with a strong pro-null Bayes factor result, as the Bayes factor allows us to actually quantify pro-null evidence.

Finally, some applications of our method could be constrained by the capabilities or resources of replicating labs – not all suitable replication candidates can be replicated by all interested parties, as shown in our description above. The study of article 4 is worthwhile as a replication target and warrants further investigation, however it requires specialized equipment and specific expertise to be recreated, and is therefore only feasible for select labs to seriously attempt. On the other hand, article 12 features a less specialized set of materials that could be recreated by a research group using easily-accessible university provided software (e.g., Qualtrics) and web-browsers.

Limitations of the reanalysis should be noted. It is not always clear from the reporting articles which test statistic is most suitable to extract for purposes of reanalysis. One main reason for this difficulty was outlined earlier in the methods section of the study – inconsistent reporting practices. Despite a clear and detailed article published in American Psychologist by the APA in 2008 that discusses desirable reporting standards in psychology, and other initiatives in other fields to improve research reporting (e.g., the guidelines developed to improve the reporting of randomized-controlled trials in health-related research: Moher, Schulz, & Altman, 2001 ), many researchers in the social sciences have failed to adopt them (Mayo-Wilson, 2013). To be clear, poor standards of reporting are not the norm only in psychological science. To illustrate: Mackey ( 2012 ) in linguistics research states that insufficient reporting of details important for replication is problematic in many studies (p. 26); Button and colleagues ( 2013 ) in biomedical research, discuss the relationship between insufficient reporting of statistical details and false positives in results. We also recognize that it is difficult to manage a good balance between adequate reporting and the word limit in many (especially higher-impact) journals. Though, on the other hand, authors can upload supplementary documents to the various platforms available (the Open Science Framework, or Curate Science, for instance), or submit.

Another limitation regards our reanalysis of only t-tests. While reanalysis of more complex designs is possible using the Bayes factor package, we only demonstrate with the simpler design of the t-test. We intend to show, by this demonstration, a proof of concept of a methodical and evidence-driven approach to choosing targets for replication. The Bayesian reanalysis is a clear strength, from which replicating labs can draw, however we do not advocate only the use of a Bayesian reanalysis. We must consider factors that place the article and its content in context. We must consider its appropriateness as a study for replication (is a replication feasible for less well-equipped or specialist labs?), as well as the literature body it is part of. Is the study generally well supported, or does it tell a story conflicting with existing findings? Is it theoretically important, and does it hold relevance in its current historical, social and cultural context?

The reader may wonder why we have chosen not to assess the soundness of certain aspects of the methodologies of the original studies as a criterion for what studies to replicate. Although we argue that such a set of assessments is outside of the scope of the article, we recognize that to attempt to replicate an effect elicited by a poor methodological set-up is ill advised. We recommend that users of our method use their own judgment to determine whether or not an original article’s methods are sound, and to consider each experiment of their final filtered sample in turn. If the methods of the final sample of potential targets is difficult for a user to assess (for example, perhaps one ends up with two targets using highly technical methods that the typical user may be unfamiliar with), the user may want to limit themselves to those studies for which they are confident assessing the soundness of the chosen methodology.

A practical yet somewhat philosophical argument must be raised of how one might use the Bayesian reanalysis to prioritize replication targets. The reader critical of Bayes factors may suggest that no matter what classification one uses (Jeffreys or otherwise), Bayes factors still do not provide a complete measure of the information contained in a given original study. This reader would be right, though this can be said for any currently used quantification approach. We stress that we are not advertising the Bayesian reanalysis as the only route to a search for replication targets. We argue that it is a tool one can apply to reveal valuable information to use to distinguish between pro-null evidence and ambiguous study results. In this demonstration, it was valuable as a kind of centrifuge – filtering the studies into different ‘weight’ categories based on the evidence from the results, which helps us determine which studies should be replicated first. The Bayesian reanalysis can be conducted relatively easily for most interested users with the statistical software R, using the code we have provided on our OSF page https://doi.org/10.17605/OSF.IO/3RF8B , to reduce the amount of potential replication targets, allowing individuals to direct their resources in a manner based on a justifiable and systematic method.

In this paper, we have chosen to have our statistical considerations be guided by the strength of evidence for the existence of an effect. Strong evidence can result from a large sample drawn from a relatively modest true effect, or a modest sample drawn from a large true effect. Other criteria are conceivable, such as those based on the precision of the effect size estimate.

A final important consideration for the reader concerns the role of publication bias in the pool of potential targets, and therefore final target selection. The work of Etz and Vandekerckhove ( 2016 ) suggests that if one were to take all studies as the possible pool of targets (that is, take publication bias into account), the average effect size will be smaller, and, presumably, the pool of viable targets much larger. Although their results suggest that an estimate of average strength of evidence based on published results is an overestimate, under the assumptions that (1) a single study has not been replicated many times in the same lab and only the most compelling result reported; and (2) a single study has not been duplicated exactly somewhere else in the world but was never reported; the reported test statistic can be safely reanalyzed in the way we have in our paper.

Aside from this, to date over 200 academic journals use the registered report format (for an up-to-date figure, see https://cos.io/rr/ ), and the number is steadily climbing. We consider it likely that as time passes and more people take advantage of this submission format, publication bias prevalence will decrease.

We would like to stress that the articles discussed in detail in this study were selected for illustration purposes only. The demonstration serves as proof of concept, and by no means aims to criticize specific studies or question their veracity. In fact, one of the three articles has two OSF badges (for more information see https://cos.io/our-services/open-science-badges-details/ ): one for open data, and one for open materials, indicating that the authors have made their data and study materials openly available on their project’s OSF page. One of the other articles has the badge for open materials. The third article has provided access to their study materials in a supplemental folder available on the Psychological Science website. Such a commitment to transparent scientific practices are associated with research that is of higher quality, and therefore likely to be more reproducible (see the OSF badge page: https://cos.io/our-services/open-science-badges-details/ for a discussion).

The current debate over poor reproducibility in psychology has led to a number of new ideas for how to improve our research going forward. Increased numbers of replication studies is one such advance, which has been taken up wholeheartedly by many concerned researchers. While such an initiative marks a positive and constructive move toward remedying a serious problem in our field, it is neither efficient nor useful to replicate results randomly. In this article, we have argued for and demonstrated an approach which is methodical and systematic, supplemented by careful and defensible qualitative analysis toward the selection of replication targets.

The approach we advocate and apply in this article can be simple and relatively fast to conduct, and affords the user access to important information about the strength of evidence contained in a published study. Although efficient, this approach has the potential to maximize the impact of the outcomes of those replications, and minimize the waste of resources that could result from a haphazard approach to replication. Combining a quantitative reanalysis with a qualitative assessment process of a large group of potential replication targets in a simple approach such as the one presented in this paper, allows the information of multiple sources to prioritize replication targets, and can assist in refining the methodology of the replication study.

Table showing details of each reanalyzed result, and relevant information associated with each article. A full spreadsheet of all information can be found at the project’s OSF page https://doi.org/10.17605/OSF.IO/3RF8B .

ArticleResultAuthorsYearTDFOverall Np-value reportedBF(10)Evidence Tier
 Ding et al. 2015 4.42 40 42 <.001 283.12 
 Ding et al. 2015 3.49 40 42 <.001 27 
  Metcalfe et al. 2015 7.28 87 89 <.0001 106765637.21 
 Reinhart et al.    2.605 17 18 0.018 3.19 
 Fan et al. 2015 2.81 46 48 0.007 6.25 
 Fan et al. 2015 2.51 46 48 0.016 3.44 
 Schroeder et al. 2015 3.79 157 160 <.01 213 
 Mackey et al. 2015 4.4 56 58 0.0001 412.32 
 Mackey et al. 2015 4.7 56 58 <.0001 1030.14 
 Dai et al. 2015 2.47 214 216 0.01 5.05 
 Okonofua et al. 2015 4.06 23 25 <.001 139.62 
 Okonofua et al. 2015 –4.99 23 25 <.001 1158.7 
 Olson et al. 2015 4.3 16 17 0.001 126.81 
 Olson et al. 2015 3.89 29 30 0.001 114.57 
 Olson et al. 2015 6.75 29 30 <.001 151537.61 
 Yin et al. 2015 5.73 15 16 <.001 644.57 
 Yin et al. 2015 3.23 15 16 0.006 8.84 
 Yin et al. 2015 5.88 15 16 <.001 1646.23 
 Yin et al. 2015 2.59 15 16 0.021 
 Yin et al. 2015 2.84 15 16 0.012 9.07 
  Storm et al. 2015 3.23 19 20 0.004 10.21 
  Perilloux et al. 2015 7.49 482 484 <.001 18931144326.12 
  Porter et al. 2016 2.89 85 88 0.005 7.91 
  Skinner et al. 2016 4.25 66 67 <.001 297.55 
  Kirk et al. 2016 3.59 43.35 54 0.001 40.35 
 Cooney et al. 2016 3.76 29 30 0.001 83.98 
 Cooney et al. 2016 4.27 57 59 <.001 285.37 
 Cooney et al. 2016 6.83 149 150 <.001 42432905.55 
  Zhou et al. 2016 7.26 70 73 <.001 19638415.24 
 Saint-Aubin et al. 2016 6.02 34 35 <.0001 21066.77 
 Saint-Aubin et al. 2016 5.6 45 46 <.0001 13805.45 
 Li et al. 2016 4.08 22 24 0.0005 53.71 
 Li et al. 2016 3.86 26 28 0.00068 38.61 
  Sloman et al. 2016 –3.4 69 70 0.001 22.86 
 Picci et al. 2016 2.8 27 28 0.001 10.5 
 Picci et al. 2016 4.4 27 28 0.001 273.96 
 Picci et al. 2016 3.14 29 30 20.56 
  Madore et al. 2015 2.49 22 23 0.021 2.67 
 Reinhart et al.    2.318 17 18 0.033 1.99 
 Reinhart et al.    2.326 17 18 0.033 2.02 
 Reinhart et al.    2.263 17 18 0.04 1.83 
 Reinhart et al.    2.466 17 18 0.027 2.53 
 Schroeder et al. 2015 2.09 215 218 0.04 1.14 
 Dai et al. 2015 211 213 0.047 0.97 
 Yin et al. 2015 2.47 15 16 0.026 2.52 
 Kupor et al.    2.21 59 61 0.031 1.96 
 Kupor et al.    2.22 98 100 0.029 1.83 
 Kupor et al.    2.27 200 202 0.024 1.68 
  Farooqui et al. 2015 2.2 20 21 0.04 1.64 
  Olsson et al. 2016 2.44 97 100 0.02 2.85 
  Watson-Jones et al. 2016 2.05 86 88 0.043 1.38 
  Hung et al. 2016 –2.51 19 20 0.02 2.75 
 Hsee et al. 2016 2.35 17 20 <.031 2.37 
 Hsee et al. 2016 2.25 52 54 0.029 2.13 
  Constable et al. 2016 2.1 35 38 0.04 1.7 
  Chen et al. 2016 2.39 187 189 0.018 2.21 
 Picci et al. 2016 2.25 29 30 0.032 2.15 

The database including all article information and reanalyzed Bayes factors are available, along with the analysis and plot R scripts, on the project’s OSF page: https://doi.org/10.17605/OSF.IO/3RF8B .

For a more detailed primer on the Bayes factor, please see Appendix A in Field and colleagues ( 2016 ); for a full expose, see Etz and Vandekerckhove ( 2018 ).

We note that some of Mackey’s guidelines lead to subjective decisions about what is theoretically relevant and important. What may be theoretically important in one field, may not be worth investigating in another, and so it is vital to consider the context of a potential replication target, and root one’s judgments in quantifiable argumentation.

Bayes factors can show evidential strength in favor of an alternative hypothesis (denoted BF 10 ), or be inverted and show support for the null hypothesis (denoted BF 01 ). In this article, we only discuss Bayes factors in terms of their support of the alternative, and so refrain from using the specific subscript notation or verbal indication.

We originally planned to consider those articles with the smallest Bayes factors, however, as we discuss later, there are many results with similar Bayes factors (e.g., 1.64, 1.68 and 1.70), which makes that choice alone somewhat arbitrary.

More complicated approaches to handle the case of multiple studies in a single paper corroborating a certain claim in the manuscript exist, for instance through a Bayesian model-averaged meta-analysis.

We only target these articles to practically demonstrate how our approach can be used. We do not imply that they are of low veracity or that the results were obtained by questionable means.

Of course, the replication as described here would need to feature different risk-taking activities, as aged persons may be averse in general to activities such as skydiving.

The authors have no competing interests to declare.

DvR and RH conceived of the idea of reanalyzing Bayes factors to quantify evidential strength of original article results; SMF conceived of the qualitative analysis, and overall process

SMF extracted all article information which formed the data file analyzed in the study

DvR wrote the code for the reanalysis phase. SMF analyzed and interpreted the findings derived from it; DvR, RH and LB refined the interpretations and plots for the final manuscript

SMF drafted the article; SMF, DvR, RH and LB further revised it

SMF approved the submitted version for publication

The author(s) of this paper chose the Open Review option, and the peer review comments are available at: http://doi.org/10.1525/collabra.218.pr

Recipient(s) will receive an email with a link to 'When and Why to Replicate: As Easy as 1, 2, 3?' and will not need an account to access the content.

Subject: When and Why to Replicate: As Easy as 1, 2, 3?

(Optional message may have a maximum of 1000 characters.)

Citing articles via

Email alerts, affiliations.

  • Recent Content
  • Special Collections
  • All Content
  • Submission Guidelines
  • Publication Fees
  • Journal Policies
  • Editorial Team
  • Online ISSN 2474-7394
  • Copyright © 2024

Stay Informed

Disciplines.

  • Ancient World
  • Anthropology
  • Communication
  • Criminology & Criminal Justice
  • Film & Media Studies
  • Food & Wine
  • Browse All Disciplines
  • Browse All Courses
  • Book Authors
  • Booksellers
  • Instructions
  • Journal Authors
  • Journal Editors
  • Media & Journalists
  • Planned Giving

About UC Press

  • Press Releases
  • Seasonal Catalog
  • Acquisitions Editors
  • Customer Service
  • Exam/Desk Requests
  • Media Inquiries
  • Print-Disability
  • Rights & Permissions
  • UC Press Foundation
  • © Copyright 2024 by the Regents of the University of California. All rights reserved. Privacy policy    Accessibility

This Feature Is Available To Subscribers Only

Sign In or Create an Account

Why is Replication in Research Important?

Replication in research is important because it allows for the verification and validation of study findings, building confidence in their reliability and generalizability. It also fosters scientific progress by promoting the discovery of new evidence, expanding understanding, and challenging existing theories or claims.

Updated on June 30, 2023

researchers replicating a study

Often viewed as a cornerstone of science , replication builds confidence in the scientific merit of a study’s results. The philosopher Karl Popper argued that, “we do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them.”

As such, creating the potential for replication is a common goal for researchers. The methods section of scientific manuscripts is vital to this process as it details exactly how the study was conducted. From this information, other researchers can replicate the study and evaluate its quality.

This article discusses replication as a rational concept integral to the philosophy of science and as a process validating the continuous loop of the scientific method. By considering both the ethical and practical implications, we may better understand why replication is important in research.

What is replication in research?

As a fundamental tool for building confidence in the value of a study’s results, replication has power. Some would say it has the power to make or break a scientific claim when, in reality, it is simply part of the scientific process, neither good nor bad.

When Nosek and Errington propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research, they revive its neutrality. The true purpose of replication, therefore, is to advance scientific discovery and theory by introducing new evidence that broadens the current understanding of a given question.

Why is replication important in research?

The great philosopher and scientist, Aristotle , asserted that a science is possible if and only if there are knowable objects involved. There cannot be a science of unicorns, for example, because unicorns do not exist. Therefore, a ‘science’ of unicorns lacks knowable objects and is not a ‘science’.

This philosophical foundation of science perfectly illustrates why replication is important in research. Basically, when an outcome is not replicable, it is not knowable and does not truly exist. Which means that each time replication of a study or a result is possible, its credibility and validity expands.

The lack of replicability is just as vital to the scientific process. It pushes researchers in new and creative directions, compelling them to continue asking questions and to never become complacent. Replication is as much a part of the scientific method as formulating a hypothesis or making observations.

Types of replication

Historically, replication has been divided into two broad categories: 

  • Direct replication : performing a new study that follows a previous study’s original methods and then comparing the results. While direct replication follows the protocols from the original study, the samples and conditions, time of day or year, lab space, research team, etc. are necessarily different. In this way, a direct replication uses empirical testing to reflect the prevailing beliefs about what is needed to produce a particular finding.
  • Conceptual replication : performing a study that employs different methodologies to test the same hypothesis as an existing study. By applying diverse manipulations and measures, conceptual replication aims to operationalize a study’s underlying theoretical variables. In doing so, conceptual replication promotes collaborative research and explanations that are not based on a single methodology.

Though these general divisions provide a helpful starting point for both conducting and understanding replication studies, they are not polar opposites. There are nuances that produce countless subcategories such as:

  • Internal replication : when the same research team conducts the same study while taking negative and positive factors into account
  • Microreplication : conducting partial replications of the findings of other research groups
  • Constructive replication : both manipulations and measures are varied
  • Participant replication : changes only the participants

Many researchers agree these labels should be confined to study design, as direction for the research team, not a preconceived notion. In fact, Nosek and Errington conclude that distinctions between “direct” and “conceptual” are at least irrelevant and possibly counterproductive for understanding replication and its role in advancing knowledge.

How do researchers replicate a study?

Like all research studies, replication studies require careful planning. The Open Science Framework (OSF) offers a practical guide which details the following steps:

  • Identify a study that is feasible to replicate given the time, expertise, and resources available to the research team.
  • Determine and obtain the materials used in the original study.
  • Develop a plan that details the type of replication study and research design intended.
  • Outline and implement the study’s best practices.
  • Conduct the replication study, analyze the data, and share the results.

These broad guidelines are expanded in Brown’s and Wood’s article , “Which tests not witch hunts: a diagnostic approach for conducting replication research.” Their findings are further condensed by Brown into a blog outlining four main procedural categories:

  • Assumptions : identifying the contextual assumptions of the original study and research team
  • Data transformations : using the study data to answer questions about data transformation choices by the original team
  • Estimation : determining if the most appropriate estimation methods were used in the original study and if the replication can benefit from additional methods
  • Heterogeneous outcomes : establishing whether the data from an original study lends itself to exploring separate heterogeneous outcomes

At the suggestion of peer reviewers from the e-journal Economics, Brown elaborates with a discussion of what not to do when conducting a replication study that includes:

  • Do not use critiques of the original study’s design as  a basis for replication findings.
  • Do not perform robustness testing before completing a direct replication study.
  • Do not omit communicating with the original authors, before, during, and after the replication.
  • Do not label the original findings as errors solely based on different outcomes in the replication.

Again, replication studies are full blown, legitimate research endeavors that acutely contribute to scientific knowledge. They require the same levels of planning and dedication as any other study.

What happens when replication fails?

There are some obvious and agreed upon contextual factors that can result in the failure of a replication study such as: 

  • The detection of unknown effects
  • Inconsistencies in the system
  • The inherent nature of complex variables
  • Substandard research practices
  • Pure chance

While these variables affect all research studies, they have particular impact on replication as the outcomes in question are not novel but predetermined.

The constant flux of contexts and variables makes assessing replicability, determining success or failure, very tricky. A publication from the National Academy of Sciences points out that replicability is obtaining consistent , not identical, results across studies aimed at answering the same scientific question. They further provide eight core principles that are applicable to all disciplines.

While there is no straightforward criteria for determining if a replication is a failure or a success, the National Library of Science and the Open Science Collaboration suggest asking some key questions, such as:

  • Does the replication produce a statistically significant effect in the same direction as the original?
  • Is the effect size in the replication similar to the effect size in the original?
  • Does the original effect size fall within the confidence or prediction interval of the replication?
  • Does a meta-analytic combination of results from the original experiment and the replication yield a statistically significant effect?
  • Do the results of the original experiment and the replication appear to be consistent?

While many clearly have an opinion about how and why replication fails, it is at best a null statement and at worst an unfair accusation. It misses the point, sidesteps the role of replication as a mechanism to further scientific endeavor by presenting new evidence to an existing question.

Can the replication process be improved?

The need to both restructure the definition of replication to account for variations in scientific fields and to recognize the degrees of potential outcomes when comparing the original data, comes in response to the replication crisis . Listen to this Hidden Brain podcast from NPR for an intriguing case study on this phenomenon.

Considered academia’s self-made disaster, the replication crisis is spurring other improvements in the replication process. Most broadly, it has prompted the resurgence and expansion of metascience , a field with roots in both philosophy and science that is widely referred to as "research on research" and "the science of science." By holding a mirror up to the scientific method, metascience is not only elucidating the purpose of replication but also guiding the rigors of its techniques.

Further efforts to improve replication are threaded throughout the industry, from updated research practices and study design to revised publication practices and oversight organizations, such as:

  • Requiring full transparency of the materials and methods used in a study
  • Pushing for statistical reform , including redefining the significance of the p-value
  • Using pre registration reports that present the study’s plan for methods and analysis
  • Adopting result-blind peer review allowing journals to accept a study based on its methodological design and justifications, not its results
  • Founding organizations like the EQUATOR Network that promotes transparent and accurate reporting

Final thoughts

In the realm of scientific research, replication is a form of checks and balances. Neither the probability of a finding nor prominence of a scientist makes a study immune to the process.

And, while a single replication does not validate or nullify the original study’s outcomes, accumulating evidence from multiple replications does boost the credibility of its claims. At the very least, the findings offer insight to other researchers and enhance the pool of scientific knowledge.

After exploring the philosophy and the mechanisms behind replication, it is clear that the process is not perfect, but evolving. Its value lies within the irreplaceable role it plays in the scientific method. Replication is no more or less important than the other parts, simply necessary to perpetuate the infinite loop of scientific discovery.

Charla Viera, MS

See our "Privacy Policy"

The rest of this article below this point is a bit more technical. Therefore, you do not have to read it if you don't want to. However, if you want t\ o read it, please do!

Warning: The NCBI web site requires JavaScript to function. more...

U.S. flag

An official website of the United States government

The .gov means it's official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you're on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings
  • Browse Titles

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Committee on Science, Engineering, Medicine, and Public Policy; Board on Research Data and Information; Division on Engineering and Physical Sciences; Committee on Applied and Theoretical Statistics; Board on Mathematical Sciences and Analytics; Division on Earth and Life Studies; Nuclear and Radiation Studies Board; Division of Behavioral and Social Sciences and Education; Committee on National Statistics; Board on Behavioral, Cognitive, and Sensory Sciences; Committee on Reproducibility and Replicability in Science. Reproducibility and Replicability in Science. Washington (DC): National Academies Press (US); 2019 May 7.

Cover of Reproducibility and Replicability in Science

Reproducibility and Replicability in Science.

  • Hardcopy Version at National Academies Press

3 Understanding Reproducibility and Replicability

In 2013, the cover story of The Economist, “How Science Goes Wrong,” brought public attention to issues of reproducibility and replicability across science and engineering. In this chapter, we discuss how the practice of science has evolved and how these changes have introduced challenges to reproducibility and replicability. Because the terms reproducibility and replicability are used differently across different scientific disciplines, introducing confusion to a complicated set of challenges and solutions, the committee also details its definitions and highlights the scope and expression of the problems of non-reproducibility and non-replicability across science and engineering research.
  • THE EVOLVING PRACTICES OF SCIENCE

Scientific research has evolved from an activity mainly undertaken by individuals operating in a few locations to many teams, large communities, and complex organizations involving hundreds to thousands of individuals worldwide. In the 17th century, scientists would communicate through letters and were able to understand and assimilate major developments across all the emerging major disciplines. In 2016—the most recent year for which data are available—more than 2,295,000 scientific and engineering research articles were published worldwide ( National Science Foundation, 2018e ). In addition, the number of scientific and engineering fields and subfields of research is large and has greatly expanded in recent years, especially in fields that intersect disciplines (e.g., biophysics); more than 230 distinct fields and subfields can now be identified. The published literature is so voluminous and specialized that some researchers look to information retrieval, machine learning, and artificial intelligence techniques to track and apprehend the important work in their own fields.

Another major revolution in science came with the recent explosion of the availability of large amounts of data in combination with widely available and affordable computing resources. These changes have transformed many disciplines, enabled important scientific discoveries, and led to major shifts in science. In addition, the use of statistical analysis of data has expanded, and many disciplines have come to rely on complex and expensive instrumentation that generates and can automate analysis of large digital datasets.

Large-scale computation has been adopted in fields as diverse as astronomy, genetics, geoscience, particle physics, and social science, and has added scope to fields such as artificial intelligence. The democratization of data and computation has created new ways to conduct research; in particular, large-scale computation allows researchers to do research that was not possible a few decades ago. For example, public health researchers mine large databases and social media, searching for patterns, while earth scientists run massive simulations of complex systems to learn about the past, which can offer insight into possible future events.

Another change in science is an increased pressure to publish new scientific discoveries in prestigious and what some consider high-impact journals, such as Nature and Science. 1 This pressure is felt worldwide, across disciplines, and by researchers at all levels but is perhaps most acute for researchers at the beginning of their scientific careers who are trying to establish a strong scientific record to increase their chances of obtaining tenure at an academic institution and grants for future work. Tenure decisions have traditionally been made on the basis of the scientific record (i.e., published articles of important new results in a field) and have given added weight to publications in more prestigious journals. Competition for federal grants, a large source of academic research funding, is intense as the number of applicants grows at a rate higher than the increase in federal research budgets. These multiple factors create incentives for researchers to overstate the importance of their results and increase the risk of bias—either conscious or unconscious—in data collection, analysis, and reporting.

In the context of these dynamic changes, the questions and issues related to reproducibility and replicability remain central to the development and evolution of science. How should studies and other research approaches be designed to efficiently generate reliable knowledge? How might hypotheses and results be better communicated to allow others to confirm, refute, or build on them? How can the potential biases of scientists themselves be understood, identified, and exposed in order to improve accuracy in the generation and interpretation of research results? How can intentional misrepresentation and fraud be detected and eliminated? 2

Researchers have proposed approaches to answering some of the questions over the past decades. As early as the 1960s, Jacob Cohen surveyed psychology articles from the perspective of statistical power to detect effect sizes, an approach that launched many subsequent power surveys (also known as meta-analyses) in the social sciences in subsequent years ( Cohen, 1988 ).

Researchers in biomedicine have been focused on threats to validity of results since at least the 1970s. In response to the threat, biomedical researchers developed a wide variety of approaches to address the concern, including an emphasis on randomized experiments with masking (also known as blinding), reliance on meta-analytic summaries over individual trial results, proper sizing and power of experiments, and the introduction of trial registration and detailed experimental protocols. Many of the same approaches have been proposed to counter shortcomings in reproducibility and replicability.

Reproducibility and replicability as they relate to data and computation-intensive scientific work received attention as the use of computational tools expanded. In the 1990s, Jon Claerbout launched the “reproducible research movement,” brought on by the growing use of computational workflows for analyzing data across a range of disciplines ( Claerbout and Karrenbach, 1992 ). Minor mistakes in code can lead to serious errors in interpretation and in reported results; Claerbout's proposed solution was to establish an expectation that data and code will be openly shared so that results could be reproduced. The assumption was that reanalysis of the same data using the same methods would produce the same results.

In the 2000s and 2010s, several high-profile journal and general media publications focused on concerns about reproducibility and replicability (see, e.g., Ioannidis, 2005 ; Baker, 2016 ), including the cover story in The Economist ( “How Science Goes Wrong,” 2013 ) noted above. These articles introduced new concerns about the availability of data and code and highlighted problems of publication bias, selective reporting, and misaligned incentives that cause positive results to be favored for publication over negative or nonconfirmatory results. 3 Some news articles focused on issues in biomedical research and clinical trials, which were discussed in the general media partly as a result of lawsuits and settlements over widely used drugs ( Fugh-Berman, 2010 ).

Many publications about reproducibility and replicability have focused on the lack of data, code, and detailed description of methods in individual studies or a set of studies. Several attempts have been made to assess non-reproducibility or non-replicability within a field, particularly in social sciences (e.g., Camerer et al., 2018 ; Open Science Collaboration, 2015 ). In Chapters 4 , 5 , and 6 , we review in more detail the studies, analyses, efforts to improve, and factors that affect the lack of reproducibility and replicability. Before that discussion, we must clearly define these terms.

  • DEFINING REPRODUCIBILITY AND REPLICABILITY

Different scientific disciplines and institutions use the words reproducibility and replicability in inconsistent or even contradictory ways: What one group means by one word, the other group means by the other word. 4 These terms—and others, such as repeatability—have long been used in relation to the general concept of one experiment or study confirming the results of another. Within this general concept, however, no terminologically consistent way of drawing distinctions has emerged; instead, conflicting and inconsistent terms have flourished. The difficulties in assessing reproducibility and replicability are complicated by this absence of standard definitions for these terms.

In some fields, one term has been used to cover all related concepts: for example, “replication” historically covered all concerns in political science ( King, 1995 ). In many settings, the terms reproducible and replicable have distinct meanings, but different communities adopted opposing definitions ( Claerbout and Karrenbach, 1992 ; Peng et al., 2006 ; Association for Computing Machinery, 2018 ). Some have added qualifying terms, such as methods reproducibility, results reproducibility, and inferential reproducibility to the lexicon ( Goodman et al., 2016 ). In particular, tension has emerged between the usage recently adopted in computer science and the way that researchers in other scientific disciplines have described these ideas for years ( Heroux et al., 2018 ).

In the early 1990s, investigators began using the term “reproducible research” for studies that provided a complete digital compendium of data and code to reproduce their analyses, particularly in the processing of seismic wave recordings ( Claerbout and Karrenbach, 1992 ; Buckheit and Donoho, 1995 ). The emphasis was on ensuring that a computational analysis was transparent and documented so that it could be verified by other researchers. While this notion of reproducibility is quite different from situations in which a researcher gathers new data in the hopes of independently verifying previous results or a scientific inference, some scientific fields use the term reproducibility to refer to this practice. Peng et al. (2006 , p. 783) referred to this scenario as “replicability,” noting: “Scientific evidence is strengthened when important results are replicated by multiple independent investigators using independent data, analytical methods, laboratories, and instruments.” Despite efforts to coalesce around the use of these terms, lack of consensus persists across disciplines. The resulting confusion is an obstacle in moving forward to improve reproducibility and replicability ( Barba, 2018 ).

In a review paper on the use of the terms reproducibility and replicability, Barba (2018) outlined three categories of usage, which she characterized as A, B1, and B2:

A: The terms are used with no distinction between them. B1: “Reproducibility” refers to instances in which the original researcher's data and computer codes are used to regenerate the results, while “replicability” refers to instances in which a researcher collects new data to arrive at the same scientific findings as a previous study. B2: “Reproducibility” refers to independent researchers arriving at the same results using their own data and methods, while “replicability” refers to a different team arriving at the same results using the original author's artifacts.

B1 and B2 are in opposition of each other with respect to which term involves reusing the original authors' digital artifacts of research (“research compendium”) and which involves independently created digital artifacts. Barba (2018) collected data on the usage of these terms across a variety of disciplines (see Table 3-1 ). 5

TABLE 3-1. Usage of the Terms Reproducibility and Replicability by Scientific Discipline.

Usage of the Terms Reproducibility and Replicability by Scientific Discipline.

The terminology adopted by the Association for Computing Machinery (ACM) for computer science was published in 2016 as a system for badges attached to articles published by the society. The ACM declared that its definitions were inspired by the metrology vocabulary, and it associated using an original author's digital artifacts to “replicability,” and developing completely new digital artifacts to “reproducibility.” These terminological distinctions contradict the usage in computational science, where reproducibility is associated with transparency and access to the author's digital artifacts, and also with social sciences, economics, clinical studies, and other domains, where replication studies collect new data to verify the original findings.

Regardless of the specific terms used, the underlying concepts have long played essential roles in all scientific disciplines. These concepts are closely connected to the following general questions about scientific results:

  • Are the data and analysis laid out with sufficient transparency and clarity that the results can be checked ?
  • If checked, do the data and analysis offered in support of the result in fact support that result?
  • If the data and analysis are shown to support the original result, can the result reported be found again in the specific study context investigated?
  • Finally, can the result reported or the inference drawn be found again in a broader set of study contexts ?

Computational scientists generally use the term reproducibility to answer just the first question—that is, reproducible research is research that is capable of being checked because the data, code, and methods of analysis are available to other researchers. The term reproducibility can also be used in the context of the second question: research is reproducible if another researcher actually uses the available data and code and obtains the same results. The difference between the first and the second questions is one of action by another researcher; the first refers to the availability of the data, code, and methods of analysis, while the second refers to the act of recomputing the results using the available data, code, and methods of analysis.

In order to answer the first and second questions, a second researcher uses data and code from the first; no new data or code are created by the second researcher. Reproducibility depends only on whether the methods of the computational analysis were transparently and accurately reported and whether that data, code, or other materials were used to reproduce the original results. In contrast, to answer question three, a researcher must redo the study, following the original methods as closely as possible and collecting new data. To answer question four, a researcher could take a variety of paths: choose a new condition of analysis, conduct the same study in a new context, or conduct a new study aimed at the same or similar research question.

For the purposes of this report and with the aim of defining these terms in ways that apply across multiple scientific disciplines, the committee has chosen to draw the distinction between reproducibility and replicability between the second and third questions. Thus, reproducibility includes the act of a second researcher recomputing the original results, and it can be satisfied with the availability of data, code, and methods that makes that recomputation possible. This definition of reproducibility refers to the transparency and reproducibility of computations: that is, it is synonymous with “computational reproducibility,” and we use the terms interchangeably in this report.

When a new study is conducted and new data are collected, aimed at the same or a similar scientific question as a previous one, we define it as a replication. A replication attempt might be conducted by the same investigators in the same lab in order to verify the original result, or it might be conducted by new investigators in a new lab or context, using the same or different methods and conditions of analysis. If this second study, aimed at the same scientific question but collecting new data, finds consistent results or can draw consistent conclusions, the research is replicable. If a second study explores a similar scientific question but in other contexts or populations that differ from the original one and finds consistent results, the research is “generalizable.” 6

In summary, after extensive review of the ways these terms are used by different scientific communities, the committee adopted specific definitions for this report.

CONCLUSION 3-1: For this report, reproducibility is obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis. This definition is synonymous with “computational reproducibility,” and the terms are used interchangeably in this report.

Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

Two studies may be considered to have replicated if they obtain consistent results given the level of uncertainty inherent in the system under study. In studies that measure a physical entity (i.e., a measurand), the results may be the sets of measurements of the same measurand obtained by different laboratories. In studies aimed at detecting an effect of an intentional intervention or a natural event, the results may be the type and size of effects found in different studies aimed at answering the same question. In general, whenever new data are obtained that constitute the results of a study aimed at answering the same scientific question as another study, the degree of consistency of the results from the two studies constitutes their degree of replication.

Two important constraints on the replicability of scientific results rest in limits to the precision of measurement and the potential for altered results due to sometimes subtle variation in the methods and steps performed in a scientific study. We expressly consider both here, as they can each have a profound influence on the replicability of scientific studies.

  • PRECISION OF MEASUREMENT

Virtually all scientific observations involve counts, measurements, or both. Scientific measurements may be of many different kinds: spatial dimensions (e.g., size, distance, and location), time, temperature, brightness, colorimetric properties, electromagnetic properties, electric current, material properties, acidity, and concentration, to name a few from the natural sciences. The social sciences are similarly replete with counts and measures. With each measurement comes a characterization of the margin of doubt, or an assessment of uncertainty ( Possolo and Iyer, 2017 ). Indeed, it may be said that measurement, quantification, and uncertainties are core features of scientific studies.

One mark of progress in science and engineering has been the ability to make increasingly exact measurements on a widening array of objects and phenomena. Many of the things taken for granted in the modern world, from mechanical engines to interchangeable parts to smartphones, are possible only because of advances in the precision of measurement over time ( Winchester, 2018 ).

The concept of precision refers to the degree of closeness in measurements. As the unit used to measure distance, for example, shrinks from meter to centimeter to millimeter and so on down to micron, nanometer, and angstrom, the measurement unit becomes more exact and the proximity of one measurand to a second can be determined more precisely.

Even when scientists believe a quantity of interest is constant, they recognize that repeated measurement of that quantity may vary because of limits in the precision of measurement technology. It is useful to note that precision is different from the accuracy of a measurement system, as shown in Figure 3-1 , demonstrating the differences using an archery target containing three arrows.

Accuracy and precision of a measurement. NOTE: See text for discussion. SOURCE: Chemistry LibreTexts. Available: https://chem.libretexts.org/Bookshelves/Introductory_Chemistry/Book%3A_IntroductoryChemistry_(CK-12)/03%3A_Measurements/3.12%3A_Accuracy_and_Precision. (more...)

In Figure 3-1 , A, the three arrows are in the outer ring, not close together and not close to the bull's eye, illustrating low accuracy and low precision (i.e., the shots have not been accurate and are not highly precise). In B, the arrows are clustered in a tight band in an outer ring, illustrating low accuracy and high precision (i.e., the shots have been more precise, but not accurate). The other two figures similarly illustrate high accuracy and low precision (C) and high accuracy and high precision (D).

It is critical to keep in mind that the accuracy of a measurement can be judged only in relation to a known standard of truth. If the exact location of the bull's eye is unknown, one must not presume that a more precise set of measures is necessarily more accurate; the results may simply be subject to a more consistent bias, moving them in a consistent way in a particular direction and distance from the true target.

It is often useful in science to describe quantitatively the central tendency and degree of dispersion among a set of repeated measurements of the same entity and to compare one set of measurements with a second set. When a set of measurements is repeated by the same operator using the same equipment under constant conditions and close in time, metrologists refer to the proximity of these measurements to one another as measurement repeatability (see Box 3-1 ). When one is interested in comparing the degree to which the set of measurements obtained in one study are consistent with the set of measurements obtained in a second study, the committee characterizes this as a test of replicability because it entails the comparison of two studies aimed at the same scientific question where each obtained its own data.

Terms Used in Metrology and How They Differ from the Committee's Definitions.

Consider, for example, the set of measurements of the physical constant obtained over time by a number of laboratories (see Figure 3-2 ). For each laboratory's results, the figure depicts the mean observation (i.e., the central tendency) and standard error of the mean, indicated by the error bars. The standard error is an indicator of the precision of the obtained measurements, where a smaller standard error represents higher precision. In comparing the measurements obtained by the different laboratories, notice that both the mean values and the degrees of precision (as indicated by the width of the error bars) may differ from one set of measurements to another.

Evolution of scientific understanding of the fine structure constant over time. NOTES: Error bars indicate the experimental uncertainty of each measurement. See text for discussion. SOURCE: Reprinted figure with permission from Peter J. Mohr, David B. (more...)

We may now ask what is a central question for this study: How well does a second set of measurements (or results) replicate a first set of measurements (or results)? Answering this question, we suggest, may involve three components:

proximity of the mean value (central tendency) of the second set relative to the mean value of the first set, measured both in physical units and relative to the standard error of the estimate

similitude in the degree of dispersion in observed values about the mean in the second set relative to the first set

likelihood that the second set of values and the first set of values could have been drawn from the same underlying distribution

Depending on circumstances, one or another of these components could be more salient for a particular purpose. For example, two sets of measures could have means that are very close to one another in physical units, yet each were sufficiently precisely measured as to be very unlikely to be different by chance. A second comparison may find means are further apart, yet derived from more widely dispersed sets of observations, so that there is a higher likelihood that the difference in means could have been observed by chance. In terms of physical proximity, the first comparison is more closely replicated. In terms of the likelihood of being derived from the same underlying distribution, the second set is more highly replicated.

A simple visual inspection of the means and standard errors for measurements obtained by different laboratories may be sufficient for a judgment about their replicability. For example, in Figure 3-2 , it is evident that the bottom two measurement results have relatively tight precision and means that are nearly identical, so it seems reasonable these can be considered to have replicated one another. It is similarly evident that results from LAMPF (second from the top of reported measurements with a mean value and error bars in Figure 3-2 ) are better replicated by results from LNE-01 (fourth from top) than by measurements from NIST-89 (sixth from top). More subtle may be judging the degree of replication when, for example, one set of measurements has a relatively wide range of uncertainty compared to another. In Figure 3-2 , the uncertainty range from NPL-88 (third from top) is relatively wide and includes the mean of NIST-97 (seventh from top); however, the narrower uncertainty range for NIST-97 does not include the mean from NPL-88. Especially in such cases, it is valuable to have a systematic, quantitative indicator of the extent to which one set of measurements may be said to have replicated a second set of measurements, and a consistent means of quantifying the extent of replication can be useful in all cases.

  • VARIATIONS IN METHODS EMPLOYED IN A STUDY

When closely scrutinized, a scientific study or experiment may be seen to entail hundreds or thousands of choices, many of which are barely conscious or taken for granted. In the laboratory, exactly what size of Erlenmeyer flask is used to mix a set of reagents? At what exact temperature were the reagents stored? Was a drying agent such as acetone used on the glassware? Which agent and in what amount and exact concentration? Within what tolerance of error are the ingredients measured? When ingredient A was combined with ingredient B, was the flask shaken or stirred? How vigorously and for how long? What manufacturer of porcelain filter was used? If conducting a field survey, how exactly, were the subjects selected? Are the interviews conducted by computer or over the phone or in person? Are the interviews conducted by female or male, young or old, the same or different race as the interviewee? What is the exact wording of a question? If spoken, with what inflection? What is the exact sequence of questions? Without belaboring the point, we can say that many of the exact methods employed in a scientific study may or may not be described in the methods section of a publication. An investigator may or may not realize when a possible variation could be consequential to the replicability of results.

In a later section, we will deal more generally with sources of non-replicability in science (see Chapter 5 and Box 5-2 ). Here, we wish to emphasize that countless subtle variations in the methods, techniques, sequences, procedures, and tools employed in a study may contribute in unexpected ways to differences in the obtained results (see Box 3-2 ).

Data Collection, Cleaning, and Curation.

Finally, note that a single scientific study may entail elements of the several concepts introduced and defined in this chapter, including computational reproducibility, precision in measurement, replicability, and generalizability or any combination of these. For example, a large epidemiological survey of air pollution may entail portable, personal devices to measure various concentrations in the air (subject to precision of measurement), very large datasets to analyze (subject to computational reproducibility), and a large number of choices in research design, methods, and study population (subject to replicability and generalizability).

  • RIGOR AND TRANSPARENCY

The committee was asked to “make recommendations for improving rigor and transparency in scientific and engineering research” (refer to Box 1-1 in Chapter 1 ). In response to this part of our charge, we briefly discuss the meanings of rigor and of transparency below and relate them to our topic of reproducibility and replicability.

Rigor is defined as “the strict application of the scientific method to ensure robust and unbiased experimental design” ( National Institutes of Health, 2018e ). Rigor does not guarantee that a study will be replicated, but conducting a study with rigor—with a well-thought-out plan and strict adherence to methodological best practices—makes it more likely. One of the assumptions of the scientific process is that rigorously conducted studies “and accurate reporting of the results will enable the soundest decisions” and that a series of rigorous studies aimed at the same research question “will offer successively ever-better approximations to the truth” ( Wood et al., 2019 , p. 311). Practices that indicate a lack of rigor, including poor study design, errors or sloppiness, and poor analysis and reporting, contribute to avoidable sources of non-replicability (see Chapter 5 ). Rigor affects both reproducibility and replicability.

Transparency has a long tradition in science. Since the advent of scientific reports and technical conferences, scientists have shared details about their research, including study design, materials used, details of the system under study, operationalization of variables, measurement techniques, uncertainties in measurement in the system under study, and how data were collected and analyzed. A transparent scientific report makes clear whether the study was exploratory or confirmatory, shares information about what measurements were collected and how the data were prepared, which analyses were planned and which were not, and communicates the level of uncertainty in the result (e.g., through an error bar, sensitivity analysis, or p- value). Only by sharing all this information might it be possible for other researchers to confirm and check the correctness of the computations, attempt to replicate the study, and understand the full context of how to interpret the results. Transparency of data, code, and computational methods is directly linked to reproducibility, and it also applies to replicability. The clarity, accuracy, specificity, and completeness in the description of study methods directly affects replicability.

FINDING 3-1: In general, when a researcher transparently reports a study and makes available the underlying digital artifacts, such as data and code, the results should be computationally reproducible. In contrast, even when a study was rigorously conducted according to best practices, correctly analyzed, and transparently reported, it may fail to be replicated.

“High-impact” journals are viewed by some as those which possess high scores according to one of the several journal impact indicators such as Citescore, Scimago Journal Ranking (SJR), Source Normalized Impact per Paper (SNIP)—which are available in Scopus—and Journal Impact Factor (IF), Eigenfactor (EF), and Article Influence Score (AIC)—which can be obtained from the Journal Citation Report (JCR).

See Chapter 5 , Fraud and Misconduct, which further discusses the association between misconduct as a source of non-replicability, its frequency, and reporting by the media.

One such outcome became known as the “file drawer problem”: see Chapter 5 ; also see Rosenthal (1979) .

For the negative case, both “non-reproducible” and “irreproducible” are used in scientific work and are synonymous.

See also Heroux et al. (2018) for a discussion of the competing taxonomies between computational sciences (B1) and new definitions adopted in computer science (B2) and proposals for resolving the differences.

The committee definitions of reproducibility, replicability, and generalizability are consistent with the National Science Foundation's Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science ( Bollen et al., 2015 ).

  • Cite this Page National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Committee on Science, Engineering, Medicine, and Public Policy; Board on Research Data and Information; Division on Engineering and Physical Sciences; Committee on Applied and Theoretical Statistics; Board on Mathematical Sciences and Analytics; Division on Earth and Life Studies; Nuclear and Radiation Studies Board; Division of Behavioral and Social Sciences and Education; Committee on National Statistics; Board on Behavioral, Cognitive, and Sensory Sciences; Committee on Reproducibility and Replicability in Science. Reproducibility and Replicability in Science. Washington (DC): National Academies Press (US); 2019 May 7. 3, Understanding Reproducibility and Replicability.
  • PDF version of this title (2.9M)

In this Page

Recent activity.

  • Understanding Reproducibility and Replicability - Reproducibility and Replicabil... Understanding Reproducibility and Replicability - Reproducibility and Replicability in Science

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

Connect with NLM

National Library of Medicine 8600 Rockville Pike Bethesda, MD 20894

Web Policies FOIA HHS Vulnerability Disclosure

Help Accessibility Careers

statistics

  • Open access
  • Published: 01 June 2016

Psychology, replication & beyond

  • Keith R. Laws 1  

BMC Psychology volume  4 , Article number:  30 ( 2016 ) Cite this article

15k Accesses

18 Citations

43 Altmetric

Metrics details

Modern psychology is apparently in crisis and the prevailing view is that this partly reflects an inability to replicate past findings. If a crisis does exists, then it is some kind of ‘chronic’ crisis, as psychologists have been censuring themselves over replicability for decades. While the debate in psychology is not new, the lack of progress across the decades is disappointing. Recently though, we have seen a veritable surfeit of debate alongside multiple orchestrated and well-publicised replication initiatives. The spotlight is being shone on certain areas and although not everyone agrees on how we should interpret the outcomes, the debate is happening and impassioned. The issue of reproducibility occupies a central place in our whig history of psychology.

In the parlance of Karl Popper, the notion of falsification is seductive – some seem to imagine that it identifies an act as opposed to a process . It often carries the misleading implication that hypotheses can be readily discarded in the face of something called a ‘failed’ replication. Popper [ 46 ] was quite transparent when he declared “… a few stray basic statements contradicting a theory will hardly induce us to reject it as falsified. We shall take it as falsified only if we discover a reproducible effect which refutes the theory . In other words, we only accept the falsification if a low level empirical hypothesis which describes such an effect is proposed and corroborated.” (p.203: my italics). Popper’s view might reassure those whose psychological models have recently come under scrutiny through replication initiatives. We cannot, nor should we, close the door on a hypothesis because a study fails to be replicated. The hypothesis is not nullified and ‘nay-saying’ alone is an insufficient response from scientists. Like Popper, we might expect a testable alternative hypothesis that attempts to account for the discrepancy across studies; and one that itself may be subject to testing rather than merely being ad hoc . In other words, a ‘failed’ replication is not, in itself, the answer to a question, but a further question.

Replication, replication, replication

At least two key types of replication exist: direct and conceptual. Conceptual replication generally refers to cases where researchers ‘tweak’ the methods of previous studies [ 43 ] and when successful, may be informative with regard to the boundaries and possible moderators of an effect. When a conceptual replication fails, however, fewer clear implications exist for the original study because of likely differences in procedure or stimuli and so on. For this reason, we have seen an increased weight given to direct replications.

How often do direct and conceptual replications occur in psychology? Screening 100 of the most-cited psychology journals since 1900, Makel, Plucker & Hegarty [ 40 ] found that approximately 1.6 % of all psychology articles used the term replication in the text. A further more detailed analysis of 500 randomly selected articles revealed that only 68 % using the term replication were actual replications. They calculated an overall replication rate of 1.07 % and Makel et al. [ 40 ] found that only 18 % of those were direct rather than conceptual replications.

The lack of replication in psychology is systemic and widespread, and particularly the bias against publishing direct replications. In their survey of social science journal editors , Neuliep & Crandall [ 42 ] found almost three quarters preferred to publish novel findings rather than replications. In a parallel survey of reviewers for social science journals, Neuliep & Crandall [ 43 ] found over half (54 %) stated a preference for new findings over replications. Indeed, reviewers stated that replications were “Not newsworthy” or even a “Waste of space”. By contrast, comments from natural science journal editors present a more varied picture, with comments ranging from “Replication without some novelty is not accepted” to “Replication is rarely an issue for us…since we publish them.” [ 39 ].

Despite an enduring historical abandonment of replication, the tide appears to be turning. Makel et al. [ 40 ] found that the replication rate after the year 2000 was 1.84 times higher than for the period between 1950 and 1999. In a more recent evolution, several large-scale direct replication projects have emerged during the past 2 years including: the Many Labs project [ 33 ]; a set of preregistered replications published in a special issue of Social Psychology (Edited by [ 44 ]); the Reproducibility Project of the Open Science Collaboration [ 45 ]; and the Pipeline Project by Schweinsberg et al. [ 50 ]. In two of these projects (Many Labs by [ 33 ]; Pipeline Project by [ 50 ]), a group of researchers replicated samples of studies, with each group replicating all studies. In the two remaining projects, a number of research groups each replicated one study, selected from a sample of studies (Registered Reports by [ 44 ]; Open Science Collaboration, [ 45 ]). Each project ensured that replications were sufficiently powered (typically in excess of 90 % -thus offering a very good probability of detecting true effects) and where possible, used the original materials and stimuli as provided by the original authors. It is worth considering each in more detail.

Many Labs involved 36 research groups across 12 countries who replicated 13 psychological studies in over 6,000 participants. Studies of classic and newer effects were selected partly because they had simple designs that could be adapted for online administration. Reassuringly perhaps, 10 of the 13 effects replicated consistently across 36 different samples with, of course, some variability in the effect size reported compared to the original studies – some smaller but also some larger. One effect received weak support. Only two studies consistently failed to replicate and both involved what are described as ‘ social priming’ phenomena. One study where ‘accidental’ exposure to a US flag resulted in increased conservatism amongst Americans [ 11 ]. Participants viewed four photos and were asked to just estimate the time-of-day in the photo – the US flag appeared in two photos. Following this, they completed an 8-item questionnaire assessing their views toward various political issues (e.g., abortion, gun control). In the second priming study, exposure to ‘money’ had resulted in endorsement of the current social system [ 12 ]. In this study, participants completed demographic questions against a background that showed a faint picture of US $100 bills or the same background but blurred. Each of these two priming experiments had a single significant p -value (out of 36 replications) and for flag priming, it was in the opposite direction to that expected.

Turning to the special issue of Social Psychology edited by Nosek & Lakens [ 44 ]. This contained a series of articles replicating important results in social psychology. Important was broadly defined as “…often cited, a topic of intense scholarly or public interest, a challenge to established theories), but should also have uncertain truth value (e.g., few confirmations, imprecise estimates of effect sizes).” One might euphemistically describe the studies as curios . The articles were first submitted as Registered Reports and reviewed prior to data collection, with authors being assured their findings would be published regardless of outcome, as long as they adhered to the registered protocol. Attempted replications included the “Romeo and Juliet effect” – does parental interference lead to increases in love and commitment (Original: [ 17 ]; Replication: Sinclair, Hood, & Wright, [ 53 ]), does experiencing physical warmth (warm therapeutic packs) increase judgments of interpersonal warmth (Original: [ 58 ]; Replication: Lynott, Corker, Wortman, Connell, Donnellan, Lucas, & O’Brien, [ 38 ]), does recalling unethical behavior lead participants to see the room as darker (Original: [ 3 ]; Replication: [ 10 ]); does physical cleanliness reduce the severity of moral judgments (original : [ 49 ]: [ 28 ]). In contrast to high replication rate of Many Labs , the Registered Reports replications failed to confirm the results in 10 of 13 studies.

In the largest crowdsourced effort to date, the OSC Reproducibility project involved 270 collaborators attempting to replicate 100 findings from 3 major psychology journals Psychological Science (PSCI), Journal of Personality and Social Psychology (JPSP), and Journal of Experimental Psychology: Learning, Memory, and Cognition (JEP: LMC). While 97 of 100 studies originally reported statistically significant results, only 36 % of the replications did so with a mean effect size of around half of that reported in the original studies.

All of the journals exhibited a large reduction of around 50 % in effect sizes, with replications from JPSP particularly affected - shrinking by 75 % from 0.29 to 0.07. The replicability in one domain of psychology (good or poor) in no way guarantees what will happen in another domain. One thing we know from this project, is that “…reproducibility was stronger in studies and journals representing cognitive psychology than social psychology topics. For example, combining across journals, 14 of 55 (25 %) of social psychology effects replicated by the P  < 0.05 criterion, whereas 21 of 42 (50 %) of cognitive psychology effects did so.” The reasons for such a difference are debatable, but provide no licence to either congratulate cognitive psychologists or berate social psychologists. Indeed, the authors paint a considered and faithful picture of what their findings mean when they conclude “…how many of the effects have we established are true? Zero. And how many of the effects have we established are false? Zero. Is this a limitation of the project design? No. It is the reality of doing science”. (Open Science Collaboration p.4716-7)

The studies that were not selected for replication are informative – they were described as “…deemed infeasible to replicate because of time, resources, instrumentation, dependence on historical events, or hard-to-access samples… [and some] required specialized samples (such as macaques or people with autism), resources (such as eye tracking machines or functional magnetic resonance imaging), or knowledge making them difficult to match with teams”. Thus, the main drivers of replication are often economic in terms of time, money and human investment. High cost studies are likely to remain castles in the air, leaving us with little insight about replicability rates in some areas such as functional imaging (e.g. [ 9 ]), clinical and health psychology (see Coyne, this issue), and neuropsychology.

The ‘ Pipeline project’ by Schweinsberg et al. [ 50 ] intentionally used a non-adversarial approach. They crowdsourced 25 research teams across various countries to replicate a series of 10 unpublished moral-judgment experiments from the lead author’s (Uhlmann) lab i.e., in the pipeline. This speaks directly to Lykken’s [ 37 ] proposal from nearly 50 years ago that “…ideally all experiments would be replicated before publication” although at that time, he deemed it ‘impractical’.

Pipeline replications included: the Bigot–misanthrope effect – whether participants judge a manager who selectively mistreats racial minorities as a more blameworthy person than a manager who mistreats all of his employees; Bad tipper effect - are people who leave a full tip, but entirely in pennies judged more negatively than someone who leaves less money, but in notes; the Burn-in-hell effect – do people perceive corporate executives as more likely to burn in hell than members of social categories defined by antisocial behaviour, such as vandals. Six of ten findings replicated across all of their replication criteria, one further finding replicated but with a significantly smaller effect size than the original, one finding replicated consistently in the original culture but not outside of it ( bad tipper replicated in US and not outside), and two findings effects were unsupported.

The headline replication rates differed considerably across projects – occurring more frequently for Many Labs (77 %) and the Pipeline Project (60 %) than Registered Reports (30 %) and the Open Science Collaboration (36 %). Why are replication rates lower in the latter two projects? Possible explanations include the choice of likely versus unlikely replication candidates. Amongst the Many Labs studies, some had already previously been replicated and were selected knowing this fact. By contrast, the studies in the Pipeline project had not been previously replicated (indeed, not even previously published). Also important from a different perspective is whether each study was replicated only once by one group or multiple times by many groups.

In the Many Labs and Pipeline projects, 36 and 25 separate research groups were replicating each of 13 and 10 studies respectively. Multiple analyses lend themselves to meta-analytic techniques and analysis of the heterogeneity across research groups examining the same effect – the extent to which they accord in their effect sizes or not. The Many Labs project reported I2 values, which estimate the proportion of variation due to heterogeneity rather than chance. In the majority of cases, heterogeneity was small to moderate or even non-existent (e.g. across the 36 replications for both of the social priming studies: flag and money). Indeed, heterogeneity of effect sizes was greater between studies than within studies. When heterogeneity was greater, it was - perhaps surprisingly - where mean effect sizes were largest. Nonetheless, Many Labs reassuringly shows that some effects are highly replicable across research groups, countries, presentational differences (online versus face to face).

Counter-intuitive and even fanciful psychological hypotheses are not necessarily more likely to be false, but believing them to be so may influence researchers– even implicitly – in terms of how replications are conducted. In their extensive literature search, Makel et al. [ 40 ] reported that most direct replications are conducted by authors who proposed the original findings. This raises the thorny question of who should replicate? Almost 50 years ago Bakan [ 2 ] sagely warned that “If an investigator attempts to replicate his own investigation at another time, he will inevitably be under the influence of what he has already done…He should challenge, for example, his personal identification with the results he has already obtained, and prepare himself for finding both novelty and contradiction with respect to his earlier investigation” and that “…If one investigator is interested in replicating the investigation of another investigator, he should carefully take into account the possibility of suggestion, or his willingness to accept the results of the earlier investigator. …He should take careful cognizance of possible motivation for showing the earlier investigator to be in error, etc. [p. 110].” The irony is that as psychologists, we should be acutely aware of such biases - we cannot ignore the psychology of replication in the replication of psychology.

What are we replicating and why?

The cheap and easy.

Few areas of psychology have fallen under the replication lens and where they have, they are psychology’s equivalent to take-away meals – easy to prepare studies (e.g. often using online links to questionnaires). Hence, the focus has tended to be on studies from social and cognitive psychology, and not for example developmental or clinical studies, which are more prohibitive. Other notable examples exist such as cognitive neuropsychology, where the single case study has been predominant for decades – how can anyone recreate the brain injury and subsequent cognitive testing in a second patient?

The contentious

We cannot assert that the totality– or even a representative sample - of psychology has been scrutinised for replication. We can also see why some may feel targeted – replication does not (and probably cannot) occur in a random fashion. The vast majority of psychological studies are overlooked. To date, psychologists have targeted the unexpected, the curious, and newsworthy findings; and largely within a narrow range of areas (cognitive and social primarily). As psychologists, the need to sample more widely ought to go without saying; and one corollary of this, is that it makes no sense to claim that psychology is in crisis.

Too often perhaps, psychologists have been attracted to replicating contentious topics such as social priming, ego-depletion, psychic ability and so on. Some high impact journals have become repositories for the attention-grabbing, strange, unexpected and unbelievable findings. This goes to the systemic heart of the matter. Hartshorne & Schachner [ 27 ] amongst many others have noted “…replicability is not systematically considered in measuring paper, researcher, and journal quality. As a result, the current incentive structure rewards the publication of non-replicable findings …” (p.3 my italics). This is nothing new in science, as the quest for scientific prestige has historically resulted in a conflict between the goals of science and the personal goals of the scientist (see [ 47 ]).

The preposterous

“If there is no ESP, then we want to be able to carry out null experiments and get no effect, otherwise we cannot put much belief in work on small effects in non-ESP situations. If there is ESP, that is exciting. However, thus far it does not look as if it will replace the telephone” (Mosteller [ 41 ], p 396)

From the opposite perspective, Jim Coyne (this issue) maintains that psychology would benefit from some “…provision for screening out candidates for replication for which a consensus could be reached that the research hypotheses were improbable and not warranting the effort and resources required for a replication to establish this.” The frustration of some psychologists is palpable as they peruse apparently improbable hypotheses. Coyne’s concern echoes that of Edwards [ 18 ] who half a century ago similarly remarked, “If a hypothesis is preposterous to start with, no amount of bias against it can be too great. On the other hand, if it is preposterous to start with, why test it ?” Edwards (p 402). How preposterous can we get? According to Simmons et al. [ 51 ], it is “…unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. (p. 1359). Indeed, they managed to show by manipulating what they describe as researcher degrees of freedom (e.g. ‘data-peeking’, deciding when to stop testing participants, whether to exclude outlying data points) , that people appear to forget their age and claim to be 1.5 years younger after listening to the Beatles song “When I’m 64”.

The fact that seemingly incredible findings can be published raises disquiet about the methods normally employed by psychologists and in some circles, this has inflated to concerns about psychology more generally. Within the methodological and statistical frameworks that psychologists normally operate, we have to face the unpalatable possibility that the wriggle room for researchers is – unacceptably large. Further, it is implicitly reinforced, as Coyne notes, by the actions of some journals as well as media outlets– and until that is adequately addressed, little will change.

The negative

Interestingly, the four replication projects outlined above almost wholly neglected null findings. To date, replication efforts are invariably aimed at positive findings. Should we not also try to replicate null findings? Given the propensity for positive findings to become nulls , what is the likelihood of reverse effects in more adequately powered studies? The emphasis on replicating positive outcomes betrays the wider bias that psychologist have against null findings per se (Laws [ 36 ]). The overwhelming majority of published findings in psychology are positive (93.5 %: [ 54 ]) and the aversion to null findings may well be worse in psychology than other sciences [ 20 ]. Intriguingly, we can see a hint of this issue inthe OSC reproducibility project, which did include 3 %of sampled findings that were null initially - and whiletwo were confirmed as nulls, one did indeed become significant.As psychologists, we might ponder how the biasagainst publishing null findings finds a clear echo in the bias against replicating null findings.

A conflict between belief and evidence

The wriggle room is fertile ground for psychologists to exploit the disjunction between belief and evidence that seems quite pervasive in psychology. As remarked upon by Francis “Contrary to its central role in other sciences,it appears that successful replication is sometimes not related to belief about an effect in experimental psychology. A high rate of successful replication is not sufficient to induce belief in an effect [ 8 ] , nor is a high rate of successful replication necessary for belief [ 22 ].” The Bem [ 8 ] study documented “experimental evidence for anomalous retroactive influences on cognition and affect” or in plain language…precognition. Using multiple tasks, and nine experiments involving over 1,000 participants, Bem had implausibly demonstrated that the performance of participants reflected what happened after they had made their decision. For example, on a memory test, participants were more likely to remember words that they were later asked to practise i.e. memory rehearsal seemingly worked back in time. In another task, participants had to select which of two curtains on a computer screen hid an erotic image, and they did so at a level significantly greater than chance, but not when the hidden images were less titillating. Furthermore, Bem and colleagues [ 7 ] later meta-analysed 90 previous studies to establish a significant effect size of 0.22.

Bem presents nine replications of a phenomenon and a large meta-analysis, yet we do not believe it, while other phenomena do not so readily replicate (e.g. bystander apathy [ 22 ]) but we do believe in them. Francis [ 23 ] bleakly concludes “ The scientific method is supposed to be able to reveal truths about the world, and the reliability of empirical findings is supposed to be the final arbiter of science; but this method does not seem to work in experimental psychology as it is currently practiced .” Whether we believe in Bem’s precognition, social priming, or indeed, any published psychological finding – researchers are operating within the methodological and statistical wriggle room . The task for psychologists is to view these phenomena like any other scientific question i.e. in need of explanation. If they can close-down the wriggle room, then we might expect such curios and anomalies to evaporate in a cloud of nonsignificant results.

While some might view the disjunction between belief and evidence as ‘healthy skepticism’, others might also describe it as resistance to evidence or even anti-science. A pertinent example comes from Lykken [ 37 ] who described a study in which people who see frogs in a Rorschach test – ‘frog responders’ – were more likely to have an eating disorder [ 48 ] – a finding interpreted as evidence of harboring oral impregnation fantasies and an unconscious belief in anal birth. Lykken asked 20 clinician colleagues to estimate the likelihood of this ‘cloacal theory of birth’ before and after seeing Sapolsky’s evidence. Beforehand, they reported a “…median value of 0.01, which can be interpreted to mean, roughly, ‘I don't believe it’ ” and after being shown the confirmatory evidence “… the median unchanged at 0.01. I interpret this consensus to mean, roughly, ‘I still don’t believe it.’” (p. 151–152) . Lykken remarked that normally when a prediction is confirmed by experiment, we might expect “…a nontrivial increment in one’s confidence in that theory should result, especially when one’s prior confidence is low… [but that] this rule is wrong not only in a few exceptional instances but as it is routinely applied to the majority of experimental reports in the psychological literature” p.152 . Often such claims give rise to a version of Feynman’s maxim that “Extraordinary claims require extraordinary evidence”. The remarkableness of a claim, however, is not necessarily relevant to either the type or the scale of evidence required. Instead of setting different criteria for the ordinary and extraordinary, we need to continue to close the wriggle room .

Beliefs and the failure to self-correct

“Scientists should not be in the business of simply ignoring literature that they do not like because it contests their view.” [ 30 ]

Taking this to the opposite extreme, some researchers may choose to ignore the findings of meta-analyses at the expense of selected individual studies that accord more with their view. Giner-Sorolla [ 24 ] maintained that “…meta-analytic validation is not seen as necessary to proclaim an effect reliable. Textbooks, press reports, and narrative reviews often rest conclusions on single influential articles rather than insisting on a replication across independent labs and multiple contexts ” (p 564, my italics).

Stoebe & Strack rightly point-out, “Even multiple failures to replicate an established finding would not result in a rejection of the original hypothesis, if there are also multiple studies that supported that hypothesis.” [and] ‘believers’ “…will keep on believing, pointing at the successful replications and derogating the unsuccessful ones, whereas the nonbelievers will maintain their belief system drawing on the failed replications for support of their rejection of the original hypothesis.” (p.64). Psychology rarely – if ever- proceeds with an unequivocal knock-out blow delivered by a negative finding or even a meta-analysis. Indeed, psychology often has more of the feel of trench warfare, where models and hypotheses are ultimately abandoned largely because researchers lose interest [ 26 ].

Jussim et al. [ 30 ] provide some interesting examples of precisely how social psychology doesn’t seem to correct itself when big findings fail to replicate. If doubts are raised about an original finding then as Jussim et al point out, we might expect citations to reflect this debate, the uncertainly and as such the original and the unsuccessful replications would be expected to be fairly equally cited.

In a classic study, Darley & Gross [ 15 ] found people applied a stereotype about social class when they saw a young girl taking a maths test either after seeing her playing in an affluent or poor background. After obtaining the original materials and following the procedure carefully, Baron et al. [ 6 ] published two failed replications using more than twice as many participants. Not only did they fail to replicate, the evidence was in the opposite direction. Such findings ought to encourage debate with relatively equal attention to the pro and con studies in the literature - alas no. Jussim et al. reported that “…since 1996, the original study has been cited 852 times, while the failed replications have been cited just 38 times (according to Google Scholar searches conducted on 9/11/15).”

This is not an unusual case, as Jussim et al. report several examples of failed replications not being cited, while original studies continue to be readily cited. The infamous and seminal study by Bargh and colleagues [ 5 ] showed that unconsciously priming people with an ‘elderly stereotype’ (unscrambling jumbled sentences that contained words like: old, lonely, bingo, wrinkle ) makes them subsequently walk more slowly. However, Doyen et al. [ 16 ] failed to replicate the finding using more accurate measures of walking speed. Since 2013, Bargh et al. has been cited 900 times and Doyen et al. 192. Or a meta-analysis of 88 studies by Jost et al. [ 29 ] showing that conservativism is a syndrome characterized by rigidity, dogmatism, prejudice, and fear, not replicated by a larger better controlled meta-analysis conducted by Van Hiel and colleagues [ 57 ]. Since 2010, the former has been cited 1030 times while the latter a mere 60 by comparison. Jussim et al. suggest “This pattern of ignoring correctives likely leads social psychology to overstate the extent to which evidence supports the original study’s conclusions…[] it behooves researchers to grapple with the full literature, not just the studies conducive to their preferred arguments”.

Meta-analysis: rescue remedy or statistical alchemy?

Some view meta-analysis as the closest thing we have to a definitive approach for establishing the veracity and reliability of an effect. In the context of discussing social priming experiments, John Bargh [ 4 ] declared that “… In science the way to answer questions about replicability of effects is through statistical techniques such as meta-analysis ”. Others are more skeptical: “Meta-analysis is a reasonable way to search for patterns in previously published research. It has serious limitations, however, as a method for confirming hypotheses and for establishing the replicability of experiments” (p. 486 Hyman, 2010). Meta-analysis is not a magic dust that we can sprinkle over primary literatures to elucidate necessary truths. Likewise totemically accumulating replicated findings, in itself, does not necessarily prove anything (pace Popper). Does it matter if we replicate a finding once, twice, or 20 times, what ratio of positive to negative outcomes do we find acceptable? Answers or rules of thumb do not exist – it often comes down to our beliefs in psychology.

This special issue of BMC Psychology contains 4 articles (Taylor & Munafo, [ 56 ]; Lakens, Hilgaard & Staaks [ 34 ]; Coppens, Verkoeijen, Bouwmeester & Rikers, [ 13 ]; Coyne [ 14 ]) and in each, meta-analysis occupies a pivotal place. As shown by Taylor & Munafo (current issue), meta analyses have proliferated, are highly cited and “…most worryingly, the perceived authority of the conclusions of a meta-analysis means that it has become possible to use a meta-analysis in the hope of having the final word in an academic debate.” As with all methods, meta-analysis has its own limitations and retrospective validation via meta-analysis is not a substitute for prospective replication using adequately powered trials, but they do have substantive role to play in the reproducibility question.

Judging the weight of evidence is never straightforward and whether a finding sustains in psychology often reflect our beliefs almost as much as the evidence. Indeed, meta-analysis rightly or wrongly enables some ideas to persist despite a lack of support at the level of individual study or trial. This has certainly been argued in the use of meta-analyses to establish a case for psychic abilities, where Storm, Tressoldi & Di Risio [ 55 ] identify how “It distorts what scientists mean by confirmatory evidence. It confuses retrospective sanctification with prospective replicability.” (p.489)

This is a kind of free-lunch’ notion of meta-analysis. Feinstein [ 21 ] even stated that “ meta-analysis is analogous to statistical alchemy for the 21st century … the main appeal is that it can convert existing things into something better. “Significance” can be attained statistically when small group sizes are pooled into big ones” (p. 71). Undoubtedly, the conclusions of meta-analyses may prove unreliable where small numbers of nonsignificant trials are pooled to produce significant effects [ 19 ]. Nonetheless, it is also quite feasible for a majority of negative outcomes in a literature and still produce a reliable overall significant effect size (e.g. streptokinase: [ 35 ]).

Two of the papers presented here (Lakens et al. this issue; Taylor & Munafo this issue) offer extremely good suggestions relating to some of these conflicts in meta-analytic findings. Lakens and colleagues offer 6 recommendations, including permitting others to “re-analyze the data to examine how sensitive the results are to subjective choices such as inclusion criteria” and enabling this by providing links to data files that permit such analysis. Currently, we also need to address data sharing in regular papers. Sampling papers published in one year in the top 50 high-impact journals, Alsheikh-Ali et al. [ 1 ] reported that a substantial proportion of papers published in high-impact journals “…are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals”. Such efforts for transparency are extremely welcome and indeed, echo the posting online of our interactive CBT for schizophrenia meta-analysis database ( http://www.cbtinschizophrenia.com/ ), which has been used by others to test new hypotheses (e.g. [ 25 ]).

Taylor & Munafo (this issue) advise greater triangulation of evidence and in this particular instance, supplementing traditional meta-analysis and P-curve analysis [ 52 ]. In passing, Taylor & Munafo also mention “…adversarial collaboration, where primary study authors on both sides of a particular debate contribute to an agreed protocol and work together to interpret the results”. The proposed version of adversarial collaboration proposed by Kahneman [ 31 ] urged scientists to engage in a “good-faith effort to conduct debates by carrying out joint research” (p. 729). More recently, he elaborated on this in the context of the furore over failed replications (Kahneman [ 32 ]). Coyne covers some aspects of this latest paper on replication etiquette and finds some of it wanting. It may however be possible to find some new adversarial middle ground, but it crucially depends upon psychologists being more open. Indeed, some aspects of adversarial collaboration could dovetail with Lakens et als’ proposal regarding hosting relevant data on web platforms. In such a scenario, opposing views could test their hypotheses in a public arena using a shared database.

In the context of adversarial collaboration, some uncertainty and difference of opinion exists about how we might accommodate the views of those being replicated. One possibility again requires openness and that is for those who are replicated to be asked to submit a review; and crucially, the review and replicator’s responses are then published alongside the paper. Indeed, this happened with the paper of Coppens et al. (this issue). They replicated the ‘testing effect’ reported by Carpenter (2009) – that information which has been retrieved from memory is better recalled than that which has simply been studied. Their replications and meta-analysis partially replicate the original findings, and Carpenter was one of the reviewers whose review is available alongside the paper (along with the author responses). Indeed, from its initiation, BMC Psychology has published all reviews and responses to reviewers alongside published papers. This degree of openness is unusual in psychology journals, but does offer readers a glimpse into the process behind a replication (or any paper), allows the person being replicated to contribute and comment on the replication, to reply and be published in the same journal at the same time.

Ultimately, the issues that psychologists face over replication are as much about our beliefs, biases and openness as anything else. We are not dispassionate about the outcomes that we measure. Maybe because the substance of our spotlight is people, cognition and brains, we sometimes care too much about the ‘truths’ we choose to declare. They have implications. Similarly, we should not ignore the incentive structures and conflicts between the personal goals of psychologists and the goals of science. They have implications. Finally, the attitudes of psychologists to the transparency of our science needs to change. They have implications.

Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JP. Public availability of published research data in high-impact journals. PLoS One. 2011;6(9):e24357.

Article   PubMed   PubMed Central   Google Scholar  

Bakan D. On method. San Francisco: Jossey-Bass; 1967.

Google Scholar  

Banerjee P, Chatterjee P, & Sinha J. Is it light or dark? Recalling moral behavior changes perception of brightness. Psychol Sci 2012. 0956797611432497.

Bargh JA. Priming effects replicate just fine, thanks. Psychology Today 2012. Retrieved from https://www.psychologytoday.com/blog/the-natural-unconscious/201205/priming-effects-replicate-just-fine-thanks

Bargh JA, Chen M, Burrows L. Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. J Pers Soc Psychol. 1996;71(2):230.

Article   PubMed   Google Scholar  

Baron RM, Albright L, Malloy TE. Effects of behavioral and social class information on social judgment. Pers Soc Psychol Bull. 1995;21(4):308–15.

Article   Google Scholar  

Bem D, Tressoldi P, Rabeyron T, Duggan M. Feeling the future: A meta-analysis of 90 experiments on the anomalous anticipation of random future events. F1000Research. 2015;4:1188.

Bem DJ. Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. J Pers Soc Psychol. 2011;100:407–25.

Bennett CM, Miller MB. How reliable are the results from functional magnetic resonance imaging? Ann N Y Acad Sci. 2010;1191(1):133–55.

Brandt MJ, IJzerman H, Blanken I. Does recalling moral behavior change the perception of brightness? A replication and meta-analysis of Banerjee, Chatterjee, and Sinha (2012). Soc Psychol. 2014;45:246–252.

Carter TJ, Ferguson MJ, Hassin RR. A single exposure to the American flag shifts support toward Republicanism up to 8 months later. Psychol Sci. 2011;22:1011–8.

Caruso EM, Vohs KD, Baxter B, Waytz A. Mere exposure to money increases endorsement of freemarket systems and social inequality. J Exp Psychol Gen. 2013;142:301–6.

Coppens LC, Verkoeijen PJL, Bouwmeester S & Rikers RMJP (in press, this issue) The testing effect for mediator final test cues and related final 4 test cues in online and 5 laboratory experiments. BMC Psychology.

Coyne JC (in press, this issue) Replication initiatives will not salvage the trustworthiness of psychology. BMC Psychology.

Darley JM, Gross PH. A hypothesis-confirming bias in labeling effects. J Pers Soc Psychol. 1983;44(1):20.

Doyen S, Klein O, Pichon C-L, Cleeremans A. Behavioral priming: It’s all in the mind but whose mind? PLoS One. 2012;7:1–7. doi: 10.1371/journal.pone.0029081 .

Driscoll R, Davis KE, Lipetz ME. Parental interference and romantic love: the Romeo and Juliet effect. J Pers Soc Psychol. 1972;24(1):1.

Edwards W. Tactical note on the relation between scientific and statistical hypotheses. Psychological Bulletin. 1965;63:400–402.

Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315:629–34.

Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2012;90:891–904.

Feinstein AR. Meta-analysis: Statistical alchemy for the 21st century. J Clin Epidemiol. 1995;48:71–9.

Fischer P, Krueger JI, Greitemeyer T, Vogrincic C, Kastenmüller A, Frey D, Heene M, Wicher M, & Kainbacher M The bystander-effect: A meta-analytic review on bystander intervention in dangerous and non-dangerous emergencies. Psychol Bull. 2011;137:517–37.

Francis G. Publication bias and the failure of replication in experimental psychology. Psychon Bull Rev 2012;1-17.

Giner-Sorolla R. Science or art? How aesthetic standards grease the way through the publication bottleneck but undermine science. Perspect Psychol Sci. 2012;7(6):562–71.

Gold C. Dose and effect in CBT for schizophrenia. Br J Psychiatry. 2015;207(3):269. doi: 10.1192/bjp.207.3.269 .

PubMed   Google Scholar  

Greenwald AG. There is nothing so theoretical as a good method. Perspect Psychol Sci. 2012;7:99–108.

Hartshorne J, Schachner A. Tracking replicability as a method of post-publication open evaluation. Front Comput Neurosci. 2012;6:1–14.

Johnson DJ, Cheung F, Donnellan MB. Does cleanliness influence moral judgments? A directreplication of Schnall, Benton, and Harvey (2008). Soc Psychol. 2014;45:209–215

Jost JT, Glaser J, Kruglanski AW, Sulloway FJ. Political conservatism as motivated social cognition. Psychol Bull. 2003;129(3):339.

Jussim L, Crawford JT, Anglin SM, Stevens ST, Duarte JL. Interpretations and methods: Towards a more effectively self-correcting social psychology. J Exp Soc Psychol. 2016. (in press)

Kahneman D. Experiences of collaborative research. Am Psychol. 2003;58(9):723.

Kahneman D. A new etiquette for replication. Social Psychology. 2014;45(4):310.

Klein RA, Ratliff K, Vianello M, Adams Jr AB, Bahník S, Bernstein NB, Cemalcilar Z. Investigating variation in replicability. A “Many Labs” Replication Project. Soc Psychol. 2014;45:142–152.

Lakens D, Hilgard J & Staaks J (in press, this issue) On the Reproducibility of Meta-Analyses: Six Practical Recommendations. BMC Psychology.

Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992;327:248–54.

Laws KR. Negativland–A home for all findings in psychology. BMC Psychology. 2013;1(2):1–8. doi:10.1186/2050-7283-1-2.

Lykken DT. Statistical significance in psychological research. Psychol Bull. 1968;7:151.

Lynott D, Corker KS, Wortman J, Connell L, Donnellan MB, Lucas RE, & O’Brien K. Replication of “Experiencing physical warmth promotes interpersonal warmth” by Williams and Bargh (2008). Soc Psychol. 2014;45:216–222.

Madden CS, Easley RW, Dunn MG. How journal editors view replication research. J Advert. 1995;24:78–87.

Makel MC, Plucker JA, Hegarty B. Replications in Psychology Research: How Often Do They Really Occur? Perspect Psychol Sci. 2012;7:537–42.

Mosteller F. “Comment” on Jessica Utts, “Replication and metaanalysis in parapsychology”. Statistical Science. 1991;6(4):395–396.

Neuliep JW, Crandall R. Editorial bias against replication research. J Soc Behav Pers. 1990;5:85–90.

Neuliep JW, Crandall R. Reviewer bias against replication research. J Soc Behav Pers. 1993;8:21–9.

Nosek BA, Lakens D. Registered reports: A method to increase the credibility of published results. Soc Psychol. 2014;45:137–141.

Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716.

Popper KR. The logic of scientific discovery. New York: Routledge. 1959.

Reif F. The competitive world of the pure scientist. Science. 1961;134:1957–62.

Sapolsky A. An effort at studying Rorschach content symbolism: The frog response. J Consult Psychol. 1964;28(5):469.

Schnall S, Benton J, Harvey S. With a clean conscience cleanliness reduces the severity of moral judgments. Psychol Sci. 2008;19(12):1219–22.

Schweinsberg M, Madan N, Vianello M, Sommer SA, Jordan J, Tierney W, Srinivasan M. The pipeline project: Pre-publication independent replications of a single laboratory’s research pipeline. J Exp Soc Psychol. 2016. (in press)

Simmons JP, Nelson LD, Simonsohn U. False-positive psychology undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22:1359–66.

Simonsohn U, Nelson LD, Simmons JP. P-curve: a key to the file-drawer. J Exp Psychol Gen. 2014;143:534–47.

Sinclair HC, Hood K, Wright B. Revisiting the Romeo and Juliet (Driscoll, Davis, & Lipetz, 1972): Reexamining the links between social network opinions and romantic relationship outcomes. Soc Psychol. 2014;45:170–178.

Sterling TD, Rosenbaum WL, Weinkam JJ. Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. Am Stat. 1995;49(1):108–12.

Storm L, Tressoldi PE, Di Risio L. Meta-analysis of free-response studies, 1992–2008: Assessing the noise reduction model in parapsychology. Psychol Bull. 2010;136(4):471.

Taylor AE & Munafò MR (in press, this issue) Triangulating Meta-Analyses: The example of the serotonin transporter gene, stressful life events and major depression. BMC Psychology.

Van Hiel A, Onraet E, De Pauw S. The Relationship between Social‐Cultural Attitudes and Behavioral Measures of Cognitive Style: A Meta‐Analytic Integration of Studies. J Pers. 2010;78(6):1765–800.

Williams LE, Bargh JA. Experiencing physical warmth promotes interpersonal warmth. Science. 2008;322(5901):606–7.

Download references

Acknowledgements

Not applicable.

Availability of data and materials

Authors’ contributions, competing interests.

Keith R Laws is a Section Editor for BMC Psychology, who declares no competing interests.

Consent for publication

Ethics approval and consent to participate, author information, authors and affiliations.

School of Life and Medical Sciences, University of Hertfordshire, Hatfield, UK

Keith R. Laws

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Keith R. Laws .

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article.

Laws, K.R. Psychology, replication & beyond. BMC Psychol 4 , 30 (2016). https://doi.org/10.1186/s40359-016-0135-2

Download citation

Received : 17 May 2016

Accepted : 20 May 2016

Published : 01 June 2016

DOI : https://doi.org/10.1186/s40359-016-0135-2

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

BMC Psychology

ISSN: 2050-7283

why is it important for research to be replicated psychology

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • 19 November 2018
  • Correction 19 November 2018

Replication failures in psychology not due to differences in study populations

  • Brian Owens

You can also search for this author in PubMed   Google Scholar

A large-scale effort to replicate results in psychology research has rebuffed claims that failures to reproduce social-science findings might be down to differences in study populations.

Access options

Access Nature and 54 other Nature Portfolio journals

Get Nature+, our best-value online-access subscription

24,99 € / 30 days

cancel any time

Subscribe to this journal

Receive 51 print issues and online access

185,98 € per year

only 3,65 € per issue

Rent or buy this article

Prices vary by article type

Prices may be subject to local taxes which are calculated during checkout

doi: https://doi.org/10.1038/d41586-018-07474-y

Updates & Corrections

Correction 19 November 2018 : An earlier version of this story included an incorrect reference for the reproducibility paper.

Klein, R. A. et al. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/9654g (2018).

Tversky, A., & Kahneman, D. Science 211 , 453–458 (1981).

Article   PubMed   Google Scholar  

Inbar, Y., Pizarro, D., Knobe, J., & Bloom, P. Emotion 9 , 435-439 (2009).

PubMed   Google Scholar  

Klein, R. A. et al. Soc. Psychol. 45 , 142–152 (2014).

Article   Google Scholar  

Download references

Reprints and permissions

Related Articles

1,500 scientists lift the lid on reproducibility

First analysis of ‘pre-registered’ studies shows sharp rise in null findings

First results from psychology’s largest reproducibility test

Psychology’s reproducibility problem is exaggerated – say psychologists

Over half of psychology studies fail reproducibility test

Muddled meanings hamper efforts to fix reproducibility crisis

  • Research data

The world’s most expensive dinosaur and more — July’s best science images

The world’s most expensive dinosaur and more — July’s best science images

News 02 AUG 24

How to spot a predatory conference, and what science needs to do about them: a guide

How to spot a predatory conference, and what science needs to do about them: a guide

Career Feature 30 JUL 24

Predatory conferences are on the rise. Here are five ways to tackle them

Predatory conferences are on the rise. Here are five ways to tackle them

Editorial 30 JUL 24

How pregnancy transforms the brain to prepare it for parenthood

How pregnancy transforms the brain to prepare it for parenthood

News Feature 31 JUL 24

To explain biological sex, look to evolution

Correspondence 09 JUL 24

Neuroscientists must not be afraid to study religion

Neuroscientists must not be afraid to study religion

Comment 02 JUL 24

Stop just paying lip service on publication integrity

Stop just paying lip service on publication integrity

Comment 29 JUL 24

So you got a null result. Will anyone publish it?

So you got a null result. Will anyone publish it?

News Feature 24 JUL 24

China–US research collaborations are in decline — this is bad news for everyone

China–US research collaborations are in decline — this is bad news for everyone

News 19 JUL 24

Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Warmly Welcomes Talents Abroad

Qiushi Chair Professor; Qiushi Distinguished Scholar; ZJU 100 Young Researcher; Distinguished researcher

No. 3, Qingchun East Road, Hangzhou, Zhejiang (CN)

Sir Run Run Shaw Hospital Affiliated with Zhejiang University School of Medicine

why is it important for research to be replicated psychology

Faculty Positions & Postdocs at Institute of Physics (IOP), Chinese Academy of Sciences

IOP is the leading research institute in China in condensed matter physics and related fields. Through the steadfast efforts of generations of scie...

Beijing, China

Institute of Physics (IOP), Chinese Academy of Sciences (CAS)

why is it important for research to be replicated psychology

Assistant/Associate Professor, New York University Grossman School of Medicine

The Department of Biochemistry and Molecular Pharmacology at the NYUGSoM in Manhattan invite applications for tenure-track positions.

New York (US)

NYU Langone Health

why is it important for research to be replicated psychology

Assistant or Associate Professor of Neurosciences

FACULTY POSITION IN NEUROSCIENCES The University of New Mexico School of Medicine The Department of Neurosciences invites applications for a tenure...

University of New Mexico, Albuquerque

University of New Mexico School of Medicine

why is it important for research to be replicated psychology

Postdoctoral Associate- Biostatistics

Houston, Texas (US)

Baylor College of Medicine (BCM)

why is it important for research to be replicated psychology

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

American Psychological Association Logo

This page has been archived and is no longer being updated regularly.

Leaning into the replication crisis: Why you should consider conducting replication research

  • Conducting Research

As a student, you may have heard buzz about the replication crisis in psychology. Some of the more sensational headlines paint a bleak picture of modern research and depending on to whom you speak, individuals can be optimistic while others are demoralized. So what is going on and is it really a crisis?

The dialogue around replication ignited in 2015 when Brian Nosek’s lab reported that after replicating 100 studies from three psychology journals, researchers were unable to reproduce a large portion of findings. This report was controversial because it called into question the validity of research shared in academic journals. Publication in high profile journals requires the research to be subjected to a rigorous peer-review process. At this point, it is assumed the conclusions shared are trustworthy and others can now replicate or build upon the work. Following the Nosek study, more labs began to conduct replications and a disturbing trend emerged: a large portion of studies across multiple disciplines in science failed the replication test.

Replication is vital to psychology because studying human behavior is messy. There are numerous extraneous variables that can result in bias if researchers are not vigilant. Replication helps verify that the presence of a behavior at one point in time is not due to chance. The report that the Open Science Collaboration (2015) put forth did not undermine the peer-review process per se; rather, it highlighted a problem within the research culture. Journals were more likely to publish innovative studies over replication studies. Following the trends of the journals, researchers who require publications in order to advance their careers are unlikely to conduct a replication. As a result, without continued investigation, the exploratory studies can be treated as established lines of research rather than fledgling inquiries.

In response to the replication crisis, more individuals have been embracing the movement of transparency in research. The Open Science Foundation (OSF) and the Society for Improving Psychological Science (SIPS) have created opportunities for researchers to brainstorm means of strengthening research practices and provide avenues to share replication results. Based on these changes, I would argue the issue of replication was not a crisis, but an awakening for researchers who had become complacent to the consequences of the toxic elements of the research culture. Highlighting the issue resulted in dialogue and change. It is a perfect example of the dynamic nature of science and captures the essence of how a career in research can be intellectually stimulating, rewarding and sometimes frustrating.

From a student perspective, engaging in replication research is a useful tool to develop your own research skills. I have found that many students have misconceptions about how to conduct research. Some common behaviors include: 

  • Assuming their idea is unique, but not conducting a thorough literature search to determine what is established.
  • Proposing studies that are too complicated or have design faults.
  • Lack of awareness of ethics or the approval process needed to conduct experiments.
  • Lack of experience in regard to data entry or statistical analysis.
  • Desire to change practices mid-study to increase participant compliance.

The replication movement has presented a unique opportunity for undergraduate researchers to provide meaningful contributions to science by bolstering evidence needed to substantiate exploratory findings. I teach a research seminar at Central Oregon Community College that requires teams to complete a replication study provided by the Collaborative Replication and Education Project (associated with OSF). Reviewers give feedback before and after data collection identifying problematic areas and insuring the study is an appropriate replication. Completed projects are shared on the website and exemplary studies are eligible for research badges. The process of replication requires students to slow down and analyze strategies. Over the course of the term, the student understanding of the process matures as groups question the choices of the original researchers. It is a high impact, low-risk educational environment because students learn valuable lessons whether or not the replication is successful.

Replication studies may not offer rewards for professionals, but there are direct incentives for students. Former seminar students have been able to add research experience to resumes, which, in turn, has allowed them to secure competitive positions in labs upon transfer to four-year institutions. My students have also reported feeling more prepared for upper division courses and more confidence in their abilities to conduct individual research.

If you would like to learn more about the replication movement or how you can begin a replication study, I suggest beginning with the reference below and visiting some of the websites of the organizations listed in this article.

Website for the Open Science Foundation

Website for the Society for the Improvement of Psychological Science

References 

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349 (6251), aac4716. DOI: 10.1126/science.aac4716

About the author

Andria Woodell, PhD

Recommended Reading

Scientists often fail when they try to replicate studies. This psychologist explains why.

by Julia Belluz

why is it important for research to be replicated psychology

For a landmark collaborative study , published today in the journal Science , researchers tried to replicate 100 recent psychology studies from top journals to see if they’d get the same results as the original authors. Overwhelmingly, they failed.

Replication — the attempt to validate previous findings by reproducing experiments — is one of the foundational ideas behind the scientific method. It tells researchers, and those who use their studies (policymakers, patients, doctors), whether research results are actually reliable or whether they might be wrong.

About 36 percent of the replications showed an effect that was consistent with the original study

In this case, about 36 percent of the replications showed an effect that was consistent with the original study. So the failure rate was more than 60 percent.

And it’s not the first time a high-profile replication effort returned concerning results — a dismal state of affairs that has led some prominent thinkers to estimate that most published scientific research findings are wrong.

But this latest study should not be read as more bad news in a distressing conversation about science’s failures, says Brian Nosek , the University of Virginia psychologist who led the effort. It is part of the Reproducibility Project: Psychology , one of many high-profile collaborative efforts to retest important research results across a range of scientific fields. The goal: to strengthen the foundation of the house of science. We talked to Nosek about the study; what follows is our conversation, lightly edited for length and clarity.

Julia Belluz: We talk a lot about the need to replicate and reproduce studies. But I think there’s little appreciation about what that actually means. Can you describe what it took to replicate 100 studies?

Brian Nosek: It is labor-intensive work. It’s easier than the original research — you don’t also have to generate the materials from nothing. But at the same time, there are challenges in understanding how another group did a study. The areas where it is a lot of work are in reviewing the methodology [the description of how the study was done] from the materials that are available, then trying to ascertain how they actually did the study. What is it that really happened?

The most interesting parts of developing these replications involved requesting original materials from the authors and comparing that against the described methodology, writing out a new methodology, and then sending that back to the original authors for their review, comments, and revisions. A lot of times in that process, researchers would say, “We actually did this thing or that thing.” It isn’t because they did something wrong, but because the norms of science are to be brief in describing the methodology.

JB: Does this mean scientists aren’t always doing a good job of writing detailed enough methodologies?

replication jb

Each dot represents a study, and you can see the original study effect size versus replication effect size. The diagonal line represents the replication effect size equal to original effect size. Points below the dotted line were effects in the opposite direction of the original. ( Science )

BN: It would be great to have stronger norms about being more detailed with the methods in the paper. But even more than that, it would be great if the norm were to post procedural details as supplements in the paper. For a lot of papers, I don’t need to know those details if I’m not trying to replicate it. I’m just reading the paper, trying to learn about the outcomes. But for stuff that’s in my area — I need access to those details so I can really understand what they did. If I can rapidly get up to speed, I have a much better chance of approximating the results.

JB: Right now, there’s a tendency to think failed replications mean the original research was wrong. (We saw this with the recent discussion around the high-profile “worm wars” replication.) But as your work here shows, that’s really not necessarily the case. Can you explain that?

BN: That’s a really important point, and it applies to all research. If you have motivations or stakes in the outcome, if you have a lot of flexibility in how you analyze your data, what choices you make, political ideologies — all those things can have a subtle influence, maybe without even the intention [to game the results of the replication].

So pre-registration [putting the study design on an open database before running the study, so you can’t change the methods if you get results you don’t like] is an important feature of doing confirmatory analysis in research. That can apply to replication efforts, as well. If you’re going to reanalyze the data, or, in our case, where you’re doing a study with brand new data collection, the pre-registration process is a way to put your chips down.

JB: After helping run this massive experiment, do you have any advice for others?

"Reproducibility is hard — that's for many reasons"

BN: My main observation here is that reproducibility is hard. That’s for many reasons. Scientists are working on hard problems. They’re investigating things where we don’t know the answer. So the fact that things go wrong in the research process, meaning we don’t get to the right answer right away, is no surprise. That should be expected.

There are three reasons that a replication might get a negative result when the original got a positive result. One, the original is a false positive — the study falsely detected evidence for an effect by chance. Two, the replication is a false negative — the study falsely failed to detect evidence for an effect by chance. Three, there is a critical difference in the original and replication methodology that accounts for the difference.

JB: Can you give me an example?

BN: Imagine an original study that found a relationship between exercise and health. Researchers conclude that people who exercise more are healthier than people who do not. A replication team runs a very similar study and finds no relationship.

One and two [described above] are possibilities that one of the teams’ evidence is incorrect and the other evidence is more credible.

Three [described above] is the possibility that when the teams look closely, they realize that the original team did their study on only women and the replication team did their study on only men. Neither team realized that this might matter — the claim was that the exercise-health relationship was about people. Now that they see the results, they wonder if gender matters.

The key is that we don’t know for sure that gender matters. It could still be one or two. But we have a new hypothesis to test in a third study. And if confirmed, it would improve our understanding of the phenomenon. Was it the changes in the sample? The procedure? Being able to dig into the differences where you observe that is a way to get a better handle on the phenomenon. That’s just science doing science.

JB: We’re hearing a lot about replication efforts these days. Is it more talk than action? Or if not, which country is leading the effort?

BN: I have no sense of the place that’s leading in funding. But the US is among the places where there’s the most progress. The NIH and NSF [National Institutes of Health and National Science Foundation] have been looking into supporting replication research. And the Netherlands has had a lot of conversations about this.

But it’s definitely [more popular now]. For me, it’s a question of research efficiency. If we only value innovation, we’re going to get a lot of great ideas and very little grounding in the stability of those ideas. The effort on improving reproducibility while paying attention to fact that innovation is the primary driver of science will help us be better stewards of public funding in science and help science fulfill its promise. There aren’t better alternatives. We really need to get this right.

Most Popular

The misleading controversy over an olympic women’s boxing match, briefly explained, intel was once a silicon valley leader. how did it fall so far, an influencer is running for senate. is she just the first of many, why two astronauts are stuck in space, is the united states in self-destruct mode, today, explained.

Understand the world with a daily explainer plus the most compelling stories of the day.

More in Science

These reviled birds of prey literally save people’s lives

These reviled birds of prey literally save people’s lives

Why two astronauts are stuck in space

What if colon cancer screening didn’t involve poop?

Storm chasing has changed — a lot — since Twister

Storm chasing has changed — a lot — since Twister

How public universities hooked America on meat

How public universities hooked America on meat

What if absolutely everything is conscious?

What if absolutely everything is conscious?

These reviled birds of prey literally save people’s lives

When vultures died off in India, people died too.

Why two astronauts are stuck in space

The Starliner test mission is just one of Boeing’s many woes.

What if colon cancer screening didn’t involve poop?

Blood-based biopsies could make screening less icky — if we can make them more accurate.

Storm chasing has changed — a lot — since Twister

These days, anyone can follow a tornado, but you’ll want to leave that to the professionals.

How public universities hooked America on meat

University scientists helped build factory farming. Now, some want to protect its “social license to operate.”

What if absolutely everything is conscious?

Scientists spent ages mocking panpsychism. Now, some are warming to the idea that plants, cells, and even atoms are conscious.

The world is getting more violent. A top refugee advocate thinks he knows why.

The world is getting more violent. A top refugee advocate thinks he knows why.

Skip the airport. It’s time to plan the perfect road trip.

Skip the airport. It’s time to plan the perfect road trip.

Can men’s gymnastics be saved?

Can men’s gymnastics be saved?  Audio

Is the United States in self-destruct mode?

Is the United States in self-destruct mode?  Audio

The misleading controversy over an Olympic women’s boxing match, briefly explained 

The misleading controversy over an Olympic women’s boxing match, briefly explained 

Intel was once a Silicon Valley leader. How did it fall so far?

What do genes have to do with psychology? They likely influence your behavior more than you realize

why is it important for research to be replicated psychology

Assistant Professor of Psychological Science, Boise State University

Disclosure statement

Jessica D. Ayers does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

Boise State University provides funding as a member of The Conversation US.

View all partners

Vector illustration of side profile of head cut into different size sections, with squares and circles dotting the sides

As a species, humans like to think that we are fully in control of our decisions and behavior. But just below the surface, forces beyond our conscious control influence how we think and behave: our genes.

Since the 1950s , scientists have been studying the influences genes have on human health . This has led medical professionals, researchers and policymakers to advocate for the use of precision medicine to personalize diagnosis and treatment of diseases, leading to quicker improvements to patient well-being .

But the influence of genes on psychology has been overlooked.

My research addresses how genes influence human psychology and behavior. Here are some specific ways psychologists can use genetic conflict theory to better understand human behavior – and potentially advance the treatment of psychological issues.

What do genes have to do with it?

Genetic conflict theory proposes that though our genes blend together to make us who we are, they retain markers indicating whether they came from mom or dad. These markers cause the genes to either cooperate or fight with one another as we grow and develop. Research in genetic conflict primarily focuses on pregnancy, since this is one of the few times in human development when the influence of different sets of genes can be clearly observed in one individual.

Typically, maternal and paternal genes have different ideal strategies for growth and development. While genes from mom and dad ultimately find ways to cooperate with one another that result in normal growth and development, these genes benefit by nudging fetal development to be slightly more in line with what’s optimal for the parent they come from. Genes from mom try to keep mom healthy and with enough resources left for another pregnancy, while genes from dad benefit from the fetus taking all of mom’s resources for itself.

When genes are not able to compromise, however, this can result in undesirable outcomes such as physical and mental deficits for the baby or even miscarriage.

While genetic conflict is a normal occurrence, its influence has largely been overlooked in psychology. One reason is because researchers assume that genetic cooperation is necessary for the health and well-being of the individual. Another reason is because most human traits are controlled by many genes . For example, height is determined by a combination of 10,000 genetic variants , and skin color is determined by more than 150 genes .

The complex nature of psychology and behavior makes it hard to pinpoint the unique influence of a single gene, let alone which parent it came from. Take, for example, depression. Not only is the likelihood of developing depression influenced by 200 different genes , it is also affected by environmental inputs such as childhood maltreatment and stressful life events . Researchers have also studied similar complex interactions for stress- and anxiety-related disorders .

Prader-Willi and Angelman syndromes

When researchers study genetic conflict, they have typically focused on its link to disease , unintentionally documenting the influence of genetic conflict on psychology.

Specifically, researchers have studied how extreme instances of genetic conflict – such as when the influence of one set of parental genes is fully expressed while the other set is completely silenced – are associated with changes in behavior by studying people who have Prader-Willi syndrome and Angelman syndrome.

Prader-Willi and Angelman syndromes are rare genetic disorders affecting about 1 in 10,000 to 30,000 and 1 in 12,000 to 20,000 people around the world, respectively. There is currently no long-term treatment available for either condition.

These syndromes develop in patients missing one copy of a gene on chromosome 15 that is needed for balanced growth and development. Someone who inherits only the version of the gene from their dad will develop Angelman syndrome, while someone who has only the version of the gene from their mom will develop Prader-Willi syndrome.

Genetic map of paternal and maternal copies of chromosome 15 with various genes annotated

Physical hallmarks of Angelman syndrome include major developmental delays, intellectual disabilities, trouble moving, trouble eating and excessive smiling. Physical hallmarks of Prader-Willi syndrome include diminished muscle tone, feeding difficulties, hormone deficiencies, short stature and extreme overeating in childhood.

These syndromes represent one of the few instances where the influence of a single missing gene can be clearly observed. While both Angelman and Prader-Willi syndromes are associated with language, cognitive, eating and sleeping issues, they are also associated with clear differences in psychology and behavior.

For example, children with Angelman syndrome smile, laugh and generally want to engage in social interactions . These behaviors are associated with an increased ability to gain resources and investment from those around them.

Children with Prader-Willi syndrome, on the other hand, experience tantrums , anxiety and have difficulties in social situations . These behaviors are associated with increased hardships on mothers early in the individual’s life, potentially delaying when their mother will have another child. This would therefore increase the child’s access to resources such as food and parental attention.

Genetic conflict in psychology and behavior

Angelman syndrome and Prader-Willi syndrome highlight the importance of investigating genetic conflict’s influence on psychology and behavior. Researchers have documented differences in temperament, sociability, mental health and attachment in these disorders.

The differences in the psychological processes between these syndromes are similar to the proposed effects of genetic conflict. Genetic conflict influences attachment by determining the responsiveness and sensitivity of the parent-child relationship through differences in behavior and resource needs. This relationship begins forming while the child is still in utero and helps calibrate how reactive they will be to different social situations . While this calibration of responses starts at a purely biological level in the womb, it results in unique patterns of social beahaviors that influence everything from how we handle stress to our personalities .

Since most scientists don’t consider the influence of genetic conflict on human behavior, much of this research is still theoretical. Researchers have had to find similarities across disciplines to see how the biological process of genetic conflict influences psychological processes. Research on Angelman and Prader-Willi syndromes is only one example of how integrating a genetic conflict framework into psychological research can provide researchers an avenue to study how our biology makes us uniquely human.

  • Neuropsychology
  • Selfish gene theory

why is it important for research to be replicated psychology

Communications and Change Manager – Research Strategy

why is it important for research to be replicated psychology

Head of School: Engineering, Computer and Mathematical Sciences

why is it important for research to be replicated psychology

Educational Designer

why is it important for research to be replicated psychology

Organizational Behaviour – Assistant / Associate Professor (Tenure-Track)

why is it important for research to be replicated psychology

Apply for State Library of Queensland's next round of research opportunities

Dan Bates, LMHC, LPC, NCC

Replication Crisis

The importance of research to the practice of counseling, why is research literacy important for mental health counseling.

Posted July 30, 2024 | Reviewed by Abigail Fagan

  • The replication crisis challenges reliability—many landmark studies fail to replicate.
  • Publication bias distorts findings—positive results are more likely to be published than null ones.
  • Careerism impacts quality—the pressure to publish frequently can prioritize quantity over quality.

Lukas/Pexels

In the field of social science, particularly within psychology and counseling, several critical issues have emerged that undermine the scientific rigor of research and practice. One of the most significant challenges is the replication crisis , where many studies, including landmark research, fail to reproduce consistent results when tested in subsequent experiments. And we're not talking about little-known, oddball studies. This problem covers the whole gamut of social science research, from the seminal studies that change the field, to lesser-known research. This crisis casts doubt on the reliability of established findings and calls into question the foundations upon which many clinical practices are built.

Another pervasive issue is publication bias , where studies with significant or positive results are more likely to be published than those with null or negative findings. This skews the body of available literature, leading to an overestimation of the effectiveness of certain interventions and underrepresentation of alternative or null outcomes. Closely related is the phenomenon of idea laundering , where weak or untested theories are presented as established facts through a cycle of citations and publications, further muddying the waters of scientific clarity.

Careerism or "publish or perish" also poses a significant obstacle, as the pressure to publish frequently and in high-impact journals can lead researchers to prioritize quantity over quality. This environment can foster a focus on novel, eye-catching results rather than thorough, rigorous investigations. Moreover, inadequate graduate training in research methodology and critical thinking exacerbates these issues, leaving emerging counselors ill-prepared to both conduct and critically assess research.

These challenges collectively diminish the quality and credibility of research in social science, which is particularly concerning given the direct impact these studies have on clinical practice. For counselors, a deep understanding of research methods and critical evaluation is essential. It not only equips them to produce meaningful, replicable studies but also empowers them to discern the reliability of existing research, ensuring they base their clinical decisions on solid evidence. However, if counselors in training are not aware of the importance of research, how to conduct research, how to read research, how to integrate the findings of research, AND how to digest research critically given the problems present in research mentioned above, then it will directly affect clinical work, client outcomes and welfare. This is simply not okay since counselors have an ethical duty to provide best practices and safeguard client welfare. But, if you need some convincing, below are some of the reasons I see literacy in research as essential for competent clinical practice.

Research Guides Practice and Limits of Intuition

As clinicians, we often rely on our training, experience, and intuition to make decisions. However, it's essential to recognize that our perceptions are inherently limited and can be biased. Human reasoning, while valuable, is not infallible and can lead us astray. For instance, confirmation bias , the tendency to search for or interpret information in a way that confirms our preconceptions, can significantly impact clinical judgments. Therefore, it's crucial to complement our intuition with empirical evidence from social science research. This reliance on research helps to ground our decisions in verified data, ensuring that our interventions are based on more than just subjective judgment.

The Counterintuitive Nature of Research

One of the most valuable aspects of research is its ability to challenge our assumptions. What may seem obvious or intuitive to a seasoned counselor might not hold true for every client. For example, while it may seem intuitive that talking about suicidal thoughts could increase the likelihood of a client acting on them, research indicates that discussing these thoughts in a supportive environment can actually reduce the risk. This highlights the importance of adhering to evidence-based practices, which often provide insights that run counter to common beliefs or intuitive thinking.

Universals and Particulars in Counseling

In the realm of clinical practice, it is crucial to distinguish between universal principles and individual variations. Research can provide us with general trends and effective interventions for broad populations, but every client is unique. What works broadly might not be effective for a specific individual due to various factors such as cultural background, personal history, and psychological makeup. For example, cognitive-behavioral therapy (CBT) is widely recognized as an effective treatment for depression , but its applicability may vary based on a client's readiness, cultural context, and specific needs. Thus, while research provides a foundation, clinicians must remain flexible and responsive to the particulars of each client's situation.

Harm Prevention and Ethical Responsibility

Ethical practice in counseling involves a commitment to "do no harm." This principle necessitates that we have a reasonable expectation of the outcomes of our interventions before implementing them. Without a solid research foundation, we risk applying treatments that may be ineffective or even harmful. For example, some outdated or unsupported therapeutic practices, such as "conversion therapy" for sexual orientation , have been shown to cause significant harm. Therefore, staying informed about current research is not only a best practice but an ethical obligation to ensure we are providing safe and effective care.

Harm Detection and Differentiating Counseling Models

Not all therapeutic models are equally beneficial, and some may even be detrimental if applied inappropriately. It's vital for clinicians to discern which models are supported by robust evidence and which are not. For instance, while mindfulness -based therapies have proven effective in managing anxiety and depression, they may not be suitable for individuals with certain types of trauma -related disorders, where grounding techniques might be more appropriate. Understanding these nuances allows clinicians to tailor their approaches to better meet the needs of their clients, thereby optimizing the therapeutic outcomes.

why is it important for research to be replicated psychology

In conclusion, the integration of research into clinical practice serves as a critical tool for enhancing the quality of care provided to clients. By recognizing the limitations of intuition, valuing counterintuitive insights from research, distinguishing between universal principles and individual differences, and adhering to ethical standards of harm prevention, clinicians can ensure that their practice is both scientifically grounded and ethically sound. This commitment to evidence-based practice ultimately fosters a more effective and compassionate therapeutic environment, better serving the diverse needs of clients.

Dan Bates, LMHC, LPC, NCC

Dan Bates is a clinical mental health counselor, licensed in the state of Washington and certified nationally.

  • Find a Therapist
  • Find a Treatment Center
  • Find a Psychiatrist
  • Find a Support Group
  • Find Online Therapy
  • International
  • New Zealand
  • South Africa
  • Switzerland
  • Asperger's
  • Bipolar Disorder
  • Chronic Pain
  • Eating Disorders
  • Passive Aggression
  • Personality
  • Goal Setting
  • Positive Psychology
  • Stopping Smoking
  • Low Sexual Desire
  • Relationships
  • Child Development
  • Self Tests NEW
  • Therapy Center
  • Diagnosis Dictionary
  • Types of Therapy

July 2024 magazine cover

Sticking up for yourself is no easy task. But there are concrete skills you can use to hone your assertiveness and advocate for yourself.

  • Emotional Intelligence
  • Gaslighting
  • Affective Forecasting
  • Neuroscience
  • MIND, BODY, WONDER

There are 6 forms of depression, study shows. Here’s how they’re different.

Tens of millions of people with depression aren't properly diagnosed. Stanford researchers show that brain imaging could revolutionize treatment.

Of the   nearly 1 in 5   people in the United States suffering from depression, many aren't properly diagnosed and end up receiving treatment in a trial-and-error manner, which can be costly, ineffective, frustrating, and, in some cases, damaging. Now, research scientists at Stanford aim to change that by identifying unique biomarkers for each type of depression to match them with targeted treatment.  

Their findings,   recently published   in a   Nature Medicine   study,   show the results of machine learning and brain imaging in hundreds of patients while doing specific tasks and at rest—helping the team identify six distinct subtypes of depression.  

“Psychiatry, unlike other medical fields, currently relies on self-reported symptoms and does not use biological tests to diagnose and treat patients,” says Leanne Williams, lead author of the study and a professor of psychiatry and behavioral sciences at Stanford University's School of Medicine. “Because of this, there is a need for tests that provide precise diagnoses that are based on the biological underpinnings of the symptoms to enable personalized treatments."

While the Stanford study has limitations and is likely still years away from its findings being able to be applied to directly care for patients in a widespread manner, mental health professionals praise the research for helping psychiatrists and psychologists get closer to being able to use brain scans to identify and treat depression in similar fashion to how cardiologists use chest x-rays to identify and treat heart problems.  

"Though these results do not have any current clinical application, they are an important step toward trying to find measurable biological markers that can assist with making an accurate diagnosis and tailoring treatment," says Robert Bright, a noted psychiatrist and chair for the Mayo Clinic Arizona Department of Psychiatry and Psychology, who was not involved in the research.  

The six types of depression

The study addresses a growing concern among mental health professionals regarding the   nearly 30 percent of people diagnosed with depression whose symptoms don’t improve even after receiving multiple medical interventions.  

"Despite all our advancements in other fields of medicine, we still aren’t very good at matching patients (with depression) to the treatment that will work for them, causing some people to spend years cycling through multiple   treatments before finding one that’s effective,” says Srijan Sen, a neuroscientist and director at the Eisenberg Family Depression Center at the University of Michigan.  

To help, the Stanford researchers used functional magnetic resonance imaging (fMRI) technology   to study the regions of the brain most commonly associated with depression such as the amygdala, hypothalamus,   hippocampus, and prefrontal cortex, and, more importantly, the connections—called circuits—between those brain structures.  

( Can ending inflammation help win our battle against depression? )

Determining a patient’s depression subtype—also known as a biotype—is important, explains Williams, because each one represents a different way a major brain connection can malfunction or become disrupted, leading to the unwanted symptoms and behaviors we associate with depression.  

Bright says the broken brain connections examined in the study are known to affect a person's attention span, working memory, cognitive flexibility, planning, decision-making, rumination, motivation, and hormones associated with positive and negative emotions.  

The researchers measured these disrupted functions as they relate to depression by scanning the brains of 801 study participants who had   already been diagnosed with depression or anxiety, then studying their brain activity while at rest and while they engaged in different tasks designed to stimulate cognitive function or emotional responses to various situations—dual explorations that haven't been studied this way before.  

"By quantifying brain function at rest   and   during specific tasks, we have shown that depression consists of six specific patterns of dysfunctions in six major brain circuits," says Williams.

These six circuits and corresponding disruptors (depression biotypes) are named and defined as follows:  

1)     The default mode circuit   is active when an individual is engaged in internal mental processes such as mind-wandering and introspection. "When this circuit is disrupted, these internal mental processes are also affected," says Williams.  

You May Also Like

why is it important for research to be replicated psychology

What is cyclic vomiting syndrome and how is it diagnosed?

why is it important for research to be replicated psychology

Many people wean off antidepressants too quickly. That can be dangerous.

This is the biggest health challenge women face in their 40s.

2)     The salience circuit   helps us focus on important emotional stimuli both inside and outside ourselves. "When this circuit is disrupted, it can lead to physical symptoms of anxiety and an overwhelming sensory experience," she explains.  

3) The positive affect circuit,   also known as the reward circuit, is crucial for experiencing pleasure, rewards, social enjoyment, motivation, and a sense of purpose. "Disruptions in this circuit are associated with emotional numbness and increased effort to experience enjoyment," says Williams.  

4) The negative affect circuit   is critical for processing and responding to negative emotional stimuli such as threats and sadness. "When disrupted, reactions to negative emotions can become more intensive and prolonged," she explains.

5) The attention circuit —also known as the frontoparietal or central executive network—is involved in sustaining attention and concentration. "When disrupted, attentional processes are affected and one's ability to focus is diminished," says Williams.  

6) The cognitive control circuit   underpins executive functions such as working memory and planning, along with controlling thoughts and actions. "When disrupted, it can make decision-making difficult and planning ahead challenging," she explains.  

Why different biotypes need specific treatment

When mental health doctors can correctly identify one of these six depression biotypes responsible for disrupting healthy brain function, they'll be able to recommend tailored treatment, explains Aron Tendler, a board-certified psychiatrist and chief medical officer of BrainsWay in Burlington, Massachusetts, who wasn't involved in the study but calls its findings "extremely exciting."  

He also praises the researchers for further measuring how a few of the most common treatments for depression worked—or didn't work—on each biotype.  

They did this by randomly assigning 250 of the study's participants to receive either psychotherapy or one of three of the most prescribed antidepressants: escitalopram (Lexapro), venlafaxine (Efexor), and sertraline (Zoloft).  

Williams says their findings show multiple examples of patients with one biotype responding well to one antidepressant over another as well as patients with a different biotype experiencing improvements through talk therapy where they hadn’t responded as well to medication.

Such findings   build on   previously published research   from the Stanford team that focused on just one biotype related to the cognitive control circuit, in which the researchers were able to use fMRI technology to predict the improved likelihood of remission in depression patients who received targeted treatment over patients whose brains weren’t scanned and instead received only general care.  

In both studies and   subsequent research , the researchers have demonstrated how worthwhile fMRI technology can be in helping doctors match depression patients with the most effective treatment—reducing or eliminating the trial-and-error mental healthcare that has been the status quo.  

The study's—and industry's—limitations

While Paul Appelbaum, a psychiatrist and distinguished professor at Columbia University, calls the   Nature Medicine   study "promising," he says it needs to be replicated by other researchers and across more diverse populations, as most of that study's participants were white.  

Many other established depression treatments also need to be included in future research because the Stanford team only looked at three antidepressants and a limited number of psychotherapies.  

There's also a potentially significant barrier associated with access to the fMRI equipment needed to identify a patient's correct biotype in the first place because these machines are only available in a small number of major medical centers—making access both limited and expensive.  

Because of this, "medical doctors, including psychiatrists, rarely prescribe fMRI scans for their patients, and insurance companies would be highly unlikely to pay for these expensive brain scans until a great deal more research demonstrates that the scans can reliably predict which treatment an individual diagnosed with depression would most benefit from," says Judith Beck, a clinical professor of psychology at the University of Pennsylvania and the president of the Beck Institute for Cognitive Behavior Therapy.

But if the study's findings are consistent across additional research and if it can be shown to insurance companies and drugmakers that specific treatments are indeed effective at repairing each disrupted connection, "we could reach a watershed moment in the field," says Jonathan Rottenberg, a professor of psychology at Cornell University, who was not involved in the Stanford research. "This work has potential to make treatment for depression more efficient and effective."  

Related Topics

  • MENTAL HEALTH
  • IMAGING TECHNOLOGY

why is it important for research to be replicated psychology

Do natural adaptogens like ginseng actually combat stress?

These are the biggest health challenges women will face in their lifetimes.

why is it important for research to be replicated psychology

What working long hours does to your body

why is it important for research to be replicated psychology

What causes multiple sclerosis—and why are women more at risk?

why is it important for research to be replicated psychology

I dream of … trending topics? Here's how social media can become the stuff of nightmares

  • Environment

History & Culture

  • History & Culture
  • Paid Content
  • History Magazine
  • Mind, Body, Wonder
  • Terms of Use
  • Privacy Policy
  • Your US State Privacy Rights
  • Children's Online Privacy Policy
  • Interest-Based Ads
  • About Nielsen Measurement
  • Do Not Sell or Share My Personal Information
  • Nat Geo Home
  • Attend a Live Event
  • Book a Trip
  • Inspire Your Kids
  • Shop Nat Geo
  • Visit the D.C. Museum
  • Learn About Our Impact
  • Support Our Mission
  • Advertise With Us
  • Customer Service
  • Renew Subscription
  • Manage Your Subscription
  • Work at Nat Geo
  • Sign Up for Our Newsletters
  • Contribute to Protect the Planet

Copyright © 1996-2015 National Geographic Society Copyright © 2015-2024 National Geographic Partners, LLC. All rights reserved

  • Share full article

Ernest Jones III eats an entire test breakfast of yogurt, strawberries and granola.

Why, Exactly, Are Ultraprocessed Foods So Hard to Resist? This Study Is Trying to Find Out.

Understanding why they’re so easy to overeat might be key to making them less harmful, some researchers say.

Supported by

By Alice Callahan

Photographs by Lexey Swall

Alice Callahan spent two days at the National Institutes of Health in Bethesda, Md., and interviewed more than a dozen researchers about ultraprocessed foods.

  • Published July 30, 2024 Updated July 31, 2024

It was 9 a.m. on a Friday in March, and Ernest Jones III was hungry.

From a hospital bed at a research facility at the National Institutes of Health in Maryland, he surveyed his meal tray: Honey Nut Cheerios with fiber-enriched whole milk, a plastic-wrapped blueberry muffin and margarine.

Listen to this article with reporter commentary

“Simple, old school,” one of those “Saturday morning breakfasts from back in the day,” said Mr. Jones, 38, who is studying to become a pastor.

He was about halfway through his 28-day stay at the N.I.H., and Mr. Jones was one of 36 people participating in a nutrition trial that is expected to be completed in late 2025. For one month each, researchers will draw participants’ blood, track their body fat and weight, measure the calories they burn, and feed them three meticulously designed meals per day.

The subjects don’t know it, but their job is to help answer some of the most pressing questions in nutrition: Are ultraprocessed foods harmful to health? Are they a major driver of weight gain and obesity? And why is it so easy to eat so many of them?

If researchers can answer these questions, they say, perhaps there are ways to make ultraprocessed foods healthier.

We are having trouble retrieving the article content.

Please enable JavaScript in your browser settings.

Thank you for your patience while we verify access. If you are in Reader mode please exit and  log into  your Times account, or  subscribe  for all of The Times.

Thank you for your patience while we verify access.

Already a subscriber?  Log in .

Want all of The Times?  Subscribe .

Advertisement

IMAGES

  1. Teaching the why and how of replication studies

    why is it important for research to be replicated psychology

  2. A discipline-wide investigation of the replicability of Psychology

    why is it important for research to be replicated psychology

  3. PPT

    why is it important for research to be replicated psychology

  4. Why is Research Important?

    why is it important for research to be replicated psychology

  5. Is psychology a science?

    why is it important for research to be replicated psychology

  6. Why Replication Science?

    why is it important for research to be replicated psychology

VIDEO

  1. 15 Genuine Signs Of Intelligence

  2. EP 211 Ben Goertzel on Generative AI vs. AGI

  3. Stay Consistent

  4. The Reproducibility Crisis in Science: How do Expectations Influence Experimental Results?

  5. Yesterday is Gone! #fxtrading

  6. Here's Why Robinhood Replicated The Costco Model

COMMENTS

  1. What Is Replication in Psychology Research?

    Why Replication Is Important in Psychology . When studies are replicated and achieve the same or similar results as the original study, it gives greater validity to the findings. If a researcher can replicate a study's results, it is more likely that those results can be generalized to the larger population.

  2. What should we expect when we replicate? A statistical view of

    Second, our work highlights the critical importance of good study design and sufficient sample sizes both when performing original research and when deciding which studies to replicate. Our work shows that studies with small sample sizes - like many in the Reproducibility Project: Psychology - will produce wide prediction intervals. Although ...

  3. Replication is important for educational psychology: Recent

    Conversely, a sensational research result that cannot be replicated provides information to stakeholders that may prevent unnecessary resource and opportunity costs. Replicability is therefore a cornerstone of the research endeavor in educational psychology. It tends to occur in one of two forms, direct or conceptual replications.

  4. Replicability, Robustness, and Reproducibility in Psychological Science

    Abstract. Replication-an important, uncommon, and misunderstood practice-is gaining appreciation in psychology. Achieving replicability is important for making research progress. If findings are not replicable, then prediction and theory development are stifled. If findings are replicable, then interrogation of their meaning and validity can ...

  5. Replication and the Establishment of Scientific Truth

    The idea of replication is based on the premise that there are empirical regularities or universal laws to be replicated and verified, and the scientific method is adequate for doing it. Scientific truth, however, is not absolute but relative to time, context, and the method used. Time and context are inextricably intertwined in that time (e.g ...

  6. The role of replication in psychological science

    Progressive research programs have successfully replicated experiments; degenerating ones have failed replicated experiments. Replication provides a means to distinguish which parts of psychology are good and progressive from those that are bad and degenerating. ... hence why replication is important in psychology generally (if not in each and ...

  7. Replications in Psychology Research: How Often Do They Really Occur

    High authorship overlap is important to note because the success rates of replications were significantly different based on whether there was author overlap, with replications from the same research team more likely to be successful than replication attempts from a unique research team (91.7% vs. 64.6%, respectively), χ 2 (1, N = 303) = 32.72 ...

  8. Replicability, Robustness, and Reproducibility in Psychological Science

    Replication—an important, uncommon, and misunderstood practice—is gaining appreciation in psychology. Achieving replicability is important for making research progress. If findings are not replicable, then prediction and theory development are stifled. If findings are replicable, then interrogation of their meaning and validity can advance knowledge. Assessing replicability can be ...

  9. When and Why to Replicate: As Easy as 1, 2, 3?

    The crisis of confidence in psychology has prompted vigorous and persistent debate in the scientific community concerning the veracity of the findings of psychological experiments. This discussion has led to changes in psychology's approach to research, and several new initiatives have been developed, many with the aim of improving our findings. One key advancement is the marked increase in ...

  10. Why is Replication in Research Important?

    Replication in research is important because it allows for the verification and validation of study findings, building confidence in their reliability and generalizability. It also fosters scientific progress by promoting the discovery of new evidence, expanding understanding, and challenging existing theories or claims. Updated on June 30, 2023.

  11. Large-Scale Replication Projects in Contemporary Psychological Research

    Given this, large-scale replication projects can play an important role not only in assessing replicability but also in advancing theory. 1. Introduction. The validity of research in the biomedical and social sciences is under intense scrutiny at present, with published findings failing to replicate at an alarming rate.

  12. PDF Why is Replication so Important?

    However, when some of his experiments were replicated with all of the decisions made beforehand the results showed no ability better than what would be expected by chance. Sometimes, good science gives strange and unexpected results. In some cases, such as in Bem's psi research, the results could even be called extraordinary.

  13. 3 Understanding Reproducibility and Replicability

    Scientific research has evolved from an activity mainly undertaken by individuals operating in a few locations to many teams, large communities, and complex organizations involving hundreds to thousands of individuals worldwide. In the 17th century, scientists would communicate through letters and were able to understand and assimilate major developments across all the emerging major ...

  14. Replications in Psychology Research: How Often Do They Really Occur?

    more important, 52.9% of replications were conducted by the same research team as had produced the replicated article (defined as having an overlap of at least one author, including replications from the same publication). High authorship overlap is important to note because the success rates of replications were significantly different based

  15. The process of replication target selection in psychology: what to

    This results in an enormous back-log of non-replicated research to contend with. ... (replication OR replicated OR replicate): Psychology Biological, Psychology, Psychology ... a personal investment in the outcome/line of research' or considered personal investment as harmful as 'it is also important that the research is designed and ...

  16. Psychology, replication & beyond

    Modern psychology is apparently in crisis and the prevailing view is that this partly reflects an inability to replicate past findings. If a crisis does exists, then it is some kind of 'chronic' crisis, as psychologists have been censuring themselves over replicability for decades. While the debate in psychology is not new, the lack of progress across the decades is disappointing. Recently ...

  17. What a Replication Does (and Does Not) Tell You

    Thus, the failure to replicate a finding might be due to a difficulty in re-doing the procedure and materials in a study precisely. Knowing these differences tell us when an effect does and does ...

  18. How and Why to Conduct a Replication Study

    Replication is a research methodology used to verify, consolidate, and advance knowledge and understanding within empirical fields of study. A replication study works toward this goal by repeating a study's methodology with or without changes followed by systematic comparison to better understand the nature, repeatability, and generalizability of its findings.

  19. Replication in Psychology: Definition, Steps and FAQs

    Replication in psychology research is important because many variables affect the human behaviors, emotions and cognitive processes that psychologists research. Since scientific research relies on consistent data patterns to draw reasonable conclusions, it's important for researchers to collect extensive amounts of data for analysis.

  20. Replication failures in psychology not due to differences in study

    Brian Owens. A large-scale effort to replicate results in psychology research has rebuffed claims that failures to reproduce social-science findings might be down to differences in study ...

  21. Leaning into the replication crisis: Why you should consider conducting

    The dialogue around replication ignited in 2015 when Brian Nosek's lab reported that after replicating 100 studies from three psychology journals, researchers were unable to reproduce a large portion of findings. This report was controversial because it called into question the validity of research shared in academic journals.

  22. Replication Problems in Psychology

    After its publication, the September issue of American Psychologist, the flagship publication of the American Psychological Association, was devoted to what has been termed the " replication ...

  23. Scientists often fail when they try to replicate studies. This ...

    One, the original is a false positive — the study falsely detected evidence for an effect by chance. Two, the replication is a false negative — the study falsely failed to detect evidence for ...

  24. We Should Do More Direct Replications in Science

    The Reproducibility Project in Psychology found that only around 40% (give or take) of psychology experiments in top journals could truly be replicated (Open Science Collaboration, 2015). The Reproducibility Project in Cancer Biology similarly looked at studies from top journals in that field, and found that the replication effect was only ...

  25. What do genes have to do with psychology? They likely influence your

    Human psychology is influenced by a complex network of genes and environmental factors. Studying how and when genes fail to cooperate could broaden our understanding of behavior.

  26. Online Signals of Extremist Mobilization

    The social psychology and terrorism literatures have historically diverged in their definitions and conceptualizations of extremist action. The social psychology literature provides a broad definition, encapsulating extremist action within the wider boundary of collective action, and as involving militant, illegal, and/or violent aspects that violate societal norms, and seeks to challenge or ...

  27. Interventions to manage loneliness at an individual and community level

    Research shows that a new manualised psychotherapy program called 'Groups 4 Health' is more effective at reducing loneliness in clients in the long-term than cognitive behavioural therapy targeting depression. ... "Those results have now been replicated over multiple years. ... it's a good example of how it's important for psychology to move ...

  28. The Importance of Research to the Practice of Counseling

    Key points. The replication crisis challenges reliability—many landmark studies fail to replicate. Publication bias distorts findings—positive results are more likely to be published than null ...

  29. There are 6 forms of depression, study shows. Here's how they're different

    Tens of millions of people with depression aren't properly diagnosed. Stanford researchers show that brain imaging could revolutionize treatment. Researchers at Stanford recently described six ...

  30. Why, Exactly, Are Ultraprocessed Foods So Hard to Resist? This Study Is

    After munching on salty potato chips, "your brain is like, 'Oh my god, we need another bite of that,'" even if "your stomach is like, 'Oh, please don't do that, we're so full ...