This is a guest post by Future of Research policy activist, Adriana Bankston.
The AAAS meeting is a useful platform for discussing many important issues plaguing science today. Fundamental to the integrity of the scientific enterprise is the ability to perform rigorous experiments at the bench and to successfully reproduce research findings. To this end, the National Institutes of Health (NIH) has implemented Rigor and Reproducibility guidelines, which represent fundamental changes to the grant application and review process. These guidelines went into effect on January 25, 2016. A session at the 2017 AAAS meeting entitled “Rigor and Reproducibility One Year Later: How Has the Biomedical Community Responded?” explored the feedback received from both the NIH and the research community following these guidelines, and discussed how best to implement them to achieve both increased rates of reproducibility and dramatic returns on research funding investments.
The session was moderated by Leonard Freedman from the Global Biological Standards Institute (GBSI), a non-profit organization dedicated to enhancing the credibility, reproducibility, and translatability of biomedical research through best practices and standards. One of their initiatives, Reproducibility2020, aims to significantly improve the quality of preclinical biological research by the year 2020. The session featured Michael Lauer from the National Institutes of Health and William Kaelin from the Howard Hughes Medical Institute as speakers, and Judith Kimble from the University of Wisconsin-Madison, also a member of the steering committee for Rescuing Biomedical Research, as the discussant.
Michael Lauer on p-hacking and cognitive biases
Michael Lauer began his talk by discussing the John Ioannidis paper from 2005 entitled “Why Most Published Research Findings Are False,” which showed that “for most study designs and settings, it is more likely for a research claim to be false than true” due to multiple factors, including smaller studies, smaller effect sizes, greater flexibility in design/definitions/outcomes/analytical modes, and others. He also highlighted how difficult it is to obtain a rigorous result, especially due to the practice of ‘p-hacking,’ in which variables are tweaked until statistical significance is achieved (discussed in a 2015 post entitled “Science Isn’t Broken”). This was the context in which he described the NIH initiative on rigor and reproducibility, as applied to grant application and review.
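The effect of p-hacking is easy to demonstrate with a small simulation. The sketch below (my own illustration, not code from the session) simulates studies in which there is no true effect at all; an "honest" analyst tests one pre-specified outcome, while a p-hacking analyst tests ten outcomes and reports whichever gives the smallest p-value. The sample sizes, number of outcomes, and the simplified known-variance z-test are all assumptions chosen for brevity.

```python
import math
import random

random.seed(1)

def p_value(a, b):
    """Two-sample z-test p-value, assuming known unit variance (a simplification)."""
    n = len(a)
    diff = sum(a) / n - sum(b) / n
    se = math.sqrt(2.0 / n)  # standard error of the difference in means
    z = diff / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def min_p(n=20, outcomes=10):
    """One simulated study with NO true effect: the analyst measures `outcomes`
    variables and keeps only the smallest p-value -- the essence of p-hacking."""
    return min(
        p_value([random.gauss(0, 1) for _ in range(n)],
                [random.gauss(0, 1) for _ in range(n)])
        for _ in range(outcomes)
    )

studies = 2000
honest = sum(min_p(outcomes=1) < 0.05 for _ in range(studies)) / studies
hacked = sum(min_p(outcomes=10) < 0.05 for _ in range(studies)) / studies
print(f"false-positive rate, 1 outcome tested:   {honest:.2f}")  # ~0.05
print(f"false-positive rate, 10 outcomes tested: {hacked:.2f}")  # ~0.40
```

Even though nothing real is being measured, checking ten outcomes pushes the chance of a "significant" finding from about 5% to roughly 40%, which is why flexibility in outcomes and analyses so reliably produces false claims.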
Reproducibility is also complicated by the fact that, “just because a result is reproducible does not necessarily make it right, and just because it is not reproducible does not necessarily make it wrong. A transparent and rigorous approach, however, can almost always shine a light on issues of reproducibility,” according to a 2014 report. However, as Lauer pointed out, reproducibility issues stem from deep-seated cognitive biases that we have to overcome, including belief in small numbers, neglect of prior findings, non-adherence to basic principles of experimental design/reporting, and others. He suggested that a multi-stakeholder strategy is key. Several other recommendations will likely stem from further evaluation of the NIH’s guidelines; however, as Lauer stated, it is currently too early to make any definitive judgments.
William Kaelin on false positives, robustness and rewards in science
Turning to the question of why there are so many false papers, William Kaelin discussed several studies on this topic, including the 2005 Ioannidis study. These studies suggest that the percentage of false statistical positives in research is extremely high, which is consistent with the low rates of reproducibility Ioannidis described. Kaelin suggested fraud and sloppiness as two possible reasons for the lack of reproducibility. But, as Kaelin argued, the concept is complicated: often data are reproducible but misinterpreted, or reproducible but not robust, or the data may be good but the overall conclusion is not supported by them. He therefore suggested that we focus on the issue of robustness, generating robust data whose conclusions hold up despite perturbations.
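The Ioannidis-style argument behind these high false-positive rates can be made concrete with a little arithmetic. The sketch below is my own illustration; the prior probability, power, and alpha values are assumptions for the example, not figures from the talk. It computes the positive predictive value (PPV): the fraction of "significant" findings that reflect a true effect.

```python
def positive_predictive_value(prior, power, alpha):
    """Fraction of statistically significant findings that are actually true.

    prior -- fraction of tested hypotheses that are true
    power -- probability a true effect reaches significance
    alpha -- false-positive rate for a null effect
    """
    true_pos = prior * power          # true effects correctly detected
    false_pos = (1 - prior) * alpha   # null effects wrongly declared significant
    return true_pos / (true_pos + false_pos)

# If 10% of tested hypotheses are true, with 80% power and alpha = 0.05:
print(round(positive_predictive_value(0.10, 0.80, 0.05), 2))  # 0.64
# The same field with underpowered small studies (20% power):
print(round(positive_predictive_value(0.10, 0.20, 0.05), 2))  # 0.31
```

Under these assumptions, even well-powered studies leave roughly a third of "positive" findings false, and underpowered small studies make false positives the majority, which is exactly the pattern both speakers described.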
He also brought up the idea of a sea change. Under the old wisdom, publishing reproducible but unexpected findings could stimulate important new lines of investigation. Under the new wisdom, including reproducible but unexpected findings in a submitted manuscript is often the kiss of death. This shift highlights how devising new ways to ask a question used to be encouraged, but is not necessarily the case today.
To improve reproducibility, he indicated the need for checklists and author guidelines. More generally, however, he argued that we need to change how we think about science and how we train scientists. This encompasses several points. One is how difficult peer review has become, given the sheer breadth of papers one is asked to review. The second is publication rewards. He pointed out that it would be valuable for researchers to be able to publish observations they have seen over and over but do not yet understand. Unfortunately, this is not currently possible, because rewards are based on p-values rather than on scientific contributions.
Judith Kimble on hypercompetition and training scientists
Judith Kimble pointed out that one way to raise awareness of the issues brought up by Lauer and Kaelin is to read the NIH initiative on rigor and reproducibility while considering why so much irreproducible work gets published. In Kimble’s view, this happens because once a paper is submitted, somebody is saying that it is acceptable to publish it that way – when clearly, even if only 1% of a paper contains irreproducible (or otherwise problematic) material, it should not be published.
As Kimble described, part of the reason for this has to do with cultural norms and standards within our own scientific disciplines, which may also differ between disciplines. Another reason could be publication pressure, since publications are the currency of scientific value, as well as the basis for obtaining tenure and funding. As a result, people publish at all costs, which is very damaging to our community.
Indeed, one of the major problems in science is hypercompetition. According to Kimble, competition can lead to good science, but hypercompetition leads to mainstream science. And because too many scientists are fighting for too few dollars, money wasted on sloppy research is unethical.
Also, while a lot of the current focus is on training junior scientists, we need to train senior researchers as well: they are the ones reviewing the work, and they must be trained to watch for reproducibility issues in the papers they review.
In addition to scientists themselves, journals should also act as gatekeepers of what gets published. Unfortunately, journals often care only about complete stories, the significance of those stories to human health, and their appeal to broad audiences. Publishing only complete stories can lead to inadequate peer review due to their broad scope, and can also cause reviewer exhaustion and therefore lower-quality reviews. This practice may also prolong training periods, delay progress, and encourage “cherry picking” among manuscripts, all of which are detrimental to being an unbiased scientist or reviewer. Finally, journals should adopt rigor and reproducibility as their primary criteria, but they are not doing so because it does not sell magazines.
Concluding thoughts and potential solutions
Audience questions (as well as my own) sparked a few possible solutions for changing the system, which is very difficult given the multiple issues at play. Currently there are two types of scientists – those afraid of being wrong, and those afraid of being second – and our problem is the second group. In addition, hypercompetition is tied to funding, so fixing this problem would also require changing the funding system, according to Judith Kimble.
In addition, there are still perverse incentives in the system. The fact that, for example, promotion is based on high-impact publications rather than on the actual work may lie at the root of the problem. Right now, according to Kimble, if we reward high-impact publications, people will try to take shortcuts. Instead, we need to reward high-quality work and innovative scholarship (i.e., coming up with hypotheses) rather than specific projects. But determining the best reward system will take some work, and we should be running experiments to see what works best. Finally, while true scientific misconduct is likely a very small part of the issue, the scientific system as a whole is highly decentralized, which makes it difficult to assess the impact of specific issues on science in general.
For me personally, this was a very thought-provoking session, and it raised three main questions: 1) If high-impact-factor publications are being rewarded, but these publications are also the ones most often retracted, where is the system headed? 2) Would better statistical training help with the reproducibility issue? 3) Can we come up with other ways to measure reward in science? I posed these questions to Judith Kimble following the session, and she made some great suggestions. She advised using a reproducibility index as an alternative to the impact factor, agreed that better statistical training is needed, and suggested rewarding content while also changing the ways funders recognize these contributions.
Additional ideas from the speakers are also valuable resources for readers. Mike Lauer mentioned more specific resources on small sample sizes leading to false discoveries in neuroscience, and on the need for more transparent reporting of animal studies in preclinical research. Judith Kimble mentioned Brian Nosek and the Center for Open Science, which have published transparency and openness promotion guidelines signed by journals. She also mentioned the reproducibility project on cancer biology as another important initiative.