Critics of scientific reform say that transparency comes at the cost of speed. What can scientific disciplines learn from each other to escape this trade-off?
The 21st century has seen some phenomenal advances in our ability to make scientific discoveries. Scientists have developed new technology to build vaccines swiftly, new algorithms to predict the structure of proteins accurately, new equipment to sequence DNA rapidly, and new engineering solutions to harvest energy efficiently. But in many fields of science, reliable knowledge and progress advance staggeringly slowly. What slows them down? And what can we learn from individual fields of science to pick up the pace across the board – without compromising on quality?
By and large, scientific research is published in journals in the form of papers – static documents that do not update with new data or new methods. Instead of sharing the data and the code that produce their results, most scientists simply publish a textual description of their research in online publications. These publications are usually hidden behind paywalls, making it harder for outsiders to scrutinise and verify their claims.
To spot a discrepancy in the data or an error in the methods, a reader must scrupulously read the intricate details of a study’s methods and cross-check the statistics manually. When scientists don’t openly share the data that produce their results, the task becomes even harder. The process of error correction – from scientists publishing a paper, to readers spotting errors, to having the paper corrected or retracted – can take years, assuming those errors are spotted at all.
When scientists reference previous research, they cite entire papers, not specific results or values from them. And although there is evidence that scientists hold back from citing papers once they have been retracted, the problem is compounded over time – consider, for example, a researcher who cites a study that itself derives its data or assumptions from prior research that has been disputed, corrected or retracted. The longer it takes to sift through the science and identify which results are accurate, the longer it takes to build a reliable picture of scientific knowledge.
What makes the problem even more challenging is that flaws in a study are not necessarily mathematical errors. In many situations, researchers make fairly arbitrary decisions as to how they collect their data, which methods they apply to analyse them, and which results they report – altogether leaving readers blind to the impact of these decisions on the results.
This murkiness can result in what is known as p-hacking: when researchers selectively apply arbitrary methods in order to achieve a particular result. For example, in a study that compares the well-being of overweight people to that of underweight people, researchers may find that certain cut-offs of weight (or certain subgroups in their sample) provide the result they’re looking for, while others don’t. And they may decide to only publish the particular methods that provided that result.
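To make the problem concrete, here is a minimal, hypothetical sketch in Python – using simulated data in which weight and well-being are entirely unrelated – of how trying many arbitrary cut-offs and subgroups can still turn up a ‘significant’ comparison that could be selectively reported:

```python
# Hypothetical illustration of p-hacking. There is no true relationship between
# weight and well-being in these simulated data, yet trying many cut-offs and
# subgroups inflates the chance that at least one comparison looks 'significant'.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 500
weight = rng.normal(75, 15, n)       # simulated body weight (kg)
wellbeing = rng.normal(50, 10, n)    # simulated well-being score, independent of weight
sex = rng.integers(0, 2, n)          # arbitrary subgroup label

subgroups = {"everyone": np.ones(n, dtype=bool), "group A": sex == 0, "group B": sex == 1}
for cutoff in [60, 65, 70, 75, 80, 85, 90]:
    for name, mask in subgroups.items():
        heavier = wellbeing[mask & (weight >= cutoff)]
        lighter = wellbeing[mask & (weight < cutoff)]
        p = ttest_ind(heavier, lighter, equal_var=False).pvalue
        flag = "  <-- tempting to report only this" if p < 0.05 else ""
        print(f"cutoff {cutoff} kg, {name}: p = {p:.3f}{flag}")
```

With 21 comparisons available, the chance that at least one of them crosses the conventional p < 0.05 threshold by luck alone is far higher than five percent – and a reader who only sees the chosen comparison has no way of knowing how many were tried.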
If a reader can’t find out how different versions of a study method have affected the results, how are they to know whether the finding really was robust – that it would be found regardless of how the data was looked at?
Now, it’s common to assume that these errors and flaws are largely smoothed out during the process of peer review, where fellow scientists are asked to scrutinise a study and recommend to journal editors whether it should be revised or even published at all. But unfortunately, peer review often fails to accomplish this, as the scientist Stuart Ritchie has described in his book Science Fictions.
Substandard methods and questionable research practices are widespread even in published work. There is an enormous number of journals, with differing standards of publication, in which scientists can publish their work if it is rejected elsewhere. And many lines of evidence suggest that ‘high impact’ journals (journals which are said to be highly reputable and have a wide reach) accept papers with methods that are similar to, or weaker than, those accepted by low-impact journals.
Peer review instead functions predominantly as a practice of gatekeeping, as the philosophers of science Liam Bright and Remco Heesen have explained. Reviewers provide scientists with comments that are rarely made public. Whether research is published in a particular journal largely depends on the idiosyncratic decisions of reviewers who are available at the time and who are liked by the journal’s editors.
Peer review filters which research is published, without transparent criteria that would let others understand the reasoning behind these choices. The lack of transparency lends itself to misconduct – reviewers may discourage the publication of research that challenges the prevailing consensus of the time or that fails to replicate it – and to hype, encouraging the publication of results that seem exciting over those that provide nuance.
These decisions have consequences, because journals are the dominant platform for sharing scientific research. When non-scientists, who want to apply scientific knowledge elsewhere, read research, they have little way of knowing which methods are sound and which results are reproducible or replicable. Other scientists can face the same problems, finding it difficult to assess the state of knowledge in a field, particularly if they have expertise in a different subject. This is troubling: the longer it takes to sift through the research to find out which results are reliable, the longer it takes to apply knowledge to solve new problems.
All of this paints a bleak picture of the state of scientific knowledge, and it raises a question: is all of science like this?
Fortunately, the answer is no. But unfortunately, many good practices are fairly modern and limited to specific domains. Here are some of them.
The remarkable breakthrough by DeepMind in 2020 – in accurately predicting the folded structure of proteins using only their amino acid sequences – was made possible because DeepMind’s AlphaFold algorithm was trained on data from the Protein Data Bank, a vast public database that dates back to 1971. Molecular scientists have been carefully applying difficult laboratory techniques to establish the structure of proteins and have manually submitted information about them to this database for decades. In parallel, data managers have been reviewing each of their submissions for accuracy. The platform now harbours over 150,000 curated entries of protein structures.
With databases like this, the benefits to other scientists are enormous: specifically, understanding the structure of proteins can help scientists design precise drugs that fit into them, as has been done for the spike protein of the coronavirus. It can also help them understand why resistance to a drug has developed, such as in HIV research. But more broadly, once protein structures have been shared publicly, individual researchers who want to use that knowledge for other purposes no longer have to rediscover those structures over and over again.
The practice of sharing data is also the norm in the neighbouring field of genomics. The tradition largely stems from the Human Genome Project, which began in 1990 and aimed to determine the entire code of the human genome. During the project, a group of scientists developed protocols called the Bermuda Principles, which recommended that labs working on the project should upload the details of any new genetic sequences that they discovered onto public data repositories on a daily basis.
Though the project was completed in 2003, the movement paved the way for data sharing principles in many genetic projects to come, such as the HapMap project, the 1000 Genomes Project, dbGaP and, more recently, GISAID and Nextstrain – two platforms which publicly share genomes of SARS-CoV-2, the virus that causes Covid-19. As with protein databases, these open sharing practices are remarkably important in genomics, in particular because they allow scientists to investigate a large number of genes and mutations at the same time, each detailed with a large amount of information that other scientists have collected.
These databases are immensely useful in sharing data for other researchers to analyse, but different tools are necessary for researchers to share the code used to analyse data – for example, if they want to test specific hypotheses. Some researchers voluntarily share their analysis code on GitHub and OSF, two platforms where people can upload their data, code and research plans and update them when changes are made. But the practice of sharing code is not widely followed by scientists, even within these fields.
In computer science, however, there are examples of journals that work with version control tools like Git. Take JOSS, the Journal of Open Source Software, as an illustration. JOSS is built on top of GitHub: software developers submit their software, together with papers describing it, to the journal via GitHub, and the journal’s reviewers comment on their code and suggest revisions to their work publicly. Both the reviewers’ comments and past versions of the software are therefore transparent and accessible to the public.
Apart from these examples of sharing data and code openly, some fields already routinely try to tackle p-hacking and the other questionable research practices described above. In economics, researchers typically perform ‘robustness checks’ to demonstrate that their findings hold up under different methods of analysing the data (a comprehensive version of this is often described as multiverse analysis in psychology).
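As a sketch of what such a robustness or multiverse check might look like in code, the hypothetical Python snippet below reruns one simple comparison under every combination of a few analytic choices and reports the full spread of estimates rather than a single hand-picked one (the variables, cut-offs and sample are invented for illustration):

```python
# A minimal multiverse-style robustness check: run the same analysis under every
# combination of a few defensible analytic choices and report all of the results.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 400
weight = rng.normal(75, 15, n)
age = rng.integers(18, 90, n)
wellbeing = rng.normal(50, 10, n) + 0.05 * (weight - 75)   # small assumed effect

cutoffs = [70, 75, 80]                                      # heavier vs lighter split
trims = {"no trim": (0, 1000), "trim outliers": (40, 120)}  # plausible-weight filters
samples = {"everyone": np.ones(n, dtype=bool), "under-60s only": age < 60}

estimates = []
for cutoff, (trim_name, (low, high)), (sample_name, in_sample) in itertools.product(
        cutoffs, trims.items(), samples.items()):
    keep = in_sample & (weight >= low) & (weight <= high)
    diff = (wellbeing[keep & (weight >= cutoff)].mean()
            - wellbeing[keep & (weight < cutoff)].mean())
    estimates.append(diff)
    print(f"cutoff={cutoff}, {trim_name}, {sample_name}: difference = {diff:.2f}")

print(f"\n{len(estimates)} specifications; estimates range from "
      f"{min(estimates):.2f} to {max(estimates):.2f}")
```

Presenting all twelve estimates side by side makes it much harder to quietly report only the most flattering one.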
There are also other approaches that scientists have taken to prevent manipulation. From the year 2000 onwards, scientists conducting clinical trials have been required to declare how they will analyse their data before they even collect it. This way, outsiders can verify whether or not the researchers have stuck to their original plans. Since the requirement was put in place, clinical trials have become far more likely to report that the drugs they tested showed no benefit.
And what about peer review? As the mathematician Timothy Gowers has explained, in fields such as mathematics and physics, academics routinely post their research on publicly accessible preprint servers, where the wider community of scientists comments on their work and suggests revisions openly, before a final draft is sent to a journal. Publication in a journal is then viewed as little more than a formality in the research process.
Altogether, these examples are largely restricted to specific disciplines: while research in genomics has pioneered the use of massive open databases, it rarely involves robustness checks or the pre-registration of methods. And while the methods of clinical trials are required to be pre-registered, their analysis code is rarely shared publicly.
We believe that good practices from individual fields should serve as models for how science ought to work across the board, and that the scientific process should be radically reassembled from start to finish. How would it work?
To begin with, scientists would spend far more time clarifying the theories they are studying – developing appropriate measures to record data, and testing the assumptions of their research – as the meta-scientist Anne Scheel and others have suggested. Scientists would use programs such as DeclareDesign to simulate data and to test and refine their methods.
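DeclareDesign itself is an R package, so the snippet below is only a rough Python sketch of the underlying idea, with invented parameters: simulate the planned study many times under an assumed effect size, run the planned analysis on each simulated dataset, and check in advance whether the design has adequate statistical power.

```python
# Rough sketch of design simulation: before any real data are collected, simulate
# the planned study repeatedly under assumed parameters and estimate how often
# the planned analysis would detect the effect (its statistical power).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

def simulate_once(n_per_group=100, effect=0.3, sd=1.0):
    """One simulated run of the planned study and its planned analysis."""
    control = rng.normal(0.0, sd, n_per_group)
    treated = rng.normal(effect, sd, n_per_group)
    return ttest_ind(treated, control).pvalue

n_sims = 2000
pvalues = np.array([simulate_once() for _ in range(n_sims)])
power = (pvalues < 0.05).mean()
print(f"Estimated power with 100 per group and an assumed effect of 0.3 SD: {power:.2f}")
# If this is too low, the sample size, measures or analysis plan can be
# revised now, before the study is run for real.
```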
Instead of writing research in the form of static documents, scientists would record their work in the form of interactive online documents (such as in Markdown format). Past versions of their writing and analysis would be fully available to view through platforms such as GitHub and OSF. Robustness checks and multiverse analysis would be the norm, showing readers the impact of various methodological decisions interactively.
Once research is freed from the need to exist in static form, it can be treated as if it were a live software product. Analysis code would be standardised in format and regularly tested by code-checkers, and data would be stored in machine-readable formats, enabling others to quickly replicate research, apply the same methods in other contexts, or apply new methods to old data with ease.
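As a small, hypothetical sketch of what treating analysis code like software could mean in practice (the file layout, column names and functions below are invented), an analysis step might be written as an ordinary function over machine-readable data, with a test that a code-checker or automated pipeline can run:

```python
# Sketch: analysis code written as ordinary, testable functions over
# machine-readable data, so code-checkers can verify it automatically.
# File and function names here are invented for illustration.
import csv
from statistics import mean

def load_outcomes(path):
    """Read a tidy, machine-readable CSV with columns: group, outcome."""
    with open(path, newline="") as f:
        return [(row["group"], float(row["outcome"])) for row in csv.DictReader(f)]

def group_difference(rows):
    """Planned analysis: difference in mean outcome between the two groups."""
    treated = [y for g, y in rows if g == "treated"]
    control = [y for g, y in rows if g == "control"]
    return mean(treated) - mean(control)

def test_group_difference():
    """A code-checker (or CI pipeline) can run this automatically, e.g. with pytest."""
    rows = [("treated", 2.0), ("treated", 4.0), ("control", 1.0), ("control", 1.0)]
    assert group_difference(rows) == 2.0
```

Because the analysis is just a function over a standard data format, replicating it on new data, or rerunning it when the data are updated, becomes a matter of calling the same code again.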
Some types of results would be stored in mass public databases, with entries that would be updated when needed, and other researchers would reuse those results in further analyses. Citations would be viewer-friendly, appearing as pop-ups that highlight passages or refer to specific figures or datasets in prior research (each with its own DOI), and these citations would be automatically checked for corrections and retractions.
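Some of the plumbing for automatic retraction checks already exists. As one hedged illustration, the sketch below asks the public Crossref REST API whether any update notices (corrections, errata or retractions) point at a cited DOI; it assumes Crossref’s documented ‘updates’ filter and ‘update-to’ metadata field behave as described in its documentation, and the DOI shown is a placeholder:

```python
# Sketch: ask the Crossref REST API whether any registered update notices
# (corrections, errata, retractions) point at a given cited DOI.
# Assumes Crossref's documented 'updates' filter and 'update-to' field;
# the DOI below is a placeholder, not a real citation.
import requests

def updates_for(doi: str):
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"filter": f"updates:{doi}", "rows": 20},
        timeout=30,
    )
    resp.raise_for_status()
    notices = []
    for item in resp.json()["message"]["items"]:
        for update in item.get("update-to", []):
            notices.append((update.get("type", "unknown"), item.get("DOI")))
    return notices

if __name__ == "__main__":
    cited_doi = "10.1234/placeholder"  # hypothetical DOI for illustration
    for update_type, notice_doi in updates_for(cited_doi):
        print(f"Cited work has an update of type '{update_type}' (notice: {notice_doi})")
```

A citation system of the kind described above would run checks like this continuously and surface the result next to the citation itself.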
Peer review would operate openly, where the wider scientific community and professional reviewers would comment on working papers, and Red Teams would be contracted to challenge research. Comments and criticisms of studies would be displayed alongside them on platforms such as PubPeer. Journals would largely be limited to aggregating papers and disseminating them in different formats (for researchers, laypeople and policymakers); they would act, perhaps, in parallel with organisers of conferences. They could perform essential functions such as code-checking and copy-editing, working through platforms such as GitHub.
We are already beginning to see these pieces coming together, with some specialist research labs, consultancies and think tanks producing transparent, high-quality research. But the effort needs to be expanded much more widely – and for many researchers, these will sound like colossal changes to the status quo.
At the moment, good scientific practices are still largely limited to a small number of scientists on the cutting edge: individuals who possess the time, curiosity and willingness to voluntarily learn skills that allow them to be transparent with their research. And there is sparse institutional support for such skills and little professional incentive to commit to them.
Although young scientists are becoming increasingly well versed in some of these tools, we cannot expect every researcher to excel at all, or even most, of these skills voluntarily.
Science needs to step up. How can we accelerate the process?
The answer, we believe, lies in the opening line of the 1776 book An Inquiry into the Nature and Causes of the Wealth of Nations:
The greatest improvement in the productive powers of labour, and the greater part of the skill, dexterity, and judgment with which it is anywhere directed, or applied, seem to have been the effects of the division of labour.
Adam Smith provided three key insights into why the division of labour raises a worker’s productivity. Firstly, skills are learnt – it takes scarce time to acquire and develop them, not to mention the time it takes to keep those skills up to date with the latest developments. Secondly, people who become specialists start to recognise patterns in their jobs, and they develop tools and inventions to automate parts of their work. Thirdly, the division of labour expands the market, because firms begin to acquire the products and services that make up the production chain from outsiders.
By the same token, we envision research as a production output, collectively produced by teams of people who specialise in different aspects of a transparent research process, much as cutting-edge software is already engineered. Scientists would not be all-rounders; they would be a team. Scientists working on a project would be a team of theorists, epidemiologists, statisticians, programmers, Red Team reviewers, code-checkers, managers, writers, copy-editors and communicators. And as databases are created and research is contracted and disseminated openly, an ever-growing number of scientists would become contributors to an expanding universe of scientific knowledge.
There would be benefits to scientific integrity too: if peer review were a speciality, there would be less pressure on reviewers to write favourable reviews for colleagues, because they would have greater independence from the circles in which the researchers they review work.
Many critics of open science contend that radical reform asks too much of researchers, but this kind of specialisation would fill two needs with one deed: it would reduce the burden of work placed on each individual researcher while increasing the quality and quantity of science conducted by researchers overall.
It’s for all these reasons that we believe the most important way to accelerate science will be to drive the division of labour.
New research organisations, both within academic institutions and outside them, could be structured with these ideas in mind – serving as demonstrations of how this alternative approach to conducting research would work in practice. And they could seed the adoption of these principles in labs more widely.
We could also entrench the division of labour into science through norms and institutional policy, with ideas such as the funding of specialist work like Red Teams who would critique research, with academic policies that help labs allocate work and contract out parts of their research process, and with large institutional efforts to organise open data sharing projects, as with the Human Genome Project.
We’ve seen how particular disciplines have built databases, practices, and platforms that have improved the state of their science. We should learn from their successes and failures and apply these lessons more broadly.
Given the life-saving progress we’ve seen this year in some rapidly advancing fields of science, it isn’t enough to simply imagine what’s left to discover elsewhere. It’s time to turn our ambitions into reality.