Bad incentives, muddled theory and no practical use. The condition of the social sciences has been blamed on a great variety of things; what’s really at fault and how do we know?
At the end of 2020, National Geographic, New Scientist, and Business Insider each looked back at the top scientific breakthroughs of the 2010s. The decade saw the detection of gravitational waves and the Higgs boson; the advent of CRISPR gene editing; the first edible lab-grown hamburgers; the achievement of quantum supremacy; and AlphaGo besting the world’s top Go player. Noticeably absent from the lists, however, were breakthroughs from psychology, sociology, economics, and political science.
For the social sciences, the big story of the 2010s was instead the replication crisis. Phenomena we thought to be universal and robust have turned out to be transitory, culturally specific, or simply non-existent. In many ways, what we learned in the 2010s was how much we did not actually know.
Why? What accounts for the vast gap between rapid advances in the hard sciences and slow progress in the social sciences?
A popular explanation for the replication crisis in the social sciences is the system of incentives that operates in science: scientists advance their careers primarily by publishing articles in peer-reviewed journals. Journals want to publish surprising, novel results, which incentivizes scientists to adopt research methods that reliably generate these kinds of results rather than methods that produce replicable research. Since scientists who play this game advance further in their careers, they come to serve as the field’s peer reviewers, grant reviewers, and doctoral advisors more than those who don’t, entrenching their weak methods. Meanwhile, since journals tend to be interested in novel work rather than replications, few replications are performed and the fragility of published work remains largely hidden. Computer simulations show that the result of these kinds of incentives is a literature riddled with non-replicable research.
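To make that dynamic concrete, here is a minimal toy simulation in Python. It is not one of the published models; the parameters, the selection rule, and the idea of summarizing a lab’s methods by a single false-positive rate are all illustrative assumptions. Labs that tolerate more false positives publish more, and because only positive results get published and the most-published labs’ methods are copied, sloppy methods spread.

```python
import random

# A minimal sketch (not any of the published models) of the incentive dynamic
# described above. All parameters are illustrative assumptions.

random.seed(0)

N_LABS = 100        # number of labs
GENERATIONS = 50    # rounds of "selection" on methods
TESTS_PER_GEN = 50  # hypotheses each lab tests per round
BASE_RATE = 0.01    # share of tested hypotheses that are actually true
POWER = 0.8         # chance a true effect is detected

# Each lab is summarized by a single number: its false-positive rate.
# Sloppier methods -> more false positives -> more publishable "findings".
labs = [random.uniform(0.01, 0.10) for _ in range(N_LABS)]
print(f"initial mean false-positive rate: {sum(labs) / len(labs):.3f}")

for _ in range(GENERATIONS):
    # Count each lab's publications; only positive results get published.
    pubs = []
    for fp_rate in labs:
        published = 0
        for _ in range(TESTS_PER_GEN):
            is_true = random.random() < BASE_RATE
            if random.random() < (POWER if is_true else fp_rate):
                published += 1
        pubs.append(published)
    # The most-published half of labs train the next generation,
    # passing on their methods with a little random drift.
    ranked = sorted(range(N_LABS), key=lambda i: pubs[i], reverse=True)
    survivors = [labs[i] for i in ranked[: N_LABS // 2]]
    labs = [min(1.0, max(0.0, fp + random.gauss(0, 0.005)))
            for fp in survivors for _ in range(2)]

print(f"final mean false-positive rate:   {sum(labs) / len(labs):.3f}")
```

In this toy version the average false-positive rate drifts upward over the generations, even though no individual lab ever chooses to get sloppier: the selection pressure does all the work.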
But this can’t be the whole explanation, because this system of incentives is not unique to the social sciences. We know the scientific incentive system can produce great results simply because it has, as the breakthroughs listed above indicate. More generally, we know much more than we did 50 years ago about how our world works, and most of that increased knowledge is the fruit of scientific inquiry operating under an incentive system not too different from the one in place today.
So why is this system largely working to produce useful knowledge in some sciences, but not working so well for the social sciences?
Part of the answer is that the priority system in science – which awards the bulk of the prestige to whoever makes a discovery first – has not traditionally incentivized replication. This is not an insurmountable problem, but it means a field must find ways to validate its findings that do not depend on academic scientists performing replications. Two factors can help make that possible: a unified theoretical framework, and a community of outside “users” of its discoveries. Both are absent to a greater degree in the social sciences than in the hard ones.
Let’s look at theory first. Compared to the hard sciences, the social sciences are far more likely to lack unified theoretical frameworks. Instead, many of the social sciences feature an enormous diversity of theories, some of them incompatible with each other (a problem Duncan Watts calls the incoherency problem in social science). In many other cases, no explicit theory is used to guide experimental work at all, with researchers instead relying on personal intuition. For example, a recent review of all articles published in Psychological Science over the last ten years found that only 24% mentioned any specific theory, and that within this subset 359 different theories were mentioned (most only once).
The presence (or absence) of a unified theoretical framework can reduce the need to perform replications in two ways.
First, theory can help solve problems related to the base rate at which hypotheses are true. To illustrate the idea, imagine we’re going to investigate 1000 different hypotheses with lab experiments. Suppose that without theory to guide us, we are just grasping in the dark, and only 1 in 1000 hypotheses will turn out to be true (i.e., the base rate for a hypothesis to be true is 0.1%). Now suppose we are using statistical methods that will never fail to detect a true hypothesis, but also generate false positives 5% of the time. Out of our 1,000 tested hypotheses, we get 51 positive results. One of these is the actual “true” hypothesis. The other 50 are false positives (5% of 999). When we publish all this work, only 1 in 51 published articles will be true; less than 2%.
If we try to replicate all these results using the same procedures, we will once again detect the true hypothesis, but we will also “successfully” replicate 2.5 of the false positives (5% of 50). On average, under a third of replicated results are actually true!
Suppose, on the other hand, that we have theory to guide us. Theory directs scientists towards testing hypotheses more likely to be true. Suppose that using theory increases the share of tested hypotheses that are true to 5%. Out of our 1,000 tested hypotheses we get 98 positive results: 50 are actually true, and 48 (5% of the remaining 950) are false positives. Now when we publish our results, more than half are true. And if we try to replicate everything, all 50 true results replicate, and about 2.5 false positives replicate, so that the vast majority of replicated results are true (50 out of 52.5). While it’s not so simple in real life, this little example shows how having a good theory to guide research can significantly mitigate the need to replicate work to have confidence in it. The existence of these kinds of robust theories in the hard sciences may be one reason they have advanced faster than the social sciences, even in the absence of widespread replication.
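The arithmetic above is easy to check, or to vary, in a few lines of Python. The function below is a hypothetical helper, not something from the studies cited here, and it adopts the same assumptions as the example: perfect power and a 5% false-positive rate.

```python
# Share of true results among published positives and among "successful"
# replications, under the assumptions of the example above.

def literature(n_hypotheses, base_rate, fp_rate=0.05, power=1.0):
    n_true = n_hypotheses * base_rate
    n_false = n_hypotheses - n_true
    true_pos = n_true * power        # true hypotheses that test positive
    false_pos = n_false * fp_rate    # false hypotheses that test positive
    published_true = true_pos / (true_pos + false_pos)
    # Re-run every positive result with the same procedure.
    rep_true = true_pos * power
    rep_false = false_pos * fp_rate
    replicated_true = rep_true / (rep_true + rep_false)
    return published_true, replicated_true

for base_rate in (0.001, 0.05):   # no theory vs. theory-guided hypotheses
    pub, rep = literature(1000, base_rate)
    print(f"base rate {base_rate:.1%}: {pub:.0%} of published results true, "
          f"{rep:.0%} of replicated results true")
```

Running it reproduces the numbers in the example: roughly 2% of published results and 29% of replicated results are true without theory, versus roughly 51% and 95% with it.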
That said, while a good theory may improve the chances that a scientific finding will replicate, it’s easy to imagine that a bad theory could be worse than no theory at all. If bad theories lead you to choose hypotheses that are even less likely to be true than chance would produce, they exacerbate the problem of most research being non-replicable. This is where a unified theoretical framework helps.
A unified theory allows scientists to use research other than direct replications to serve as a check on their work. Without theory, only a direct replication can create confirming or disconfirming evidence for a finding. But with theory, a wide variety of different research projects can lend credence or cast doubt on a finding by supporting or not supporting the theory it’s based on. A good theoretical framework knits together disparate findings into mutually confirming evidence or points to contradictions that warrant new investigation. Just as a bad theory is less likely to produce replicable research, a bad theoretical framework is less likely to find confirmatory evidence across different research projects, and hopefully gets abandoned (if not by this generation, then by the next).
We can see some evidence for this argument by looking across fields. Within the hard sciences, biology is less apt to be characterized by a unified theoretical framework than, say, physics and chemistry. And indeed, by some estimates, less than 50% of preclinical research is replicable. Within the social sciences, economics is a bit of an outlier in that it has a dominant theoretical framework, in which agents act to optimally achieve mathematically stated goals (typically profit or utility) under various constraints. Whereas only 15% of articles published in Psychological Science over 2009-2019 claimed to be testing a specific prediction of a psychological theory, over 60% of empirical microeconomics papers in three top journals claimed to be grounded in theory. Among the social sciences, economics also seems to have a better track record on replication: while slightly more than one third of 100 psychology experiments were successfully replicated by the Reproducibility Project, two thirds of 18 economics experiments were. Moreover, in a prediction market on the replicability of social science papers, economics was the top-rated field.
If there is too little theory in the social sciences, it’s tempting to conclude there must be excessive attention to practical questions, since we normally think of theory and practice as lying at opposite ends of a spectrum. But this isn’t what the advocates for more theory in the social sciences claim. Their argument is that, instead of a well-developed unified theory, research is guided by many competing, incompatible, poorly specified theories and by personal intuition. In fact, rather than excessive attention to practical questions, it may be the relative lack of an outside community of practitioners adapting academic discoveries that contributes to the replication crisis in the social sciences.
The unusually large chasm between the social sciences and their practical applications has been noted before. Focusing specifically on social psychology, Berkman and Wilson (2021) grade 360 articles published in the field’s top-cited journal over a five-year period on various criteria of practical significance, generally finding quite low levels of “practicality” in the published research. For example, their average grade for the extent to which published papers offered actionable steps to address a specific problem was just 0.9 out of 4. They also look at the publication criteria of ten top journals; while all of them highlight the importance of original work that contributes to scientific progress, only two ask for even a brief statement of the public significance of the work.
While it is possible to think of professions that apply the insights of the social sciences (consultants of various stripes and polling firms, for instance), it’s harder to think of examples that involve any kind of back-and-forth interchange with fundamental research. In contrast, sociologist Duncan Watts points out how easy it is to think of research in physics, medicine, and engineering that sits at the intersection of fundamental and applied research: the Manhattan Project, DARPA’s driverless car challenges, cancer research, the Netflix Prize. To that list, we could add several of the breakthroughs of the 2010s highlighted at the top of this article.
A community of practitioners can speed the progress of science in several ways.
Whereas the priority system in science has not traditionally incentivized replication, private companies do require robust replicable results to invent new technologies, and will sometimes invest resources into performing the replication work that science does not. This is most clearly visible in the private pharmaceutical sector, which relies heavily on academic research conducted by universities and foundations. As noted previously, it may be that a lack of unified theory contributes to less replicable results in biology. But it remains true that promising preclinical research is “replicated” at great cost in the form of clinical trials that (hopefully) result in new approved therapies.
More broadly, some 10% of science and engineering research articles (published over 1945-2013) are directly cited by patents, but 80% are “linked” to patents via a chain of citations (i.e., cited by a paper that is cited by a paper and so on, until cited by a patent). Even when the private sector does not directly replicate a scientific study, attempts to translate the ideas developed in academia into a practical application provide information about where theories come up short, or do not work in the ways anticipated. It can also force a field to reconcile competing theories of the same phenomenon. Indeed, as has been argued in this magazine, history is stuffed with examples where back-and-forth interactions of science and technology contributed to significant advances in both directions.
Finally, a community of “end-users” of new knowledge can also contribute to the health of a scientific field by providing a perspective capable of recognizing and championing methods and ideas that lie outside the field’s dominant paradigm. For example, high-impact and highly novel academic papers receive a higher share of their citations from outside their own discipline. Outside of academia, the field of economics has a community of practitioners who use its work in the policy world, and policymakers’ preference for experimental and quasi-experimental methods likely contributed to the turn towards more credible methods in economics as a field. But on the whole there seems to be less engagement between the private sector and the social sciences than between the private sector and the hard sciences. While private business provided 2.6% and 6.8% of all funding for academic research in the physical and life sciences respectively, it provided just 1.4% of all funding for the social sciences and psychology combined.
This is not to say that private sector research is perfect either; it has its own litany of problems. But to the extent these problems differ from those in academia, interaction with these outside communities can be a useful check on a field.
It’s easy to imagine how these factors compound one another. Poor research practices mean research findings will commonly turn out to be false positives; this means assessing the veracity of a paper’s claims requires either replications or comparison with the results of other papers. But if theories are fragmented, it becomes difficult for papers to shed much light on other research. Direct replications can help, but the scientific incentive system is not well placed to encourage them. If the work were useful, a non-academic group might perform replications instead, or at least provide evidence about which discoveries were robust enough to build practical solutions on; but given the disconnect from practical use, this isn’t the case for much of the field. Moreover, the disconnect from non-academic groups that use knowledge makes it harder to muster support for potentially better approaches that challenge the status quo. And the chasm between theory and practice means there is little pressure to pit theories against one another, selecting some on the basis of their utility and discarding others.
But better days may be coming. These problems are recognized – indeed, this article is itself based on social science scholarship about how well the scientific system functions or fails to. The most immediate response to the replication crisis has been greater emphasis on statistical and methodological procedures that reduce the likelihood of non-replicability; preregistered reports, for example, which help prevent scientists from changing their analysis plans or hypotheses to fit the data, have become more common, and replication itself is now far more frequent. The fact that these reforms are finding support indicates the social sciences are not powerless to reform themselves. Calls to correct the deeper issues highlighted in this essay, such as paying more attention to building theory, are also underway.
In fact, a new paper in PNAS heralds the dawn of the Golden Age of Social Science. The authors point to a number of encouraging trends that begin to strike at the root of the slower progress in the social sciences. Interdisciplinarity across the social sciences is on the rise – as noted, this can help fields break out of bad equilibria and diffuse new methods. The authors also point to a series of recent case studies in which interdisciplinary teams are tackling serious practical questions that also bear on more fundamental social science questions: how can mobile banking help low-income countries respond to economic shocks; when will bystanders intervene in a conflict; how do social networks affect exercise habits?
Finally, social science data and the resources to analyze them have become abundant. Larger samples enable more precise estimates and, combined with calls for better theory, may accelerate the selection of good theories from bad. Just as importantly, private businesses with data on their consumers now regularly hire social scientists to analyze that data. Is this the beginning of the kind of community of practitioners that provides feedback and implicit replication of the ideas discovered in academia? I hope so.