The elements of scientific style

How many scientific papers have you read in full over the past year? Presumably, none or very few, unless you belong to a small group of specific professionals, like researchers or science communicators. Even if you do read papers as part of your job, you probably haven’t read many of them beyond the title and abstract, especially if they lie outside your field of expertise.

There are several reasons for this, including the difficulty of accessing papers that lie behind the paywalls of large publishing companies and the extreme specialization of most academic articles, which limits their appeal. Also, scientific topics are often complex, and it’s generally not worth spending time and energy to understand a technical argument unless it’s particularly germane to your work.

But compounding these issues is a simple truth that has made science less impactful than it could otherwise be: Scientific papers are poorly written. And they’ve been getting worse.

Anyone who researches scientific style is bound to, sooner or later, cite Robert Boyle and Thomas Sprat. Right around the time of the Scientific Revolution and the establishment of the English Royal Society, in the 1660s, these two men defended opposing views on how to write science.

Boyle, now known as a founder of modern chemistry, argued in favor of making things interesting: A philosopher’s (or, as we would now say, a scientist’s) style ‘should delight its reader with his floridnesse’ and ‘disgust not its reader by its flatness’. A few years later, in a history of the Royal Society that is perhaps more accurately described as a manifesto, the clergyman and scientist Sprat disagreed, fighting fiercely against the use of tropes, metaphors, and ‘the volubility of tongue’.

It would be one thing if either vision had prevailed. But today, most scientific papers seem to reject both. Boyle would be appalled by the now ubiquitous terse, bland, supposedly objective language. Meanwhile, Sprat would certainly not agree that we have achieved his ideal of ‘primitive purity, and shortness, when men delivered so many things, almost in an equal number of words’.

Academic papers today are filled with jargon and abbreviations. Subject to the contradictory requirements of presenting impactful work and respecting arbitrary word limits, they cram a lot of information in very little space, leaving no room for interesting style or concrete examples. Unrelated ideas are strung together in wall-of-text paragraphs, providing no guidance to readers, who must then spend their cognitive resources figuring out the structure rather than absorbing the contents. Citations, though necessary, cause unending distraction with a profusion of parenthetical names and years in the middle of long, meandering sentences.

All of this adds to the inherent complexity of a paper’s topic. And so reading a paper in full becomes a chore – something that you do only if strictly necessary.

There is evidence that the situation used to be better. For example, in 2017, Pontus Plavén-Sigray and colleagues analyzed hundreds of thousands of paper abstracts from 1881 to 2015, concluding that scientific writing has become steadily less readable over that period. They computed simple readability measures that aggregate the average number of words per sentence and of syllables per word – the lower, the more readable. Their results are a sign that scientific writing has been drifting away from simple, everyday language. As a related opinion piece puts it, ‘scientific texts are more impenetrable than they were over a century ago’.
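To make the idea of such a measure concrete, here is a minimal sketch of one widely used formula of this kind, the Flesch-Kincaid grade level, which combines words per sentence and syllables per word so that a lower score means easier text. It is offered only as an illustration, not necessarily the measure used in the study above. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup.

```python
import re


def count_syllables(word: str) -> int:
    """Crude estimate: count runs of consecutive vowels (y included), minimum 1."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def grade_level(text: str) -> float:
    """Flesch-Kincaid grade level: lower scores indicate more readable text."""
    # Treat any run of ., ! or ? as a sentence boundary (an approximation).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally joined by hyphens or apostrophes.
    words = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    plain = "The cat sat on the mat. The dog ran to the park."
    dense = (
        "Readability quantification methodologies systematically aggregate "
        "sentence-level and syllable-level characteristics of scientific abstracts."
    )
    print(f"plain: {grade_level(plain):.2f}")
    print(f"dense: {grade_level(dense):.2f}")
```

Short sentences made of short words score low; a single long sentence packed with polysyllabic terms scores much higher. Scores like this capture only surface features of a text, which is why they say nothing about whether the writing is pleasant to read.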

In parallel, acronyms have become more common over the past 70 years. Acronyms and abbreviations are a perennial temptation of technical writers, since they greatly help with minimizing word count and packing complicated ideas into reusable handles. But this comes at a cost, which is borne by readers. When an abbreviation is unfamiliar, readers must do some cognitive work to unpack it. This is true even if the abbreviation is defined at the beginning of the paper, as common practice demands. Humans do not read like computers: Declaring a ‘variable’ once does not guarantee that people will be able to remember it without effort when it comes up later. So, while acronyms and abbreviations do have their uses, their increased frequency over time is rather bad news.

[Figure: Data showing the decrease in readability and increase in acronyms in scientific papers over time. Top chart from Plavén-Sigray et al., 2017. Bottom chart from Barnett and Doubleday, 2020.]

Another culprit behind poor style has been on the rise: the use of jargon. Jargon consists of rare technical words, and it serves a purpose similar to abbreviations: It allows researchers to quickly specify what they mean when they speak to an audience of their peers. But, again like abbreviations, specialized words can quickly turn a paper into a tough read, even for people who should have sufficient context. It is all too easy to overestimate how familiar others are with the vocabulary we use. So, although the specialization of science makes some amount of jargon more necessary now than in the past, it seems that we have gone too far. Ironically, even in a journal called Public Understanding of Science, jargon has been on the rise over the past few decades. Rare words are becoming less rare – and most of them, according to the paper, appear in only one article, rather than being either general academic vocabulary or disciplinary terminology.

Most studies of scientific style look at what is easily computable: readability scores or the presence of certain text strings. It is more difficult to assess whether the subjective, aesthetic aspects of style – like evocativeness, elegance, or even just pleasure – have gotten worse over time. But perhaps we don’t need to: Pick any old paper from the nineteenth or early twentieth century, and chances are that it is going to be a far more pleasant read than almost any current paper. For instance, consider John Snow’s On the Mode of Communication of Cholera, published in 1855:

[Cholera] travels along the great tracks of human intercourse, never going faster than people travel, and generally much more slowly. In extending to a fresh island or continent, it always appears first at a sea-port. It never attacks the crews of ships going from a country free from cholera, to one where the disease is prevailing, till they have entered a port, or had intercourse with the shore. Its exact progress from town to town cannot always be traced; but it has never appeared except where there has been ample opportunity for it to be conveyed by human intercourse.

Compare with a recent cholera paper (from 2020), whose section on the transmission of the disease includes sentences such as:

Cholera is spread through the fecal-oral route, either directly from person-to-person or indirectly through contaminated fluids from an environmental reservoir of varying duration, food and potentially flies and fomites.

Not the worst offender by any stretch – it’s a relatively well-written paper – and yet a certain aesthetic seems to have been lost. As a recent essay by the pseudonymous Roger’s Bacon points out, citing science journalist Roberta Kwok, scientists used to be much more playful in their academic writing: they used italics and exclamation marks, they wrote ‘charming descriptions’, and they didn’t hesitate to tell their own research in the form of a story, narrating moments of both confusion and excitement.

Why did readability and style get worse? We don’t know for sure.

It seems plausible that the current system of academic credit, in which papers are used to measure the quality of a researcher’s work, is a factor. Since the 1970s, the prestige associated with publishing in certain journals or conferences has been far more important for career advancement (in terms of employment, tenure, and grants) than whether anyone reads the results. The supremacy of pre-publication peer review may also contribute by encouraging scientists to write in a ‘safe’ way that will pass the many filters between their work and official recognition. There are other possibilities, such as the fact that ideas and methods are not protected by intellectual property laws, making indecipherable language an attractive way to thwart would-be competitors without giving up on credit.

In other words, scientists contend with multiple incentives to ignore or even avoid good communication. In time, those incentives crystallized into norms. And norms are hard to defect from. Scientific papers today are written a certain way, and anyone not writing that way risks making life more difficult for themselves.

The aforementioned Roger’s Bacon essay argues that Sprat’s vision won over Boyle’s: We have forgone aesthetic style in order to emphasize clarity. I only partly agree. It may be true that clarity – or, perhaps more accurately, objectivity – is the stated goal. But if so, it is hardly being achieved by the scientific papers of today, thanks to their heavy use of jargon, abbreviations, and long sentences and paragraphs. It almost seems that, to paraphrase Benjamin Franklin, we have sacrificed aesthetics to gain a little clarity, deserved neither, and lost both.

If scientific writing was better in the past, this suggests that we can, through some effort, improve on the status quo. But would it be worth making that effort?

At the individual level, it can be. Several researchers have examined the impact of good writing in various disciplines, concluding that papers are published and – importantly – cited more when the quality of the writing is higher. In economics, specialists deem better-written articles more likely to be accepted for conferences and publications. In biology, the presence of jargon is correlated with fewer citations. Another study of abstracts in environmental, social, and medical science found a similar pattern, suggesting that ‘writing with the reader in mind’ is a good way to increase citation count.

So there are reasons for scientists to write well. There is also no shortage of resources to learn how to do that, from writing guides to professional coaching and editing services. Despite this, most scientists don’t master the art of communicating with clarity and style – at least not beyond whatever it takes to get acceptance from a journal. Many will recognize that writing better would be nice, in theory, but they don’t have the time.

Thus there are deeper, more structural forces at play. Is it worth fixing those? Would society as a whole benefit from scientific institutions that produced better writing on average? Here it is useful to make a parallel with two existing movements: open access and plain language.

Open access advocates argue that we are collectively better off when scholarly articles are freed from publisher paywalls and made accessible to anyone with an internet connection. At the core of the movement is the assumption that the point of doing science in the first place is to disseminate results, which can be used to further advance science or improve the world. Professional scientists usually don’t need to worry about paywalls – they belong to institutions that have paid the necessary subscription fees – but just about everyone else is likely to be hindered if they try to dive deeply into a scientific topic. This creates issues of fairness: Researchers from poorer countries, as well as interested outsiders such as students, entrepreneurs, or journalists, are in a worse position to turn existing scientific results into useful work.

Open access is generally considered a valid way to improve science. But what good is a publicly accessible article if it is so tedious that no one reads it? Bad writing can be seen as another barrier: less tangible than a paywall, perhaps, but no less real.

The problem of inaccessible language has long been recognized in fields outside science, such as in law and administration. Several countries have now passed legislation to force themselves to communicate in what is known as plain language, a writing style that helps readers find the information they need easily. The rationale is that plain language saves time and therefore money. Without it, readers need to spend considerable effort understanding forms and documents written in ‘legalese’, sometimes to the point of hiring lawyers; and organizations need to spend considerable effort in customer service to explain what they mean. Sometimes, difficult language can even harm people, as in this example:

[Image: an example of plain language improvement about flood cleanup, from the United States government’s website on plain language.]

There is no reason that scientific papers couldn’t be written like the ‘after’ version in this example of plain language improvement about flood cleanup.

Scientific writing is less likely to contain vital information with immediate consequences for health and safety or compliance with the law, but otherwise the situation is not so different. Whatever positive impact science is supposed to have, it is reduced to the extent that bad writing hampers access.

Plain language corresponds to Sprat’s ideal of purity and shortness. There isn’t really an equivalent movement in favor of Boyle’s aesthetics, perhaps because that is difficult to quantify and therefore justify. But it can be argued that style is as important as readability. First, because the tropes that make writing pleasant, such as humor and metaphor, often ‘provide the seed of a new idea or observation’, as Roger’s Bacon also writes: They are creative tools, fit for such a creative activity as science. Both writers and readers can benefit when an unexpected comparison casts light onto an idea from a different angle than the usual bright, clinical light of plain language.

Second, because good style also helps, indirectly, with access. Even if a paper is publicly available and is devoid of long paragraphs and excessive jargon, it can still be boring. It can still be a chore to read. Scientists are human, and they will be more likely to focus on tasks that they find enjoyable. They will still read dull papers when they have to, but they will perhaps read fewer of them, and fewer that are not directly relevant to their research.

To be sure, some parts of science will always be bland. The effort needed to turn a highly technical methodology paper into a fun read would probably not be worth the trouble, even if it could increase readership a bit. But the problem is that norms insist on avoiding humor, metaphor, evocative words, and beautiful phrasing across the board. Adam Mastroianni writes that ‘a reviewer once literally told me that my paper was “too fun” and that I should make it more boring’. This sort of behavior drains the joy from science, to no benefit.

The consequences of this lack of fun are speculative, but it seems plausible that, for many people, it tilts the balance of inconvenience away from using and contributing to science. I offer myself as anecdotal evidence: I quit the academic career path early in part because I found no pleasure in reading and writing papers, which I had expected to be among the best parts of the work. I was, to use Boyle’s words, disgusted by the flatness.

One could point out that there are still quite a lot of graduate students, professional scientists, and innovators out there, and it may be fine if boring papers filter out the less motivated people (like me). This may be true, but it assumes that the filter selects the right kind of researcher. What if instead it prevents the most creative individuals from contributing? If the only people who do science are extremely specialized professionals who write only for colleagues within their own small bubble, we may, for instance, get fewer new ideas from cross-pollination between fields.

Most worryingly, this ties into the observed slowdown of science. Matt Clancy suggests that scientific progress is getting harder because of the burden of knowledge: It takes more effort to reach the frontier now than it used to, which means that new researchers take longer before they are able to contribute new ideas. Every time a new paper is published with poor readability and a lack of any redeeming aesthetic qualities, the burden gets a little heavier. And therefore we reap the fruits of science less often.

To be precise, the current publishing model may be well suited to minor discoveries made within an existing paradigm. But it is plausible that paradigm shifts, which upend our understanding and are associated with fast progress, are rarer when everyone avoids reading papers outside of their field because of the friction and tedium. This argument is probably impossible to support directly with data, but we can note that paradigm shifts do seem to have happened more often in a time, in the mid-twentieth century and before, when science writing was more about communication and less about prestige.

Aside from concerns about access and the slowdown of science, bad writing also has direct consequences for our relationship to truth. When a document is hard to read, it is also hard to check for mistakes and misinformation. If you don’t have the time or the specialized skills to read a difficult technical article, then you need to rely on professional communicators such as journalists, or even more informal ones, like bloggers and YouTubers. The work of communicators is important, but it can introduce new errors and biases that propagate over time. These ‘translation errors’ are unlikely to be caught if the source material is read by only a precious few people.

Incidentally, this is an old problem: As I discovered when reviewing the book Making ‘Nature’, one of the motivations to create the now-prestigious journal Nature in the 1860s was to allow more direct communication from scientists to the public and to each other, and in doing so avoid the distortions created by journalists.

But we don’t even need to involve professional communicators here: Scientists themselves can make mistakes, interpret data in a biased way, or, worse, commit fraud. Bad writing can serve, intentionally or not, as a way to hide the truth. The fewer eyeballs that fall onto a paper, the less likely any issues with the research will be brought to light.

Over the years, the outcome may be an erosion of the trust between scientists and the wider public. The vacuum left by poor communication from scientists can be filled with misinformation. Hot-button scientific issues like the Covid-19 pandemic and climate change have shown that mistrust is growing. And why wouldn’t it, when even scientifically educated people find it difficult to engage with the primary sources directly? But the solution can hardly be to demand that we trust scientific institutions blindly. Science should instead strive to make itself more intelligible.

Assuming that ‘fixing’ bad scientific writing is both possible and desirable, what can we do about it?

It seems clear that not much progress will be made by publishing opinion pieces to call upon scientists to improve their writing, or upon journals to enforce higher standards. Such exhortations are frequently found everywhere from Twitter to top journals, and there’s no harm in that. But they inevitably run into the systemic problem that good writing is difficult to prioritize and easy to sacrifice, not to mention that it’s difficult to challenge established norms. Perhaps with time those calls, combined with the minor incentives that encourage well-written articles in the form of increased citation counts, will gradually improve the situation. Or perhaps not.

(To those looking for advice on how to improve their academic writing, my suggestion would be to read The Sense of Style by Steven Pinker. It’s about as good a style guide as one can find.)

At the other extreme, changing scientific writing could be done through a complete overhaul of the publishing system. In recent years, there have been suggestions that the scientific paper is obsolete and that perhaps we should get rid of it. Journals are increasingly seen as a point of friction in science as opposed to what they used to be: a tool to facilitate communication. A number of proposed reforms exist, from publishing data sets and code notebooks instead of PDFs to abolishing systematic prepublication peer review to elaborating completely new systems for assigning academic credit. Any change along these lines would be an opportunity to implement better communication norms.

But we should be careful. There are features of the current dominant style that are essential for good science. One is the practice of systematically citing relevant prior work, which is almost never done outside of academia but is extremely useful in a field devoted to advancing knowledge. Another is the predictable format of papers, often the standard IMRaD model – introduction, methods, results, and discussion – which, although it can stifle expression, also allows readers to access the information they care about faster, something perhaps especially useful to non-native speakers of English. Relatedly, the norms of writing informative titles and abstracts are helpful to the point of being all that most people will read of a paper. (Which isn’t to say that abstract style couldn’t be improved. In particular, most abstracts should probably not be walls of text.)

In other words, it could be detrimental if science publishing got rid of some of its characteristics to follow a model closer to mainstream publishing. After all, mainstream publishing has its own perverse incentives, for example toward sensationalism and virality.

This illustrates the main difficulty with any metascience topic: There’s always a chance that we are, in fact, somewhere close to the optimum. The inefficiencies and frustrations of science writing could be a natural consequence of a system that is as good as it gets, and any attempts at improvement may backfire.

But while we may be at an optimum, we can’t be sure of that either. Legal writing wasn’t at an optimum before the plain-language movement. And there have in fact been minor improvements in academic writing, such as the decrease in passive voice usage over time.

In a past issue of Works in Progress, José Luis Ricón wrote that we don’t know how to fix science, and suggested that we try many ideas to improve the chances of stumbling on better practices. This seems true for academic writing, which can be seen as a microcosm of the wider issues in science: a nice case of apparently broken incentives that we may be able to improve upon without sending the whole system crashing.

In practice, experimenting with scientific style could mean many things. A few ideas:

Creating interactive papers, such as Jupyter Notebooks with code that can be run by readers, or figures that can be manipulated with simple user interfaces (a nice proof of concept exists).
Writing papers with storytelling principles, or in formats such as letters (once the most common form of communication among scientists), dialogues, interviews, etc.
Publishing research artifacts that are not writing based (but are still indexed and citable), such as video (for example, the Journal of Visualized Experiments).
Developing new AI-based tools for writing assistance (for example, Writefull).
Founding new journals to explore alternative strategies, such as Seeds of Science.
Publishing findings on preprint servers, blogs, or other non-peer-reviewed outlets without worrying too much about credit. This is happening more and more as scientists publish on preprint servers like arXiv, sometimes several months or years before a ‘more serious’ publication in a peer-reviewed journal (mirroring the way a journal like Nature was also used before the 1970s).
Hiring more specialized communicators in large research teams to write papers, just like tech companies hire technical writers for their documentation.

The examples in this list show that at least some attempts are being made at improving scientific writing. But most are not particularly well-known or prestigious, and the full set of initiatives isn’t that large. This is because anything that goes against the current norms of the academic community is, by definition, low status.

Quitting status games is difficult. Yet sometimes it happens, to an extent. Someone following the research in machine learning, for instance, might notice that its most impactful papers are often surprisingly well written. This could be due to the fact that some of the top research is led by large companies with considerable resources – incentives are always easier to escape with money. Or it could be due to the fast-moving nature of the field, in which more than 100 papers are published per day. In any case, it has led to some interesting communication experiments, such as Distill, a machine learning journal that is often cited as an example of a highly readable academic publication.

Distill is now inactive, for several good reasons. We can guess the underlying cause: Good writing is an economic externality. It benefits science and society as a whole, but its value cannot be directly captured by the authors, editors, and publications who provide it.

Yet, when they do provide it anyway, we end up with papers that communicate effectively and reduce the burden of knowledge; with more cross-pollination between ideas; with scientists who draw more joy and less frustration from their work; with fewer mistakes and higher trust. It might be time, then, to tweak the scientific status game. To reward the researchers who – like their predecessors a century ago – write with no other goal than telling us what they did.

Scientific papers are dense, jargon-filled, and painful to read. It wasn’t always this way – and it doesn’t have to be.
