Rivers vs lakes of information: are scientific papers "the news" or "the encyclopedia"?

In Present Shock by Douglas Rushkoff, the author makes a distinction between communications that have value as a result of being current versus communications that have value as a result of being curated, accurate and complete. You can think of one like a river - information that is constantly flowing, and the other like a lake - information that is discarded when it becomes obsolete but comparatively stays quite stable over time.

Think of the newspaper vs the encyclopedia: one is useless almost the next day and the other only needs to be updated every few years, but in general contains a higher quality of reporting. Both types of information are needed, but for different reasons.

I think that a lot of the confusion and debate over the role and inadequacy of the scientific literature stems from the fact that we expect it to be both of these things at once. People object to the end of peer review because it would then be ridiculously easy to add to the scientific canon. We expect papers to contain a high quality of information: edited, peer-reviewed, and generally speaking, right. But the actual scientific process is almost incomprehensibly messy, and the way that ideas are discarded or adopted more closely resembles a river than a lake. Nevertheless, the whole body of the scientific literature is out there: readable, citable, and - except in the rare case of retraction - immutable.

To take an example from my field, a debate erupted in 2011 over whether filaments in a part of the cell called the lamellipodium are branched or linear. Advances in electron microscopy techniques had prompted a reevaluation of the existence of branched filaments in living cells. Using more advanced "tomography" techniques, Vic Small's group concluded that the branch junctions which had been previously observed did not exist (paywalled), and that the earlier, cruder analysis performed by Tatyana Svitkina's lab had produced the branches artificially during the preparation of the cells for the microscope. For a brief period, Small reveled in his role as a challenger to the scientific status quo, publishing a commentary piece titled "Dicing with dogma: Debranching the lamellipodium". This lasted until the Svitkina group got Small's data: they looked for the branches and found them. Positive data trumped negative data, and Small had to concede that his group had simply missed the branchpoints in its earlier analysis.

While this was ongoing, the literature resembled a river even more strongly: the original paper was not retracted; instead numerous commentary pieces simply cited it, adding to the conversation but not rewriting it (and adding to the impact factor of the original, wrong, study). I don't think the whole story is fairly represented anywhere in a single publication. In short, there is no encyclopedia entry for this in the scientific literature; one has to piece it together from reading the newspaper articles.

Also in 2011, a paper strongly implying that a particular bacterium can incorporate arsenic into DNA instead of phosphorus (paywalled) was published in Science, and was almost instantly identified as containing numerous errors in methodology and analysis. A subsequent study concluded that there was no arsenic in the DNA of that bacterium. As of today, the paper has not been retracted. Perhaps it should not be, because the data in the paper were not wrong, only the implications. The most unfortunate thing is that the findings are actually important, but they are less grandiose than was imagined by the authors, and so once again the whole story is not told anywhere except in a series of commentaries and letters to the editor.

If we expect papers to be lakes, shouldn't authors be able to update them after publication? And if we expect them to be rivers, why hide them until they've been vetted by peer review? Isn't this system the worst of both worlds?

The biggest place this seems to be an issue is when direct replication studies are being performed. I find it strange that even when we know that concepts like statistical significance are just probabilistic calculations about how well the data support a hypothesis, we still want to be able to "accept" or "reject" hypotheses and "invalidate" conclusions. With a few exceptions, most things we know in science are not true or untrue, they simply have a probability of being true given the data at hand. I'm not qualified to judge the back-and-forth that's going on in the psychology literature at the moment, but here's my read as an experimentalist in a different field: experiments do not really analyze general phenomena. They analyze phenomena which occur in very specific, controlled conditions, which by implication point to general phenomena when that becomes the most likely explanation for the results. Failure to replicate, especially in a field like psychology, does not necessarily point to sloppiness, error, or fraud. But it does make it less likely that the outcome of the experiment illustrates a general phenomenon. Getting to the truth has to involve a conversation, an open mind, and open debate. As I wrote in my piece about the Small-Svitkina controversy,

This recent debate serves as an important reminder that seemingly irreconcilable views can in fact reveal deeper truth, when both sides relentlessly pursue that end.
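
To put rough numbers on the replication point, here is a minimal sketch of how one failed replication lowers, without eliminating, the probability that an effect is real. It is just Bayes' rule. The prior (0.5), the statistical power (0.8), and the false-positive rate (0.05) are illustrative assumptions of mine, not figures taken from any of the studies discussed here, and the function name is made up for this example.

```python
def update_after_replication(prior, power, alpha, replicated):
    """Posterior probability that an effect is real after one replication attempt.

    prior      -- assumed probability the effect is real before the attempt
    power      -- assumed chance the replication succeeds if the effect is real
    alpha      -- assumed chance the replication "succeeds" if the effect is not real
    replicated -- True if the replication found the effect, False otherwise
    """
    if replicated:
        like_real, like_null = power, alpha
    else:
        like_real, like_null = 1 - power, 1 - alpha
    # Total probability of seeing this outcome, real effect or not.
    evidence = like_real * prior + like_null * (1 - prior)
    # Bayes' rule: P(real | outcome).
    return like_real * prior / evidence


belief = 0.5  # illustrative starting point, not an estimate for any real study
after_failure = update_after_replication(belief, power=0.8, alpha=0.05, replicated=False)
after_success = update_after_replication(belief, power=0.8, alpha=0.05, replicated=True)

print(f"after a failed replication:     {after_failure:.2f}")
print(f"after a successful replication: {after_success:.2f}")
```

Under these assumed numbers, a single failed replication drops the estimate from 0.50 to about 0.17, and a successful one raises it to about 0.94. Neither outcome turns the result into something simply "accepted" or "rejected".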

In any case, the practice of publishing papers and expecting that the results will simply be "accepted" and folded into the scientific canon surely has to stop.