In case you missed it, the ‘big story’ in academic news in the past week was the retraction of more than 120 papers that had been published by Springer and the Institute of Electrical and Electronics Engineers (IEEE). The retractions followed the discovery, by Dr. Cyril Labbé of Joseph Fourier University, that the papers in question had all been generated by SCIgen, a computer program that automatically produces nonsense academic papers.
Judging by the media reaction, the public seemed incredulous that such a thing could have happened. In this post, I argue that such embarrassments are common, and that fake papers are not even the worst problem in academic publishing.
A history of hoaxes
Fake papers have regularly appeared in the scholarly record, often in order to demonstrate problems with the peer review process. For instance, in 1994, Alan Sokal famously submitted a paper to Social Text, in which he boldly fused string theory, Lacan and Derrida, and argued that quantum gravity had profound political implications. When the article was published, he revealed the hoax in a simultaneous publication, where he explained his rationale as follows:
For some years I’ve been troubled by an apparent decline in the standards of intellectual rigor in certain precincts of the American academic humanities. But I’m a mere physicist: if I find myself unable to make head or tail of jouissance and différance, perhaps that just reflects my own inadequacy.
So, to test the prevailing intellectual standards, I decided to try a modest (though admittedly uncontrolled) experiment: Would a leading North American journal of cultural studies – whose editorial collective includes such luminaries as Fredric Jameson and Andrew Ross – publish an article liberally salted with nonsense if (a) it sounded good and (b) it flattered the editors’ ideological preconceptions?
Since then, there have been numerous reports of fake papers intended to raise awareness of pseudo-academia, such as spamferences and predatory publishers. Most recently, Science reported on a massive ‘sting’ operation which used computer-generated variants of a fake paper in an attempt to expose ‘predatory’ publishers. John Bohannon, the scientist behind the sting, describes how he created a “credible but mundane scientific paper”, which was filled with such “grave errors that a competent peer reviewer should easily identify it as flawed and unpublishable”. He then submitted 255 versions of the paper to various journals, resulting in no fewer than 157 publications.
Returning to this week’s case, in a statement issued immediately after the story went public, Springer expressed confidence in the standards of peer review it employs. In their words:
There will always be individuals who try to undermine existing processes in order to prove a point or to benefit personally. Unfortunately, scientific publishing is not immune to fraud and mistakes, either. The peer review system is the best system we have so far and this incident will lead to additional measures on the part of Springer to strengthen it.
The sentiment expressed by Springer may be due to the fact that all the papers that were identified seem to have been published in conference proceedings, which do not always adhere to the same standards of peer review that apply to research articles. Conference contributions are often judged on the merit of a short abstract, so that scholarly output can be rapidly disseminated. This allows academics to benefit from feedback from other conference participants, in order to develop ideas that are still rough around the edges into a ‘proper’ academic article. It is therefore possible for publishers like Springer to feel confident about the quality of their journal articles, while at the same time acknowledging the scope for more stringent processes in other areas.
Three problems at the heart of science
That having been said, there is reason to think that increased diligence by referees may not be enough. It seems that the SCIgen incident, the Science ‘sting’ before it, and the numerous retractions regularly reported on sites such as Retraction Watch are just symptoms of a more serious, and deepening, crisis in scholarly communication. In an article originally published in the Guardian, Curt Rice describes three aspects of this crisis, namely increased retractions, low replicability and problematic measures of research quality.
Bad science gets published too often
First, the number of articles that are retracted seems to be increasing, with most retractions appearing in the more prestigious journals. It is unclear whether more retractions are a sign of increased malpractice or closer scrutiny, but either way, it seems that a good part of the academic record is tainted, and that peer-reviewers, editors and publishers are all to blame.
As academics who were involved in fraud are increasingly made to face the consequences (link no longer active) of their actions, I believe that similar standards of accountability should apply to everyone involved in the publishing process. For instance, the names of the referees who reviewed each article should be made available (as done by the Journal of Bone and Joint Surgery), so that we might be able to assign blame when the system breaks down. It has also been suggested that publishers should refund academics for those publications which clearly fail to meet academic standards. Whether or not such a measure would be feasible is up for debate, but it seems hard to argue against such an idea from an ethical perspective.
We can’t even be certain about ‘sound’ papers
The second problem Rice highlights pertains to replicability. Academic journals are understandably keen to publish studies which report surprising or unusual findings. Sometimes, however, these findings are just statistical flukes. In the social sciences, for example, the conventional threshold of statistical significance is p < 0.05, which means that, even when no real effect exists, roughly one test in every 20 will come out ‘significant’ by chance alone. This is not, in itself, a problem, because ‘science corrects itself’, as the adage goes. Since researchers (should) report their methods in sufficient detail for their studies to be replicated, if follow-up research fails to reproduce an unusual finding, then the finding can be disregarded or even retracted.
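The ‘one in 20’ point is easy to demonstrate with a short simulation. The sketch below (not from Rice’s article; the sample sizes and trial counts are arbitrary choices for illustration) repeatedly compares two samples drawn from the same distribution, so every ‘significant’ result is a false positive by construction, and counts how often a two-sided test falls below p < 0.05:

```python
import math
import random

random.seed(1)

def normal_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def null_experiment(n=200):
    """Run one 'experiment' where the null hypothesis is true:
    both groups are drawn from the same standard normal distribution.
    Returns the two-sided p-value of a z-test on the difference in means."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    se = math.sqrt(var_a / n + var_b / n)
    z = (mean_a - mean_b) / se
    return 2 * (1 - normal_cdf(abs(z)))  # two-sided p-value

trials = 2000
false_positives = sum(1 for _ in range(trials) if null_experiment() < 0.05)
# Despite there being no real effect in any trial, the rate of
# 'significant' findings comes out close to the 0.05 threshold.
print(f"False positive rate: {false_positives / trials:.3f}")
```

In other words, a journal that selects for ‘significant’ results is guaranteed a steady supply of flukes, even if every individual study is conducted honestly.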
In practice, there are at least three problems with this arrangement. First, a very large proportion of studies is impossible to replicate, either because they were conducted in the field rather than in controlled laboratory conditions, or because they report on complex phenomena that are sensitive to shifting conditions, or because the methods description is opaque. Secondly, replication studies that report negative results are notoriously hard to publish: who would want to read, let alone fund, a study reporting that nothing new was discovered? Although there are a few journals, like the Journal of Negative Results in Biomedicine, dedicated to publishing such studies, it does seem that a lot of science goes unchecked. Lastly, it appears that even after research findings are challenged by a replication study, belief in the discredited results persists.
To me, all this seems to suggest a need to rethink how research should be reported. This might involve more effective screening prior to publication, perhaps by requiring independent replication before a study is published. It could also involve greater transparency, e.g., by making datasets publicly available as per the new PLoS policy. And it should almost certainly involve better training and original thinking (e.g., reporting experiments in video format!).
We don’t measure the quality of science correctly
The final problem on Rice’s list is the prominence attached to the impact factor (IF). This metric divides the number of citations a journal receives in a given year, counting only citations to articles it published in the preceding two years, by the number of articles it published in that two-year window. The impact factor was originally designed to help acquisitions librarians manage subscriptions: journals with high IFs were likely to contain more useful research, and were therefore assigned higher priority when making purchasing decisions.
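The calculation itself is simple enough to spell out. In the sketch below, the journal and all its citation counts are invented for illustration; the point is only to show the two-year window at work:

```python
# Hypothetical journal data: 120 articles published in 2012 and 80 in 2013,
# which together attracted 260 + 140 = 400 citations during 2014.
journal = {
    "articles_published": {2012: 120, 2013: 80},
    # Citations received in 2014, keyed by the cited article's year:
    "citations_2014": {2012: 260, 2013: 140},
}

def two_year_impact_factor(j, year=2014):
    """Citations in `year` to articles from the previous two years,
    divided by the number of articles published in those years."""
    window = (year - 2, year - 1)
    cites = sum(j["citations_2014"][y] for y in window)
    items = sum(j["articles_published"][y] for y in window)
    return cites / items

print(two_year_impact_factor(journal))  # (260 + 140) / (120 + 80) = 2.0
```

Note that the metric says nothing about any individual article: a journal averaging 2.0 may owe most of those citations to a handful of papers, while the rest are barely cited at all.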
There are many ways to artificially inflate a journal’s IF, but this may not be the main weakness of the system. Nor am I too concerned about the fact that papers published in high IF journals tend to be retracted more often than less prestigious publications. For me, what is particularly problematic is that the IF is also often used as a metric for indirectly assessing the quality of individual papers or researchers, enabling administrators and politicians to make judgements about research without actually needing to understand it. As Rice comments:
Politicians have a legitimate need to impose accountability, and while the ease of counting – something, anything – makes it tempting for them to infer quality from quantity, it doesn’t take much reflection to realize that this is a stillborn strategy.
The reason why this is a stillborn strategy is that it is embarrassingly easy for bad quality papers to get published in journals with high IFs, as the string of hoaxes demonstrates; and it is even more common for good quality papers to be published in journals with a less-than-stellar IF (here’s some research to prove as much). It also seems that publishing in a high IF journal does not correlate well with tenure decisions, which suggests that the metric is not a useful way of judging individual researchers, either.
A much better way to evaluate the quality of individual papers would be to count the number of times they have been cited. While shifting emphasis from the journal to the paper seems intuitive, it would mean that we should withhold judgement on papers for several years, perhaps, until research influenced by them has reached publication stage.
Difficult questions, few answers
It is an often-quoted truism that complaining about problems is easy, but proposing solutions is hard. If that is true, then it would perhaps be presumptuous of me to claim that I know what needs to be done. Sadly, I don’t.
But what does seem self-evident to me is that the common underlying cause of all three problems which Rice has identified is a culture of accountability, in which academics are placed under intense pressure to demonstrate that they are engaged in useful work. The fact that such pressure is not compatible with the careful reflective process that is requisite to quality research is perhaps obvious to academics. In the words of Jean Colpaert:
How many points would Louis Pasteur, Henri Poincaré, Claude Shannon, Tim Berners-Lee and others nowadays earn within the new academic evaluation system?
It may, however, be the case that this paradox has not been effectively communicated outside the Ivory Tower: maybe we should be doing a better job explaining why, and how, outside pressures to maximize performance are inhibiting research, to the detriment of everyone involved.
Featured Image: Nottingham University @ Flickr | CC BY-NC-SA
“…names of the referees who reviewed each article should be made available […], so that we might be able to assign blame” — good article overall, this point is misguided though: ultimately, it is the editor’s, not the reviewer’s, responsibility.
Thank you for your comment. On reflection, I agree that the decision to publish an article is taken by the editor, and that editors therefore have the final responsibility for content published. Still, it seems to me that when clearly problematic (e.g., plagiarised) content makes its way into the scholarly record, this represents a failure of process at multiple levels. Certainly, the editor may have to be held accountable for their choice of reviewers in such cases, but I think that the reviewers would need to be held accountable as well for their lack of due diligence. I realise that there are good reasons for keeping peer review anonymous, and I do not wish to challenge the spirit of this arrangement. What I would like to see, rather, is a way of improving on it so that anonymity is not abused.
It should not be the task of the reviewer to check for plagiarism.
Clear methodological failures, yes, but even there one should be careful. I’ve frequently reviewed papers where I could easily comment on one part, but had to leave another part for someone else, as that went beyond my understanding. I generally make this clear in my review and comments to the Editor (if needed). I could thus be ‘blamed’ for allowing a paper through with errors in the part I made clear I could not evaluate!
Also, what if the reviewer recommends major revision, perhaps even a reject, and the Editor decides to publish the paper anyway?
I personally would not mind if I am mentioned as reviewer of a paper, but then this should include some kind of statement of my assessment of the final paper.
[…] piece by Achilleas Kostoulas originally appeared at his blog under the title, “Fake Papers are Not the Real Problem in Science.” It appears here with his kind […]
As far as I know, editors are never anonymous, so their responsibility is always clear. Reviewers are generally anonymous and protected from scrutiny and that may contribute to the incidence of lazy, flawed or biased reviews. Anonymity does seem to be critical, though, both in allowing reviewers to speak frankly and in avoiding open warfare among competing labs.
When scientific publication was restricted to printed editions there was no real alternative. Once an issue was printed, it was set in wood. Now that printed journals appear to be obsolescent and people are comfortable with digital publication, I’m not sure why we can’t work out a better system. How about a two-tiered review system: traditional anonymous review (this should help clear out the dreck and clean up the presentation) followed by a ‘pre-publication’ web posting that is open to criticism by anyone interested (named or anonymous)? If the paper survives that pillorying, then it moves on to official publication.
In my (perhaps too optimistic) opinion, I think public review should deter a significant amount of fraud and plagiarism, because it increases the chance of being caught and punished promptly. It should also help expose flawed designs and analyses before they infest the literature.
I’ve seen one physics journal on the web that seems to be using open reviewing, and I wonder why I don’t see more. There could be problems with establishing priority or having techniques pilfered, but there should be solutions to those problems.
Here’s a take on what they are calling Open Evaluation – not exactly my point of view, polluted by popularity indices like ‘usage statistics’ and ‘social web information’, but I think it should be possible to adopt an open evaluation format that would help sort out the plagiarism, fraud, poor methodologies, and improper analyses that seem to be at the base of the retraction iceberg:
This is all well and good but I don’t think it is a good strategy for academics to tell politicians and the public that they should stop bean-counting because, duh, “it is embarrassingly easy for bad quality papers to get published in journals with high IFs”. Research output will always be judged by people who are not qualified to judge its merits. They need to rely on some metric. Academics need to propose their own criteria of quality and be able to demonstrate that those criteria are robust and can’t be gamed so easily. That means they have to get their act together on peer review. There have to be real consequences when journals, editors, reviewers and authors screw up. What happened at IEEE and Springer is not encouraging because their response is inadequate. Their system of quality control was shown to be not only flawed but virtually nonexistent: they can’t even weed out total nonsense!
That these were “only” conference proceedings is irrelevant as long as the publisher claims to review each contribution, and authors cite their proceedings papers on their publication list as if they were fully peer-reviewed. What we need is more pressure on journals to do their job (for which they amply charge) and no excuses for failure.