For many papers, citation data are locked inside proprietary databases. Credit: iLexx/Getty

Whenever scientists are ranked and rewarded by metrics, such as citations, some are tempted to grab a little extra credit where they can. As we report this week, the publisher Elsevier has been investigating cases in which reviewers have repeatedly asked authors of papers to cite the reviewers’ own work.

This is not an isolated incident. Last month, we reported that some 250 highly cited scientists had amassed more than half of their citations from their own work or that of co-authors — much more than the usual proportion for their field or career stage (see Nature 572, 578–579; 2019).

Such examples should not come as a surprise, because the gaming of measurement systems is well known. In economics, it is called Goodhart's law, after the economist Charles Goodhart, who described the concept. In the form later refined by the anthropologist Marilyn Strathern, it states that when a measure becomes a target, it ceases to be a good measure.

One obvious answer is for institutions and funders to just stop using citation-based metrics as a proxy for importance or quality when assessing researchers. “Stop the damn bean-counting!” one reader exclaimed in response to an online poll in Nature last month, in which we asked what — if anything — needed to be done to curb excessive self-citation. Metrics-based analysis can certainly reveal useful insights about research. But any assessment procedure that rewards scientists according to citation-based metrics alone seems designed to invite game-playing.

It can also be argued that, all things considered, excessive self-citation is a minor problem and therefore doesn’t need a particular response. Of the more than 5,000 readers who answered Nature’s poll, 10% said nothing needed to be done. “Let active researchers draw their own conclusions about self-citing researchers, and allow reputation to build naturally,” one respondent wrote.

However, most poll respondents felt that citation-based indicators are useful, but that they should be deployed in more nuanced and open ways. The most popular responses to the poll were that citation-based indicators should be tweaked to exclude self-citations, or that self-citation rates should be reported alongside other metrics (see 'The numbers game'). On the whole, respondents wanted to be able to judge for themselves when self-citations might be appropriate and when not, and to compare self-citation rates across fields, among other things.
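To make the poll's two most popular suggestions concrete, here is a minimal Python sketch of how a self-citation rate could be reported alongside a citation count that excludes self-citations. It is an illustration only, not a method used by Nature or by any citation database. It assumes that each paper's authors have already been disambiguated into identifiers, and that a citation counts as a self-citation whenever the citing and cited papers share at least one author. The `Paper` class, the `citation_summary` function and the example DOIs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    doi: str
    # Hypothetical: disambiguated author identifiers (e.g. ORCID iDs).
    authors: frozenset[str]


def citation_summary(cited: Paper, citing: list[Paper]) -> dict:
    """Report total citations, self-citations and the self-citation rate.

    Assumed definition: a citation is a self-citation if the citing paper
    shares at least one author with the cited paper.
    """
    total = len(citing)
    self_cites = sum(1 for p in citing if p.authors & cited.authors)
    return {
        "total": total,
        "self": self_cites,
        "excluding_self": total - self_cites,
        "self_rate": self_cites / total if total else 0.0,
    }


# Illustrative data only: a paper by authors A and B, cited four times.
target = Paper("10.1234/a", frozenset({"A", "B"}))
citers = [
    Paper("10.1234/b", frozenset({"A"})),       # self-citation (shares A)
    Paper("10.1234/c", frozenset({"C"})),       # independent citation
    Paper("10.1234/d", frozenset({"B", "D"})),  # self-citation (shares B)
    Paper("10.1234/e", frozenset({"E"})),       # independent citation
]
summary = citation_summary(target, citers)
print(summary)
```

Sharing an author on the cited paper is only one possible definition. A broader one could also count citations from any of a researcher's co-authors, which would flag more citations as self-citations. Whichever rule is used, it can only be applied to papers whose reference lists are openly available.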

But this is where there is a real problem, because for many papers citation data are locked inside proprietary databases. Since 2000, more and more publishers have been depositing information about research-paper references with an organization called Crossref, the non-profit agency that registers digital object identifiers (DOIs), the strings of characters that identify papers on the web. But not all publishers allow their reference lists to be made open for anyone to download and analyse — only 59% of the almost 48 million articles deposited with Crossref currently have open references.

There is, however, a solution. Two years ago, the Initiative for Open Citations (I4OC) was established to promote open scholarly citation data. As of 1 September, more than 1,000 publishers were members, including Sage Publishing, Taylor & Francis, Wiley and Springer Nature — which joined last year. Publishers still to join I4OC include the American Chemical Society, Elsevier — the largest not to do so — and the IEEE.

Last January, I4OC co-founder David Shotton at the Oxford e-Research Centre, University of Oxford, UK, urged all research publishers to join the initiative (see Nature 553, 129; 2018). They should. Excessive self-citation cannot be eliminated, but free access to citation data for everyone — researchers and non-researchers — will help to illuminate some darker corners. Without more journals coming on board, these necessary efforts to analyse self-citation data will remain incomplete.