There is an everlasting war of opinions in the academic community over how the impact of papers, the prestige of journals and conferences, and the prominence of university departments should be measured. The centre of this conflict, the battleground that is most heavily fought over, is the question of what exactly citations of academic papers measure. What is the value of a citation? What is the meaning of a citation?
This blog post is a brief introduction to what citation counts can and cannot measure. In this discussion I am going to pretend to be from Switzerland and act merely as a non-partisan presenter of the topics that kindle disputes between the various parties. The discussion is followed by a short look at algorithms such as PageRank and how they can be used to compute the importance of academic papers by overcoming, or at least reducing, the problems inherent in pure citation counts.
What citation counts do measure
The first battleground surrounds the question of whether a high citation count equates to quality work and high-impact research. One camp strongly believes this is not always true, because a paper of low quality, or one containing incorrect results, could also achieve a high citation count by attracting a lot of criticism. On the other side of the trenches, the argument is that this situation is highly unlikely because of the general reluctance of academics to go to all the trouble of refuting inferior work. It is more likely that bad material is bypassed and simply dies, never to be cited again. A frontal attack only becomes necessary if incorrect results stand in the way of further development of a subject, or if they contradict work in which someone else has a vested interest. Some go further and state that if effort is invested in criticising work, the work must be of some substance. Extremists claim that formal refutations are also constructive and can clarify, focus and stimulate the research surrounding a subject. They use this as evidence that high citation counts do not measure how many times an individual was right, but rather measure an individual's level of contribution to the practice of science.
Let's move on to another topic of dispute: self-citation. Again there are opposing forces at play. One side believes that self-citation manipulates citation rates. The other side, the majority, believes that self-citation, and even team self-citation, is perfectly reasonable: it merely indicates a narrow speciality, since scientists tend to build on their own work and that of their collaborators.
Another problematic area is the varying citation potential of different academic fields. Some researchers are of the opinion that methodological advances are less important than theoretical ones. They believe that citation counts cannot be a valid measure because they favour those who develop research methods over those who theorise about research findings. In general, method papers are not highly cited, but this too is field dependent: fields that are more oriented towards methodology tend to be cited more. Instead of “importance” or “impact”, the quality that citation counts actually measure is the utility of a paper, its usefulness to a large number of people or experiments. On the other hand, the citation count of a work does not necessarily say anything about its elegance or its relative importance to the advancement of science or society. It only says that more people are working on one topic than on another; citation counts therefore actually measure the activity of a topic at a certain point in time.
Every point mentioned so far can be measured using citation counts. The output, or value, of this measurement simply depends on the interpretation of what citations actually mean.
What citation counts don’t measure
Let's move on to what citation counts do not reflect. These points are very important, since techniques for calculating a paper's importance that are not based on pure citation counts alone have to be devised in order to assist, or in certain scenarios replace, expert opinions.
Firstly, work that is very significant but too far ahead of its field to be picked up by others will go unnoticed until the field catches up. Citation counts will not identify significance that is unrecognised by the scientific community; they only reflect the community's work and interests.
Secondly, obliteration is another issue that cannot be measured by merely looking at a paper's citation counts. Obliteration occurs when some work becomes so generic to a field, or so integrated into its body of knowledge, that researchers neglect to acknowledge the initial research with a citation. Obliteration happens to virtually every work of high quality or great impact in a field. It can occur shortly after publication or slowly over time; in the latter case the work first accrues a high citation count before further citations become redundant. Either way, obliteration is not reflected in the citation counts of papers.
Another factor for which additional information is required is the impact factor of the publication venues of citing and cited papers. It is very difficult to decide how individual citations should be weighted, even when the information about publication venues is known. Should a citation of a paper published in a renowned journal, such as Nature, count more because it indicates excellent work? Or should the citation count less because of the high visibility of the renowned venue? Even more important is the question of whether the impact factor of the venue of the citing paper is as important as the impact factor of the venue of the referenced paper. For example, a reference from an article published in Nature clearly indicates that the cited paper is of high quality.
One last aspect I want to mention here is journal cross-citation. Different academic fields have varying citation potentials, which depend on aspects such as how quickly a paper will be cited, how long the citation rate takes to peak, the average length of reference lists in the field, and how long a paper will continue to be cited.
The bottom line is that, when evaluating individual papers, citation counts can only be used as an aid in providing an objective measure of the utility or impact of academic work. They say nothing about the quality of the work and nothing about the reason for its utility or impact. So why is this such a problem? The big problem is that the scientific enterprise is growing exponentially, and with the online libraries and research tools available to students nowadays this growth is likely to accelerate. It is therefore becoming more and more important to devise evaluation metrics that can aid certain peer-review processes (by creating shortlists, for example) required for performance-based fund allocation or prize adjudication. As an example of an application that could greatly benefit from techniques that compute the quality and importance of publications, consider online libraries: their paper recommendation systems could use this additional information to direct researchers to the most appropriate papers on a given topic.
Various methods have been proposed over the years, and once-popular techniques have been replaced by newer ones. Currently the h-index, developed by Hirsch, is the de facto technique for calculating (biased) quality and impact in the academic community. The metric computes scores for an author or a group of authors, not for individual papers; it can therefore be applied to journals, conferences, individual authors or academic departments. The h-index is a very simple metric based directly on citation counts: an author has index h if h of their papers have each received at least h citations. The h-index thus measures both the (biased) quality and the quantity of an author's work. As with all other citation analysis methods that use citation counts directly, the h-index does not account for many of the characteristics described above, nor for features that are common to citation networks. For example, it does not consider the number of authors of a paper or the varying citation potentials of different academic fields, and it is dependent on an author's total number of publications. Its biggest drawback is that it cannot be used to compute scores for individual papers. For more information about the h-index, read the Wikipedia article; it nicely describes the advantages and drawbacks of the method.
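The definition translates directly into a few lines of code. A minimal sketch (the citation counts in the example are made up for illustration):

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # the rank-th most cited paper still has >= rank citations
        else:
            break
    return h

# Five papers cited 10, 8, 5, 4 and 3 times: four papers have >= 4 citations,
# but not five papers with >= 5 citations, so h = 4.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```

Note how a single blockbuster paper barely moves the index: an author with one paper cited 1000 times still has h = 1, which is exactly the quantity-and-quality trade-off described above.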
To overcome some of the drawbacks of using citation counts directly, various algorithms have been proposed that are based on traffic models. A very famous algorithm of this type, which has also been applied to citation networks, is Google's PageRank. PageRank is a very simple algorithm, but I am only going to give an intuitive description of how it works, to keep this blog post clear of mathematical formulas.
The PageRank algorithm in its very basic form calculates the probability that a random surfer, clicking links at random on the Internet, reaches a certain page. It takes the web graph as input, where web pages are nodes and links are directed edges between nodes. The algorithm can therefore also be applied to citation graphs, where this time papers are nodes and citations are directed edges.
Let's use “random researcher” instead of “random surfer”, since we are in fact talking about citation networks. The intuition behind PageRank is that random researchers start their searches at (randomly chosen) nodes in the citation graph and follow citations at random until they eventually stop, controlled by a damping factor, and restart their search on a new node. The intuition here is that of researchers getting bored and deciding to investigate a new topic.
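Rather than simulating thousands of random researchers, the stationary probabilities can be computed directly by power iteration. A minimal sketch on a made-up four-paper citation graph (the damping factor of 0.85 and the iteration count are illustrative choices):

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a citation graph.

    graph maps each paper to the list of papers it cites.
    Returns a dict mapping each paper to its rank (probability)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # (1 - damping) is the probability of restarting on a random paper
        new = {p: (1.0 - damping) / n for p in nodes}
        for paper, refs in graph.items():
            if refs:
                share = damping * rank[paper] / len(refs)
                for cited in refs:
                    new[cited] += share
            else:
                # papers citing nothing spread their rank evenly (dangling nodes)
                for p in nodes:
                    new[p] += damping * rank[paper] / n
        rank = new
    return rank

# A tiny citation graph: A and B both cite C, and C cites D.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
ranks = pagerank(citations)
# The cited papers C and D accumulate far more rank than the citing papers A and B.
```

Note that D ends up ranked highly even though only one paper cites it, because that one citation comes from the well-cited C; this is exactly the "citations from popular papers count more" behaviour discussed below.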
The PageRank algorithm, with certain tweaks, can be used to overcome the following problems:
- Recently published papers have not been around long and therefore have not yet accrued many citations. The basic PageRank algorithm does not address this, since the random researchers are placed on the citation graph uniformly at random when a search starts or restarts. To address the problem, one can simply let the initial paper selection depend on the publication date of the corresponding paper. This makes sense, since researchers typically start investigating a topic with recently published articles found in journals or conference proceedings and then follow references to older papers.
- Problem two is that the age of citing papers is not taken into consideration. Citations from newer papers should count more than citations from older papers: an old paper that is directly cited by a new paper evidently still bears current relevance. Again, the basic PageRank algorithm does not address this. Fortunately, one could simply adjust the algorithm by assigning each citation a weight based on its citation age (the time between the publication dates of the citing and the cited paper). Random researchers will then choose citations to more recent papers with a higher probability than citations to older ones.
- The third problem is that citations from popular papers should be regarded as more important than citations from less popular papers. This problem is intrinsically addressed by the PageRank algorithm, since it was developed to estimate the traffic to a web page rather than simply count the number of links pointing to it.
- Citations from papers published at prestigious venues should bear more importance than citations from papers published at less renowned venues. Again, the PageRank algorithm does not address this directly, but one could simply weight each edge of the graph with the impact factor of the venue of the node from which the edge originates.
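All four tweaks above amount to changing just two distributions in the basic algorithm: where the random researcher restarts, and which citation she follows next. A sketch of this generalised form, with made-up publication years and a hypothetical recency bias standing in for real restart and edge-weight choices:

```python
def weighted_pagerank(graph, restart, edge_weight, damping=0.85, iterations=50):
    """PageRank with a non-uniform restart distribution and weighted edges.

    graph:       paper -> list of papers it cites
    restart:     paper -> restart probability (must sum to 1),
                 e.g. biased towards recent papers or prestigious venues
    edge_weight: (citing, cited) -> positive weight,
                 e.g. larger for small citation ages"""
    nodes = list(graph)
    rank = dict(restart)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * restart[p] for p in nodes}
        for paper, refs in graph.items():
            if refs:
                total = sum(edge_weight(paper, r) for r in refs)
                for cited in refs:
                    new[cited] += damping * rank[paper] * edge_weight(paper, cited) / total
            else:
                # dangling papers restart according to the same distribution
                for p in nodes:
                    new[p] += damping * rank[paper] * restart[p]
        rank = new
    return rank

# Hypothetical publication years for the four papers.
years = {"A": 2010, "B": 2009, "C": 2002, "D": 1998}
graph = {"A": ["C", "D"], "B": ["C"], "C": ["D"], "D": []}

# Restart distribution biased towards recent papers (tweak one).
raw = {p: 2.0 ** (years[p] - 2010) for p in graph}
total = sum(raw.values())
restart = {p: raw[p] / total for p in graph}

# Citations with a small citation age count more (tweak two).
def edge_weight(citing, cited):
    return 1.0 / (1 + years[citing] - years[cited])

ranks = weighted_pagerank(graph, restart, edge_weight)
```

Venue-based weighting (tweak four) would use the same `edge_weight` hook, multiplying in the impact factor of the citing paper's venue instead of, or in addition to, the age-based term.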
The adaptation of PageRank or similar algorithms boils down to the following questions, which are, unfortunately, dependent on the interpretation of what citations actually mean, as discussed earlier:
- How should random researchers be positioned on the citation graph when they start or restart their searches? Should a random researcher be placed on any node uniformly at random, or does the (not so random) random researcher prefer nodes corresponding to recent papers or to papers published at renowned venues?
- Which citation should the random researcher follow to the next paper? Should the decision depend on the age of the citation? Should the impact factor of the venue of the citing or the cited paper contribute to the decision? Should the citation potentials of different academic fields be considered? Should self-citation or team self-citation be penalised?
Conclusion: there is no end in sight for this war of opinions. The academic fallout can only be reduced with the use of some clever algorithms.