Citation Metrics

Two stories caught my attention yesterday. The first was a review of some recent studies of citation practices by field, broadly considered. The claim that alarmed a number of people on Twitter was that “82%” of humanities scholarship was never cited. I pointed out that it was a mistake to assume that “never cited” means “never read.” That someone would even make this inference is quite mysterious to me.

Let me explain: this semester, I have been teaching, for the first time, a course on the Victorian novel. I am teaching this class because our department’s primary Victorianist recently became the director of our graduate program and thus could not teach a course in her normal rotation. The texts I assigned were Villette, Bleak House, Lady Audley’s Secret, Daniel Deronda, Jude the Obscure, and Dracula. (That’s about 3,500 pp. of reading, which I’m now thinking might have been a bit much.) Since I have never taught any of these texts before, I have read as much scholarship on them as I could in preparation—I estimate at least twenty articles or book chapters per book. Nothing I have encountered in my seventeen years in the profession has led me to believe that there’s anything unusual about this. Professors routinely consult scholarship in preparation for their teaching, including many sources they will never cite in their own scholarship. There are several reasons for this:

1) Most people who teach in humanities departments do not publish very much in absolute terms, so they will not be citation-providers.

2) People who do publish scholarship have, most of the time, to teach a wide variety of things that have nothing to do with their own research, yet they read the relevant scholarship to prepare. (I’m aware that there are a small number of professors who have not read anything new in x number of years, but this is mostly a stereotype rarely met in sublunary lands.)

3) Scholars read many things in the course of their research that inform their understanding of their subject but that they never cite.

I know this last point might be the most questionable. Some journals seem to encourage the footnote that mentions a broad range of background sources, but this practice is far from universal. Citations are often given in a rote or formulaic way, with a ritualistic nod to some authority that may have little bearing on the subject at hand. There is much more to say about this point, but I now want to consider the question of measuring access. It would be natural to wonder whether database-access metrics could provide librarians, scholars, and other interested parties with the information needed to determine whether a source is being used even if it is not being cited. Anyone who has examined their own server logs knows, however, that determining whether a machine or a human is at the other end of an HTTP request is more difficult than it seems. Paywalls mitigate some, but far from all, robotic access. I have no idea whether the access statistics that JSTOR provides, for example, make any attempt to separate human requests from robotic ones. A casual examination of the most-accessed lists of various journals will show significant variance from their most-cited lists, which supports my broader point.
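For the curious, the kind of triage I have in mind looks something like the sketch below. It is only a sketch: the log path and the user-agent patterns are placeholders of my own, and it only catches robots that announce themselves, which is precisely why the problem is harder than it seems.

```javascript
// Naive bot-vs-human triage for a combined-format access log.
// The log path and the user-agent patterns are placeholders; any
// crawler that reports a browser-like user agent slips through.
var fs = require('fs');

var botPatterns = [/bot/i, /crawler/i, /spider/i, /curl/i, /wget/i];

var lines = fs.readFileSync('access.log', 'utf8').split('\n');
var human = 0, robot = 0;

lines.forEach(function (line) {
  if (!line.trim()) { return; }
  // In the combined log format, the user agent is the last quoted field.
  var match = line.match(/"([^"]*)"\s*$/);
  var userAgent = match ? match[1] : '';
  var isBot = botPatterns.some(function (p) { return p.test(userAgent); });
  if (isBot) { robot += 1; } else { human += 1; }
});

console.log('apparently human:', human, '| self-identified robots:', robot);
```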

The second outrage came from the University of New Hampshire, where thousands of library books were found in a dumpster. I recall being outraged by the discovery of some Nation magazines from the 1930s in the dumpster at the University of Florida when I was a graduate student, but thousands of books is on a different scale. A librarian is quoted in the article as saying that the books in question had not been checked out for a very long time and thus, the implication goes, were not needed. There are many disturbing aspects to this story, but the one that came most immediately to my mind is the difference between a book being checked out and a book being read in the library. I frequently consult and re-shelve books in libraries, and I’m not the only one. When I was an undergraduate, I worked in the UNC-Wilmington library, and I know that statistics were collected on books that had to be re-shelved. Even if UNH did this, which is far from clear, it would not capture books that patrons re-shelve themselves. (I could be wrong about this, but my impression is that it’s not common for libraries to collect re-shelving data, and I’m not sure to what use it was put at UNCW.)

I understand that space in a library is finite, though why you would throw books away rather than holding a community sale or giveaway is beyond my comprehension. I understand that Georgia Tech’s library is moving away from a books-included model, and it had conspicuously few books to begin with when I taught there several years ago. In the spirit of self-criticism, I will now ask myself whether my interests in visualizing co-citation graphs and in using quantitative methods for disciplinary history are at odds with my belief that libraries should have books in them and that scholarship is valuable in ways that citation metrics cannot measure.

They are not.

Metadata and Co-Citation Graphs

I will conclude this missive with an overview of some recent advances I’ve made in improving the d3.js co-citation graphs that I first wrote about here. I have experimented with various measures to increase the utility and visibility of these graphs: creating a threshold slider (based on the in-degree of co-citation nodes), adding a chronological dimension, adding expandable and contractible hulls around the communities, analyzing the composition of the communities at different threshold levels, and, most recently, adding animations to show how the network grows on a year-by-year basis. (“Animation” is an aspirational term, as I have not yet been able to adjust the underlying data structures in such a way as to use d3.js’s nifty smooth transitions. Elijah Meeks was kind enough at the recent Texas DH conference to show me this <a href="http://bl.ocks.org/emeeks/9357131">intriguing code</a> that performs many of the stupendous operations I have in mind. JavaScript is far from my best symbolic instruction code, however.)
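For those who want the gist of the threshold slider, here is a rough sketch of the filtering step. The data layout (a nodes/links object whose nodes carry an id field) and the update() re-rendering function are stand-ins for my own code rather than anything canonical.

```javascript
// Sketch of the in-degree threshold filter behind the slider.
// Assumes the usual force-layout shape ({nodes: [...], links: [{source,
// target}, ...]}) with an id on each node; update(), which rebinds the
// filtered data to the SVG, is omitted here.
function filterByInDegree(graph, threshold) {
  var inDegree = {};
  graph.links.forEach(function (l) {
    var t = typeof l.target === 'object' ? l.target.id : l.target;
    inDegree[t] = (inDegree[t] || 0) + 1;
  });

  var keep = {};
  graph.nodes.forEach(function (n) {
    if ((inDegree[n.id] || 0) >= threshold) { keep[n.id] = true; }
  });

  return {
    nodes: graph.nodes.filter(function (n) { return keep[n.id]; }),
    links: graph.links.filter(function (l) {
      var s = typeof l.source === 'object' ? l.source.id : l.source;
      var t = typeof l.target === 'object' ? l.target.id : l.target;
      return keep[s] && keep[t];
    })
  };
}

// Wired to an <input type="range" id="threshold"> element:
d3.select('#threshold').on('input', function () {
  update(filterByInDegree(fullGraph, +this.value)); // update() re-renders
});
```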

My most recent advance was adding to the graphs what (generally cryptic and incomplete) metadata Web of Science offers about citations. I have now created three co-citation graphs that show metadata on mouseover.
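The mouseover itself is nothing fancy. A sketch, assuming d3 v3-style event handling and with made-up field names standing in for whatever Web of Science actually supplies, looks roughly like this:

```javascript
// Minimal metadata tooltip on mouseover, in d3 v3 style (handlers
// receive the datum; coordinates come from d3.event). The field names
// (author, year, journal) are placeholders, and "node" stands for the
// existing selection of node circles in the graph.
var tooltip = d3.select('body').append('div')
    .attr('class', 'tooltip')
    .style('position', 'absolute')
    .style('opacity', 0);

node.on('mouseover', function (d) {
      tooltip.html((d.author || 'Unknown') + ' (' + (d.year || 'n.d.') + ')' +
                   (d.journal ? '<br>' + d.journal : ''))
        .style('left', (d3.event.pageX + 10) + 'px')
        .style('top', (d3.event.pageY - 10) + 'px')
        .style('opacity', 1);
    })
    .on('mouseout', function () {
      tooltip.style('opacity', 0);
    });
```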

If Web of Science supplied a DOI, you should see a direct link to the article. If not, I provide links to search WorldCat and Google Scholar, though I should probably munge the search string to increase the likelihood of success (a rough sketch of this link logic appears below). Though many famous articles and books can be recognized from author and date alone, not all can. (And of course this varies depending on your knowledge of the field.) I hope this improves the usefulness of these graphs as an exploratory tool. What conclusions about a discipline’s history and formation can be responsibly drawn from this data? I’ve been thinking about this question, and the somewhat related question of how citation metrics can be correlated with topic models, for several months now. I don’t have any conclusions interesting enough to share at this moment, but I’m optimistic about the heuristic value of these search-and-visualization tools for pedagogy.
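For the record, the fallback-link logic mentioned above amounts to something like the sketch below. The metadata field names are again placeholders; only the DOI resolver and the generic Google Scholar and WorldCat search endpoints are real, and the query string is exactly the thing that could stand some munging.

```javascript
// Build the links shown in the tooltip: a DOI resolver URL when Web of
// Science supplies a DOI, otherwise fallback searches on Google Scholar
// and WorldCat. Field names are placeholders; the query could be munged
// further (quoted title, expanded journal name) to improve hit rates.
function citationLinks(d) {
  if (d.doi) {
    return ['<a href="https://doi.org/' + d.doi + '">DOI</a>'];
  }
  var query = encodeURIComponent([d.author, d.year, d.journal]
      .filter(function (x) { return x; }).join(' '));
  return [
    '<a href="https://scholar.google.com/scholar?q=' + query + '">Google Scholar</a>',
    '<a href="https://www.worldcat.org/search?q=' + query + '">WorldCat</a>'
  ];
}
```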