Citations to Women in Theory

After reading Kieran Healy’s latest post about women and citation patterns in philosophy, I wanted to revisit the co-citation graph I had made of five journals in literary and cultural theory. As I noted, one of these journals is Signs, which is devoted specifically to feminist theory. I didn’t think that its presence would skew the results too much, but I wanted to test it. Here are the top thirty citations in those five journals:
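A ranking like that can be tallied straight from the Web of Science records. Here is a minimal sketch of the counting step, assuming a tab-delimited export whose CR column holds each record's semicolon-separated cited references; the file name is a placeholder, not anything from the original post.

```python
from collections import Counter

# Tally the most-cited works across a set of Web of Science records.
# Assumes a tab-delimited export whose CR column holds each record's
# semicolon-separated cited references; "theory_journals.txt" is a placeholder.
cited = Counter()
with open("theory_journals.txt", encoding="utf-8") as f:
    header = f.readline().rstrip("\n").split("\t")
    cr_index = header.index("CR")
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > cr_index:
            for ref in fields[cr_index].split("; "):
                if ref.strip():
                    cited[ref.strip()] += 1

# Print the thirty most frequently cited works.
for ref, count in cited.most_common(30):
    print(count, ref)
```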

Creating a Threshold Slider

I wanted to modify this script by Neal Caren to create an adjustable graph that lets you control the citation threshold for the nodes that appear on it. If, for example, you want to see only those nodes with twenty or more citations, you can just move the slider, and the graph will update automatically.

I have created three of these: Modernist Journals, Literary Theory, and Rhetoric and Composition. I’m sure there are several ways of going about this, and I’m equally sure that mine is far from the most efficient or practical.
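Whatever the front end looks like, the underlying step is just dropping nodes that fall below a citation-count cutoff. Here is a minimal sketch of that filtering step with networkx, assuming each node carries a "count" attribute holding its citation total (the attribute and function names are mine, not Caren's):

```python
import networkx as nx

def filter_by_threshold(graph, threshold):
    """Return the subgraph of nodes cited at least `threshold` times.

    Assumes each node carries a "count" attribute with its citation total;
    both the attribute and the function name are assumptions, not Caren's.
    """
    keep = [node for node, data in graph.nodes(data=True)
            if data.get("count", 0) >= threshold]
    return graph.subgraph(keep).copy()

# e.g. show only those works cited twenty or more times
# trimmed = filter_by_threshold(cocitation_graph, 20)
```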

Citational Network Graph of Literary Theory Journals

I’ve been interested in humanities citation analysis for some time now, though I had been somewhat frustrated in that work by JSTOR pulling its citation data from its DfR portal a year or so ago. It was only a day or two ago, with Kieran Healy’s fascinating post on philosophy citation networks, that I noticed that the Web of Science database offers this information in a relatively accessible format. Healy used Neal Caren’s work on sociology journals as a model. Caren generously supplied his python code in that post, and it’s relatively straightforward to set up and use yourself.*
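The heart of that approach is a co-citation network: two works are linked whenever a single article cites them both, and the edge grows heavier the more often that happens. Caren's script handles much more of the pipeline, but the central step looks roughly like this sketch, which assumes an input of cited-reference lists, one list per article:

```python
from itertools import combinations
import networkx as nx

def cocitation_graph(records):
    """Build a co-citation network from a list of cited-reference lists.

    Two works are linked whenever one article cites them both, and the edge
    weight counts how often that happens. The input format is an assumption.
    """
    g = nx.Graph()
    for refs in records:
        for a, b in combinations(sorted(set(refs)), 2):
            if g.has_edge(a, b):
                g[a][b]["weight"] += 1
            else:
                g.add_edge(a, b, weight=1)
    return g
```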

Dying Rabbits

I checked back in on Project Rosalind a few days ago and noticed that they had added several new problems. One was the familiar Fibonacci sequence, beloved of introductory computer science instruction everywhere. There was also a modified version of the Fibonacci problem, however, which requires you to compute the sequence with mortal rabbits. (The normal Fibonacci sequence is often introduced as an unrealistic problem in modeling the population growth of immortal rabbits.)
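The mortal-rabbit version just means keeping track of how many pairs are at each age, so that a cohort dies off after a fixed lifespan. Here is a minimal sketch of that recurrence in python, assuming pairs live exactly m months and begin reproducing in their second month; the function name and conventions are mine, not Rosalind's.

```python
def mortal_fib(n, m):
    """Number of rabbit pairs after n months, when each pair lives exactly m months."""
    # ages[k] holds the number of pairs that are k+1 months old
    ages = [1] + [0] * (m - 1)
    for _ in range(n - 1):
        newborns = sum(ages[1:])          # every pair past its first month reproduces
        ages = [newborns] + ages[:-1]     # everyone ages a month; the oldest cohort dies
    return sum(ages)

# e.g. six months with a three-month lifespan gives 4 pairs
print(mortal_fib(6, 3))
```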

Interpreting Topics in Law and Economics

Of the many interesting things in Matthew Jockers’s Macroanalysis, I was most intrigued by his discussion of interpreting the topics in topic models. Interpretation is what literary scholars are trained for and tend to excel at, and I’m somewhat skeptical of the notion of an “uninterpretable” topic. I prefer to think of it as a topic that hasn’t yet met its match, hermeneutically speaking. In my experience building topic models of scholarly journals, I have found clear examples of lumping and splitting: terms that are either separated from their natural place or agglomerated into an unhappy mass. The “right” number of topics for a given corpus is generally the one with the fewest visibly lumped or split topics. But there are other issues in topic interpretation that can’t easily be resolved this way.
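One practical way to hunt for that number is simply to fit models at several candidate sizes and read the top terms for each topic, watching for obvious lumping or splitting. Here is a rough sketch of that survey using gensim, which is not the tooling behind the models discussed here; it assumes a list of tokenized documents, and every other choice is arbitrary.

```python
from gensim import corpora, models

def survey_topic_counts(texts, candidate_ks=(25, 50, 75, 100)):
    """Fit a model at each candidate number of topics and print the top terms.

    `texts` is a list of tokenized documents; reading the output by eye is the
    point, since lumped or split topics are easiest to spot in the top terms.
    The function name and candidate sizes are arbitrary choices, not Jockers's.
    """
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    for k in candidate_ks:
        lda = models.LdaModel(corpus, id2word=dictionary, num_topics=k, passes=10)
        print("--- %d topics ---" % k)
        for topic_id, terms in lda.show_topics(num_topics=k, num_words=10, formatted=True):
            print(topic_id, terms)
```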

Recent Developments in Humanities Topic Modeling: Matthew Jockers’s Macroanalysis and the Journal of Digital Humanities

1. Ongoing Concerns

Matthew Jockers’s Macroanalysis: Digital Methods & Literary History arrived in the mail yesterday, and I finished reading it just a short while ago. Between it and the recent Journal of Digital Humanities issue on the “Digital Humanities Contribution to Topic Modeling,” I’ve had quite a lot to read and think about. John Laudun and I also finished editing our forthcoming article in The Journal of American Folklore on using topic models to map disciplinary change. Our article takes a strongly interpretive and qualitative approach, and I want to review what Jockers and some of the contributors to the JDH volume have to say about the interpretation of topic models.

Topic Models and Highly Cited Articles: Pierre Nora’s Between Memory and History in Representations

I have been interested in bibliometrics for some time now. Humanities citation data has always been harder to come by than that of the sciences, largely because citation counts never much caught on as a metric among humanists. Another important reason is a generalized distrust and suspicion of quantification in the humanities. And there are very good reasons to be suspicious of assigning too much significance to citation counts in any discipline.

Learning to Code

One of my secret vices is reading polemics about whether or not some group of people, usually humanists or librarians, should learn how to code. What’s meant by “to code” in these discussions varies quite a lot. Sometimes it’s a markup language. More frequently it’s an interpreted language (usually python or ruby). I have yet to come across an argument for why a humanist should learn how to allocate memory and keep track of pointers in C, or master the algorithms and data structures in this typical introductory computer science textbook, but I’m sure they’re out there.

The Awakening of My Interest in Annular Systems

I’ve been thinking a lot recently about a simple question: can machine learning detect patterns of disciplinary change that are at odds with received understanding? The forms of machine learning that I’ve been using to test this, LDA and its dynamic variant, do a very good job of picking up the patterns you would expect in, say, a large corpus of literary journals. The model I built of several theoretically oriented journals in JSTOR, for example, shows much the same trends that anyone familiar with the broad contours of literary theory would expect to find. The relative absence of historicism as a topic of self-reflective inquiry is also explainable by the journals represented and by historicism’s comparatively low incidence of keywords and rote citations.
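For the dynamic side, here is a rough illustration of what fitting such a model can look like using gensim's LdaSeqModel, which is not the code used for the models described in this post; it assumes tokenized articles sorted by date and a list giving how many articles fall in each period.

```python
from gensim import corpora
from gensim.models import LdaSeqModel

def fit_dynamic_model(texts, time_slice, num_topics=20):
    """Fit a dynamic topic model over date-sorted, tokenized documents.

    `time_slice` gives the number of documents falling in each period (e.g. each
    year), in order. Parameter choices are illustrative, not those from the post.
    """
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    ldaseq = LdaSeqModel(corpus=corpus, id2word=dictionary,
                         time_slice=time_slice, num_topics=num_topics)
    # Watch a single topic's top terms drift from slice to slice.
    for i, terms in enumerate(ldaseq.print_topic_times(topic=0)):
        print("slice", i, terms)
    return ldaseq
```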

Two Topic Browsers

Ben Schmidt, in a detailed and very useful post about some potential problems with using topic models for humanities research, wondered why people didn’t commonly build browsers for their models. For me, the answer was quite simple: I couldn’t figure out how to get the necessary output files from MALLET to use with Allison Chaney’s topic modeling visualization engine. I’m sure that MALLET’s output can be configured to produce them, and I’ve built the dynamic-topic-modeling code, which does produce the same type of files as lda-c, but I hadn’t actually used lda-c (except through an R package front-end) for my own models.