After reading Kieran Healy’s latest
post
about women and citation patterns in philosophy, I wanted to revisit the
co-citation graph I had
made of five journals in literary and cultural theory. As I noted, one
of these journals is Signs, which is devoted specifically to feminist
theory. I didn’t think that its presence would skew the results too
much, but I wanted to test it. Here are the top thirty citations in
those five journals:
I wanted to modify this
script
by Neal Caren to create an adjustable graph that lets you control the minimum number of citations a node needs in order to appear. If, for example, you want to see only those nodes with twenty or more citations, you can just move the slider to that value, and the graph will update automatically.
I have created three of these:
Modernist
Journals,
Literary
Theory, and
Rhetoric
and Composition. I’m sure there are several ways of going about this, and I’m equally sure that mine is far from the most efficient or practical.
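The thresholding step itself is simple; here is a rough Python sketch of the idea (not Caren’s script and not the code behind the sliders above), which filters a networkx co-citation graph by a hypothetical ‘citations’ node attribute:

```python
# A sketch only: filter a co-citation graph so that works cited fewer than
# `threshold` times disappear. The node names and the 'citations' attribute
# are made up for illustration; this is not the code behind the graphs above.
import networkx as nx

def filter_by_citations(graph, threshold):
    """Return a copy of `graph` keeping only nodes with >= threshold citations."""
    keep = [n for n, data in graph.nodes(data=True)
            if data.get("citations", 0) >= threshold]
    return graph.subgraph(keep).copy()

# Toy example: two heavily cited works and one rarely cited one.
G = nx.Graph()
G.add_node("Butler 1990", citations=35)
G.add_node("Foucault 1977", citations=28)
G.add_node("Obscure 1983", citations=3)
G.add_edge("Butler 1990", "Foucault 1977", weight=12)
G.add_edge("Butler 1990", "Obscure 1983", weight=2)

print(filter_by_citations(G, 20).nodes())  # only the two well-cited works remain
```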
I’ve been interested in humanities citation analysis for some time now,
though I had been somewhat frustrated in that work by JSTOR pulling its
citation data from its DfR portal a year or so ago. It was only a day or two ago, with Kieran Healy’s fascinating post on philosophy citation networks, that I noticed the Web of Science database has this information in a relatively accessible format. Healy
used Neal Caren’s
work on
sociology journals as a model. Caren generously supplied his python
code in that post,
and it’s relatively straightforward to set up and use yourself.*
I checked back in to Project Rosalind
a few days ago, and I noticed that they had added several new problems.
One was the familiar Fibonacci sequence, beloved of introductory computer
science instruction everywhere. There was also a modified version of the
Fibonacci problem, however, which requires you to compute the sequence
with mortal rabbits.
(The normal Fibonacci sequence is often introduced as an unrealistic
problem in modeling the population growth of immortal rabbits.)
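For anyone curious, here is a minimal Python sketch of the mortal-rabbits idea (one straightforward way to do it, not necessarily the tidiest submission): keep a count of pairs by age, let every mature pair breed each month, and drop the oldest cohort as it dies.

```python
# A minimal sketch of the mortal-rabbits recurrence: track how many pairs
# are at each age, let every mature pair breed once a month, and let the
# oldest cohort die off when it reaches the lifespan m.
def mortal_fib(n, m):
    """Total rabbit pairs alive in month n, given a lifespan of m months."""
    ages = [0] * m      # ages[i] = pairs that are i months old (0 = newborn)
    ages[0] = 1         # month 1: a single newborn pair
    for _ in range(n - 1):
        newborns = sum(ages[1:])        # every pair at least a month old breeds
        ages = [newborns] + ages[:-1]   # everyone ages one month; the oldest die
    return sum(ages)

print(mortal_fib(6, 3))  # 4 pairs alive after six months with a three-month lifespan
```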
Of the many interesting things in Matthew Jockers’s Macroanalysis, I
was most intrigued by his discussion of interpreting the topics in topic
models. Interpretation is what literary scholars are trained for and
tend to excel at, and I’m somewhat skeptical of the notion of an
“uninterpretable” topic. I prefer to think of it as a topic that hasn’t
yet met its match, hermeneutically speaking. In my experience building
topic models of scholarly journals, I have found clear examples of
lumping and splitting—terms that are either separated from their
natural place or agglomerated into an unhappy mass. The ‘right’ number
of topics for a given corpus is generally the one which has the lowest
visible proportion of lumped and split topics. But there are other
issues in topic-interpretation that can’t easily be resolved this way.
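To make that kind of inspection concrete, here is a minimal gensim sketch, purely for illustration (gensim rather than the other tools mentioned in these posts), that fits the same toy corpus at a few values of k so the top terms can be read side by side for signs of lumping and splitting:

```python
# Fit the same corpus at several values of k and print each topic's top
# terms for manual inspection; lumped or split topics show up in the lists.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Stand-in tokenized documents; a real corpus would be journal articles.
docs = [["psychoanalysis", "lacan", "desire", "subject"],
        ["rhetoric", "composition", "writing", "pedagogy"],
        ["novel", "narrative", "realism", "form"]]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

for k in (5, 10, 20):
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=k, passes=10, random_state=1)
    print(f"--- k = {k} ---")
    for topic_id, terms in lda.show_topics(num_topics=k, num_words=8):
        print(topic_id, terms)
```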
1. Ongoing Concerns
Matthew Jockers’s Macroanalysis: Digital Methods & Literary History arrived in the mail yesterday, and I finished reading it just a short while ago. Between it and the recent
Journal of Digital Humanities
issue
on the “Digital Humanities Contribution to Topic Modeling,” I’ve had
quite a lot to read and think about. John
Laudun and I also finished editing our
forthcoming article in The Journal of American Folklore on using
topic-models to map disciplinary change. Our article takes a strongly
interpretive and qualitative approach, and I want to review what
Jockers and some of the contributors to the JDH volume have to say about
the interpretation of topic models.
I have been interested in bibliometrics for some time now. Humanities citation data has always been harder to come by than data from the sciences, largely because citation counts as a metric of importance never much caught on in the humanities. Another important reason is a generalized distrust of quantification among humanists. And there are very good reasons to be suspicious of assigning too much significance to citation counts in any discipline.
One of my secret vices is reading polemics about whether or not some
group of people, usually
humanists or
librarians,
should learn how to code. What’s meant by “to code” in these discussions
varies quite a lot. Sometimes it’s a markup language. More frequently
it’s an interpreted language (usually python or ruby). I have yet to
come across an argument for why a humanist should learn how to allocate
memory and keep track of pointers in C, or master the algorithms and
data structures in this typical introductory computer science
textbook;
but I’m sure they’re out there.
I’ve been thinking a lot recently about a simple question: can machine
learning detect patterns of disciplinary change that are at odds with
received understanding? The forms of machine learning that I’ve been
using to try to test this—LDA and the dynamic LDA variant—do a very
good job of picking up the patterns that you would expect to find in,
say, a large corpus of literary journals. The model I built of several
theoretically oriented journals in JSTOR, for example, shows much the
same trends that anyone familiar with the broad contours of literary
theory would expect to find. The relative absence of historicism as a
topic of self-reflective inquiry is also explainable by the journals
represented and historicism’s comparatively low incidence of keywords
and rote-citations.
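By way of illustration only (this is not the tooling behind the models described above), here is a hedged sketch of the dynamic side of that workflow using gensim’s LdaSeqModel, a reimplementation of the dynamic topic model: the corpus is split into chronological time slices, and a topic’s top terms can then be read period by period for drift.

```python
# Fit a dynamic topic model over chronological time slices so that a
# topic's top terms can be compared from one period to the next.
from gensim.corpora import Dictionary
from gensim.models import LdaSeqModel

# Stand-in tokenized articles, already sorted chronologically.
docs = [["structuralism", "sign", "system"],
        ["deconstruction", "derrida", "trace"],
        ["historicism", "archive", "context"],
        ["affect", "body", "feeling"]]
time_slice = [2, 2]   # two documents in each of two periods

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

ldaseq = LdaSeqModel(corpus=corpus, id2word=dictionary,
                     time_slice=time_slice, num_topics=2)

for t in range(len(time_slice)):
    print(f"--- period {t} ---")
    print(ldaseq.print_topics(time=t, top_terms=5))
```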
Ben Schmidt, in a detailed and very useful
post
about some potential problems with using topic models for humanities
research, wondered why people didn’t commonly build browsers for their
models. For me, the answer was quite simple: I couldn’t figure out how
to get the necessary output files from MALLET to use Allison Chaney’s
topic modeling visualization engine.
I’m sure that the output can be configured to do so, and I’ve built the
dynamic-topic-modeling code, which does produce the same type of files
as lda-c, but I hadn’t actually used lda-c (except through an R package
front-end) for my own models.
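The kind of conversion at stake would look roughly like the sketch below, which assumes a MALLET --output-doc-topics layout of ‘doc_index doc_name p0 p1 ...’ (the layout varies across MALLET versions) and produces only a gamma-like approximation, since real lda-c gamma values are unnormalized variational parameters rather than proportions:

```python
# A rough sketch of the kind of conversion involved, not a tested recipe.
# Assumes each non-comment row of the MALLET doc-topics file reads
# "doc_index doc_name p0 p1 p2 ..." and writes the proportions as a
# gamma-like matrix, one row per document.
def mallet_doc_topics_to_gamma(doc_topics_path, gamma_path):
    with open(doc_topics_path) as src, open(gamma_path, "w") as dst:
        for line in src:
            if line.startswith("#"):      # skip MALLET's header line, if present
                continue
            fields = line.split()
            proportions = fields[2:]      # drop the doc index and doc name
            dst.write(" ".join(proportions) + "\n")

mallet_doc_topics_to_gamma("doc-topics.txt", "final.gamma")
```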