Film Studies Co-Citation Network

Sun, Sep 15, 2013

I’ve created several new co-citation graphs recently. While I enjoy looking at the visualizations, I haven’t yet analyzed any of them thoroughly. The film studies network was intriguing to me for several reasons, and I’m going to explore it now in more detail.

I downloaded just over 12K articles from various film studies journals in Web of Science. The journals are Sight and Sound; Film Comment; Literature/Film Quarterly; American Film; Cinema Journal; Screen; Historical Journal of Film, Radio, and Television; Journal of Popular Film & Television; Wide Angle; Film Quarterly; Journal of Film and Video; Film Criticism; and Quarterly Review of Film & Video. Not all the journals are represented equally in the database. The following graph shows their distribution:

There are 147,800 total citations in these 12K articles (the number of unique citations is lower). At the lowest co-citation threshold I calculated (“3” on the slider), there are 1,630 citations, or 1.10% of the total. A ranked list of these citations can be found here. And here is a graph of the top twenty-five citations:
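For readers unfamiliar with the idea, here is a minimal sketch of co-citation counting and thresholding. It is not Caren’s script: the function names and the toy reference lists are invented for illustration, and it assumes that a threshold of N means “keep pairs of works cited together in at least N articles,” which may not match the script’s exact rule.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how many articles cite each pair of works together."""
    pair_counts = Counter()
    for refs in reference_lists:
        # De-duplicate within an article; sorting makes (a, b) and (b, a) one key.
        for pair in combinations(sorted(set(refs)), 2):
            pair_counts[pair] += 1
    return pair_counts


def edges_at_threshold(pair_counts, threshold):
    """Keep only the pairs co-cited at least `threshold` times (assumed rule)."""
    return {pair: n for pair, n in pair_counts.items() if n >= threshold}


def nodes_in(edges):
    """Collect every work that still has at least one surviving edge."""
    return {work for pair in edges for work in pair}


# Toy reference lists, invented for illustration (not taken from the corpus).
articles = [
    ["Bazin", "Metz", "Barthes"],
    ["Bazin", "Metz"],
    ["Bazin", "Metz", "Bordwell"],
    ["Bordwell", "Hansen"],
]

counts = cocitation_counts(articles)
edges = edges_at_threshold(counts, 3)
surviving = nodes_in(edges)
```

In this toy case only the Bazin/Metz pair survives at threshold three, which is the same pruning effect the slider has on the full graph.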

At the highest threshold (“20”), there are 126 citations. This distribution and the distribution of the number of communities identified at each citation threshold appear to follow the typical rank-order distribution often encountered in citation analysis. Here is a graph showing the distribution of communities within each threshold:

In the lowest threshold, there are 235 separate communities. The vast majority, as this graph shows, contain only two members:
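A tally like this is straightforward to produce once each node has a community label. The following is only a sketch: the `membership` dictionary (node name to community id) is a hypothetical stand-in for whatever the community-detection step returns, and the data are invented.

```python
from collections import Counter


def community_size_distribution(membership):
    """Return a mapping of community size -> number of communities of that size."""
    members_per_community = Counter(membership.values())  # community id -> size
    return Counter(members_per_community.values())        # size -> how many


# Hypothetical node -> community assignments, invented for illustration.
membership = {
    "A": 0, "B": 0,
    "C": 1, "D": 1,
    "E": 2, "F": 2, "G": 2,
}

distribution = community_size_distribution(membership)
largest = max(distribution)  # size of the biggest community
```

Here two communities have two members and one has three; on the real graph the two-member bucket would dominate.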

The largest of these communities contains 160 nodes, and it is colored green on threshold three of the graph. It contains a wide variety of film theorists, plus a strong proportion of writers from other disciplines who have influenced film theory at various stages (Freud, Lacan, Althusser, Foucault, etc.). There are no films in it that I see. The next largest community contains 128 nodes, and it comprises another theoretical cluster. The theorists here are closer to film and media studies in general (there is a high concentration of Baudrillard, for example). This cluster’s color is a “beige, buttery yellow” in Clancy’s words, though I might call it more of a soul-bleaching orange. Your perceptions may vary.

Next down on the list at 95 citations is a pure film theory cluster, colored brown on the graph. David Bordwell occupies a central position here. The next cluster is entirely films: Coppola, Scorsese, Kazan, Lucas, Spielberg, Malick, and several others. It is highly isolated from the center nodes of the graph, and it may not even display on your monitor (depending on your resolution). (The “charge” setting of the D3 force layout controls how strongly the nodes repel each other. Make the repulsion too strong, and some nodes on a complex graph will be pushed out of view. Too weak, on the other hand, and the clusters will be so tight that they are illegible.)

The next cluster, at 81 citations, is a strongly French group of film theory: Barthes, Metz, and Bazin are all well represented. This cluster is pink on the graph, and it is also tightly centralized. Judith Butler and Eve Sedgwick are strongly represented in the next cluster (79 citations). Perhaps curiously, Romero’s Dawn of the Dead is also here. This cluster is gray. Hitchcock has his own cluster (51 citations), which is pink on the graph. As I mentioned on Twitter, it is noteworthy that Hitchcock’s films connect to the large film cluster mentioned above via John Carpenter’s Halloween.

The last notable cluster I will consider at this threshold is bright blue on the graph. Its main hub is Miriam Hansen’s Babel and Babylon: Spectatorship in American Silent Film (Harvard 1991). If we move to the opposite side of the slider, at the twenty-threshold, Hansen appears in the largest remaining cluster with the other surviving film theorists. Raising the threshold results in a gradual synthesis of citational categories; Hitchcock remains (along with Welles, Antonioni, and Truffaut), and Lucas, Scott, Altman, and Scorsese are still in their own cluster. Jameson’s The Political Unconscious is in a different cluster than his work on film at the final threshold. And Deleuze is left alone at last in his own co-citational island.

My next project will involve exploring the Louvain community-detection algorithm that this script uses. (I should remind readers that I am using Neal Caren’s code, which is described in this post.) I also intend to load these data sets into Gephi and Sci2; I was very impressed with the work that Scott Weingart describes here. Evolving measures of betweenness centrality as the node threshold is raised, for example, might be worth exploring. I could also experiment with adding or subtracting journals from this corpus and cleaning up the data that’s there.
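To make the betweenness idea concrete, here is a rough sketch of how one might track it across thresholds. It uses a plain-Python version of Brandes’ algorithm for unweighted, undirected graphs so that it runs without extra libraries; in practice a library such as NetworkX would be the more usual route. The `pair_counts` data and the helper names are invented for illustration, and the “at least N co-citations” rule for keeping an edge is an assumption.

```python
from collections import defaultdict, deque


def adjacency_at(pair_counts, threshold):
    """Build an undirected adjacency map from pairs co-cited >= threshold times."""
    adj = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    bc = dict.fromkeys(adjacency, 0.0)
    for s in adjacency:
        order = []                               # nodes in BFS visiting order
        preds = {v: [] for v in adjacency}       # shortest-path predecessors
        sigma = dict.fromkeys(adjacency, 0)      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:                # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adjacency, 0.0)
        while order:                             # accumulate from farthest node back
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Invented co-citation counts for four hypothetical works.
pair_counts = {("A", "B"): 5, ("B", "C"): 5, ("C", "D"): 3, ("B", "D"): 2}

by_threshold = {
    t: betweenness(adjacency_at(pair_counts, t)) for t in (2, 3, 5)
}
```

In this toy graph, node C only becomes a bridge once the weak B–D link is pruned at threshold three, and D disappears entirely at threshold five. Shifts of that kind are what would be interesting to look for in the real networks.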

Another serious problem is that I don’t yet know the D3.js framework well enough to do things like isolate a particular community and hide the other nodes when it is selected. I can see that this is possible, but I’m not familiar enough yet with JavaScript or the D3 data structures to be able to implement it. Many of the graphs that I’ve created have far more data in them than Caren’s or Healy’s, and, while I created the sliders to help with the overcrowding, there are still better solutions.