RNN M/m

Like many netizens, I was amused by Andrej Karpathy’s “The Unreasonable Effectiveness of Recurrent Neural Networks” when it first appeared. I don’t mean the explanation of what a recurrent neural network is or the claim that there’s much wisdom in Paul Graham’s essays. The text-generation samples, however, were really neat. RNN text-generators power many bots on the social media platform known as “twitter,” and I suspect that they may also be used in commercial solicitations. They’ve stimulated a lot of commerce and experiment, in other words. What more can you ask of software? Computer scientists put them to work almost immediately, as is natural, at generating computer science papers. Karpathy’s post has a few examples, and there are many others.

To the best of my knowledge, however, no one has tried to train an RNN on modernist scholarship. Last year, I wrote a study of various methods that could be used to quantify and analyze one important journal in modernist studies. I had needed the entire run of the journal in plain text to finish that project, so the corpus was ready to hand. One of the things that most users quickly discover about neural networks, to their bitter disappointment, is that the networks’ processing requirements far exceed their own computing resources. Statisticians teach us that the sample is greater than the whole, but even a 5% sample of the journal would have taken days to train on my (once) top-of-the-line MacBook Pro.1

I extracted the sample nonetheless and began the tedious process of renting an Amazon Web Services instance and running the code there. The bottom of this post contains some technical details if you’re interested. What resulted? And why bother doing this at all? Let me start there. I’m writing a book chapter about Samuel Beckett’s combinatorial interests in relation to then-current thinking about computation. Beckett was very interested in automatic writing, writing machines, and the mechanical generation of text. Many passages of his writing show attempts at replicating mechanical writing; Watt contains most of my examples. I also delivered an MLA paper last year entitled “The Automation of Scholarship.” I have not published that paper, even on my personal blog, as I think I may use part of it in some larger bit of writing. The argument, surprisingly enough, was that automation is coming to scholarship too.

I think it’s at least possible that in some fields, with a large training sample and some editing, an RNN-generated paper might be able to fool someone in a way comparable to the Sokal paper.2 It may have already happened with certain OA journals whose peer-review protocols may not align with prevailing standards, though I haven’t thoroughly investigated the matter. So: what did this experiment with Modernism/modernity produce? Could it generate text that could pass for what’s published in the journal? No. It does, however, generate text that I think is a plausible, perhaps uncanny, simulacrum of certain aspects of the journal’s style.

Here is an example:

In the narrator of the late 1950s, which he had succeeded in the first prominent contributions of the state of the past and the status of the play, which are the form of place in the first time the production of the press is a protagonist in the page, the companion of the first painting of the distance of the particular strategies of the audience and contradictions in the state of painting. The status of art history books were all the interpretation of the female status of art and insistence on the press and the most inexacted content of the possibility of the heroic of the power to an interest in the subject of a full of the authority that survey of the present space and specific contexts [. . .]

That may not sound much more coherent than the output of a simple Markov chain procedure. I won’t dispute the point, but this process generates mostly grammatical English on a character-by-character basis. It opens and closes quotes, generates footnotes formatted in the style of the original, hallucinates publishers and place names, and is capable of many feats theoretically beyond the power of simpler methods. The sample above was generated with a low temperature (0.4), a parameter that roughly controls how far the samples stray from the model’s most predictable choices (there is a sketch of the mechanism in the appendix below). Here is a passage from the same checkpoint with a temperature of 0.85:

It is her characteristics were acquired and enters the sense of interpretation is the significance of the modern life. There is a preface to any disconfigure in a world as dead. This is a barrier that ever sent to what have not to be theatrical obedience of the printed international thinking. Iron generates both changes that arise for it was done most obviously [. . .]

You can see more metaphorical freedom here. I might use “iron generates both changes” in some appropriate context. But should I attribute it? The style guides are silent on this matter, as on so many others. The samples above were generated at a length of a few thousand words. If I ask the model to produce a sample the same length as the articles published in the journal (10–12K words), it will produce footnotes and other citation-like matter:

No suspect the object of the subject is an audience needed to identify the peasant in imperialism. In the country as a “one-pears looking their value” (ibid.). We will call him to read the nature of the title and images. They were printed in the page expressive to the photographs of a scene in the mid to the start.

Ibid.!

  1. See, for example, in 1927 by Stein, “The Sexual Archaeology of Lions in the Liberation of Kapphysis” (1969), 44.

Wouldn’t you read a paper with this title? The output of the model captures, for the most part, something of the inherent style of the journal. I say this as someone who’s read it quite carefully, both in the normal way and through various mechanical means, over the years. No one would mistake it for actual scholarship, and I don’t think parody is the right term for what it produces. With higher temperatures, there is an effect remarkably similar to Lucky’s monologue, which I find stimulating in several ways.3

Semi-Technical Appendix

My first attempt at generating text with an RNN was with the keras text-generation example. I found it easier to install than the alternatives for a variety of reasons, and I was able to run it on both my laptop and an Amazon AWS instance without much trouble. The difficulty was that something seems to be wrong with the code, or with my setup. On every attempt, with many different training sets, the loss would spike in the middle of the training iterations and never recover. I would file an issue about this, but I haven’t yet ruled out some version conflict, though I can’t understand why one would manifest itself this way. I don’t think it’s overfitting, given the magnitude of the problem, but it could be.
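For reference, the keras example builds a character-level LSTM roughly along these lines. This is a minimal sketch from memory; the file path and hyperparameters are illustrative stand-ins, not the settings I actually used.

```python
# A minimal character-level LSTM in the spirit of the keras text-generation
# example; the path and hyperparameters here are illustrative only.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

with open("journal_sample.txt") as f:  # hypothetical path to the training text
    text = f.read()

chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

# Cut the text into overlapping windows; the target is the character
# immediately following each window.
maxlen, step = 40, 3
windows = [text[i:i + maxlen] for i in range(0, len(text) - maxlen, step)]
targets = [text[i + maxlen] for i in range(0, len(text) - maxlen, step)]

# One-hot encode windows and targets.
X = np.zeros((len(windows), maxlen, len(chars)), dtype=np.float32)
y = np.zeros((len(windows), len(chars)), dtype=np.float32)
for i, window in enumerate(windows):
    for t, c in enumerate(window):
        X[i, t, char_to_idx[c]] = 1.0
    y[i, char_to_idx[targets[i]]] = 1.0

# A single LSTM layer feeding a softmax over the character vocabulary.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=128, epochs=20)
```

Training loss should fall steadily with a setup like this, which is why the mid-training spike I kept seeing looks like a bug or a version mismatch rather than normal behavior.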

Much better results were achieved using torch-rnn. The drawback is that it has more dependencies to set up. If you use an AWS EC2 instance, make sure that you select an AMI that includes CUDA 8.0 or higher. There are probably AMIs with the appropriate CUDA version and the Lua Torch framework torch-rnn needs already installed, but I didn’t immediately find them in the Northern Virginia region. Unlike the keras example, torch-rnn also needs to be installed on your personal computer to generate samples from the checkpoints, unless you want to do everything on the remote server, which I wouldn’t recommend. I used a g2.2xlarge instance, which at the time of writing costs about $0.65/hour, and the hours add up. Compiling all of the necessary software for torch-rnn can take thirty minutes or more if you can’t find a suitable image. I did not try docker or other virtual machine approaches; they might well be quicker and more reliable.
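For orientation, the workflow on the instance looks roughly like this. I’ve written it as Python subprocess calls for convenience, though you could just as easily type the commands at a shell; the flag names are as I recall them from the torch-rnn README, so check the repository before relying on them.

```python
# A rough sketch of the torch-rnn workflow (run from the torch-rnn directory).
# Flag names are from memory of the project README; verify before use.
import subprocess

JOURNAL_TXT = "modernism_sample.txt"  # hypothetical path to the plain-text sample

# 1. Preprocess the plain text into the HDF5/JSON pair the trainer expects.
subprocess.run([
    "python", "scripts/preprocess.py",
    "--input_txt", JOURNAL_TXT,
    "--output_h5", "modernism.h5",
    "--output_json", "modernism.json",
], check=True)

# 2. Train on the GPU instance; checkpoints land in cv/ by default.
subprocess.run([
    "th", "train.lua",
    "-input_h5", "modernism.h5",
    "-input_json", "modernism.json",
], check=True)

# 3. Sample from a saved checkpoint. The -gpu -1 flag runs the sampler on the
#    CPU, which is how I generate text locally after copying a checkpoint
#    down from AWS.
subprocess.run([
    "th", "sample.lua",
    "-checkpoint", "cv/checkpoint_10000.t7",
    "-length", "2000",
    "-temperature", "0.4",
    "-gpu", "-1",
], check=True)
```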

Another caveat for installing torch-rnn on a MacBook Pro: I was unable to get it to work with my versions of clang and CUDA. Extracting samples from a trained checkpoint is simple and does not require GPU assistance, so I would recommend building your personal copy without GPU support. Again, this only applies to people whose machines have weak and inconspicuous graphics cards (and Mac OS).
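Finally, since the temperature setting came up above: at each step, the network’s raw scores for the possible next characters are divided by the temperature before being turned into probabilities, so low values stick to the likeliest characters and high values flatten the distribution. Here is a toy sketch of that mechanism; the scores and names are made up for illustration, not torch-rnn’s internals.

```python
# A toy illustration of temperature sampling, not the torch-rnn code itself.
import numpy as np

def sample_next_char(scores, chars, temperature=1.0):
    """Draw one character from softmax(scores / temperature)."""
    scaled = np.asarray(scores, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return np.random.choice(chars, p=probs)

# Imaginary scores for three candidate next characters.
chars = ["e", "t", "z"]
scores = [2.0, 1.5, 0.1]
print(sample_next_char(scores, chars, temperature=0.4))   # almost always "e"
print(sample_next_char(scores, chars, temperature=0.85))  # more adventurous
```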


  1. You will read a lot about the power of GPUs for these stupendous calculations, but not all GPUs are created equal. The one supplied with my MBP is actually slower than the CPU for these calculations as far as I can tell. Fancier ones, such as those you can rent, or that gamers use in their ‘rigs,’ do get better results. ↩︎

  2. A physicist, Alan Sokal, submitted an article to the journal Social Text. The editors, while perplexed, trusted his sincerity and published it. It turned out that he had other motives. I offer this explanation for the benefit of younger readers. ↩︎

  3. I also posted some tweets showing different examples of the model’s output: they are threaded here. ↩︎