Fri Sep 23, 2016

Like many habitual internet users, I strongly believe that I have never bought anything advertised to me on the web, nor have any of these ads affected my behavior beyond momentary irritation. I sometimes take ad-blocking steps and am well aware of cookies, browser-entropy measures, and the wily IP address. My disdain for the so-called “Flash” plugin is complete. What, then, could a book primarily focused on the marketing models used by data scientists to target consumer behavior on the web tell me? Nothing that I didn’t already know, right?

The argument from overconfidence in prior knowledge is all too familiar to those of us who dabble in computational work in the humanities. I did indeed learn much from Cathy O’Neil’s Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. I’m not in fact naive enough to believe that marketing on the internet has not affected my behavior. One example: I visit Austin several times a year. What I believe to be a gas-station skimmer on HWY 71 stole my credit card number at least twice. Despite the fact that I’ve never made a late payment (at least not in a long time; I was once in graduate school, after all), my credit limit was soon reduced. These ‘models’ apparently can infer that if I’m stupid enough to get burned by the same rural gas station scam twice, they should go ahead and limit their exposure. And I can’t blame them.

I learn from O’Neil that one major credit-card issuer started reducing credit limits based on where people shop. It wasn’t mine, but who knows? Maybe that was it. In any case, these WMDs are not only on the internet. They are used for teacher evaluations, college rankings, mortgage applications, and sentencing hearings. In every case, they make grievous errors. O’Neil documents many of these atrocities, though the ones about teacher evaluations and college rankings caught my particular notice. Perverse incentives often result from overvalued models. With college rankings, enterprising administrators simply fabricate data, encourage applications from students they plan to reject, and gear tactical initiatives around the rankings rather than around rational and collaborative planning. O’Neil emphasizes that statistical models are only as good as the data they are trained on, and when that data documents pervasive bias, the models reinforce it. At several points O’Neil writes about the potential of data models to eliminate existing bias. Mortgage applications would be one area. An algorithm, hypothetically, would not engage in “redlining” behavior. But the more complex the algorithms become, and the more extensive the data that they are trained on, the more likely they are to replicate existing inequalities.

Examples of this abound in O’Neil’s book, ranging from mortgage applications to criminal recidivism predictors. I found myself wondering at various points if the actual issue here was simply neoliberalism. It is the marketization that “weaponizes” a data model, after all. The more advanced data models applied to economic activity always remind me of arguments from complexity against economic planning. Finding evidence of hypocrisy there is too cheap to be rewarding, true. A more personal issue is the degree to which I am complicit in the data-driven assault of all-against-all through my aforementioned dabbling in computational approaches. I am very interested in the ability of models such as those O’Neil describes to model genre and character in at least “medium data”-sized buckets of prose. O’Neil left academia to work at a hedge fund and then became a data scientist focused on internet marketing. I get the impression that some believe that using some of the same methods for humanities research is bringing the hedge fund into academia. (Not its money, of course, but its values.)

I can’t resolve those complex issues here. O’Neil’s book is not the only thing I have read about these issues, and much of the subject matter was familiar. It is a well-written guide for those who haven’t been following them closely. I think it would teach well, especially in a first-year writing class. Though I know this sounds like one of those foolish generalizations about the kids these days that olds such as myself can’t stop making, I have rarely encountered undergraduates who have thought much about data privacy and tracking by private corporations. The NSA and the like are a different matter, of course. My only criticism of the book, which is perhaps founded on my own ignorance, is that I think that O’Neil may slightly overestimate the effectiveness of internet marketing, even with its fancy models and petabytes of data. After all, I’ve never bought anything because of it.*

*Disclosure: O’Neil’s book was suggested to me by the Amazon recommendation engine, after which I immediately bought it using the popular “one-click” technology.