This year’s conference of the Association for Computational Linguistics, the most prestigious event in computational linguistics, had a paper that got me very excited. It’s called Extracting Social Networks from Literary Fiction, and here’s the abstract (emphasis added):

We present a method for extracting social networks from literature, namely, nineteenth-century British novels and serials. We derive the networks from dialogue interactions, and thus our method depends on the ability to determine when two characters are in conversation. Our approach involves character name chunking, quoted speech attribution and conversation detection given the set of quotes. We extract features from the social networks and examine their correlation with one another, as well as with metadata such as the novel's setting. Our results provide evidence that the majority of novels in this time period do not fit two characterizations provided by literacy scholars. Instead, our results suggest an alternative explanation for differences in social networks.

The paper advances a new technique for extracting social networks from text, and applies it to 19th-century novels to argue that certain claims in literary theory about novels might be false. In this post, I’ll explain the analysis for a digital humanities audience and discuss some strengths and weaknesses of the argument.

Written at Columbia University by two computer scientists and one English scholar, this paper offers something exciting to both computational linguists and literature researchers. For computational linguists, it proposes the first ever algorithm for extracting speaker-to-speaker networks from free text. This opens up fascinating new areas of study, because it is now possible to computationally analyze who interacts with whom in a text, and not just what they say to each other.

For literary scholars, it suggests that two hypotheses from literary theory about community and society in 19th-century novels might be false. The authors describe them as follows:

Literary studies about the nineteenth-century British novel are often concerned with the nature of the community that surrounds the protagonist. Some theorists have suggested a relationship between the size of a community and the amount of dialogue that occurs, positing that “face to face time” diminishes as the number of characters in the novel grows. Others suggest that as the social setting becomes more urbanized, the quality of dialogue also changes, with more interactions occurring in rural communities than urban communities. Such claims have typically been made, however, on the basis of a few novels that are studied in depth. In this paper, we aim to determine whether an automated study of a much larger sample of nineteenth century novels supports these claims.

To make their arguments, the authors frame the statements above in terms of social networks:

  • If face-to-face time diminishes as the number of characters grows, then the more characters the novel has, the less dense its extracted social network will be.
  • If more interactions occur in rural settings than in urban settings, then networks from rural novels will be densely connected but contain fewer characters, while networks from urban novels will be large and loosely connected.
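
To make "dense" versus "loosely connected" concrete, here is a minimal Python sketch of one way to measure it. It assumes the textbook definition of density for an undirected graph (the number of observed links divided by the number of possible links), which may not be the exact feature the authors computed. The character names, the toy conversation lists, and the helper names `build_network` and `density` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_network(conversations):
    """Build an undirected network from conversations.

    Each conversation is a collection of character names who speak
    to one another. Returns (nodes, edges), where each edge is an
    alphabetically ordered pair so that (a, b) and (b, a) count once.
    """
    nodes = set()
    edges = set()
    for speakers in conversations:
        speakers = set(speakers)
        nodes |= speakers
        for a, b in combinations(sorted(speakers), 2):
            edges.add((a, b))
    return nodes, edges


def density(nodes, edges):
    """Standard undirected graph density: 2E / (N * (N - 1)).

    This is an assumed definition, not necessarily the paper's.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


# Hypothetical "rural" novel: few characters, most of whom talk to each other.
rural = [("Anne", "Ben"), ("Anne", "Cora"), ("Ben", "Cora"),
         ("Cora", "Dan"), ("Anne", "Dan")]

# Hypothetical "urban" novel: one protagonist with many one-off acquaintances.
urban = [("Eve", f"Clerk{i}") for i in range(1, 8)]

for label, conversations in [("rural", rural), ("urban", urban)]:
    nodes, edges = build_network(conversations)
    print(label, len(nodes), "characters, density", round(density(nodes, edges), 3))
```

In this toy data the four-character network has 5 of its 6 possible links (density of about 0.833), while the eight-character network has only 7 of 28 (density 0.25). That is the pattern both hypotheses predict; the paper's question is whether real novels actually show it.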

