When the stories add up: the six narrative arcs in fiction

In recent years, literature has been getting attention from an unusual quarter: mathematics. Alongside statistical physicists analysing the connections between characters in the Icelandic sagas, and computer scientists exploring the life and death of words in English fiction, a team of mathematicians at the University of Vermont have now looked at more than 1,000 texts to see if they could automatically extract their emotional arcs. And their results show something interesting, not just about narratives, but also about using this approach to study literature. 

The Vermont researchers worked with test subjects to create a program capable of assigning emotional value – positive, negative or neutral – to words. ‘Terrorist’ is rated negative in the program’s word bank, while ‘win’ is positive. Then they selected texts from the massive volunteer effort to digitise books known as Project Gutenberg, which currently exists as a repository of public-domain writings. Finally, the researchers ran a series of analyses to chart the shape of the emotional arcs in the texts.
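The word-scoring and arc-charting steps described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' program: the function names, the window sizes and the tiny word bank are all assumptions. Only the ratings of ‘terrorist’ (negative) and ‘win’ (positive) come from the article, and the sample sentence is invented.

```python
import re

# Hypothetical mini word bank. Only 'terrorist' and 'win' are mentioned in the
# article; the other entries and all numeric values are made up for illustration.
WORD_SCORES = {
    "terrorist": -1.0,
    "win": 1.0,
    "joy": 1.0,
    "grief": -1.0,
}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def emotional_arc(text, window=4, step=2):
    """Return the mean word score of each sliding window over the text.

    Words missing from WORD_SCORES count as neutral (0.0). The window and
    step sizes are arbitrary choices for this sketch.
    """
    scores = [WORD_SCORES.get(word, 0.0) for word in tokenize(text)]
    if len(scores) < window:
        # Too short for a full window: fall back to a single average.
        return [sum(scores) / len(scores)] if scores else []
    return [
        sum(scores[i:i + window]) / window
        for i in range(0, len(scores) - window + 1, step)
    ]


# Invented sample text, not taken from any book in the study.
sample = "Grief came first and grief stayed long, then joy arrived and we win"
arc = emotional_arc(sample)
print(arc)
```

For this invented sentence the arc starts below zero and ends above it, the kind of shape the article describes as starting negative and then rising. A real analysis would need a far larger word bank and much longer windows than this toy version uses.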

And indeed, according to the paper put up on ArXiv.org in June 2016, some patterns showed up again and again. About 85 per cent of the works that the researchers looked at could be separated into six groups. Some of the groups lent themselves to colourful names – such as ‘Icarus’, for an emotional type that rises, then falls; and ‘Rags to Riches’, for one that starts negative and then rises. Some of Gutenberg’s most-downloaded works fit the ‘Cinderella’ model, with a rise, a fall, and a rise. You can see how you might start to draw conclusions about what stories play best, or how small the true number of arcs in human storytelling is.

But looking closer at the books originally included in the study, you might start to question the reliability of those results. To begin with, the analysis used not only Robinson Crusoe and A Christmas Carol, but books such as Notes on Nursing and A History of Art for Beginners. A compilation of Hans Christian Andersen tales was handled as if it were a single story, rather than a series of stand-alone narratives. The book that fit the Icarus arc best was a collection of 196 yoga sutras. Another odd marriage was the Cinderella arc and its top fit: Boethius’ The Consolation of Philosophy.

Something is not quite right here, and indeed, this is one of the difficulties of doing automated analysis. It is a touchy business to take a large chunk of information, like all the books available on Project Gutenberg, and filter it so that the answers you get match the question you think you’re asking. Andrew Reagan, the graduate student who is the paper’s lead author, readily agrees – even getting to this hodgepodge of texts took a great deal of weeding on his part. Project Gutenberg, after all, is thick with dictionaries and poems and even the text of the Human Genome Project, all of which had to be removed.

Since June, when he first put the paper online, Reagan has received advice and tips on how to do a better job of filtering the data. For instance, he’s learned how to access the Library of Congress classifications for the books on Project Gutenberg. That’s made all the difference: ‘I was able to use that and select for just full works of English fiction,’ he said, so that his latest, revised version of the paper, put up this September, uses only those.

As it happens, the same categories still show up. And they still cover about 85 per cent of the stories. But that goes to show that the patterns aren’t exclusive to works of fiction, as one might have assumed if the group had looked only at verified fiction at the beginning. It’s hard to know how to interpret these arcs without knowing exactly why they exist, or what they might represent from the readers’ perspective.

In the meantime, the Vermont group is working on getting detailed information about texts digitised by Google Books, which should yield more data on stories published during the previous century in the United States. The Google data should make it possible to take books from a certain period and compare them with books from the same place at a different time, or another place at the same time, to see if interesting conclusions can be drawn. And future results might also sketch out the archetypal emotional shapes of certain genres – detective fiction, for instance, or romance.

Stepping back, there is a bigger, overarching question here. Are there, in fact, surprises to be stumbled on in this way? Can using computational tools to digest far more literature than a single human could read in the same amount of time tell us things we’d never have noticed on our own? It’s hard to know. But when you think of the time it would take to read every novel on Project Gutenberg, and the skill and effort required to describe what patterns are there, you can see why some people, at least, think it is worth a try.

Veronique Greenwood

This article was originally published at Aeon and has been republished under Creative Commons.

How Is A Writer Different From Other People?

Well, one has the urge, first of all, to order the facts one observes and to give meaning to life; and along with that goes the love of words for their own sake and a desire to manipulate them. It’s not a matter of intelligence; some very intelligent and original people don’t have the love of words or the knack to use them effectively. On the verbal level they express themselves very badly.

Said Huxley to the Paris Review interviewer.

The Bridging Problem

Before J.M. Coetzee opens Elizabeth Costello he talks about the problem of getting us from where we are, which is, as yet, he says, nowhere, to the far bank. He calls this a simple bridging problem, a problem of knocking together a bridge. People, he writes, solve such problems every day. They solve them, and having solved them push on[1].

[1] In Which the Author Speaks to the Bridging Problem

When Coetzee writes about his narrator’s ‘simple bridging problem’ (Coetzee, Elizabeth Costello: Eight Lessons 1) he uses the pronoun ‘we,’ suggesting that author, narrator, characters, and reader will journey into and through the fiction together. His is a philosophical novel, a method of discovery, an experiment (Coetzee, Slow Man), and this requires all parties involved in the novel – author, narrator, characters, reader – to be active and attentive. The author himself must do much of the work alone. Coetzee does not explicitly state who is responsible for building the bridge; he uses the passive construction (the bridge is built), thus removing the bridge builder from centre stage. Instead, he invites the reader to join him and, together as ‘we,’ to cross it and arrive in the territory where ‘we’ want to be: the territory of Coetzee’s novel, of the fiction. Once arrived, ‘we’ are to put the bridge out of ‘our’ minds and not to worry about the gap between what is real and what is fiction.

Coetzee’s bridging problem, when solved, bridges the gap between reality and fiction so that he and the reader can get to the other side, which is where they want to be, because that is where the essential story takes place. Kundera calls this ellipsis, omission, condensation. He talks about the method of omitting and condensing much of the work of realism. Kundera is responding to the work of Broch and Musil, but mainly to Broch, who developed the (‘ill-defined’, for Kundera) ‘polyhistorical novel’. As a novel which ‘brings together every device and every form of knowledge in order to shed light on existence’ (Kundera, The Art of the Novel), it is very close to the approach adopted by the author here. Kundera argues that in order to understand the ‘complexity of human existence’ one needs to master the art of ellipsis, which requires one to ‘always go directly to the heart of things’ (Kundera, The Art of the Novel). The influence of Broch via Kundera, and of course the practice of Coetzee, has offered solutions to this author’s creative form. These include the courage to strip away unessential elements of geography, architecture, and back story (unessential, because to include details about country and city and suburb and streets would be to change the subject, to tell a different story), to unite philosophy and narrative (Broch’s ‘novelistic counterpoint’), and to remain, in essaying within the narrative (Broch’s ‘novelistic essay’), always hypothetical and playful, never didactic.