Screwing around

As you can see from the picture below, interest in this blog continues to fall – some may even go so far as to describe it as moribund!

site stats

In a desperate attempt to reverse the slide I’ve resorted to the cheap MailOnline tactic of the clickbait headline for my latest post.

Of course, as students @ #citylis well know, the headline is really just a play on the title of a recent article by Stephen Ramsay, a keen advocate of the digital humanities and the use of text technologies. In ‘The Hermeneutics of Screwing Around; or What You Do with a Million Books’, he describes “the anxiety of not knowing the path” through the enormity of the world’s literature and the consequent “debates about canonicity” which have been endemic in his field of Literary Theory since its inception. Ramsay refers to the literary scholar Franco Moretti, who in his latest book, ‘Distant Reading’, calls for the use of computers as one possible path through these vast digital corpora. Ramsay then goes on to make the case for “serendipitous browsing”, which he describes as “screwing around” and “one of the most venerable techniques in the life of the mind”. He warns that if a full-text archive (of the type which Google Books hopes to create) does not also include the “vast hypertextual network that surrounds it”, then the result will be text analysis tools which are good for searching but “simply terrible at browsing”. In his opinion, this will result in “a landscape in which the wheel ruts of your roaming intellect are increasingly deepened by habit, training, and preconception. Seek and you shall find. Unfortunately, you probably will not find much else.”

I found this to be a very thought-provoking article, particularly the way in which he challenged a lot of my preconceived ideas about information retrieval and the cataloguing of information. Personally, I do still enjoy browsing books on shelves, not knowing what I’m going to find. Catalogues are great for finding what you know you want (or an approximation of it), but I still enjoy the serendipity of finding something new.

We returned to the ideas of Moretti at the end of last week’s lecture. Essentially, he is arguing that since we now have access to vast digital databases of literary work, and effective data retrieval systems, it is now possible to combine the two in order to amass quantitative evidence in the field of literary studies. This is an entirely different approach to the close reading which has for so long been held up as the key technique of the discipline. A point I raised at the time was that I hoped the baby wouldn’t be thrown out with the bathwater. Clearly, close reading / hermeneutics is still an effective method in and of itself, and I do hope that a ‘cult of the new’ won’t lead to its abandonment. Rather, what digital text analysis offers is a new way of examining literature and, in combination with traditional methods, promises an even better way of advancing human understanding.

Which leads me to the final section of this post, in which I’d like to share my thoughts on three of the text analysis tools available today: Wordle, Many Eyes and Voyant.

All three operate by analysing inputted text. To carry out these evaluations I chose to copy and paste the text results of my TAGS analysis of the Twitter hashtag #libchat, and my Altmetrics search on Civil Rights (discussion of TAGS and Altmetrics can be found in my previous posts, ‘TAGS: a reappraisal’ and ‘Hello, can you hear me?’).

First up is Wordle, a much-maligned but very simple to use application which generates word clouds from inputted text. Julie Meloni, in her 2009 article, ‘Wordles, or the gateway drug to textual analysis’, praises it for its ability to help introduce a topic, help students discover key words and topics they may not have otherwise noticed, and help students to reflect on their own writing and word choices.
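For anyone curious about what a word cloud generator is doing behind the scenes, here is a minimal sketch in Python. It is only my own illustration, not Wordle’s actual method: the function name `cloud_sizes`, the point sizes and the linear scaling are all assumptions, and the sample text is invented. The idea is simply to count how often each word appears and give the most frequent words the biggest font.

```python
import re
from collections import Counter


def cloud_sizes(text, min_pt=10, max_pt=48, top_n=5):
    """Map the top_n most frequent words to font sizes by linear scaling.

    Hypothetical helper for illustration; real tools may scale differently.
    """
    # Lowercase the text and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    # Avoid dividing by zero when every word is equally frequent.
    spread = (highest - lowest) or 1
    return {
        word: round(min_pt + (count - lowest) / spread * (max_pt - min_pt))
        for word, count in counts
    }


# Invented sample text, not my Altmetrics data.
sample = "rights movement rights civil rights movement women"
sizes = cloud_sizes(sample)
print(sizes)
```

The most frequent word gets the largest size and the least frequent the smallest, which is all a basic word cloud really encodes.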

I pasted in the list of Civil Rights articles which I’d found previously using Altmetrics. The word cloud you can see below was partially useful, in that it showed two main things: firstly, that the overwhelming focus of articles is on the Movement itself; and secondly, that there was also a fairly broad hinterland in terms of other themes addressed, such as race, women and war.

Civil Rights wordle

Wordle is not without its detractors though. Jacob Harris, in his 2011 article, ‘Word clouds considered harmful’, even goes so far as to describe them as “the mullets of the internet”. His main criticism is that because they’re so easy to create, they are often used as filler visualisations which either support only the crudest sort of textual analysis, or are applied to complex situations where textual analysis isn’t appropriate.

In my view, word clouds do have their benefits but, as with all things, they require a bit of thought in order to be used to best effect.

Next up is Many Eyes, a package which promised much, but didn’t always deliver. As with Wordle, I began by importing my data set.

many-eyes data set

I then used Many Eyes to produce a word cloud, which I believe gives greater clarity than Wordle.

many-eyes word cloud

I also produced a bubble chart, which was helpful in showing the frequency of words used in the titles of Civil Rights articles harvested by Altmetrics.

many-eyes civil rights bubble

Many Eyes has a good range of visualisations on offer. The problem, however, is that I found it quite slow, buggy at times and not as intuitive or straightforward as the final tool I used, Voyant.

This was by far my favourite of the three, offering good usability and a fine range of options. As with the other two, I experimented using data sets from both TAGS and Altmetrics. A particularly useful feature of Voyant is that you can exclude stop words, including custom words of your choice. This allows you to focus on what the text actually shows. Take the following two images, for example. In the first, a word cloud was produced for the TAGS Twitter analysis of the #libchat hashtag, using stop words. We can see that it is dominated by the name of the hashtag, http, and Twitter codes such as rt, none of which really tells us anything about the content of the tweets.

voyant libchat tags cloud 2 with stop

The second image shows the cloud with the aforementioned terms excluded. Now we can focus more clearly on what is being tweeted: links to other hashtags, and discussions surrounding libraries, writing and learning, for instance. This is much more useful information to a researcher.
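The effect of adding custom stop words can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not how Voyant itself works: the two stop-word lists are made up for the example (Voyant’s real list is much longer), `top_words` is a hypothetical helper, and the sample “tweets” are invented rather than taken from my TAGS data.

```python
import re
from collections import Counter

# Illustrative lists only: these are NOT Voyant's actual stop words.
DEFAULT_STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}
CUSTOM_STOP_WORDS = {"libchat", "http", "rt"}


def top_words(text, stop_words, n=3):
    """Return the n most frequent words that are not in stop_words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words).most_common(n)


# Invented sample text standing in for a batch of tweets.
tweets = (
    "RT #libchat the future of libraries http "
    "RT #libchat writing and learning in libraries http "
    "#libchat libraries and learning"
)

before = top_words(tweets, DEFAULT_STOP_WORDS)
after = top_words(tweets, DEFAULT_STOP_WORDS | CUSTOM_STOP_WORDS)
print("Default stop words only:", before)
print("With custom stop words: ", after)
```

With only the default list, the hashtag and rt crowd into the top results; once the custom words are added, the words about the actual conversation rise to the top.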

voyant libchat tags cloud 2 with stop and mods

The final features I’ll describe are the fact that Voyant examines the frequency of words in the corpus (particularly useful when used with stop words), and also shows word trends in the form of a line graph. As you can see below, by searching for the trends of Civil, Rights, Movement and Act, we’re able to discern that articles on Civil Rights tend to focus more on the movement itself, rather than the 1964 Civil Rights Act.
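A word trends graph of this kind can be approximated by cutting the corpus into roughly equal segments and counting each search term in every segment. The sketch below is my own simplification and makes no claim about Voyant’s internals: `word_trends` and its `segments` parameter are hypothetical names, and the sample text is invented.

```python
import re


def word_trends(text, terms, segments=4):
    """Count each term in roughly equal, consecutive segments of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Ceiling division, so that every token lands in some segment.
    size = max(1, -(-len(tokens) // segments))
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return {
        term: [chunk.count(term.lower()) for chunk in chunks]
        for term in terms
    }


# Invented sample text, not my Altmetrics results.
sample = (
    "civil rights movement " * 3
    + "civil rights act "
    + "movement history " * 2
)
trends = word_trends(sample, ["movement", "act"], segments=2)
print(trends)
```

Each list of counts is one line on the graph: plotting them side by side shows at a glance which term dominates and where in the corpus it appears.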

voyant civil rights word trends

All in all, a useful tool and one I look forward to using more in the future with greater confidence and skill.




