The development of the world wide web has changed the ways in which almost all professions operate, to some degree or another, but few as profoundly as librarianship. Prior to the 1990s, the primary way to find information in a library was to consult its catalogue. This ‘fixed database’ takes the unstructured data of the library’s collection (names of authors, titles of books, etc.) and tabulates it into structured information, which offers far greater functionality to the user. This database approach to the problems of organising, searching for and retrieving information is still fantastically useful in many circumstances, largely thanks to the ease of use of Structured Query Language (SQL) in conjunction with Relational Database Management Systems (RDBMS).
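To make this concrete, here is a minimal sketch of a library catalogue as a relational database, using Python’s built-in sqlite3 module. The table and column names are my own invention, purely for illustration — a real catalogue would be far richer — but the point stands: once the collection is tabulated, SQL can answer precise questions about it, and the same query always returns the same rows.

```python
import sqlite3

# A toy catalogue: unstructured facts about a collection, tabulated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (author TEXT, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?)",
    [
        ("Vannevar Bush", "As We May Think", 1945),
        ("Claude Lévi-Strauss", "The Savage Mind", 1962),
    ],
)

# A structured query: deterministic, repeatable, user-independent.
rows = conn.execute(
    "SELECT title, year FROM catalogue WHERE author LIKE '%Bush%'"
).fetchall()
print(rows)  # [('As We May Think', 1945)]
```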
Foucault argued in his studies of epistemology, like Lévi-Strauss and other intellectuals labelled ‘Structuralists’, that we think in tables, in which things are compartmentalised and set as binary oppositions. Likewise, our DITA exercise this week required us to log the results of our activity in an Excel spreadsheet, partly to get us to think in a tabular way. It is this human desire for order and structure that makes me believe databases will remain with us for a very long time.
So far, so straightforward. The problems arise, however, when you seek information beyond the confines of a database. The internet, by connecting the world’s computers, has led to an explosion in the amount of information being created and in access to that information. The website www.evolutionoftheweb.com graphically illustrates this growth: the number of global users of the world wide web grew from almost 40 million in 1995 to approximately 2.3 billion by 2011, and global internet traffic grew from 1 petabyte per month to 27,483 petabytes per month over the same period. This presents great opportunities and also, potentially, great problems.
As early as 1945, Vannevar Bush, in his seminal article ‘As We May Think’, identified the problem of processing such a volume of information:
“There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear.”
He recognised that the traditional methods of sharing information were no longer ‘fit for purpose’:
“Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose.”
He concluded by suggesting that new computer technologies could transform the situation and help overcome these barriers:
“there are signs of a change as new and powerful instrumentalities come into use.”
Information Retrieval is one of the means by which we can deal with data beyond a database; its most common form is the internet search engine. This differs from searching a database in a very significant way. Where two people making the same query of a database would receive the same results, two people making the same query in a search engine may be given differing results. This is because in Information Retrieval, relevance is judged subjectively, relative to the user; if you were using Google, for instance, your previous searches would be taken into account.
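The contrast with the deterministic database query can be sketched in a few lines. This is a toy model — emphatically not how Google actually works — in which a hypothetical scoring function mixes the query’s relevance to a document with terms from each user’s past searches, so that two users issuing the identical query are ranked different results:

```python
# Toy sketch of personalised ranking: the score function and the
# 0.5 history weight are invented for illustration only.
def score(doc, query, history):
    doc_words = set(doc.lower().split())
    base = len(doc_words & set(query.lower().split()))
    # Boost documents sharing terms with the user's previous queries.
    boost = len(doc_words & {w for q in history for w in q.lower().split()})
    return base + 0.5 * boost

docs = ["python snake care", "python programming tutorial"]
query = "python"

keeper = ["reptile snake care"]        # user A: a reptile keeper
coder = ["java programming tutorial"]  # user B: a programmer

def rank(history):
    return sorted(docs, key=lambda d: -score(d, query, history))

print(rank(keeper)[0])  # python snake care
print(rank(coder)[0])   # python programming tutorial
```

Same query, same collection, different top result — which is exactly what a fixed database would never do.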
I’ll discuss Information Retrieval in more depth over the course of this blog, but I’d like to conclude by returning to a point I made earlier: that the world wide web presents difficulties as well as opportunities. Yes, there is an increasing symbiosis between users’ needs and search technologies (as our DITA exercise showed, even natural language queries now return a fair degree of relevance), which suggests that Information Retrieval may well help to overcome the problems highlighted by Bush. Yet a post on the British Library’s web archive blog raised some serious concerns in my mind. The UK Web Archive has been archiving pages since 2004 and seeks to prevent the creation of a “potential digital black hole”. Recently it conducted an exercise to see how well it had achieved this objective, checking whether the URLs it had archived were still live on the web and, if so, whether the information contained therein was still the same or had been changed.
The results, from an archival point of view, were quite alarming. Over half of the URLs from 2007 and earlier were unobtainable for one reason or another; and even among those still retrievable (2014 URLs apart), over 90% no longer contain the same material. Could it be, therefore, that Information Retrieval is only good for dealing with the web in the here and now? Are we going to have to develop new methods, new techniques and maybe even new ways of thinking when searching for historical materials on the web?
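The kind of check the UK Web Archive describes could be sketched like this. The function name and inputs are my own assumptions — status codes and page bodies stand in for real HTTP fetches — but the logic mirrors the exercise: is the URL still obtainable, and if so, has its content changed since it was archived?

```python
import hashlib

def classify(status_code, archived_body, live_body):
    """Compare an archived snapshot of a page against its live state."""
    # Dead link, server error, DNS failure etc. -> unobtainable.
    if status_code is None or status_code >= 400:
        return "unobtainable"
    # Hash both versions to detect whether the content has changed.
    old = hashlib.sha256(archived_body.encode()).hexdigest()
    new = hashlib.sha256(live_body.encode()).hexdigest()
    return "unchanged" if old == new else "changed"

print(classify(404, "old page", ""))            # unobtainable
print(classify(200, "old page", "new page"))    # changed
print(classify(200, "same page", "same page"))  # unchanged
```

By the Archive’s figures, a depressingly large share of pre-2008 URLs would land in the first two categories.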