Thursday, November 4, 2010

Geocities and the Digital Archive Potential

Courtesy of Paul Townsend
The rapid and progressing digitization of our daily life, a process that is still reconfiguring relationships between culture and the individual, brings new questions and concerns to the practice of history.  While reading Yochai Benkler's Wealth of Networks, I was struck by how the model for old media distribution paralleled the traditional role historians have played in the production of knowledge in our larger culture.  In old media, when a newsworthy source rose to the level of public concern, it was often presented in a format that relied upon the viewer's faith in the summary produced, an example being the mainstream networks' coverage of the Vietnam War.  Out of countless images and racks of video, network editors would select choice examples and present those pieces as the central discussion point; there was little room for viewers who wanted alternatives to the editorial selections made for mass consumption.

In many ways, the practice of history continues to follow this same production model.  Archives are vast, expansive, full of narrative potential that is realized only through survey and exhumation, often requiring a trained professional to make sense of the written cacophony.  Like editors of old media, historians sift through hundreds of documents in order to select a few texts representative of the theme they pursue.  These themes-turned-narratives are then distributed via 'containers' (books, journal articles, lectures, etc.) in a manner that both reinforces and certifies the role of the historian in knowledge making and cultural production.  So long as the archives continued to be physical, geographically distributed, and organized in such a manner that significant time was required to plumb their depths, historians could remain secure in their role as gatekeepers to the past.

However, just as digitization of news media threatens the 'certified' position of journalists, the same is happening to historians.  It is a slower process, to be sure, as history does not rely upon production of up-to-the-minute information to remain relevant, but it is happening.  The crux of the problem is that as digital archives become more prevalent, the incumbent threshold of specialized knowledge required to sift their contents drops to the level of casual interest.  While a person may still seek out produced, authoritative historical works, their reliance upon the evidence selected and presented is greatly diminished with the availability of searchable digital archives.  Just as the discipline of journalism debates the rise and role of 'citizen' journalists, so too must the historical profession contend with a similar rise and role of 'citizen' historians as digitized source materials become more available.

This is not to say I abhor the presence of 'citizen' historians.  Quite the contrary, I see the rise of participation in the production of historical knowledge as a significant event that demands our attention and support.  However, as is the case with any participatory relationship, there arises the question of the veracity and reliability of what is produced, especially when using digital sources.  Take the example of the Virginia history textbook for fourth graders found to contain falsehoods that stemmed from the misuse of digital sources.  James Grossman, writing for the American Historical Association blog, states that concern over the elementary school textbook goes beyond mere protection of professional turf and, instead, addresses a more fundamental issue of providing good models of proper digital source evaluation and use in historical research.  Not only do I agree, but I further suggest that Grossman's statement points to one avenue historians should take to help redefine their role in an increasingly digital culture.  Let me explain what I mean through discussion of the recently debuted Geocities Archive.

Thanks, Wayback Machine: Geocities 1996, 1998, & 2000, respectively

Geocities, an early web hosting site running from 1994-2009, had, for all practical purposes, been erased when Yahoo! decided to terminate the service in October of its final year.  I say had, because a group of like-minded individuals, Archive Team and others, began the process of copying the Geocities holdings prior to the announced shutdown.  While the total percentage of archived webpages remains unknown (Yahoo! has not released figures the various groups could use to check against their numbers), Archive Team felt that the combined efforts successfully saved a large portion of the Geocities contents.  Having compressed the files into a roughly 650 GB file, the group made the archive publicly available via BitTorrent.

Why the concern?  Because the domain appeared relatively early in the then-emerging World Wide Web and remained popular over its fifteen-year lifespan, Geocities-hosted sites amounted to a historical documentation of internet evolution in the cultural sphere.  An apt metaphor might be possessing recordings of the first radio broadcasts from across America and even the world.  Actually, this is stretching it a bit, as Geocities did not host the entire web, yet its contents, nonetheless, were a snapshot of emerging digital culture.  As such, the released archive has the potential for several creative uses, but only if the disparate information contained within can be made intelligible.  I believe two recent examples provide models for how we might begin to understand such data.

The first comes from a recent analysis of the Russian blogosphere by the Berkman Center for Internet and Society.  Looking at a 'discussion core' of over 11,000 blogs, the researchers first analyzed the network of hyperlinks to generate a structural image of the data, then looked at where 'clusters' developed in the image to analyze concerns and viewpoints on politics and political affairs expressed there.  The result is a color-coded figure "analogous to an fMRI of the social mind," (page 12 of report) that allowed researchers to clearly visualize public discourse occurring within the Russian blogosphere.  They discovered that Russian blogs generally were less 'siloed' in interest and less susceptible to an 'echo chamber' effect, containing a richer depth of cross-cutting debate than found in comparable systems in the US.  On the left are two illustrative examples from the report: the Russian blogosphere is on top and the United States on bottom.  Notice how the US picture depicts more divided clusters than the Russian picture does.

What if the same analysis were performed on the Geocities Archive?  Granted, the contents go beyond blog format, yet similar hyperlink analysis could be undertaken, ultimately producing a dataset of network composition similar to that used by the Berkman study.  Even if the picture is hindered by an incomplete archive, it could still produce a very useful baseline against which future findings could be compared.  Not only that, but the themes of the conversations and links archived could yield insight into the concerns of public discourse for the period studied.  This could fill in background on historical moments, a way to measure the chatter leading up to a pivotal event.  As the Berkman study concludes, "there are far more questions about the impact of the Internet on collective action than there are answers."  Research on the Geocities Archive could help provide some of those answers.
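To make the idea of hyperlink analysis a little more concrete, here is a minimal sketch in Python of how a link network might be assembled from archived pages.  It is not the Berkman Center's method, only an illustration of the first step (finding who links to whom).  The `pages` dictionary, the function names, and the choice to ignore links that point outside the archive are all my own assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_link_graph(pages):
    """Map each page URL to the set of archived pages it links to.

    `pages` is a hypothetical dict of {url: html_text}. Links that
    point outside the archive, or back to the same page, are ignored.
    """
    graph = defaultdict(set)
    for url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target in pages and target != url:
                graph[url].add(target)
    return graph


def inbound_counts(graph):
    """Count how many archived pages link to each page."""
    counts = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)
```

A graph like this could then be handed to a network visualization tool to look for the kind of clusters the Berkman researchers describe; pages with many inbound links would be natural starting points.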

Iraq War: A Wikipedia Historiography
The second example, a compilation of the changes made to the Wikipedia entry on the Iraq War from 2004-2009, provides another avenue of research for the Geocities Archive.  James Bridle, who printed and bound the changes into a twelve-volume set, commented on the necessity for a discussion on history in our digital age, or, more precisely, a historiography of digital sources.  His example of the changes made over the five years to the Iraq War Wikipedia entry shows how culture and digital sources are beginning to co-exist in an extensive manner, with particular sources reflecting public discourse through mutations of their form and content.  This change over time in both source material and public attitude falls squarely within the concerns pursued by professional historians, yet the Iraq War Wikipedia entry is radically unlike physical archives in that its 'documents' are not archived in one spot and require only a web browser and some idle time to view.

For Bridle, the Iraq entry represents our current capacity to store digital culture but only if we are aware that viewing such an archive is viewing history in process, a means by which our culture could, in the words of Bridle, "talk about historiography, to surface this process, to challenge absolutist narratives of the past, and thus, those of the present and or future."  It is all too easy to lose these digital sources, to simply hit delete and obliterate the incorporeal form of people's thoughts, words, and expressions captured on something as potentially mundane as a personal journal or collection of MIDI files; yet the ease of elimination belies the larger damage done to our potential for reflection and evaluation of these sources.

This is where the Iraq Wikipedia entry sheds light on another use of the Geocities Archive, that being a means to examine the process of change inherent to the circulation of ideas.  Lines of logic could be traced over time to track their development, design standards of webpages could be compared and analyzed much like architecture, and distribution of information and its pliability in use could be analyzed, just to name a few speculative ideas.  Clearly, as the examples of the Russian blogosphere and the Iraq War Wikipedia entry demonstrate, there are several reasons for wanting to preserve and study a digital source base like the Geocities Archive.  Having established relevancy, the question then becomes how to go about cataloguing and sifting the archive for ease of access and analysis.  Again, two examples, one from Yochai Benkler and one from the recently released Wikileaks Iraq War logs, assist in formulating an answer.

From this....
Benkler is credited with coining the phrase 'peer production' to describe the phenomenon of culture creation happening at the center of the developing information economy, now rooted in several developed Western nations.  This blog is one example of 'peer production', and Wikipedia is another.  As Benkler noted in The Wealth of Networks, the internet and the relatively cheap cost of excess microprocessing power enabled individuals of varying groups and beliefs to come together and not only consume culture but also produce it.  In particular, Benkler highlights ways in which 'peer production' can assist in performing large data analysis tasks, so long as those tasks are broken up into a fine-grained process allowing for quick, but concentrated, interpretation.  This process is embodied in the recent release of the Iraq War Logs by Wikileaks.
this!
Having previously released a straight 'dump' of around 70,000 documents relating to the war in Afghanistan in the late summer of 2010, officials at Wikileaks sought to make the larger Iraq War logs release (around 400,000 documents) more accessible to those interested in reading the documents, be it for personal interest or research.  The result was two separate webpages: the Iraq War Diary Dig and War Logs.  Both pages allow for 'peer production' of knowledge based on the document archive, the only difference being that the Diary Dig allows for very specific document searches capable of using sophisticated filters to sift the contents, while War Logs acts more like a social networking platform for discovery, allowing users to create an account, search for documents, and then tag those documents for others to easily view and comment on.  In this way, users no longer have to rely upon major media outlets to sort and prioritize the information; they can do so for themselves.  With a robust searching and comment approach that allows for scalability of user interest (I could look at one document and comment, or I could look at thousands and keep tabs using my login account), Wikileaks ensured that the Iraq War Logs, an archive that is daunting in its composition, volume, and documentation, would become something that average people could use and share with others, and ultimately a major source of reflection on a conflict that, so far, defines America's policy in the 21st century.

The same techniques and approaches should be used on the Geocities Archive.  Preliminary sorting of the data could chronologically order the webpages, so that a user could focus on a particular year or grouping of years.  Users could then view the pages in their search selection and tag them in a manner akin (albeit narrower in scope) to the process used by Pandora and the Music Genome Project.  Does the page have multimedia?  Are the contents political or personal or themed?  What is the structure of the webpage?  By allowing for redundancy of pages viewed and a growing list of relevant tags, 'peer production' could help bring the Geocities Archive into a more manageable form.  If maximum exposure is desired, then a mobile phone application for verifying tags or viewing pages in their own right would certainly help make the task more granular and less demanding of time.  There are several paths to go down in using this archive, and I mention these few only as a means to jumpstart the conversation on what could be done.
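As a rough illustration of the sorting and tagging described above, the Python sketch below groups pages by year and keeps a tag only once more than one user has applied it, which is one simple way to put the redundancy of pages viewed to work.  The data shapes, the function names, and the two-vote threshold are hypothetical choices of mine, and the sketch assumes a year can be assigned to each archived page in the first place.

```python
from collections import defaultdict


def group_by_year(pages):
    """Group page ids by year, in chronological order.

    `pages` is a hypothetical iterable of (page_id, year) pairs.
    """
    by_year = defaultdict(list)
    for page_id, year in pages:
        by_year[year].append(page_id)
    return {year: sorted(ids) for year, ids in sorted(by_year.items())}


def consensus_tags(submissions, min_votes=2):
    """Keep a tag for a page only if enough distinct users applied it.

    `submissions` is a hypothetical iterable of
    (page_id, user_id, tag) tuples gathered from volunteers.
    """
    voters = defaultdict(set)  # (page_id, normalized tag) -> user ids
    for page_id, user_id, tag in submissions:
        voters[(page_id, tag.strip().lower())].add(user_id)

    accepted = defaultdict(set)
    for (page_id, tag), users in voters.items():
        # Repeat votes from one user count once, since users are a set.
        if len(users) >= min_votes:
            accepted[page_id].add(tag)
    return dict(accepted)
```

Requiring agreement between independent viewers trades speed for reliability; a real project would need to decide how much agreement is enough and how to handle tags that are spelled differently but mean the same thing.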

To bring the conversation back to my opening statement, the Geocities Archive represents a potential moment for historians to take the lead in defining their role in our increasingly digitized culture.  The efforts to catalogue and produce meaning from this data should not come exclusively from historians but should instead be led by them.  As noted with the Virginia history textbook, the issue is not protection of turf but instead the promotion of an instructive model for how research and use of digital sources should be implemented.  If the profession fails to take charge on establishing this role, then their relevance in the knowledge making of digital culture will slowly diminish and, perhaps, become seemingly irrelevant to a large amount of participants in that culture.  This means, increasingly, that historians need to come equipped not only with familiarity with these newly emerging digital archives, but also with the programming know-how to create tools allowing for mass participation in sorting their contents in a manageable, productive way.

Of course, the practice of history will continue regardless of the status of professional historians, yet I truly believe that 'professional' historians can serve the larger efforts of 'citizen' historians by demonstrating proper models of research and attribution in our increasingly digital culture.  The Geocities Archive is just one potential source base, but one I feel could prove to be a valuable testing ground for the development of future techniques.

(Author's Note: There is a post-script to this essay on analyzing one million syllabi collected online.)


  1. A wonderful essay, and a very well-informed argument. I feel like almost every word could be applied to my field of journalism as well. Peer-production is forcing an amazing and little-understood shift, and we're seeing a collective realization within the profession that geeks are here to stay. There are now some really interesting hybrid identities like the "Hacks & Hackers" groups internationally.

    Your central call to action also applies to journalism:

    "If the profession fails to take charge on establishing this role, then their relevance in the knowledge making of digital culture will slowly diminish and, perhaps, become seemingly irrelevant to a large amount of participants in that culture."

    That's sort of the threatening way to say it. To frame this as good news: this could really be the dawn of an amazing new era for journalism, because so many more people now believe that sorting through information for the public good is a worthwhile thing to do.

  2. Thanks for the compliments on the post! The more I read about the emerging digital culture, the more I realize it has the potential to reshape a lot of our conceptions, especially in knowledge production and certification.

    I admit, looking at my quoted section above, that my sentiment was a bit venomous. I agree with your rephrasing and have some of that feeling in my post on the Digital Humanities. You are right on about how this is a little-understood shift and how more people are coming around to see the benefit of using their time towards peer produced projects. I see exciting potential for my own field in History, but this force will no doubt have impact on many other fields.