Tweet now, or forever hold your tweet

The Library of Congress is pressing ahead with plans to archive all tweets, ever - but when the endeavour is 'prohibitively costly', many ask - what's the point?

by Media Hawk on 22 January 2013 11:34

In 2010, the Library of Congress (LoC) proudly announced that Twitter was 'donating' its entire archive of every tweet ever tweeted (twat? twote? twut?) to the LoC, in order that one day our alien successors, or indeed anyone, could browse the entire human history (since 2006) as told in 140 characters.

Since the boastful press release from April 15th 2010, there have been a few stumbling blocks. But that hasn't stopped the costly and seemingly pointless initiative from pressing ahead. Ah, small government.

In those few years, Twitter has gone from handling 50 million tweets per day to a staggering half a billion (in late 2012). This is proving difficult for the LoC.

While the institution has proclaimed that it is leaping the hurdle of how to store the information, the searchability of this data is proving to be anything but simple. In fact, it has been stated that even a simple search through the archive can take up to 24 hours at current standards.

Each tweet is a JSON file, containing an immense amount of metadata in addition to the contents of the tweet itself: date and time, number of followers, account creation date, geodata, and so on. To add another layer of complexity, many tweets contain shortened URLs, and the Library of Congress is in discussions with many of the URL-shortening providers, as well as with the Internet Archive and its 301works project, to help resolve and map the links.
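To give a rough feel for what one of those records involves, here is a minimal Python sketch. The sample tweet is invented, and the field names (`created_at`, `followers_count`, `coordinates`, `entities.urls` and so on) are assumptions modelled loosely on the shape of Twitter's public API, not a description of what the Library actually stores. The `summarize_tweet` helper is hypothetical too.

```python
import json

# Invented sample record. The field names are assumptions for illustration,
# not the Library of Congress's actual archive format.
SAMPLE_TWEET = """
{
  "created_at": "Tue Jan 22 11:34:00 +0000 2013",
  "text": "Archiving every tweet, ever: http://t.co/abc123",
  "user": {
    "screen_name": "example_user",
    "followers_count": 1200,
    "created_at": "Mon Mar 01 09:00:00 +0000 2010"
  },
  "coordinates": null,
  "entities": {
    "urls": [
      {"url": "http://t.co/abc123",
       "expanded_url": "http://example.com/full-story"}
    ]
  }
}
"""


def summarize_tweet(raw):
    """Pull the tweet text and the metadata mentioned above out of one JSON record."""
    tweet = json.loads(raw)
    user = tweet.get("user", {})
    urls = tweet.get("entities", {}).get("urls", [])
    return {
        "text": tweet.get("text"),
        "posted": tweet.get("created_at"),
        "followers": user.get("followers_count"),
        "account_created": user.get("created_at"),
        "geo": tweet.get("coordinates"),
        # Map each shortened link to its full destination, where one is recorded.
        "links": {u["url"]: u.get("expanded_url") for u in urls},
    }


summary = summarize_tweet(SAMPLE_TWEET)
print(summary["links"])
```

In this toy record the full destination travels with the tweet. Where it does not, the short link has to be resolved through the shortening service itself, which is presumably why the providers are being consulted.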

"It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data," said a recent white paper published by the Library of Congress.

"This is an inadequate situation," the Library concluded, calling the massive archiving project "prohibitively costly."

And yet Lee Humphreys, a professor of communication at Cornell University in New York, said that the brief online messages can reveal volumes "about the culture where they were produced."

But don't worry. If you're the BBC, Kim Kardashian or an eccentric Member of Parliament, for instance, you can rest assured that tweets that have been deleted or that are locked will not be among those gathered by the Library of Congress.
