Library of Congress To Archive All Public Tweets
After the recent announcement that Groklaw will be archived at the Library of Congress, mjn writes with word that the push to archive more digital content continues: "The US Library of Congress announced a deal with Twitter to archive all public tweets, dating back to Twitter's inception in March 2006. More details at their blog. No word yet on precisely what will be done with the collection, but besides entering your friends' important updates on the quality of breakfast into the permanent archival record, the deal may improve access for researchers wanting to analyze and mine Twitter's giant database."
Your tax dollars at work... (Score:5, Insightful)
Given the signal to noise ratio for most tweets, I'm not convinced this is a particularly good use of resources...
Just because you can do something, doesn't mean you have to!
Re:Your tax dollars at work... (Score:5, Insightful)
It's not like it takes a lot of space to archive them; it's at most 140 characters per tweet. There's a lot of useless information in newspapers and books too, but they have been archived anyway because some of that info is valuable or might become valuable.
Re:Your tax dollars at work... (Score:5, Funny)
Hi, @librarycongress! I just took a shit. I am honored that you will be archiving this momentous occasion for future generations.
Re: (Score:2)
Hi, @librarycongress! I just took a shit. I am honored that you will be archiving this momentous occasion for future generations.
Obligatory [penny-arcade.com].
Re: (Score:2)
Yes, this truly is a giant database. Let us do math.
140 characters/tweet * 2 bytes/character * 12E9 tweets = ~3.36TB
O. M. G. This would fill more than half the hard disk space I have in my NAS...truly massive! (At my company, there was an April Fools' Day rumor going around that Twitter would be going down for 10 minutes while their high school intern upgraded their "Tweet Storage Unit" (TSU) by adding an extra 2TB drive. Har har! To be fair, they store a good bit of metadata besides the tweet itsel
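The arithmetic above can be re-run as a minimal Python sketch. The inputs are the commenter's assumptions (every tweet uses the full 140 characters, 2 bytes per character, roughly 12 billion tweets), not figures from Twitter, and the result is in decimal terabytes.

```python
# Assumptions taken from the comment, not measured data.
CHARS_PER_TWEET = 140
BYTES_PER_CHAR = 2
TWEET_COUNT = 12 * 10**9

def archive_size_tb(tweets: int,
                    chars: int = CHARS_PER_TWEET,
                    bytes_per_char: int = BYTES_PER_CHAR) -> float:
    """Raw text size in decimal terabytes (10**12 bytes), no metadata."""
    return tweets * chars * bytes_per_char / 10**12

if __name__ == "__main__":
    print(f"{archive_size_tb(TWEET_COUNT):.2f} TB")  # prints 3.36 TB
```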
Re: (Score:2)
6TB is embarrassing? For a NAS used for personal backup? That's not only sufficient, I have lots of room to grow into it...I suppose I should be doubly embarrassed.
Let me check...hm, nope. I'm not. :-)
Re: (Score:3, Insightful)
50 million tweets/day
140 characters of message
60 bytes of metadata (timestamp, sender id, etc.)
10 GB of twitter archive per day
10 TB per 3 years
What does 1 TB cost these days? About $100?
Storage space will indeed be an inexpensive part of the cost, and will decline in price at about the same rate the traffic is growing.
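A minimal Python sketch of the same estimate, using only the commenter's guesses (50 million tweets a day, 140 bytes of text plus 60 bytes of metadata, $100 per TB). Over three years it comes to about 11 TB, in line with the rough 10 TB figure. It assumes flat traffic and ignores leap days.

```python
TWEETS_PER_DAY = 50_000_000
MESSAGE_BYTES = 140      # full 140-character message at 1 byte per character
METADATA_BYTES = 60      # timestamp, sender id, etc.
USD_PER_TB = 100         # the commenter's guess at drive prices

def daily_bytes(tweets_per_day: int = TWEETS_PER_DAY) -> int:
    """Bytes of archive added per day."""
    return tweets_per_day * (MESSAGE_BYTES + METADATA_BYTES)

def bytes_over_years(years: int, tweets_per_day: int = TWEETS_PER_DAY) -> int:
    # Flat traffic and 365-day years: no growth, no leap days.
    return daily_bytes(tweets_per_day) * 365 * years

per_day_gb = daily_bytes() / 10**9            # decimal gigabytes
three_year_tb = bytes_over_years(3) / 10**12  # decimal terabytes
raw_disk_cost = three_year_tb * USD_PER_TB    # raw disk only, no redundancy

print(f"{per_day_gb:.0f} GB/day, {three_year_tb:.2f} TB over 3 years, "
      f"~${raw_disk_cost:,.0f} of disk")
```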
Re: (Score:2)
That's uncompressed. Toss in bzip, gzip -9 or 7za. It's just plain text, so it should compress rather well, by more than 80%.
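Whether plain text really shrinks by more than 80% is easy to check with Python's standard-library codecs: `zlib` (the DEFLATE algorithm behind gzip), `bz2`, and `lzma` (the LZMA family that 7-Zip also uses). The sample below is hypothetical and far more repetitive than real tweets, so it will overstate the savings. Measure on a real export before trusting any percentage.

```python
import bz2
import lzma
import zlib

def space_saved(data: bytes) -> dict:
    """Fraction of space saved by each codec: 1 - compressed/original."""
    codecs = {
        "gzip-9 (zlib)": lambda d: zlib.compress(d, 9),
        "bzip2": lambda d: bz2.compress(d, 9),
        "xz/LZMA": lambda d: lzma.compress(d),
    }
    return {name: 1 - len(fn(data)) / len(data) for name, fn in codecs.items()}

# Hypothetical stand-in for a tweet dump: 50 distinct lines repeated.
sample = "\n".join(
    f"user{i % 50}: just had breakfast, the coffee was great #morning"
    for i in range(2000)
).encode("utf-8")

for codec, saved in space_saved(sample).items():
    print(f"{codec}: {saved:.0%} smaller")
```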
Re: (Score:2)
Indeed, I agree, especially given the overlap in the topics people tweet about, and thus in the words used in tweets.
Re: (Score:3, Funny)
50 million tweets/day
140 characters of message
60 bytes of metadata (timestamp, sender id, etc.)
10 GB of twitter archive per day
10 TB per 3 years
Yes, but how much is all of that in Libraries of Congress?
- RG>
Re:Your tax dollars at work... (Score:5, Interesting)
And just because you don't have to, doesn't mean you shouldn't!
This is probably the best way to capture a snapshot of our current society. Sure, the barrier to entry is a little lower, but I think this will be invaluable for historians who look back and try to understand us.
Or, if anything, it'll confuse the hell out of them.
Everyone wins either way!
captcha: formally
Re: (Score:2)
Re:Your tax dollars at work... (Score:4, Informative)
It's fun to think of historians as just attributing everything they learn about societies to religion and superstition, but the biggest reason we think pre-Enlightenment civilizations were obsessively religious is that the priest castes were generally among the most literate and the most concerned with preserving knowledge of the past. Much of what we know about history comes through their writings, and therefore through their perceptions. They quite literally wrote history, to a large extent, and our understanding of their society is colored by their bias.
The Information Age has democratized knowledge to a huge degree. Historians centuries or millennia hence will have plenty of sources other than the lens of the Catholic Church. Given current trends, even just a decade from now a few consumer-grade storage devices could hold everything the Library of Congress or Archive.org contains today. As long as there are a few people in the world interested in preserving it, modern history should be safe.
Re: (Score:2)
Hey, that's interesting and insightful. I never thought of it that way before. I wonder if skeptics and nonbelievers were as common then as now. (In America I peg us at about 20% of the population, with about half of us being in the closet.)
Imagine an ancient ritual sacrifice of a virgin or something, and one fifth of the crowd is sort of rolling their eyes thinking "really? I mean, really? You guys think that stabbing a girl with a hymen is going to bring you blessings from magical beings in the sky? Get a
Re: (Score:2)
Re: (Score:2)
I live in Wisconsin, grew up in Alaska, and lived for a while in New Hampshire and Massachusetts.
Also, to be clear, by "skeptical nonbeliever" I mean not only Christianity and similar easy-to-characterize religions, but also the Eastern sorts of religions and the New Age sorts of religion (ghosts, "energy", pagan spirits).
Obviously, I hope you are right that we amount to greater numbers. Where do you live?
(Skepticism becoming a prevalent worldview is shockingly overdue.)
Re: (Score:2)
California, but I'd say that it's closer to 40% (more if we include skeptics who attend church services just to appease someone else). And maybe it's due to the 20s-30s age group too.
In WI, I am actually surprised at 20%. I've always felt like an outsider (admittedly, not in urban areas).
Re: (Score:2)
Well, if it's 40%, then that's pretty good. But, being from California, can you refute the accusation that many Californians are New Age-y hippie cranks?
In any case, let's hope our numbers keep increasing. If we get to the 40% you suggest, then I think we might start seeing some Out Atheist politicians.
Re: (Score:2)
Re: (Score:2)
Now, if the LOC would archive
Re: (Score:3, Funny)
When the historians of the 50th century unearth the records of /., they'll realize the Final Dark Age came upon humans in the early part of the 21st century, and that while many saw something happening, none realized the extent. And then they'll click their mandibles
Re: (Score:2)
You have to remember that the people usually shouting "wargarbal waste of money" at scientific situations such as these aren't the type to give two shits about the generations that come after them, as we've all seen. :(
Future historians? These people are trying to burn history books today.
Scientific Situations? (Score:2)
I didn't realize that scientific papers are now chopped up and delivered via Twitter.
Re: (Score:2)
Re: (Score:2, Insightful)
Given the signal to noise ratio for most tweets, I'm not convinced this is a particularly good use of resources...
Just because you can do something, doesn't mean you have to!
It's a fantastic idea. It's probably only a few TB of data, but it represents the unedited reaction of ordinary people to historical events and a detailed insight into their everyday lives.
Detailed? (Score:2)
You keep using that word. I do not think it means what you think it means.
For the future (Score:3, Interesting)
We learned more about ancient Egypt from their twitter than from all the official records designed to survive the ages. Sure, sure, it's very interesting to read the "unbiased" record of a pharaoh in his own tomb, but it is from the "trash" notes that were recovered that we learned how the country itself worked, including such little details as the fact that the pyramids were not built by slaves.
The official records of the US will be Fox News. Better pray that future researchers have access to some other source.
Re: (Score:2)
Obviously it's a project really funded by the DOD, the highest quality source of entropy yet for cryptography.
Certainly could be the users (Score:3, Insightful)
A library archiving your work does not necessarily imply that you don't own the copyright on it.
Re: (Score:2)
And I would bet the Library of Congress doesn't have to give a damn about copyright anyway.
Copyright registration; eminent domain (Score:2)
Re: (Score:2)
But people tweet outside USA jurisdiction too.
If they tweet on the USA-based server, they make themselves subject to USA jurisdiction.
Re: (Score:2)
1 trillion bytes = 1TB in the words of HD manufacturers. One trillion 140-character tweets, at one byte per character, is exactly 140TB.
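The decimal-versus-binary distinction matters here: drive makers count 10^12 bytes per terabyte, while many operating systems report sizes in units of 2^40 bytes (tebibytes, TiB). A minimal Python sketch, assuming 140 single-byte characters per tweet and no metadata:

```python
TWEETS = 10**12          # one trillion tweets
BYTES_PER_TWEET = 140    # assumes 140 single-byte characters, no metadata

total_bytes = TWEETS * BYTES_PER_TWEET
decimal_tb = total_bytes / 10**12   # drive-maker terabytes
binary_tib = total_bytes / 2**40    # tebibytes, often shown by an OS as "TB"

print(f"{decimal_tb:.0f} TB = {binary_tib:.1f} TiB")  # prints 140 TB = 127.3 TiB
```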
Re: (Score:2)
You can get a pretty good idea of that individual's development. Or the tweets on a day like 9/11.
True enough, although you do have to wonder how much help "9/11? ZOMG--WTF?!" is going to be to future researchers.
hmm... (Score:2)
I could see them archiving tweets that were relevant to pop culture or history...but all of them??? Seems like a waste of time and money to me.
Re:hmm... (Score:4, Insightful)
Re:hmm... (Score:4, Insightful)
I suspect a lot of the interesting information is in the aggregate anyway, not individual tweets: things like trends, analysis of subgroups, linguistic analysis, etc.
Re: (Score:2)
I think that the importance of a single tweet varies depending on who is sending it and who is reading it. If I tweet/twitpic about some activity my children are doing, you might think a giant yawn is being generous. Meanwhile, however, a family member or friend reading it might be genuinely interested in that information. To give another example, if @grantimahara tweets about an upcoming episode of MythBusters and you are a fan of that show, you'd likely find it very interesting. However, someone else who
Re:hmm... (Score:4, Insightful)
all of them???
Disk space is cheap...
They should get a copy of the internet archive while they're at it.
Re: (Score:3, Funny)
Re: (Score:2)
alt.binaries too
Good idea. Maybe linux-kernel too. Is there a better example of large scale teamwork? For coding, I mean. Not for documenting the downfall of the US legal system.
Twitter steganography (Score:2)
You can make it happen. Come up with a method to encode alt.binaries in 140-character chunks and the Library will archive them all for you.
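As a purely hypothetical illustration of the scheme, a file can be base64-encoded and cut into tweet-sized chunks. This sketch assumes plain ASCII base64 and a made-up 4-digit sequence prefix, so each 140-character tweet carries 136 base64 characters (102 bytes of payload); it does nothing about posting, rate limits, or integrity checks.

```python
import base64
from typing import List

TWEET_LIMIT = 140   # characters per tweet (the limit at the time)
PREFIX_DIGITS = 4   # hypothetical sequence-number prefix, e.g. "0042"

def to_tweets(data: bytes, limit: int = TWEET_LIMIT) -> List[str]:
    """Base64-encode data and split it into numbered, tweet-sized chunks."""
    text = base64.b64encode(data).decode("ascii")
    body = limit - PREFIX_DIGITS          # 136 base64 chars = 102 raw bytes
    chunks = [text[i:i + body] for i in range(0, len(text), body)]
    if len(chunks) > 10 ** PREFIX_DIGITS:
        raise ValueError("too many chunks for the sequence-number prefix")
    return [f"{n:0{PREFIX_DIGITS}d}{chunk}" for n, chunk in enumerate(chunks)]

def from_tweets(tweets: List[str]) -> bytes:
    """Reassemble chunks (received in any order) and decode the original bytes."""
    ordered = sorted(tweets, key=lambda t: int(t[:PREFIX_DIGITS]))
    return base64.b64decode("".join(t[PREFIX_DIGITS:] for t in ordered))

if __name__ == "__main__":
    payload = b"Important update on the quality of breakfast. " * 20
    tweets = to_tweets(payload)
    assert from_tweets(list(reversed(tweets))) == payload
    print(f"{len(payload)} bytes -> {len(tweets)} tweets")
```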
Re: (Score:2)
Disk space is cheap...
Since it's "twitter", surely that should be "cheep"?
:-(
Uh, sorry.
Anyway, if Twitter messages are 140 bytes and we assume overhead adds about a third to each message, that's roughly 187 bytes per message.
5.35 tweets per metric kilobyte.
5,347 tweets per megabyte.
5,347,593 tweets per gigabyte.
5,347,593,582 tweets per terabyte.
Which isn't far short of the earth's population. Figure out the average number of tweets per person on earth, and you know how many $60 1TB hard drives you need to store them all.
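The per-unit counts can be checked with a short Python sketch. It takes the comment's assumption of 140-byte messages plus about a third in overhead (roughly 187 bytes stored per tweet), uses metric units throughout, and counts only whole tweets, so results are rounded down.

```python
MESSAGE_BYTES = 140
# Assumption from the comment: overhead adds about a third per message.
STORED_BYTES = round(MESSAGE_BYTES * 4 / 3)   # 186.67 rounds to 187

UNITS = {
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
}

def tweets_per_unit(unit_bytes: int, stored_bytes: int = STORED_BYTES) -> int:
    """Whole tweets that fit in one unit of storage (rounded down)."""
    return unit_bytes // stored_bytes

for name, size in UNITS.items():
    print(f"{tweets_per_unit(size):,} tweets per {name}")
print(f"{1000 / STORED_BYTES:.2f} tweets per kilobyte")
```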
The questi
Re: (Score:2)
Historically, only popular news or writings were archived. Wouldn't it be interesting to see what other, ordinary people said about Shakespeare or some king? People 1000 years from now will be able to do that for our time. All we have now is what was archived: popular writings that governments approved of.
Re: (Score:2)
Re:hmm... (Score:5, Interesting)
They were probably too busy watching Medieval Idol to even realize who Shakespeare or the King was ;)
A jest, I know, but it does demonstrate a serious point.
Our history books are based on records maintained by the winners of wars, the leaders, the successful, etc. We know a lot about Shakespeare. We know relatively little about how his audiences actually felt about his work.
We largely speculate as to how life was for ordinary folk during historical periods based on writings about them, not writings from them. The exception to this is diaries, and not many people maintain those any more. Twitter can help replace some of that perspective.
Admittedly, Twitter is not an ideal way to get a picture of a society, but you get to hear historical events told from a very different perspective. Actually, you get to hear them from LOTS of perspectives. They may not be an accurate portrayal of the events, but they are a snapshot of how a society reacts to and perceives events.
Who will represent the narcissists in society for future generations?
Re: (Score:2)
Re: (Score:2)
but most people tweet about mundane crap, not what happened on Capitol Hill. i.e., signal to noise will be horrible for trying to decipher What the Hell Happened...
Not really. When I enter "White House" into Twitter's own search, only about 30% of the results are noise and 70% are relevant to my topic.
So, in the year 31000 when they discover this data cache from the year 2010, they'll have search algorithms better than we could possibly conceive.
Re: (Score:2)
Which shows something important in itself - that most people don't care all that much about most of what happens on Capitol Hill.
Re: (Score:2)
That in fact is an ideal reason to do this, and twitter is nearly the ideal forum. The only hole in it is that some people aren't represented. Those who are over- or under-represented can be identified and the weight of their observations adjusted. But those who simply are not recorded will not have had an opinion at all.
The real problem here is that the LoC is a government entity, and all my experiences with technology provided by government entities have left me less than impressed. Searching the LoC's arc
Re: (Score:2)
The exception to this is diaries, and not many people maintain those any more.
Maybe not in written paper form, but certainly many people maintain and update their own blogs, notes, and other status updates on things like MySpace, Facebook, and Blogspot. Surely those resources would be a good source for the same type of information that is kept in diaries. I suppose diaries had/have the added advantage of usually being considered private, so more information may be disclosed in them. However, it's become pretty apparent that there are still many netizens that don't think enough
Re: (Score:2)
Re: (Score:2)
I will, of course, as I'm sure you all assumed.
Re: (Score:2)
Who really wrote the plays attributed to him?
David Tennant, obviously.
Re: (Score:2)
They were probably too busy watching Medieval Idol to even realize who Shakespeare or the King was ;)
Shakespeare was Renaissance English Idol, while Chaucer slammed the Medieval category.
Just because something is now stuffy 'literature' doesn't mean it wasn't wildly populist entertainment in its time. There's a reason why a lot of Shakespeare centers on drunks, crossdressing and hitting people with swords.
Re: (Score:2)
drunks, crossdressing and hitting people with swords.
So you're saying that we should archive /b/?
Re: (Score:2)
Hmm...that is a good point...
Diabolical Intentions (Score:5, Funny)
Re: (Score:2)
I saw that episode. "Just remember, Scooty Puff Jr. sucks..."
Re: (Score:2)
They'll have to fight Google for it. [theonion.com] :P
All these recursive acronyms are great, but... (Score:2)
1 new Tweet = 0.00000000000000017263 ( the current LoC + the new Tweet )
The only time... (Score:3, Interesting)
Re: (Score:3, Interesting)
Future generations will look back and conclude that some people REALLY did have TOO much time and trivial stuff to share.
Sure, why not? You never know what sort of insights you'll get. What people do in their free time is just as important to historians as what they do when they're working. More so, sometimes, since the work is often ephemeral while the free time is an important insight into the culture as a whole.
Most of it's garbage, but garbage middens are one of anthropology's favorite data sources.
Re: (Score:2)
I find it to be an extremely useful tool for keeping up with various personalities and the goings-on behind the scenes at certain websites. A sampling of the list of the people I follow:
PADnD (Penny Arcade live tweets their Dungeons and Dragons games)
mattsinger (critic for IFC)
aedavis (Ashley Davis, who draws Once Upon a Pixel)
washcaps (Washington Caps Hockey official twitter)
mcps (Montgomery County Public Schools, which my fiancée works for)
CameronPierce (Bizarro author)
CERN (LHC stuff, obviously)
BenKuchera (
Re: (Score:2)
Future generations will look back and conclude that some people REALLY did have TOO much time and trivial stuff to share.
Which is why it's important that we store this information. We know what the history books are going to say. We know that the War on Terror will come out to be either a horrible atrocity that humankind should never re-attempt, or a huge success that ushered in a new era of peace and stability. People will ask, "I wonder what was going through people's heads?"
And this is the PERFECT example. It will show that a lot of people didn't do anything, and they'll probably infer it to be Ap
How many libraries of congress to store all that? (Score:2)
Great, we've got a variable constant now.
Re: (Score:2)
Great, we've got a variable constant now.
Don't worry, we'll just set up a system to tweet the new value whenever it changes ;)
I tweeted about this. (Score:5, Funny)
http://twitter.com/mzzt/status/12179834899 [twitter.com]
It had to be done.
Re: (Score:2)
Re: (Score:2)
The LoC isn't archiving URL shortener targets (yet, anyway), but the Internet Archive is on it [archive.org], which at least ups the likelihood that some future researcher will be able to decode what those links pointed to.
The future. (Score:2)
If they think tweets are worthy of being archived why not just archive every blog and comment in existence? Many of those offer far more worthwhile insight than 99% of tweets.
I remember students, and sometimes teachers, in school occasionally mocking the customs of past cultures. There was always that subtle arrogance that we're somehow more enlightened than people were 500, 1000 or 2000 years ago. The problem is that people confuse technological advancement with intellectual and philosophical advancement. I'
Re: (Score:2)
If they think tweets are worthy of being archived why not just archive every blog and comment in existence? Many of those offer far more worthwhile insight than 99% of tweets.
There is a slippery slope here. What happens when they try to archive the Library of Congress within the LOC? The recursive archiving would destroy them.
With the massive proliferation of every last inane comment preserved for posterity I can only imagine how utterly stupid we are going to look to people of the future.
Take that, future people!
In other words... (Score:2)
Okay, I'm sure someone (probably The Daily Show) will, at some point, find something useful in all that noise.
Legal implications? (Score:3, Interesting)
All 'useless twits' jokes aside, this is pretty interesting. But I wonder if they'd run into any copyright issues.
Reading the Twitter ToS turns up this:
You retain your rights to any Content you submit, post or display on or through the Services. By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).
which looks to me like posters retain copyright, but Twitter retains the right to grant others the same license you've granted them (non-exclusive license to provide their service).
So based on my reading, Twitter (and the LoC) are in the clear?
Libraries have an exception (Score:2)
I think this would be legal regardless of what the ToS says. See the exemptions given to libraries and archives in 17 USC 108 [copyright.gov].
Re: (Score:2)
Small data set (Score:4, Interesting)
Math for the day:
Without compression, all tweets in human history will fit on a single hard drive costing less than $100.
http://search.twitter.com/search?q=a [twitter.com] (to find the latest tweet number)
http://twitter.com/about [twitter.com] (character limit)
http://www.pricewatch.com/hard_removable_drives/ [pricewatch.com] (1.5TB drive)
http://www.google.com/buzz/fulldecent/18tfNfPHSBp/Math-for-the-day-Without-compression-all-tweets-in [google.com]
Re: (Score:2)
Even if you double or triple the data stored per tweet to account for other metadata, assuming the parent's math is correct, it still shouldn't matter because that's still a trivial amount of storage to manage.
What about other microblogging platforms? (Score:2)
Twitter (Score:2)
While Twitter as a whole is very important, in an importance-versus-volume comparison it would most likely rate as one of the lowest-scoring collections of data of all time.
Researchers? (Score:2)
Yeah, yeah, it's public. Agreed. And everybody knows there's no difference whatsoever between what some guy can read and an exhaustive, automated audit trail and connection map of everything that has ever been posted. That's why nobody uses search engines, after all.
One thing I know for sure... (Score:2)
Gabe will be THRILLED. [penny-arcade.com]
Oblig. (Score:2)
All your tweet belong to us!
Usenet (Score:2)
They should have been archiving Usenet from the beginning.
Meanwhile: ACTA, not achieved. (Score:2)
And don’t even ask about Wikileaks as a whole...
Neat, forever TwitterShare! (Score:2)
Given that we can store almost 525 bytes [ksplice.com] of data in a single twit (I refuse to call them tweets), which is enough for a sector of data plus metadata, could this mean we can now store our data permanently at taxpayers' expense?
I call it TwitterShare as a play on RapidShare to send files easily... and now those files will be forever archived. Sounds like a good way to backup data to me! Other than letting everyone else in the world see your files...
I have five DVDs' worth... (Score:2)
...of archived gopherspace content I'm willing to donate to the LoC. Seems to me this dated mother lode of data would have far more historical significance and impact than thousands upon thousands of dissociated mindfarts.
That's it (Score:2)
I'm putting my Library of Congress stock recommendation to STRONG SELL.
Why don't they archive the books first? (Score:2)
Tweetstore in 3... 2... 1... (Score:2)
How long before someone comes up with a scheme to backup files in encoded tweets "for posterity"?
Seriously, they should be spending their effort on funding or replicating the Internet Archive instead.
Re:Pooping (Score:4, Interesting)
I know you are joking, but this kind of stuff is actually very important to historians. For example, the only reason we are able to reconstruct how many hours a day people worked in the medieval era is by looking at court records: the judge would ask things like "what were you doing at five?" and the person would respond with answers like "eating" or "sleeping" or "working", and by going through a lot of court records, historians were able to estimate how people lived back then.
This will allow the historian of the future to guess much more accurately.
Re: (Score:2)
The obsessives worrying that we're about to enter a digital dark age forget about the massive amount of loss of data, information, photos, etc. from the past, and also underestimate the stupid amount we're archiving (intentionally or otherwise) nowadays.
Modern society is fast approaching the point where the major pro
Re: (Score:2)
I know you are joking, but this kind of stuff is actually very important to historians.
Plus in twenty years when the current college crowd is running for public office we will have all sorts of shit to dredge up.
Re: (Score:2)
Tycho is being a douche... alright, poop time... okay, poop is coming out.
I'm a twitter shitter!
Re: (Score:2)
Future alien archeologists will say: "These fuckwit twits must have had shit for brains."
"Let's saucer on over to another planet, Zog . . . there's nothing to learn mining this crap . . . and we might catch something here . . . ick!"
Re: (Score:2, Interesting)
Of course, that assumes that budding social scientists in the 23rd century can read [imdb.com].
Re: (Score:3, Insightful)
I would hope that a social scientist in the 23rd century doesn't think that the average human of today posts every triviality of his life like most of the current twitterers do.
Re: (Score:2)
Re: (Score:2)
Does the phrase 'history is written by the victors' mean anything to you?
Re: (Score:2, Interesting)
Soon after, he publishes a paper with his revolutionary new theory: People in the 21st century were so forgetful that they decided to record all details about their daily life in a central database so they could recover it if necessary.