I fully support Aaron Swartz as he fights unjustified charges from the U.S. government, and I hope my readers will support him too. Aaron is a researcher who works with huge datasets and has worked on many open data projects. He is being charged with accessing JSTOR, a repository of academic journal articles, and downloading its articles.
JSTOR itself didn’t want to press charges and says it hasn’t suffered loss or damage. But the U.S. government indicted Aaron anyway, because it feels like it “caught a hacker”.
I’m incredulous that they would pursue this case against a well known researcher and activist who allegedly was doing something quite benign — scraping data.
I worry that this case will have a chilling effect on open data projects. The government has gone to great lengths here to stop a respected activist’s work, siccing the Secret Service on him and wasting an incredible amount of resources to trump up this case. The FBI has already investigated Aaron at least once, for downloading PACER data. It looks bad to me, as if the government was simply waiting for any excuse to build some sort of charge against Aaron for his brilliant open data activism.
Here’s Aaron’s background in open data and analyzing large data sets:
In conjunction with Shireen Barday, he downloaded and analyzed 441,170 law review articles to determine the source of their funding; the results were published in the Stanford Law Review. From 2010 to 2011, he researched these topics as a Fellow at the Harvard Ethics Center Lab on Institutional Corruption.
He has also assisted many other researchers in collecting and analyzing large data sets with theinfo.org. His landmark analysis of Wikipedia, Who Writes Wikipedia?, has been widely cited. He helped develop standards and tutorials for Linked Open Data while serving on the W3C’s RDF Core Working Group and helped popularize them as Metadata Advisor to the nonprofit Creative Commons and coauthor of the RSS 1.0 specification.
In 2008, he created the nonprofit site watchdog.net, making it easier for people to find and access government data. He also served on the board of Change Congress, a good government nonprofit.
In 2007, he led the development of the nonprofit Open Library, an ambitious project to collect information about every book ever published.
I would also like to say that I think that libraries and academics should stop buying into the JSTOR model. JSTOR aggregates academic journal articles which it doesn’t even own, and sells limited access to those articles to large institutions for thousands of dollars. Libraries and universities should act to enable access to information, not to limit it.
ETA: Here is JSTOR’s official statement on the case.
4 thoughts on “Support open data and defend Aaron Swartz”
Hi, I’m a passing individual who read this part and was curious:
As someone who has used JSTOR for research (and, similarly, O’Reilly Safari for technology textbooks), I haven’t thought critically about these models, because as a library member I was able to use them effectively and they have worked quite well for me. So I’m curious: what would the alternative model actually be?
I’ll say that I remember the advent of JSTOR as one of the things that made me feel substantially alienated from using the library as heavily as I used to. I was increasingly referred to it as a resource, and yet logins weren’t posted anywhere in the library I frequented (I don’t know if they had even subscribed yet), and it asked for payment in exchange for access. The card catalog system had died fairly recently, which was disorienting but manageable; being told I needed to pay for access to information, on the other hand, was really damaging. It left me feeling that libraries weren’t for people like me.
Mr. Swartz wasn’t indicted for scraping articles from JSTOR; he was indicted for what was effectively criminal trespass and wasting resources on MIT’s network. He broke into wiring closets and repeatedly bypassed various measures in place to prevent excessive bandwidth use.
If he had been charged with a crime purely for accessing too many JSTOR articles, then I would be defending him myself, but in fact that was incidental to the actual charges. As someone who has been on both ends of things like this (sysadmin as well as user frustrated by bandwidth hogs) I can’t find much sympathy for him.