On Tuesday, I mentioned that I’d been spending my evening working on approximately 6 GB of data and that I’d explain later. Well, it’s later, so I probably should explain.
Some background… Brewster Kahle, the founder of the Internet Archive (among other things), was the opening keynote speaker at Code4Lib. We’d had some trouble thinking of a good speaker’s gift for him, but in early Feb., Jeremy and I had the opportunity to visit the Internet Archive and speak with Brewster and the folks from the Open Library. While we were there, he told a great story, and the idea kind of got rolling from there. Basically, Brewster talked about a visit that he’d made to Japan, where he presented as a gift the entire Japanese domain from 1996-2001ish. Which, of course, gave me an idea. I wanted to do the same with Summit, and give Brewster a snapshot of the 12 million MARC records found within Summit.
So, with that idea in mind, I sent a message out to a number of libraries in the Summit consortium requesting records. There were questions, some excitement, and in the end, a number of libraries contributed close to 5.5 million records (~3 million unique). Specifically, OHSU (and their members), Lewis and Clark, Portland Community College, Washington State University and Oregon State University provided me copies of their catalogs. I did some data processing (to translate data to Unicode, remove some records that people didn’t want contributed, etc.) and put the records on a jump drive. We presented the records to Brewster after his talk (a great talk, by the way), and I think it was something that he didn’t expect and genuinely appreciated.
But the story doesn’t stop there. A number of libraries in Summit that were not able to provide their records before Tuesday still want to contribute their data. In fact, tonight, I’m processing close to 1.2 GB of data from Western Washington University (removing some requested vendor records and converting the data to Unicode), and I will hand deliver these records to the folks at the Open Library on Feb. 29th at the Open Library Developers meeting. Once these records are contributed, the total number of institutions that have decided to share records will come to 8, with more coming in the very near future. I’m still hoping that at some point we will be able to contribute the entire Summit database (which would make us almost the single largest contributor of bibliographic data to the project), but for now, I’m just grateful to be in a consortium with members willing to be experimental and a little bit ahead of the curve. 🙂 Way to go, Summit!