The Museum of Modern Art hosted a thoroughly modern gathering of librarians, archivists, curators, and preservationists this month. Archive-It NY, the local network of partners and friends of the web archiving service Archive-It, met for the first time, and its members got to know the faces and personalities behind some of the region’s most ambitious and technically challenging collecting efforts.
First developed and deployed in 2006 by the web stewards and technologists at the Internet Archive, Archive-It has quickly become the tool of choice for libraries, archives, and museums charged with maintaining their own institutional legacies on the web, or with building collections of scholarly resources published to that same famously tumultuous environment. NYARC (the New York Art Resources Consortium) uses Archive-It toward both ends and across the collecting strengths and missions of its member institutions. Thanks to project funding from the Andrew W. Mellon Foundation, our web archiving program has the privilege of pioneering work on issues of defining importance to all art librarians interested in this model of collecting, including collection development, access, and long-term sustainability. Much of that work is made possible by the frequent and generous advice of our web archiving peers, both here in New York and across the globe. I organized this miniature summit in the hope that all participants would likewise come away with a clearer picture of their local support network and its distributed but communal resources, not least among them our distinct brands of expertise.
Panorama of participants gathered at MoMA for Archive-It NY. Photo by Vicky Steeves.
On that count, at least, I can report that the gathering was a success. Thirty participants from 16 institutions joined in person or online, including several that have not yet instituted a web archiving program but used the opportunity to gauge the state of the practice. A formal agenda of presentations walked us through key issues as they tend to arise in the typical web archiving lifecycle but, more importantly, stirred the pot for less structured discussion of individual cases, problems, and lessons learned.
Alexandra Drakakis, Assistant Curator of Collections at The National September 11 Memorial & Museum, kicked off our conversational round-robin with an introduction to her institution’s approach to web archival collection development. Given the Museum’s mission to preserve the legacies of the victims of the 1993 and 2001 World Trade Center attacks, she demonstrated the essential value of archived blogs, news sites, uploaded photos, radio transmissions, and more as artifacts for increasingly digital exhibitions. This spurred discussion of the perennially sensitive issues of copyright and of our permission to collect online materials, in this case especially materials left behind by the recently deceased. It likewise raised familiar questions about how to integrate materials that archivists collect from the web across geographic and temporal divides through common access points.
Technical failures can be as easy to identify as they are challenging to solve. Screenshots by Karl-Rainer Blumenthal.
I chipped in at this point to update the group on NYARC’s progress in defining the means and boundaries of our quality assurance responsibilities, a topic I recently outlined in some depth for the Library of Congress’s digital preservation blog, The Signal. Quality assurance (QA), the digital triage that web archivists perform when their captures fail to reproduce the original material accurately, is a notoriously unpredictable demand on our time and resources. I couldn’t offer any magic wands, but I was able to share a few tools from NYARC’s internal documentation that make the process at least more predictable and efficient. Participants seemed to like this resource, and it sparked discussion of producing more partner-authored, public technical guides so that we neither work at cross-purposes nor spend unnecessary time and effort on problems our peers have already solved for themselves. Fittingly, this also prompted questions about the long-term sustainability of the digital file formats in our archives today, which is the explicit focus of my current work as a National Digital Stewardship Resident.
To ensure that those digital files actually reach the future scholars and keepers for whom we collect them, it’s also good practice to take out a little archival insurance policy. “Just in case” was thus the theme of University of Scranton Associate Professor and Digital Services Librarian Kristen Yarmey’s introduction to the integration between Archive-It and DuraCloud, the open-source digital backup and storage environment. For a (relatively) low annual license fee, this service syncs Archive-It partners’ files with Amazon’s S3 and Glacier cloud storage services, providing redundant, geographically dispersed copies that are checked for data integrity in transit and regularly thereafter. She has found it an affordable and responsible first step toward long-term preservation, especially for a small digital services shop that is simultaneously responsible for many other types of collections. As you might imagine, her slides were in high demand among participants building their own business cases.
Speaking to the specific but often unanticipated research needs for which web archives are collected in the first place, Pratt Institute Associate Professor Anthony Cocciolo next introduced participants to his own research with them. These frequently short but always fascinating projects have led him to insightful conclusions about the particular vulnerability of web services built around youth culture to sudden extinction, and about the kinds of web-based government information resources disproportionately affected by the political gamesmanship of shutdowns. His most recent project and forthcoming publication used a combination of web archiving and computer vision tools to verify quantitatively his anecdotal observation that, as you may also have suspected, the web is becoming a more visually than textually dominated medium. When asked what we as stewards might do to improve the experience of working with this kind of research corpus, he echoed the importance of speed and ease of access, a challenge so common among us that it may warrant a second meeting or workshop of its own.
With these thought-provoking projects in mind, discussion turned to the technical developments that would most benefit us as users of the service, and that we could therefore throw our collective weight behind. Archive-It already enables its partners to suggest and support one another’s feature requests through online forums. With an extensive overhaul and a new 'Archive-It 5.0' interface on the horizon, however, there was agreement around the room that our group could incubate and advocate for especially important technical interventions through more detailed discussion at these meetings and on our new email group. Alex Thurman got the ball rolling by walking participants through the status of existing requests on the forum and suggesting still more possibilities that appeal to him as Columbia University’s Web Resource Collection Coordinator, including enhanced control over metadata extraction and display, management of crawl data budgets, and Archive-It’s incremental integration of Google Analytics to provide usage information.
These and other conversations must continue, though. In addition to carrying them on online, several of us will meet with more partners from the Mid-Atlantic region this May, and New Yorkers will meet again in late summer. I’m impressed with the breadth of material we were able to cover in so short a time, but I know that we’ve only just scratched the surface. As has tended to be the case throughout my residency, small improvements keep opening up further and greater opportunities for discovery and change.
Karl-Rainer Blumenthal, National Digital Stewardship Resident, New York Art Resources Consortium