What is a web archive, and what are we archiving when we “archive the web”? What shape will the archive have? When a web archivist assesses the quality of a web capture, they are checking that it preserves the functionality and behavior, as well as the look and feel, of the original site on the day it was archived. Compared to what archivists and librarians are used to cataloging, a website is a relatively abstract thing to archive. A website has numerous moving parts; an archived site is not a static, boxed object like the archives we are used to.
In January 2016 I completed a seven-month internship with the web archiving team of the New York Art Resources Consortium (NYARC). Over the course of my internship with the consortium, which was based at the Frick Art Reference Library and the Brooklyn Museum library, I contributed to the quality assurance (QA) of archived websites. The majority of my time was spent on archiving the image-heavy Brooklyn Museum website, in addition to the websites of New York City art galleries including Godel & Co. Fine Art and Galerie St. Etienne.
Capturing all of the Brooklyn Museum website’s visual content proved to be a considerable challenge. Our initial site “crawl” failed to capture any of the images, and most links were simply not working. Each of the site’s pages had a large quantity of nested and cross-linked images and links. The QA process required following links that inevitably led to further links or sets of images; often the links pointed back to other URLs within the website.
A website as large as the Brooklyn Museum’s can prove unwieldy in its size and scope, so I thought about ways in which I could keep a clear view of the task at hand. Ironically, I found myself searching for a way to visualize the form of a website created to showcase the visual image. As I reviewed the capture I was reminded of a discussion in an article by Ayala, Phillips and Ko titled “Current Quality Assurance Practices in Web Archiving.” The article discussed the vertical and horizontal nature of a web archive, which can also be thought of in terms of flat and deep website architecture. The horizontal axis refers to the surface level, or breadth, of the site, while the vertical axis indicates its depth. While the authors felt that defining an archive only in these terms was perhaps too limited, thinking of website construction in this way helped me understand the larger idea of the shape and structure of the Brooklyn Museum website.
Shortly after reading “Current Quality Assurance Practices in Web Archiving” (Ayala et al.), I experienced an “A-ha!” moment when I discovered Mozilla Tilt. Tilt is a Firefox add-on that creates a 3D visualization of any website’s architecture by presenting it as stacks of nested elements. Each element has depth and texture corresponding to the webpage’s rendering. When I used Tilt to view the pages in the Brooklyn Museum web archive, not only was I able to see a pictorial representation of the archived site’s architecture, but I could also manipulate the rendering and turn it like an object in space. For me, this was entirely illuminating. As I proceeded with the QA process of the Brooklyn Museum website, resolving the problems and restoring missing images and links to the archive, the visualization enabled me to keep a clear view in my head of the website’s construction and hierarchies. Having a physical concept of the site helped me achieve a clearer concept of the overall project.
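For readers who want to try the “stacks of nested elements” idea without a browser add-on, the nesting can be approximated in a few lines of Python. This is only a minimal sketch under stated assumptions, not a description of how Tilt itself works: the names `DepthParser`, `nesting_layers`, and `max_depth` are hypothetical, the sample HTML is invented for illustration, and the depth of an element is assumed to be simply the number of open ancestors plus one.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not deepen the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class DepthParser(HTMLParser):
    """Record how deeply each element is nested within a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # elements that are currently open
        self.layers = []   # (depth, tag) for every element encountered

    def handle_starttag(self, tag, attrs):
        depth = len(self.stack) + 1
        self.layers.append((depth, tag))
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def nesting_layers(html):
    """Return a list of (depth, tag) pairs in document order."""
    parser = DepthParser()
    parser.feed(html)
    parser.close()
    return parser.layers


def max_depth(html):
    """Return the deepest nesting level found, or 0 for an empty page."""
    return max((depth for depth, _ in nesting_layers(html)), default=0)


# Invented sample page: an image wrapped in a link, inside a div.
sample = ('<html><body><div><a href="#"><img src="x.jpg"></a></div>'
          '<p>text</p></body></html>')

for depth, tag in nesting_layers(sample):
    print("  " * (depth - 1) + tag)
```

Printed with indentation, the output reads like a side view of the stack: the image sits five layers down, while the paragraph sits at the third layer alongside the `div`.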
Traditional libraries and archives are being transformed physically and theoretically. In the midst of the sea change that is occurring in the way we handle, create and store our data, it is crucial that we change the shape of our thinking. Web archiving is just one relatively new development in the library and museum world.
Chantal Sulkow, Collections Manager, The Center for Book Arts; Adjunct Faculty Cataloger, NYU Division of Libraries; and former NYARC Web Archiving Intern