This is a transcript of a lightning talk by Matthew Burgess, Digital Collections Analyst, at the 14th International Digital Curation Conference in Melbourne on Tuesday 5 February 2019.
The Library collects material that documents life in New South Wales, from the earliest times to the present day, and has been actively collecting born-digital photographs since 2006 to create a collection that reflects the history of New South Wales. When I commenced in this role in 2017, we had a backlog of born-digital collections that required analysis and preparation for ingestion into our newly implemented digital preservation system, Rosetta. It was through the analysis and preparation of this backlog that I started to discover inconsistencies and possible issues with some of our born-digital photographic collections. These included missing colour profile information, layered TIFFs, TIFFs derived from JPEG, copyright and other symbols in filenames, missing or wrong file extensions and junk files such as AppleDouble system files. Some of these are minor issues that can be remedied easily, such as amending filenames or deleting junk files, while others, such as a lack of colour profile or layered TIFFs, mean that we cannot ensure accuracy when rendering images for access through automated workflows.
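As an illustration only, the simpler filename and junk-file issues listed above could be flagged before ingest with a short script. This is a minimal sketch rather than the Library's actual workflow: the function names, the junk-file list and the choice to flag any non-ASCII character are all assumptions made for the example. It compares the file extension with the first bytes of the file, which is enough to tell a JPEG from a TIFF.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the two formats discussed in the talk.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
}
EXTENSIONS = {".jpg": "jpeg", ".jpeg": "jpeg", ".tif": "tiff", ".tiff": "tiff"}
# Assumed list of junk names; AppleDouble files are matched by their "._" prefix.
JUNK_NAMES = {".DS_Store", "Thumbs.db"}


def sniff_format(header: bytes):
    """Guess the format from the first bytes of a file, or return None."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return None


def check_file(name: str, header: bytes) -> list[str]:
    """Return a list of issues for one file name and its first four bytes."""
    if name.startswith("._") or name in JUNK_NAMES:
        return ["junk system file"]
    issues = []
    if not name.isascii():
        # Assumption: any non-ASCII character (such as a copyright sign) is flagged.
        issues.append("non-ASCII character in filename")
    suffix = Path(name).suffix.lower()
    actual = sniff_format(header)
    expected = EXTENSIONS.get(suffix)
    if not suffix:
        issues.append("missing file extension")
    elif actual and expected and actual != expected:
        issues.append(f"extension {suffix} but content looks like {actual}")
    return issues


def scan(folder: str) -> dict[str, list[str]]:
    """Walk a folder and report every file that has at least one issue."""
    report = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                issues = check_file(path.name, fh.read(4))
            if issues:
                report[str(path)] = issues
    return report
```

Calling `scan` on a folder (for example a hypothetical `incoming/`) returns a dictionary of problem files. Extensions other than JPEG and TIFF, including camera raw, are deliberately left unchecked here because many raw formats share the TIFF signature.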
The process of preparation and ingestion of the backlog provided a good starting point for the development of specifications and guidelines explicitly for collecting born-digital photographs. Our previous four-page documentation covered all digital formats including text, audio and video, with a small section defining TIFF as our preferred file format for photographs. It is believed that this may have led some donors and vendors to convert their existing JPEGs to TIFF to adhere to the specifications, and while there is no loss of quality in this conversion there is a significant gain in file size without benefit.
The development of new documentation provided the opportunity to review our preferred file formats, and with a background in photography I thought it was important to introduce camera raw file formats to our born-digital collecting practices. They contain the uncompressed, binary image data captured directly from the camera sensor and have often been described as the digital equivalent of film, providing greater control over how the image is processed, with finer tonal gradation. Camera raw also provides a level of trust and authenticity in the captured content since the raw data cannot be directly manipulated.
Most cameras use their own proprietary camera raw file formats that are undocumented. The Digital Negative file format was developed by Adobe as an open format based on the TIFF/EP standard and, following stakeholder consultation, was chosen as our preferred file format where suitable. Looking at the benefits, apart from those already mentioned, it can have a smaller file size than a TIFF and is a self-documenting file with embedded metadata including a checksum for the raw data. It also has its risks, though: limited support for processing when compared with TIFF or JPEG, possible corruption when converting, its status as a format still under consideration as an ISO standard, and concerns around hidden, proprietary metadata.
The aim for our new specifications and guidelines was to provide clearer instructions on what our preferred file formats are and under which circumstances they are suitable. Following consultation with photographers, acquisitions librarians and key internal stakeholders, the new document identified camera raw as the preferred format. It notes that Digital Negative is preferred, with proprietary camera raw also accepted, which we would retain as a digital original and normalise to Digital Negative for a preservation master. Following this, it states the general characteristics, specifications and metadata requirements for TIFF files derived from camera raw, and notes that JPEG is also accepted under some circumstances.
It is important to clarify that we approach the acquisition of born-digital photographs and file formats on a case-by-case basis. You cannot have a blanket rule defining one file format over another – what about unique photos that were taken on a phone in a JPEG format? What if the photographer does not want to provide a raw file? What if the photographer has applied some creative processes to their work in a program like Adobe Photoshop? There is also a difference in the acquisition of legacy collections vs commissioned photography, where we can be much more prescriptive with the latter.
With the implementation of new specifications and guidelines, it was important to ensure our acquisitions librarians understood the process of digital photography and how to assess digital files and their suitability for long-term digital preservation. They needed to understand what it meant when they asked a photographer to supply a raw file, or an uncompressed TIFF with 16 bits per channel and an Adobe RGB colour profile. We needed to equip them with the knowledge and confidence to have a conversation with donors and vendors, ask the right questions and ensure we receive the best available copy at the point of acquisition.
In consultation with my colleagues in Digitisation and Imaging, I developed a two-part Introduction to Digital Photography workshop that involved a presentation as well as a hands-on workshop. Part one looked at types of digital cameras, file formats and terminology including bit depth, resolution, white balance, colour mode, colour profile and compression, highlighting how these affect the quality of an image and its suitability for long-term digital preservation. We also looked at examples from our own collections to highlight some of the issues I mentioned previously. Part two was a practical workshop that looked at how file format at the point of capture impacts image quality, software and tools used to work with digital photographs and how to analyse incoming collections for quality control.
The practical component was used for in-depth training of frontline staff to highlight how file format affects image quality. This provided an opportunity to walk across the road from the Library to the Royal Botanic Gardens, cameras in hand, with a set of activities to take photos with both camera raw and JPEG file formats using different white balance and exposure settings. This was followed by an introduction to Adobe Bridge and Camera Raw in the computer training room, asking participants to process their photos to our own standards. The practical session ended with quality control, using tools such as ExifToolGUI and Adobe Bridge to inspect the metadata in files to ensure they adhere to our specifications.
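To make the quality control step more concrete, here is a rough sketch of the kind of check that tools such as ExifToolGUI make visible, written against the example requirement mentioned earlier in the talk (an uncompressed TIFF with 16 bits per channel and a colour profile). It is not the Library's tooling: the function names are invented for the example, it reads only the first image directory of a TIFF, and it tests only whether an ICC profile is embedded, not whether that profile is Adobe RGB. It also does not detect layered TIFFs.

```python
import struct

# Tag numbers defined by the TIFF 6.0 and ICC specifications.
TAG_BITS_PER_SAMPLE = 258
TAG_COMPRESSION = 259
TAG_ICC_PROFILE = 34675
TYPE_SHORT = 3  # 16-bit unsigned integer field type


def read_first_ifd(data: bytes) -> dict:
    """Map each tag in the first IFD to its SHORT values (None for other types)."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a TIFF file")
    (ifd_offset,) = struct.unpack(order + "I", data[4:8])
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i
        tag, ftype, count = struct.unpack(order + "HHI", data[start:start + 8])
        values = None
        if ftype == TYPE_SHORT:
            if count * 2 <= 4:
                # Small values are stored inline in the entry itself.
                raw = data[start + 8:start + 8 + count * 2]
            else:
                # Larger values are stored elsewhere; the entry holds an offset.
                (offset,) = struct.unpack(order + "I", data[start + 8:start + 12])
                raw = data[offset:offset + count * 2]
            values = list(struct.unpack(order + f"{count}H", raw))
        tags[tag] = values
    return tags


def check_tiff(data: bytes) -> list[str]:
    """Return a list of deviations from the example requirement."""
    tags = read_first_ifd(data)
    issues = []
    # Compression value 1 means uncompressed; it is also the TIFF default.
    if (tags.get(TAG_COMPRESSION) or [1]) != [1]:
        issues.append("image data is compressed")
    bits = tags.get(TAG_BITS_PER_SAMPLE) or [1]  # TIFF default is 1 bit
    if any(b != 16 for b in bits):
        issues.append(f"bits per sample is {bits}, expected 16")
    if TAG_ICC_PROFILE not in tags:
        issues.append("no embedded ICC colour profile")
    return issues
```

A file could be checked with `check_tiff(open(path, "rb").read())`, where `path` is any TIFF on disk. An empty list means none of the three checks failed.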
Feedback from participants was very positive, highlighting that the training considerably increased their knowledge and skills. They came away with a much better understanding of the terminology and confidence in being able to analyse the quality of born-digital photographic acquisitions.
Since the implementation of new specifications and training, our acquisitions librarians are asking informed questions and seeking clarification from photographers. They know how to use the tools to look at the quality of incoming files and require less input from the digital curation team, which allows us to focus on strategy and automation of our pre-ingestion workflows. We spend less time pre-conditioning collections and pick up issues that are not identified in the validation stack within Rosetta, such as missing colour profiles. Unprocessed collections that were previously placed in the ‘too hard’ basket are now easy to resolve with clear guidelines and workflows in place.
It was an exciting opportunity to bring my background and experience into a space where it was needed at a critical time. As we continue to collect born-digital photographs that document life in New South Wales, we face new challenges that require further clarification and fine-tuning of our processes and documentation. We plan to run a shortened version of the training without the practical component for new and existing staff to reinforce our standards and ensure we continue to ask the right questions of both ourselves and our donors and vendors.
You can download a copy of the specifications and guidelines for born-digital photographs via the following link: pro_digitalcollecting_born-digitalphotographs_v1.4_20190121.pdf
Image credits - Mitchell Library, State Library of New South Wales, photographs by (left to right, top to bottom): Jon Lewis, Robert Wallace, John Janson-Moore, Geoff Ambler, D-Mo Zajac, Jon Lewis, Louise Whelan, Barbara McGrady, Maylei Hunt, Bruce York, Birgit Neiser and Brad Steadman.