A handy guide on the best strategies and practices to digitally preserve a Library's collection
What is digital preservation?
Digital preservation is the coordinated and ongoing set of processes and activities that ensure long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required. This section provides a brief background to the principles that underpin the currently accepted strategies.
Introduction to digital preservation
Cultural institutions are increasingly devoting money and resources to building their digital collections, both by reformatting physical materials and by creating and acquiring digital originals. All characteristics of these digital objects need to be available for the future – the data stream (bits and bytes), the "findability", the functionality and structural relationships between complex digital objects, as well as the ability to display correctly and retain the "look and feel" of the original document, image, sound file or web page. Ensuring the sustainability of these digital assets requires more than static storage and backup regimes; it requires the active management of this digital information over time to ensure its continued viability and accessibility.
Digital preservation can be defined as:
The coordinated and ongoing set of processes and activities that ensure long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required.
Born digital assets (digital originals with no analogue counterparts) are particularly vulnerable to potential loss from our cultural and heritage landscape. While we still have physical documents from many hundreds of years ago, we are in danger of losing electronic documents and digital images created in the last decade. Digital content is pervasive and powerful. It is easy to create and to update but these characteristics also contribute to the challenge of preserving it for the future.
Digital preservation can appear a daunting challenge to collection managers, more so as the size, complexity and history of the collections increase. This section is intended to identify the problem and provide a brief background to the principles that underpin the currently accepted strategies. Links at the end of this section direct the reader to comprehensive information about Digital Preservation, both at an introductory and advanced level.
Digital Dark Age?
The term 'Digital Dark Age' is often used to describe a scenario where vast amounts of digital information are lost or rendered permanently irretrievable.
Though the potential severity of this is open to debate, it is clear that the global library of knowledge and cultural heritage in digital forms is at risk.
There are two critical reasons for developing and implementing digital preservation practices:
- Physical deterioration of carrier media;
- Technological obsolescence of hardware/software.
The media on which digital contents are stored are more susceptible to deterioration and catastrophic loss than some analogue media such as paper and microfilm. Digital storage media may deteriorate more rapidly and once the deterioration starts, there may already be data loss. Relatively small amounts of media damage can cause file corruption and complete loss of data. This characteristic of digital formats leaves a very short time frame for preservation decisions and actions.
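Because even a single damaged bit can corrupt a file, damage of this kind is commonly detected by comparing a checksum of the file against a value recorded earlier. The following is a minimal sketch of such a check, assuming SHA-256 as the checksum; the function names and the sample content are hypothetical and stand in for a real archival file.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """Compare the current checksum with the one recorded earlier."""
    return sha256_digest(data) == recorded_digest


# Hypothetical content standing in for an archival master file.
original = b"Archival master file contents"
recorded = sha256_digest(original)  # stored alongside the file when it is first received

# Simulate media damage by flipping a single bit in the first byte.
damaged = bytes([original[0] ^ 0x01]) + original[1:]

print(is_intact(original, recorded))  # True
print(is_intact(damaged, recorded))   # False
```

A check like this only reports that something has changed; recovering the content still depends on having an undamaged copy elsewhere.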
Rapid advances in storage, recording and playback technologies mean that hardware, software and file formats may become obsolete in a matter of years.
Digital content created with such technologies is at great risk of loss, simply because it will no longer be accessible or can no longer be rendered correctly.
Lack of standards, protocols and proven methods for preserving digital information, as well as the prevalence of proprietary technology and file formats, adds to the problem of ensuring content is retrievable and useable in the future.
Guidelines for digital preservation
The following principles/components of a Digital Preservation Strategy have been proposed:
- Use sustainable file formats – successful digital preservation activities depend on controlling the makeup of your digital repository (i.e. digital asset management database) and on your digital assets being of known types. It is essential to create and acquire digital content in recommended file formats only. Sustainable formats are those that comply with standards, are patent free, support metadata and interoperability, and have a critical mass of user acceptance. Each type of media (image, audio, video etc.) has a range of suitable file formats to choose from. File format registries (e.g. PRONOM) have been created for the purpose of defining, assessing and selecting appropriate formats for a variety of digital content.
- Authenticate digital objects - once you have established your repository, ensure that archival master files match the attributes of recommended file formats. Various authentication modules are available (e.g. JHOVE2) as open source software. They can analyse files prior to ingestion and compare attributes to known criteria (generally technical metadata specifications).
- Use detailed and standardised metadata - creating good quality preservation metadata is one of the key activities in ensuring the long-term accessibility of resources. Preservation metadata are intended to store technical details on the format, structure and use of the digital content, the history of all actions performed on the resource including changes and decisions, the authenticity information such as technical features or custody history, and the responsibilities and rights information applicable to preservation actions. It is essential that metadata be OAIS compliant to allow interoperability, sharing and harvesting by other organisations and systems.
- Replication – creation of multiple copies of data at one or more locations and on one or more systems. Digital data is more likely to survive software or hardware failure, intentional or accidental alteration, and environmental catastrophes if it is replicated in several locations. Active management of replicated data is essential to keep versions consistent and to manage access across multiple locations.
- Refreshing - the transfer of data between two instances of the same type of storage medium while monitoring and maintaining data integrity. Refreshing will always be necessary due to the deterioration of physical media.
- Migration - the transferring of data to a newer hardware and/or software environment. This may include conversion of resources from one file format to another, from one operating system to another or from one programming language to another, so the resource remains fully accessible and all functional characteristics are retained.
- Emulation – the “look and feel” and functionality of legacy datasets, applications or websites can be crucial to their value as digital objects. Emulation uses modern technologies to render the data as it was originally intended, even when older operating systems and infrastructure are no longer available. An alternative approach is to maintain older infrastructure and systems in a “technology museum”.
- Sustainability (active management) – the proactive and continuous data management that encompasses a range of strategies contributing to the longevity of digital information. Digital sustainability focuses on building a flexible approach to data preservation with an emphasis on interoperability, standards, continued maintenance and continuous development.
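The first two guidelines above, keeping assets to known file types and checking files before ingestion, can be illustrated with a simple signature check. This is only a sketch: the short signature table covers a handful of well-known formats, the accepted-format policy is a hypothetical example, and dedicated tools rely on much richer signature sets and also validate a file's internal structure.

```python
from typing import Optional

# A few well-known file signatures ("magic numbers"); illustrative only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"II*\x00", "TIFF image"),  # little-endian TIFF
    (b"MM\x00*", "TIFF image"),  # big-endian TIFF
]

# Hypothetical repository policy: formats accepted as archival masters.
ACCEPTED_FORMATS = {"TIFF image", "PNG image", "PDF document"}


def identify_format(header: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def accept_for_ingest(header: bytes) -> bool:
    """Accept a file only if its format is identified and on the policy list."""
    return identify_format(header) in ACCEPTED_FORMATS


print(identify_format(b"%PDF-1.7\n"))           # PDF document
print(accept_for_ingest(b"\xff\xd8\xff\xe0"))   # False (JPEG is not on this list)
print(accept_for_ingest(b"unknown bytes"))      # False (format not identified)
```

In practice the header would be read from the start of each file, and files that fail the check would be set aside for review rather than silently rejected.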
Which of these principles are implemented in any digitisation project will depend on its scope, purpose and available resources. Sound examples of digital preservation initiatives can be found via the websites of the British Library, the National Library of New Zealand, the National Library of Australia and the Library of Congress.
Large scale implementations – National Library of New Zealand
If funding and resources are available, partnering with credible vendors can result in comprehensive and nationally distributed solutions.
In 2003, new legislation gave the National Library of New Zealand the mandate to collect digital materials and “preserve the nation’s digital heritage in perpetuity”. In 2006, the National Library of NZ successfully sought funding for a Digital Preservation system to embrace the needs of the country’s existing and planned digital asset collections. As part of the national Digital NZ Strategy, the Library developed a comprehensive Digital Preservation system in collaboration with a commercial partner. The final product, Ex Libris Rosetta, supports the acquisition, validation, ingest, storage, management, preservation, and dissemination of different types of digital objects. Digital New Zealand, from the National Library of New Zealand, provides an end-to-end solution to managing and preserving New Zealand's digital assets, both legacy collections and new, born-digital acquisitions, including electronic publishing and community-created content.
Open source solutions for small archives
There are a number of established or emerging open source solutions for digital asset preservation and management. Some are well known (e.g. DSpace and Fedora within academic and university libraries). The Library of Congress is developing open source software and tools to support its digital stewardship activities with a network of partners involved in its National Digital Information Infrastructure and Preservation Program (NDIIPP). The National Library of Australia is currently exploring open source (supplemented by considerable in-house development) as an option for its Digital Library Infrastructure Review project.
It is important to remember that open source does not mean free of cost: considerable resourcing and technical expertise are required to implement these modules and tools to meet individual institutional requirements.
Archivematica is just one example of a free and open source software solution designed for Digital Preservation. It is a comprehensive digital preservation system that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model. Archivematica uses METS, PREMIS, Dublin Core and other best practice metadata standards.
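To give a sense of what a standards-based metadata record looks like, the sketch below builds a small Dublin Core description as XML. It is not Archivematica's own output or API: the wrapper element, the function name and all field values are hypothetical, and only the Dublin Core element names and namespace come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields: dict) -> ET.Element:
    """Build an XML record holding one Dublin Core element per field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Hypothetical values describing a digitised photograph.
record = build_record({
    "title": "Harbour view, looking east",
    "creator": "Unknown photographer",
    "date": "1998-05-01",
    "format": "image/tiff",
    "identifier": "example-0001",
})

print(ET.tostring(record, encoding="unicode"))
```

A preservation system would typically wrap descriptive elements like these in a larger package that also carries technical and preservation metadata.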
Digital Preservation links
PRONOM from the National Archives, UK is a resource for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value.
DROID (Digital Record Object Identification) is an automatic file format identification tool. It is the first in a planned series of tools developed by The National Archives (UK) under the umbrella of its PRONOM technical registry service.
The National Archives (UK) guidance notes, produced by the Digital Preservation Department, give advice on general issues that the creators and managers of electronic records should consider when selecting file formats.
The National Library of Australia (NLA) has been researching issues and undertaking activities relating to digital preservation for a number of years. NLA: Recommended Practices for Digital Preservation provides advice for those with a long term responsibility for management and preservation of digital materials.
NLA: Digital preservation Policy Statement outlines the directions the National Library of Australia takes in preserving its digital collections, and in collaborating with others to enable the preservation of other digital information resources.
The NLA: Preserving Access to Digital Information (PADI) initiative aims to provide mechanisms that will help to ensure that information in digital form is managed with appropriate consideration for preservation and future access.
The JISC Beginner’s Guide to Digital Preservation is aimed at those who are new to digital preservation but can also serve as a resource for those who have specific requirements or wish to find further resources in certain areas.
DigitalPreservationEurope (DPE) fosters collaboration and synergies between many existing national initiatives across the European Research Area. DPE addresses the need to improve coordination, cooperation and consistency in current activities to secure effective preservation of digital materials.
The Technical Guidelines for Digitizing Cultural Heritage Materials shares best practices followed by agencies participating in the US Federal Agencies Digitization Guidelines Initiative (FADGI) Still Image Working Group for digitizing cultural heritage materials.
The Library of Congress Digital Formats Web site provides information about digital content formats.