In the early days of digitization, as leaders at Cornell, the University of Michigan, the Library of Congress and elsewhere started building online collections, they began to realize how much their individual collections could be enhanced if they were made interoperable with others. Online patrons could then access books and serial publications digitized and managed at many different sites.

The group's answer was no, because new, exclusive standards could reduce the registry's usefulness by keeping some large and valuable legacy collections out of it. Besides, users could make their own judgments about whether any particular digitized items would meet their needs. Nonetheless, DLF followed the April meeting with one in May 2001 that produced a recommendation for "a minimum benchmark for a faithful digital reproduction of a printed book or serial publication." For more on the benchmark and its background, see Registry of Digital Reproductions of Paper-based Books and Serials.

The group further recommended that preservation digital masters have descriptive, structural, and administrative metadata, available in well-documented formats such as XML, and that structural metadata include page-level information (e.g., as required for page turning and related application software). The group recommended a "minimum list of structural metadata elements" in an appendix to its report. The group also specified that preservation digital masters may include machine-readable text in any of the following forms: uncorrected OCR, or corrected OCR that is below 99.995% accuracy; corrected text (keyboarded or OCR) that is at or above 99.995% accuracy; and text that is encoded (at any level, e.g., as specified in TEI Text Encoding in Libraries, Guidelines for Best Encoding Practices, Version 1.0, July 30, 1999). The benchmark is available for review at Draft benchmark for digital reproductions of printed books and serial publications.
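As an illustration of how a repository might record which kind of machine-readable text accompanies a master, the following is a minimal sketch in Python. The 99.995% threshold comes from the benchmark as described above; the function name, the category labels, and the handling of cases the benchmark does not list are assumptions made for this example, and the sketch takes a measured accuracy as input without saying how it should be measured.

```python
from typing import Optional

# Threshold stated in the benchmark, as a percentage.
ACCURACY_THRESHOLD = 99.995


def text_category(source: str, corrected: bool,
                  accuracy_percent: Optional[float] = None) -> str:
    """Return a label for the machine-readable text of a digital master.

    `source` is "ocr" or "keyboarded". The labels are hypothetical names
    chosen for this sketch; they are not defined by the benchmark.
    """
    if source not in ("ocr", "keyboarded"):
        raise ValueError("source must be 'ocr' or 'keyboarded'")

    # Uncorrected OCR is accepted without an accuracy figure.
    if source == "ocr" and not corrected:
        return "uncorrected-ocr"

    if corrected:
        if accuracy_percent is None:
            raise ValueError("corrected text needs a measured accuracy")
        # Corrected text, keyboarded or OCR, at or above the threshold.
        if accuracy_percent >= ACCURACY_THRESHOLD:
            return "corrected-text-at-or-above-threshold"
        # Corrected OCR that falls short of the threshold.
        if source == "ocr":
            return "corrected-ocr-below-threshold"

    # Assumption: anything else (e.g. keyboarded text below the threshold)
    # is not one of the forms listed in the benchmark.
    return "not-listed"
```

For example, `text_category("ocr", True, 99.9)` yields the below-threshold label, while `text_category("keyboarded", True, 99.995)` yields the at-or-above label because the comparison is inclusive.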

The group does not intend to promote or define methods for creating digital replacement copies for source documents. It is presumed that many source documents will be retained even as their digital surrogates are used to enhance access. Nor does the group intend to make an absolute statement of best practice. Such a statement would assume that digitization methods will not improve, but they will, and best practice will improve with them. That is why the group defined preservation digital masters as digital objects with minimum characteristics. The group also has no intention of disparaging the importance of legacy digital collections, forcing their rescanning, or encouraging poor management of them. The recommendations look forward, suggesting that from now on preservation digital masters be made to meet or exceed the recommended minimum characteristics.

By agreeing to a benchmark for a preservation digital master, libraries and other organizations can reduce risks in producing and maintaining digitized texts, and inspire users' confidence in them. Because preservation digital masters will be considered viable for meeting future needs, repositories investing in them will be secure in the knowledge that re-digitizing will not be necessary even as production techniques improve. Because digital masters will have well-known and consistent properties, they will support a wide variety of uses (including uses not possible with printed texts). As more printed texts are digitized, collection managers may investigate alternatives to preserving multiple redundant copies of them, such as a network of specialist print repositories.


