NLM® Launches Digital Collections, a Repository for Access to and Preservation of Digitized Biomedical Resources
On September 27, 2010, the National Library of Medicine® (NLM) launched a new free digital repository, Digital Collections, which is complementary to the PubMed Central® digital archive of electronic journal articles. The repository currently provides rich search, browse and retrieval of monographs and films from the NLM History of Medicine Division. Additional content and other format types will be added over time. Users can perform full text and keyword searching within each collection or across the entire repository.
"The new Digital Collections repository will allow NLM to provide permanent, robust access to an even broader range of biomedical information," said Betsy Humphreys, Deputy Director, NLM.
Accessing the Collections
This first release of Digital Collections includes a newly expanded set of Cholera Online monographs, a portion of which NLM first published online in PDF format in 2007. The Cholera Online collection now available via Digital Collections includes 518 books, dating from 1817 to 1900, about the cholera pandemics of that period. More information about the selection of the books and the subject of cholera may be found on the original Cholera Online Web page. Each book was scanned into high-quality TIFF images, which underwent optical character recognition to generate corresponding text files. Finally, a JPEG2000 derivative was created for each page for presentation through the integrated book viewer, which includes a Flash®-based zooming feature for resizing and rotating a page on demand (see Figure 1).
The second collection is a selection of eleven historical films, all created by the US Federal Government and in the public domain. The films have been digitized to a variety of video formats to accommodate a wide range of playback devices, including mobile devices. Digital Collections also includes an integrated, Flash-based video player which allows full text search of a film's transcript and graphically displays where the searched word or phrase occurs within the timeline of the film (see Figure 2).
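The transcript search described above can be illustrated with a minimal sketch. It is not the NLM player's implementation; it assumes a transcript stored as timestamped text segments, and the names `Cue` and `find_hits` are hypothetical. Each match is reported both as a time offset and as a fraction of the film's length, which is the value a player would need to place a marker on its timeline.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One transcript segment (hypothetical structure, assumed for this sketch)."""
    start_seconds: float
    text: str


def find_hits(cues, phrase, duration_seconds):
    """Return (start_seconds, timeline_fraction) for each cue containing phrase."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    needle = phrase.casefold()  # case-insensitive matching
    hits = []
    for cue in cues:
        if needle in cue.text.casefold():
            hits.append((cue.start_seconds, cue.start_seconds / duration_seconds))
    return hits
```

A player could draw one marker per returned fraction, scaled to the pixel width of its timeline.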
Preserving the Collections
Every page of each book and every video is stored as a discrete object in Digital Collections, with XML "glue" describing each object and the relationships between objects. To ensure the long-term integrity of these digital files, checksums (number strings that act like mathematical "fingerprints") are calculated and written into the objects as they are ingested into Digital Collections. These checksums will be recalculated periodically and compared with the original values. Additionally, all ingested files are versioned, so that a change does not overwrite the original but instead creates a new, second file that is stored along with the first.
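The checksum procedure described above can be sketched in a few lines of Python. This is an illustration only, not the repository's actual mechanism: the article does not name the checksum algorithm, so SHA-256 is an assumption, and the function names and the in-memory `registry` dictionary are hypothetical stand-ins for values that Digital Collections writes into the objects themselves.

```python
import hashlib


def compute_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large images and videos need not fit in memory."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(path, registry):
    """Record the checksum at ingest time (registry is a hypothetical store)."""
    registry[str(path)] = compute_checksum(path)


def audit(registry):
    """Recalculate every checksum and return the paths that no longer match."""
    return [
        path
        for path, recorded in registry.items()
        if compute_checksum(path) != recorded
    ]
```

Running `audit` on a schedule corresponds to the periodic recalculation: an empty result means every file still matches its recorded fingerprint, and any path returned identifies a file whose contents have changed since ingest.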
Technology
Digital Collections was built using several open-source components, with the Fedora Commons Repository Software providing the foundation. The primary browse and search interface has been adapted from the Muradora "front-end" for Fedora, created by Macquarie University, Sydney, Australia. The book viewer is a component of the Northwestern University Book Workflow Interface, also created specifically for use with Fedora. The Los Alamos National Laboratory djatoka JPEG2000 server handles the images. The video player was adapted from a research project by the NLM Office of Computer and Communications Systems.
Project
In 2009, NLM began a pilot project to build the repository, develop appropriate workflows for ingesting and managing the content, and provide a core set of end-user services suitable for general public access. Information on the year-long evaluation process leading to the selection of Fedora is available. Please send your comments and questions about Digital Collections to NLM customer service.
Doyle JP. NLM® Launches Digital Collections, a Repository for Access to and Preservation of Digitized Biomedical Resources. NLM Tech Bull. 2010 Sep-Oct;(376):e9.