 March 9, 2005 [posted]
 
 
 Permanence Levels and the Archives for NLM's® Permanent Web Documents
 
 

The instability of resources on the Web is one of many challenging issues related to digital preservation. Several years ago, NLM recognized the seriousness of this problem and included in its long-range plan for 2000-2005 the following objective:

Take a leadership role in ensuring permanent access to important digital materials in health and biomedicine, including electronic journals, databases, documents published on the Web, and new kinds of scholarly communication and documentation of knowledge, using NLM's own electronic output and services as initial testbeds.

To this end, NLM has developed a system for communicating to users whether the resources they consult on our Web site will be kept permanently available, change over time, or possibly disappear altogether. In addition, we have created an online archive for NLM's permanent Web documents that are no longer current.

Background
In 1999, the Working Group on Permanence of NLM's Electronic Information (Permanence Working Group) was appointed and asked to examine the range of electronic information produced by NLM and develop recommendations in the following areas:

a) levels of permanence suitable for different categories of NLM information

b) methods of recording and communicating the level of permanence of NLM electronic information

c) procedures for ensuring that the levels of permanence are implemented in practice

d) approaches to labeling, organizing, retrieving and displaying NLM's electronic information so that the retention of older materials would not have a negative impact on those seeking current information.

The Permanence Working Group's discussions focused initially on three important characteristics of Web documents: identifier validity, resource availability, and content invariance. The Group developed a rating system based on these three concepts. The ratings later were distilled into the following four permanence levels:

Permanent: Unchanging Content
This resource will be kept available permanently. Its identifier will always provide access to the resource. Its content will not change. Example: Minutes of the NLM Board of Regents
Permanent: Stable Content
This resource will be kept available permanently. Its identifier will always provide access to the resource. Its content is subject only to minor corrections or additions. Example: Fact Sheets
Permanent: Dynamic Content
This resource will be kept available permanently. Its identifier will always provide access to the resource. Its content could be revised or replaced. Example: NLM's Home Page
Permanence Not Guaranteed
NLM has made no commitment to keep this resource available. It could become unavailable at any time. Its content and identifier could be changed. Example: Frequently Asked Questions
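
The four levels can be read as combinations of the three characteristics the Working Group started from. The Python sketch below is one illustrative encoding of that reading; the class names, field names, and the "none"/"minor"/"any" labels are assumptions made for this example and are not part of NLM's system.

```python
from dataclasses import dataclass
from enum import Enum


class PermanenceLevel(Enum):
    UNCHANGING = "Permanent: Unchanging Content"
    STABLE = "Permanent: Stable Content"
    DYNAMIC = "Permanent: Dynamic Content"
    NOT_GUARANTEED = "Permanence Not Guaranteed"


@dataclass(frozen=True)
class Guarantees:
    """What a level promises (illustrative field names)."""
    identifier_valid: bool    # the identifier will always provide access
    resource_available: bool  # the resource will be kept permanently
    content_change: str       # "none", "minor", or "any"


GUARANTEES = {
    PermanenceLevel.UNCHANGING: Guarantees(True, True, "none"),
    PermanenceLevel.STABLE: Guarantees(True, True, "minor"),
    PermanenceLevel.DYNAMIC: Guarantees(True, True, "any"),
    PermanenceLevel.NOT_GUARANTEED: Guarantees(False, False, "any"),
}


def is_permanent(level: PermanenceLevel) -> bool:
    """True for the three levels under which the resource is kept permanently."""
    return GUARANTEES[level].resource_available
```

Read this way, the three permanent levels differ only in how much their content may change, while the fourth level makes no promise on any of the three characteristics.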

The Permanence Working Group analyzed the documents that were available on the NLM Web site and developed a list of document categories. To simplify the assignment of permanence levels by Library staff, document categories were assigned default ratings.

Document Category | Default Permanence Level
Announcements, News | Permanence Not Guaranteed
Applications, Forms, Registrations | Permanence Not Guaranteed
Bibliographies | Permanent: Dynamic Content
Calendars, Schedules | Permanence Not Guaranteed
Clinical Alerts | Permanent: Unchanging Content
Contracts and Related Resources | Permanence Not Guaranteed
Database | Permanent: Dynamic Content
Digital Library Collections | Permanent: Dynamic Content
Exhibitions | Permanent: Stable Content
Fact Sheets | Permanent: Stable Content
FAQs, Help Files, Pocket Cards | Permanence Not Guaranteed
Finding Aids | Permanent: Dynamic Content
Grants, Awards | Permanence Not Guaranteed
Lists of Links | Permanence Not Guaranteed
Minutes (Official) | Permanent: Unchanging Content
Newsletters | Permanent: Stable Content
Organizational Charts & Directories | Permanence Not Guaranteed
Other | Blank (No Default Rating)
Photos of Staff, Programs, Activities, Buildings & Grounds | Permanence Not Guaranteed
Policies (Official) | Permanent: Stable Content
Press Releases | Permanent: Stable Content
Procedures | Permanence Not Guaranteed
Product, Program, & Project Descriptions | Permanent: Dynamic Content
Reports (Official) | Permanent: Stable Content
Software | Permanence Not Guaranteed
Staff Biographical Sketches | Permanence Not Guaranteed
Staff Papers | Permanence Not Guaranteed
Staff Presentations | Permanence Not Guaranteed
Training Materials & Manuals | Permanent: Dynamic Content
Visitor Information | Permanence Not Guaranteed
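
A category-to-default assignment of this kind amounts to a lookup with an optional manual override. The following Python sketch shows the idea using a handful of the categories listed above; the function name and the override parameter are hypothetical, and the actual TeamSite template logic is not described in this article.

```python
from typing import Optional

# A subset of the default ratings listed above. "Other" has no default.
DEFAULT_LEVELS = {
    "Clinical Alerts": "Permanent: Unchanging Content",
    "Minutes (Official)": "Permanent: Unchanging Content",
    "Fact Sheets": "Permanent: Stable Content",
    "Press Releases": "Permanent: Stable Content",
    "Bibliographies": "Permanent: Dynamic Content",
    "Announcements, News": "Permanence Not Guaranteed",
    "Other": None,
}


def assign_level(category: str, override: Optional[str] = None) -> Optional[str]:
    """Return the override if one was chosen, else the category default.

    A result of None means the category has no default and a person
    must pick a rating. Unknown categories raise KeyError.
    """
    if override is not None:
        return override
    return DEFAULT_LEVELS[category]
```

For example, `assign_level("Fact Sheets")` yields the Stable default, while passing an explicit `override` models a staff member deciding that the default does not suit a particular document.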

NLM's Metadata Schema
During the deliberations of the Permanence Working Group, NLM's Task Group on Metadata and Methods of Recording Permanence Levels was appointed and charged with developing an expanded set of metadata to increase the retrievability of NLM's Web documents. It was also asked to decide how permanence metadata would be recorded and displayed. The Task Group recommended that metadata be created for all publicly available electronic resources created by NLM and that the permanence level be a required element of the metadata set. The NLM set is based on the Dublin Core Metadata Element Set with some local adaptations, most notably the addition of permanence ratings. See http://www.nlm.nih.gov/tsd/cataloging/metafilenew.html.
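
As a rough illustration of Dublin Core style metadata with a required permanence element, the Python sketch below renders a metadata record as HTML `<meta>` tags. The `DC.` prefix follows a common convention for embedding Dublin Core in HTML, but the element name `NLM.Permanence.Level` and the sample values are assumptions made for this example; the schema page cited above is the authority on NLM's actual element names.

```python
from html import escape

# Hypothetical name for the locally added permanence element.
PERMANENCE_ELEMENT = "NLM.Permanence.Level"


def render_meta_tags(metadata: dict) -> str:
    """Render a metadata record as HTML <meta> tags, one per line.

    Raises ValueError if the required permanence element is missing.
    """
    if PERMANENCE_ELEMENT not in metadata:
        raise ValueError("permanence level is a required element")
    lines = []
    for name, value in metadata.items():
        lines.append(
            f'<meta name="{escape(name, quote=True)}" '
            f'content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


# Sample record with made-up values.
record = {
    "DC.Title": "Sample Fact Sheet",
    "DC.Date": "2005-03-09",
    PERMANENCE_ELEMENT: "Permanent: Stable Content",
}
```

Calling `render_meta_tags(record)` produces three tags that could sit in a page header; a record without the permanence element is rejected, mirroring the Task Group's recommendation that the element be required.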

Implementing the System
A third committee, the Electronic Archive Group (EAG), was then charged with developing a pilot project for assigning metadata, including permanence levels, and building an archive for outdated Web documents of permanent value to NLM. The EAG evaluated several systems under development elsewhere and concluded that TeamSite, a content management system developed by Interwoven, Inc. that was being purchased for NLM's main Web site, could be used for assigning metadata and managing the archiving workflow. A template was created in TeamSite (see Figure 1), and NLM Web contributors were trained to use it to assign basic metadata for all documents submitted for promotion to the Web. The template is designed to minimize the burden on document creators: default values or drop-down menus are provided wherever possible. When a contributor selects a document category for a document that has just been created or revised, the system automatically supplies that category's default permanence rating. If the default rating does not seem appropriate for a particular document, it can be changed by the person responsible for assigning the metadata or by a system administrator.

  figure 1: graphic

When a contributor assigns to a document a rating of Permanent (Unchanging, Stable, or Dynamic content), the system notifies the NLM Archives Team. The Archives Team reviews the document category and permanence metadata and forwards the document for promotion to the Web. The Cataloging Section then creates a complete MARC bibliographic record with standardized access points, including MeSH and an NLM classification number. The record appears in NLM's online catalog and is distributed to the bibliographic utilities and other NLM licensees. Enhanced metadata created by the Cataloging Section is then added to the header information of the online resource.
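
The routing described above can be summarized as an ordered list of steps that depends on the assigned rating. The Python sketch below is only a paraphrase of that paragraph; the step wording is invented for illustration, and the handling of documents rated Permanence Not Guaranteed (assumed here to go straight to promotion) is not stated in this article.

```python
def promotion_steps(permanence_level: str) -> list:
    """Return the ordered workflow steps for a newly rated document."""
    permanent = permanence_level.startswith("Permanent:")
    steps = []
    if permanent:
        steps.append("notify Archives Team")
        steps.append("Archives Team reviews category and permanence metadata")
    steps.append("promote to Web")
    if permanent:
        steps.append("Cataloging Section creates MARC record")
        steps.append("add enhanced metadata to document header")
    return steps
```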

The Archiving Process
The system prompts Web contributors at regular intervals to review and revise their current documents as needed. If contributors create a major revision of a permanent document or decide that a permanent document should be removed from the current site without being replaced, the archiving function is triggered.

When a document is moved to the Archives, the date archived is added to its URL. The only links in an archived document that continue to function are those to other parts of the same archived document. All other links are stripped when a document is moved to the Archives.
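
The two transformations just described, dating the URL and stripping links, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `/archive/YYYYMMDD/` path layout is hypothetical, links "to other parts of the same archived document" are taken to mean fragment links beginning with `#`, and the regular expression handles only simple double-quoted anchors rather than arbitrary HTML.

```python
import re
from datetime import date
from urllib.parse import urlsplit, urlunsplit


def archived_url(url: str, archived_on: date) -> str:
    """Insert a hypothetical /archive/YYYYMMDD prefix into the URL path."""
    parts = urlsplit(url)
    new_path = f"/archive/{archived_on:%Y%m%d}{parts.path}"
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


# Matches simple anchors of the form <a ... href="...">text</a>.
_ANCHOR = re.compile(
    r'<a\s+[^>]*?href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
)


def strip_external_links(html: str) -> str:
    """Keep same-document (#fragment) links; reduce all others to plain text."""
    def replace(match):
        href, text = match.group(1), match.group(2)
        return match.group(0) if href.startswith("#") else text
    return _ANCHOR.sub(replace, html)
```

A production workflow would more likely use a real HTML parser; the regular expression is used here only to keep the sketch short.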

The Archives
The Archives contain permanent resources with outdated or superseded content. These include older material that was once on the current NLM site but is no longer of current interest, as well as earlier versions of current documents that have undergone major revisions. After investigating archive models developed elsewhere, the EAG determined that the best way to ensure proper migration of all permanent resources and allow searching and retrieval of archived items was to keep the Archives as a separate but integral part of NLM's main Web site. Archived pages are stored on a separate branch of the main NLM Web server, as shown in Figure 2.

  figure 2: graphic

The search engine was configured to query both the current site and the Archives but list the search results for archived documents separately (see Figure 3).

  figure 3: graphic
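
Listing archived hits separately can be as simple as partitioning result URLs by the branch they live on. The Python sketch below assumes, purely for illustration, that the Archives branch is identified by an `/archive/` path prefix; the search engine's actual configuration is not described in this article.

```python
from urllib.parse import urlsplit

ARCHIVE_PREFIX = "/archive/"  # hypothetical path of the Archives branch


def split_results(urls):
    """Partition result URLs into (current, archived), preserving order."""
    current, archived = [], []
    for url in urls:
        path = urlsplit(url).path
        if path.startswith(ARCHIVE_PREFIX):
            archived.append(url)
        else:
            current.append(url)
    return current, archived
```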

Clicking on an item in the search results takes the user directly to the archived document. An Archives header (see Figure 4) and footer were designed to indicate clearly to users that the documents they have accessed are no longer current.

  figure 4: graphic

At the end of each document are publication, update, and archived dates as well as links from an archived version to the version that replaced it. In the case of Figure 5, clicking on "Replaced By" takes the user to NLM's current site (see Figure 6).

  figure 5: graphic


  figure 6: graphic


Clicking on "Previous Version" at the end of the document will take the user from the current site back to the archived document (see Figure 7). Within the Archives, the user can trace changes in a document over time by clicking on the "Previous Version" link on every archived version of a document.

  figure 7: graphic
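
The "Replaced By" and "Previous Version" links behave like a doubly linked list of document versions. The Python sketch below models that structure; the class, function names, and sample URLs are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Version:
    url: str
    # Target of the "Previous Version" link, if any.
    previous: Optional["Version"] = field(default=None, repr=False)
    # Target of the "Replaced By" link, if any.
    replaced_by: Optional["Version"] = field(default=None, repr=False)


def link(older: Version, newer: Version) -> None:
    """Connect an archived version to the version that replaced it."""
    older.replaced_by = newer
    newer.previous = older


def history(version: Version) -> List[str]:
    """Follow "Previous Version" links, newest first."""
    urls = []
    node = version
    while node is not None:
        urls.append(node.url)
        node = node.previous
    return urls


v1 = Version("/archive/20030101/doc.html")
v2 = Version("/archive/20040101/doc.html")
v3 = Version("/doc.html")  # the current version
link(v1, v2)
link(v2, v3)
```

Here `history(v3)` walks from the current page back through both archived versions, which is the trace a user performs by clicking "Previous Version" repeatedly.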


A link has also been added so that users can access the metadata for every document (see Figure 8).

  figure 8: graphic

The example in Figure 9 is a partial list of the expanded metadata that is created for all permanent documents:

  figure 9: graphic

Finally, if a user enters the URL of a document that has been moved to the Archives and there is no current version of the document on the main site, a redirect page will provide a link to the archived version (see Figure 10).

  figure 10: graphic
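
The redirect behavior can be thought of as a three-way lookup on the requested path. The following Python sketch illustrates the decision only; the function name, return labels, and the two lookup structures are hypothetical and do not reflect how NLM's Web server is configured.

```python
from typing import Dict, Optional, Set, Tuple


def resolve(
    path: str, current_pages: Set[str], archived_copies: Dict[str, str]
) -> Tuple[str, Optional[str]]:
    """Decide how to answer a request for `path`.

    current_pages: paths that exist on the current site.
    archived_copies: original path -> path of its archived copy.
    """
    if path in current_pages:
        return ("serve", path)
    if path in archived_copies:
        # No current version: show a page linking to the archived copy.
        return ("redirect_page", archived_copies[path])
    return ("not_found", None)
```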

Additional Work
Currently, only HTML documents are being archived. NLM has developed a sidecar approach to providing metadata for non-HTML documents such as PDFs. Contributors use a templated form similar to that used for HTML pages to enter metadata (see Figure 1). System workflow validators require that contributors create this metadata file before a non-HTML document can be promoted. The metadata file is structured according to a Dublin Core XML schema and can also be queried by the site search engine; implementation is scheduled for the first half of 2005. Web documents created by NLM administrative units that do not use the TeamSite content management system are not currently included in the Archives. In the future, the workflow will be modified so that all of NLM's outdated Web publications of permanent value can be archived. Finally, NLM hopes to work with other libraries to encourage their use of permanence ratings for Web documents that are of lasting value.
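
To make the sidecar idea concrete, the Python sketch below writes a small Dublin Core XML file next to a non-HTML document and checks for its presence before promotion. The Dublin Core namespace URI is the standard one, but the `.metadata.xml` naming rule, the `metadata` root element, and the function names are assumptions made for this example rather than details of NLM's workflow.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def sidecar_path(document: Path) -> Path:
    """Hypothetical naming rule: report.pdf -> report.pdf.metadata.xml."""
    return document.with_name(document.name + ".metadata.xml")


def write_sidecar(document: Path, fields: dict) -> Path:
    """Write Dublin Core fields (e.g. title, creator) to the sidecar file."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    path = sidecar_path(document)
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


def can_promote(document: Path) -> bool:
    """HTML carries its own metadata; other formats need a sidecar file."""
    if document.suffix.lower() in {".html", ".htm"}:
        return True
    return sidecar_path(document).exists()
```

Keeping the metadata in a separate XML file means the document itself is never modified, and the same file can be indexed by a search engine that understands the schema.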

By Margaret M. Byrnes
Head, Preservation and Collection Management Section


Byrnes MM. Permanence Levels and the Archives for NLM's Permanent Web Documents. NLM Tech Bull. 2005 Mar-Apr;(343):e4.

 


U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894
National Institutes of Health, Department of Health & Human Services