This message details a project at the Library of Congress for which we
plan to use MODS for the metadata.
MINERVA (Mapping the Internet: Electronic Resources Virtual
Archive; see http://lcweb.loc.gov/minerva/minerva.html), formerly known
as the Web Preservation Project, is an experimental pilot developed to
identify, select, collect and preserve open-access materials from the
World Wide Web. The effort includes consensus building within the Library,
joint planning with external bodies, studies of the technical, copyright
and policy issues, the development of a long-term plan and coordination of
prototypes. The aim is to identify what can be done immediately and move
rapidly through prototype into production in these areas.
LC is collaborating with the Internet Archive (Alexa) and with a new
group, SUNY and the University of Washington, to expand the project. The
latter groups are assisting in identifying content and in using tools of
their own design to assign metadata descriptions to the Web sites
collected. This metadata database is then used to search, retrieve and
analyze the archived collection of Web sites. The contractors have been
assisting us with collecting and archiving Web sites focused on the
following themes:
Election 2000: over 1,000 related sites
http://web.archive.org/collections/e2k.html
September 11th: over 2,000 related sites identified
http://september11.archive.org/
Winter Olympics 2002: approx. 65 related sites (not yet available)
Election 2002: collection not yet started
Elements from the MODS schema will be used for the following reasons:
1. Collection level records are already being created in LC's Integrated
Library System for the main sites. MODS would be more compatible with our
existing MARC bibliographic records for resources being described below
the main Web site. Contractors that create the metadata using MODS could
provide a sort of preliminary record that can be enhanced (or not,
depending upon policy) and brought into the ILS.
2. MODS offers a lot of flexibility in terms of how specific you want the
markup to be (e.g. you can subfield the elements of a subject heading or
just use an LCSH string). That will allow for different methods of input
depending upon the expertise of the person creating the record. It will
also serve us well as a preliminary catalog record if we can receive the
more detailed encoding.
3. MODS has potential use as an extension schema for METS (Metadata
Encoding and Transmission Standard), which is an encoding format for
descriptive, administrative, and structural metadata for textual and
image-based works. METS attempts to package together these different forms
of metadata which are essential in a digital repository. This means that
the MODS record could provide the descriptive metadata which then gets
packaged with the administrative and structural metadata in a future
repository. (More information on METS is at:
http://www.loc.gov/standards/mets). We expect to use METS in a later
phase of the project to incorporate preservation, administrative and
structural metadata for resources that are included.
4. Since this is a new project that is not reliant on any existing
software for record creation (e.g. the LC ILS or CORC) this would be a
good opportunity to test the format at LC.
The Network Development and MARC Standards Office is currently working on
tools for the creation of MODS records and the conversion of MARC records
to MODS (and later MODS to MARC so that these records can be brought into
the catalog).
MODS elements to be used in the MINERVA project
Title: Text included within the title tag of the HTML source file
of a Base URL page.
Name: A personal or corporate entity related to the resource; creator
or issuing publisher.
[Name of the entity that appears to be primarily responsible for making the
content of a Base URL page, as identified by reference to text or graphics
on a Base URL page, or by reference to an "about us" page linked from a
Base URL page.] The name should be in a structured form (i.e., lastname,
firstname), and the type of name should be indicated (personal or corporate).
Abstract: A brief description of the site associated with a Base
URL page shall be generated, referencing a possible site producer and
identifying a possible purpose of the site.
Date Captured: The archived time associated with the Base URL page
archived closest to and after 9:00 AM EST on September 11, 2001, using the
form YYYYMMDDHHMMSS, such that the date 20010911090000 corresponds to
9:00:00 AM on September 11, 2001. This may be a range with start and end
dates: the date of the first iteration and the date of the last iteration
for each Base URL.
Genre: Each URL shall be identified as "web site."
FormAndPhysicalDescription/Format: A list of the distinct file
formats expressed as a standard Internet media type (e.g., text/HTML,
image/jpeg) of all archived objects associated with a Base URL page.
Identifier: Base URL.
Language: The primary language of a Base URL page shall be
identified following the practice of ISO 639-2, Bibliographic Code (e.g.,
eng, fre).
See: http://www.loc.gov/standards/iso639-2/langcodes.html
Access Condition/Rights Management: An identifier provided by the
Library associated with a Base URL.
Subject: Uncontrolled Keywords taken from source or controlled
vocabulary extracted from LCSH.
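As a rough illustration of two of the definitions above, the Date Captured
form (YYYYMMDDHHMMSS) and the Format list of distinct Internet media types
could be produced along the following lines. This is only a sketch in Python,
not a description of the contractors' tools: the function names and the idea
of starting from recorded Content-Type values are assumptions made for the
example, and time zone handling (the definition refers to EST) is left out.

```python
from datetime import datetime


def format_date_captured(moment: datetime) -> str:
    """Render a capture time in the 14-digit YYYYMMDDHHMMSS form."""
    return moment.strftime("%Y%m%d%H%M%S")


def distinct_media_types(content_types):
    """Reduce recorded Content-Type values to a sorted list of distinct media types.

    Parameters such as "; charset=..." are dropped and case is folded,
    since media type names are case-insensitive.
    """
    seen = set()
    for value in content_types:
        media_type = value.split(";", 1)[0].strip().lower()
        if media_type:
            seen.add(media_type)
    return sorted(seen)
```

For example, format_date_captured(datetime(2001, 9, 11, 9, 0, 0)) gives
20010911090000, the value used in the Date Captured definition.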
The following is a list of MODS elements requested above expressed in XML
syntax (only beginning tags are provided; end tags are needed after the
data when creating a record). The full MODS schema is available
at: http://www.loc.gov/mods
<title>
<name type="personal"> or <name type="corporate">
<abstract>
<date type="captured">
or <date type="captured" point="start">first date</date>
<date type="captured" point="end">end date</date>
<genre>Web site</genre> Note: this can be generated automatically, since it is always the same.
<formAndPhysicalDescription><internetMediaType>
Note: IMT standard can be found
at: http://www.isi.edu/in-notes/iana/assignments/media-types/media-types
<identifier type="uri">
<language> (See: http://www.loc.gov/standards/iso639-2/langcodes.html)
<accessCondition>
<subject> (if uncontrolled) -or-
<subject authority="lcsh"> (if controlled)<topic><geographic><temporal>
Portions of an LCSH heading are subdivided into elements.
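To show how the elements listed above might be assembled by a program, here
is a minimal sketch using Python's standard xml.etree.ElementTree module. The
function name, its parameters, and the enclosing <mods> wrapper element are
assumptions made for the example (the list above gives only the child
elements), and the output is not validated against the MODS schema.

```python
import xml.etree.ElementTree as ET


def build_record(title, name, name_type, captured, languages, media_types,
                 identifier, abstract=None, keywords=()):
    """Assemble a record from the elements listed above.

    The <mods> wrapper is an assumption of this sketch.
    """
    record = ET.Element("mods")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "name", type=name_type).text = name
    if abstract:
        ET.SubElement(record, "abstract").text = abstract
    ET.SubElement(record, "date", type="captured").text = captured
    ET.SubElement(record, "genre").text = "Web site"  # always the same value
    form = ET.SubElement(record, "formAndPhysicalDescription")
    for media_type in media_types:
        ET.SubElement(form, "internetMediaType").text = media_type
    ET.SubElement(record, "identifier", type="uri").text = identifier
    for code in languages:
        ET.SubElement(record, "language").text = code
    for keyword in keywords:
        # Uncontrolled keywords only; LCSH headings would need subelements.
        ET.SubElement(record, "subject").text = keyword
    return record
```

Calling ET.tostring(record, encoding="unicode") on the result gives the
record as text, with the end tags supplied automatically.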
Sample Record in MODS
Here is a MODS record for the George W. Bush Web site. The site was originally
cataloged using MARC 21. Note that the cataloging is for the Bush site as
it looked when it was captured (it looks nothing like this now!).
<title>George W. Bush for President. [electronic resource]</title>
<name type="corporate">Bush for President, Inc.</name>
<genre>Web site</genre>
<date type="captured" encoding="ISO8601">20001011</date>
<language authority="iso639-2b">eng</language>
<language authority="iso639-2b">spa</language>
<formAndPhysicalDescription><internetMediaType>text/html</internetMediaType>
</formAndPhysicalDescription>
<abstract>Presents information about Texas Governor George W. Bush
(b. 1946) and his campaign to become the Republican nominee for
U.S. President. Contains a biographical sketch, a schedule of appearances,
and news about the campaign. Offers access to details about issues and
speeches. Discusses how to participate in the campaign.</abstract>
<subject authority="lcsh"><topic>Republican Party</topic></subject>
<subject authority="lcsh"><topic>Presidential candidates</topic>
<topic>Biography</topic> <temporal>20th century</temporal></subject>
<subject><geographic>United States</geographic></subject>
<subject><geographic>Texas</geographic></subject>
<identifier type="uri">http://www.georgewbush.com/</identifier>
(There may be a need for additional Internet Media Types to be
recorded. Note also that the element "accessCondition" is not used in
this record.)
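For anyone who wants to experiment with the sample record, the following
Python sketch reads an abbreviated copy of it and pulls out a few fields. The
<mods> wrapper is added only so that the fragment parses as a single XML
document, and the function name and the choice of fields are assumptions made
for the example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample record, wrapped in an assumed <mods> element.
SAMPLE = """<mods>
<title>George W. Bush for President. [electronic resource]</title>
<name type="corporate">Bush for President, Inc.</name>
<genre>Web site</genre>
<date type="captured" encoding="ISO8601">20001011</date>
<language authority="iso639-2b">eng</language>
<language authority="iso639-2b">spa</language>
<subject authority="lcsh"><topic>Presidential candidates</topic>
<topic>Biography</topic> <temporal>20th century</temporal></subject>
<subject><geographic>Texas</geographic></subject>
<identifier type="uri">http://www.georgewbush.com/</identifier>
</mods>"""


def summarize(record_xml):
    """Return a few fields from a record as a plain dictionary."""
    root = ET.fromstring(record_xml)
    return {
        "title": root.findtext("title"),
        "captured": root.findtext("date[@type='captured']"),
        "languages": [lang.text for lang in root.findall("language")],
        # strip() tolerates stray whitespace around the URL
        "identifier": root.findtext("identifier[@type='uri']").strip(),
        # one list of heading parts per LCSH subject, in document order
        "lcsh_subjects": [
            [part.text for part in subject]
            for subject in root.findall("subject[@authority='lcsh']")
        ],
    }
```

Because the LCSH heading is subdivided into elements, each part of the
heading comes back as a separate item rather than as one string.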
We thought it would be useful for subscribers to this list to see how we
are testing MODS.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^ Rebecca S. Guenther ^^
^^ Senior Networking and Standards Specialist ^^
^^ Network Development and MARC Standards Office ^^
^^ 1st and Independence Ave. SE ^^
^^ Library of Congress ^^
^^ Washington, DC 20540-4402 ^^
^^ (202) 707-5092 (voice) (202) 707-0115 (FAX) ^^
^^ ^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^