METS Archives


METS@LISTSERV.LOC.GOV


Subject: Re: Checksum
From: Jerome McDonough <[log in to unmask]>
Reply-To: Metadata Encoding and Transmission Standard <[log in to unmask]>
Date: Tue, 5 Mar 2002 11:56:07 -0500


At 09:57 AM 3/5/2002 -0500, Robin Wendler wrote:
>One of the general questions that came out of the recent
>(and soon to be documented, I swear) meeting on technical metadata for
>audio was about the METS <checksum> attribute of the <file> element.
>Right now, this is defined explicitly as MD5, but there are now and
>undoubtedly will continue to be other checksum algorithms in use. We were
>wondering whether it would be better/possible to generalize this in METS,
>providing for the checksum type, value, and create date. It seems better
>to raise this now, before we hit the big Version 1.0.

Just as a historical matter, my original choice of MD5 was motivated by a
desire to have A. an assured unique hash value for each file in a large
repository and B. public documentation of the algorithm and readily
available source code.  There are other options out there, including
SHS/SHA-1 (see http://www.itl.nist.gov/fipspubs/fip180-1.htm), Haval
(http://www.sis.uncc.edu/~yzheng/src/), and Snefru
(http://ciac.llnl.gov/ciac/ToolsUnixSig.html#Snefru).  I'd also been
assuming that the main purpose of storing this information was to allow file
integrity checking software to monitor the condition of files within an
online repository, and that this software would be fairly tightly integrated
into DL software systems.  That is, I'd been assuming that you'd generate an
MD5 value for a file when you accessioned it into your system, store that in
your DL metadata database, and file integrity software would pull that value
out of the database to do checks on files' status.  However, most commercial
(and noncommercial) file integrity checkers maintain their own, separate
database of checksum values, so my assumption of how this might work in
practice is not necessarily on target.
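
To make the workflow described above concrete, here is a minimal sketch in
Python.  It is only an illustration under stated assumptions: the function
names are hypothetical, and a plain dictionary stands in for the DL metadata
database, which in practice would be whatever store the repository uses.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def accession(path, metadata_db):
    """Record the file's MD5 value at accession time.

    `metadata_db` is a dict used here as a stand-in (an assumption)
    for the repository's metadata database.
    """
    metadata_db[path] = md5_of_file(path)


def check_integrity(metadata_db):
    """Return the paths whose current MD5 differs from the stored value."""
    return [
        path
        for path, stored in metadata_db.items()
        if md5_of_file(path) != stored
    ]
```

Calling accession() once per file and then running check_integrity()
periodically would flag any file whose contents have changed since it was
accessioned.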

That being said, having a checksum value in your database is still valuable
as a means of allowing people who are on the receiving end of files that
your repository is shipping out to check whether the files have been corrupted
in transit or not.
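
For the in-transit case, the receiving end could recompute a digest and
compare it with the value shipped alongside the file.  The sketch below
assumes a hypothetical ChecksumRecord that carries the checksum type, value,
and create date, along the lines of the generalization raised in the quoted
message; none of these names come from METS itself.  It is limited to
algorithms that Python's hashlib supports (MD5 and SHA-1 among them; Haval
and Snefru are not included).

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChecksumRecord:
    """Hypothetical record: checksum type, value, and create date."""
    algorithm: str      # a name accepted by hashlib.new, e.g. "md5" or "sha1"
    value: str          # hex digest of the file contents
    created: datetime   # when the checksum was generated


def make_record(data: bytes, algorithm: str = "md5") -> ChecksumRecord:
    """Sender side: compute a checksum record for the bytes being shipped."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return ChecksumRecord(algorithm, digest, datetime.now(timezone.utc))


def arrived_intact(data: bytes, record: ChecksumRecord) -> bool:
    """Receiver side: recompute the digest and compare with the record."""
    recomputed = hashlib.new(record.algorithm, data).hexdigest()
    return recomputed == record.value.lower()
```

Because the record names its own algorithm, the receiver does not need to
assume MD5, which is the point of carrying the checksum type with the value.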


Jerome McDonough
Digital Library Development Team Leader
Elmer Bobst Library, New York University
70 Washington Square South, 8th Floor
New York, NY 10012
[log in to unmask]
(212) 998-2425
