Our SAMMA system installed at Yale uses MD5 checksums for ALL files transferred from the internal RAID to Yale's server farm. The time taken is trivial compared to the other processes. This is a non-issue if implemented correctly.

jim

Jim Lindner * Email: [log in to unmask] * Media Matters LLC. * Address: 500 West 37th Street, 1st FL, New York, N.Y. 10018 * eFax: (646) 349-4475 * Mobile: (917) 945-2662 * www.media-matters.net

Media Matters LLC. is a technical consultancy specializing in archival audio and video material. We provide advice, analysis, and products that help media archives apply beneficial advances in technology to collection management.

-----Original Message-----
From: Association for Recorded Sound Discussion List [mailto:[log in to unmask]] On Behalf Of Casey, Michael T
Sent: Monday, August 15, 2005 9:54 AM
To: [log in to unmask]
Subject: Re: [ARSCLIST] long range file storage

The Sound Directions project at Indiana University and Harvard University is also investigating the use of checksums (MD5 and SHA-1 hashes, for example). We will investigate data integrity checking for both interim storage (within our archives) and long-term (mass) storage. We're just getting rolling, but I expect we will have data on performance, implications for workflow, etc. next year.

www.dlib.indiana.edu/projects/sounddirections/

----------
Mike Casey
Coordinator of Recording Services
Archives of Traditional Music
Indiana University
(812) 855-8090

-----Original Message-----
From: Association for Recorded Sound Discussion List [mailto:[log in to unmask]] On Behalf Of Damien Moody
Sent: Monday, August 15, 2005 7:03 AM
To: [log in to unmask]
Subject: Re: [ARSCLIST] long range file storage

My department is investigating the use of checksums for the National Audio-Visual Conservation Center. Checksums are a good way to validate files.
Our concern here, though, is that our archive will be so large, and will process so much data, that we may not be able to create and compare checksums on every file - perhaps only on one to a few percent. We may just have to accept a certain amount of file corruption risk. But we will certainly continue investigating ways to ensure file integrity for extremely large archives.

Damien J. Moody
Information Technology Services
Library of Congress

>>> [log in to unmask] 08/11/05 11:04 PM >>>

We are currently creating parallel preservation copies, both online and on optical media, but eventually I see us phasing out the physical media. Before we do that, one thing I feel is necessary, and that we have been discussing, is the integrity of the data stored online. Once the data goes onto disk, there is no practical way to manually make sure that files haven't become corrupted over time, during a backup-and-restore process, or during a migration from one system to another. We've discussed using checksum files, created upon ingest, that would be periodically and automatically compared against the files to ensure that nothing has become corrupted. In case of corruption, the original file could be restored from tape.

I've noticed that the audio files in the Internet Archive have associated checksum files, so you can make sure that the file you have downloaded is identical to the original. I don't know whether they also use these to ensure data integrity over the long term. Has anybody looked into this further or implemented this for archiving audio files?

David Seubert
UCSB