Damien Moody wrote:
> My department is investigating the use of checksums for the National Audio Visual Conservation Center. Checksums are a good way to validate files. Our concern here, though, is that our archive will be so large and process so much data that we may not be able to create/compare checksums on every file - perhaps only 1 to a few percent. We may just have to accept a certain amount of file corruption risk. But we'll surely continue investigating ways to ensure file integrity for extremely large archives.

May I suggest taking your cue from parity? Two properties of parity
checking made it practical for computer memory; what killed it for
general use was the cost of the extra bit and the impact on a novice of
a false detection. The properties were:

1. Design so that the correct parity was always inherent in a valid word
- the one extra bit was set so that all the bits XORed to one (odd
parity) or to zero (even parity). Both conventions were used.

2. Hardware test. No CPU was required to verify parity.
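
To make property 1 concrete, here is a minimal software sketch in
Python. The names parity_bit and parity_ok and the 8-bit example word
are my own, chosen only for illustration; real memory parity is done in
hardware, as property 2 says, not in code like this.

```python
def parity_bit(word: int, odd: bool = True) -> int:
    """Return the extra bit that makes all bits XOR to 1 (odd) or 0 (even)."""
    data_xor = bin(word).count("1") & 1  # XOR of the data bits
    return data_xor ^ 1 if odd else data_xor


def parity_ok(word: int, pbit: int, odd: bool = True) -> bool:
    """Check that data bits plus parity bit XOR to the expected value."""
    total = (bin(word).count("1") + pbit) & 1
    return total == (1 if odd else 0)


word = 0b10110010                  # hypothetical 8-bit data word
p = parity_bit(word)               # stored alongside the word
intact = parity_ok(word, p)        # unchanged word passes
one_flip = parity_ok(word ^ 0b00000100, p)   # single-bit error is caught
two_flips = parity_ok(word ^ 0b00000110, p)  # double-bit error slips through
```

A single flipped bit changes the XOR and is detected; two flipped bits
cancel out and are missed, which is one reason longer check values such
as a CRC are used for files.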

If those properties are met - though property 1 can be relaxed with some
inventiveness - a CRC can be generated and verified as fast as the bytes
can be streamed through low-cost hardware.
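
As a rough software stand-in for that streaming idea, here is a sketch
in Python that updates a CRC-32 chunk by chunk as the bytes go past, so
the whole file never has to sit in memory. It uses zlib.crc32 from the
standard library; the function name stream_crc32 and the 1 MiB chunk
size are my own assumptions, and this runs on the CPU, so it only
illustrates the data flow, not the hardware approach itself.

```python
import io
import zlib


def stream_crc32(stream, chunk_size: int = 1 << 20) -> int:
    """Compute a CRC-32 over a binary stream, one chunk at a time."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:              # end of stream
            break
        crc = zlib.crc32(chunk, crc)   # fold this chunk into the running CRC
    return crc & 0xFFFFFFFF


# Generate the CRC as the data is written, verify it as the data is read back.
data = b"123456789"
stored_crc = stream_crc32(io.BytesIO(data))
verified = stream_crc32(io.BytesIO(data), chunk_size=2) == stored_crc
```

The result does not depend on how the stream is cut into chunks, which
is what lets the check ride along with a copy or ingest step that is
reading the bytes anyway.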
