David Seubert wrote:
> We are currently creating parallel preservation copies, both online and
> on optical media, but eventually I see us phasing out the physical
> media. Before we do that, one thing I feel is necessary and that we have
> been discussing is the integrity of the data stored online. Once the
> data goes onto disk, there is no practical way to manually make sure
> that files haven't become corrupted over time, during a backup and
> restore process, or during a migration from one system to another. We've
> discussed using checksum files created upon ingest that would be
> periodically and automatically compared against the files to ensure that
> nothing has become corrupted. In case of corruption, the original file
> could be restored from tape. I've noticed that the audio files in the
> Internet Archive have associated checksum files so you can make sure
> that the file you have downloaded is identical to the original. I don't
> know if they also use these to ensure data integrity over the long term.
> Has anybody looked into this further or implemented this for archiving
> audio files?
> David Seubert
Forgive me for quibbling, but there is a physical medium even online.
The server may not be in your facility, there may be several of them,
but all are quite physical and without them and their media, the audio
files would persist only as long as the electrical currents were
flowing. (I do assume that no one is considering audio delay lines for
storage. Half a century should be long enough to have killed that
In that vein, redundant storage and verification with checksums will
suffice. Three copies may not last the remaining years of the universe,
but there's no law saying that a fourth (or fifth) cannot be provided.
Neither is there a rule saying that the redundant copies need be on
disc; here again optical or magneto-optical or other storage may be
appropriate even if your primary (and fallible) medium is magnetic.
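
For what it is worth, the ingest-time checksum and periodic comparison
David describes could be sketched roughly as below. This is only a
minimal illustration, not anyone's production system: the function
names (sha256_of, write_manifest, verify_manifest), the choice of
SHA-256, and the one-line-per-file manifest layout are all my own
assumptions. It uses nothing beyond the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large audio files are never loaded into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest):
    """At ingest: record one '<digest>  <relative path>' line per file.
    Assumption: the manifest is stored outside `root`, so it is not
    itself swept up in the listing."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            lines.append(f"{sha256_of(path)}  {name}")
    Path(manifest).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(root, manifest):
    """Periodically: recompute each digest and return a list of
    (relative path, problem) pairs. An empty list means every file
    still matches its ingest-time checksum."""
    root = Path(root)
    problems = []
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split("  ", 1)
        target = root / name
        if not target.is_file():
            problems.append((name, "missing"))
        elif sha256_of(target) != expected:
            problems.append((name, "checksum mismatch"))
    return problems
```

Run on a schedule against each copy, anything verify_manifest reports
is a candidate for restoration from one of the other copies. Keeping a
copy of the manifest alongside each redundant copy also guards against
the manifest itself going bad.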