
On Thu, 21 Mar 2002, Jerome McDonough wrote:

>          Technical Metadata for Audio/Video: Michigan State, Library of
> Congress, Harvard

We plan to have a mtg report and revised schema drafts available
sometime the week of April 1st.

>          Technical Metadata for Text: New York University, Harvard

I was supposed to revise and send around a paper on this, mea maxima
culpa. If I can clean up what I have for general consumption I'll send it
out by April, but more likely I will publicly declare defeat.
I would LOVE to see what NYU has done on this.

As a summary, our broad areas of local concern for archiving text (i.e.,
character data, not Word documents et al.) were:

-- Character set -- about which there is an eye-opening technical report
on the Unicode site: "Character Encoding Model (Unicode Technical Report
#17)" http://www.unicode.org/unicode/reports/tr17/  What aspects of this
do we need to record, and how can we determine them? (Since our
contributors sure as heck won't know...)

-- Markup -- What DTDs, entity files, schemas, style sheets, etc. do we
want to a) know about and/or b) deposit along with the text file? How do
we manage versioning of said auxiliary files?

-- Processing history -- what, if anything, do we want to know about the
hardware/software environment in which these text files were produced (OCR
engines, etc.)?

-- Use -- and this is a fuzzy one: what, if anything, do we need to record
about the application/processing environment in which the file is intended
to be used? For example, a DTD/style sheet may not tell you (as an
archive) everything you need to know about a text object in order to
preserve the functions that it currently fulfills in its application
context. (Does that make any sense?) We can preserve the bits, we're
fairly confident we can preserve the characters and markup, but what
if the app dies?
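Two of the questions above (which character-set facts an archive can determine without the contributor's help, and how to pin down exact versions of DTDs, schemas and style sheets) can be partly answered mechanically. The Python sketch below is illustrative only: the function names probe_charset and describe_auxiliary are hypothetical, and using SHA-256 digests to identify auxiliary-file versions is an assumption, not an agreed practice. Note that valid UTF-8 does not prove the producer meant UTF-8 (pure ASCII is valid in many encodings), and the bytes alone cannot reliably distinguish, say, ISO-8859-1 from Windows-1252.

```python
import codecs
import hashlib


def probe_charset(data: bytes) -> dict:
    """Hypothetical helper: facts about a text file's encoding that can be
    inferred from its bytes alone. A heuristic, not a charset detector."""
    facts = {
        "has_utf8_bom": data.startswith(codecs.BOM_UTF8),
        # A UTF-32 LE BOM also begins with the UTF-16 LE BOM bytes.
        "has_utf16_bom": data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
        "is_ascii": all(b < 0x80 for b in data),
        "valid_utf8": True,
    }
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        facts["valid_utf8"] = False
    return facts


def describe_auxiliary(name: str, role: str, data: bytes) -> dict:
    """Hypothetical record for a DTD, entity file, schema or style sheet
    deposited with a text file. The SHA-256 digest identifies the exact
    version: any edit to the file produces a different digest."""
    return {
        "name": name,
        "role": role,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    print(probe_charset("café".encode("utf-8")))    # valid UTF-8, not ASCII
    print(probe_charset("café".encode("latin-1")))  # not valid UTF-8
    print(describe_auxiliary("example.dtd", "DTD", b"<!ELEMENT x ANY>"))
```

In practice the bytes would come from the deposited files themselves (e.g. pathlib.Path("example.dtd").read_bytes()); the digest answers "which version of the DTD did this text depend on?" but not "what did the application do with it?", which is the harder Use question.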

Discuss.  ;)

--Robin

Robin Wendler  ........................     work  (617) 495-3724
Office for Information Systems  .......     fax   (617) 495-0491
Harvard University Library  ...........     [log in to unmask]
Cambridge, MA, USA 02138  .............