Timothy Young at Yale posed an interesting question about the
indexing and retrieval of special characters. To his dismay,
he realized that if special characters are encoded in EAD
finding aids, those same characters must be used in searching
unless some special accommodation is made somewhere.
This problem confronted early library systems too. Having
the special marks associated with "ordinary" alphabetic
characters was one way to get around the problem. That's
one of the reasons the USMARC character sets treat modifying
marks (accents, etc.) as separate characters. It makes it
much easier for indexing and retrieval systems to ignore
the specialness of special characters. This solution
won't work for special characters that are more than just
an ordinary letter with a special mark (for example, the
"thorn" used in Icelandic). In many systems, nonspecialized
characters are associated with special characters for
indexing and retrieval (for example, thorn can be indexed
and retrieved as an ordinary "th"). I'm not suggesting
that the bibliographic (i.e., MARC) solution is the best
for archival finding aids, but it is one solution.
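
The idea of keeping modifying marks separate so that an index can ignore them can be sketched in Python. This is only an illustration under stated assumptions: it uses Unicode decomposition (NFD) rather than the USMARC character sets themselves, and the FALLBACK table, its replacement letters, and the function name index_key are hypothetical choices that a real system would set by its own policy.

```python
import unicodedata

# Hypothetical fallback table for letters that are not simply
# "ordinary letter + modifying mark". The replacement letters are a
# local policy choice (an assumption here), not part of any standard.
FALLBACK = {
    "þ": "th", "Þ": "Th",
    "ð": "d", "Ð": "D",
    "æ": "ae", "Æ": "AE",
    "ø": "o", "Ø": "O",
}


def index_key(text: str) -> str:
    """Return a search key in which modifying marks are ignored."""
    # First substitute the letters that decomposition cannot simplify.
    replaced = "".join(FALLBACK.get(ch, ch) for ch in text)
    # NFD splits a character such as "ü" into "u" plus a separate
    # combining diaeresis, so the mark becomes its own character.
    decomposed = unicodedata.normalize("NFD", replaced)
    # Drop the combining marks and keep only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-fold so that upper- and lowercase forms share one key.
    return stripped.casefold()


if __name__ == "__main__":
    for word in ["Müller", "Muller", "Þór"]:
        print(word, "->", index_key(word))
```

If both the indexed text and the user's query pass through the same function, a search for "Muller" and a search for "Müller" produce the same key, which is the kind of special accommodation described above.
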
Regardless of the solution, it will require special provisions,
rules, programming, and training of users to search
when special characters are involved. There is not
an easy solution.
Developers of ISO 10646, the Universal Coded Character Set,
and implementations of it like Unicode(tm), are concerned
with issues such as indexing, retrieval, sorting, etc. As more
characters come to be available to us, using them in search
and retrieval will become a challenge. Perhaps we should
look at authority files or thesauri as a possible solution.
When people search for matches to text strings, they should
be given help if variations are possible. If searching
Muller is intended to also bring back Müller and Mueller,
authority and thesauri systems can help guide the users.
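
A rough sketch of how an authority file might expand a search term into its variants follows. The AUTHORITY record, the helper names build_lookup and expand_query, and the choice of "Müller" as the preferred heading are all assumptions made for illustration; they do not describe any existing authority system.

```python
# Hypothetical authority record: one preferred heading and its variants.
AUTHORITY = {
    "Müller": ["Muller", "Mueller"],
}


def build_lookup(authority):
    """Map every known form (case-folded) to its full group of forms."""
    lookup = {}
    for preferred, variants in authority.items():
        group = [preferred, *variants]
        for form in group:
            lookup[form.casefold()] = group
    return lookup


def expand_query(term, lookup):
    """Return all forms to search for; unknown terms pass through as-is."""
    return lookup.get(term.casefold(), [term])


LOOKUP = build_lookup(AUTHORITY)

if __name__ == "__main__":
    print(expand_query("Muller", LOOKUP))  # ['Müller', 'Muller', 'Mueller']
    print(expand_query("Barry", LOOKUP))   # ['Barry']
```

A search interface could either run all of the returned forms automatically or show them to the user as suggestions, which is closer to the guidance role described above.
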
I don't think the solution need necessarily burden the EAD
group. It's a general problem nowadays with any data that involves
rich character sets. Applications that can handle special
characters for printing, display, and retrieval need to
provide the solution. I would rather we not try to fix it within
the scope of the EAD DTD.
--Randy Barry
*****************************************************************
* Randall Keigan Barry LL *
* Senior MARC Standards Specialist LLL *
* U.S. Library of Congress LLL *
* Network Development and MARC Standards Office LLL CCCCC *
* 101 Independence Avenue, S.E. LLL CCC CCC *
* Washington, DC 20540-4102 U.S.A. LLL CCC *
* TEL: +1-202-707-5118 LLLLLLLLLL *
* FAX: +1-202-707-0115 CCC CCC *
* NET: [log in to unmask] CCCCC *
*****************************************************************
* NOTE: The ideas and opinions expressed in this communication *
* are personal and do not necessarily reflect the position of *
* the Library of Congress or any other U.S. government agency. *
*****************************************************************