We're using the DLXS software, which lets us store and index utf-8
characters. It also has an option to map characters to make
retrieval a bit more loose. For example, "Comité" would be retrieved
in a search for "Comite," because the é [LATIN SMALL LETTER E WITH
ACUTE] is mapped to e [LATIN SMALL LETTER E]. These mappings are
configurable.
As part of our local preparation process, the non-ASCII characters
are marked up as numerical entities, then converted to utf-8. This
can be helpful if you want your source data to be explicit, but will
quickly become burdensome if you have a lot of non-ASCII characters.
In that case you would want to dispense with the numerical entities
in favor of utf-8 characters throughout. I recommend validating the
utf-8 encoding with JHOVE.
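To illustrate the conversion step: in Python, the standard library can turn numerical character references into the characters themselves, which you can then write out as utf-8. This is a sketch of the general technique, not our actual preparation script:

```python
import html

def entities_to_utf8(text: str) -> str:
    """Replace numerical character references (&#233; or &#xE9;)
    with the actual characters; the result can be saved as utf-8."""
    return html.unescape(text)

# Decimal and hex forms both resolve to the accented character.
print(entities_to_utf8("Comit&#233;"))   # → Comité
print(entities_to_utf8("Comit&#xE9;"))   # → Comité
```

Going the other direction (characters to entities) is equally mechanical, which is why the choice mostly comes down to how many non-ASCII characters you have.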
Characters that are not explicitly mapped must be entered as the
actual character; the HEBREW LETTER ALEF, for example.
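The loose-retrieval mapping described above (é matched by a search for e) can be approximated with Unicode normalization: decompose to NFD so accents become separate combining marks, then drop the marks. This is just a sketch of the idea, not DLXS's actual mapping tables:

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Map accented letters to their base letters by decomposing to
    NFD and discarding combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(fold_accents("Comité"))  # → Comite
print(fold_accents("für"))     # → fur
```

Note that this only handles characters that decompose into a base letter plus marks; a character like ALEF has no such decomposition, which is why it has to be handled as the character itself.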
On Jun 6, 2007, at 9:26 AM, Michele Combs wrote:
> For those of you who are offering web-based searching of your EAD
> finding aids by title or author, how are you handling special
> characters in the title, subtitle, or originator (for example, the
> French accented e, or German umlauted u)? Are you encoding those special
> characters in
> the EAD finding aid using character entities? If so, does it cause
> problems with indexing or searching, given that most researchers will
> not have the special characters in their search string (for example,
> they'll likely just use an e without the accent) ?
> Michele C.
> Michele R. Combs
> [log in to unmask]
> Manuscripts Processor
> Special Collections Research Center
> Syracuse University Library
> 222 Waverly Avenue
> Syracuse, NY 13244
> (315) 443-2697
University of Wisconsin Digital Collections Center
[log in to unmask] (608) 262-3349