We're using the DLXS software, which lets us store and index UTF-8
characters. It also has an option to map characters to make
retrieval a bit looser. For example, "Comité" would be retrieved
in a search for "Comite," because the é [LATIN SMALL LETTER E WITH  
ACUTE] is mapped to e [LATIN SMALL LETTER E]. These mappings are  

As part of our local preparation process, the non-ASCII characters
are marked up as numeric character references (for example, &#233;
for é), then converted to UTF-8. This can be helpful if you want your
source data to be explicit, but it will quickly become burdensome if
you have a lot of non-ASCII characters. In that case you would want
to dispense with the numeric references in favor of UTF-8 characters
throughout. I recommend validating the UTF-8 encoding with JHOVE.
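
To illustrate the conversion step described above, here is a minimal
Python sketch. It is not the actual local preparation script and it is
no substitute for JHOVE; the names refs_to_chars and is_valid_utf8 are
invented for this example. It assumes the references are decimal or
hexadecimal, and it leaves references to XML-special characters (such
as &#38;) alone so the file stays well-formed.

```python
import re

# Numeric character references: decimal (&#233;) or hexadecimal (&#xE9;).
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped so the XML remains well-formed.
XML_SPECIAL = {"&", "<", ">", '"', "'"}


def refs_to_chars(text):
    """Replace numeric character references with the characters they name."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        char = chr(code_point)
        # Keep the reference as-is when expanding it would break the markup.
        return match.group(0) if char in XML_SPECIAL else char
    return NUMERIC_REF.sub(replace, text)


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The second function is only a quick sanity check on raw bytes: a file
saved as Latin-1 with an é in it would fail, while the same text saved
as UTF-8 would pass.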

Characters that are not explicitly mapped must be entered as the
actual character (the HEBREW LETTER ALEF, for example).
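
The difference between mapped and unmapped characters can be sketched
in a few lines of Python. This is only an illustration of the idea,
not how DLXS implements it: the table CHAR_MAP and the functions
normalize and matches are hypothetical, and the handful of mappings
shown are assumptions chosen to match the examples in this thread.

```python
# Hypothetical mapping table: each listed character folds to a plain letter.
CHAR_MAP = str.maketrans({
    "\u00e9": "e",  # LATIN SMALL LETTER E WITH ACUTE
    "\u00e8": "e",  # LATIN SMALL LETTER E WITH GRAVE
    "\u00fc": "u",  # LATIN SMALL LETTER U WITH DIAERESIS
})


def normalize(term):
    """Apply the mapping; characters not in the table pass through unchanged."""
    return term.translate(CHAR_MAP)


def matches(query, indexed_term):
    """Compare a search term and an indexed term after mapping both."""
    return normalize(query) == normalize(indexed_term)
```

With this table, a query for "Comite" matches an indexed "Comité"
because both normalize to the same string. HEBREW LETTER ALEF is not in
the table, so it passes through untouched and only a query containing
the actual character will match it.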


On Jun 6, 2007, at 9:26 AM, Michele Combs wrote:

> For those of you who are offering web-based searching of your EAD
> finding aids by title or author, how are you handling special
> characters in the title, subtitle, or originator (for example, the
> French accented e, or German umlauted u)? Are you encoding those
> special characters in the EAD finding aid using character entities?
> If so, does it cause any problems with indexing or searching, given
> that most researchers will not have the special characters in their
> search string (for example, they'll likely just use an e without the
> accent)?
> Thanks
> Michele C.
> -=--=--=--=--=--=--=--=--=--=--=--=--=--=-
> Michele R. Combs
> Manuscripts Processor
> Special Collections Research Center
> Syracuse University Library
> 222 Waverly Avenue
> Syracuse, NY 13244
> (315) 443-2697
> -=--=--=--=--=--=--=--=--=--=--=--=--=--=-

Brian Sheppard
University of Wisconsin Digital Collections Center
(608) 262-3349