Mike,
Here is the spec.
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent
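For quick reference, the section linked above (XML 1.0, section 4.6) predefines exactly five entities: &lt;, &gt;, &amp;, &apos; and &quot;. Here is a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree parser, that checks all five resolve without any DTD. PREDEFINED_ENTITIES is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# The five entities predefined by XML 1.0 (section 4.6) and the
# characters they stand for.
PREDEFINED_ENTITIES = {
    "lt": "<",
    "gt": ">",
    "amp": "&",
    "apos": "'",
    "quot": '"',
}

# Each one resolves in a conforming parser with no DTD declared.
for name, char in PREDEFINED_ENTITIES.items():
    parsed = ET.fromstring(f"<p>&{name};</p>").text
    print(f"&{name};  ->  {parsed}")
```

Every other named entity must be declared in a DTD before a parser will accept it.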
Have fun!
Mike Ferrando
Library Technician
Library of Congress
Washington, DC
(202) 707-4454
----- Original Message ----
From: Michael Doylen <[log in to unmask]>
To: [log in to unmask]
Sent: Tuesday, June 19, 2007 1:56:13 PM
Subject: Re: special characters in EAD (+ federated search)
What are the 5 specified in the standard? Where could I find them?
Thanks,
Michael Doylen
Fox, Michael wrote:
> While one can do whatever one wishes in the privacy of one's own search engine, I believe that, other than the five specified in the standard, named entities such as &eacute; are not valid in XML and should be avoided for data interchange.
>
> Michael Fox
>
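The difference between a named and a numeric reference is easy to check. This is a minimal sketch, assuming Python's standard-library ElementTree parser, which does not read external DTDs. A named entity that *is* declared in a DTD the parser actually loads would be accepted. try_parse is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def try_parse(xml: str) -> str:
    """Return the element's text, or a description of the parse error."""
    try:
        return ET.fromstring(xml).text
    except ET.ParseError as err:
        return f"ParseError: {err}"

# Numeric character reference: well-formed with no DTD at all.
print(try_parse("<persname>Jos&#233;</persname>"))  # prints: José
# Named reference that no DTD declares: rejected as an undefined entity.
print(try_parse("<persname>Jos&eacute;</persname>"))
```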
> -----Original Message-----
> From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of John Harrison
> Sent: Wednesday, June 13, 2007 5:35 AM
> To: [log in to unmask]
> Subject: Re: special characters in EAD (+ federated search)
>
>
> I've been grappling with the problem of these accented characters, and
> the best way to deal with them for a few years now, particularly
> allowing them to be searched and displayed in the optimum way. I've come
> to the following conclusions:
>
> 1. Special characters should be encoded within XML, using either the
> named or numeric form - e.g. &eacute; or &#233;
>
> 2. The named form is preferable when editing and proof-reading the file.
> If numeric forms are required by your XML parser of choice, these can be
> substituted for the named versions en masse prior to parsing.
>
> 3. The software used to index the finding aid should provide a means of
> normalising each accented character to its unaccented equivalent. This
> allows the end user to find the record using the unaccented form, without
> having to enter the accented character.
>
> 4. The software should also apply the same normalisation to query terms,
> so that advanced users who do enter the accented form still find matches.
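The bulk substitution in point 2 could look roughly like this. It is a sketch that rests on two assumptions:

* The named entities in the finding aid use the HTML 4 / ISO Latin-1 names found in Python's html.entities.name2codepoint table (eacute, egrave, and so on).
* A plain regular expression is acceptable, even though it does not skip CDATA sections or comments.

named_to_numeric is a hypothetical name. The five XML-predefined entities and any unrecognised names are left untouched.

```python
import re
from html.entities import name2codepoint

# XML predefines these five; they must stay as they are.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Matches a named entity reference such as &eacute;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def named_to_numeric(text: str) -> str:
    """Replace HTML 4 named entities with decimal character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return f"&#{name2codepoint[name]};"
    return ENTITY_REF.sub(replace, text)

print(named_to_numeric("<p>Caf&eacute; &amp; Cr&egrave;me</p>"))
# prints: <p>Caf&#233; &amp; Cr&#232;me</p>
```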
>
>
> The above approach - which is currently being applied to support
> federated search here in the UK (http://www.archivehub.ac.uk) - means
> that matches can be found by searching with the unaccented form, while
> the accented characters remain in the original file, so the file appears
> as intended when it is retrieved and transformed (by XSLT) for display in
> the browser.
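The normalisation in points 3 and 4 might be sketched as follows, assuming Python's unicodedata module. The approach has three steps:

1. Decompose each string (NFD).
2. Drop the combining marks.
3. Case-fold the result.

fold_accents and matches are hypothetical helpers. The same function is applied to the indexed text and to the query, so accented and unaccented searches end up with the same form. Letters that have no Unicode decomposition, such as ø or ł, pass through unchanged and would need an extra mapping table.

```python
import unicodedata

def fold_accents(s: str) -> str:
    """Strip diacritics and case-fold, e.g. 'José' -> 'jose'."""
    # NFD splits 'é' into 'e' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop every combining mark, keeping the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def matches(query: str, text: str) -> bool:
    """Substring match after folding both the query and the indexed text."""
    return fold_accents(query) in fold_accents(text)

record = "Papers of José Martí"
print(matches("jose marti", record))  # prints: True
print(matches("José Martí", record))  # prints: True
```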
>
> Hope this helps. Please feel free to contact me for further detail about
> how we're implementing federated searching.
>
>
--
Michael Doylen, Ph.D.
Head, Archives Department
(414) 229-6980 (office)
(414) 229-5402 (department line)
(414) 229-3605 (fax)
University of Wisconsin--Milwaukee
UWM Libraries
Archives Department
P.O. Box 604
Milwaukee, WI 53201-0604