On Wed, 2007-06-13 at 17:04 -0500, Fox, Michael wrote:
> While one can do whatever one wishes in the privacy of one's own search engine, I believe that, other than the five specified in the standard, named entities such as &eacute; are not valid in XML and should be avoided for data interchange.
...which is why in point 2, I suggested replacing the named version with
the numeric version after editing, before doing anything that requires
valid XML (parsing, exposing, exchanging etc.)
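
For anyone wanting to try that substitution, here is a minimal sketch in
Python. It is an illustration only, not the code we run: the function name
named_to_numeric is made up for this example, and the entity names come
from Python's standard html.entities table (the HTML 4 set), which may not
cover every entity set your DTD declares. It leaves the five entities
predefined by XML alone and passes through any name it does not recognise.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &eacute;
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace known named entities with decimal character references.

    Hypothetical helper for illustration: names outside the HTML 4 table
    are returned unchanged rather than guessed at.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY.sub(repl, text)


if __name__ == "__main__":
    print(named_to_numeric("<unittitle>Caf&eacute; &amp; th&eacute;</unittitle>"))
```

Because this is a plain text substitution, it would also rewrite entity
references inside comments or CDATA sections, so it is best run on a copy
of the file just before parsing.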
John
> -----Original Message-----
> From: Encoded Archival Description List [mailto:[log in to unmask]]On Behalf Of
> John Harrison
> Sent: Wednesday, June 13, 2007 5:35 AM
> To: [log in to unmask]
> Subject: Re: special characters in EAD (+ federated search)
>
>
> I've been grappling with the problem of these accented characters, and
> the best way to deal with them for a few years now, particularly
> allowing them to be searched and displayed in the optimum way. I've come
> to the following conclusions:
>
> 1. Special characters should be encoded within XML, using either the
> named or numeric form - e.g. &eacute; or &#233;
>
> 2. The named form is preferable when editing and proof-reading the file.
> If numeric forms are required by your XML parser of choice, these can be
> substituted for the named versions en masse prior to parsing.
>
> 3. The software used to index the finding aid should provide a means of
> normalising the accented form to the regular character. This allows the
> end user to find the record using the form without entering the accented
> character.
>
> 4. The software should also apply the normalising to query terms entered
> so that advanced users who do enter the accented form still find matches.
>
>
> The above approach - which is currently being applied to support
> federated search here in the UK (http://www.archivehub.ac.uk) - means
> that matches can be found by searching with the unaccented form, while
> the accented characters remain in the original file. So the file appears
> as intended when retrieved and transformed (by XSLT) for display in
> the browser.
>
> Hope this helps. Please feel free to contact me for further detail about
> how we're implementing federated searching.
>
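
Points 3 and 4 of the quoted message (normalising accented characters both
at index time and at query time) can be sketched roughly as follows. This
is an assumption-laden illustration in Python, not a description of our
indexing software: the names fold_accents, build_index and search are
hypothetical, the index is a plain dictionary, and the folding relies on
Unicode NFD decomposition from the standard unicodedata module.

```python
import unicodedata


def fold_accents(text):
    """Return text with combining accents removed (e.g. e-acute becomes e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(records):
    """Map each accent-folded word to the ids of the records containing it."""
    index = {}
    for record_id, text in records.items():
        for word in text.split():
            index.setdefault(fold_accents(word), set()).add(record_id)
    return index


def search(index, term):
    """Fold the query term the same way as the indexed words, then look it up."""
    return index.get(fold_accents(term), set())


if __name__ == "__main__":
    records = {"r1": "caf\u00e9 society", "r2": "cafe culture"}
    index = build_index(records)
    print(sorted(search(index, "cafe")))
    print(sorted(search(index, "caf\u00e9")))
```

Only the index keys and the query term are folded; the stored record text
is not modified, so the accented characters are still there for display.
Letters that have no Unicode decomposition (for example the Scandinavian
o with stroke) are not changed by this approach and would need a separate
mapping.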
--
John Harrison <[log in to unmask]>
University of Liverpool Library