For the North Carolina EAD (NCEAD) project guidelines for XML EAD, the group
decided, after much consideration, on Unicode hexadecimal character
references embedded directly in the document instance.  The references
worked well with our existing display system (Dynaweb) as well as with XML
parsers and MS Internet Explorer 5.  From reviewing the XML literature and
talking with others, it was apparent that Unicode hexadecimal references
would be the preferred means of character inclusion for XML in general.
They are also specifically recommended within the EAD DTD:

<!--     NOTE 1: eadchars.ent should only be invoked for SGML   -->
<!--     applications. For XML, use Unicode &#xN; where "N"     -->
<!--     is the Unicode Hexadecimal value assigned by the       -->
<!--     standard.                                              -->
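
To illustrate the &#xN; form the DTD note describes, here is a minimal
sketch. It assumes Python and its standard xml.etree.ElementTree parser;
the <persname> fragments and the to_hex_refs helper are made up for
illustration and are not part of the NCEAD guidelines. It shows that a
conforming XML parser resolves the hexadecimal and decimal references for
the same character to the same Unicode code point, and how non-ASCII
characters could be written back out as hexadecimal references.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD-like fragments: the same character (y with diaeresis,
# U+00FF) written as a hexadecimal and as a decimal character reference.
hex_doc = "<persname>Lou&#xFF;s</persname>"
dec_doc = "<persname>Lou&#255;s</persname>"

# The parser resolves both references to the same Unicode character.
hex_text = ET.fromstring(hex_doc).text
dec_text = ET.fromstring(dec_doc).text


def to_hex_refs(text):
    """Rewrite every non-ASCII character as a &#xN; reference.

    Illustrative helper only; ASCII characters pass through unchanged.
    """
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


if __name__ == "__main__":
    print(hex_text == dec_text)   # both forms parse to the same string
    print(to_hex_refs(hex_text))  # re-encode for an ASCII-only file
```

Because the file itself stays plain ASCII in this form, it should survive
editors and transfer tools that do not handle other encodings.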

I do recall experiencing problems with numerical character entities
displaying incorrectly in different browser versions, and especially on
different platforms.

David Ruddy has prepared an excellent overview of character entities in XML
EAD for the EAD help pages:

The NCEAD guidelines I mentioned are available at:

Hope this helps out,

Stephen Miller
Director, Digital Library of Georgia
Main Library 4th Floor
University of Georgia
Athens, GA 30602
Ph: 706.542.3003
Email:  [log in to unmask]

----- Original Message -----
From: "Richard Rinehart" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, October 26, 2000 3:50 PM
Subject: EAD and character entities - which option to use?

> Hello all,
> I have a question to pose to this group. At Berkeley, members of
> encoding teams at the Library and Art Museum have been chatting
> recently about how to best encode extended ASCII characters in EAD
> finding aids. It appears there are two main options, and I'd be very
> interested to hear opinions from you all, or projects in which you
> are tackling this problem. The options appear to be:
> 1) Numerical character entities (e.g. &#255; converts to a y with
> umlaut). The advantage of this is that it is pretty safe with many
> old and existing systems (such as search and delivery systems) and is
> legible for editing the EAD in the widest variety of old and existing
> software packages (word processors, etc). So, this would seem the
> safest option if you need your EAD finding aids to be truly portable
> - between different editing packages and different web portal systems.
> 2) Unicode. Smarter search engine/delivery systems can convert the
> Unicode character to the closest HTML character on the fly when
> delivering to the web. A disadvantage may be that it is less portable
> between applications which can properly display the character, but an
> advantage may be that it helps searching because when you search on y
> with umlaut it finds them because it stores them as y with umlaut in
> the system, whereas if you use numerical entities, the search system
> may store them that way, as plain text strings, so when you search on
> y with umlaut you don't find it because it is stored literally as
> &#255; (I know of at least one older SGML web delivery system that
> had this failing).
> Any other thoughts on either of these as preferable options, given of
> course that we want to be as portable/compatible with existing
> editing/delivery systems and portable/compatible with future systems?
> Any ideas are welcome; thanks!
> --
> Richard Rinehart
> ----------------
> Digital Media Director
> Berkeley Art Museum/Pacific Film Archive
> @ University of California
> ----------------
> & Board of Directors
> Museum Computer Network