I have a question to pose to this group. At Berkeley, members of
encoding teams at the Library and Art Museum have been chatting
recently about how to best encode extended ASCII characters in EAD
finding aids. It appears there are two main options, and I'd be very
interested to hear opinions from you all, or projects in which you
are tackling this problem. The options appear to be:
1) Numerical character entities (e.g. &#255; renders as a y with
umlaut). The advantage of this is that it is pretty safe with many
old and existing systems (such as search and delivery systems) and is
legible for editing the EAD in the widest variety of old and existing
software packages (word processors, etc). So, this would seem the
safest option if you need your EAD finding aids to be truly portable
- between different editing packages and different web portal systems.
2) Unicode. Smarter search engine/delivery systems can convert the
Unicode character to the closest HTML character on the fly when
delivering to the web. A disadvantage may be that it is less portable,
since not every application can properly display the character. An
advantage may be that it helps searching: when you search on y with
umlaut, the system finds it because it is stored as y with umlaut.
With numerical entities, the search system may store them as plain
text strings, so a search on y with umlaut fails because the
character is stored literally as &#255; (I know of at least one older
SGML web delivery system that had this failing).
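To make the trade-off between the two options concrete, here is a minimal Python sketch. It is only an illustration, not a description of any particular delivery system: the function names and the sample title are hypothetical, and it assumes the search layer is free to normalize text before comparing it.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Option 1: replace each non-ASCII character with a decimal
    character reference, so the file stays pure ASCII."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_unicode(text: str) -> str:
    """Option 2: resolve character references into real Unicode characters.
    Note: html.unescape also resolves named entities such as &amp;, so the
    result is meant for indexing, not for writing back into the XML."""
    return html.unescape(text)


def naive_search(query: str, stored: str) -> bool:
    """Compare raw strings, as a system that stores entities literally would."""
    return query in stored


def normalized_search(query: str, stored: str) -> bool:
    """Resolve references on both sides before comparing."""
    return to_unicode(query) in to_unicode(stored)


# Hypothetical finding-aid title, stored with a numeric entity (option 1).
stored_title = to_numeric_entities("Pierre Louÿs papers")

print(stored_title)                              # ASCII-only form of the title
print(naive_search("Louÿs", stored_title))       # raw comparison misses the match
print(normalized_search("Louÿs", stored_title))  # normalized comparison finds it
```

If a sketch like this holds for your systems, the choice of storage encoding matters less than whether the search index resolves character references before matching.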
Any other thoughts on either of these as preferable options, given of
course that we want to be as portable/compatible as possible with
existing editing/delivery systems and with future systems?
Any ideas are welcome; thanks!
Digital Media Director
Berkeley Art Museum/Pacific Film Archive
@ University of California
& Board of Directors
Museum Computer Network