Hi,

In fact, XML's default encoding is UTF-8. However, if you don't have
that character set available, you can set the encoding to ASCII or
ISO-8859-1, which is what NoteTab Pro is doing.

I believe that NoteTab Pro (at least older versions) does not handle
Unicode, so it cannot output Unicode and uses Latin-1 instead.
Perhaps another editor that is fully Unicode compliant is in order.
There are plenty of them now that *are* Unicode compliant.

If the data in your Access database uses no characters other than those
found in ASCII and Latin-1 (ISO-8859-1), then every character you are
producing falls within the first 256 Unicode code points, and the
techies can output the data in an XML file that uses UTF-8. Plain ASCII
characters are stored byte for byte the same way in UTF-8, so an editor
that only handles Latin-1 will show them correctly. The accented Latin-1
characters, however, take two bytes each in UTF-8, so such an editor
will show them garbled unless the file is converted to Latin-1 first.
(Also, the file might begin with a byte order mark, a short marker that
identifies it as a Unicode file, and your editor might choke on it.)
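
As a rough illustration of how the same characters are stored in each
encoding, here is a small Python sketch. The sample strings are my own
and are not taken from your data.

```python
# Compare how the same text is stored as bytes in three encodings.
ascii_text = "EAD"
accented = "é"  # U+00E9, present in Latin-1 but not in ASCII

# ASCII characters are byte-for-byte identical in all three encodings.
same_bytes = (
    ascii_text.encode("ascii")
    == ascii_text.encode("iso-8859-1")
    == ascii_text.encode("utf-8")
)

latin1_bytes = accented.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = accented.encode("utf-8")         # two bytes: 0xC3 0xA9

# The UTF-8 byte order mark that some tools write at the start of a file.
bom = "\ufeff".encode("utf-8")                # 0xEF 0xBB 0xBF

print(same_bytes, latin1_bytes, utf8_bytes, bom)
```

A Latin-1 editor reading the two UTF-8 bytes for "é" would typically
show two unrelated characters in its place.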

However, a problem will occur if you are using characters other than
those available in ASCII and Latin-1 and you want to use an editor that
is not fully Unicode compliant. Your techies can output UTF-8, but your
editor will choke.
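
If it helps, one way the techies could check whether a given field stays
within Latin-1 is sketched below in Python. The function name
fits_in_latin1 and the sample names are hypothetical; this is only an
illustration, not a description of how your export works.

```python
def fits_in_latin1(text: str) -> bool:
    """Return True if every character in text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # At least one character has no Latin-1 equivalent.
        return False


print(fits_in_latin1("Müller"))   # ü is in Latin-1
print(fits_in_latin1("Dvořák"))   # ř (U+0159) is not in Latin-1
```

Any record for which this returns False is one that a Latin-1 only
editor could not represent.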

In sum, it is your tool, not XML, that is the problem.

If you don't know what the ISO-8859-1 character set is, there is a listing at

http://www.htmlhelp.com/reference/charset/

Liz Shaw



Susan Hamburger wrote:
> Our tech people are mapping an Access database to output both EAD and MARC
> as XML documents. Currently, the SGML conversion to XML in NoteTabPro that
> I use generates this string
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> The techies want to know if the encoding can be changed from ISO-8859-1 to
> UTF-8 to support Unicode. My notes from the Publishing EAD Finding Aids
> course indicate that XMetaL (which I use to create my SGML documents)
> stores the document as UTF-8 and it needs to be changed to ISO-8859-1.  Is
> this only for the ASCII editor or does XML not support Unicode? My final
> output HTML document has it converted back to UTF-8. Must ISO-8859-1 be in
> the XML document so it can be converted to HTML and PDF? Or is there some
> other reason why the encoding is in ISO and not Unicode?
>
> Thanks for any help and advice.
>
> Sue
>
>
> Susan Hamburger, Ph.D.
> Manuscripts Cataloging Librarian
> Cataloging Services
> 126 Paterno Library
> The Pennsylvania State University
> University Park, PA 16802
>
> 814/865-1755
> FAX 814/863-7293