I've done some digging.
It appears that although #27 (ESC) is a valid *Unicode* character (so your
initial comment wasn't quite correct), it is not a valid XML 1.0
character (see http://www.w3.org/TR/2004/REC-xml-20040204/#charsets).
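To make the XML 1.0 rule concrete, here is a minimal Python sketch of the
Char production from that section, plus a quick probe of what Python's
bundled expat-based parser does with Adam's test document. The function
names are my own and purely illustrative; the probe only shows one parser's
behaviour and says nothing about libxml2 itself.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # everything from space up to the surrogates
        or 0xE000 <= cp <= 0xFFFD      # private use area up to U+FFFD
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def parses_as_xml10(doc: str) -> bool:
    """Return True if the (illustrative) expat-based parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_xml10_char(27))                 # ESC, the character in question
    print(parses_as_xml10("<x>&#27;</x>"))   # Adam's test case
    print(parses_as_xml10("<x>&#9;</x>"))    # tab, for comparison
```

Note that the check applies to character references as well as literal
characters, which is why &#27; does not help in XML 1.0.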
It is (or rather will be), however, a valid XML 1.1 character
(http://www.w3c.org/TR/2004/REC-xml11-20040204/#charsets), so we aren't
the first people to hit this.
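For comparison, a rough Python sketch of the XML 1.1 side: the Char range is
wider, but control characters such as ESC fall under RestrictedChar and may
only appear as character references, never literally. The helper names below
are hypothetical, and the sketch assumes the receiving parser actually
supports XML 1.1 and that the document declares version="1.1".

```python
def is_xml11_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.1 Char production."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """Return True if cp is an XML 1.1 RestrictedChar (reference-only)."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def escape_xml11_text(text: str) -> str:
    """Escape element text for an XML 1.1 document (illustrative helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif is_xml11_restricted(cp):
            # Restricted characters must be written as character references.
            out.append(f"&#x{cp:X};")
        elif is_xml11_char(cp):
            out.append(ch)
        else:
            # NUL, lone surrogates, U+FFFE and U+FFFF have no XML 1.1 form.
            raise ValueError(f"U+{cp:04X} cannot be represented in XML 1.1")
    return "".join(out)


if __name__ == "__main__":
    print(escape_xml11_text("term\x1bwith escape"))
```

Even under XML 1.1 the NUL character stays unrepresentable, so a gateway
would still need a fallback for that one case.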
Matthew
> -----Original Message-----
> From: Z39.50 Next-Generation Initiative [mailto:[log in to unmask]]
> On Behalf Of Adam Dickmeiss
> Sent: 28 June 2004 16:49
> To: [log in to unmask]
> Subject: Re: Unserializable scan response
>
> Matthew J. Dovey wrote:
>
> >Are you sure - they seem to be listed in the Unicode standard "Controls
> >and Basic Latin" set. (http://www.unicode.org/charts/PDF/U0000.pdf)
> >
> OK. libxml2 fails this.
> <x>
> &#27;
> </x>
>
> What do other people's XML parsers do? If they work, I'll
> file a bug report to xmlsoft.
>
> -- Adam
>
> >Matthew
> >
> >
> >
> >>-----Original Message-----
> >>From: Z39.50 Next-Generation Initiative [mailto:[log in to unmask]]
> >>On Behalf Of Adam Dickmeiss
> >>Sent: 28 June 2004 15:53
> >>To: [log in to unmask]
> >>Subject: Re: Unserializable scan response
> >>
> >>LeVan,Ralph wrote:
> >>
> >>>How can there be a character that you can't at least serialize as a hex
> >>>code?
> >>>
> >>All hex codes (when given as &#..) must be part of UNICODE charset.
> >>And ESC and some others in range 0x01-0x1f aren't valid.
> >>
> >>-- Adam
> >>
> >>
> >>
> >>>Ralph
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>-----Original Message-----
> >>>>From: Robert Sanderson [mailto:[log in to unmask]]
> >>>>Sent: Monday, June 28, 2004 10:02 AM
> >>>>To: [log in to unmask]
> >>>>Subject: Unserializable scan response
> >>>>
> >>>>I've got a gateway to a Z39.50 server which returns characters which I
> >>>>can't serialize to XML and/or successfully deserialize, for example
> >>>>the ascii escape character. (Why this is in the index I have no idea,
> >>>>as it makes the term impossible to search with, but that's not my
> >>>>problem or my fault)
> >>>>
> >>>>As we don't have term surrogate diagnostics, what do I do?
> >>>>
> >>>>* Simply omit the term?
> >>>>  This seems wrong, but I'm tending towards it as the best of a bad lot.
> >>>>* Return a term with a null value?
> >>>>  Wrong, as a search for the null value may or may not produce results.
> >>>>* Strip out the unserializable characters and return the resulting term?
> >>>>  Seems wrong as the numberOfRecords will probably be wrong, barring good
> >>>>  fortune.
> >>>>
> >>>>* Other?
> >>>>
> >>>>Rob
> >>>>
> >>>>
> >>>> ,'/:. Dr Robert Sanderson ([log in to unmask])
> >>>> ,'-/::::. http://www.o-r-g.org/~azaroth/
> >>>> ,'--/::(@)::. Special Collections and Archives, extension 3142
> >>>>,'---/::::::::::. University of Liverpool
> >>>>____/:::::::::::::.
> >>>>I L L U M I N A T I L5R Shop: http://www.cardsnotwords.com/
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
> >
> >
> >
>