On Monday, January 7, 2013, Andrew Cunningham wrote:
And it got me thinking: a point quoted in the article was that each of the implementations is doing different transformations on the MARC records.

The second point is that RDF/XML, N-triples and JSON formats are supported.

One markup format, one plain text format, one JavaScript format.

Which got me thinking that each of these formats has different requirements, for instance:

* RDF/XML would use markup where N-triples and JSON would use Unicode Formatting Control Characters.

* RDF/XML and N-triples would reference characters outside the Basic Multilingual Plane directly as characters or as a single hexadecimal escape for the whole code point, while JSON, when escaping, requires two four-digit hexadecimal escapes representing a UTF-16 surrogate pair.

* RDF/XML can use characters directly, XML/HTML-style hexadecimal or decimal numeric character references, or named entities (e.g. Ā), while JSON requires JavaScript numerical escapes, e.g. \u0100. Finally, N-triples is more agnostic but has some interesting requirements: it requires support for all Unicode characters, references Charmod, and indicates a preference for actual characters over escaped characters, except where required by the encoding.
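
To make the differences above concrete, here is a rough Python sketch of how the same two characters might be escaped for each serialization. The function names are hypothetical, only non-ASCII characters are handled (quotes, backslashes and markup characters are ignored), and the N-Triples form assumes the \u plus four hex digits and \U plus eight hex digits escapes from the N-Triples grammar. It is an illustration, not what any of the implementations actually do.

```python
import json


def xml_char_refs(text):
    """Replace non-ASCII characters with XML hexadecimal character references."""
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)


def ntriples_escapes(text):
    """Replace non-ASCII characters with N-Triples \\uXXXX or \\UXXXXXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")     # inside the BMP: four hex digits
        else:
            out.append(f"\\U{cp:08X}")     # outside the BMP: eight hex digits
    return "".join(out)


def json_escapes(text):
    """Let the json module escape non-ASCII characters (its default behaviour)."""
    return json.dumps(text)[1:-1]          # strip the surrounding quotes


sample = "\u0100\U00020000"  # U+0100 (BMP) followed by U+20000 (outside the BMP)
print(xml_char_refs(sample))     # &#x100;&#x20000;
print(ntriples_escapes(sample))  # \u0100\U00020000
print(json_escapes(sample))      # \u0100\ud840\udc00
```

Only the JSON form splits U+20000 into a surrogate pair; the other two refer to the code point as a whole.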

So different intermediate processing of characters may be required for each format, as well as logic to handle markup versus Unicode format control characters.
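
As a small illustration of the markup versus control character split, the sketch below marks the same right-to-left run in two ways. Both helper names are made up, and the `<span dir="rtl">` form assumes the literal is allowed to carry XHTML-style markup; the plain-text form uses the Unicode isolate controls U+2067 and U+2069.

```python
from xml.sax.saxutils import escape

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def rtl_run_as_markup(text):
    """Mark a right-to-left run with markup (assumes XHTML-style content is allowed)."""
    return f'<span dir="rtl">{escape(text)}</span>'


def rtl_run_as_controls(text):
    """Mark the same run with Unicode format control characters for plain-text formats."""
    return f"{RLI}{text}{PDI}"


title = "\u05e9\u05dc\u05d5\u05dd"  # a short Hebrew word, used only as sample data
print(rtl_run_as_markup(title))
print(repr(rtl_run_as_controls(title)))
```

A converter that emits all three formats would need to pick one of these per output, which is the extra logic referred to above.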

If this makes sense?

It does, but I'm not sure why it matters?  It's all RDF and presumably one would be using RDF parsers to handle the character encodings. 
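
A quick sketch of what I mean, using Python's standard JSON and XML parsers as stand-ins for a real RDF parser (an assumption made only to keep the example self-contained): once parsed, the escaped forms collapse to the same string, so the consumer never sees the difference.

```python
import json
import xml.etree.ElementTree as ET

# The same two characters, escaped the JSON way and the XML way.
json_source = '"\\u0100 \\ud840\\udc00"'
xml_source = "<title>&#x100; &#x20000;</title>"

from_json = json.loads(json_source)       # surrogate pair is recombined by the parser
from_xml = ET.fromstring(xml_source).text  # character references are resolved by the parser

print(from_json == from_xml)  # True
```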

I mean, we deal with this already with MARC-8, UTF-8 (in MARC 21), and MARCXML. It's only really a problem because we use encodings that nobody else in the world uses so we have to come up with our own parsers and serializers (and, in many languages, MARC-8 support is just ignored).

The RDF community is already dealing with this (plus other serializations), so I don't really see how this is an issue. 

Although, admittedly, I may be missing your point here. 


On 8 January 2013 08:55, Andrew Cunningham <[log in to unmask]> wrote:

I hate to tell you this but numbers aren't language neutral.

But there are bigger internationalisation issues and potentially a lot of things that Bibframe will inherit from parent markup standards, including language tagging, bidi, encoding requirements, variation selectors, ITS, etc.

On 08/01/2013 8:42 AM, "J. McRee Elrod" <[log in to unmask]> wrote:
Andrew Cunningham said:

>Although, my primary interest and concern is the internationalisation
>architecture that will underlie Bibframe.

Moving from language neutral numbers to English based html markup is
hardly a move toward internationalisation.

   __       __   J. McRee (Mac) Elrod ([log in to unmask])
  {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
  ___} |__ \__________________________________________________________

Andrew Cunningham
Project Manager, Research and Development
Social and Digital Inclusion Team
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: [log in to unmask]