I think we'll have to agree to disagree then, since I think the issues lie
with the conversion between MARC and Bibframe, and specifically with information
that doesn't exist in MARC, while you believe it's an RDF or HTML issue with
no need for it to enter the Bibframe realm.

A.


On 8 January 2013 15:01, Ross Singer <[log in to unmask]> wrote:

> N-Triples is a plain-text serialization of RDF, nothing more. This is RDF's
> problem.
>
> JSON's RFC (4627) says "JSON text SHALL be encoded in Unicode. The default
> encoding is UTF-8."
>
> Again, why wouldn't you be using a JSON parser/serializer to handle this?
> And if it's RDF/JSON (or JSON-LD), again, it's RDF's problem.
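As a rough illustration of the "use a JSON parser/serializer" point, here is a minimal Python sketch using the standard `json` module. The sample title string is made up for the example; nothing here is specific to Bibframe or to any RDF/JSON library.

```python
import json

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"
title = "Sonata " + clef  # hypothetical sample value

# By default json.dumps escapes non-ASCII characters as \uXXXX,
# so the clef is written as a UTF-16 surrogate pair.
escaped = json.dumps(title)

# With ensure_ascii=False the literal character is kept; encoding the
# result as UTF-8 is then left to whoever writes the bytes out.
literal = json.dumps(title, ensure_ascii=False)

# Either form parses back to the same string.
assert json.loads(escaped) == json.loads(literal) == title
```

In both cases the caller never builds the escape sequences by hand; the serializer and parser take care of them.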
>
> "RDF in HTML guise" (I assume this is referring to RDFa?) would defer to
> the charset declaration of the page, as would any other HTML-based
> serialization.
>
> HTML is certainly likely to be the most error-prone (since it's also the
> most democratic), but it, again, is an "HTML problem". If the character
> encoding matches the declared charset and it's valid HTML... what else can
> you do?
>
> -Ross.
>
> On Monday, January 7, 2013, Andrew Cunningham wrote:
>
>> if RDF is going to be used exclusively, but if you have N-triples and
>> JSON in the mix as container formats ... different issue.
>>
>> And no, they aren't RDF's problems. RDF in XML format inherits a lot of
>> features from XML and can leverage off ITS etc. Likewise RDF in HTML5 guise
>> can leverage off internationalisation features in HTML5 or HTMLNext (Living
>> HTML or whatever it's called this week). The issues are more related to
>> Bibframe and the conversion process from MARC formats to Bibframe,
>> regardless of the container.
>>
>> For instance, RDF and XML would use one system for language tagging a
>> record, while MARC, and possibly Bibframe, use a different system for tagging
>> the language of the item/object being described.
>>
>> Two different functions and two different language tag schemes ... BCP47
>> vs ISO-639-2 (B)
>>
>> And in theory the record might consist of multiple "language" tags, since
>> the original-script and romanised forms would carry different language tags
>> in the BCP-47 sense.
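To make the two tagging schemes concrete, here is a small Python sketch. The mapping table is a tiny illustrative subset and the function name `bcp47_tag` is hypothetical; a real converter would need the full ISO 639-2 list and the IANA language subtag registry, and would have to decide when a script subtag is actually needed.

```python
# Illustrative subset only: ISO 639-2 (B) codes as used in MARC,
# mapped to the primary language subtag used in BCP 47.
ISO639_2B_TO_BCP47 = {
    "fre": "fr",
    "ger": "de",
    "chi": "zh",
    "tur": "tr",
    "jpn": "ja",
}

def bcp47_tag(marc_lang: str, romanised: bool = False) -> str:
    """Return a BCP 47 tag for a MARC language code (hypothetical helper).

    A romanised form gets the Latn script subtag, so original-script and
    romanised text in the same record end up with different tags.
    """
    base = ISO639_2B_TO_BCP47[marc_lang]
    return base + "-Latn" if romanised else base

original_script = bcp47_tag("jpn")               # tag for the text in Japanese script
romanisation = bcp47_tag("jpn", romanised=True)  # tag for the romanised form
```

The same MARC code "jpn" therefore yields two different tags, depending on which string in the record is being labelled.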
>>
>> This level of complexity would become an issue when records are being
>> transformed into XML or HTML5 formats to be used by user agents, since
>> accessibility requirements will kick in in various jurisdictions.
>>
>> It will also impact font rendering of content: in IE10 and the latest
>> versions of Firefox, content marked up with a language tag of "tr" will
>> trigger the Turkish language system in the font, if present in the font's
>> OpenType (OT) tables, etc.
>>
>> At least that's my high level take on it.
>>
>> In MARC-8 and MARC-21 you didn't have to concern yourselves with this,
>> they essentially lived in isolation. In theory internationalisation was
>> based on a 40+ year old model.
>>
>> But RDF in XML, RDF in HTML5, N-triple and JSON each bring their own
>> requirements to Bibframe;
>>
>> As do the programming languages used;
>>
>> Accessibility requirements;
>>
>> etc.
>>
>> Bibframe is a move into a model where there are many inter-dependencies
>> and external requirements on the model: going from an isolated industry
>> standard to leveraging off international standards.
>>
>> Andrew
>>
>>
>>
>>
>>
>>
>> On 8 January 2013 13:36, Ross Singer <[log in to unmask]> wrote:
>>
>> Why are they issues, though?  They're RDF's problems, not Bibframe's.
>> Isn't that part of the point of using existing standards?
>>
>> -Ross.
>>
>>
>> On Monday, January 7, 2013, Andrew Cunningham wrote:
>>
>> Although those legacy encodings specific to the library industry would
>> not exist in Bibframe
>>
>> Ultimately the issues are more related to how the parsed content is going
>> to be consumed or going to be used. If it is to be human editable or
>> presented to user agents then more complex processing that inserts markup
>> or formatting control characters that are not present in the MARC records
>> would sometimes be required.
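One way to picture "markup or formatting control characters" is bidirectional text. The Python sketch below wraps the same string either in Unicode directional isolate characters (for plain-text containers) or in an HTML `bdi` element (for presentation to user agents). The function names are hypothetical, and whether a given record needs this treatment is an assumption the conversion process would have to settle.

```python
import html

# Unicode directional isolate controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate_plain(text: str, rtl: bool) -> str:
    """Plain-text containers: wrap the run in formatting control characters."""
    return (RLI if rtl else LRI) + text + PDI

def isolate_html(text: str, rtl: bool) -> str:
    """HTML output: express the same intent as markup instead."""
    direction = "rtl" if rtl else "ltr"
    return '<bdi dir="{}">{}</bdi>'.format(direction, html.escape(text))

plain = isolate_plain("abc", rtl=True)
marked_up = isolate_html("a&b", rtl=False)
```

Neither the control characters nor the markup exist in the MARC source, which is why they would have to be inserted during transformation.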
>>
>> A lot of this is just tip of the iceberg ... esp if transforming to
>> HTML5, language tagging in the record versus language tagging of the
>> record, and a range of other issues.
>>
>>
>> Andrew
>>
>>
>> On 8 January 2013 12:36, Ross Singer <[log in to unmask]> wrote:
>>
>> On Monday, January 7, 2013, Andrew Cunningham wrote:
>>
>> Just reading through Roy Tennant's article at
>> http://www.thedigitalshift.com/2013/01/roy-tennant-digital-libraries/library-of-congress-bibframe-initiative-part-2/
>>
>> And it got me to thinking, a point quoted in the article was that each of
>> the implementations is doing different transformations on the MARC records
>>
>> The second point is that RDF/XML, N-triples and JSON formats are
>> supported.
>>
>> One markup format, one plain text format, one javascript format.
>>
>> Which got me to thinking, each of these formats has different
>> requirements; for instance:
>>
>> * RDF/XML would use markup where N-triples and JSON would use Unicode
>> Formatting Control Characters.
>>
>> * RDF/XML and N-triples would reference characters outside the Basic
>> Multilingual Plane directly as characters or as six digit hexadecimal
>> numerical entities, while JSON requires four digit hexadecimal numerical
>> escapes representing UTF-16 surrogate pairs.
>>
>> * RDF/XML can use characters directly, XML/HTML style hexadecimal or
>> decimal numerical character references (e.g. &#x0100;), or named entities,
>> while JSON requires JavaScript numerical escapes, e.g. \u0100; finally
>> N-triples is more agnostic but has some interesting requirements, e.g. it
>> requires support for all Unicode characters and references charmod, and
>> indicates a preference for actual characters over escaped characters,
>> except where required by the encoding.
>>
>> So different intermediation processing of characters may be required for
>> each format, as well as logic to handle markup versus Unicode format
>> control characters.
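The escaping differences listed above can be sketched in a few lines of Python. The helper names are hypothetical and each one handles a single character only. One caveat on the details: as far as I know, the N-Triples grammar writes characters beyond the BMP as `\U` followed by eight hexadecimal digits, rather than six.

```python
import json

def xml_char_ref(ch: str) -> str:
    """XML/HTML hexadecimal numeric character reference for one character."""
    return "&#x{:X};".format(ord(ch))

def json_escape(ch: str) -> str:
    """JSON \\uXXXX escape; non-BMP characters become a surrogate pair."""
    # json.dumps returns a quoted string, so strip the surrounding quotes.
    return json.dumps(ch)[1:-1]

def ntriples_escape(ch: str) -> str:
    """N-Triples escape: \\uXXXX within the BMP, \\UXXXXXXXX beyond it."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return "\\u{:04X}".format(cp)
    return "\\U{:08X}".format(cp)

a_macron = "\u0100"    # LATIN CAPITAL LETTER A WITH MACRON, inside the BMP
g_clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

for ch in (a_macron, g_clef):
    print(xml_char_ref(ch), json_escape(ch), ntriples_escape(ch))
```

In practice a serializer library for each format would normally make these choices, so the sketch is only meant to show that one code point ends up with three different escaped spellings.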
>>
>> If this makes sense?
>>
>>
>> It does, but I'm not sure why it matters?  It's all RDF and presumably
>> one would be using RDF parsers to handle the character encodings.
>>
>> I mean, we deal with this already with MARC8, UTF-8 (in MARC-21), and
>> MARCXML. It's only really a problem because we use encodings that nobody
>> else in the world uses so we have to come up with our own parsers and
>> serializers (and, in many languages, MARC-8 support is just ignored).
>>
>> The RDF community is already dealing with this (plus other
>> serializations), so I don't really see how this is an issue.
>>
>> Although, admittedly, I may be missing your point here.
>>
>> -Ross.
>>
>>
>>
>>
>> On 8 January 2013 08:55, Andrew Cunningham <[log in to unmask]> wrote:
>>
>> I hate to tell you this but numbers aren't language neutral.
>>
>> But there are bigger internationalisation issues and poten
>>
>> State Library of Victoria
>>
>> 328 Swanston Street
>> Melbourne VIC 3000
>> Australia
>>
>> Ph: +61-3-8664-7430
>> Mobile: 0459 806 589
>> Email: [log in to unmask]
>>
>> http://www.openroad.net.au/
>> http://www.mylanguage.gov.au/
>> http://www.slv.vic.gov.au/
>>
>


-- 
Andrew Cunningham
Project Manager, Research and Development
Social and Digital Inclusion Team
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: [log in to unmask]

http://www.openroad.net.au/
http://www.mylanguage.gov.au/
http://www.slv.vic.gov.au/