> From: Geoff Mottram [mailto:[log in to unmask]]
> Subject: Re: [MODS] Language lookup : working example
>
> I totally support Yves' point of view. The most important design
> consideration that can be made with MODS is to adhere to standards that have
> been embraced by the world outside of the bibliographic community. In order
> for the library community to enjoy the benefits of the XML products and
> expertise being created in the general computing world, some library-centric
> practices may have to be abandoned. Users will no more be creating raw XML
> records than they do raw MARC records. Whatever user interface is used to
> create MODS records by a particular institution can convert between language
> codes as necessary but any underlying XML record should adhere to larger
> industry practices.
>
> From: "Yves Pratter" <[log in to unmask]>
> Subject: Re: [MODS] Language lookup : working example
>
> I would explain my reasoning about using RFC3066 rather than ISO639-2/B.
> I would say that it could be wrong, it is just my thought and I would share
> it with you in order to go beyond the technical point of view of the
> engineer (vs librarian).
I also agree with Yves and Geoff. Yves makes some good points and
articulated them better than I previously did. However, I think we
all need to make sure that we are talking about the same things. My feeling
is that all MODS elements that have a content model of #PCDATA should be
repeatable and allow xml:lang as an attribute. When we talk about "language"
in this fashion, what we are really talking about is "audience".
We need to distinguish the two, and distinguish between the "metadata" and
the "object". Let's take a book as an example. The book, i.e. the "object",
may have been written in French; however, the content of the "metadata"
elements may have been written in English. You need to reflect in the
"metadata" that the "object" was written in French, and you also need to
reflect in the "metadata" that the "metadata" itself is in English.
When we talk about the "metadata" being in English, what we are really
talking about is the intended "audience" for the "metadata". The "audience"
should be described by the xml:lang attribute on each MODS element that has
a content model of #PCDATA or xsd:string. Making the MODS elements that have
that content model repeatable allows a single metadata record to be
targeted at multiple "audiences". Thus, systems displaying information
from the metadata record can select the appropriate "audience".
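
To make the "audience" idea concrete, here is a minimal Python sketch of a
display system picking the right text. The record layout is an assumption
made up for illustration: <record>, <note> and <objectLanguage> are
hypothetical element names, not the actual MODS schema, and the fallback to
English is just one possible policy.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified record (NOT the real MODS schema).
# The "object" is in French; the "metadata" is offered to two audiences.
RECORD = """\
<record>
  <objectLanguage>urn:RFC3066:fr</objectLanguage>
  <note xml:lang="en">First edition, with illustrations.</note>
  <note xml:lang="fr">Première édition, avec illustrations.</note>
</record>
"""

def text_for_audience(root, tag, audience, fallback="en"):
    """Return the text of the <tag> child aimed at `audience`, else the fallback."""
    by_lang = {el.get(XML_LANG): el.text for el in root.findall(tag)}
    return by_lang.get(audience, by_lang.get(fallback))

root = ET.fromstring(RECORD)
print(text_for_audience(root, "note", "fr"))  # French audience
print(text_for_audience(root, "note", "de"))  # no German note, falls back to English
print(root.findtext("objectLanguage"))        # language of the "object" itself
```

The point of the sketch is that the language of the "object" and the
language of each "metadata" element are read from different places, so
neither constrains the other.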
The other aspect of language is the language of the "object". This could
be either a separate element or an attribute on something. The schema
could use RFC3066, ISO639-2B, or ISO639-2T. However, Yves makes a good
point in that RFC3066 is more flexible with its extension mechanism,
i.e. the x- prefix, for languages that we have not yet considered to be
"standard". For example, there probably is a Star Trek society for Klingons,
whose members write articles in Klingon. The last time I looked there was no
"standard" ISO639-2B or ISO639-2T code for Klingon. So how does one express
the language of the "object" for that book or article? This is where
ISO639-2/BT fails the Library and Metadata communities. Even for
describing the language of the "object", it would be best to use RFC3066
instead of ISO639-2/BT.
The issue of RFC3066 vs. ISO639-2/BT for the language of the "object"
becomes moot when your schema defines the language of the "object"
to be a URN rather than a hard-coded code list within the schema. Then each
Metadata community using the MODS schema can decide for itself
which is more relevant. The Library community can use "urn:ISO639-2B:eng"
and other Metadata communities can use "urn:RFC3066:en". When the Library
community needs to describe that Klingon article, they can simply use
"urn:RFC3066:x-klingon". If you want to restrict the URN types for the
language of the "object", you can simply put a pattern restrictor on the
element or attribute in the schema, something like:
^urn:(RFC3066|ISO639-2B|ISO639-2T):[A-Za-z0-9-]{2,}$
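
As a rough illustration of how an application might apply the same
restriction outside the schema, here is a minimal Python sketch. The
function name parse_language_urn is hypothetical, the pattern is the one
above with a second capture group added for the code, and the three URN
prefixes are the unregistered ones proposed in this message, not official
URN namespaces.

```python
import re

# Same pattern as above, with the language code captured as a second group.
LANGUAGE_URN = re.compile(
    r"^urn:(RFC3066|ISO639-2B|ISO639-2T):([A-Za-z0-9-]{2,})$"
)

def parse_language_urn(value):
    """Split a language URN into (scheme, code); raise ValueError if it is invalid."""
    match = LANGUAGE_URN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognised language URN: {value!r}")
    return match.group(1), match.group(2)

print(parse_language_urn("urn:ISO639-2B:eng"))
print(parse_language_urn("urn:RFC3066:x-klingon"))
```

A community using yet another code list would only need to add its prefix
to the alternation. Note that in an XML Schema pattern facet the expression
is implicitly anchored to the whole value, so the leading ^ and trailing $
would be dropped there.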
This solves the problem of different Metadata communities using different
standards and also allows me to redefine the schema pattern restrictor if
my Metadata community is using yet another standard. Of course, I would go
to IANA or whoever and register those URNs as well. I don't know why they
have never done that themselves...
Andy.