I strongly agree with Peter's arguments. This is the direction I was 
intending; Peter states the case much more clearly and comprehensively.

-Joan




Peter Constable <[log in to unmask]> 
Sent by: ISO 639 Joint Advisory Committee <[log in to unmask]>
2007-12-13 10:48 AM
Please respond to
ISO 639 Joint Advisory Committee <[log in to unmask]>


To
[log in to unmask]
cc

Subject
Re: decision required: "other" collections






I’d like to return to Rebecca’s comment,
 
> There is a problem with understanding the scope of the language if we
> remove "(Other)" and name all these with "Languages". The distinction
> of
> course is that when we use (Other) it means that some of the languages
> within the group have their own identifiers, while others go in this
> bucket. It alerts the user to make sure that the language in question
> is
> not separately defined by its own identifier.
 
There is a problem today in 639-2 with understanding the scope of 
“(Other)” collections, and removing “(Other)” doesn’t change that. So, 
with or without coming up with informative mapping tables, I don’t think 
removing “(Other)” is a problem for 639-2; in fact, it removes a problem 
for 639-2 that I’ve mentioned before. Let me elaborate on these points.
 
Joan mentioned wen “Sorbian languages” as an example: how does one know 
what the scope is? Well, if one does a bit of research, they probably 
discover there are two languages, Upper and Lower Sorbian, and then they 
discover that these are coded in 639-2 as hsb and dsb. They also note the 
text in clause 4.1.1 of 639-3, 
 
“A collective language code is not intended to be used when an individual 
language code or another more specific collective language code is 
available.”
 
So, they determine whether their document is Upper Sorbian, Lower Sorbian, 
or some other previously unrecognized Sorbian; if it’s one of the first 
two, they tag with hsb or dsb accordingly, and only if it’s the last do 
they use wen.
 
Now consider the same question applied to bat “Baltic (Other)”: how does 
one know what the scope is? The same process gets used: one has to do a 
bit of research, which reveals that there are three languages: Latvian, 
Lithuanian and Prussian. They also discover that the first two are coded 
in 639-2, lav and lit respectively. Again, following the guidance in 
4.1.1, they tag their document lav if it’s Latvian, lit if it’s 
Lithuanian, and bat if it’s Prussian (or some other 
previously-unrecognized Baltic language).
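
The lookup described above can be sketched in a few lines of Python. This is only an illustration: the codes are the ones cited in this message, while the table layout and the names INDIVIDUAL_CODES, COLLECTION_FOR and choose_tag are assumptions made for the example and are not part of any ISO 639 data file.

```python
# Illustrative data only: the codes are the ones cited in this message.
INDIVIDUAL_CODES = {
    "Upper Sorbian": "hsb",
    "Lower Sorbian": "dsb",
    "Latvian": "lav",
    "Lithuanian": "lit",
}

# Hypothetical lookup from a language to the most specific collection
# that contains it (something a user currently has to research by hand).
COLLECTION_FOR = {
    "Upper Sorbian": "wen",
    "Lower Sorbian": "wen",
    "Latvian": "bat",
    "Lithuanian": "bat",
    "Prussian": "bat",
}


def choose_tag(language):
    """Follow the 4.1.1 guidance: prefer an individual code when one
    exists, and fall back to the collection code only otherwise."""
    if language in INDIVIDUAL_CODES:
        return INDIVIDUAL_CODES[language]
    return COLLECTION_FOR[language]
```

With this data, Latvian resolves to lav and Prussian falls back to bat. Nothing in the function looks at whether the collection name contains “(Other)”.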
 
For any collection, the process is the same. Having “(Other)” in the name 
really doesn’t provide much benefit. Rebecca suggested that it’s a clue 
that some languages in the group have their own identifier, but relying on 
the “languages” vs. “(Other)” distinction is unreliable and misleading. 
Consider these:
 
·         afa “Afro-Asiatic languages”: at least 3 languages in this group 
are coded in 639-2. Three other collections (ber, cus, sem) also fall in 
this group.
·         alg “Algonquian languages”: at least 7 languages in this group 
are coded in 639-2.
·         ath “Athabascan languages”: at least 8 languages in this group 
are coded in 639-2, and apa “Apache languages” also falls in this group.
·         iro “Iroquoian languages”: at least 2 languages in this group 
are coded in 639-2.
·         mun “Munda languages”: at least 1 language in this group is 
coded in 639-2.
·         nai “North American Indian languages”: at least 8 languages in 
this group are coded in 639-2. Two collections (alg, ath) also fall in 
this group.
·         sio “Siouan languages”: at least 2 languages in this group are 
coded in 639-2.
·         wen “Sorbian languages”: all languages in this group are coded 
in 639-2.
 
Gary Simons and I pointed out these cases over five years ago. (See 
http://www.ethnologue.com/14/iso639/analysis.asp and the accompanying 
paper we provided to the JAC, “An Analysis of ISO 639”, 
http://www.sil.org/silewp/2002/SILEWP2002-004.pdf.) So, the potential 
benefit Rebecca suggested of having some collections marked “(Other)” 
really doesn’t exist; in fact, users are misled if they make the assumption 
that benefit depends on. 
 
Also, if the assumption is made that groups *exclude* some languages, then 
groups become unstable: any time a language in that group gets coded in 
639-2, it’s no longer in that group, so the scope of the code narrows, and 
any documents tagged with that group ID are now erroneously tagged. Gary 
Simons and I pointed this out seven years ago. Since we did the analysis 
of collections in 2002, there have been at least 37 languages newly coded 
in 639-2. Every single one of those additions changed the scope of one or 
(because some collections contain collections) possibly more collections, 
and any existing records tagged with a collection ID became erroneously 
tagged. If we stick with this assumption of exclusion, then every time we 
code a language in 639-2 we are breaking some unknown quantity of users’ 
data. That’s a serious stability problem that exists only because we have 
“(Other)” collections and an assumption of exclusion. If we were to drop 
that assumption and remove “(Other)” from names, then we would be making a 
significant improvement to 639-2, even without mapping tables. 
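
The stability argument can be made concrete with a small sketch. It assumes the Baltic membership given earlier in this message and a purely hypothetical later decision to code Prussian individually in 639-2; the function name collection_scope and its parameters are invented for the illustration.

```python
# Membership of the Baltic group as described in this message.
BALTIC_MEMBERS = {"Latvian", "Lithuanian", "Prussian"}


def collection_scope(members, individually_coded, exclusive):
    """Return the set of languages a collection code denotes.

    Under the exclusive reading, languages with their own code are
    removed from the collection; under the inclusive reading the
    collection always denotes the whole group.
    """
    if exclusive:
        return set(members) - set(individually_coded)
    return set(members)


coded_before = {"Latvian", "Lithuanian"}
# Hypothetical: suppose Prussian were later given its own code.
coded_after = coded_before | {"Prussian"}

exclusive_before = collection_scope(BALTIC_MEMBERS, coded_before, exclusive=True)
exclusive_after = collection_scope(BALTIC_MEMBERS, coded_after, exclusive=True)
inclusive_before = collection_scope(BALTIC_MEMBERS, coded_before, exclusive=False)
inclusive_after = collection_scope(BALTIC_MEMBERS, coded_after, exclusive=False)
```

Under the exclusive reading the scope of bat shrinks from Prussian alone to nothing, so a record tagged bat beforehand no longer falls inside it. Under the inclusive reading the scope is the same before and after, and the old record stays valid (if less specific than it could be).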
 
Note that we don’t have to drop the guidance in 4.1.1 of 639-2. (In fact, 
if you think about it, 4.1.1 only makes sense if we *don’t* have an 
assumption of exclusion: to use the collection rather than the individual 
language wouldn’t even be an option because the latter is excluded.)
 
Thus, whether or not we introduce informative collection mapping tables, 
it makes sense to remove “(Other)” from collection names. I recommend that 
we make that change now.
 
 
Peter
 
 
From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf 
Of Joan Spanne
Sent: Monday, December 10, 2007 9:53 AM
To: [log in to unmask]
Subject: Re: decision required: "other" collections
 

The issue has existed since prior to Peter's analyses of collections when 
working on 639-3. For instance, there are code elements for Upper Sorbian 
[hsb] and Lower Sorbian [dsb] (since 2003-09-01), even while Sorbian 
Languages [wen] also existed. It would appear that Upper Sorbian and Lower 
Sorbian are the only recognized constituent individual languages for 
Sorbian Languages, so "other" would not apply, but in any case, there is 
no reference (implicit or explicit) to instruct an inquirer to look up the 
individual Sorbian languages. 

I have been working on mapping all individual languages in 639-3 that are 
not also in Part 2 onto the collection code elements of Part 2 (unless 
already mapped to a macrolanguage in Part 2). (That is how I knew of the 
Sorbian languages case.) This is essentially an update of work that Peter 
and Gary Simons did a few years ago in preparing the first code tables 
drafts for 639-3. I can expand my mapping exercise to include individual 
languages of 639-2, mapping to the most appropriate collective code 
element for each. I have a number of motivations for doing this, but it 
basically fits in with Peter's "third option." 

Whether an amendment is required is perhaps a part of the larger set of 
questions regarding informative aspects of each standard when we get to 
really dealing with the whole set as a database. 

-Joan 


Peter Constable <[log in to unmask]> 
Sent by: ISO 639 Joint Advisory Committee <[log in to unmask]> 
2007-12-07 10:35 PM 


Please respond to
ISO 639 Joint Advisory Committee <[log in to unmask]>



To
[log in to unmask] 
cc

Subject
Re: decision required: "other" collections
 








One option is that we *don't* provide such information and assume that an 
application will supply it on its own as needed.

Another option is that ISO 639-5 include informative mapping tables 
listing for each collection all of the entries it encompasses.

A third option is that informative mapping information can be provided in 
the opposite direction: each entry in 639-1/-2/-3/-5 would include an 
informative property listing one or more IDs for collections that include 
that given item.
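
The second and third options carry the same information in opposite directions, so one table could in principle be derived from the other. A minimal sketch, assuming a hypothetical table COLLECTION_CONTENTS that lists only relationships mentioned elsewhere in this thread (it is far from complete, and no such table exists in the standards today):

```python
from collections import defaultdict

# Option 2 style: each collection lists the entries it directly contains.
# Partial, illustrative data drawn from examples in this thread.
COLLECTION_CONTENTS = {
    "afa": ["ber", "cus", "sem"],
    "nai": ["alg", "ath"],
    "ath": ["apa"],
    "wen": ["hsb", "dsb"],
    "bat": ["lav", "lit"],
}


def invert(contents):
    """Derive the option 3 view: for each entry, the collections that
    directly contain it. Nested containment is not expanded here."""
    containing = defaultdict(list)
    for collection, entries in contents.items():
        for entry in entries:
            containing[entry].append(collection)
    return dict(containing)


ENTRY_COLLECTIONS = invert(COLLECTION_CONTENTS)
```

Here ENTRY_COLLECTIONS maps hsb to wen and apa to ath. Whether indirect containment (apa within nai by way of ath) should also be listed is a design choice the sketch leaves open.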

I think the second or third could potentially be done as a maintenance 
exercise by the RAs or the JAC, though I also wouldn't assert there 
wouldn't be grounds for someone to say these required an amendment. IMO, 
the text in either 3.3 or 4.1.1 of 639-2 does not include anything that 
would prevent us from making name changes of this nature. On the other 
hand, adding informative mapping data would represent a significant 
technical change in the content of any of the standards that might warrant 
the amendment process.


Peter



> -----Original Message-----
> From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On
> Behalf Of Rebecca S. Guenther
> Sent: Friday, December 07, 2007 2:10 PM
> To: [log in to unmask]
> Subject: Re: decision required: "other" collections
>
> Peter:
>
> There is a problem with understanding the scope of the language if we
> remove "(Other)" and name all these with "Languages". The distinction
> of
> course is that when we use (Other) it means that some of the languages
> within the group have their own identifiers, while others go in this
> bucket. It alerts the user to make sure that the language in question
> is
> not separately defined by its own identifier. So if we don't make that
> distinction it will be hard for the user to know whether to look
> further.
> Perhaps this is an issue of documentation, when you suggest that there
> would be application decisions made for a subset. Currently we don't
> really have a mechanism to make these sorts of statements. Do you have
> a
> suggestion so that we don't totally lose this information? How could we
> document in the ISO 639-2 code lists?
>
> I'm not really concerned about MARC, because we have always said we
> don't
> have to use the same language names, only that the codes themselves
> represent the same entities. But some in the bibliographic world (and
> beyond) use the documentation on the ISO 639-2 site alone and somehow
> they
> will need to understand the scope of the language.
>
> Rebecca
>
> On Thu, 6 Dec 2007, Peter Constable wrote:
>
> > Ping?
> >
> > It's been over a week; I'd like to see us move toward closure on this
> > issue, please.
> >
> >
> > Peter
> >
> > From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On
> Behalf Of Peter Constable
> > Sent: Wednesday, November 28, 2007 3:45 PM
> > To: [log in to unmask]
> > Subject: decision required: "other" collections
> >
> > I want to revive this discussion so that hopefully we can bring
> > closure on it. I introduced two issues at the same time last April,
> > "other" collections, and "mis". The latter got people's attention,
> and
> > the former never got resolved. (The mis issue was resolved, so the
> > passing mention of it below can be ignored.)
> >
> > Millicent replied that removing "Other" may be a problem for those
> > using ISO 639-2 but not ISO 639-3. I responded to that suggesting
> that
> > this can be considered an application decision. Havard further
> > responded, mentioning 639-5 in the context of the entire 639 family and
> > suggesting that 639-2 may be one of many possible subsets in which
> the
> > meaning of "other" would differ - the implication being that each
> > subset needs to define the intension or extension of collections
> > considered to be "other" collections in relation to the given subset.
> > (Havard's message, which includes what Millicent and I wrote, is
> > attached.)
> >
> > I note that the code table in ISO 639-5 FDIS does not include
> > "(Other)" in any entries, including the entries for all of the
> "other"
> > collections currently in 639-2.
> >
> > My proposal to remove "other" as described below stands.
> >
> >
> > Peter
> >
> > From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On
> Behalf Of Peter Constable
> > Sent: Thursday, April 19, 2007 1:28 PM
> > To: [log in to unmask]
> > Subject: decisions required: "other" collections, mis
> >
> > One of the issues I had identified was that the exclusive "other"
> > collections no longer make sense in a general application of ISO 639
> > since now every known language has its own identifier. It was not an
> > issue that absolutely needed to be addressed before part 3 was
> > published, but part 3 is now published, and users of the standards
> are
> > encountering this issue. Specifically, the group that works on IETF
> > language tags is currently revising that spec to incorporate part 3
> > and would like to see all the collections handled consistently in a
> > way that allows their application to treat them all as inclusive.
> >
> > So, I propose that "other" be removed from all collection names
> > (except perhaps mis - I'll discuss that in another thread). I
> > understand that some applications, such as MARC, would still want to
> > treat some collections as exclusive. I don't see this change as
> > contradicting that: we simply need to clarify that, in a particular
> > application that does not use all of the identifiers in the combined
> > parts of ISO 639, particular collections may be used in an exclusive
> > manner, at the discretion of the particular application.
> >
> > Proposed change: make all collections of one type, with one
> > pattern for naming.
> >
> > Action if accepted:
> >
> > * ISO 639-2 tables and the draft table for ISO 639-5: all names of
> the
> > form "Foo (Other)" changed to "Foo languages". A note added in
> > appropriate places explaining that applications may use collections
> in
> > an exclusive manner according to the needs of the particular
> > application. (Corresponding changes should get made in a revision to
> > the text of ISO 639-2.)
> >
> > * ISO 639-3: A note added in description of collection scope
> > explaining that applications may use collections in an exclusive
> > manner according to the needs of the particular application.
> >
> >
> >
> > Peter
> >