I'd like to return to Rebecca's comment,

> There is a problem with understanding the scope of the language if we
> remove "(Other)" and name all these with "Languages". The distinction of
> course is that when we use (Other) it means that some of the languages
> within the group have their own identifiers, while others go in this
> bucket. It alerts the user to make sure that the language in question is
> not separately defined by its own identifier.

There is a problem today in 639-2 with understanding the scope of "(Other)" collections, and removing "(Other)" doesn't change that. So, with or without coming up with informative mapping tables, I don't think removing "(Other)" is a problem for 639-2; in fact, it removes a problem for 639-2 that I've mentioned before. Let me elaborate on these points.

Joan mentioned wen "Sorbian languages" as an example: how does one know what the scope is? Well, if one does a bit of research, they probably discover there are two languages, Upper and Lower Sorbian, and then they discover that these are coded in 639-2 as hsb and dsb. They also note the text in clause 4.1.1 of 639-2,

"A collective language code is not intended to be used when an individual language code or another more specific collective language code is available."

So, they determine whether their document is Upper Sorbian, Lower Sorbian, or some other previously unrecognized Sorbian; if it's one of the first two, they tag with hsb or dsb accordingly, and only if it's the latter do they use wen.

Now consider the same question applied to bat "Baltic (Other)": how does one know what the scope is? The same process gets used: one has to do a bit of research, which reveals that there are three languages: Latvian, Lithuanian and Prussian. They also discover that the first two are coded in 639-2, lav and lit respectively. Again, following the guidance in 4.1.1, they tag their document lav if it's Latvian, lit if it's Lithuanian, and bat if it's Prussian (or some other previously-unrecognized Baltic language).
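
The process described for wen and bat can be sketched roughly as follows. This is only an illustration: the two lookup tables and the function name `choose_tag` are hypothetical, and the tables hold nothing beyond the Sorbian and Baltic examples given above.

```python
# Hypothetical lookup tables, filled in only from the two examples above.
INDIVIDUAL_CODES = {
    "Upper Sorbian": "hsb",
    "Lower Sorbian": "dsb",
    "Latvian": "lav",
    "Lithuanian": "lit",
}

# Hypothetical: which collection each language falls in.
COLLECTION_OF = {
    "Upper Sorbian": "wen",
    "Lower Sorbian": "wen",
    "Latvian": "bat",
    "Lithuanian": "bat",
    "Prussian": "bat",
}


def choose_tag(language):
    """Follow the 4.1.1 guidance: prefer an individual code when one exists,
    and fall back to the collection code only otherwise."""
    if language in INDIVIDUAL_CODES:
        return INDIVIDUAL_CODES[language]
    return COLLECTION_OF[language]


print(choose_tag("Lithuanian"))  # individual code is available
print(choose_tag("Prussian"))    # no individual code here, so the collection
```

Note that nothing in this lookup depends on whether the collection's name contains "(Other)".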

For any collection, the process is the same. Having "(Other)" in the name really doesn't provide much benefit. Rebecca suggested that it's a clue that some languages in the group have their own identifier, but relying on the "languages" vs. "(Other)" distinction is unreliable and misleading. Consider these:


* afa "Afro-Asiatic languages": at least 3 languages in this group are coded in 639-2. Three other collections (ber, cus, sem) also fall in this group.

* alg "Algonquian languages": at least 7 languages in this group are coded in 639-2.

* ath "Athabascan languages": at least 8 languages in this group are coded in 639-2, and apa "Apache languages" also falls in this group.

* iro "Iroquoian languages": at least 2 languages in this group are coded in 639-2.

* mun "Munda languages": at least 1 language in this group is coded in 639-2.

* nai "North American Indian languages": at least 8 languages in this group are coded in 639-2. Two collections (alg, ath) also fall in this group.

* sio "Siouan languages": at least 2 languages in this group are coded in 639-2.

* wen "Sorbian languages": all languages in this group are coded in 639-2.

Gary Simons and I pointed out these cases over five years ago. (See http://www.ethnologue.com/14/iso639/analysis.asp and the accompanying paper we provided to the JAC, "An Analysis of ISO 639", http://www.sil.org/silewp/2002/SILEWP2002-004.pdf.) So, the potential benefit Rebecca suggested of having some collections marked "(Other)" really doesn't exist; in fact, users are misled if they make the necessary assumption.

Also, if the assumption is made that groups *exclude* some languages, then groups become unstable: any time a language in that group gets coded in 639-2, it's no longer in that group, so the scope of the code narrows, and any documents tagged with that group ID are now erroneously tagged. Gary Simons and I pointed this out seven years ago. Since we did the analysis of collections in 2002, there have been at least 37 languages newly coded in 639-2. Every single one of those additions caused a change in the scope of one or (because some collections contain collections) possibly more collections, and any existing records tagged with a collection ID became erroneously tagged. If we stick with this assumption of exclusion, then every time we code a language in 639-2 we are breaking some unknown quantity of users' data. That's a serious stability problem that exists only because we have "(Other)" collections and an assumption of exclusion. If we were to drop that assumption and remove "(Other)" from names, then we would be making a significant improvement to 639-2, even without mapping tables.
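
The stability point can be illustrated with a small sketch. Everything here is an assumption made for illustration: the set `BALTIC` reuses the Baltic example from earlier in this message, the function names are invented, and the "later addition" of a code for Prussian to 639-2 is purely hypothetical.

```python
# Hypothetical group membership, reusing the Baltic example from this message.
BALTIC = {"Latvian", "Lithuanian", "Prussian"}


def exclusive_scope(members, individually_coded):
    """Scope of a collection code if individually coded languages are excluded."""
    return members - individually_coded


def inclusive_scope(members, individually_coded):
    """Scope of a collection code if it always covers the whole group."""
    return set(members)


coded_before = {"Latvian", "Lithuanian"}
coded_after = coded_before | {"Prussian"}  # hypothetical later addition

# Under exclusion, the meaning of "bat" changes when a language gets coded,
# so a record tagged "bat" for Prussian would become erroneously tagged.
print(sorted(exclusive_scope(BALTIC, coded_before)))
print(sorted(exclusive_scope(BALTIC, coded_after)))

# Under inclusion, the meaning of "bat" is the same before and after.
print(inclusive_scope(BALTIC, coded_before) == inclusive_scope(BALTIC, coded_after))
```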

Note that we don't have to drop the guidance in 4.1.1 of 639-2. (In fact, if you think about it, 4.1.1 only makes sense if we *don't* have an assumption of exclusion: to use the collection rather than the individual language wouldn't even be an option because the latter is excluded.)

Thus, whether or not we introduce informative collection mapping tables, it makes sense to remove "(Other)" from collection names. I recommend that we make that change now.


Peter


From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Joan Spanne
Sent: Monday, December 10, 2007 9:53 AM
To: [log in to unmask]
Subject: Re: decision required: "other" collections


The issue has existed since prior to Peter's analyses of collections when working on 639-3. For instance, there are code elements for Upper Sorbian [hsb] and Lower Sorbian [dsb] (since 2003-09-01), even while Sorbian Languages [wen] also existed. It would appear that Upper Sorbian and Lower Sorbian are the only recognized constituent individual languages for Sorbian Languages, so "other" would not apply, but in any case, there is no reference (implicit or explicit) to instruct an inquirer to look up the individual Sorbian languages.

I have been working on mapping all individual languages in 639-3 that are not also in Part 2 onto the collection code elements of Part 2 (unless already mapped to a macrolanguage in Part 2). (That is how I knew of the Sorbian languages case.) This is essentially an update of work that Peter and Gary Simons did a few years ago in preparing the first code tables drafts for 639-3. I can expand my mapping exercise to include individual languages of 639-2, mapping to the most appropriate collective code element for each. I have a number of motivations for doing this, but it basically fits in with Peter's "third option."

Whether an amendment is required is perhaps a part of the larger set of questions regarding informative aspects of each standard when we get to really dealing with the whole set as a database.

-Joan

From: Peter Constable <[log in to unmask]>
Sent by: ISO 639 Joint Advisory Committee <[log in to unmask]>
Sent: 2007-12-07 10:35 PM
Please respond to: ISO 639 Joint Advisory Committee <[log in to unmask]>
To: [log in to unmask]
Subject: Re: decision required: "other" collections

One option is that we *don't* provide such information and assume that an application will supply it on its own as needed.

Another option is that ISO 639-5 include informative mapping tables listing for each collection all of the entries it encompasses.

A third option is that informative mapping information can be provided in the opposite direction: each entry in 639-1/-2/-3/-5 would include an informative property listing one or more IDs for collections that include that given item.
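
The second and third options carry the same information in opposite directions, so one table could in principle be derived from the other. The sketch below is only an illustration under that assumption: the table `COLLECTION_MEMBERS` and the function `invert` are hypothetical, and the sample entries are limited to relationships mentioned elsewhere in this thread.

```python
from collections import defaultdict

# Hypothetical forward table (second option): collection -> entries it
# encompasses. A collection may itself appear as an entry of another one.
COLLECTION_MEMBERS = {
    "wen": ["hsb", "dsb"],
    "bat": ["lav", "lit"],
    "nai": ["alg", "ath"],
    "ath": ["apa"],
}


def invert(collection_members):
    """Build the reverse table (third option): entry -> collections that
    directly include it."""
    entry_collections = defaultdict(list)
    for collection, entries in collection_members.items():
        for entry in entries:
            entry_collections[entry].append(collection)
    return dict(entry_collections)


ENTRY_COLLECTIONS = invert(COLLECTION_MEMBERS)
print(ENTRY_COLLECTIONS["hsb"])
print(ENTRY_COLLECTIONS["apa"])
```

This lists only direct membership; whether an entry should also list the enclosing collections of its collection (apa under both ath and nai, say) would be a separate decision.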

I think the second or third could potentially be done as a maintenance exercise by the RAs or the JAC, though I also wouldn't assert there wouldn't be grounds for someone to say these required an amendment. IMO, the text in either 3.3 or 4.1.1 of 639-2 does not include anything that would prevent us from making name changes of this nature. On the other hand, adding informative mapping data would represent a significant technical change in the content of any of the standards that might warrant the amendment process.


Peter



> -----Original Message-----
> From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Rebecca S. Guenther
> Sent: Friday, December 07, 2007 2:10 PM
> To: [log in to unmask]
> Subject: Re: decision required: "other" collections
>
> Peter:
>
> There is a problem with understanding the scope of the language if we
> remove "(Other)" and name all these with "Languages". The distinction of
> course is that when we use (Other) it means that some of the languages
> within the group have their own identifiers, while others go in this
> bucket. It alerts the user to make sure that the language in question is
> not separately defined by its own identifier. So if we don't make that
> distinction it will be hard for the user to know whether to look further.
> Perhaps this is an issue of documentation, when you suggest that there
> would be application decisions made for a subset. Currently we don't
> really have a mechanism to make these sorts of statements. Do you have a
> suggestion so that we don't totally lose this information? How could we
> document in the ISO 639-2 code lists?
>
> I'm not really concerned about MARC, because we have always said we don't
> have to use the same language names, only that the codes themselves
> represent the same entities. But some in the bibliographic world (and
> beyond) use the documentation on the ISO 639-2 site alone and somehow they
> will need to understand the scope of the language.
>
> Rebecca
>
> On Thu, 6 Dec 2007, Peter Constable wrote:
>
> > Ping?
> >
> > It's been over a week; I'd like to see us move toward closure on this
> > issue, please.
> >
> >
> > Peter
> >
> > From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Peter Constable
> > Sent: Wednesday, November 28, 2007 3:45 PM
> > To: [log in to unmask]
> > Subject: decision required: "other" collections
> >
> > I want to revive this discussion so that hopefully we can bring
> > closure on it. I introduced two issues at the same time last April,
> > "other" collections, and "mis". The latter got people's attention, and
> > the former never got resolved. (The mis issue was resolved, so the
> > passing mention of it below can be ignored.)
> >
> > Millicent replied that removing "Other" may be a problem for those
> > using ISO 639-2 but not ISO 639-3. I responded to that suggesting that
> > this can be considered an application decision. Havard further
> > responded, mentioning 639-5 in the context of the entire 639 family,
> > suggesting that 639-2 may be one of many possible subsets in which the
> > meaning of "other" would differ - the implication being that each
> > subset needs to define the intension or extension of collections
> > considered to be "other" collections in relation to the given subset.
> > (Havard's message, which includes what Millicent and I wrote, is
> > attached.)
> >
> > I note that the code table in ISO 639-5 FDIS does not include
> > "(Other)" in any entries, including the entries for all of the "other"
> > collections currently in 639-2.
> >
> > My proposal to remove "other" as described below stands.
> >
> >
> > Peter
> >
> > From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Peter Constable
> > Sent: Thursday, April 19, 2007 1:28 PM
> > To: [log in to unmask]
> > Subject: decisions required: "other" collections, mis
> >
> > One of the issues I had identified was that the exclusive "other"
> > collections no longer make sense in a general application of ISO 639
> > since now every known language has its own identifier. It was not an
> > issue that absolutely needed to be addressed before part 3 was
> > published, but part 3 is now published, and users of the standards are
> > encountering this issue. Specifically, the group that works on IETF
> > language tags is currently revising that spec to incorporate part 3
> > and would like to see all the collections handled consistently in a
> > way that allows their application to treat them all as inclusive.
> >
> > So, I propose that "other" be removed from all collection names
> > (except perhaps mis - I'll discuss that in another thread). I
> > understand that some applications, such as MARC, would still want to
> > treat some collections as exclusive. I don't see this change as
> > contradicting that: we simply need to clarify that, in a particular
> > application that does not use all of the identifiers in the combined
> > parts of ISO 639, particular collections may be used in an exclusive
> > manner, at the discretion of the particular application.
> >
> > Proposed change: make all collections to be of one type with one
> > pattern for naming.
> >
> > Action if accepted:
> >
> > * ISO 639-2 tables and the draft table for ISO 639-5: all names of the
> > form "Foo (Other)" changed to "Foo languages". A note added in
> > appropriate places explaining that applications may use collections in
> > an exclusive manner according to the needs of the particular
> > application. (Corresponding changes should get made in a revision to
> > the text of ISO 639-2.)
> >
> > * ISO 639-3: A note added in description of collection scope
> > explaining that applications may use collections in an exclusive
> > manner according to the needs of the particular application.
> >
> >
> >
> > Peter
> >