ISOJAC Archives

ISOJAC@LISTSERV.LOC.GOV


ISOJAC December 2007

Subject: Re: decision required: "other" collections
From: Milicent K Wewerka <[log in to unmask]>
Reply-To: ISO 639 Joint Advisory Committee <[log in to unmask]>
Date: Fri, 14 Dec 2007 07:46:16 -0500
Content-Type: text/plain

As for Sorbian, apparently there is (or was) a language called East Sorbian so the use of "wen" would be appropriate for that.

There is also the possibility of using "und" if one doesn't know which language one has.

Milicent Wewerka

>>> Peter Constable <[log in to unmask]> 12/13/07 5:45 PM >>>
One more point: consider a scenario such as a librarian who knows they've got a book in (e.g.) a Sorbian language, but they don't know which. With the exclusion assumption in force, they *must* find out which specific language it is; it would be *incorrect* to tag it using wen "Sorbian languages" since the book is surely in a language excluded from that collection.

Now, if you say, "We don't really intend collections to be *that* exclusive, and it's OK for them to use wen if they don't have means to determine the specific language," then that would need to get applied consistently. In particular, it would also need to apply to "(Other)" collections. So, for instance, if the same librarian has a book in (say) some Slavic language but they're not sure which, then it should be just as acceptable in this case to tag the book using sla "Slavic (Other)". But notice: the book is in fact in *some* Slavic language, which may well be Russian or Czech or Serbian; in other words, we'd be saying it's OK to tag content in those languages using that collection ID. The implication of that would be that these really aren't "other" collections after all.

The point being this: if we insist these are really "other" collections, then we deny users the option of using the more generic collection IDs when knowledge is limited, and if we're at all consistent that must be true for all collections. We can have "other" collections, or we can allow users some flexibility, but we cannot have both. In my mind, there's no question that users should have flexibility to use collections in ways that meet their needs.


Btw, in my list of example groups, I listed afa "Afro-Asiatic languages"; that was an error: it is "Afro-Asiatic (Other)". The argument is supported by the remaining cases, however.


Peter

From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Peter Constable
Sent: Thursday, December 13, 2007 8:49 AM
To: [log in to unmask]
Subject: Re: decision required: "other" collections

I'd like to return to Rebecca's comment,

> There is a problem with understanding the scope of the language if we
> remove "(Other)" and name all these with "Languages". The distinction of
> course is that when we use (Other) it means that some of the languages
> within the group have their own identifiers, while others go in this
> bucket. It alerts the user to make sure that the language in question is
> not separately defined by its own identifier.

There is a problem today in 639-2 with understanding the scope of "(Other)" collections, and removing "(Other)" doesn't change that. So, with or without coming up with informative mapping tables, I don't think removing "(Other)" is a problem for 639-2; in fact, it removes a problem for 639-2 that I've mentioned before. Let me elaborate on these points.

Joan mentioned wen "Sorbian languages" as an example: how does one know what the scope is? Well, if one does a bit of research, they probably discover there are two languages, Upper and Lower Sorbian, and then they discover that these are coded in 639-2 as hsb and dsb. They also note the text in clause 4.1.1 of 639-2,

"A collective language code is not intended to be used when an individual language code or another more specific collective language code is available."

So, they determine whether their document is Upper Sorbian, Lower Sorbian, or some other previously unrecognized Sorbian; if it's one of the first two, they tag with hsb or dsb accordingly, and only if it's the latter do they use wen.

Now consider the same question applied to bat "Baltic (Other)": how does one know what the scope is? The same process gets used: one has to do a bit of research, which reveals that there are three languages: Latvian, Lithuanian and Prussian. They also discover that the first two are coded in 639-2, lav and lit respectively. Again, following the guidance in 4.1.1, they tag their document lav if it's Latvian, lit if it's Lithuanian, and bat if it's Prussian (or some other previously-unrecognized Baltic language).
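
The lookup described for wen and bat can be sketched as a small function. This is only an illustration, assuming two hypothetical hand-built tables (INDIVIDUAL_CODES and COLLECTION_FOR_GROUP) limited to the codes named in this thread; it is not an official ISO 639-2 data source, and the function name choose_tag is made up for the example.

```python
# Hypothetical, hand-built tables covering only the codes named in this thread.
INDIVIDUAL_CODES = {
    "Upper Sorbian": "hsb",
    "Lower Sorbian": "dsb",
    "Latvian": "lav",
    "Lithuanian": "lit",
}

# Group name -> collective code (again, only the examples discussed here).
COLLECTION_FOR_GROUP = {
    "Sorbian": "wen",
    "Baltic": "bat",
}


def choose_tag(language, group):
    """Return the most specific code available, following the 4.1.1 guidance.

    An individual code wins when one exists; otherwise fall back to the
    collective code for the group the language belongs to.
    """
    if language in INDIVIDUAL_CODES:
        return INDIVIDUAL_CODES[language]
    return COLLECTION_FOR_GROUP[group]


print(choose_tag("Latvian", "Baltic"))         # lav
print(choose_tag("Prussian", "Baltic"))        # bat
print(choose_tag("Upper Sorbian", "Sorbian"))  # hsb
```

Nothing in the function depends on whether the collection's name carries "(Other)"; the decision rests entirely on whether an individual code exists.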

For any collection, the process is the same. Having "(Other)" in the name really doesn't provide much benefit. Rebecca suggested that it's a clue that some languages in the group have their own identifier, but relying on the "languages" vs. "(Other)" distinction is unreliable and misleading. Consider these:


* afa "Afro-Asiatic languages": at least 3 languages in this group are coded in 639-2. Three other collections (ber, cus, sem) also fall in this group.

* alg "Algonquian languages": at least 7 languages in this group are coded in 639-2.

* ath "Athabascan languages": at least 8 languages in this group are coded in 639-2, and apa "Apache languages" also falls in this group.

* iro "Iroquoian languages": at least 2 languages in this group are coded in 639-2.

* mun "Munda languages": at least 1 language in this group is coded in 639-2.

* nai "North American Indian languages": at least 8 languages in this group are coded in 639-2. Two collections (alg, ath) also fall in this group.

* sio "Siouan languages": at least 2 languages in this group are coded in 639-2.

* wen "Sorbian languages": all languages in this group are coded in 639-2.

Gary Simons and I pointed out these cases over five years ago. (See http://www.ethnologue.com/14/iso639/analysis.asp and the accompanying paper we provided to the JAC, "An Analysis of ISO 639", http://www.sil.org/silewp/2002/SILEWP2002-004.pdf.) So, the potential benefit Rebecca suggested of having some collections marked "(Other)" really doesn't exist; in fact, users are misled if they make the necessary assumption.

Also, if the assumption is made that groups *exclude* some languages, then groups become unstable: any time a language in that group gets coded in 639-2, it's no longer in that group, so the scope of the code narrows, and any documents tagged with that group ID are now erroneously tagged. Gary Simons and I pointed this out seven years ago. Since we did the analysis of collections in 2002, there have been at least 37 languages newly coded in 639-2. Every single one of those additions caused a change to one or (because some collections contain collections) possibly more collections, and any existing records tagged with a collection ID became erroneously tagged. If we stick with this assumption of exclusion, then every time we code a language in 639-2 we are breaking some unknown quantity of users' data. That's a serious stability problem that exists only because we have "(Other)" collections and an assumption of exclusion. If we were to drop that assumption and remove "(Other)" from names, then we would be making a significant improvement to 639-2, even without mapping tables.
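
The stability problem can be made concrete with a small sketch. It assumes the three-language Baltic group described earlier in this message, and it imagines, purely as a hypothetical, that Prussian is later given its own 639-2 code; the function name in_scope and the sets are invented for the illustration.

```python
def in_scope(language, group_members, individually_coded, exclusive):
    """Report whether the collection code may be used for `language`.

    Under the exclusive reading, a language drops out of the collection
    as soon as it has its own individual code.
    """
    if language not in group_members:
        return False
    if exclusive and language in individually_coded:
        return False
    return True


BALTIC = {"Latvian", "Lithuanian", "Prussian"}
coded_now = {"Latvian", "Lithuanian"}
coded_later = coded_now | {"Prussian"}  # hypothetical future addition

for exclusive in (True, False):
    before = in_scope("Prussian", BALTIC, coded_now, exclusive)
    after = in_scope("Prussian", BALTIC, coded_later, exclusive)
    print(f"exclusive={exclusive}: valid before={before}, valid after={after}")
```

Under the exclusive reading, a record tagged bat for a Prussian document is valid before the addition and invalid after it; under the inclusive reading it stays valid throughout.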

Note that we don't have to drop the guidance in 4.1.1 of 639-2. (In fact, if you think about it, 4.1.1 only makes sense if we *don't* have an assumption of exclusion: to use the collection rather than the individual language wouldn't even be an option because the latter is excluded.)

Thus, whether or not we introduce informative collection mapping tables, it makes sense to remove "(Other)" from collection names. I recommend that we make that change now.


Peter


From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Joan Spanne
Sent: Monday, December 10, 2007 9:53 AM
To: [log in to unmask]
Subject: Re: decision required: "other" collections


The issue has existed since prior to Peter's analyses of collections when working on 639-3. For instance, there are code elements for Upper Sorbian [hsb] and Lower Sorbian [dsb] (since 2003-09-01), even while Sorbian Languages [wen] also existed. It would appear that Upper Sorbian and Lower Sorbian are the only recognized constituent individual languages for Sorbian Languages, so "other" would not apply, but in any case, there is no reference (implicit or explicit) to instruct an inquirer to look up the individual Sorbian languages.

I have been working on mapping all individual languages in 639-3 that are not also in Part 2 onto the collection code elements of Part 2 (unless already mapped to a macrolanguage in Part 2). (That is how I knew of the Sorbian languages case.) This is essentially an update of work that Peter and Gary Simons did a few years ago in preparing the first code tables drafts for 639-3. I can expand my mapping exercise to include individual languages of 639-2, mapping to the most appropriate collective code element for each. I have a number of motivations for doing this, but it basically fits in with Peter's "third option."

Whether an amendment is required is perhaps a part of the larger set of questions regarding informative aspects of each standard when we get to really dealing with the whole set as a database.

-Joan

From: Peter Constable <[log in to unmask]>
Sent by: ISO 639 Joint Advisory Committee <[log in to unmask]>
Sent: 2007-12-07 10:35 PM
Please respond to: ISO 639 Joint Advisory Committee <[log in to unmask]>
To: [log in to unmask]
Subject: Re: decision required: "other" collections

One option is that we *don't* provide such information and assume that an application will supply it on its own as needed.

Another option is that ISO 639-5 include informative mapping tables listing for each collection all of the entries it encompasses.

A third option is that informative mapping information can be provided in the opposite direction: each entry in 639-1/-2/-3/-5 would include an informative property listing one or more IDs for collections that include that given item.
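
The second and third options carry the same information in opposite directions, so either table could be derived from the other. A minimal sketch, assuming a hypothetical COLLECTION_MEMBERS table that holds only relationships mentioned in this thread (it is far from complete, and the function names are invented):

```python
from collections import defaultdict

# Second option: collection -> entries it encompasses.
# Illustrative subset only, taken from examples in this thread.
COLLECTION_MEMBERS = {
    "wen": ["hsb", "dsb"],
    "bat": ["lav", "lit"],
    "nai": ["alg", "ath"],
    "ath": ["apa"],
}


def invert(collection_members):
    """Third option: entry -> collections that directly include it."""
    member_of = defaultdict(list)
    for collection, members in collection_members.items():
        for member in members:
            member_of[member].append(collection)
    return dict(member_of)


def all_collections(code, member_of):
    """Follow nested collections upward, nearest collection first."""
    found = []
    pending = list(member_of.get(code, []))
    while pending:
        current = pending.pop(0)
        if current not in found:
            found.append(current)
            pending.extend(member_of.get(current, []))
    return found


MEMBER_OF = invert(COLLECTION_MEMBERS)
print(MEMBER_OF["hsb"])                   # ['wen']
print(all_collections("apa", MEMBER_OF))  # ['ath', 'nai']
```

Because collections can contain collections, a per-entry property would need either to list every enclosing collection or to leave the upward walk to the application, as all_collections does here.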

I think the second or third could potentially be done as a maintenance exercise by the RAs or the JAC, though I also wouldn't assert there wouldn't be grounds for someone to say these required an amendment. IMO, the text in either 3.3 or 4.1.1 of 639-2 does not include anything that would prevent us from making name changes of this nature. On the other hand, adding informative mapping data would represent a significant technical change in the content of any of the standards that might warrant the amendment process.


Peter



> -----Original Message-----
> From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Rebecca S. Guenther
> Sent: Friday, December 07, 2007 2:10 PM
> To: [log in to unmask]
> Subject: Re: decision required: "other" collections
>
> Peter:
>
> There is a problem with understanding the scope of the language if we
> remove "(Other)" and name all these with "Languages". The distinction of
> course is that when we use (Other) it means that some of the languages
> within the group have their own identifiers, while others go in this
> bucket. It alerts the user to make sure that the language in question is
> not separately defined by its own identifier. So if we don't make that
> distinction it will be hard for the user to know whether to look further.
> Perhaps this is an issue of documentation, when you suggest that there
> would be application decisions made for a subset. Currently we don't
> really have a mechanism to make these sorts of statements. Do you have a
> suggestion so that we don't totally lose this information? How could we
> document in the ISO 639-2 code lists?
>
> I'm not really concerned about MARC, because we have always said we don't
> have to use the same language names, only that the codes themselves
> represent the same entities. But some in the bibliographic world (and
> beyond) use the documentation on the ISO 639-2 site alone and somehow they
> will need to understand the scope of the language.
>
> Rebecca
>
> On Thu, 6 Dec 2007, Peter Constable wrote:
>
> > Ping?
> >
> > It's been over a week; I'd like to see us move toward closure on this
> > issue, please.
> >
> >
> > Peter
> >
> > From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Peter Constable
> > Sent: Wednesday, November 28, 2007 3:45 PM
> > To: [log in to unmask]
> > Subject: decision required: "other" collections
> >
> > I want to revive this discussion so that hopefully we can bring
> > closure on it. I introduced two issues at the same time last April,
> > "other" collections, and "mis". The latter got people's attention, and
> > the former never got resolved. (The mis issue was resolved, so the
> > passing mention of it below can be ignored.)
> >
> > Milicent replied that removing "Other" may be a problem for those
> > using ISO 639-2 but not ISO 639-3. I responded to that suggesting that
> > this can be considered an application decision. Havard further
> > responded, mentioning 639-5 in the context of the entire 639 family,
> > suggesting that 639-2 may be one of many possible subsets in which the
> > meaning of "other" would differ - the implication being that each
> > subset needs to define the intension or extension of collections
> > considered to be "other" collections in relation to the given subset.
> > (Havard's message, which includes what Milicent and I wrote, is
> > attached.)
> >
> > I note that the code table in ISO 639-5 FDIS does not include
> > "(Other)" in any entries, including the entries for all of the "other"
> > collections currently in 639-2.
> >
> > My proposal to remove "other" as described below stands.
> >
> >
> > Peter
> >
> > From: ISO 639 Joint Advisory Committee [mailto:[log in to unmask]] On Behalf Of Peter Constable
> > Sent: Thursday, April 19, 2007 1:28 PM
> > To: [log in to unmask]
> > Subject: decisions required: "other" collections, mis
> >
> > One of the issues I had identified was that the exclusive "other"
> > collections no longer make sense in a general application of ISO 639
> > since now every known language has its own identifier. It was not an
> > issue that absolutely needed to be addressed before part 3 was
> > published, but part 3 is now published, and users of the standards are
> > encountering this issue. Specifically, the group that works on IETF
> > language tags is currently revising that spec to incorporate part 3
> > and would like to see all the collections handled consistently in a
> > way that allows their application to treat them all as inclusive.
> >
> > So, I propose that "other" be removed from all collection names
> > (except perhaps mis - I'll discuss that in another thread). I
> > understand that some applications, such as MARC, would still want to
> > treat some collections as exclusive. I don't see this change as
> > contradicting that: we simply need to clarify that, in a particular
> > application that does not use all of the identifiers in the combined
> > parts of ISO 639, particular collections may be used in an exclusive
> > manner, at the discretion of the particular application.
> >
> > Proposed change: make all collections of one type, with one
> > pattern for naming.
> >
> > Action if accepted:
> >
> > * ISO 639-2 tables and the draft table for ISO 639-5: all names of the
> > form "Foo (Other)" changed to "Foo languages". A note added in
> > appropriate places explaining that applications may use collections in
> > an exclusive manner according to the needs of the particular
> > application. (Corresponding changes should get made in a revision to
> > the text of ISO 639-2.)
> >
> > * ISO 639-3: A note added in description of collection scope
> > explaining that applications may use collections in an exclusive
> > manner according to the needs of the particular application.
> >
> >
> >
> > Peter
> >
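
The renaming proposed in the original message, from the form "Foo (Other)" to "Foo languages", is mechanical enough to sketch. The snippet below is only an illustration of that pattern; the function name rename_collection is invented, and the sample names are ones quoted in this thread rather than a full code table.

```python
import re

# Matches a trailing "(Other)" together with any whitespace before it.
_OTHER_SUFFIX = re.compile(r"\s*\(Other\)$")


def rename_collection(name):
    """Turn 'Foo (Other)' into 'Foo languages'; leave other names alone."""
    if _OTHER_SUFFIX.search(name):
        return _OTHER_SUFFIX.sub(" languages", name)
    return name


for name in ["Baltic (Other)", "Slavic (Other)", "Sorbian languages"]:
    print(f"{name!r} -> {rename_collection(name)!r}")
```

Names that already follow the "Foo languages" pattern pass through unchanged, so the function can be applied to a whole table of names without special-casing.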
