Dear Mark Davis,
Thank you for commenting on these exchanges.
And let me add that, in my personal opinion, every one of your three brief comments goes exactly the wrong way by completely ignoring the real spirit and history of ISO 639.
Kind regards.
Gérard Lang
What is best for the information industry is sometimes the enemy of the good of society
On 26 Sept. 2013, at 10:08, Mark Davis ☕ wrote:

> This is a long and convoluted thread, but I have a couple of brief comments.
> 
> 1. It is a terrible idea to have a code for Montenegrin. Any in-depth discussion with people from that area of the world reveals that the differences between Serbian and Montenegrin are on the order of dialect differences, not languages. The differences are comparable to those you see across English or Spanish, and no more different than one encounters between different parts of Serbia itself.
> 
> Secondly, there is already a well-recognized language subtag (BCP47) for Montenegrin: sr-ME. Introducing an equivalent to that will simply bring another opportunity for software breakage, nothing more. So in the interests of stability, no new code for Montenegrin should be added. (This is also a dangerous path for the committee to follow; departing from the pragmatic principles that have governed the assignment of codes, especially those affecting stability, will cause downstream clients to find other solutions.)
> 
> 2. While the formal title is "Codes for the representation of names of languages", that is, and always has been, recognized as a misnomer. It is and always has been codes for languages, not their names. (Otherwise, each alternate name for each language would have required a different code, which has never been the case.)
> 
> 3. The visual association between a three-letter code and a language is of little importance. These codes are simply internal identifiers. While it is useful to try to maintain some sort of association, it is, in the end, not particularly significant.
> 
> 
> 
> Mark
> 
> — Il meglio è l’inimico del bene —
> 
> 
> On Mon, Sep 23, 2013 at 11:20 PM, ISO639-3 <[log in to unmask]> wrote:
> Dear Gerard et al.,
> 
> The one thing no one has mentioned in your discussion is a problem of phonology: most of the codes that are pronounceable, and that comprise the first letters of a language name are already taken. Also, because of phonological frequency of these segments, languages beginning in "B" "K" and "M" have few available codes (10 total available for these 3 initial letters).
> 
> In addition, a few of the codes for the 11 languages of interest to Mr. Lang have been blocked for use by national languages, but others have not.
> 
> I have a function on my system which can query available codes, should you need it in the future.
> 
> Melinda
> 
> On Mon, 23 Sep 2013 19:11:41 +0200
>  Gérard Lang-Marconnet <[log in to unmask]> wrote:
> >
> >On 23 Sept. 2013, at 18:10, Gérard Lang-Marconnet wrote:
> >
> >> Dear John,
> >> I am following your request and relaying our exchanges to the JAC Listserv.
> >> By the way, I would be most happy (and maybe some others would also be) to have the exact list of the persons on the list that receive the messages we exchange.
> >> Kind regards.
> >> Gérard Lang
> >> On 23 Sept. 2013, at 16:43, Gérard Lang-Marconnet wrote:
> >>
> >>> OK for me.
> >>> Gérard Lang
> >>> On 23 Sept. 2013, at 16:35, Zagas, John wrote:
> >>>
> >>>> I kindly ask you all:  Please post these to the JAC Listserv.   I do not see the reason why this discussion is being restricted to us four.  I will start posting these messages to the listserv if I continue to be cc'd on these.
> >>>>
> >>>> Thank you very much.
> >>>>
> >>>> John Zagas
> >>>>
> >>>> Library of Congress
> >>>> Network Development & MARC Standards Office
> >>>> 101 Independence Ave., S.E.
> >>>> Washington, DC  20540-4402
> >>>> USA
> >>>> Phone: 202.707.1153
> >>>> FAX:   202.707.0115
> >>>> E-Mail: [log in to unmask]
> >>>>
> >>>>
> >>>>
> >>>> From: Gérard Lang-Marconnet [mailto:[log in to unmask]]
> >>>> Sent: Monday, September 23, 2013 10:32 AM
> >>>> To: Sebastian Drude
> >>>> Cc: Galinski Christian; Zagas, John; Lang Gérard
> >>>> Subject: Re: Combinatorial analysis regarding visual association between a reference name of a language and possible ISO 639 code elements for the representation of this language name
> >>>>
> >>>> Dear Sebastian,
> >>>> If we suppose that every language of interest can be named, and better still can be given at least one autonym, one name in English and one name in French to allow identification without too much ambiguity, then we have no problem with the standard's title "Codes for the representation of the names of languages". And clearly the code elements represent a reference name for the underlying language. This does not at all mean that several names for the same language will have distinct entries in ISO 639; it only says that when a new ISO 639 entry is identified by some array of names for the language, the code element to be attributed represents the reference language name chosen from this array. Let me also add that it seems much easier to know what a language name is than to know what a language is. For example, "Serbo-Croatian" is certainly a name of a language, but there was clearly no unanimity to introduce an alpha-3 code element making an ISO 639-2 entry for this name of language when this would have been legally mandatory because there existed an alpha-2 code element "sr" that was an ISO 639-1 entry from the beginning. If we wanted a single code point for each language, independent of the different or same names of these languages, we would have to turn to a numeric coding scheme.
> >>>> As a statistician, I would say that this is what makes a nomenclature richer than a classification.
> >>>> When building a classification, you make hierarchical aggregations of the elements of the domain you are studying and use classes, so that at each level of the hierarchy the classes together cover the whole domain and any two distinct classes have an empty intersection. Building a nomenclature from a classification means using the resources of terminology to give each class at each level a distinguishing, identifying name that makes it immediately recognizable which elements can be assigned to this class.
> >>>> Kind regards.
> >>>> Gérard Lang
> >>>> On 23 Sept. 2013, at 15:42, Sebastian Drude wrote:
> >>>>
> >>>>
> >>>> Thanks for the explanations of the mathematical calculus, Gerard.
> >>>>
> >>>> As for the name of the standard, codes for names of languages; this always has struck me as inadequate.
> >>>> In my perhaps naïve point of view, it is obvious that the codes refer to the languages themselves, and that they are normalized additional “names” for them, instead of referring to other names.
> >>>>
> >>>> Otherwise, we would not give different ISO code points for two languages that share one English(?) name.
> >>>> Likewise, with good reasons we do not hand out different codes for languages that happen to have several alternative names (which holds for almost all languages).
> >>>>
> >>>> To have a single code point for each LANGUAGE, independent of the different or same names of these languages, seems to me to be the very point of ISO 639.
> >>>>
> >>>> Best,
> >>>> Sebastian (Drude)
> >>>>
> >>>> I ask for your understanding if, in the interest of being quick and short, this mail may not fulfil all requirements on form and politeness.
> >>>> --
> >>>> PD Dr. Sebastian Drude, The Language Archive
> >>>> Max-Planck-Institute for Psycholinguistics
> >>>> P.O. Box 310, 6500 AH Nijmegen, The Netherlands
> >>>> Email: [log in to unmask] – Phone: (+31) 24-3521.470
> >>>> http://www.mpi.nl/people/drude-sebastian
> >>>>
> >>>> From: Gérard Lang-Marconnet [mailto:[log in to unmask]]
> >>>> Sent: Saturday, 14 September 2013 16:28
> >>>> To: Gérard Lang-Marconnet
> >>>> Cc: Sebastian Drude; Galinski Christian; Zagas John
> >>>> Subject: Re: Combinatorial analysis regarding visual association between a reference name of a language and possible ISO 639 code elements for the representation of this language name
> >>>>
> >>>> Please excuse my mistake.
> >>>> Evidently 3.N.N.(N-26) must be replaced by 3.N.N.(26-N).
> >>>> Gérard Lang
> >>>> On 14 Sept. 2013, at 15:46, Gérard Lang-Marconnet wrote:
> >>>>
> >>>>
> >>>>
> >>>> Dear Sebastian,
> >>>> The subject and the meaning of this combinatorial exercise is as follows.
> >>>> Considering a reference language name whose spelling in the Latin alphabet uses exactly N distinct letters (for example, "english" uses 7 distinct letters), let us say that a three-letter code element written with the 26 letters of the Latin alphabet has a "strong visual association" with this language name if every one of the three letters in the code element is a letter taken from the language name. We are not looking only for "abbreviations": we do not require the letters to occur in the code element in the same order as in the language name, and we allow the same letter to occur two or three times in the code element even if it occurs only once in the reference language name. There are exactly N.N.N such code elements having a strong visual association with a reference language name written with N distinct letters.
> >>>> Now, in the case that no such code element is available, or judged correct as a representation of this name of language, let us consider that we have a "moderately interesting visual association" when only two of the three letters composing the code element occur in the considered language name, so that the third one does not occur in the language name.  There are exactly 3.N.N.(N-26) such code elements having a moderately interesting visual association with the considered language name. It is only in the case that no such code element is available or judged correct that we can claim that it is absolutely not possible to choose a code element having an interesting visual association to represent the reference name we choose for the considered language.
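The two definitions in this message can be sketched as a small classifier. This is only an illustration: the function name `visual_association` and the labels it returns are hypothetical, and the sketch assumes names are reduced to the 26 unaccented Latin letters, ignoring case.

```python
import string


def visual_association(code: str, name: str) -> str:
    """Classify a three-letter code against a reference language name.

    'strong'   : all three letters of the code occur in the name
    'moderate' : exactly two of the three letters occur in the name
    'none'     : fewer than two letters occur in the name
    Letter order and repetition are deliberately ignored, as in the message.
    """
    letters = {c for c in name.lower() if c in string.ascii_lowercase}
    code = code.lower()
    if len(code) != 3 or any(c not in string.ascii_lowercase for c in code):
        raise ValueError("expected a three-letter code over a-z")
    hits = sum(1 for c in code if c in letters)
    if hits == 3:
        return "strong"
    if hits == 2:
        return "moderate"
    return "none"


print(visual_association("eng", "english"))      # strong
print(visual_association("onm", "montenegrin"))  # strong
print(visual_association("enx", "english"))      # moderate
print(visual_association("wzx", "english"))      # none
```
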
> >>>> I hope this explanation will satisfy your question.
> >>>> And in my opinion, this gives a partial answer to Christian's paragraph. Let me also remind you that the very title of ISO 639 is "Codes for the representation of names of languages", so that as a principle the ISO 639 code elements are considered as representing not languages directly (whose socio-political status, or many other characteristics, may change) but names for these languages.
> >>>> Kind regards.
> >>>> Gérard Lang
> >>>>
> >>>> On 14 Sept. 2013, at 12:29, Sebastian Drude wrote:
> >>>>
> >>>>
> >>>>
> >>>> Dear Gérard,
> >>>> Although I consider myself quite strong in mathematics and logic, I cannot make any sense whatsoever of the formula P(N)=N.N.N + 3.N.N.(26-N).
> >>>> If the exercise is to arrive at the number of combinations of three different letters out of 26, one would just calculate “N * (N-1) * (N-2)” with N=26, right?  (First letter any of the 26, second any other than the first, third one any other than the first and the second.)
> >>>> But as far as I know there is no rule that states that all three letters have to be different.
> >>>>
> >>>> So what is the intention / rule of finding all “interesting” combinations behind your formula?
> >>>>
> >>>> And, more importantly, why would this settle that Christian is wrong with the second part of his mail, or answer this part of Christian's mail at all:
> >>>> “To this we can add today that we should find better rules in selecting language identifiers/symbols so that they are not necessarily considered as abbreviations.
> >>>> Needless to say that languages change (in terms of linguistic norm, user distribution and language status) and also the socio-political status of language names may change – so we will increasingly run into problems in the coding of language names, if they are based on abbreviation.”
> >>>>
> >>>> Sorry if I am slow in following your thinking, but I fear there are many implicit presuppositions that many may take for granted and which I do not know.
> >>>>
> >>>> Cordially,
> >>>> Sebastian
> >>>>
> >>>> I ask for your understanding if, in the interest of being quick and short, this mail may not fulfil all requirements on form and politeness.
> >>>> --
> >>>> PD Dr. Sebastian Drude, The Language Archive
> >>>> Max-Planck-Institute for Psycholinguistics
> >>>> P.O. Box 310, 6500 AH Nijmegen, The Netherlands
> >>>> Email: [log in to unmask] – Phone: (+31) 24-3521.470
> >>>> http://www.mpi.nl/people/drude-sebastian
> >>>>
> >>>> From: Gérard Lang-Marconnet [mailto:[log in to unmask]]
> >>>> Sent: Friday, 13 September 2013 19:02
> >>>> To: Galinski Christian; Sebastian Drude; Zagas John; Lang Gérard
> >>>> Subject: Fwd: Combinatorial analysis//Re: AW: AW: Alpha-3 ISO 639 reserved code elements/// New JAC ballot on the name of language monténégrin/Montenegrin
> >>>>
> >>>> The appropriate message, one more time.
> >>>> Gérard Lang
> >>>>
> >>>> Begin forwarded message:
> >>>>
> >>>> From: Gérard Lang-Marconnet <[log in to unmask]>
> >>>> Date: 19 November 2012 16:20:56 CET
> >>>> To: Lang Gérard <[log in to unmask]>
> >>>> Subject: Fwd: Combinatorial analysis//Re: AW: AW: Alpha-3 ISO 639 reserved code elements/// New JAC ballot on the name of language monténégrin/Montenegrin
> >>>>
> >>>> Begin forwarded message:
> >>>>
> >>>> From: Gérard Lang-Marconnet <[log in to unmask]>
> >>>> Date: 1 November 2012 18:35:42 CET
> >>>> To: ISO JAC Voting Member List <[log in to unmask]>, Guenther Rebecca <[log in to unmask]>, Lang Gérard <[log in to unmask]>
> >>>> Subject: Fwd: Combinatorial analysis//Re: AW: AW: Alpha-3 ISO 639 reserved code elements/// New JAC ballot on the name of language monténégrin/Montenegrin
> >>>>
> >>>> Begin forwarded message:
> >>>>
> >>>> From: Gérard Lang-Marconnet <[log in to unmask]>
> >>>> Date: 26 August 2012 10:49:59 CEST
> >>>> To: Budin Gerhard <[log in to unmask]>, Lang Gérard <[log in to unmask]>
> >>>> Cc: Peter Constable <[log in to unmask]>, ISO639-3 Melinda <[log in to unmask]>
> >>>> Subject: Combinatorial analysis//Re: AW: AW: Alpha-3 ISO 639 reserved code elements/// New JAC ballot on the name of language monténégrin/Montenegrin
> >>>>
> >>>> Dear Gerhard,
> >>>> Thank you for your message.
> >>>> In fact, my combinatorial analysis was not fully complete (it is well known that combinatorial analysis is a subtle matter!), so that the true results are a little better than the ones I gave in my previous message.
> >>>> There is a more general and more direct approach, as follows.
> >>>> For a basic word built with N distinct roman letters (N being an integer number between 1 and 26), we have:
> >>>> -N.N.N (the cube of N) code elements with all three letters taken among the N letters of the basic considered word;
> >>>> -and 3.N.N.(26-N) code elements with two of the three letters taken among the N letters of the basic word and the third letter taken among the (26-N) other roman letters.
> >>>> So that the number of interesting possibilities for a word having N distinct roman letters is: P(N)=N.N.N + 3.N.N.(26-N)= N.N(N + 3(26-N))= N.N(78-2.N).
> >>>>
> >>>> The corresponding P(N) numbers (for N varying from 1 to 10) are:
> >>>> N=1     N.N=1      78-2=76      P(N)=  76
> >>>> N=2     N.N=4      78-4=74      P(N)= 296
> >>>> N=3     N.N=9      78-6=72      P(N)= 648
> >>>> N=4     N.N=16     78-8=70      P(N)=1120
> >>>> N=5     N.N=25     78-10=68     P(N)=1700
> >>>> N=6     N.N=36     78-12=66     P(N)=2376
> >>>> N=7     N.N=49     78-14=64     P(N)=3136
> >>>> N=8     N.N=64     78-16=62     P(N)=3968
> >>>> N=9     N.N=81     78-18=60     P(N)=4860
> >>>> N=10    N.N=100    78-20=58     P(N)=5800
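As a cross-check of the closed form P(N) = N.N(78-2.N), the table can be reproduced by brute force over all 26^3 = 17576 three-letter codes. This is a minimal sketch: the helper names are hypothetical, and it assumes (without loss of generality for counting) that the N distinct letters of the base word are the first N letters of the alphabet.

```python
from itertools import product
from string import ascii_lowercase


def p_closed_form(n: int) -> int:
    """P(N) = N^3 + 3*N^2*(26-N) = N^2 * (78 - 2N)."""
    return n * n * (78 - 2 * n)


def p_brute_force(n: int) -> int:
    """Count codes with at least two of their three letters in an N-letter set."""
    base = set(ascii_lowercase[:n])  # which N letters are used does not affect the count
    return sum(
        1
        for code in product(ascii_lowercase, repeat=3)
        if sum(ch in base for ch in code) >= 2
    )


for n in range(1, 11):
    assert p_closed_form(n) == p_brute_force(n)
    print(f"N={n:<3} P(N)={p_closed_form(n)}")
```

The enumeration agrees with the formula for every N from 1 to 10, for example P(7) = 3136 for a seven-letter base word such as "english".
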
> >>>>
> >>>> Kind regards.
> >>>> Gérard Lang
> >>>>
> >>>>
> >>>> On 26 August 2012, at 02:23, Budin Gerhard wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> dear Gérard,
> >>>> thank you for your thoughtful and interesting message, I enjoyed reading about the combinatorial background.
> >>>> regards
> >>>> Gerhard
> >>>>
> >>>>
> >>>> Univ.-Prof. Dr. Gerhard Budin
> >>>>
> >>>> Centre for Translation Studies
> >>>> University of Vienna
> >>>> Gymnasiumstraße 50
> >>>> A-1190 Vienna, Austria
> >>>> E-Mail: [log in to unmask]
> >>>> T: +43 1 4277 58020
> >>>> F: +43 1 4277 9580
> >>>> M: +43 664 60277 58020
> >>>>
> >>>> Institute for Corpus Linguistics and Text Technology
> >>>> Austrian Academy of Sciences
> >>>> Sonnenfelsgasse 19/8
> >>>> A-1010 Vienna, Austria
> >>>> E-Mail: [log in to unmask]
> >>>> T: +43 1 51581 2300 (Secretary)
> >>>>
> >>>>
> >>>>
> >>>> ________________________________________
> >>>> From: Gérard Lang-Marconnet [[log in to unmask]]
> >>>> Sent: Saturday, 25 August 2012 10:51
> >>>> To: Budin Gerhard; Lang Gérard
> >>>> Cc: Peter Constable; ISO639-3 Melinda
> >>>> Subject: Re: AW: Alpha-3 ISO 639 reserved code elements/// New JAC ballot on the name of language monténégrin/Montenegrin
> >>>>
> >>>> Dear All,
> >>>> Thank you for agreeing on "me" and "onm".
> >>>> While I broadly agree with Gerhard's message, I do not see the situation as pessimistically as he and Peter do. Sure, as long as the code element for the name of language "english" is not something like "wzx", or an alpha-3 code element built with a strong visual association with, say, the romanized version of the Russian translation of the word "english" written in the Cyrillic alphabet, and as long as the code element "spa" is clearly built from the first three letters of the English translation of the autonym of the language name concerned, there is strictly no hope of convincing people that this is pure chance.
> >>>> It is simpler, more honest and more convincing to admit publicly that the initial plan really was to have a strong visual association between the names of languages and the alpha-2 and alpha-3 code elements chosen to represent them, built upon the (if necessary romanized) autonym or the English or French version of this name. This is explicitly written in the normative texts of ISO 639:1988, ISO 639-2:1998, ISO 639-1:2002 (and also ISO 639-5:2008 ?), it is evidently what was done in ISO 639-1, ISO 639-2 and ISO 639-5, and so it holds for perhaps all of the most written and spoken languages of the world.
> >>>> Problems came with ISO 639-3 and its title "Alpha-3 code for a comprehensive coverage of languages", supposed to build a code "that aims to define three letters identifiers for all known human languages"; I voted against the choice of this title because I found it needlessly pompous and also dangerous (as the creation of "Europanto" immediately proved). With only 17576 possible identifiers for around 7800 names of languages, it was clearly becoming a challenge to maintain a strong visual association between code elements and names of languages. Nevertheless, there was a (maybe not as strong as before) visual association in a vast majority of cases in the initial version of ISO 639-3 (not published within the standard). So it is now a veritable provocation to explain to people who come to ask for the creation of a new entry within ISO 639-3 that they have strictly no chance of getting a visual association (with its mnemonic virtues) between the chosen code element and the name of "their" language.
> >>>> First, there is a choice for the base word to be represented, between the autonym and the French or English version, which gives some flexibility. Next, a visual association does not mean systematically taking the first three letters of the base word, or even having all three letters of the code element within the set of letters that make up the base word. What is needed, at a minimum, to have a chance of visual association is that at least two of the three letters of the code element are among the letters of the base word; this is certainly not a strong association, but it is far better than no association at all. And it is clearly not always possible to find such a solution. But people would find it far more respectful of their language and culture if we clearly and honestly admitted that the rule is as I propose (or something equally clear), and that every effort will be made before choosing a code element having strictly no visual association with the name of the language. I would add that every choice has cultural, historical, political and psychological connections that cannot be underestimated. I will give the following recent example from ISO 3166-1: when admitted as a new UN member state, South Sudan was to be a new entry and asked for the alpha-2 code element "SS"; some members of the Maintenance Agency (notably our German colleague) were not happy with this request. But South Sudan maintained its choice, which was approved after I remarked that we had previously attributed to Saudi Arabia the code element "SA", which was as bad as "SS".
> >>>> Finally, as a mathematician, I would say that the laws of combinatorics are not as bad as you seem to think. Let's take the example of the base word "english", taken as the autonym of the language name concerned; this word has seven roman letters, all distinct, so that the number of alpha-3 code elements having two of these seven letters as first letters is 7.6.25=1 050. It is not certain that one of these 1 050 combinations is still free, but the chances are not so bad! And if this is not sufficient, we can look for code elements whose second and third letters, but not the first one, are among the seven letters: this gives us a new set of possibilities, all distinct from the previous ones, and there are 19.7.6=798 such possible choices. And if we now look for code elements whose first and third letters, but not the second one, are among the seven letters, this gives us yet another set of possibilities, all distinct from the previous ones, and there are also 7.19.6=798 of them.
> >>>> This gives us a set of 1 050 + 798 + 798 = 2 646 possible alpha-3 code elements having a (maybe not so good!) visual association with the word "english"!
> >>>> The chances that the intersection of this set of alpha-3 code elements (which represents 2 646/17 576 = 0.150 5 of the total possibilities) with the remaining free set of alpha-3 code elements (which represents, say, (17 576 - 7 800)/17 576 = 0.556 2 of the total possibilities) is empty are not so big!
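The arithmetic of the "english" example can be replayed in a few lines. This sketch only reproduces the figures quoted in the message; the variable names are hypothetical, the factors (7.6.25 and so on) are taken as written rather than re-derived, and 7800 is the approximate number of assigned names given by the author.

```python
TOTAL_CODES = 26 ** 3       # 17576 possible alpha-3 code elements
ASSIGNED_APPROX = 7800      # approximate figure quoted in the message

first_two_in_word = 7 * 6 * 25      # 1050
last_two_in_word = 19 * 7 * 6       # 798
first_and_third_in_word = 7 * 19 * 6  # 798

candidates = first_two_in_word + last_two_in_word + first_and_third_in_word
share_candidates = candidates / TOTAL_CODES
share_free = (TOTAL_CODES - ASSIGNED_APPROX) / TOTAL_CODES

print(candidates)                   # 2646
print(round(share_candidates, 4))   # 0.1505
print(round(share_free, 4))         # 0.5562
```
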
> >>>> Kind regards.
> >>>> Gérard Lang
> >>>>
> >>>> On 25 August 2012, at 07:26, Budin Gerhard wrote:
> >>>>
> >>>> dear all,
> >>>> so then let's go for onm for Montenegrin and prepare the dossier and the JAC vote.
> >>>> We have been trying desperately in the past, and will have to continue to do so, to explain to the interested public at large and to the particular language communities concerned, when assigning a code element to "their" language, that given the laws of combinatorics it is simply impossible to always comply with their wishes to have a highly mnemonic code for their language when such desirable code element(s) were already assigned to another language or languages a long time ago. When going through all our language code elements, it becomes clear that quite a number of languages must already make do with code elements that are not or almost not mnemonic. Of course we have always tried to be "as mnemonic as possible" in choosing new elements, such as in the present case.
> >>>> regards
> >>>> Gerhard
> >>>>
> >>>>
> >>>>
> >>>> Univ.-Prof. Dr. Gerhard Budin
> >>>>
> >>>> Centre for Translation Studies
> >>>> University of Vienna
> >>>> Gymnasiumstraße 50
> >>>> A-1190 Vienna, Austria
> >>>> E-Mail: [log in to unmask]
> >>>> T: +43 1 4277 58020
> >>>> F: +43 1 4277 9580
> >>>> M: +43 664 60277 58020
> >>>>
> >>>> Institute for Corpus Linguistics and Text Technology
> >>>> Austrian Academy of Sciences
> >>>> Sonnenfelsgasse 19/8
> >>>> A-1010 Vienna, Austria
> >>>> E-Mail: [log in to unmask]
> >>>> T: +43 1 51581 2300 (Secretary)
> >>>>
> >>>>
> >>>>
> >>>> ________________________________________
> >>>> From: Peter Constable [[log in to unmask]]
> >>>> Sent: Saturday, 25 August 2012 01:29
> >>>> To: Gérard Lang-Marconnet; ISO639-3 Melinda
> >>>> Cc: Budin Gerhard
> >>>> Subject: RE: Alpha-3 ISO 639 reserved code elements/// New JAC ballot on the name of language monténégrin/Montenegrin
> >>>>
> >>>> I'd like to understand _why_ it would be useful to reserve a code element for Montenegrin. I'm more interested in that than in the choice of code element.
> >>>>
> >>>> If a code element _were_ assigned / reserved for Montenegrin, I have no objection to Gérard's choice, "onm".
> >>>>
> >>>>
> >>>> Peter
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
>