Karen Coyle writes:
 > > I'm just trying to understand how to get names like the following
 > > properly coded so they can be reliably formatted for
 > > citation/bibliographies:
 > >
 > > Dr. Jennifer Jones
 > > Jane Smith Jr.
 > > John Q. Whoever III
 > > Baron von Hausman
 > >
 > > The first is easy.  The others less so.
 >
 > Yes, and to do it correctly would take artificial intelligence. You can
 > write algorithms to parse out some of the elements, like "Jr" and "Mr.",
 > you can link a "von" to the family name, but there are some names that
 > only a person, with knowledge of the context (including the language
 > being used), can figure out. It turns out that figuring out names takes
 > up to half of the time used by library catalogers when cataloging a
 > book.
 >
 > If you have a limited context you can do pretty well. But a universal
 > name parser may be on the order of machine translation of poetry. ;-)
 >

Doesn't that call for an extended list of namePart attributes, plus
the recommendation to *not* dump the full name into a single namePart
element but rather to use a separate element for each name part (which
the element name suggests anyway)?
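
To illustrate the difference, here is a minimal Python sketch. It assumes
the MODS v3 namespace and the namePart type values "family", "given", and
"termsOfAddress"; the two sample records and the helper name_parts() are
made up for this example and not taken from any real data set.

```python
import xml.etree.ElementTree as ET

MODS_NS = "{http://www.loc.gov/mods/v3}"

# Hypothetical record: each part of the name sits in its own namePart.
SPLIT = """
<name xmlns="http://www.loc.gov/mods/v3" type="personal">
  <namePart type="family">Smith</namePart>
  <namePart type="given">Jane</namePart>
  <namePart type="termsOfAddress">Jr.</namePart>
</name>
"""

# Hypothetical record: the same name dumped into one untyped namePart.
LUMPED = """
<name xmlns="http://www.loc.gov/mods/v3" type="personal">
  <namePart>Jane Smith Jr.</namePart>
</name>
"""


def name_parts(xml_text):
    """Return (type, text) pairs; type is None for an untyped namePart."""
    root = ET.fromstring(xml_text)
    return [(el.get("type"), el.text) for el in root.iter(MODS_NS + "namePart")]


print(name_parts(SPLIT))
print(name_parts(LUMPED))
```

With the split record a formatter can pick out the family name and the
lineage suffix directly. With the lumped record it is back to guessing
which token is which.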

We've been facing this problem lately in a different
context. Applications that create formatted bibliographies from raw
bibliographic data need to distinguish between at least first, middle,
and last names, plus honorific or lineage information. The terms
"first", "middle", and "last" are culture-specific, unfortunately, and
sometimes not even applicable within Western culture, but they should
suffice here to explain the problem. Consider the name "Franklin Delano
Roosevelt" as an example. Apart from differences in the order of the
name parts, punctuation, and spacing, different journals may render
this name in a bibliography in any of the following basic forms:

Roosevelt, F.D.

Roosevelt, Franklin D.

Roosevelt, Franklin Delano

In this case "Roosevelt" is undoubtedly the last name/family name,
whereas "Franklin" is the first name/given name (the one unlikely to be
reduced to an initial), and "Delano" is the middle name (the one likely
to be reduced to an initial). There are other cases where deriving the
first vs. middle distinction from the order gives the wrong result, as
with the fictitious "S. John Miller", who prefers to abbreviate his
first given name (or whatever we should call it).
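
A rough Python sketch of what a formatter could do once the parts are
marked up separately. The Name class, its keep_full field, and the style
names "initials", "mixed", and "full" are all invented for this example;
they are not part of MODS or of any existing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Name:
    family: str
    given: List[str]    # given names in the order they are written
    keep_full: int = 0  # index of the given name the person spells out


def initial(part):
    """Reduce a name part to its initial, e.g. "Delano" -> "D."."""
    return part[0] + "."


def format_name(name, style):
    """Render a name in one of three hypothetical bibliography styles."""
    if style == "initials":
        given = "".join(initial(p) for p in name.given)
    elif style == "mixed":
        given = " ".join(
            p if i == name.keep_full else initial(p)
            for i, p in enumerate(name.given)
        )
    elif style == "full":
        given = " ".join(name.given)
    else:
        raise ValueError("unknown style: " + style)
    return name.family + ", " + given


fdr = Name("Roosevelt", ["Franklin", "Delano"])
# The fictitious "S. John Miller" spells out his second given name.
miller = Name("Miller", ["S.", "John"], keep_full=1)

for style in ("initials", "mixed", "full"):
    print(format_name(fdr, style))
print(format_name(miller, "mixed"))
```

The keep_full index carries exactly the information that the order of the
name parts cannot supply: for Roosevelt it points at "Franklin", for
Miller at "John".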

You correctly pointed out that no software this side of artificial
intelligence will be able to tell these name parts apart unless a
human being wraps the proper markup around the parts. From my point
of view the current distinction between "family" and "given", combined
with the possibility of putting everything into a single element, makes
it unnecessarily hard to work with these data in the context of
bibliography formatting.

regards,
Markus


--
Markus Hoenicka
[log in to unmask]
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de