On Sun, 21 Dec 2003 00:30:15 +0100, "Markus Hoenicka" said:
> > If you have a limited context you can do pretty well. But a universal
> > name parser may be on the order of machine translation of poetry. ;-)
>
> Doesn't that call for an extended list of namePart attributes, plus
> the recommendation to *not* dump the full name into a single namePart
> element but rather to use a separate element for each name part (which
> the element name suggests anyway)?
>
> We've been facing this problem lately in a different
> context. Applications that create formatted bibliographies from raw
> bibliographic data need to distinguish between at least first, middle,
> and last names, plus honorific or lineage information. The terms
> "first", "middle", and "last" are culture-specific, unfortunately, and
> sometimes not even applicable in the western culture
[...snip...]
> You've mentioned correctly that no software this side of artificial
> intelligence will be able to tell these name parts apart unless a
> human being can wrap the proper markup around the parts. From my point
> of view the current distinction between "family" and "given" combined
> with the possibility to put everything into a single element makes it
> unnecessarily hard to work with these data in the context of
> bibliography formatting.
I'm curious what people in the library community think about this issue.
The "different context" that Markus referred to was a discussion on the
refdb list. It started with a French user who argued that middle names
have no place in bibliographic metadata. Markus answered with the
pragmatic observation that many bibliographic styles demand different
formatting for middle names, so the metadata ought to support it too. I
have come to believe that it's important to separate metadata from
formatting issues, even if the two cannot be divorced completely -- that
is, one should not force contortions in the metadata simply to get proper
output formatting.
So, a few observations:
1) It strikes me that Markus is right: names in records that end up in
MODS (for example, via transformation from MARCXML) ought to be parsed at
least into family and given names and termsOfAddress. There ought to be
no problem in converting them back to MARCXML.
2) I'm not convinced middle name should be in the metadata. It seems to
me multiple given name elements could solve the problem, though this
admittedly introduces the possibility for variability in coding
practices. The idea is:
<namePart type="given">Franklin</namePart>
<namePart type="given">Delano</namePart>
A processor could say the first element should be handled as a first name
for purposes of citation formatting or searching, and the second as a
"middle" name.
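
A minimal sketch of how a processor might apply that positional rule,
assuming un-namespaced MODS-like markup. The sample record, the function
names, and the initials-only citation style are all hypothetical
illustrations, not anything MODS itself prescribes:

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment; the MODS namespace is omitted for brevity.
NAME_XML = """
<name type="personal">
  <namePart type="family">Roosevelt</namePart>
  <namePart type="given">Franklin</namePart>
  <namePart type="given">Delano</namePart>
</name>
"""

def split_given_names(name_elem):
    """Return (first, middle_names, family) based on document order."""
    given = [(p.text or "").strip()
             for p in name_elem.findall("namePart")
             if p.get("type") == "given"]
    family = [(p.text or "").strip()
              for p in name_elem.findall("namePart")
              if p.get("type") == "family"]
    # The first given element is treated as the first name; any further
    # given elements are treated as "middle" names.
    first = given[0] if given else None
    return first, given[1:], " ".join(family)

def cite_with_initials(name_elem):
    """Format as 'Family, F. M.' (one arbitrary citation style)."""
    first, middle, family = split_given_names(name_elem)
    initials = " ".join(f"{n[0]}." for n in [first, *middle] if n)
    return f"{family}, {initials}" if initials else family

elem = ET.fromstring(NAME_XML)
print(split_given_names(elem))
print(cite_with_initials(elem))
```

The metadata carries no "middle" label here; the distinction is made only
at formatting time, which is exactly where coding variability could bite.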
3) Re: Karen's observation that termsOfAddress goes at the end because of
tradition: I don't like this either. What happens if there are two
termsOfAddress, one of which is understood to precede the name (e.g.
"Sir") and the other of which goes at the end? It seems to me that each
term ought to appear in its correct position, particularly since XML/XSLT
places no limitations on reordering elements.
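
To illustrate the point about ordering, here is a rough sketch that reads
terms of address in document order and classifies them by position. The
sample name and the function names are hypothetical, and the rule "a term
before any other name part is a prefix, otherwise a suffix" is my own
assumption rather than an established practice:

```python
import xml.etree.ElementTree as ET

# Hypothetical name with one leading and one trailing term of address.
TITLED_NAME_XML = """
<name type="personal">
  <namePart type="termsOfAddress">Sir</namePart>
  <namePart type="given">John</namePart>
  <namePart type="family">Smith</namePart>
  <namePart type="termsOfAddress">Jr.</namePart>
</name>
"""

def parts_in_order(name_elem):
    """Return (type, text) pairs in the order they appear in the record."""
    return [(p.get("type"), (p.text or "").strip())
            for p in name_elem.findall("namePart")]

def split_terms_by_position(name_elem):
    """Classify terms of address as prefixes or suffixes by position."""
    prefixes, core, suffixes = [], [], []
    for part_type, text in parts_in_order(name_elem):
        if part_type == "termsOfAddress":
            # Before any other name part: prefix. Afterwards: suffix.
            (suffixes if core else prefixes).append(text)
        else:
            core.append(text)
    return prefixes, core, suffixes

def display_name(name_elem):
    """Join all parts exactly as ordered in the record."""
    return " ".join(text for _, text in parts_in_order(name_elem) if text)

titled = ET.fromstring(TITLED_NAME_XML)
print(display_name(titled))
print(split_terms_by_position(titled))
```

If the record keeps the terms in their natural position, a stylesheet can
still move them wherever a given output style wants them.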
4) Finally, hypothetically, let's say authority data for names could be
represented in XML (maybe even as a web service). Would that be a place
to consider further parsing of names? I still think pieces like "von" and
"III" are best parsed, even if I'm not yet convinced that "middle" name
should be.
BibTeX has an algorithm that can handle "von," incidentally, by virtue of
the fact that it is lower-case. In that case, von would be stuck in
family name. That doesn't strike me as very bullet-proof though.
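
A simplified sketch of the lower-case heuristic, to show both how it works
and where it breaks. This is not BibTeX's actual implementation: it only
handles names written as "First von Last", ignores commas and braces, and
the function name is hypothetical:

```python
def split_von(name):
    """Split 'First von Last' into (first, von, last) token lists.

    Assumption: the "von" part is the run from the first to the last
    lower-case token, and the final token always belongs to the last name.
    """
    tokens = name.split()
    if not tokens:
        return [], [], []
    body, last = tokens[:-1], [tokens[-1]]
    lower = [i for i, tok in enumerate(body) if tok[0].islower()]
    if not lower:
        # No lower-case token: everything before the final token is "first".
        return body, [], last
    start, end = lower[0], lower[-1]
    return body[:start], body[start:end + 1], body[end + 1:] + last

print(split_von("Ludwig van Beethoven"))
# A capitalised particle defeats the heuristic entirely:
print(split_von("Wernher Von Braun"))
```

The second call shows the weakness: with "Von" capitalised, the particle is
silently folded into the first name.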
Bruce