When I search "michael douglas" in IMDb, I get a list of identities for
"Michael Douglas." They have the roman numeral designation, but more
importantly, they also have information associating, say, a particular
Michael Douglas with "stunts" for "Ski School". The point is that the roman
numeral is just a neutral bit of differentiating data (in principle; not
claiming that IMDb has done a great job of merging and differentiating
persons). Once the differentiation is achieved, other facts can be called up
and joined to the differentiated identity. The type of facts and the type of
display could vary; it wouldn't have to follow the pattern in IMDb. That
said, if I'm looking for a Michael Douglas who's credited as a stuntperson
on the DVD I'm cataloging, the information IMDb provides is more useful to
me than a list of birth dates would be. It's also more economical, since the
facts about this Michael Douglas are already recorded in the data IMDb has
for "Ski School," and don't need to be researched. A machine could come up
with them, given the right underlying data structures.

I also like that all the "Michael Douglas" entries are pulled together in
IMDb. In our catalog, any $c text gets alphabetized with all the middle
names and initials for other Michael Douglases, making the task of browsing
to find a particular Michael Douglas that much more arduous. If our indexes
could collocate the $c cases, that would help, but I've given up on hoping
for that. The better solution would be to make the name heading simple, make
it always possible to differentiate one heading from another, and work on
deriving the additional identifying information associated with the
identified name for the list display. That wouldn't solve all our problems.
There will always be ambiguous cases and the potential for human error. But
it would be a big step forward in terms of our ability to differentiate
entities and convey useful information about them.
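
To make that a little more concrete, here is a rough Python sketch of the
kind of structure I have in mind. It is only an illustration, not a proposal
for any real system: the Identity class, the browse function, and the "id-"
identifiers are all made up for the example, and the credits stand in for
whatever linked bib data would actually be available. Uniqueness rests on
the identifier, the heading stays simple, and the list display is derived
from the linked titles.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    """One differentiated person; the identifier, not the heading, is unique."""
    identifier: str   # hypothetical opaque ID, e.g. "id-0001"
    heading: str      # simple name heading, shared by namesakes
    credits: list = field(default_factory=list)  # (role, title) pairs from linked data


def browse(identities, heading, max_credits=2):
    """Collocate every identity under one heading and derive a display line."""
    lines = []
    for person in identities:
        if person.heading != heading:
            continue
        shown = person.credits[:max_credits]
        detail = "; ".join(f"{role}: {title}" for role, title in shown)
        lines.append(
            f"{person.heading} [{person.identifier}] "
            f"({detail or 'no linked titles yet'})"
        )
    return lines


# Illustrative data only, taken from the examples in this thread.
people = [
    Identity("id-0001", "Douglas, Michael", [("stunts", "Ski School")]),
    Identity("id-0002", "Gomez, Juan",
             [("author", "Aerodynamics of paper airplane design")]),
    Identity("id-0003", "Gomez, Juan", []),
]

for line in browse(people, "Gomez, Juan"):
    print(line)
```

Both Juan Gomez entries file together under the same simple heading, and
the second one remains a distinct identity even though nothing is linked to
it yet.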


On Tue, Oct 26, 2010 at 9:29 AM, Mike Tribby
<[log in to unmask]> wrote:

> While I appreciate Stephen's reasoning and his careful handling of this
> topic, I would suggest that the setting in which one practices might be a
> determinant in the orderliness of the consideration of when authors of
> diverse topics seem likely to be the same person. That being said, though,
> if we had other information carefully and reasonably assembled, our ability
> to match authors would be significantly improved. A lot of
> the new or unknown authors with which we deal in a mostly popular materials
> environment are identified by their publishers with the kind of information
> not usually allowed in headings such as where they live, with whom they
> live, and precisely which kind of Lab or other retriever they favor for
> canine companionship.
> Any strategy we allow will have some potential for misconstruing authors'
> identities, including the present rules. Stephen, how would the heading for
> our goat/paper airplane author look in a search result? "Smith, Joe, paper
> airplane/goat guy"? Would we be expecting patrons to click on a name heading
> to see what factors led us to believe that person was the author of the work
> represented by the bib record? If we're not using dates or fuller forms of
> names, what differentiation would be present in the record? I'm hoping it
> won't be the less than helpful (IMNSHO) and frequently wrong numerical
> designations of IMDb-- but what else?
> Mike Tribby
> Senior Cataloger
> Quality Books Inc.
> The Best of America's Independent Presses
> mailto:[log in to unmask]
> -----Original Message-----
> From: Program for Cooperative Cataloging [mailto:[log in to unmask]]
> On Behalf Of Stephen Hearn
> Sent: Monday, October 25, 2010 5:33 PM
> To: [log in to unmask]
> Subject: Re: Theses name headings and privacy concerns
> Following the example Mike proposes--I have a book on raising angora goats
> published in Spain by author Juan Gomez, and I see an authority record for
> "Gomez, Juan" with the title "Aerodynamics of paper airplane design" in a
> 670. A second look up tells me the 670 book is published in the US. The
> question is, are these the same person? While it's true that I can't
> definitively say they are different people without knowing some unique fact
> for both of them, e.g., that they have different birth dates, I can
> nevertheless use cataloger's judgment, which tells me that it's highly
> likely that these are two people. My contention is that our accuracy rate
> for correctly distinguishing people with common names would be significantly
> better if we had rules in place that enabled us to make and apply such
> judgments, rather than bundling persons we are virtually certain are
> different people onto undifferentiated authorities. In practice, it's rare
> to see anyone left on an undifferentiated authority when a date is
> discovered for the person. By Mike's reasoning, if I found the plane
> designer and the goat raiser sharing an undifferentiated authority and then
> found a date for the goat raiser, I could do nothing--I still wouldn't have
> definitive proof that they're different people. But in practice, such date
> discoveries regularly account for the creation of a new, unique authority,
> as the PERSNAME-L list attests. We do use judgment in these cases when the
> rules allow us to. Separating heading strings from differentiation would
> enable us to apply such judgment in all cases.
> The advantage of moving to identifiers for managing the uniqueness of
> entities is that they provide a stronger basis for assembling linked data.
> For example, if OCLC modified its use of "controlled heading" links to
> enable an auxiliary display of bib data linked to a given authority, I could
> see more information about the plane designer noted above with my first
> look-up. The authority record could reach out and find a set of titles
> positively identified as being by my author by another cataloger. That would
> make my searching easier.
> There are lots of ways this could work and could look, and ways it would
> still be vulnerable to careless data entry; but on the whole, I think we'd
> be better off.
> Stephen
> On Mon, Oct 25, 2010 at 1:32 PM, Mike Tribby <
> [log in to unmask]> wrote:
>        Generally speaking I think worries about identity theft resulting
> from name authority work revealing persons' birthdates or fuller forms of
> their names are overblown. That doesn't mean that every author or other
> contributor wants their vital information shared and, having worked with
> more than a few authors who adamantly didn't want certain facts made a
> public part of their NAR, I sympathize with their desire to have some
> control over that information. As far as identity theft, though, it should
> be pointed out that the Mark Twain example is valid (as an example of
> finding useful tidbits for information theft) more because of Twain's fame
> than because he's dead. Granted most modern identity thieves would shy away
> from using a birthdate from the 1800s, but they'd shy away from using a
> famous name even more. Some cases of identity theft do indeed use the
> personal information of dead people, just not famous dead people.
>        I routinely give birthdate information not needed to create a
> currently unique NAR in a 670 note, especially if requested to not use the
> information by the author. But regardless of how we create unique name
> authority records I don't see how Stephen Hearn's scenario really changes
> much: "Once the uniqueness of a person's authority record is switched to a
> machine-processable identifier rather than the current name heading, that
> identifier can be used more successfully to locate information about the
> person via linked data stores--e.g., affiliation, other authored titles,
> etc.--thereby making the decisions about who likely wrote what simpler."
>        How does that change make it easier to divine that the author with
> the common name who until recently wrote about the aerodynamics of paper
> airplane design has now moved to another country and taken up writing about
> raising angora goats? For at least the first title about the goats, we'd
> still have the problem of matching the author to his previous work and
> thereby to the proper NAR.
>        Mike Tribby
>        Senior Cataloger
>        Quality Books Inc.
>        The Best of America's Independent Presses
>        mailto:[log in to unmask]
> --
> Stephen Hearn, Metadata Strategist
> Technical Services, University Libraries
> University of Minnesota
> 160 Wilson Library
> 309 19th Avenue South
> Minneapolis, MN 55455
> Ph: 612-625-2328
> Fx: 612-625-3428

Stephen Hearn, Metadata Strategist
Technical Services, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
Ph: 612-625-2328
Fx: 612-625-3428