When I search "michael douglas" in IMDb, I get a list of identities for "Michael Douglas." They have the Roman numeral designation, but more importantly, they also have information associating, say, a particular Michael Douglas with "stunts" for "Ski School". The point is that the Roman numeral is just a neutral bit of differentiating data (in principle; I'm not claiming that IMDb has done a great job of merging and differentiating persons). Once the differentiation is achieved, other facts can be called up and joined to the differentiated identity. The type of facts and the type of display could vary; it wouldn't have to follow the pattern in IMDb. That said, if I'm looking for a Michael Douglas who's credited as a stuntperson on the DVD I'm cataloging, the information IMDb provides is more useful to me than a list of birth dates would be. It's also more economical, since the facts about this Michael Douglas are already recorded in the data IMDb has for "Ski School," and don't need to be researched. A machine could come up with them, given the right underlying data structures.
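
To make "the right underlying data structures" a little more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the identifiers p1 and p2, the Person and Credit classes, and the browse function are all invented for the example, and nothing here reflects how IMDb or any library system actually stores its data. The idea is simply that the name heading stays plain, a neutral identifier carries the uniqueness, and the identifying facts shown in the browse list are derived by joining on that identifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    ident: str  # neutral, opaque identifier (hypothetical values)
    name: str   # simple name heading, deliberately not unique


@dataclass(frozen=True)
class Credit:
    person_ident: str  # link to a Person by identifier, not by name string
    title: str
    role: str


# Illustrative data only; the identifiers are made up for this sketch.
PEOPLE = [
    Person("p1", "Douglas, Michael"),
    Person("p2", "Douglas, Michael"),
]

CREDITS = [
    Credit("p1", "Wall Street", "actor"),
    Credit("p2", "Ski School", "stunts"),
]


def browse(name, people, credits):
    """Return one display line per identity that shares the name heading."""
    # Group the already-recorded credit facts by person identifier.
    by_person = defaultdict(list)
    for c in credits:
        by_person[c.person_ident].append(f"{c.role}, {c.title}")

    lines = []
    for p in people:
        if p.name == name:
            facts = "; ".join(by_person[p.ident]) or "no linked credits"
            lines.append(f"{p.name} [{p.ident}] ({facts})")
    return lines


for line in browse("Douglas, Michael", PEOPLE, CREDITS):
    print(line)
```

All the identical headings file together, and what tells them apart in the display comes from facts already recorded elsewhere rather than from anything added to the heading itself.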

I also like that all the "Michael Douglas" entries are pulled together in IMDb. In our catalog, any $c text gets alphabetized with all the middle names and initials for other Michael Douglases, making the task of browsing to find a particular Michael Douglas that much more arduous. If our indexes could collocate the $c cases, that would help, but I've given up on hoping for that. The better solution would be to make the name heading simple, make it always possible to differentiate one heading from another, and work on deriving the additional identifying information associated with the identified name for the list display. That wouldn't solve all our problems. There will always be ambiguous cases and the potential for human error. But it would be a big step forward in terms of our ability to differentiate entities and convey useful information about them.

Stephen

On Tue, Oct 26, 2010 at 9:29 AM, Mike Tribby <[log in to unmask]> wrote:
While I appreciate Stephen's reasoning and his careful handling of this topic, I would suggest that the setting in which one practices might be a determinant in the orderliness of the consideration of when authors of diverse topics seem likely to be the same person. That being said, though, if we had other information carefully and reasonably assembled, our ability to match authors would be significantly improved. A lot of the new or unknown authors with whom we deal in a mostly popular materials environment are identified by their publishers with the kind of information not usually allowed in headings, such as where they live, with whom they live, and precisely which kind of Lab or other retriever they favor for canine companionship.

Any strategy we allow will have some potential for misconstruing authors' identities, including the present rules. Stephen, how would the heading for our goat/paper airplane author look in a search result? "Smith, Joe, paper airplane/goat guy"? Would we be expecting patrons to click on a name heading to see what factors led us to believe that person was the author of the work represented by the bib record? If we're not using dates or fuller forms of names, what differentiation would be present in the record? I'm hoping it won't be the less than helpful (IMNSHO) and frequently wrong numerical designations of IMDb-- but what else?




Mike Tribby
Senior Cataloger
Quality Books Inc.
The Best of America's Independent Presses



-----Original Message-----
From: Program for Cooperative Cataloging [mailto:[log in to unmask]] On Behalf Of Stephen Hearn
Sent: Monday, October 25, 2010 5:33 PM
To: [log in to unmask]
Subject: Re: Theses name headings and privacy concerns

Following the example Mike proposes--I have a book on raising angora goats published in Spain by author Juan Gomez, and I see an authority record for "Gomez, Juan" with the title "Aerodynamics of paper airplane design" in a 670. A second lookup tells me the 670 book is published in the US. The question is, are these the same person? While it's true that I can't definitively say they are different people without knowing some unique fact for both of them, e.g., that they have different birth dates, I can nevertheless use cataloger's judgment, which tells me that it's highly likely that these are two people. My contention is that our accuracy rate for correctly distinguishing people with common names would be significantly better if we had rules in place that enabled us to make and apply such judgments, rather than bundling persons we are virtually certain are different people onto undifferentiated authorities. In practice, it's rare to see anyone left on an undifferentiated authority when a date is discovered for the person. By Mike's reasoning, if I found the plane designer and the goat raiser sharing an undifferentiated authority and then found a date for the goat raiser, I could do nothing--I still wouldn't have definitive proof that they're different people. But in practice, such date discoveries regularly account for the creation of a new, unique authority, as the PERSNAME-L list attests. We do use judgment in these cases when the rules allow us to. Separating heading strings from differentiation would enable us to apply such judgment in all cases.

The advantage of moving to identifiers for managing the uniqueness of entities is that they provide a stronger basis for assembling linked data. For example, if OCLC modified its use of "controlled heading" links to enable an auxiliary display of bib data linked to a given authority, I could see more information about the plane designer noted above with my first look-up. The authority record could reach out and find a set of titles positively identified as being by my author by another cataloger. That would make my searching easier.

There are lots of ways this could work and could look, and ways it would still be vulnerable to careless data entry; but on the whole, I think we'd be better off.

Stephen




On Mon, Oct 25, 2010 at 1:32 PM, Mike Tribby <[log in to unmask]> wrote:


       Generally speaking, I think worries about identity theft resulting from name authority work revealing persons' birthdates or fuller forms of their names are overblown. That doesn't mean that every author or other contributor wants their vital information shared and, having worked with more than a few authors who adamantly didn't want certain facts made a public part of their NAR, I sympathize with their desire to have some control over that information. As for identity theft, though, it should be pointed out that the Mark Twain example is valid (as an example of finding useful tidbits for information theft) more because of Twain's fame than because he's dead. Granted, most modern identity thieves would shy away from using a birthdate from the 1800s, but they'd shy away from using a famous name even more. Some cases of identity theft do indeed use the personal information of dead people, just not famous dead people.

       I routinely give birthdate information not needed to create a currently unique NAR in a 670 note, especially if the author has asked that the information not be used. But regardless of how we create unique name authority records, I don't see how Stephen Hearn's scenario really changes much: "Once the uniqueness of a person's authority record is switched to a machine-processable identifier rather than the current name heading, that identifier can be used more successfully to locate information about the person via linked data stores--e.g., affiliation, other authored titles, etc.--thereby making the decisions about who likely wrote what simpler."

       How does that change make it easier to divine that the author with the common name who until recently wrote about the aerodynamics of paper airplane design has now moved to another country and taken up writing about raising angora goats? For at least the first title about the goats, we'd still have the problem of matching the author to his previous work and thereby to the proper NAR.



       Mike Tribby
       Senior Cataloger
       Quality Books Inc.
       The Best of America's Independent Presses






--

Stephen Hearn, Metadata Strategist
Technical Services, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
Ph: 612-625-2328
Fx: 612-625-3428




