I have been thinking about the roles of URIs and strings in bibliographic data. It seems clear that we need both and that they should work together. The question then becomes when to use which and for what? How do they relate and how can we make them work together effectively? I am trying to get my head around how this might work from a practical perspective.

Some time ago there was a discussion about recording the place of publication. Places seem to me to be an example of the best case for using URIs: there are comprehensive, externally-maintained lists at the level of specificity required (generally cities for places of publication). In most cases, it seems like just using a URI and abandoning transcription for place of publication would be functional. This wouldn't work for things like early printed books, but for most contemporary materials, it would seem sufficient to make a note if there were something unusual about the way the name was presented on the resource (and does BF have a way to connect notes to the elements that they are describing?). From a practical perspective, the place of publication still needs to be based on the resource rather than being a characteristic of the publisher. Publishers move around and have offices in many places. Trying to track who was based where and when would be a nightmare. Using URIs would have the benefit of distinguishing London, England from London, Ontario and London, Ohio.

Names are another area where we will make use of URIs. There are a great many more names than places and, although there are multiple external sources of URIs, there is nothing like the comprehensive coverage that is available for places.

What happens if there isn't an existing URI for a name? There is a cost to making a string into a useful thing and putting it in its place in the universe. The cost is lessened by eliminating the need to create a unique string, but it is not zero. Not everyone will be able to contribute to shared lists like the LC National Authority File. Not every name is worth the trouble of disentangling.

Will people just coin a one-off URI using their own domain? How reliable will these be? The advantage of this approach is that if new information becomes available, it's easier to integrate. You can just say that this locally-maintained URI represents the same thing as this NAF identifier and not have to mess with the string. If you start out with the assumption of separate until proven the same, this might work reasonably well since it's easier to merge than to split apart.
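To make the "separate until proven the same" idea a bit more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: both URIs are placeholders under example.org (not real local or NAF identifiers), and the function names are made up. The point is that the equivalence is stored as its own assertion, so merging means adding one statement and splitting means removing it, without touching any strings.

```python
from collections import defaultdict

# Placeholder identifiers, assumed for illustration only.
LOCAL = "http://library.example.org/names/123"
NAF = "http://example.org/naf/n0000000"

# Each entry is an unordered pair of URIs asserted to identify the same thing.
same_as = set()


def assert_same(a, b):
    """Record that two URIs refer to the same entity."""
    same_as.add(frozenset((a, b)))


def retract_same(a, b):
    """Withdraw an earlier equivalence, leaving both URIs intact."""
    same_as.discard(frozenset((a, b)))


def cluster(uri):
    """Return every URI reachable from `uri` through same-as assertions."""
    neighbours = defaultdict(set)
    for pair in same_as:
        if len(pair) == 2:  # ignore a URI asserted equal to itself
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, todo = {uri}, [uri]
    while todo:
        for nxt in neighbours[todo.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


# New information arrives: the local URI turns out to match the shared one.
assert_same(LOCAL, NAF)
print(sorted(cluster(LOCAL)))
```

If the match later proves wrong, retract_same(LOCAL, NAF) undoes it and the two URIs go back to being separate, which is the sense in which merging is cheaper than splitting.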

Relationship designators or roles are another area where URIs for a controlled vocabulary have a lot of potential. However, more so than names and places, relationship designators would benefit from central planning and systematically coordinated terms. They are more useful with adequate cross-references and clear definitions so that they can be consistently applied. A list of relationship designators is necessarily more open-ended and amorphous than lists of names or places. All this makes them more difficult to apply. There always seem to be discussions and teeth gnashing on RDA-L and elsewhere about what relationship designators to use when.

I have some data that might inform this discussion. As some of you know, OLAC (http://olacinc.org) has been crowdsourcing the parsing of statements about responsibility from MARC records for moving images. (If you want to help us out, go to http://olac-annotator.org/ ). Our goal is to get a pool of correct answers to help train a machine to do this parsing so that we end up with more structured data. As a side effect of this project, we have accumulated a long list of roles as they are actually described in moving image resources. Here are a few observations from looking at this list, none of them particularly surprising, but it is nevertheless useful to have real data.

1. There is a clear 80/20 type pattern
The majority of roles listed on moving image records fall into these categories: directors, producers, production companies, writers, cinematographers, editors, music-related credits and credits related to performance or onscreen participation. Although moving images often have many, many contributors, these are roles that are considered central and that have traditionally been recorded in cataloging records.

2. There are many refinements and sub-categories within these broad categories
You can't just stop there, though. Within each category, there are various sub-categories. Most of the producers are just plain producers, but there are also executive producers, associate producers, assistant producers, co-producers, senior producers, supervising producers, chief producers, etc.

There are also roles which have a different scope than just being straightforwardly related to an entire film or program.
Producer for BBC
Producer in Japan
Producer of U.S. release
executive producer of English version
Series producer
Series senior producer

There are roles with the word producer that are a somewhat or entirely different kind of thing from what is meant by the plain word "producer." Some or all of these should not be related to the producer category: line producer, field producer, coordinating producer, technical producer, creative producer, music producer, audio mix producer, audio producer, soundtrack producer, concert producer. And these are just the English terms. In many languages, there is something that translates to "general producer." I'm not quite sure where that fits.
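One practical consequence is that matching on the word "producer" alone would lump these together. Here is a rough Python sketch of an explicit lookup instead. The term lists come from the paragraphs above, but the function name, the category labels, and the decision to put every doubtful phrase outside the producer category are my assumptions for the sake of the example, not a proposed vocabulary.

```python
# Refinements listed above as sub-categories of producer.
PRODUCER_REFINEMENTS = {
    "producer", "executive producer", "associate producer",
    "assistant producer", "co-producer", "senior producer",
    "supervising producer", "chief producer",
}

# Phrases that may be a different kind of role. Treating all of them as
# outside the producer category is an assumption made for this sketch.
NOT_PRODUCER = {
    "line producer", "field producer", "coordinating producer",
    "technical producer", "creative producer", "music producer",
    "audio mix producer", "audio producer", "soundtrack producer",
    "concert producer",
}


def broad_category(transcribed_role):
    """Map a transcribed role phrase to a broad category by exact lookup."""
    role = transcribed_role.strip().lower()
    if role in PRODUCER_REFINEMENTS:
        return "producer"
    if role in NOT_PRODUCER:
        return "other"
    # Anything not on either list is left for a person to look at.
    return "unreviewed"


for phrase in ("Executive producer", "line producer", "Producer for BBC"):
    print(phrase, "->", broad_category(phrase))
```

Scoped credits such as "Producer for BBC" deliberately fall through to "unreviewed" here, since deciding where they belong is exactly the kind of judgment the lists above call for.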

Someone has devoted a whole page to disentangling the mainstream uses of producer (http://johnaugust.com/2004/producer-credits-and-what-they-mean). The Producers Guild of America also takes a stab at it (http://www.producersguild.org/?page=faq).

There are also TV/DVD/video producers who can be contrasted with stage producers for filmed performances (this particular distinction is more common with directors, but it does occur with producers). However, DVD/video producers can be something else entirely if they are just concerned with putting together a DVD of existing footage. When one word or phrase, such as DVD producer, is used for two different actions, that is one challenge for applying relationship designators. Looking at the list of phrases describing roles we have compiled, I am struck by three other significant challenges for applying relationship designators.

My favorite is the archetypal: "responsible, Lê Mỹ Phương," but there are many others:
action, Shyam Kaushal
associate, Ved M Rao
concept, Clive Sugars
devised & designed by Kamalini Dutt
idea, Hana Bělohradská
Series proposed by Benoit Peeters
supervisor, Dr. Nurdin Perdana
Team works, Kartawijaya ... [et al.]
with Wolfgang Brendel
developed by Robbe de Hert & Fernand Avwera

Of course, there are also names with no role given. It is hard to assign a useful relationship designator for most of these.

Some of these are more ambiguous than others and some are probably less ambiguous with the resource in hand.
Musical adaptation and direction, Penella [direction or musical direction?]
music and soundtrack producer, Kurt Munkacsi [music or music producer?]
author and singer of songs, Vladimir Vyso︠t︡ski [author or author of songs?]
associate director and editor, Alexander Hammid [associate editor or editor?]
graphics and video editor, Michael Seibert [graphics or graphics editor?]

bass: voice or instrument?
songs: writer or singer?
music: composer or performer?
narration by: writer or speaker?

Most of the credits below are likely transcription problems, but some of them are bound to be real.
Winstar Cinema release, Shochiku Co., Ltd. presents a 3H Film Productions of a film by Hou Hsiao-Hsien
a Universal presents

Finally, there is the other half of that 80/20 equation: the long, long tail.
•    anthropological consultants, Carlos Fausto…
•    Dancing directors, Kanokwan Wannamāt…
•    dream sequences based on designs by Salvador Dali.
•    Julie Moir Messervy, garden designer
•    Km. Durga, tanpura
•    synchronization director, Matjaž Žbontar
•    tiger trainer, Boris Ėder
•    spider web spinner, Malcolm Wheel.
•    floor drop painted by Vaughn Patterson
•    body art & locations, Eeo Stubblefield
•    movement coordinator, Geraldine Stephenson

For linking, there needs to be a certain amount of lumping so that things come together. It probably isn't helpful to have a separate relationship designator for every possible term. On the other hand, to accurately represent a resource, it's better to be specific. Saying someone is a "contributor" and not giving any more details isn't very useful. Would an approach like that taken by IMDb fulfill both needs? They have broad categories (writing), which would be good for linking, and then put more specific, clarifying phrases (story, original story by, screenplay), which tell you exactly what the person did, next to people's names. Is there a way in BF to link the name URI, the transcribed name, the relationship designator and the transcribed role as IMDb does?

Produced by [category]
J.J. Abrams    ...    producer
Jeffrey Chernov    ...    executive producer
Tommy Gormley    ...    co-producer

Art Direction by [category]
Ramsey Avery    ...    supervising art director
Kasra Farahani
Andrew Murdock    ...    (as Andrew E.W. Murdock)
Alan Au    ...    (uncredited)
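To show what I mean by keeping those four pieces together, here is a rough Python sketch. It is not BIBFRAME and does not use any BF vocabulary; the class name, field names and all of the URIs (placeholders under example.org) are assumptions made up for illustration. Each credit carries an optional name URI, the transcribed name, a broad category URI for linking, and the optional transcribed role.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credit:
    """One contributor statement with controlled and transcribed forms."""
    name_uri: Optional[str]          # identifier for the person, if one exists
    transcribed_name: str            # name as it appears on the resource
    category_uri: str                # broad relationship designator, for linking
    transcribed_role: Optional[str]  # specific wording from the resource, if any


# Placeholder URIs; none of these are real identifiers.
PRODUCER = "http://example.org/role/producer"
ART_DIRECTION = "http://example.org/role/art-direction"

credits = [
    Credit("http://example.org/name/abrams", "J.J. Abrams",
           PRODUCER, "producer"),
    Credit("http://example.org/name/chernov", "Jeffrey Chernov",
           PRODUCER, "executive producer"),
    Credit("http://example.org/name/murdock", "Andrew E.W. Murdock",
           ART_DIRECTION, None),
    Credit(None, "Kasra Farahani", ART_DIRECTION, None),
]


def by_category(items, category_uri):
    """Return the transcribed names of everyone under one broad category."""
    return [c.transcribed_name for c in items if c.category_uri == category_uri]


print(by_category(credits, PRODUCER))
```

The broad category does the lumping for linking, while the transcribed role keeps the specific wording from the resource, and a credit with no URI or no stated role can still be recorded.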

A few semi-random thoughts, for what they're worth.


PS If you're interested in the relationship designator data, I'll be talking about this at ALA Midwinter at the Catalog Management Interest Group on Saturday, January 31, 1-2:30 pm in room W196c of the convention center (http://alamw15.ala.org/node/25808).

PPS Why not help us out and annotate a few credits at http://olac-annotator.org/

Kelley McGrath
Metadata Management Librarian
University of Oregon