Kelley,

I had some thoughts about some of your questions, which I hope will be helpful.  You've done a great job of articulating these issues, by the way!

On Sun, Jan 8, 2012 at 7:51 PM, Kelley McGrath wrote:

I'd like to organize my thoughts around some issues that came up when OLAC was doing our initial investigations into the potential of the FRBR model to improve access to moving images (see http://www.olacinc.org/drupal/?q=node/27, particularly part 3a). There we were talking about works, but I'd like to work through an example using language track information on DVDs.

It's easy to see how to construct a statement that says this DVD is usable in English

DVD1 -- hasLanguage -- English

But during our discussions, we realized that we wanted to record several more specific aspects of language information, including whether the language is

Spoken, signed or written
Within written whether it is captions (open, closed, SDH), subtitles, or intertitles
The original language or a translation
Primary or secondary


These seem like important considerations, well worth including. But your question is how to do that.

 
Primary vs. secondary might seem like an odd thing to want to know, but in practice, you can go wrong if you don't make this distinction. IMDb often fails on this count, which leads to a list of the most popular Thai language films being topped by The Hangover Part II (2011) and Rambo (2008) (see http://www.imdb.com/language/th) and to The Godfather (http://www.imdb.com/title/tt0068646/) being listed as if it were equally in English, Italian and Latin. You also see this lack of distinction in library bibliographic records, especially for educational/documentary videos with a few subtitled sequences in another language.

So maybe one way to go at this would be to combine all these characteristics into one mega predicate

DVD1 -- hasLanguagePrimaryAudio -- English

And then map that to the less restrictive cases so

hasLanguagePrimaryAudio -- isSubTypeOf -- hasLanguagePrimary
hasLanguagePrimaryAudio -- isSubTypeOf -- hasLanguageAudio
hasLanguagePrimary -- isSubTypeOf -- hasLanguage
hasLanguageAudio -- isSubTypeOf -- hasLanguage

Are you speaking about property relationships here, or about relationships between concepts used as part of a descriptive vocabulary (which is what your subType relationships sound like to me)?  If you take a look at some of the RDA relationships in the OMR (for example, at http://metadataregistry.org/schemapropel/list/schema_property_id/422.html you can see all the subproperties for adaptationOfWork), there is a more general property [basedOnWork] and there are more specific properties [novelizationOfWork].
 
In the value vocabularies, the different concepts may have hierarchical relationships, which are expressed as SKOS broader/narrower relationships. See http://metadataregistry.org/concept/list/vocabulary_id/99.html for an example of one of those vocabularies.
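
To make the property-level reading concrete, here is a minimal sketch in Python of how subproperty relationships let a consumer recover the unrefined statement. It uses plain tuples rather than an RDF library, the property names are simply the ones from the example above, and the helper functions (superproperties, infer) are hypothetical names of mine. In RDF proper this is what rdfs:subPropertyOf gives you.

```python
# Hypothetical subproperty table, using the property names from this thread.
SUBPROPERTY_OF = {
    "hasLanguagePrimaryAudio": {"hasLanguagePrimary", "hasLanguageAudio"},
    "hasLanguagePrimary": {"hasLanguage"},
    "hasLanguageAudio": {"hasLanguage"},
}


def superproperties(prop):
    """Return every property that `prop` refines, directly or indirectly."""
    found = set()
    stack = [prop]
    while stack:
        current = stack.pop()
        for parent in SUBPROPERTY_OF.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def infer(triples):
    """Return the input triples plus those implied by the hierarchy."""
    inferred = set(triples)
    for s, p, o in triples:
        for parent in superproperties(p):
            inferred.add((s, parent, o))
    return inferred


data = {("DVD1", "hasLanguagePrimaryAudio", "English")}
all_triples = infer(data)
```

Someone who only asks for hasLanguage finds ("DVD1", "hasLanguage", "English") in all_triples, even though only the most specific statement was recorded.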



so if someone is just looking at the unrefined language level they can get that. But it does seem like an awful lot of possibilities to account for.

Maybe another way would be to introduce an intermediate entity between the DVD and the language information like this. One advantage is that you could distinguish mixed soundtracks from multiple soundtracks as in statements 1 and 2 in the example below for a DVD with the movie's original mixed Arabic and French soundtrack, a dubbed Spanish soundtrack and an English subtitle track.

DVD1 -- hasLanguageStatement -- LanguageStatement1
LanguageStatement1 -- Language -- Arabic
LanguageStatement1 -- Language -- French
LanguageStatement1 -- LanguageLevel -- Primary
LanguageStatement1 -- LanguageType -- Audio
LanguageStatement1 -- LanguageOriginal -- Original
LanguageStatement1 -- InfoSource -- Container

DVD1 -- hasLanguageStatement -- LanguageStatement2
LanguageStatement2 -- Language -- Spanish
LanguageStatement2 -- LanguageLevel -- Primary
LanguageStatement2 -- LanguageType -- Audio
LanguageStatement2 -- LanguageOriginal -- Translation
LanguageStatement2 -- InfoSource -- Container


DVD1 -- hasLanguageStatement -- LanguageStatement3
LanguageStatement3 -- Language -- English
LanguageStatement3 -- LanguageLevel -- Primary
LanguageStatement3 -- LanguageType -- Written
LanguageStatement3 -- LanguageTypeWritten -- Subtitle
LanguageStatement3 -- LanguageOriginal -- Translation
LanguageStatement3 -- InfoSource -- Container

And then you would have to give people who want to use this data some way to connect the dots, which I'm not sure how to do.
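
One possible way to connect the dots, sketched in Python under the assumption that the statements above are available as simple subject/predicate/object tuples: a consumer follows hasLanguageStatement from the DVD to each intermediate statement and filters on whichever facets it cares about. The helper names (objects, languages) are hypothetical, and the InfoSource lines are left out for brevity.

```python
# The language statements from the example, as (subject, predicate, object).
triples = [
    ("DVD1", "hasLanguageStatement", "LanguageStatement1"),
    ("LanguageStatement1", "Language", "Arabic"),
    ("LanguageStatement1", "Language", "French"),
    ("LanguageStatement1", "LanguageLevel", "Primary"),
    ("LanguageStatement1", "LanguageType", "Audio"),
    ("LanguageStatement1", "LanguageOriginal", "Original"),
    ("DVD1", "hasLanguageStatement", "LanguageStatement2"),
    ("LanguageStatement2", "Language", "Spanish"),
    ("LanguageStatement2", "LanguageLevel", "Primary"),
    ("LanguageStatement2", "LanguageType", "Audio"),
    ("LanguageStatement2", "LanguageOriginal", "Translation"),
    ("DVD1", "hasLanguageStatement", "LanguageStatement3"),
    ("LanguageStatement3", "Language", "English"),
    ("LanguageStatement3", "LanguageLevel", "Primary"),
    ("LanguageStatement3", "LanguageType", "Written"),
    ("LanguageStatement3", "LanguageTypeWritten", "Subtitle"),
    ("LanguageStatement3", "LanguageOriginal", "Translation"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def languages(item, **required):
    """One list of languages per statement that matches every facet."""
    groups = []
    for stmt in objects(item, "hasLanguageStatement"):
        if all(v in objects(stmt, facet) for facet, v in required.items()):
            groups.append(objects(stmt, "Language"))
    return groups


# Mixed soundtrack stays together as one group: [["Arabic", "French"]]
original_audio = languages("DVD1", LanguageType="Audio",
                           LanguageOriginal="Original")

# Consumers who want only the unrefined view can flatten everything.
simple_view = sorted({lang for group in languages("DVD1") for lang in group})
```

Because each statement yields its own group, the mixed Arabic and French soundtrack remains distinguishable from the separate Spanish soundtrack, while simple_view gives the plain list of languages.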

This approach would also be useful for ordering data. For instance, for film and video, the order in which cast names are presented is important, as well as the type of ordering. In addition, this could allow you to make statements about where the data came from. So you could have something that linked transcribed names with identifiers.

Work1 hasCastCredits CastStatement1

CastStatement1 hasSource Manifestation1 [or http://www.imdb.com/title/tt0101531/ which is where I actually took this from or some other reference source or unspecified for legacy data or where someone doesn't want to bother]
CastStatement1 hasOrder CreditsOrder

CastStatement1 hasCredit CreditStatement1
CreditStatement1 hasPosition 1
CreditStatement1 hasTranscribedName "Charlie Sheen"
CreditStatement1 hasNAR http://id.loc.gov/authorities/names/n88368094 [Sheen, Charlie]
CreditStatement1 hasFunction http://id.loc.gov/vocabulary/relators/act.html [actor]
...
CastStatement1 hasCredit CreditStatement15
CreditStatement15 hasPosition 15
CreditStatement15 hasTranscribedName "Larry Fishburne"
CreditStatement15 hasNAR http://id.loc.gov/authorities/names/no93030105 [Fishburne, Laurence, 1961-]
CreditStatement15 hasFunction http://id.loc.gov/vocabulary/relators/act.html [actor]

Of course this is a lot of nesting and you'd have to make it work for data consumers who didn't want all that complexity.
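
As a rough illustration of how a consumer could ignore most of that complexity, here is a small Python sketch. It assumes the credit statements have already been read into dictionaries (the key names are my own, not part of any schema) and shows that the explicit position value is enough to rebuild the credits order, with a flattened name list for anyone who wants nothing more.

```python
# Two of the credit statements from the example, deliberately out of order.
credits = [
    {
        "id": "CreditStatement15",
        "position": 15,
        "transcribed_name": "Larry Fishburne",
        "nar": "http://id.loc.gov/authorities/names/no93030105",
    },
    {
        "id": "CreditStatement1",
        "position": 1,
        "transcribed_name": "Charlie Sheen",
        "nar": "http://id.loc.gov/authorities/names/n88368094",
    },
]


def cast_in_order(credit_statements):
    """Sort credit statements by their recorded position."""
    return sorted(credit_statements, key=lambda c: c["position"])


def simple_cast_list(credit_statements):
    """Flattened view: transcribed names only, in credits order."""
    return [c["transcribed_name"] for c in cast_in_order(credit_statements)]


names = simple_cast_list(credits)
```

The transcribed name and the authority identifier travel together in each statement, so a richer consumer can still link "Larry Fishburne" to the established heading.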

Your use of the word 'nesting' suggests to me that you're thinking of this problem more as an XML thing than an RDF thing.


How would you approach these kinds of problems with a named graph? Or is this not something where you'd want a named graph? Is it better not to do all this in linked data but rather some format for internal consumption and just use the linked data for the simplified data that external users are likely to want? Am I hopelessly on the wrong track?


For the named graph approach, I think you would need to look more carefully at how your data structures and extensions are built, rather than think of the process of making relationships as 'nesting'.
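
Purely as an illustration of the idea, and not a recommendation for how the OLAC data should be modelled, a named graph can be thought of as a set of triples with a name that other statements can refer to. The Python sketch below stores quads (subject, predicate, object, graph); the graph names and the choice to hang LanguageOriginal and InfoSource on the graph are assumptions of mine.

```python
from collections import defaultdict

# Each soundtrack's language triples live in their own (hypothetical) graph;
# statements about those graphs live in a separate metadata graph.
quads = [
    ("DVD1", "hasLanguage", "Arabic", "graph:track1"),
    ("DVD1", "hasLanguage", "French", "graph:track1"),
    ("DVD1", "hasLanguage", "Spanish", "graph:track2"),
    ("graph:track1", "LanguageOriginal", "Original", "graph:meta"),
    ("graph:track1", "InfoSource", "Container", "graph:meta"),
    ("graph:track2", "LanguageOriginal", "Translation", "graph:meta"),
    ("graph:track2", "InfoSource", "Container", "graph:meta"),
]


def triples_by_graph(all_quads):
    """Group triples under the name of the graph that contains them."""
    graphs = defaultdict(set)
    for s, p, o, g in all_quads:
        graphs[g].add((s, p, o))
    return graphs


graphs = triples_by_graph(quads)


def graphs_described_as(predicate, value):
    """Names of graphs that the metadata graph describes this way."""
    return sorted(s for s, p, o in graphs["graph:meta"]
                  if p == predicate and o == value)


original_graphs = graphs_described_as("LanguageOriginal", "Original")
original_languages = sorted(
    o for g in original_graphs for s, p, o in graphs[g] if p == "hasLanguage"
)

# A consumer who ignores graph names still sees plain hasLanguage triples.
simple_view = sorted({o for s, p, o, g in quads if p == "hasLanguage"})
```

The simple DVD1 hasLanguage statements stay flat for external users, and the extra detail is attached to the graph that groups them rather than to an intermediate node.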

Does that make sense?

Diane
 
Kelley


Kelley McGrath
University of Oregon