Stephen Paul Davis wrote:
I understand that W3C and others have recognized that the RDF triples
approach in fact lacks two important parameters that will need to be
defined before we go much further, namely namespace and provenance. So
we'll need "quintuples" instead of triples.
**
Ivan Herman wrote:
Well, yes and no...
The RDF community and, more specifically, the RDF Working Group, has to
come to grips with the notion of named graphs. Simply put, there should
be a way to consider a set of triples and identify that set with a
URI... Once this is somehow settled, the general framework can be used
to attach, e.g., provenance information to a graph... So we are not
talking about quintuples. You can look at the named graphs as quads
(that is the way many systems implement them) but that is only an
implementation detail for now.
**
I find this idea of "named graphs" very interesting and would like to
understand it better. Apologies in advance for the length of my
questions and comments. I hope they make some sense as my understanding
of linked data is rudimentary.
At ALA Annual someone made the comment that linked data doesn't support
assertions in the form "X said that Y is Z". Other people said this wasn't
true, but I didn't hear any explanations of how you could do it. I am
coming at this from a cataloger's perspective. For a project I am
working on, there are times when I think I want to say things like this,
or other things that seem to require more than three data points. I am
not sure how much sense this will make, but I thought I'd throw it out
there and see if I'm at all on the right track.
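As a rough illustration of the "X said that Y is Z" pattern using the
quads mentioned above, here is a minimal Python sketch. It uses plain
tuples rather than an RDF library, and the graph name
"graph:claims-from-X", the predicate "assertedBy", and the helper
who_says are all made up for the example. It is one possible reading of
the idea, not a description of how any particular system stores named
graphs.

```python
# A quad is (subject, predicate, object, graph_name). All identifiers here
# are hypothetical placeholders, not terms from a real vocabulary.
DEFAULT = "default"

quads = [
    # The statement "Y is Z" lives inside a named graph.
    ("DVD1", "hasLanguage", "English", "graph:claims-from-X"),
    # Statements about the graph itself live in the default graph.
    ("graph:claims-from-X", "assertedBy", "X", DEFAULT),
]


def who_says(subject, predicate, obj, quads):
    """Return the parties recorded as asserting the triple (s, p, o)."""
    # First find every graph that contains the triple ...
    graphs = {g for (s, p, o, g) in quads
              if (s, p, o) == (subject, predicate, obj)}
    # ... then look up what has been said about those graphs.
    return sorted(o for (s, p, o, g) in quads
                  if s in graphs and p == "assertedBy")


print(who_says("DVD1", "hasLanguage", "English", quads))
```

The point of the sketch is that the fourth slot only names the set of
triples; the provenance itself is still expressed as an ordinary triple
about that name.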
I'd like to organize my thoughts around some issues that came up when
OLAC was doing our initial investigations into the potential of the FRBR
model to improve access to moving images (see
http://www.olacinc.org/drupal/?q=node/27, particularly part 3a). There
we were talking about works, but I'd like to work through an example
using language track information on DVDs.
It's easy to see how to construct a statement that says this DVD is
usable in English:
DVD1 -- hasLanguage -- English
But during our discussions, we realized that we wanted to record
several more specific aspects of language information, including whether
the language is:
  Spoken, signed or written
  If written, whether it is captions (open, closed, SDH), subtitles,
  or intertitles
  The original language or a translation
  Primary or secondary
Primary vs. secondary might seem like an odd thing to want to know, but
in practice, you can go wrong if you don't make this distinction. IMDb
often fails on this count, which leads to a list of the most popular
Thai language films being topped by The Hangover Part II (2011) and
Rambo (2008) (see http://www.imdb.com/language/th) and The Godfather
(http://www.imdb.com/title/tt0068646/) is listed as if it is equally in
English, Italian and Latin. You also see this lack of distinction in
library bibliographic records, especially for educational/documentary
videos with a few subtitled sequences in another language.
So maybe one way to go at this would be to combine all these
characteristics into one mega predicate:
DVD1 -- hasLanguagePrimaryAudio -- English
And then map that to the less restrictive cases:
hasLanguagePrimaryAudio -- isSubTypeOf -- hasLanguagePrimary
hasLanguagePrimaryAudio -- isSubTypeOf -- hasLanguageAudio
hasLanguagePrimary -- isSubTypeOf -- hasLanguage
hasLanguageAudio -- isSubTypeOf -- hasLanguage
That way, someone who is just looking at the unrefined language level can
still get that. But it does seem like an awful lot of possibilities to
account for.
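To make the mapping concrete, here is a minimal Python sketch of how a
consumer might walk the isSubTypeOf statements above to recover the
unrefined hasLanguage statement. The dictionary layout and the helper
names with_broader and expand are assumptions made for illustration. RDF
Schema's rdfs:subPropertyOf is intended for this kind of
broader/narrower relationship between predicates, but the sketch does
not use any RDF tooling.

```python
# Hypothetical predicate hierarchy, copied from the isSubTypeOf lines above.
SUBTYPE_OF = {
    "hasLanguagePrimaryAudio": ["hasLanguagePrimary", "hasLanguageAudio"],
    "hasLanguagePrimary": ["hasLanguage"],
    "hasLanguageAudio": ["hasLanguage"],
}


def with_broader(predicate):
    """Return the predicate plus every broader predicate it maps to."""
    seen, todo = set(), [predicate]
    while todo:
        p = todo.pop()
        if p not in seen:
            seen.add(p)
            todo.extend(SUBTYPE_OF.get(p, []))
    return seen


def expand(triples):
    """Add one inferred triple for each broader predicate."""
    return {(s, broader, o)
            for (s, p, o) in triples
            for broader in with_broader(p)}


triples = [("DVD1", "hasLanguagePrimaryAudio", "English")]
for t in sorted(expand(triples)):
    print(t)
```

Only the most specific statement has to be recorded; the broader ones
are derived when someone asks for them.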
Maybe another way would be to introduce an intermediate entity between
the DVD and the language information. One advantage is that you could
distinguish a mixed soundtrack from multiple soundtracks, as in
statements 1 and 2 in the example below, for a DVD with the movie's
original mixed Arabic and French soundtrack, a dubbed Spanish soundtrack,
and an English subtitle track.
DVD1 -- hasLanguageStatement -- LanguageStatement1
LanguageStatement1 -- Language -- Arabic
LanguageStatement1 -- Language -- French
LanguageStatement1 -- LanguageLevel -- Primary
LanguageStatement1 -- LanguageType -- Audio
LanguageStatement1 -- LanguageOriginal -- Original
LanguageStatement1 -- InfoSource -- Container
DVD1 -- hasLanguageStatement -- LanguageStatement2
LanguageStatement2 -- Language -- Spanish
LanguageStatement2 -- LanguageLevel -- Primary
LanguageStatement2 -- LanguageType -- Audio
LanguageStatement2 -- LanguageOriginal -- Translation
LanguageStatement2 -- InfoSource -- Container
DVD1 -- hasLanguageStatement -- LanguageStatement3
LanguageStatement3 -- Language -- English
LanguageStatement3 -- LanguageLevel -- Primary
LanguageStatement3 -- LanguageType -- Written
LanguageStatement3 -- LanguageTypeWritten -- Subtitle
LanguageStatement3 -- LanguageOriginal -- Translation
LanguageStatement3 -- InfoSource -- Container
And then you would have to give people who want to use this data some
way to connect the dots, which I'm not sure how to do.
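One possible way of connecting the dots is sketched below in Python: a
consumer follows hasLanguageStatement from the DVD to each intermediate
statement and then collects its Language values, optionally filtering on
LanguageType. The triples are a shortened copy of the example above, and
the function simple_languages is a hypothetical helper, not part of any
existing tool.

```python
# Triples for two of the statements above; names follow the example.
triples = [
    ("DVD1", "hasLanguageStatement", "LanguageStatement1"),
    ("LanguageStatement1", "Language", "Arabic"),
    ("LanguageStatement1", "Language", "French"),
    ("LanguageStatement1", "LanguageType", "Audio"),
    ("DVD1", "hasLanguageStatement", "LanguageStatement3"),
    ("LanguageStatement3", "Language", "English"),
    ("LanguageStatement3", "LanguageType", "Written"),
]


def simple_languages(item, triples, language_type=None):
    """Collapse the intermediate statements into a plain list of languages."""
    statements = [o for (s, p, o) in triples
                  if s == item and p == "hasLanguageStatement"]
    result = []
    for st in statements:
        props = [(p, o) for (s, p, o) in triples if s == st]
        # Skip statements of the wrong type when a filter is requested.
        if language_type and ("LanguageType", language_type) not in props:
            continue
        result.extend(o for (p, o) in props if p == "Language")
    return result


print(simple_languages("DVD1", triples))
print(simple_languages("DVD1", triples, language_type="Audio"))
```

A consumer who only wants the simple view calls the function without a
filter; one who cares about the distinctions can filter on any of the
other properties in the same way.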
This approach would also be useful for ordering data. For instance, for
film and video, the order in which cast names are presented is
important, as well as the type of ordering. In addition, this could
allow you to make statements about where the data came from. So you
could have something that linked transcribed names with identifiers.
Work1 hasCastCredits CastStatement1
CastStatement1 hasSource Manifestation1 [or
http://www.imdb.com/title/tt0101531/ which is where I actually took this
from or some other reference source or unspecified for legacy data or
where someone doesn't want to bother]
CastStatement1 hasOrder CreditsOrder
CastStatement1 hasCredit CreditStatement1
CreditStatement1 hasPosition 1
CreditStatement1 hasTranscribedName "Charlie Sheen"
CreditStatement1 hasNAR http://id.loc.gov/authorities/names/n88368094
[Sheen, Charlie]
CreditStatement1 hasFunction
http://id.loc.gov/vocabulary/relators/act.html [actor]
...
CastStatement1 hasCredit CreditStatement15
CreditStatement15 hasPosition 15
CreditStatement15 hasTranscribedName "Larry Fishburne"
CreditStatement15 hasNAR http://id.loc.gov/authorities/names/no93030105
[Fishburne, Laurence, 1961-]
CreditStatement15 hasFunction
http://id.loc.gov/vocabulary/relators/act.html [actor]
Of course this is a lot of nesting and you'd have to make it work for
data consumers who didn't want all that complexity.
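For a consumer who only wants the cast list in credit order, the nesting
could be collapsed along the following lines. This is a minimal Python
sketch over a cut-down copy of the statements above; the helpers value
and cast_in_order are invented for the example, and the triples are
deliberately listed out of order to show that hasPosition, not storage
order, decides the result.

```python
# A cut-down copy of the cast example, stored out of order on purpose.
triples = [
    ("Work1", "hasCastCredits", "CastStatement1"),
    ("CastStatement1", "hasCredit", "CreditStatement15"),
    ("CastStatement1", "hasCredit", "CreditStatement1"),
    ("CreditStatement1", "hasPosition", 1),
    ("CreditStatement1", "hasTranscribedName", "Charlie Sheen"),
    ("CreditStatement15", "hasPosition", 15),
    ("CreditStatement15", "hasTranscribedName", "Larry Fishburne"),
]


def value(subject, predicate, triples):
    """Return the first object for (subject, predicate), or None."""
    return next((o for (s, p, o) in triples
                 if s == subject and p == predicate), None)


def cast_in_order(work, triples):
    """Return transcribed cast names sorted by their hasPosition value."""
    cast = value(work, "hasCastCredits", triples)
    credits = [o for (s, p, o) in triples
               if s == cast and p == "hasCredit"]
    credits.sort(key=lambda c: value(c, "hasPosition", triples))
    return [value(c, "hasTranscribedName", triples) for c in credits]


print(cast_in_order("Work1", triples))
```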
How would you approach these kinds of problems with a named graph? Or
is this not something where you'd want a named graph? Is it better not
to do all this in linked data, but rather in some format for internal
consumption, and just use the linked data for the simplified data that
external users are likely to want? Am I hopelessly on the wrong track?
Kelley
Kelley McGrath
University of Oregon