On Tue, Jun 5, 2012 at 11:53 AM, Karen Coyle <[log in to unmask]> wrote:

> Keeping an exact order is less intuitive in RDF. I'm not sure how that
> would be done.
>

This would be a good time to go over some of the ways that one can
represent this kind of ordered value using some different types of
knowledge representation languages.

*Introduction*

Instead of just focusing on RDF, I'll also look at some related
languages for networked knowledge representation (KR), primarily ISO
Common Logic and IKL (with maybe a little CycL). These languages use
parentheses to mark the beginning and end of expressions. The first thing
inside the parentheses is the predicate. Thus, for example, to say that
Gene loves Jezebel one would write:

(loves Gene Jezebel)

In RDF/XML this corresponds to:

<rdf:Description rdf:about="#Gene">
       <loves rdf:resource="#Jezebel"/>
</rdf:Description>


In N3 this can be written as:

<#Gene>  :loves <#Jezebel> .


*Approach #1: Keep all the values in an ordered list.*

This is very easy to do in the Common Logic style KR languages, as they
support predicates (properties) that can take arbitrary numbers of
arguments.

(authors work1 JohnSmith FredBloggs PaulErdos Golgo13)


In RDF, predicates can only have two arguments (these arguments, together
with the predicate name, are the three parts of the triple).  However, the
various syntaxes for RDF have special support for handling lists or sets of
arguments.

In RDF/XML we can build a list using the "Collection" syntax:

<rdf:Description rdf:about="#work1">
   <authors rdf:parseType="Collection">
      <rdf:Description rdf:about="#JohnSmith"/>
      <rdf:Description rdf:about="#FredBloggs"/>
      <rdf:Description rdf:about="#PaulErdos"/>
      <rdf:Description rdf:about="#Golgo13"/>
   </authors>
</rdf:Description>


This states that work1 has a value for authors that is a list of four
names.

However, because RDF only traffics in triples, this notation requires some
transformation. What happens is that the contents of the "Collection"
element are used to build an explicit rdf:List object. For details see
Appendix 1.

The drawback of using a single assertion to maintain the order is that it
becomes much harder to work with the data, as we are no longer making
statements about the relationship between an individual author and a
specific work.

*Approach #2: Use multiple assertions, with the rank of the author
included.*

This approach is very simple to use in Common Logic et al.  Since we can
use predicates with more than two arguments, we can define an author
predicate that takes as arguments a work, an author, and the rank of this
author for this work.  For example:

(author work1 JohnSmith 1)
(author work1 FredBloggs 2)
(author work1 PaulErdos 3)
(author work1 Golgo13 4)

In RDF the situation is slightly more complicated, since we can only use
predicates with two arguments. However, the situation is not too bad; we
just need to create an extra object for each value:

_:w1a1 :author <#JohnSmith> .
_:w1a1 :rank "1" .
_:w1a2 :author <#FredBloggs> .
_:w1a2 :rank "2" .
_:w1a3 :author <#PaulErdos> .
_:w1a3 :rank "3" .
_:w1a4 :author <#Golgo13> .
_:w1a4 :rank "4" .
<#work1> :rankedAuthor _:w1a1 .
<#work1> :rankedAuthor _:w1a2 .
<#work1> :rankedAuthor _:w1a3 .
<#work1> :rankedAuthor _:w1a4 .
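
To show that the order really is recoverable from these triples, here is a
minimal sketch in Python. It assumes the triples are held as plain
(subject, predicate, object) tuples rather than in a real RDF store; the
function name ordered_authors is made up for this illustration. The triples
are deliberately listed out of order, since a set of triples has no order
of its own.

```python
# The triples from the example above, deliberately shuffled.
triples = [
    ("<#work1>", ":rankedAuthor", "_:w1a3"),
    ("_:w1a4", ":rank", "4"),
    ("_:w1a1", ":author", "<#JohnSmith>"),
    ("<#work1>", ":rankedAuthor", "_:w1a1"),
    ("_:w1a2", ":rank", "2"),
    ("_:w1a3", ":author", "<#PaulErdos>"),
    ("_:w1a1", ":rank", "1"),
    ("<#work1>", ":rankedAuthor", "_:w1a4"),
    ("_:w1a2", ":author", "<#FredBloggs>"),
    ("_:w1a3", ":rank", "3"),
    ("<#work1>", ":rankedAuthor", "_:w1a2"),
    ("_:w1a4", ":author", "<#Golgo13>"),
]


def ordered_authors(triples, work):
    """Return the authors of `work`, sorted by the rank on each helper node."""
    nodes = [o for s, p, o in triples if s == work and p == ":rankedAuthor"]
    author = {s: o for s, p, o in triples if p == ":author"}
    # Ranks are string literals in the data; compare them as integers so
    # that "10" sorts after "9".
    rank = {s: int(o) for s, p, o in triples if p == ":rank"}
    return [author[n] for n in sorted(nodes, key=lambda n: rank[n])]


print(ordered_authors(triples, "<#work1>"))
```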


We can use a feature of OWL 2 called property chains to derive the
value of author for a work from its rankedAuthor objects, without having
to explicitly look at the rankedAuthor objects.
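
In OWL 2 such a chain is declared with owl:propertyChainAxiom, and a
reasoner then does the work. The Python below is only a rough emulation of
that one inference over plain tuples, to show what gets derived. The
derived property name :hasAuthor and the function name are assumptions made
for this sketch; they are not part of the example data above.

```python
# A two-author subset of the example data, as (subject, predicate, object).
triples = [
    ("<#work1>", ":rankedAuthor", "_:w1a1"),
    ("<#work1>", ":rankedAuthor", "_:w1a2"),
    ("_:w1a1", ":author", "<#JohnSmith>"),
    ("_:w1a1", ":rank", "1"),
    ("_:w1a2", ":author", "<#FredBloggs>"),
    ("_:w1a2", ":rank", "2"),
]


def apply_property_chain(triples, first, second, derived):
    """Derive (s, derived, o) whenever (s, first, x) and (x, second, o) hold."""
    second_values = {}
    for s, p, o in triples:
        if p == second:
            second_values.setdefault(s, []).append(o)
    return {
        (s, derived, end)
        for s, p, mid in triples
        if p == first
        for end in second_values.get(mid, [])
    }


# Hypothetical chain: :rankedAuthor followed by :author implies :hasAuthor.
inferred = apply_property_chain(triples, ":rankedAuthor", ":author", ":hasAuthor")
for triple in sorted(inferred):
    print(triple)
```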

It is important to note here that, unlike in the first example, we do not
know that there is nobody behind Golgo 13. This can be handled in a few
different ways.
In Cyc, one can state that the complete extent of a predicate is known,
which means that if the system cannot infer that there are any more
authors, it can infer that there aren't. This "World Closing" can also
be done at query time, using Negation as Failure semantics (e.g. using the
"NOT EXISTS" filter in a SPARQL query).

We can also make explicit assertions; for example, in the CL family, we can
assert that for all works there can only be one author at each rank, and
that for a specific work there is no author whose rank is greater than 4.
In IKL, CycL, and OWL, we can also state that work1 is something that has
exactly four values of author.
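
Both ideas can be sketched over the ranked assertions in Python. The first
function closes the world at query time: an author counts as last if no
later author is known, which is the Negation as Failure reading. The second
treats the explicit assertions as integrity checks over the data. Note that
this is a simplification: in a logic with open-world semantics such axioms
license inferences rather than report violations. All function names here
are invented for the illustration.

```python
from collections import Counter

# (work, author, rank) assertions, mirroring the Common Logic example.
assertions = [
    ("work1", "JohnSmith", 1),
    ("work1", "FredBloggs", 2),
    ("work1", "PaulErdos", 3),
    ("work1", "Golgo13", 4),
]


def is_last_author(assertions, work, author):
    """Closed-world test: the author is last if no later author is known."""
    ranks = [r for w, a, r in assertions if w == work and a == author]
    if not ranks:
        return False  # not a known author of this work at all
    return not any(w == work and r > max(ranks) for w, _a, r in assertions)


def check_constraints(assertions, work, max_rank):
    """Report ranks held by several authors and ranks above max_rank."""
    ranks = [rank for w, _author, rank in assertions if w == work]
    problems = []
    for rank, count in sorted(Counter(ranks).items()):
        if count > 1:
            problems.append(f"rank {rank} has {count} authors")
        if rank > max_rank:
            problems.append(f"rank {rank} exceeds {max_rank}")
    return problems


print(is_last_author(assertions, "work1", "Golgo13"))
print(check_constraints(assertions, "work1", 4))
```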

*Approach #3: Use constraints and rules*

In situations where only some authors are given numeric rank, and the rest
are ordered by some other principle (e.g. lexicographic order, or no order
specified), we can just state what the constraints on authorship are, and
leave the ordering to be determined by the computer. We could then indicate
that JohnSmith was principal investigator, that no-one goes behind Golgo 13,
and the relative contributions of all authors, then calculate appropriately
ordered lists of authors based on context (which might be that of the
query, or that of the work, or some other set of rules).
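
As a toy illustration of letting the computer work out the order, the
Python sketch below applies one possible rule set: explicitly ranked
authors come first in rank order, one designated author always goes last,
and everyone else is sorted lexicographically in between. Treating the
principal investigator as rank 1 is an assumption made for this example,
as are the function and parameter names; a real rule base would be much
richer.

```python
def order_authors(authors, ranks=None, last=None):
    """Order authors from partial constraints.

    ranks: optional mapping of author -> explicit numeric rank.
    last:  optional author whom nobody may follow.
    Authors with neither constraint are sorted lexicographically.
    """
    ranks = ranks or {}

    def key(name):
        if name in ranks:
            return (0, ranks[name], name)  # explicitly ranked: first
        if name == last:
            return (2, 0, name)            # pinned to the end
        return (1, 0, name)                # everyone else: by name

    return sorted(authors, key=key)


authors = ["PaulErdos", "Golgo13", "FredBloggs", "JohnSmith"]
# Assumption: the principal investigator is listed first.
order = order_authors(authors, ranks={"JohnSmith": 1}, last="Golgo13")
print(order)
```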

This is where the advantages of representing data as logical propositions,
rather than as strings, should become immediately obvious to anyone who has
ever done work on scientometrics. Also, many people may
be disappointed to learn that their college courses in philosophy might
turn out to be of practical use.

It should be clear why no one should reasonably expect catalogers to enter
this sort of information directly. It should also be clear that the Rules
for a Knowledge Base need to be developed with direct input from Subject
Matter Experts who understand the theory behind the practice. Most
important of all, it ought to be obvious that any new Bibliographic
Framework needs to consider all the changes to work flows and practice that
can be helped or hindered by different choices, and which cost/benefit
tradeoffs need to be made.

*References:*

Information about ISO Common Logic and IKL, as well as relevant portions of
RDF can be found in Pat Hayes's guide at
http://www.ihmc.us/users/phayes/ikl/guide/guide.html . There are several
examples of how one can handle ordered lists in the section on "SEQUENCE
MARKERS VS. ARGUMENT LISTS" in the examples in Appendix B.

Information about RDF syntax can be found at
http://www.w3.org/TR/rdf-syntax-grammar/

Golgo 13 eye mask can be found at
http://www.amiami.com/top/detail/detail?oldscode=122694

*Appendix 1:  How RDF turns Collections into triples.*

Lists in RDF are a lot like lists in programming languages like Lisp and
Scheme.
Non-empty List content is handled by creating a new List object, and
defining two property values for it.
The property rdf:first is set to the first value of the list.
The property rdf:rest is set to point to a List object containing the rest
of the list.
The first value of the rest of the list is the second value in the
collection.
If there is no more content in the collection, the value of rdf:rest is set
to the value rdf:nil. This explicitly states that there are no more
elements in the list that we don't know about. In our example we can thus
be sure that there is no-one behind Golgo 13.

The value of the property on the object we're describing is then set to
point to the first object in the list.

In N3 this becomes:

_:list1 rdf:first <#JohnSmith> .
_:list1 rdf:rest  _:list2 .
_:list2 rdf:first <#FredBloggs> .
_:list2 rdf:rest  _:list3 .
_:list3 rdf:first <#PaulErdos> .
_:list3 rdf:rest  _:list4 .
_:list4 rdf:first <#Golgo13> .
_:list4 rdf:rest rdf:nil .
<#work1> ex:authors _:list1 .

This can become somewhat ungainly.
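
The expansion is mechanical, which is why the syntaxes can hide it. Here is
a minimal Python sketch of the procedure described in this appendix,
together with the reverse walk that recovers the order. It works on plain
tuples rather than a real RDF library, the blank node labels are arbitrary,
and the function names are invented for the illustration.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def collection_to_triples(subject, predicate, members):
    """Expand an ordered collection into rdf:first / rdf:rest triples."""
    if not members:
        # An empty collection is represented by rdf:nil itself.
        return [(subject, predicate, RDF_NIL)]
    # One blank node per list cell.
    cells = [f"_:list{i}" for i in range(1, len(members) + 1)]
    triples = []
    for i, (cell, member) in enumerate(zip(cells, members)):
        triples.append((cell, RDF_FIRST, member))
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, RDF_REST, rest))
    # The described object points at the first cell of the list.
    triples.append((subject, predicate, cells[0]))
    return triples


def list_members(triples, head):
    """Walk a list from its head cell and return the members in order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    members = []
    cell = head
    while cell != RDF_NIL:
        members.append(first[cell])
        cell = rest[cell]
    return members


authors = ["<#JohnSmith>", "<#FredBloggs>", "<#PaulErdos>", "<#Golgo13>"]
triples = collection_to_triples("<#work1>", "ex:authors", authors)
for s, p, o in triples:
    print(s, p, o, ".")
```

Reaching rdf:nil is what ends the walk in list_members, which is the
programmatic counterpart of knowing that the list is complete.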