On Tue, Jun 5, 2012 at 11:53 AM, Karen Coyle <[log in to unmask]> wrote:
> Keeping an exact order is less intuitive in RDF. I'm not sure how
> that would be done.
This would be a good time to go over some of the ways that one can represent this kind of ordered value in different knowledge representation languages.
Instead of focusing only on RDF, I'll also look at some related languages for networked knowledge representation (KR): primarily ISO Common Logic and IKL (with maybe a little CycL). These languages use parentheses to mark the beginning and end of expressions. The first thing inside the parentheses is the predicate. Thus, for example, to say that Gene loves Jezebel one would write:
(loves Gene Jezebel)
In RDF/XML this corresponds to:
<rdf:Description rdf:about="#Gene">
  <loves rdf:resource="#Jezebel"/>
</rdf:Description>
In N3 this can be written as:
<#Gene> :loves <#Jezebel> .
Approach #1: Keep all the values in an ordered list.
This is very easy to do in the Common Logic style KR languages, as they support predicates (properties) that can take arbitrary numbers of arguments.
(authors work1 JohnSmith FredBloggs PaulErdos Golgo13)
In RDF, predicates can only have two arguments (these arguments, together with the predicate name, are the three parts of the triple). However, the various syntaxes for RDF have special support for handling lists or sets of arguments.
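In RDF/XML we can build a list using the "Collection" syntax:
<rdf:Description rdf:about="#work1">
  <ex:authors rdf:parseType="Collection">
    <rdf:Description rdf:about="#JohnSmith"/>
    <rdf:Description rdf:about="#FredBloggs"/>
    <rdf:Description rdf:about="#PaulErdos"/>
    <rdf:Description rdf:about="#Golgo13"/>
  </ex:authors>
</rdf:Description>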
This states that work1 has a value for authors that is a list of four names.
However, because RDF only traffics in triples, this notation requires some transformation. What happens is that the contents of the "Collection" element are used to build an explicit rdf:List object. For details see Appendix 1.
The drawback of using a single assertion to maintain the order is that it becomes much harder to work with the data, as we are no longer making statements about the relationship between an individual author and a specific work.
Approach #2: Use multiple assertions, with the rank of the author included.
This approach is very simple to use in Common Logic et al. Since we can use predicates with more than two arguments, we can define an author predicate that takes as arguments a work, an author, and the rank of this author for this work. For example:
(author work1 JohnSmith 1)
(author work1 FredBloggs 2)
(author work1 PaulErdos 3)
(author work1 Golgo13 4)
In RDF the situation is slightly more complicated, since we can only use predicates with two arguments. However, the situation is not too bad; we just need to create an extra object for each value:
_:w1a1 :author <#JohnSmith> .
_:w1a1 :rank "1" .
_:w1a2 :author <#FredBloggs> .
_:w1a2 :rank "2" .
_:w1a3 :author <#PaulErdos> .
_:w1a3 :rank "3" .
_:w1a4 :author <#Golgo13> .
_:w1a4 :rank "4" .
<#work1> :rankedAuthor _:w1a1 .
<#work1> :rankedAuthor _:w1a2 .
<#work1> :rankedAuthor _:w1a3 .
<#work1> :rankedAuthor _:w1a4 .
We can use a feature of OWL 2 called property chains to derive the author values of a work from its rankedAuthor objects, without having to look at the rankedAuthor objects explicitly.
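As a rough illustration of how an application might consume the ranked triples, the Python sketch below holds them as plain (subject, predicate, object) tuples, follows rankedAuthor and then author (the same two-step path a property chain would collapse into one), and sorts the results by rank. The tuple representation and the names objects and authors_in_order are assumptions made for this example; they are not part of any RDF library.

```python
# Triples from the example, held as plain (subject, predicate, object) tuples.
TRIPLES = [
    ("_:w1a1", "author", "#JohnSmith"), ("_:w1a1", "rank", "1"),
    ("_:w1a2", "author", "#FredBloggs"), ("_:w1a2", "rank", "2"),
    ("_:w1a3", "author", "#PaulErdos"), ("_:w1a3", "rank", "3"),
    ("_:w1a4", "author", "#Golgo13"), ("_:w1a4", "rank", "4"),
    ("#work1", "rankedAuthor", "_:w1a1"),
    ("#work1", "rankedAuthor", "_:w1a2"),
    ("#work1", "rankedAuthor", "_:w1a3"),
    ("#work1", "rankedAuthor", "_:w1a4"),
]


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def authors_in_order(triples, work):
    """Follow work -> rankedAuthor -> author and sort the results by rank."""
    ranked = []
    for node in objects(triples, work, "rankedAuthor"):
        for author in objects(triples, node, "author"):
            for rank in objects(triples, node, "rank"):
                # The ranks are plain string literals, so compare them as integers.
                ranked.append((int(rank), author))
    return [author for _, author in sorted(ranked)]


print(authors_in_order(TRIPLES, "#work1"))
```

The order in which the triples are stored does not matter here; only the rank values determine the result.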
It is important to note here that, unlike in the first example, we do not know that there is nobody behind Golgo 13. This can be handled in a few different ways.
In Cyc, one can state that the complete extent of a predicate is known, which means that if the system cannot infer that there are any more authors, it can infer that there aren't. This "World Closing" can also be done at query time, using Negation as Failure semantics (e.g. using the "NOT EXISTS" filter in a SPARQL query).
We can also make explicit assertions; for example, in the CL family, we can assert that for all works there can be only one author at each rank, and that for a specific work there is no author whose rank is greater than 4. In IKL, CycL, and OWL, we can also state that work1 is something that has exactly four values of author.
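These explicit assertions can also be read as checks on the data. The Python sketch below tests, for one work, that each rank from 1 up to a stated maximum has exactly one author and that no author has a rank beyond that maximum. The (rank, author) pair representation and the name check_ranks are assumptions made for illustration; a reasoner would work from the logical axioms rather than from code like this.

```python
from collections import Counter


def check_ranks(ranked_authors, max_rank):
    """Return a list of constraint violations for one work.

    ranked_authors is an iterable of (rank, author) pairs; max_rank is the
    highest rank the work is asserted to have.
    """
    pairs = list(ranked_authors)
    problems = []
    counts = Counter(rank for rank, _ in pairs)
    for rank in range(1, max_rank + 1):
        # Each rank from 1 to max_rank should have exactly one author.
        if counts[rank] != 1:
            problems.append(f"rank {rank} has {counts[rank]} authors")
    for rank, author in pairs:
        # No author may have a rank greater than max_rank.
        if rank > max_rank:
            problems.append(f"{author} has rank {rank} > {max_rank}")
    return problems


work1 = [(1, "JohnSmith"), (2, "FredBloggs"), (3, "PaulErdos"), (4, "Golgo13")]
print(check_ranks(work1, 4))                     # the data as given
print(check_ranks(work1 + [(5, "Nobody")], 4))   # someone behind Golgo13
```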
Approach #3: Use constraints and rules.
In situations where only some authors are given a numeric rank, and the rest are ordered by some other principle (e.g. lexicographic order, or no order specified), we can just state what the constraints on authorship are, and leave the ordering to be determined by the computer. We could then indicate that JohnSmith was principal investigator, that no-one goes behind Golgo 13, and the relative contributions of all authors, and then calculate appropriately ordered lists of authors based on context (which might be that of the query, or that of the work, or some other set of rules).
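One way to picture this is to record only the constraints and let a program produce the order. The Python sketch below assumes three hypothetical rules: authors with an explicit rank come first in rank order, one named author must come last, and everyone else is sorted lexicographically. The rule set and the function order_authors are illustrative assumptions, not a proposal for actual cataloguing rules.

```python
def order_authors(authors, ranks=None, last=None):
    """Order authors from partial constraints.

    ranks maps some authors to an explicit numeric rank; last names an
    author who must come at the end. Unconstrained authors are placed
    between the two groups in lexicographic order.
    """
    ranks = ranks or {}

    def sort_key(author):
        if author in ranks:
            return (0, ranks[author], author)   # explicitly ranked: first
        if author == last:
            return (2, 0, author)               # nobody goes behind this one
        return (1, 0, author)                   # the rest: alphabetical

    return sorted(authors, key=sort_key)


names = ["PaulErdos", "Golgo13", "FredBloggs", "JohnSmith"]
print(order_authors(names, ranks={"JohnSmith": 1}, last="Golgo13"))
```

A different context (say, a query that wants purely alphabetical order) could apply a different key function to the same underlying statements.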
This is where the advantages of representing data as logical propositions, rather than as strings, should become immediately obvious to anyone who has ever done work on scientometrics. Also, many people may be disappointed to learn that their college courses in philosophy might turn out to be of practical use.
It should be clear why no one should reasonably expect catalogers to enter this sort of information directly. It should also be clear that the rules for a Knowledge Base need to be developed with direct input from Subject Matter Experts who understand the theory behind the practice. Most important of all, it ought to be obvious that any new Bibliographic Framework needs to consider all the changes to workflows and practice that can be helped or hindered by different choices, and which cost/benefit tradeoffs need to be made.
Information about ISO Common Logic and IKL, as well as relevant portions of RDF, can be found in Pat Hayes's guide at http://www.ihmc.us/users/phayes/ikl/guide/guide.html . There are several examples of how one can handle ordered lists in the section on "SEQUENCE MARKERS VS. ARGUMENT LISTS" in the examples in Appendix B.
Appendix 1: How RDF turns Collections into triples.
Lists in RDF are a lot like lists in programming languages like Lisp and Scheme.
Non-empty list content is handled by creating a new List object and defining two property values for it.
The property rdf:first is set to the first value of the list.
The property rdf:rest is set to point to a List object containing the rest of the list.
The first value of the rest of the list is the second value in the collection.
If there is no more content in the collection, the value of rdf:rest is set to rdf:nil. This explicitly states that there are no more elements in the list that we don't know about. In our example we can thus be sure that there is no-one behind Golgo 13.
The value of the property on the object we're describing is then set to point to the first object in the list.
In N3 this becomes:
_:list1 rdf:first <#JohnSmith> .
_:list1 rdf:rest _:list2 .
_:list2 rdf:first <#FredBloggs> .
_:list2 rdf:rest _:list3 .
_:list3 rdf:first <#PaulErdos> .
_:list3 rdf:rest _:list4 .
_:list4 rdf:first <#Golgo13> .
_:list4 rdf:rest rdf:nil .
<#work1> ex:authors _:list1 .
This can become somewhat ungainly.
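To make the Lisp-style structure concrete, here is a small Python sketch that builds the rdf:first / rdf:rest triples for a collection and then walks them back into an ordered list. Triples are plain tuples, and the function names collection_to_triples and list_to_items are assumptions made for this example; the walk also assumes a well-formed list that ends in rdf:nil.

```python
NIL = "rdf:nil"


def collection_to_triples(items):
    """Turn a Python list into rdf:first / rdf:rest triples.

    Returns (head, triples), where head is the node that the describing
    property should point at (rdf:nil for an empty collection).
    """
    items = list(items)
    if not items:
        return NIL, []
    # One blank node per element: _:list1, _:list2, ...
    nodes = [f"_:list{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else NIL
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return nodes[0], triples


def list_to_items(head, triples):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first values."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items = []
    node = head
    while node != NIL:
        items.append(first[node])
        node = rest[node]
    return items


authors = ["#JohnSmith", "#FredBloggs", "#PaulErdos", "#Golgo13"]
head, triples = collection_to_triples(authors)
triples.append(("#work1", "ex:authors", head))
print(list_to_items(head, triples))
```

Because the last node points at rdf:nil, the walk has a definite end, which is what lets a consumer conclude that the list is complete.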