Karen,

You can use constraints in RDF. Do not put the blame on RDF; it is just a modeling language. RDF is not only for inferring new facts: rules can also be interpreted to restrict classes to domains, check properties for valid values, and so on. It depends on how the rules are interpreted and how the reasoner works.

Here is an example product for implementing integrity constraints on RDF (maybe there are more):

http://docs.stardog.com/#_validating_constraints
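
To make the distinction concrete, here is a minimal sketch in plain Python (no RDF library) of how the same domain statement can be read either as an inference rule or as an integrity constraint. The ex: names, the sample data, and both helper functions are made up for illustration; this is not Stardog's API, only the general idea.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
RDF_TYPE = "rdf:type"

# Hypothetical schema: the property ex:isbn has domain ex:Manifestation.
DOMAINS = {"ex:isbn": "ex:Manifestation"}

data = [
    ("ex:book1", RDF_TYPE, "ex:Manifestation"),
    ("ex:book1", "ex:isbn", "9780000000002"),
    ("ex:thing2", "ex:isbn", "9780000000019"),  # no type triple for ex:thing2
]


def infer_types(triples, domains):
    """Open-world reading: a domain statement yields new type triples."""
    inferred = set()
    for s, p, _ in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
    # Report only the triples that were not already stated.
    return inferred - set(triples)


def validate(triples, domains):
    """Closed-world reading: the data must already state the required type."""
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    violations = []
    for s, p, _ in triples:
        if p in domains and (s, domains[p]) not in typed:
            violations.append((s, p, domains[p]))
    return violations


print(infer_types(data, DOMAINS))
print(validate(data, DOMAINS))
```

The reasoner-style function adds the missing type for ex:thing2, while the validator reports the same gap as a violation. The data and the schema are identical; only the interpretation differs.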

The operations I expect over library data are pretty clear to me: they were formulated in the Paris Principles in 1961.

"Functions of the Catalogue
The catalogue should be an efficient instrument for ascertaining
2.1 whether the library contains a particular book specified by
(a) its author and title, or
(b) if the author is not named in the book, its title alone, or
(c) if author and title are inappropriate or insufficient for identification, a suitable
substitute for the title; and
2.2 (a) which works by a particular author and
 (b) which editions of a particular work are in the library."

In the meantime, they have been reformulated as the International Cataloguing Principles (ICP): http://www.ifla.org/files/assets/cataloguing/icp/icp_2009-en.pdf

With SPARQL, RDF stores can be queried. For document-based information retrieval, however, SPARQL complicates matters for users and implementors and is also not very efficient, for example when performing GROUP BY or ORDER BY. Also, I am not interested in presenting triples to patrons; I want to present documents with answers according to the Paris Principles / ICP. Therefore, I load RDF as JSON-LD into Elasticsearch for faster and more convenient document retrieval with filters, rankings, and aggregations, which offer more powerful solutions to requirements such as

- when queried, the catalogue should display a complete result set with most relevant documents first
- the catalogue should be able to display all relevant works grouped by author and all relevant editions grouped by work
- the catalogue should let the user reorder, page, refine, and extend result sets with simple operations
- a union catalogue should ascertain the complete list of libraries that hold a particular edition of a work and display the services offered by the library for that item
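
As a rough sketch of the document-oriented view behind these requirements, the following plain Python folds triples into one JSON-LD-like document per resource and then groups editions by work. The ex: property names, the sample data, and the two helper functions are assumptions made for illustration; real JSON-LD would also carry an @context, and the indexing step into a search engine is deliberately left out.

```python
import json
from collections import defaultdict

# Illustrative triples; the ex: property names are hypothetical.
triples = [
    ("ex:edition1", "ex:editionOf", "ex:work1"),
    ("ex:edition1", "ex:title", "Moby Dick"),
    ("ex:edition1", "ex:heldBy", "ex:libraryA"),
    ("ex:edition1", "ex:heldBy", "ex:libraryB"),
    ("ex:edition2", "ex:editionOf", "ex:work1"),
    ("ex:edition2", "ex:title", "Moby Dick, or The Whale"),
    ("ex:edition2", "ex:heldBy", "ex:libraryA"),
]


def to_documents(triples):
    """Fold triples into one dict per subject, with multi-valued properties."""
    docs = {}
    for s, p, o in triples:
        doc = docs.setdefault(s, {"@id": s})
        doc.setdefault(p, []).append(o)
    return docs


def editions_by_work(docs):
    """Group edition identifiers under the work they belong to."""
    groups = defaultdict(list)
    for doc in docs.values():
        for work in doc.get("ex:editionOf", []):
            groups[work].append(doc["@id"])
    return dict(groups)


docs = to_documents(triples)
print(json.dumps(docs["ex:edition1"], indent=2))
print(editions_by_work(docs))
```

Each document already contains the complete list of holding libraries for an edition, so the last requirement becomes a lookup in a single document rather than a join over triples.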

The model of RDF triples is nevertheless important: it provides better maintainability and portability of information sets on the Web (e.g. linking to other catalogues, export/import, incremental loads, merging catalogues into a union catalogue, referencing other data sets such as research data), and therefore it promises to reduce costs massively (which still has to be proven by calculations). But that need not concern a user who simply wants to search the catalogue.

Jörg

On Wed, Jan 7, 2015 at 8:13 PM, Karen Coyle <[log in to unmask]> wrote:
Joe, I like the suggestion of using classes. Some RDF-using communities seem to be very class-heavy, others less so. The implications of lots of classes vs. a few classes still aren't clear to me in terms of how they affect practice, but clearly classes provide functionality that we may not be used to exploiting.

On 1/7/15 8:49 AM, Joseph Kiegel wrote:
I agree with you that mapping BF to "constrained" (typed) RDA will be necessary and useful.

At the end of my message, I tried to make the point that this won't be possible.  I used classes but it is better to use properties instead.  Once you map rdam:reproductionOfManifestation to bf:reproduction and rdai:reproductionOfItem to bf:reproduction, you can't go back the other way. That is, bf:reproduction does not contain the information you need to choose the correct RDA property in the BF -> RDA mapping.  You no longer know whether you came from reproductionOfManifestation or reproductionOfItem.

I suspect that "mapping" is not the right term here, and maybe that's the issue. If you look at some of the recent presentations that Gordon has done,[1] you see that you can create relationships between terms, e.g. bf:reproduction is a super-property of rdam:reproductionOfManifestation and rdai:reproductionOfItem. You don't change the two RDA properties to bf:reproduction -- they stay what they are, and you navigate the relationship. That doesn't entirely solve the problem, because as is always the case with data it is very hard to go from less specific to more specific. However, I go back to an earlier question, which is: what do we need to do with this data, and under what circumstances do these differences matter? For example, if you have

resourceA a bf:Work .
resourceA bf:workTitle "Moby Dick" .
resourceA bf:creator http://..
resourceA bf:language "ENG" .
resource3 a rdac:Work .
resource8 a rdac:Expression .
resource8 rdae:language "ENG" .
resource8 rdae:expressionOf resource3 .
resource3 rdaw:workTitle "Moby Dick" .
resource3 rdaw:personalCreator http://...

You actually have a lot of information here. If this information exists in open linked data space, you can find resources that are in language ENG, and you have essentially the same (well, close to the same) data elements for the RDA and the BF descriptions, even though they are structured differently. In both you have access to the Work and Expression information. (This would be more easily explained with a diagram ;-))

As Gordon says, however, there may still be differences. bf:Work may not be one-to-one on *all* information with rdac:Work+rdac:Expression. But linked data is designed to be used across heterogeneous data, and allows for gaps and differences. It will probably be no less precise than any previous mappings that we did (e.g. MARC to Dublin Core - from 1100 data elements to 15!).

The question, therefore, is not "Can I map property1 to propertyZ" but "do I have the information I need?" This involves not just property definitions but the whole meaning provided by the graph.
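
The super-property relationship described above (bf:reproduction over rdam:reproductionOfManifestation and rdai:reproductionOfItem) can be sketched in a few lines of plain Python. The sample resources and the match helper are hypothetical, and the sketch only follows one level of sub-property, which is all this example needs.

```python
# Declared relationships between terms: each RDA property is a sub-property
# of bf:reproduction. The data itself is left unchanged.
SUBPROPERTY_OF = {
    "rdam:reproductionOfManifestation": "bf:reproduction",
    "rdai:reproductionOfItem": "bf:reproduction",
}

# Illustrative data using the original RDA properties.
data = [
    ("ex:m2", "rdam:reproductionOfManifestation", "ex:m1"),
    ("ex:i2", "rdai:reproductionOfItem", "ex:i1"),
]


def match(triples, predicate, subproperty_of):
    """Return triples stated with `predicate` or with a direct sub-property of it."""
    return [
        t for t in triples
        if t[1] == predicate or subproperty_of.get(t[1]) == predicate
    ]


# Asking for the broader property finds both statements...
print(match(data, "bf:reproduction", SUBPROPERTY_OF))
# ...and asking for a specific RDA property still works, because nothing was rewritten.
print(match(data, "rdai:reproductionOfItem", SUBPROPERTY_OF))
```

Because the RDA triples stay what they are, a query at the BF level and a query at the RDA level can both be answered from the same data.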

This describes an open world usage, and doesn't touch on the question of what data our library system/closed world will use. There can be a considerable difference between the closed world and the open world, and many enterprise systems (banks, medical data...) export to the open world data that is very different from their internal view of their data. What I find unclear at the moment in library-land is: what are we designing for, and, once again, what do we expect to do with it?


kc
[1] http://www.slideshare.net/GordonDunsire

Joe

--------------------------------------------------
From: "Fallgren, Nancy (NIH/NLM) [E]" <[log in to unmask]>
Sent: Wednesday, January 07, 2015 8:08 AM
To: <[log in to unmask]>
Subject: Re: [BIBFRAME] Constrained vs unconstrained schemas

Hi All,

FWIW . . .
We are working with the "constrained" version (with a nod to Karen's comments re use of the term 'constrained') of RDA/RDF and mapping that to a BIBFRAME core vocabulary precisely because we don't know what a cataloging input UI will look like post-MARC or how BF will be generated from that input.  Since BF and RDA have different structures, our thinking is to use the "constrained" RDA/RDF so that the RDA data can be reconstructed easily and losslessly back into its WEMI entities structure from BF should that prove useful or necessary.

-Nancy

Nancy J. Fallgren
Metadata Specialist Librarian
Cataloging and Metadata Management Section
Technical Services Division
National Library of Medicine

[log in to unmask]

-----Original Message-----
From: Gordon Dunsire [mailto:[log in to unmask]COM]
Sent: Tuesday, January 06, 2015 7:42 AM
To: [log in to unmask]
Subject: Re: [BIBFRAME] Constrained vs unconstrained schemas

All

Many applications based on RDF data will need to know what type of thing is being described by a triple. An application can get that information implicitly, from the domain and range of the triple's property, or explicitly, from a separate triple stating the thing's type. There is no guarantee that such a type triple exists, or is connected to the local graph, or can be retrieved from the global graph.

The quality (effectiveness, efficiency, etc.) of these applications is likely to depend on the accuracy and completeness of entity typing. More sophisticated applications are likely to depend also on the semantic coherence of the results of typing.

Publishers of data based on specific ontologies should be able to choose whether to provide type triples implicitly or explicitly. Using properties constrained by domain and range allows implicit typing by applications intended to consume the data. The maintainers of the specific ontology are probably the best agents to provide data publishers and consumers with the RDF element sets for the constrained properties and, indeed, the type classes used to constrain them.

Publishing data using constrained properties does not prevent its use by applications that are simple, low-quality, or do not require entity typing.
Such applications may use RDF maps to dumb-down constrained properties to unconstrained versions, or simply ignore domains and ranges. The RDF maps may be local to the application, or provided by the maintainers of the constrained elements or some other agent.

I agree that the publishers of library data in RDF should be able to specify how it is intended to be used by libraries: this is a closed-world assumption. The BF model seems to be mainly influenced by the data currently used by library applications based on MARC21; the FRBR model reflects the functional requirements to support world-wide consensus on user tasks. I think both of these bases, data and users, are good indicators of the needs of future library applications. I therefore think it is a benefit that the BIBFRAME Initiative (BFI), IFLA, and the JSC for RDA are providing constrained RDF element sets for BF, FRBR, ISBD, and RDA. I also think the provision of unconstrained element sets is a good thing, together with mappings from constrained to unconstrained properties. I do not know whether BFI intends to publish unconstrained properties. I do know that the FRBR Review Group decided not to do so because of its plans to consolidate the FRBR, FRAD, and FRSAD models (now approaching completion), and that the ISBD Review Group has an unconstrained element set ready for publication in the near future with a corresponding map.

The JSC and ISBD Review Group have collaborated on a map between the ISBD and RDA elements [1]. The map, based on an updated version of the agreed element alignment [2] will be published in the next few weeks. It necessarily uses unconstrained properties to link well-formed ISBD and RDA data together, and was a stimulus to the development of the unconstrained ISBD element set. As noted in the pre-print cited by Karen, there is also a map between ISBD and FRBR classes which requires local semantics for "aspect" relationships [3].

I am not convinced that the assumption that RDA Work and RDA Expression are equivalent to/same as BF Work is a useful or valid one [4]. I think there may be similar problems with RDA Manifestation, RDA Item, and BF Instance.
The ISBD/RDA experience shows that careful consideration of implicit semantics in definitions and scope notes is required, as well as explicit semantics in domain, range, and sub-property relationships.

So I do not advise mapping either the constrained or unconstrained RDA properties to constrained BF properties without further clarification of the class relationships. It is ok to map constrained BF properties to unconstrained RDA properties. A full map between RDA and BF requires the use of unconstrained RDA and BF properties. And, by definition, a roundtrip from constrained to unconstrained to constrained is somewhat lossy (as well as incoherent).

I think we need further investigation of the relationship between the RDA/FRBR models and BF, probably best carried out by the JSC and BFI. And we need to test interoperability using orthodox RDA and BF data. Fortunately, we now have the beta of version 3 of RIMMF to create orthodox RDA data [5].
So perhaps we can do something useful with RDA and BF data after the Jane-athon [6].

Cheers

Gordon

[1] http://www.rda-jsc.org/docs/6JSC-Chair-4.pdf
[2] http://www.ifla.org/files/assets/cataloguing/isbd/OtherDocumentation/ISBD2RDA%20Alignment%20v1_1.pdf
[3] http://www.ifla.org/files/assets/cataloguing/isbd/OtherDocumentation/resource-wemi.pdf
[4] http://www.gordondunsire.com/pubs/pres/RDAMARCBIBFRAME.pptx
[5] http://www.rdaregistry.info/rimmf/index.html
[6] http://www.rdatoolkit.org/janeathon

If it is a camel, a weasel, and a whale, then it is a cloud (inferred from Hamlet, Act 3, Scene 2).


-----Original Message-----
From: Bibliographic Framework Transition Initiative Forum [mailto:[log in to unmask]GOV] On Behalf Of Joseph Kiegel
Sent: 05 January 2015 23:21
To: [log in to unmask]
Subject: Re: [BIBFRAME] Constrained vs unconstrained schemas

Thanks, this helps a lot.  I had viewed domains as more restrictive than they are.

I agree with your larger question that we need to understand the operations that will be performed on our data in RDF.  Perhaps we can't anticipate what other people will do, but we should be able to specify what libraries will do.


Joe

--------------------------------------------------
From: "Karen Coyle" <[log in to unmask]>
Sent: Monday, January 05, 2015 1:38 PM
To: <[log in to unmask]>
Subject: Re: [BIBFRAME] Constrained vs unconstrained schemas

Joseph, You might want to look at my blog post on RDF classes:

http://kcoyle.blogspot.com/2014/11/classes-in-rdf.html

and the article by Baker-Coyle-Petiya

http://kcoyle.net/LHTv32n4preprint.pdf

There are actually no "constraints" in RDF, just potential inferences.
The inferences are based on the stated domains and ranges of the
properties.
There are examples of this in the Baker et al article using RDA,
FRBRer and BIBFRAME. There is no conflict with a subject being
inferred as being an instance of more than one class as long as the
classes themselves are not declared as disjoint. (The article explains
this better than I can in an email.) The documentation for RDA,
BIBFRAME and FRBRer all presents classes as determinants of data
structure. This, to me, is a common error in RDF development. That any
subject can be an instance of more than one class is necessary for the
RDF graph's flexibility, and should be proof that classes do not
constrain your data to a single graph structure.

The declared domains of properties only come into play if inferencing
is applied. A big question, therefore, is whether any inferencing will
be done at all over the data. The utility of, for example, the RDA
classes to me is that it allows you to do simple queries for
categories of triples, e.g. "give me all of the work triples for the
manifestation with this ISBN." Other than that you can ignore the fact
that domains have been declared if they don't serve your needs.

Your question, however, brings up a much larger question that I
haven't seen discussed anywhere, which is: what kinds of operations do
we expect to perform over library data in RDF? That question really
should be answered before domains and ranges are defined, because that
is the function of those capabilities of RDF.

kc

On 1/5/15 12:52 PM, Joseph Kiegel wrote:
A comparison of BIBFRAME and RDA in RDF (referred to below as RDA),
in an attempt to map RDA to BIBFRAME, raised the issue of constrained
vs unconstrained schemas.

The full set of RDA properties is constrained by the RDA classes of
Agent, Work, Expression, Manifestation and Item.  That is, each
property is related to a specific class when appropriate: e.g.
abridgementOfExpression and abridgementOfWork.  A parallel set of
properties has been created where the constraints of class are lifted:
e.g. abridgementOf.  This unconstrained version of RDA loses the
context of some properties but is intended to facilitate mapping to
schemas that do not use the FRBR model underlying RDA.

BIBFRAME is a constrained schema, but constrained by different classes:
Agent, Work, and Instance.  There is no unconstrained version of
BIBFRAME.

A mapping of RDA to BIBFRAME presents choices and challenges.

Is it better to use constrained RDA, which causes explicit conflicts
of domain: e.g. mapping rdam:reproductionOfManifestation to
bf:reproduction and rdai:reproductionOfItem to bf:reproduction?

Or is it better to use unconstrained RDA, which still has conflicts
(an unconstrained domain vs a constrained one in BIBFRAME): e.g.
mapping rdau:reproductionOf to bf:reproduction?

It is not obvious which is the better choice, although perhaps we
need both mappings, each with its own problems regarding original and
destination domains.

A corollary of the question is that any roundtrip RDA -> BF -> RDA is
lossy. If constrained RDA is used as a starting point, RDA classes
are lost in the mapping itself, and if unconstrained RDA is used,
classes are lost prior to mapping. Either way, RDA classes cannot be
recovered in a BF -> constrained RDA mapping.
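
The loss can be shown with a minimal sketch in plain Python. The mapping table uses the two property pairs named in this message; the sample resources and the helper functions are hypothetical and stand in for whatever a real mapping tool would do.

```python
# Constrained RDA properties mapped to the single BIBFRAME property.
RDA_TO_BF = {
    "rdam:reproductionOfManifestation": "bf:reproduction",
    "rdai:reproductionOfItem": "bf:reproduction",
}

# Illustrative RDA data.
rda = [
    ("ex:m2", "rdam:reproductionOfManifestation", "ex:m1"),
    ("ex:i2", "rdai:reproductionOfItem", "ex:i1"),
]


def to_bf(triples):
    """Rewrite each RDA property to its BIBFRAME target (RDA -> BF)."""
    return [(s, RDA_TO_BF.get(p, p), o) for s, p, o in triples]


def candidates(bf_property):
    """List every RDA property that could have produced `bf_property` (BF -> RDA)."""
    return sorted(r for r, b in RDA_TO_BF.items() if b == bf_property)


bf = to_bf(rda)
print(bf)
# Two candidates come back for the one BF property, so the reverse step
# has nothing in the BF triple itself to choose between them.
print(candidates("bf:reproduction"))
```

After the forward step both triples carry the same property, and the reverse lookup returns two equally valid RDA properties. Recovering the original would need extra information, such as an explicit type triple for the subject, that the mapped triple alone does not contain.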


--
Karen Coyle
[log in to unmask] http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600


