bf has some very complex use cases:

    - it is the inheritor of MARC and will be expected to find a way of representing that data

    - it will need to represent data created according to our latest set of cataloging rules (in this case, RDA)

    - it should provide a light, extensible framework for representing all the data library patrons may have interest in (which is virtually everything)

and all of the above as natively in RDF as possible.

At some point we have to acknowledge that not all of these can be accommodated equally well, and compromises will need to be made. But if the point of bf is to integrate this data into the web via RDF, it seems we should compromise least on this aspect. Otherwise what is the point? I think the conversations of the past few weeks have been very helpful in this regard.


On 8/1/2014 9:49 AM, [log in to unmask] wrote:
Thanks, Rob.

But where is the "radical" thought which caught my eye? ;)

Here, the ship hasn't sailed yet. 

Let's be radical: for my future work with bibliographical data, I will ignore systems that do not distinguish between core data, expressed as RDF statements that a machine can process, and descriptive data, required for presentation services, expressed with languages and rules for describing data, e.g. on the web.

Let's drop all legacy OPACs and discovery systems now. Cataloging of URLs  - that's where it all started. The mix of all kinds of control and descriptive "web data" in the catalog. 

It's not "MARC must die". It's "Bad data without clear semantics must die".

Just to add one minor thing: besides RDFa, it is also possible to embed JSON-LD in HTML. Google is using that:
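
As a rough illustration of embedding JSON-LD in HTML (this is not the example the message pointed to), here is a minimal Python sketch. It assumes the usual convention of a script element with type "application/ld+json". The function names embed_json_ld and extract_json_ld are made up for this sketch, and it further assumes the JSON payload contains no "</script>" text.

```python
import json
from html.parser import HTMLParser


def embed_json_ld(doc, visible_html):
    """Wrap human-facing HTML and a JSON-LD description in one page.

    Assumption: the serialized JSON contains no "</script>" sequence.
    """
    payload = json.dumps(doc, indent=2)
    return (
        "<html><head>\n"
        '<script type="application/ld+json">\n'
        f"{payload}\n"
        "</script>\n"
        "</head><body>\n"
        f"{visible_html}\n"
        "</body></html>"
    )


class _JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self.docs = []
        self._in_json_ld = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.docs.append(json.loads("".join(self._chunks)))
            self._in_json_ld = False


def extract_json_ld(html):
    """Return the JSON-LD documents a machine would read from the page."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.docs
```

The visible markup and the machine-readable description travel in the same page but stay separate, so neither has to be bent to serve the other.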


On Fri, Aug 1, 2014 at 6:18 PM, Robert Sanderson <[log in to unmask]> wrote:

Dear all,

In my experience, RDF and Linked Data can do presentation-based information (e.g. here is content to present directly to the user, without semantics [1]), and they can do semantic, descriptive information (e.g. here is a rich description of the resource, say a book or annotation [2]). But doing both at once is very challenging without simply repeating everything in a for-machines way and a for-humans way, as per the current titleStatement, providerStatement, and, one assumes, authorStatement, subjectStatement, etc.

Here are two radical ideas, for which the boat has probably long since sailed, but I'll throw them out there regardless.

1. Don't try to mix them up.  Have two completely separate descriptions, where one is intended for humans to read, and the other is intended for machines to reason upon and search.  A machine will only ever pass a transcribed string through to the user, so make that easy by separating the non-semantic information from the semantic information, with links between them.

2.  Mix them up using the appropriate technology: HTML + RDFa.  Instead of thinking about triples for everything, create the HTML that you want the user to see.  Then annotate that HTML with RDFa properties to add the semantics into the record (and really a record now, not a graph).  This way there's only one record to maintain that has both, but it uses presentation technology for presenting things to users, and semantic technology for enabling machines to understand the information.
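
To make idea 2 a little more concrete, here is a minimal Python sketch. It is not a conformant RDFa processor. The sample record, the choice of the schema.org vocabulary, and the names RECORD, PropertyCollector and extract_properties are all assumptions for illustration, and it assumes every element carrying a property attribute contains only text.

```python
from html.parser import HTMLParser

# Hypothetical record: the HTML is what the user sees; the RDFa attributes
# (vocab, typeof, property) carry the machine-readable statements.
RECORD = """
<div vocab="http://schema.org/" typeof="Book">
  <h1 property="name">Moby-Dick; or, The Whale</h1>
  <p>by <span property="author">Herman Melville</span>,
     published <span property="datePublished">1851</span>.</p>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a property attribute.

    Simplification: elements with a property attribute hold text only.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # (tag, property) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._open = (tag, prop)
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self.pairs.append((self._open[1], "".join(self._text).strip()))
            self._open = None


def extract_properties(html):
    """Return the (property, value) pairs a machine would read from the record."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


if __name__ == "__main__":
    for prop, value in extract_properties(RECORD):
        print(prop, "=", value)
```

A browser renders RECORD as an ordinary heading and sentence, while the same markup yields name, author and datePublished values to a machine, so only one record is maintained.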

Basically -- use the right tools for the job.  RDF has a hard time representing transcriptions outside of non-semantic strings because it was never intended to do that.  Order in RDF is a complete pain, because a graph is inherently unordered, but there are very real use cases that require order.  On the other hand, RDF is fantastic for controlled data as that is precisely its intended usage.  We should make the most appropriate use of the tools that we have available to us, rather than treating everything as a nail.
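
The point about order can be sketched in a few lines of Python. Triples are modelled here as plain tuples in a set rather than with an RDF library, and the example.org URIs, the names, and the helper functions are hypothetical. The rdf:first / rdf:rest / rdf:nil terms are the standard RDF collection vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"


def repeated_property(subject, predicate, values):
    """Naive encoding: one triple per value. A set of triples keeps no order."""
    return {(subject, predicate, value) for value in values}


def to_rdf_list(items, prefix="_:n"):
    """Encode items as an rdf:List (a linked list). Returns (head, triples)."""
    triples = set()
    if not items:
        return RDF_NIL, triples
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    for i, item in enumerate(items):
        next_node = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.add((nodes[i], RDF_FIRST, item))
        triples.add((nodes[i], RDF_REST, next_node))
    return nodes[0], triples


def from_rdf_list(head, triples):
    """Walk the rdf:first / rdf:rest chain to recover the original order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    book = "http://example.org/book/1"            # hypothetical URI
    creator = "http://example.org/vocab/creator"  # hypothetical predicate
    names = ["First Author", "Second Author", "Third Author"]

    # Same graph whichever order the names are given in.
    print(repeated_property(book, creator, names)
          == repeated_property(book, creator, names[::-1]))

    head, triples = to_rdf_list(names)
    print(len(triples), from_rdf_list(head, triples))
```

Keeping three names in order costs six triples and two lookups per step to read them back, which is the kind of overhead meant by order being a pain.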



[1].  The IIIF Presentation API is focused on this approach of giving information intended for a client to display, while still being useful linked data by referencing existing semantic descriptions and following REST and JSON-LD.
[2].  The Open Annotation work is a rich data model that provides semantics for web annotation, but says almost nothing about presentation.

Rob Sanderson
Technology Collaboration Facilitator
Digital Library Systems and Services
Stanford, CA 94305

Philip E. Schreur
Head, Metadata Department
Stanford University
650-725-1120 (fax)