Karen,

thanks for the follow-up.

I don't think you can solve this with subclasses/subproperties. What is needed is a transformation mechanism: for XML it is XSLT, for RDF it is SPARQL CONSTRUCT, whose main advantage is that it is a standard supported by all triplestores and other RDF tools.

In my mind the implementation starts to take shape:
1. MARC-XML to MARC-RDF via XSLT transformation to RDF/XML. That could be lossless and reversible.
2. MARC-RDF to BIBFRAME plus other RDF vocabularies via SPARQL CONSTRUCT. That would most likely be lossy and one-way, as you say. The complete query might be several thousand lines long, but that is still more manageable than multiple imperative implementations. A minimal sketch of step 2 follows below.
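
For a single field, such a CONSTRUCT could look roughly like this (the marc: property URIs are invented here; a real mapping would enumerate every tag/indicator/subfield combination):

PREFIX marc: <http://example.com/marc/>
PREFIX bf: <http://bibframe.org/vocab/>

CONSTRUCT
{
  ?record a bf:Work ;
    bf:title ?title .
}
WHERE
{
  # 245 $a (title statement) lifted into a BIBFRAME title
  ?record marc:245_a ?title .
}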

I think BIBFRAME tries to cover too much even in the Linked Data layer. For example, taxonomies and categorization are not specific to bibliographic data and are already covered by established vocabularies such as SKOS. I don't see why they should also be included in BF, unless they were added in support of MARC, in which case it is bad design.

BIBFRAME should be the glue between different Linked Data vocabularies relevant to bibliographic data, and not a blanket to cover them all.


Martynas

On Mon, Feb 2, 2015 at 3:40 PM, Karen Coyle <[log in to unmask]> wrote:
Given that Joerg's data is similar to MARC but actually a different format, I've located the MARCXML for that same book:

http://lccn.loc.gov/77009389/marcxml

This translates to BIBFRAME as:

http://bibframe.org/resources/ZVC1422885084/bibframe.n3
http://bibframe.org/resources/ZVC1422885084/bibframe.rdf (rdf/xml)

We don't have an RDF version of the MARC, as Joerg does for MAB, but it just might be possible to mock one up. For the fields and subfields one can generate an identifier using the tag and subfield code ("http://example.com/245a") but there also needs to be a way to include the indicator values since these can actually change the meaning of the field. This is an example of a field with one tag, but different meanings, as encoded in the indicators:

  024 Other standard identifier

  • 0 - International Standard Recording Code
  • 1 - Universal Product Code
  • 2 - International Standard Music Number
  • 3 - International Article Number
  • 4 - Serial Item and Contribution Identifier
  • 7 - Source specified in subfield $2
  • 8 - Unspecified type of standard number or code

The indicators modify the semantics of the data that follows in the field.
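
One conceivable way to handle this in a MARC-RDF (just a sketch; the marc: and ex: URIs below are made up) is to bake the indicator value into the property URI and branch on it in the mapping query:

PREFIX marc: <http://example.com/marc/>
PREFIX ex: <http://example.com/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT
{
  ?instance ex:identifier [ a ?idType ; rdf:value ?value ] .
}
WHERE
{
  # marc:024_0_a = tag 024, first indicator "0", subfield $a, and so on
  VALUES (?p ?idType) {
    (marc:024_0_a ex:ISRC)  # International Standard Recording Code
    (marc:024_1_a ex:UPC)   # Universal Product Code
    (marc:024_2_a ex:ISMN)  # International Standard Music Number
  }
  ?instance ?p ?value .
}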

The other complication, which I'm sure the LC folks are well aware of, is that in most cases (but not all, a qualification that almost always has to be made with MARC), subfields in the same field must be kept together as a unit (and many fields are repeatable):

700 1_  |a Lloyd, G. E. R.  |q (Geoffrey Ernest Richard),  |d 1933-
700 1_  |a Owen, G. E. L.  |q (Gwilym Ellis Lane),  |d 1922-
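
One rough way to preserve that grouping (again with invented marc: and ex: URIs) is to give each field occurrence its own node, with the subfields hanging off it, so a mapping query can treat each occurrence as a unit:

PREFIX marc: <http://example.com/marc/>
PREFIX ex: <http://example.com/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT
{
  ?record ex:contributor [ a foaf:Person ;
      foaf:name ?name ;
      ex:fullerForm ?fuller ;
      ex:dates ?dates ] .
}
WHERE
{
  ?record marc:700 ?field .                     # one node per 700 occurrence
  ?field marc:subfield_a ?name .                # $a Personal name
  OPTIONAL { ?field marc:subfield_q ?fuller . } # $q Fuller form of name
  OPTIONAL { ?field marc:subfield_d ?dates . }  # $d Dates associated with the name
}

Each 700 occurrence then yields its own person node, so the $a/$q/$d values from the two headings never get mixed.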

In the database that I created of MARC values (now a few years out of date) I came up with well over 2K separate properties, and 1K coded values.

Note that BIBFRAME currently has something like 400 properties, so the conversion from MARC to BF simply has to be lossy. And, as I mentioned earlier, this weekend even more data elements were added to the MARC format at a meeting at ALA held by the same LC office that is developing BIBFRAME.

Now I'm depressed and it isn't even 7am yet. I'll just have another cup of tea.

kc


On 2/1/15 5:45 AM, Martynas Jusevičius wrote:
Hey again,

I want to illustrate what I mean by the 2 tiers and the mapping between them with an example.

I used one of the data samples from Jörg's link (about "Aristotle on
mind and the senses") and created a SPARQL query:

PREFIX field: <rdfmab:field#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX bf: <http://bibframe.org/vocab/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

CONSTRUCT
{
  ?work a bf:Work ;
    dct:created ?createdDate ;
    dct:modified ?modifiedDate ;
    dct:language ?language ;
    bf:title ?title ;
    bf:creator [ a foaf:Person ; foaf:name ?creatorName ] ;
    dct:subject [ a skos:Concept ; skos:prefLabel ?categoryLabel ] .
}
{
  ?work field:902__p|field:902__s ?categoryLabel .
  ?work field:002a_a ?createdString .
  BIND (STRDT(?createdString, xsd:date) AS ?createdDate)
  ?work field:003__a ?modifiedString .
  BIND (STRDT(?modifiedString, xsd:date) AS ?modifiedDate)
  ?work field:036a_a ?language .
  ?work field:101__a ?creatorName .
  ?work field:331__a ?title .
}

You can try running it here:
http://graphity.dydra.com/graphity/marc-test/sparql#marcrdf2bibframe

This query is a declarative, platform-independent mapping between 2
"tiers" (levels of abstraction) of bibliographic data:
1. syntactic MARC-RDF, in this case specified by Jörg
2. conceptual real-world representation, as a mix of BIBFRAME, SKOS,
Dublin Core and other relevant vocabularies

By no means do I claim that this is a complete or semantically correct example, but I hope it gives a better idea of my suggestion.
Feel free to expand and modify it.

Martynas
graphityhq.com

On Sat, Jan 31, 2015 at 5:16 PM, [log in to unmask]
<[log in to unmask]> wrote:
The fact that MARC is a key/value stream with string-based encoding semantics does not justify a "direct mapping" to RDF. Such a mapping is problematic.

From an implementor's view, a correct migration to RDF means parsing MARC records, mapping selected MARC-encoded values to functions/objects, evaluating contextual information and additional semantic information from other sources (catalog codes, authority files), and letting the mapping functions create RDF graphs from the transformed information, adding datatype information, links, etc.
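
For instance, the "adding datatype information, links" part could be expressed declaratively along these lines (the marc: property URIs are invented, and id.loc.gov is just one possible target for language links):

PREFIX marc: <http://example.com/marc/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT
{
  ?record dct:created ?created ;
    dct:language ?languageUri .
}
WHERE
{
  ?record marc:created ?dateString ;
    marc:languageCode ?langCode .
  # type the raw date string as xsd:date
  BIND (STRDT(?dateString, xsd:date) AS ?created)
  # turn the language code into a link
  BIND (IRI(CONCAT("http://id.loc.gov/vocabulary/languages/", ?langCode)) AS ?languageUri)
}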

I invented something similar to your bfm proposal internally back in 2010.

https://wiki1.hbz-nrw.de/display/SEM/RDF-ISO2709+-+eine+RDF-Serialisierung+fuer+ISO+2709-basierte+bibliografische+Formate+%28MARC%2C+MAB%29

When asked to document the format, I hesitated and tried to describe it as an intermediate serialization format, calling it "RDF/ISO2709". But the very bad side effect was that librarians who were not familiar with RDF and its semantics thought RDF was just a wrapper mechanism like XML: they called RDF a "format" and did not take it seriously as a modern graph model for the bibliographic data of future catalogs.

Karen picked up the idea in 2011 in

http://lists.w3.org/Archives/Public/public-lld/2011Apr/0137.html

So, from my personal experience, I do not recommend proposing a MARC-centered "serialization only" Bibframe dialect. It will not improve Bibframe or ease the migration; it will just add a truncated RDF without links, without URIs, and with yet another migration path.

Whether Bibframe can be seen as the "one-size-fits-all" RDF model for MARC is another question. For much of the data I have, Bibframe is not my first choice.

Jörg


On Sat, Jan 31, 2015 at 1:14 PM, Martynas Jusevičius <[log in to unmask]>
wrote:
Jeff,

there is one reality, but it can be described in many different ways.
And yes, there should be a separate RDF vocabulary for each level.

Here's a completely fictional example to illustrate what I mean:

1. MARC-syntax level

_:record a bfm:Record ;
  bfm:recordType "Book" ;
  bfm:isbn "123456789" ;
  bfm:title "The Greatest Works" ;
  bfm:author1givenName "John" ;
  bfm:author1familyName "Johnson" ;
  bfm:author2givenName "Tom" ;
  bfm:author2familyName "Thompson" .

2. Linked Data level

<books/123456789#this> a bld:Work, bldtypes:Book ;
  dct:title "The Greatest Works" ;
  bld:isbn "123456789" ;
  bld:authors (<persons/john-johnson#this> <persons/tom-thompson#this>) .

<persons/john-johnson#this> a foaf:Person, bld:Author ;
  foaf:givenName "John" ;
  foaf:familyName "Johnson".

<persons/tom-thompson#this> a foaf:Person, bld:Author ;
  foaf:givenName "Tom" ;
  foaf:familyName "Thompson".


Both examples contain the same information, but it is encoded very
differently. Clearly the Linked Data style is preferred, and the MARC
vocabulary could in theory go away when there are no more legacy MARC
systems to support.
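
The mapping between these two fictional snippets could itself be sketched as a SPARQL CONSTRUCT, for example (the bfm: and bld: namespace URIs are placeholders, since none are declared above, and URI minting plus the ordered author list are simplified away, with authors coming out as blank nodes on a repeated property):

PREFIX bfm: <http://example.com/bfm#>
PREFIX bld: <http://example.com/bld#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT
{
  ?record a bld:Work ;
    dct:title ?title ;
    bld:isbn ?isbn ;
    bld:author [ a foaf:Person, bld:Author ;
        foaf:givenName ?given1 ; foaf:familyName ?family1 ] ,
      [ a foaf:Person, bld:Author ;
        foaf:givenName ?given2 ; foaf:familyName ?family2 ] .
}
WHERE
{
  # the positional author1/author2 properties have to be matched one by one,
  # which is exactly the kind of clumsiness the Linked Data level removes
  ?record a bfm:Record ;
    bfm:title ?title ;
    bfm:isbn ?isbn ;
    bfm:author1givenName ?given1 ;
    bfm:author1familyName ?family1 ;
    bfm:author2givenName ?given2 ;
    bfm:author2familyName ?family2 .
}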

I haven't seen any actual MARC data, but if someone has a simple
example, we could work on that.

Martynas


On Sat, Jan 31, 2015 at 4:21 AM, Jeff Young <[log in to unmask]>
wrote:
Tim,

The semantics behind MARC are based on reality. MARC cares (maybe too much) about which names and codes should be used in various structural positions, but there are real things lurking behind those.

Jeff



On Jan 30, 2015, at 9:58 PM, Tim Thompson <[log in to unmask]> wrote:

Karen,

Aren't the semantics behind MARC just the semantics of card catalogs and
ISBD, with its nine areas of bibliographic description? ISBD has already
been published by IFLA as a linked data vocabulary
(http://metadataregistry.org/schema/show/id/25.html)--although, sadly, they left out the punctuation ;-)

Tim

--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library

On Fri, Jan 30, 2015 at 9:01 PM, Young,Jeff (OR) <[log in to unmask]>
wrote:
What if it were two different vocabularies, rather than two different levels of abstraction?

There is only one reality. A rose by any other name would smell as sweet. :-)

Jeff



On Jan 30, 2015, at 8:02 PM, Martynas Jusevičius <[log in to unmask]> wrote:

Karen,

let's call those specifications BM (BIBFRAME MARC) and BLD (BIBFRAME Linked Data).

What I meant is two different levels of abstraction, each with its own vocabulary and semantics, and a mapping between the two, for which SPARQL would be really convenient.

In the 2-tier approach, these are the main tasks:
1. convert MARC data to RDF at the syntax level (BM)
2. design semantically correct bibliographic Linked Data structure
(BLD)
3. define a mapping from BM to BLD

So in that sense I don't think it is similar to profiles, as profiles
deal with a subset of properties, but they still come from the same
vocabulary.

A somewhat similar approach is W3C work on relational databases:
1. direct mapping to RDF: http://www.w3.org/TR/rdb-direct-mapping/
2. customizable declarative mapping to RDF:
http://www.w3.org/TR/r2rml/


Martynas
graphityhq.com

On Fri, Jan 30, 2015 at 10:15 PM, Karen Coyle <[log in to unmask]>
wrote:
Martynas,

I agree that the requirement to accommodate legacy MARC is a hindrance to the development of a more forward-looking RDF vocabulary. I think that your suggestion of using SPARQL CONSTRUCT queries is not unlike the concepts of selected views or application profiles -- where you work with different subsets of a fuller data store, based on need.

I wonder, however, how an RDF model designed "from scratch" would interact with a model designed to replicate MARC. I know that people find this to be way too far out there, but I honestly don't see how we'll get to "real" RDF if we hang on not only to MARC but to the cataloging rules we have today (including RDA). We'd have to start creating natively RDF data, and until we can think about that without burdening ourselves with pre-RDF cataloging concepts, it's hard to know what that really means.

All that to say that I would love to see a test implementation of your idea!

kc


On 1/30/15 9:03 AM, Martynas Jusevičius wrote:

Hey,

after following discussions and developments in the BIBFRAME space, it seems to me that it tries to be too many things for too many people.

I think many of the problems stem from the fact that (to my understanding) BIBFRAME is supposed to accommodate legacy MARC data and be the next-generation solution for bibliographic Linked Data. Attempting to address both cases, it fails to address either of them well.

In my opinion, a possible solution could be to have 2 tiers of RDF vocabularies:
- a lower-level one that precisely captures the semantics of MARC
- a higher-level one that is designed from scratch for bibliographic Linked Data

The conversion between the two (or at least from the lower to the
higher level) could be expressed simply as SPARQL CONSTRUCT queries.

Any thoughts?


Martynas


--
Karen Coyle
[log in to unmask] http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600