
Karen,

thanks for the follow-up.

I don't think you can solve this with subclasses/subproperties. What is
needed is a transformation mechanism. For XML that is XSLT; for RDF it is
SPARQL CONSTRUCT. Its main advantage is that it is standard and supported by
all triplestores and other RDF tools.

In my mind, the implementation is starting to take shape:
1. MARC-XML to MARC-RDF via XSLT transformation to RDF/XML. That could be
lossless and reversible.
2. MARC-RDF to BIBFRAME plus other RDF vocabularies via SPARQL CONSTRUCT.
That would most likely be lossy and one-way, as you say. The complete query
might be several thousand lines long, but that is still more manageable
than multiple imperative implementations.
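A minimal sketch of what step 1 has to do, written in Python with only the standard library instead of XSLT, purely for illustration. The `http://example.com/marc/field#` namespace and the tag + indicators + subfield naming (loosely modeled on the `field:331__a` style in the query quoted below) are hypothetical, not a published vocabulary. Each datafield gets its own blank node, which keeps the subfields of one field together and repeated fields (such as the two 700s Karen quoted) apart, and both indicators are part of the property name, so the 024 variants get distinct properties. As written it is not fully lossless: the leader and control fields are skipped, and subfield order survives only in the output order, not in the RDF.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML (MARC 21 slim) namespace.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}
# Hypothetical namespace for syntax-level MARC-RDF properties (assumption).
FIELD = "http://example.com/marc/field#"


class BNode(str):
    """A blank-node label such as '_:r1f2', kept distinct from literal strings."""


def prop_name(tag, ind1, ind2, code):
    """Encode tag, both indicators and subfield code in one local name.

    Blank indicators become '_', e.g. 024 with first indicator 7, $a -> '0247_a'.
    """
    return f"{tag}{ind1.strip() or '_'}{ind2.strip() or '_'}{code}"


def marcxml_to_triples(xml_text):
    """Convert MARCXML datafields to (subject, predicate, object) triples.

    One blank node per datafield keeps subfields grouped and repeated fields
    distinct. Leader and control fields are skipped in this sketch.
    """
    root = ET.fromstring(xml_text)
    if root.tag == "{%s}collection" % NS["marc"]:
        records = root.findall("marc:record", NS)
    else:
        records = [root]
    triples = []
    for r, record in enumerate(records, start=1):
        rec = BNode(f"_:record{r}")
        for f, field in enumerate(record.findall("marc:datafield", NS), start=1):
            node = BNode(f"_:r{r}f{f}")
            tag = field.get("tag")
            ind1, ind2 = field.get("ind1", " "), field.get("ind2", " ")
            # Record -> field node, so each occurrence of a tag is separate.
            triples.append((rec, FIELD + tag, node))
            for sf in field.findall("marc:subfield", NS):
                pred = FIELD + prop_name(tag, ind1, ind2, sf.get("code"))
                triples.append((node, pred, sf.text or ""))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, escaping literal values."""
    out = []
    for s, p, o in triples:
        if isinstance(o, BNode):
            obj = str(o)
        else:
            esc = (o.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{esc}"'
        out.append(f"{s} <{p}> {obj} .")
    return "\n".join(out)


SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Lloyd, G. E. R.</subfield>
    <subfield code="d">1933-</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Owen, G. E. L.</subfield>
    <subfield code="d">1922-</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    print(to_ntriples(marcxml_to_triples(SAMPLE)))
```

Running it prints N-Triples in which each 700 field is its own node carrying `7001_a` and `7001_d` values; a SPARQL CONSTRUCT over output of that shape would then be step 2.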

I think BIBFRAME tries to cover too much even in the Linked Data layer. For
example, taxonomies and categorization are not specific to bibliographic
data and are already covered by established vocabularies such as SKOS. I
don't see why they should also be included in BF, unless they were added in
support of MARC, in which case it is bad design.

BIBFRAME should be the glue between different Linked Data vocabularies
relevant to bibliographic data, and not a blanket to cover them all.


Martynas

On Mon, Feb 2, 2015 at 3:40 PM, Karen Coyle <[log in to unmask]> wrote:

>  Given that Joerg's data is similar to MARC but actually a different
> format, I've located the MARCXML for that same book:
>
> http://lccn.loc.gov/77009389/marcxml
>
> This translates to BIBFRAME as:
>
> http://bibframe.org/resources/ZVC1422885084/bibframe.n3
> http://bibframe.org/resources/ZVC1422885084/bibframe.rdf (rdf/xml)
>
> We don't have an RDF version of the MARC, as Joerg does for MAB, but it
> just might be possible to mock one up. For the fields and subfields one can
> generate an identifier using the tag and subfield code
> ("http://example.com/245a"), but there also needs to be a way to include
> the indicator values, since these can actually change the meaning of the
> field. This is an example of a field with one tag but different meanings,
> as encoded in the indicators:
>
>   024 Other standard identifier (first indicator values)
>
>    - 0 - International Standard Recording Code
>    - 1 - Universal Product Code
>    - 2 - International Standard Music Number
>    - 3 - International Article Number
>    - 4 - Serial Item and Contribution Identifier
>    - 7 - Source specified in subfield $2
>    - 8 - Unspecified type of standard number or code
>
> The indicators modify the semantics of the data that follows in the field.
>
> The other complication, which I'm sure that the LC folks are well aware
> of, is that in most cases (but not all, a qualification that almost always
> has to be made with MARC), subfields in the same field must be kept
> together as a unit (and many fields are repeatable):
>
>   700 1_ |a Lloyd, G. E. R. |q (Geoffrey Ernest Richard), |d 1933-
>   700 1_ |a Owen, G. E. L. |q (Gwilym Ellis Lane), |d 1922-
>
> In the database that I created of MARC values (now a few years out of
> date) I came up with well over 2K separate properties, and 1K coded values.
>  Note that BIBFRAME currently has something like 400 properties, so the
> conversion from MARC to BF simply has to be lossy. And, as I mentioned
> earlier, this weekend even more data elements were added to the MARC format
> at a meeting at ALA held by the same LC office that is developing BIBFRAME.
>
> Now I'm depressed and it isn't even 7am yet. I'll just have another cup of
> tea.
>
> kc
>
>
> On 2/1/15 5:45 AM, Martynas Jusevičius wrote:
>
> Hey again,
>
> I want to illustrate with an example what I mean by the 2 tiers and the
> mapping between them.
>
> I used one of the data samples from Jörg's link (about "Aristotle on
> mind and the senses") and created a SPARQL query:
>
> PREFIX field: <rdfmab:field#>
> PREFIX dct: <http://purl.org/dc/terms/>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX bf: <http://bibframe.org/vocab/>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>
> CONSTRUCT
> {
>   ?work a bf:Work ;
>     dct:created ?createdDate ;
>     dct:modified ?modifiedDate ;
>     dct:language ?language ;
>     bf:title ?title ;
>     bf:creator [ a foaf:Person ; foaf:name ?creatorName ] ;
>     dct:subject [ a skos:Concept ; skos:prefLabel ?categoryLabel ] .
> }
> {
>   ?work field:902__p|field:902__s ?categoryLabel .
>   ?work field:002a_a ?createdString .
>   BIND (STRDT(?createdString, xsd:date) AS ?createdDate)
>   ?work field:003__a ?modifiedString .
>   BIND (STRDT(?modifiedString, xsd:date) AS ?modifiedDate)
>   ?work field:036a_a ?language .
>   ?work field:101__a ?creatorName .
>   ?work field:331__a ?title .
> }
>
> You can try running it here: http://graphity.dydra.com/graphity/marc-test/sparql#marcrdf2bibframe
>
> This query is a declarative, platform-independent mapping between 2
> "tiers" (levels of abstraction) of bibliographic data:
> 1. syntactic MARC-RDF, in this case specified by Jörg
> 2. conceptual real-world representation, as a mix of BIBFRAME, SKOS,
> Dublin Core and other relevant vocabularies
>
> By no means do I claim that this is a complete or semantically correct
> example, but I hope it gives a better idea of my suggestion.
> Feel free to expand and modify it.
>
> Martynas
> graphityhq.com
>
> On Sat, Jan 31, 2015 at 5:16 PM, [log in to unmask] wrote:
>
> The fact that MARC is a key/value stream with string-based encoding
> semantics does not justify a "direct mapping" to RDF. Such a mapping is
> problematic.
>
> From an implementor's point of view, a correct migration to RDF means
> parsing MARC records, mapping selected MARC-encoded values to
> functions/objects, evaluating contextual information and additional
> semantic information from other sources (catalog codes, authority files),
> and letting the mapping functions create RDF graphs containing the
> transformed information, adding datatype information, links, etc.
>
> I invented something similar to your bfm proposal internally back in 2010.
> https://wiki1.hbz-nrw.de/display/SEM/RDF-ISO2709+-+eine+RDF-Serialisierung+fuer+ISO+2709-basierte+bibliografische+Formate+%28MARC%2C+MAB%29
>
> When asked to document the format, I hesitated, described it as an
> intermediate serialization format, and called it "RDF/ISO2709". But the
> very bad side effect was that librarians who were not familiar with RDF and
> its semantics thought RDF was just a wrapper mechanism like XML; they
> called RDF a "format" and did not take it seriously as a modern graph model
> for the bibliographic data of future catalogs.
>
> Karen picked up the idea in 2011 in
> http://lists.w3.org/Archives/Public/public-lld/2011Apr/0137.html
>
> So, from my personal experience, I do not recommend proposing a
> MARC-centered "serialization only" Bibframe dialect. It will not improve
> Bibframe or ease the migration; it will just add a truncated RDF without
> links or URIs, and yet another migration path.
>
> Whether Bibframe can be seen as the "one-size-fits-all" RDF model for MARC
> is another question. For much of the data I have, Bibframe is not my first
> choice.
>
> Jörg
>
>
> On Sat, Jan 31, 2015 at 1:14 PM, Martynas Jusevičius <[log in to unmask]>
> wrote:
>
>  Jeff,
>
> there is one reality, but it can be described in many different ways.
> And yes, there should be a separate RDF vocabulary for each level.
>
> Here's a completely fictional example to illustrate what I mean:
>
> 1. MARC-syntax level
>
> _:record a bfm:Record ;
>   bfm:recordType "Book" ;
>   bfm:isbn "123456789" ;
>   bfm:title "The Greatest Works" ;
>   bfm:author1givenName "John" ;
>   bfm:author1familyName "Johnson" ;
>   bfm:author2givenName "Tom" ;
>   bfm:author2familyName "Thompson" .
>
> 2. Linked Data level
>
> <books/123456789#this> a bld:Work, bldtypes:Book ;
>   dct:title "The Greatest Works" ;
>   bld:isbn "123456789" ;
>   bld:authors (<persons/john-johnson#this> <persons/tom-thompson#this>) .
>
> <persons/john-johnson#this> a foaf:Person, bld:Author ;
>   foaf:givenName "John" ;
>   foaf:familyName "Johnson".
>
> <persons/tom-thompson#this> a foaf:Person, bld:Author ;
>   foaf:givenName "Tom" ;
>   foaf:familyName "Thompson".
>
>
> Both examples contain the same information, but it is encoded very
> differently. Clearly the Linked Data style is preferred, and the MARC
> vocabulary could in theory go away when there are no more legacy MARC
> systems to support.
>
> I haven't seen any actual MARC data, but if someone has a simple
> example, we could work on that.
>
> Martynas
>
>
> On Sat, Jan 31, 2015 at 4:21 AM, Jeff Young <[log in to unmask]>
> wrote:
>
>  Tim,
>
> The semantics behind MARC is based on reality. MARC cares (maybe) too much
> about which names and codes should be used in various structural
> positions, but there are real things lurking behind those.
>
> Jeff
>
>
>
> On Jan 30, 2015, at 9:58 PM, Tim Thompson <[log in to unmask]> wrote:
>
> Karen,
>
> Aren't the semantics behind MARC just the semantics of card catalogs and
> ISBD, with its nine areas of bibliographic description? ISBD has already
> been published by IFLA as a linked data vocabulary
> (http://metadataregistry.org/schema/show/id/25.html), although, sadly,
> they left out the punctuation ;-)
>
> Tim
>
> --
> Tim A. Thompson
> Metadata Librarian (Spanish/Portuguese Specialty)
> Princeton University Library
>
> On Fri, Jan 30, 2015 at 9:01 PM, Young,Jeff (OR) <[log in to unmask]>
> wrote:
>
> What if it were two different vocabularies, rather than two different
> levels of abstraction?
>
> There is only one reality. A rose by any other name would smell as
> sweet.
> :-)
>
> Jeff
>
>
>
>
> On Jan 30, 2015, at 8:02 PM, Martynas Jusevičius <[log in to unmask]>
> wrote:
>
> Karen,
>
> let's call those specifications BM (BIBFRAME MARC) and BLD (BIBFRAME
> Linked Data).
>
> What I meant is two different levels of abstraction, each with its
> own vocabulary and semantics, and a mapping between the two, for which
> SPARQL would be really convenient.
>
> In the 2-tier approach, these are the main tasks:
> 1. convert MARC data to RDF at the syntax level (BM)
> 2. design semantically correct bibliographic Linked Data structure
> (BLD)
> 3. define a mapping from BM to BLD
>
> So in that sense I don't think it is similar to profiles, as profiles
> deal with a subset of properties, but they still come from the same
> vocabulary.
>
> A somewhat similar approach is the W3C work on relational databases:
> 1. direct mapping to RDF: http://www.w3.org/TR/rdb-direct-mapping/
> 2. customizable declarative mapping to RDF: http://www.w3.org/TR/r2rml/
>
>
> Martynas
> graphityhq.com
>
> On Fri, Jan 30, 2015 at 10:15 PM, Karen Coyle <[log in to unmask]>
> wrote:
> Martynas,
>
> I agree that the requirement to accommodate legacy MARC is a hindrance to
> the development of a more forward-looking RDF vocabulary. I think that your
> suggestion of using SPARQL CONSTRUCT queries is not unlike the concepts of
> selected views or application profiles, where you work with different
> subsets of a fuller data store, based on need.
>
> I wonder, however, how an RDF model designed "from scratch" would interact
> with a model designed to replicate MARC. I know that people find this to be
> way too far out there, but I honestly don't see how we'll get to "real" RDF
> if we hang on not only to MARC but to the cataloging rules we have today
> (including RDA). We'd have to start creating natively RDF data, and until we
> can think about that without burdening ourselves with pre-RDF cataloging
> concepts, it's hard to know what it means.
>
> All that to say that I would love to see a test implementation of your
> idea!
>
> kc
>
>
> On 1/30/15 9:03 AM, Martynas Jusevičius wrote:
>
> Hey,
>
> after following discussions and developments in the BIBFRAME space, it
> seems to me that it tries to be too many things for too many people.
>
> I think many of the problems stem from the fact that (to my
> understanding) BIBFRAME is supposed to accommodate legacy MARC data
> and be the next-generation solution for bibliographic Linked Data.
> Attempting to address both cases, it fails to address either of them well.
>
> In my opinion, a possible solution could be to have 2 tiers of RDF
> vocabularies:
> - a lower-level one that precisely captures the semantics of MARC
> - a higher-level one that is designed from scratch for bibliographic
>   Linked Data
>
> The conversion between the two (or at least from the lower to the
> higher level) could be expressed simply as SPARQL CONSTRUCT queries.
>
> Any thoughts?
>
>
> Martynas
>
>
> --
> Karen [log in to unmask] http://kcoyle.net
> m: +1-510-435-8234
> skype: kcoylenet/+1-510-984-3600
>
>
>