Yes, my "map directly" approach is just a programmed crosswalk. Like most crosswalks it is lossy. 

I see this as an evolutionary drilling/refining operation. At this stage, oil exploration is as easy as shooting at squirrels. Someday we will want a lossless MARC/RDF so we can do the kind of fracking that Martynas is suggesting. The cost of fracking can be justified for long-tail information, but it's overkill for the short-tail.
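
To make "lossy" concrete, here is a minimal sketch of a programmed crosswalk. It is an illustration only, not anyone's actual implementation: the field keys are borrowed from the SPARQL example quoted below, while the selection of mapped fields, the function name, and all record values except the title are assumptions.

```python
# Hypothetical crosswalk table: MARC-RDF field keys -> target properties.
# The keys come from the SPARQL example quoted in this thread; which of them
# get mapped is an assumption made purely for illustration.
CROSSWALK = {
    "331__a": "bf:title",
    "101__a": "foaf:name",
    "036a_a": "dct:language",
}


def crosswalk(record):
    """Map a flat record (field key -> value) into target properties.

    Returns (mapped, dropped). Any field without a crosswalk entry ends up
    in `dropped`, which is exactly where the loss happens.
    """
    mapped = {}
    dropped = {}
    for key, value in record.items():
        target = CROSSWALK.get(key)
        if target is None:
            dropped[key] = value
        else:
            mapped[target] = value
    return mapped, dropped


# Sample record; every value except the title is made up.
record = {
    "331__a": "Aristotle on mind and the senses",
    "101__a": "Example, Author",
    "036a_a": "en",
    "902__p": "Example category",
}

mapped, dropped = crosswalk(record)
print("mapped: ", mapped)
print("dropped:", dropped)
```

The short-tail fields land in the target vocabulary right away. Whatever sits in `dropped` is the long-tail material that only a lossless MARC/RDF tier would keep available for later.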

Jeff

> -----Original Message-----
> From: Bibliographic Framework Transition Initiative Forum
> [mailto:[log in to unmask]] On Behalf Of Karen Coyle
> Sent: Monday, February 02, 2015 1:37 AM
> To: [log in to unmask]
> Subject: Re: [BIBFRAME] 2-tier BIBFRAME
> 
> Isn't there also the possibility of arriving at "tiers" through sub-
> classing/sub-properties? I did look at the result briefly (but need to dig
> further into the input structure that Jörg provides) and it looks awkward due
> to the great number of blank nodes. Those nodes mean that you need to carry
> around both "tiers" in order to have a complete set of triples for either
> tier. Perhaps that's just a function of the mechanism used here, but it's one
> of the things that already concerns me about BIBFRAME -- blank nodes make open
> world use more difficult.
> 
> One of the reasons I want to look at the MARC->RDF vocabulary is to see how it
> deals with some of the complexities in the MARC format -- complexities that
> might make the mapping either difficult or lossy.
> 
> Jeff, is the nature of your "map directly" basically a programmed crosswalk,
> like the MARC to BIBFRAME one? If so, it means that the two vocabularies don't
> exist in the same data store. That is one advantage that I see with Martynas'
> method (and that would be the case with the use of sub-class/property) -- that
> you have both vocabularies available at the same time, so any "loss" is
> temporary (e.g. to the individual SPARQL query).
> 
> kc
> 
> On 2/1/15 2:54 PM, Young,Jeff (OR) wrote:
> > I agree that you can't avoid 2 tiers, but there are other ways to
> > conceptualize those tiers. You suggest mapping to a 1st tier RDF vocabulary
> > and then using a triplestore and SPARQL to do the reconciliation/cleanup into
> > a 2nd vocabulary. Another alternative is to map directly into the target
> > vocabulary and then use Map/Reduce (possibly using SPARQL on transient
> > subgraphs) to do probabilistic matching and cleanup. The 2nd way doesn't
> > require a 2 tier vocabulary.
> >
> > Jeff
> >
> >> On Feb 1, 2015, at 3:30 PM, Martynas Jusevičius <[log in to unmask]> wrote:
> >>
> >> Jeff,
> >>
> >> I don't think you can avoid 2 tiers, since the lower one is MARC and
> >> the higher one is Linked Data. What I'm saying is that by doing the
> >> mapping indirectly and within RDF, the implementation will be much
> >> more progressive. And I don't even need to know MARC to see that.
> >>
> >> And you should definitely try out Dydra, which is a cloud triplestore:
> >> http://dydra.com
> >>
> >>> On Sun, Feb 1, 2015 at 9:00 PM, Young,Jeff (OR) <[log in to unmask]> wrote:
> >>> Martynas,
> >>>
> >>> I'm skeptical of the 2 tiered mapping approach, but I like the look
> >>> of the tool. :-)
> >>>
> >>> Jeff
> >>>
> >>>
> >>>> On Feb 1, 2015, at 8:57 AM, Martynas Jusevičius <[log in to unmask]> wrote:
> >>>>
> >>>> Hey again,
> >>>>
> >>>> I want to illustrate what I mean with the 2 tiers and the mapping
> >>>> between them with an example.
> >>>>
> >>>> I used one of the data samples from Jörg's link (about "Aristotle
> >>>> on mind and the senses") and created a SPARQL query:
> >>>>
> >>>> PREFIX field: <rdfmab:field#>
> >>>> PREFIX dct: <http://purl.org/dc/terms/>
> >>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>> PREFIX bf: <http://bibframe.org/vocab/>
> >>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> >>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> >>>>
> >>>> CONSTRUCT
> >>>> {
> >>>> ?work a bf:Work ;
> >>>>    dct:created ?createdDate ;
> >>>>    dct:modified ?modifiedDate ;
> >>>>    dct:language ?language ;
> >>>>    bf:title ?title ;
> >>>>    bf:creator [ a foaf:Person ; foaf:name ?creatorName ] ;
> >>>>    dct:subject [ a skos:Concept ; skos:prefLabel ?categoryLabel ] .
> >>>> }
> >>>> {
> >>>> ?work field:902__p|field:902__s ?categoryLabel .
> >>>> ?work field:002a_a ?createdString .
> >>>> BIND (STRDT(?createdString, xsd:date) AS ?createdDate)
> >>>> ?work field:003__a ?modifiedString .
> >>>> BIND (STRDT(?modifiedString, xsd:date) AS ?modifiedDate)
> >>>> ?work field:036a_a ?language .
> >>>> ?work field:101__a ?creatorName .
> >>>> ?work field:331__a ?title .
> >>>> }
> >>>>
> >>>> You can try running it here:
> >>>> http://graphity.dydra.com/graphity/marc-test/sparql#marcrdf2bibframe
> >>>>
> >>>> This query is a declarative, platform-independent mapping between 2
> >>>> "tiers" (levels of abstraction) of bibliographic data:
> >>>> 1. syntactic MARC-RDF, in this case specified by Jörg
> >>>> 2. conceptual real-world representation, as a mix of BIBFRAME, SKOS,
> >>>> Dublin Core and other relevant vocabularies
> >>>>
> >>>> By no means do I claim that this is a complete or semantically
> >>>> correct example, but I hope it gives a better idea of my suggestion.
> >>>> Feel free to expand and modify it.
> >>>>
> >>>> Martynas
> >>>> graphityhq.com
> >>>>
> >>>> On Sat, Jan 31, 2015 at 5:16 PM, [log in to unmask]
> >>>> <[log in to unmask]> wrote:
> >>>>> Because MARC is a key/value stream with a string-based encoding
> >>>>> semantics, this does not justify a "direct mapping" to RDF. This is
> >>>>> problematic.
> >>>>>
> >>>>>  From an implementor's view, a correct migration to RDF means to
> >>>>> parse MARC records, map selected MARC-encoded values to
> >>>>> functions/objects, evaluate contextual information and more
> >>>>> semantic information from other sources (catalog codes, authority
> >>>>> files), and let the mapping functions create RDF graphs with the
> >>>>> transformed information in it, adding datatype information, links etc.
> >>>>>
> >>>>> I invented something similar to your bfm proposal internally back in
> >>>>> 2010.
> >>>>>
> >>>>> https://wiki1.hbz-nrw.de/display/SEM/RDF-ISO2709+-+eine+RDF-Serialisierung+fuer+ISO+2709-basierte+bibliografische+Formate+%28MARC%2C+MAB%29
> >>>>>
> >>>>> When asked to document the format, I hesitated and tried to
> >>>>> describe it as an intermediate serialization format and called it
> >>>>> "RDF/ISO2709". But the very bad side effect was that librarians
> >>>>> who were not familiar with RDF and the semantics of RDF thought
> >>>>> RDF was just a wrapper mechanism like XML. They called RDF a
> >>>>> "format" and did not take it seriously as a modern graph model for the
> >>>>> bibliographic data of future catalogs.
> >>>>>
> >>>>> Karen picked up the idea in 2011 in
> >>>>>
> >>>>> http://lists.w3.org/Archives/Public/public-lld/2011Apr/0137.html
> >>>>>
> >>>>> So, from my personal experience, I do not recommend proposing a
> >>>>> MARC-centered "serialization only" Bibframe dialect. It will not
> >>>>> improve Bibframe or ease the migration; it will just add a
> >>>>> truncated RDF without links, without URIs, with another migration path.
> >>>>>
> >>>>> Whether Bibframe can be seen as the "one-size-fits-all" RDF model
> >>>>> for MARC is another question. For much of the data I have, Bibframe
> >>>>> is not my first choice.
> >>>>>
> >>>>> Jörg
> >>>>>
> >>>>>
> >>>>> On Sat, Jan 31, 2015 at 1:14 PM, Martynas Jusevičius
> >>>>> <[log in to unmask]>
> >>>>> wrote:
> >>>>>> Jeff,
> >>>>>>
> >>>>>> there is one reality, but it can be described in many different ways.
> >>>>>> And yes, there should be a separate RDF vocabulary for each level.
> >>>>>>
> >>>>>> Here's a completely fictional example to illustrate what I mean:
> >>>>>>
> >>>>>> 1. MARC-syntax level
> >>>>>>
> >>>>>> _:record a bfm:Record ;
> >>>>>> bfm:recordType "Book" ;
> >>>>>> bfm:isbn "123456789" ;
> >>>>>> bfm:title "The Greatest Works" ;
> >>>>>> bfm:author1givenName "John" ;
> >>>>>> bfm:author1familyName "Johnson" ;
> >>>>>> bfm:author2givenName "Tom" ;
> >>>>>> bfm:author2familyName "Thompson" .
> >>>>>>
> >>>>>> 2. Linked Data level
> >>>>>>
> >>>>>> <books/123456789#this> a bld:Work, bldtypes:Book ;
> >>>>>> dct:title "The Greatest Works" ;
> >>>>>> bld:isbn "123456789" ;
> >>>>>> bld:authors (<persons/john-johnson#this> <persons/tom-thompson#this>) .
> >>>>>>
> >>>>>> <persons/john-johnson#this> a foaf:Person, bld:Author ;
> >>>>>> foaf:givenName "John" ;
> >>>>>> foaf:familyName "Johnson" .
> >>>>>>
> >>>>>> <persons/tom-thompson#this> a foaf:Person, bld:Author ;
> >>>>>> foaf:givenName "Tom" ;
> >>>>>> foaf:familyName "Thompson" .
> >>>>>>
> >>>>>>
> >>>>>> Both examples contain the same information, but it is encoded
> >>>>>> very differently. Clearly the Linked Data style is preferred, and
> >>>>>> the MARC vocabulary could in theory go away when there are no
> >>>>>> more legacy MARC systems to support.
> >>>>>>
> >>>>>> I haven't seen any actual MARC data, but if someone has a simple
> >>>>>> example, we could work on that.
> >>>>>>
> >>>>>> Martynas
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Jan 31, 2015 at 4:21 AM, Jeff Young
> >>>>>> <[log in to unmask]>
> >>>>>> wrote:
> >>>>>>> Tim,
> >>>>>>>
> >>>>>>> The semantics behind MARC is based on reality. MARC cares (maybe)
> >>>>>>> too much about which names and codes should be used in various
> >>>>>>> structural positions, but there are real things lurking behind
> >>>>>>> those.
> >>>>>>>
> >>>>>>> Jeff
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Jan 30, 2015, at 9:58 PM, Tim Thompson <[log in to unmask]> wrote:
> >>>>>>>
> >>>>>>> Karen,
> >>>>>>>
> >>>>>>> Aren't the semantics behind MARC just the semantics of card
> >>>>>>> catalogs and ISBD, with its nine areas of bibliographic
> >>>>>>> description? ISBD has already been published by IFLA as a linked
> >>>>>>> data vocabulary
> >>>>>>> (http://metadataregistry.org/schema/show/id/25.html)--although,
> >>>>>>> sadly, they left out the punctuation ;-)
> >>>>>>>
> >>>>>>> Tim
> >>>>>>>
> >>>>>>> --
> >>>>>>> Tim A. Thompson
> >>>>>>> Metadata Librarian (Spanish/Portuguese Specialty)
> >>>>>>> Princeton University Library
> >>>>>>>
> >>>>>>> On Fri, Jan 30, 2015 at 9:01 PM, Young,Jeff (OR)
> >>>>>>> <[log in to unmask]>
> >>>>>>> wrote:
> >>>>>>>> What if it was two different vocabularies, rather than two
> >>>>>>>> different levels of abstraction?
> >>>>>>>>
> >>>>>>>> There is only one reality. A rose by any other name would smell
> >>>>>>>> as sweet.
> >>>>>>>> :-)
> >>>>>>>>
> >>>>>>>> Jeff
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jan 30, 2015, at 8:02 PM, Martynas Jusevičius
> >>>>>>>>> <[log in to unmask]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Karen,
> >>>>>>>>>
> >>>>>>>>> let's call those specifications BM (BIBFRAME MARC) and BLD
> >>>>>>>>> (BIBFRAME Linked Data).
> >>>>>>>>>
> >>>>>>>>> What I meant is two different levels of abstraction, each
> >>>>>>>>> with its own vocabulary and semantics. And a mapping between
> >>>>>>>>> the two, for which SPARQL would be really convenient.
> >>>>>>>>>
> >>>>>>>>> In the 2-tier approach, these are the main tasks:
> >>>>>>>>> 1. convert MARC data to RDF at the syntax level (BM)
> >>>>>>>>> 2. design semantically correct bibliographic Linked Data
> >>>>>>>>> structure (BLD)
> >>>>>>>>> 3. define a mapping from BM to BLD
> >>>>>>>>>
> >>>>>>>>> So in that sense I don't think it is similar to profiles, as
> >>>>>>>>> profiles deal with a subset of properties, but they still come
> >>>>>>>>> from the same vocabulary.
> >>>>>>>>>
> >>>>>>>>> A somewhat similar approach is W3C work on relational databases:
> >>>>>>>>> 1. direct mapping to RDF:
> >>>>>>>>> http://www.w3.org/TR/rdb-direct-mapping/
> >>>>>>>>> 2. customizable declarative mapping to RDF:
> >>>>>>>>> http://www.w3.org/TR/r2rml/
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Martynas
> >>>>>>>>> graphityhq.com
> >>>>>>>>>
> >>>>>>>>>> On Fri, Jan 30, 2015 at 10:15 PM, Karen Coyle
> >>>>>>>>>> <[log in to unmask]>
> >>>>>>>>>> wrote:
> >>>>>>>>>> Martynas,
> >>>>>>>>>>
> >>>>>>>>>> I agree that the requirement to accommodate legacy MARC is a
> >>>>>>>>>> hindrance to the development of a more forward-looking RDF
> >>>>>>>>>> vocabulary. I think that your suggestion of using SPARQL
> >>>>>>>>>> CONSTRUCT queries is not unlike the concepts of selected
> >>>>>>>>>> views or application profiles -- where you work with
> >>>>>>>>>> different subsets of a fuller data store, based on need.
> >>>>>>>>>>
> >>>>>>>>>> I wonder, however, how an RDF model designed "from scratch"
> >>>>>>>>>> would interact with a model designed to replicate MARC. I
> >>>>>>>>>> know that people find this to be way too far out there, but I
> >>>>>>>>>> honestly don't see how we'll get to "real" RDF
> >>>>>>>>>> if we hang on not only to MARC but to the cataloging rules we
> >>>>>>>>>> have today (including RDA). We'd have to start creating
> >>>>>>>>>> natively RDF data, and until we understand what that means
> >>>>>>>>>> without burdening ourselves with pre-RDF cataloging concepts,
> >>>>>>>>>> it's hard to know what that means.
> >>>>>>>>>>
> >>>>>>>>>> All that to say that I would love to see a test
> >>>>>>>>>> implementation of your idea!
> >>>>>>>>>>
> >>>>>>>>>> kc
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 1/30/15 9:03 AM, Martynas Jusevičius wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey,
> >>>>>>>>>>
> >>>>>>>>>> after following discussions and developments in the BIBFRAME
> >>>>>>>>>> space, it seems to me that it tries to be too many things for
> >>>>>>>>>> too many people.
> >>>>>>>>>>
> >>>>>>>>>> I think many of the problems stem from the fact that (to my
> >>>>>>>>>> understanding) BIBFRAME is supposed to accommodate legacy
> >>>>>>>>>> MARC data and be the next-generation solution for bibliographic
> >>>>>>>>>> Linked Data.
> >>>>>>>>>> Attempting to address both cases, it fails to address either
> >>>>>>>>>> of them well.
> >>>>>>>>>>
> >>>>>>>>>> In my opinion, a possible solution could be to have 2 tiers
> >>>>>>>>>> of RDF vocabularies:
> >>>>>>>>>> - a lower-level one that precisely captures the semantics of
> >>>>>>>>>> MARC
> >>>>>>>>>> - a higher-level one that is designed from scratch for
> >>>>>>>>>> bibliographic Linked Data
> >>>>>>>>>>
> >>>>>>>>>> The conversion between the two (or at least from the lower to
> >>>>>>>>>> the higher level) could be expressed simply as SPARQL CONSTRUCT
> >>>>>>>>>> queries.
> >>>>>>>>>>
> >>>>>>>>>> Any thoughts?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Martynas
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Karen Coyle
> >>>>>>>>>> [log in to unmask] http://kcoyle.net
> >>>>>>>>>> m: +1-510-435-8234
> >>>>>>>>>> skype: kcoylenet/+1-510-984-3600
> >>>>>
> 
> --
> Karen Coyle
> [log in to unmask] http://kcoyle.net
> m: +1-510-435-8234
> skype: kcoylenet/+1-510-984-3600