I agree that you can't avoid 2 tiers, but there are other ways to conceptualize those tiers. You suggest mapping to a 1st tier RDF vocabulary and then using a triplestore and SPARQL to do the reconciliation/cleanup into a 2nd vocabulary. Another alternative is to map directly into the target vocabulary and then use Map/Reduce (possibly using SPARQL on transient subgraphs) to do probabilistic matching and cleanup. The 2nd way doesn't require a 2 tier vocabulary.
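
To make the second alternative a bit more concrete, here is a minimal sketch in plain Python rather than a real Map/Reduce framework. The record layout, the blocking key (first letter of the name), and the 0.85 threshold are all assumptions made up for illustration; none of it comes from an actual pipeline.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records, already mapped directly into the target vocabulary.
RECORDS = [
    {"id": "r1", "name": "Johnson, John"},
    {"id": "r2", "name": "Johnson, Jon"},
    {"id": "r3", "name": "Thompson, Tom"},
]


def map_phase(record):
    """Emit (blocking key, record); only records sharing a key get compared."""
    key = record["name"][:1].lower()  # assumed blocking key: first letter
    return key, record


def similarity(a, b):
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reduce_phase(records, threshold=0.85):
    """Return (id, id, score) for pairs in one block whose names look alike."""
    matches = []
    for left, right in combinations(records, 2):
        score = similarity(left["name"], right["name"])
        if score >= threshold:
            matches.append((left["id"], right["id"], round(score, 2)))
    return matches


def find_candidate_matches(records, threshold=0.85):
    """Group records by blocking key, then match within each group."""
    blocks = defaultdict(list)
    for record in records:
        key, value = map_phase(record)
        blocks[key].append(value)
    result = []
    for key in sorted(blocks):
        result.extend(reduce_phase(blocks[key], threshold))
    return result
```

Calling `find_candidate_matches(RECORDS)` only compares records inside the same block, so the cost stays manageable as the data grows. The matches are candidates for cleanup, not decisions.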

Jeff

> On Feb 1, 2015, at 3:30 PM, Martynas Jusevičius <[log in to unmask]> wrote:
> 
> Jeff,
> 
> I don't think you can avoid 2 tiers, since the lower one is MARC and
> the higher one is Linked Data. What I'm saying is that by doing the
> mapping indirectly and within RDF, the implementation will be much
> more progressive. And I don't even need to know MARC to see that.
> 
> And you should definitely try out Dydra, which is a cloud triplestore:
> http://dydra.com
> 
>> On Sun, Feb 1, 2015 at 9:00 PM, Young,Jeff (OR) <[log in to unmask]> wrote:
>> Martynas,
>> 
>> I'm skeptical of the 2 tiered mapping approach, but I like the look of the tool. :-)
>> 
>> Jeff
>> 
>> 
>>> On Feb 1, 2015, at 8:57 AM, Martynas Jusevičius <[log in to unmask]> wrote:
>>> 
>>> Hey again,
>>> 
>>> I want to illustrate what I mean with the 2 tiers and the mapping
>>> between them with an example.
>>> 
>>> I used one of the data samples from Jörg's link (about "Aristotle on
>>> mind and the senses") and created a SPARQL query:
>>> 
>>> PREFIX field: <rdfmab:field#>
>>> PREFIX dct: <http://purl.org/dc/terms/>
>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>> PREFIX bf: <http://bibframe.org/vocab/>
>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>> 
>>> CONSTRUCT
>>> {
>>> ?work a bf:Work ;
>>>   dct:created ?createdDate ;
>>>   dct:modified ?modifiedDate ;
>>>   dct:language ?language ;
>>>   bf:title ?title ;
>>>   bf:creator [ a foaf:Person ; foaf:name ?creatorName ] ;
>>>   dct:subject [ a skos:Concept ; skos:prefLabel ?categoryLabel ] .
>>> }
>>> {
>>> ?work field:902__p|field:902__s ?categoryLabel .
>>> ?work field:002a_a ?createdString .
>>> BIND (STRDT(?createdString, xsd:date) AS ?createdDate)
>>> ?work field:003__a ?modifiedString .
>>> BIND (STRDT(?modifiedString, xsd:date) AS ?modifiedDate)
>>> ?work field:036a_a ?language .
>>> ?work field:101__a ?creatorName .
>>> ?work field:331__a ?title .
>>> }
>>> 
>>> You can try running it here:
>>> http://graphity.dydra.com/graphity/marc-test/sparql#marcrdf2bibframe
>>> 
>>> This query is a declarative, platform-independent mapping between 2
>>> "tiers" (levels of abstraction) of bibliographic data:
>>> 1. syntactic MARC-RDF, in this case specified by Jörg
>>> 2. conceptual real-world representation, as a mix of BIBFRAME, SKOS,
>>> Dublin Core and other relevant vocabularies
>>> 
>>> By no means do I claim that this is a complete or semantically correct
>>> example, but I hope it gives a better idea of my suggestion.
>>> Feel free to expand and modify it.
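
For readers without a triplestore at hand, the field-to-property table implied by the query can be sketched in plain Python. The field names are taken from the query above; the dictionary layout and the `map_record` helper are assumptions for illustration. Unlike the query, which only produces output when every pattern matches, this sketch simply skips fields that are missing.

```python
# Target name -> source MAB fields, mirroring the patterns in the query.
# Two source fields for one target mirrors the "|" alternative path.
FIELD_MAP = {
    "dct:created": ["002a_a"],
    "dct:modified": ["003__a"],
    "dct:language": ["036a_a"],
    "bf:title": ["331__a"],
    "creatorName": ["101__a"],
    "categoryLabel": ["902__p", "902__s"],
}


def map_record(fields):
    """Map a record given as {MAB field name: [string values]}.

    Returns {target name: [values]}, omitting targets with no source values.
    """
    mapped = {}
    for target, sources in FIELD_MAP.items():
        values = [value for source in sources for value in fields.get(source, [])]
        if values:
            mapped[target] = values
    return mapped
```

Because the mapping is just data, it can be extended or corrected without touching the code that applies it, which is the same property that makes the CONSTRUCT query attractive.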
>>> 
>>> Martynas
>>> graphityhq.com
>>> 
>>> On Sat, Jan 31, 2015 at 5:16 PM, [log in to unmask]
>>> <[log in to unmask]> wrote:
>>>> The fact that MARC is a key/value stream with string-based encoding semantics
>>>> does not justify a "direct mapping" to RDF. This is problematic.
>>>> 
>>>> From an implementor's view, a correct migration to RDF means to parse MARC
>>>> records, map selected MARC-encoded values to functions/objects, evaluate
>>>> contextual information and more semantic information from other sources
>>>> (catalog codes, authority files), and let the mapping functions create RDF
>>>> graphs with the transformed information in it, adding datatype information,
>>>> links etc.
>>>> 
>>>> I invented something similar to your bfm proposal internally back in 2010.
>>>> 
>>>> https://wiki1.hbz-nrw.de/display/SEM/RDF-ISO2709+-+eine+RDF-Serialisierung+fuer+ISO+2709-basierte+bibliografische+Formate+%28MARC%2C+MAB%29
>>>> 
>>>> When asked to document the format, I hesitated and tried to describe it
>>>> as an intermediate serialization format and called it "RDF/ISO2709". But the
>>>> very bad side effect was that librarians who were not familiar with RDF and
>>>> the semantics of RDF thought RDF was just a wrapper mechanism like XML, they
>>>> called RDF a "format" and did not take it seriously as a modern graph model
>>>> for the bibliographic data of future catalogs.
>>>> 
>>>> Karen picked up the idea in 2011 in
>>>> 
>>>> http://lists.w3.org/Archives/Public/public-lld/2011Apr/0137.html
>>>> 
>>>> So, from my personal experience, I do not recommend proposing a
>>>> MARC-centered "serialization only" Bibframe dialect. It will not improve
>>>> Bibframe or ease the migration, it will just add a truncated RDF without
>>>> links, without URIs, with another migration path.
>>>> 
>>>> Whether Bibframe can be seen as the "one-size-fits-all" RDF model for MARC is
>>>> another question. For much of the data I have, Bibframe is not my first
>>>> choice.
>>>> 
>>>> Jörg
>>>> 
>>>> 
>>>> On Sat, Jan 31, 2015 at 1:14 PM, Martynas Jusevičius <[log in to unmask]>
>>>> wrote:
>>>>> 
>>>>> Jeff,
>>>>> 
>>>>> there is one reality, but it can be described in many different ways.
>>>>> And yes, there should be a separate RDF vocabulary for each level.
>>>>> 
>>>>> Here's a completely fictional example to illustrate what I mean:
>>>>> 
>>>>> 1. MARC-syntax level
>>>>> 
>>>>> _:record a bfm:Record ;
>>>>> bfm:recordType "Book" ;
>>>>> bfm:isbn "123456789" ;
>>>>> bfm:title "The Greatest Works" ;
>>>>> bfm:author1givenName "John" ;
>>>>> bfm:author1familyName "Johnson" ;
>>>>> bfm:author2givenName "Tom" ;
>>>>> bfm:author2familyName "Thompson" .
>>>>> 
>>>>> 2. Linked Data level
>>>>> 
>>>>> <books/123456789#this> a bld:Work, bldtypes:Book ;
>>>>> dct:title "The Greatest Works" ;
>>>>> bld:isbn "123456789" ;
>>>>> bld:authors (<persons/john-johnson#this> <persons/tom-thompson#this>) .
>>>>> 
>>>>> <persons/john-johnson#this> a foaf:Person, bld:Author ;
>>>>> foaf:givenName "John" ;
>>>>> foaf:familyName "Johnson".
>>>>> 
>>>>> <persons/tom-thompson#this> a foaf:Person, bld:Author ;
>>>>> foaf:givenName "Tom" ;
>>>>> foaf:familyName "Thompson".
>>>>> 
>>>>> 
>>>>> Both examples contain the same information, but it is encoded very
>>>>> differently. Clearly the Linked Data style is preferred, and the MARC
>>>>> vocabulary could in theory go away when there are no more legacy MARC
>>>>> systems to support.
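
The step from the first encoding to the second can be sketched in a few lines of Python. The property names and URI patterns are copied from the fictional example above; the flat-dict input, the `slug` helper, and the output structure are assumptions for illustration only.

```python
import re

# The fictional syntax-level record, as a flat dict.
RECORD = {
    "recordType": "Book",
    "isbn": "123456789",
    "title": "The Greatest Works",
    "author1givenName": "John",
    "author1familyName": "Johnson",
    "author2givenName": "Tom",
    "author2familyName": "Thompson",
}

# Matches the positional author fields, e.g. "author2familyName".
AUTHOR_FIELD = re.compile(r"^author(\d+)(givenName|familyName)$")


def slug(given, family):
    """Hypothetical URI slug: lower-cased names joined by hyphens."""
    return f"{given}-{family}".lower().replace(" ", "-")


def to_linked_data(record):
    """Turn positional author fields into person resources plus an ordered list."""
    authors = {}
    for key, value in record.items():
        match = AUTHOR_FIELD.match(key)
        if match:
            position = int(match.group(1))
            authors.setdefault(position, {})[match.group(2)] = value

    persons = {}
    ordered = []
    for position in sorted(authors):
        author = authors[position]
        uri = f"persons/{slug(author['givenName'], author['familyName'])}#this"
        persons[uri] = {
            "foaf:givenName": author["givenName"],
            "foaf:familyName": author["familyName"],
        }
        ordered.append(uri)

    work_uri = f"books/{record['isbn']}#this"
    work = {
        "dct:title": record["title"],
        "bld:isbn": record["isbn"],
        "bld:authors": ordered,  # keeps the author order of the source record
    }
    return work_uri, work, persons
```

The point of the sketch is that the positional field names (author1..., author2...) disappear and are replaced by identified resources that other records can link to.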
>>>>> 
>>>>> I haven't seen any actual MARC data, but if someone has a simple
>>>>> example, we could work on that.
>>>>> 
>>>>> Martynas
>>>>> 
>>>>> 
>>>>> On Sat, Jan 31, 2015 at 4:21 AM, Jeff Young <[log in to unmask]>
>>>>> wrote:
>>>>>> Tim,
>>>>>> 
>>>>>> The semantics behind MARC is based on reality. MARC cares (maybe) too much
>>>>>> about which names and codes should be used in various structural positions,
>>>>>> but there are real things lurking behind those.
>>>>>> 
>>>>>> Jeff
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Jan 30, 2015, at 9:58 PM, Tim Thompson <[log in to unmask]> wrote:
>>>>>> 
>>>>>> Karen,
>>>>>> 
>>>>>> Aren't the semantics behind MARC just the semantics of card catalogs and
>>>>>> ISBD, with its nine areas of bibliographic description? ISBD has already
>>>>>> been published by IFLA as a linked data vocabulary
>>>>>> (http://metadataregistry.org/schema/show/id/25.html)--although, sadly, they
>>>>>> left out the punctuation ;-)
>>>>>> 
>>>>>> Tim
>>>>>> 
>>>>>> --
>>>>>> Tim A. Thompson
>>>>>> Metadata Librarian (Spanish/Portuguese Specialty)
>>>>>> Princeton University Library
>>>>>> 
>>>>>> On Fri, Jan 30, 2015 at 9:01 PM, Young,Jeff (OR) <[log in to unmask]>
>>>>>> wrote:
>>>>>>> 
>>>>>>> What if it was two different vocabularies, rather than two different
>>>>>>> levels of abstraction?
>>>>>>> 
>>>>>>> There is only one reality. A rose by any other name would smell as
>>>>>>> sweet. :-)
>>>>>>> 
>>>>>>> Jeff
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jan 30, 2015, at 8:02 PM, Martynas Jusevičius
>>>>>>>> <[log in to unmask]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Karen,
>>>>>>>> 
>>>>>>>> let's call those specifications BM (BIBFRAME MARC) and BLD (BIBFRAME
>>>>>>>> Linked Data).
>>>>>>>> 
>>>>>>>> What I meant is two different levels of abstraction, each with its
>>>>>>>> own vocabulary and semantics. And a mapping between the two, for which
>>>>>>>> SPARQL would be really convenient.
>>>>>>>> 
>>>>>>>> In the 2-tier approach, these are the main tasks:
>>>>>>>> 1. convert MARC data to RDF at the syntax level (BM)
>>>>>>>> 2. design semantically correct bibliographic Linked Data structure
>>>>>>>> (BLD)
>>>>>>>> 3. define a mapping from BM to BLD
>>>>>>>> 
>>>>>>>> So in that sense I don't think it is similar to profiles, as profiles
>>>>>>>> deal with a subset of properties, but they still come from the same
>>>>>>>> vocabulary.
>>>>>>>> 
>>>>>>>> A somewhat similar approach is W3C work on relational databases:
>>>>>>>> 1. direct mapping to RDF: http://www.w3.org/TR/rdb-direct-mapping/
>>>>>>>> 2. customizable declarative mapping to RDF:
>>>>>>>> http://www.w3.org/TR/r2rml/
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Martynas
>>>>>>>> graphityhq.com
>>>>>>>> 
>>>>>>>>> On Fri, Jan 30, 2015 at 10:15 PM, Karen Coyle <[log in to unmask]>
>>>>>>>>> wrote:
>>>>>>>>> Martynas,
>>>>>>>>> 
>>>>>>>>> I agree that the requirement to accommodate legacy MARC is a hindrance to
>>>>>>>>> the development of a more forward-looking RDF vocabulary. I think that your
>>>>>>>>> suggestion of using SPARQL CONSTRUCT queries is not unlike the concepts of
>>>>>>>>> selected views or application profiles -- where you work with different
>>>>>>>>> subsets of a fuller data store, based on need.
>>>>>>>>> 
>>>>>>>>> I wonder, however, how an RDF model designed "from scratch" would interact
>>>>>>>>> with a model designed to replicate MARC. I know that people find this to be
>>>>>>>>> way too far out there, but I honestly don't see how we'll get to "real" RDF
>>>>>>>>> if we hang on not only to MARC but to the cataloging rules we have today
>>>>>>>>> (including RDA). We'd have to start creating natively RDF data, and until we
>>>>>>>>> understand what that means without burdening ourselves with pre-RDF
>>>>>>>>> cataloging concepts, it's hard to know what that means.
>>>>>>>>> 
>>>>>>>>> All that to say that I would love to see a test implementation of your
>>>>>>>>> idea!
>>>>>>>>> 
>>>>>>>>> kc
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 1/30/15 9:03 AM, Martynas Jusevičius wrote:
>>>>>>>>> 
>>>>>>>>> Hey,
>>>>>>>>> 
>>>>>>>>> after following discussions and developments in the BIBFRAME space, it
>>>>>>>>> seems to me that it tries to be too many things for too many people.
>>>>>>>>> 
>>>>>>>>> I think many of the problems stem from the fact that (to my
>>>>>>>>> understanding) BIBFRAME is supposed to accommodate legacy MARC data
>>>>>>>>> and be the next-generation solution for bibliographic Linked Data.
>>>>>>>>> Attempting to address both cases, it fails to address either of them
>>>>>>>>> well.
>>>>>>>>> 
>>>>>>>>> In my opinion, a possible solution could be to have 2 tiers of RDF
>>>>>>>>> vocabularies:
>>>>>>>>> - a lower-level one that precisely captures the semantics of MARC
>>>>>>>>> - a higher-level one that is designed from scratch for bibliographic
>>>>>>>>> Linked Data
>>>>>>>>> 
>>>>>>>>> The conversion between the two (or at least from the lower to the
>>>>>>>>> higher level) could be expressed simply as SPARQL CONSTRUCT queries.
>>>>>>>>> 
>>>>>>>>> Any thoughts?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Martynas
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Karen Coyle
>>>>>>>>> [log in to unmask] http://kcoyle.net
>>>>>>>>> m: +1-510-435-8234
>>>>>>>>> skype: kcoylenet/+1-510-984-3600