Having worked on library systems, I saw that a huge effort went into 
extracting usable data from strings and overcoming the inconsistencies 
that you find in a mass of data that originated in different 
institutions. If the document "side" of the data were used mostly for 
display and the data "side" for data functions, it might even save 
design time and programming effort, because it would eliminate the need 
for that extraction. It is very hard to constrain textual entries to 
something precise enough for algorithms to use. If nothing else, it 
would be worth investigating.

kc

On 8/1/14, 1:17 PM, Simeon Warner wrote:
> This is a great topic for a Friday afternoon!
>
> I strongly agree with Karen that true separation of the human and 
> semantic data would be dangerous, because they could get out of sync. 
> Think of program code: comments next to methods are out of date often 
> enough, separate documentation is usually pretty much uselessly out of 
> date/sync (though I'm sure nobody on this list would be guilty ;-) ).
>
> The "mix it up with appropriate technologies" option seems worthy of 
> more consideration. But even then I'm left wondering whether multiple 
> technology stacks make life easier or harder when one likely expects 
> cataloging tools to support update of both halves of the data (along 
> with great tools to semi-automagically suggest the structured form 
> from the human form (and vice versa)). If downstream tools want just 
> one half or the other then one can imagine simple filter tools to pick 
> one or the other portion of the data.
>
> 2c,
> Simeon
>
>
> On 8/1/14 12:54 PM, Karen Coyle wrote:
>> Thanks, Rob.  I have suggested both of these in emails in the past week,
>> although perhaps they got lost in their threads. Taking them in reverse
>> order:
>>
>> 2. OCLC is already doing suggestion #2 through its use of schema.org
>> based on its MARC data. Obviously, a more "data friendly" set of data
>> would make that more efficacious. If we had a "record" that allowed the
>> storage of identifiers in concert with the display strings, then it
>> would be easier to export markup that facilitates linking. I suppose
>> BIBFRAME could have been that record, but it has taken a very different
>> approach.
>>
>> 1. I've contemplated the "catalog 'record' as document" concept at
>> various times. True separation of the display from the coded semantics
>> strikes me as dangerous, since they could get out of sync. As with the #2
>> option, there would need to be hooks between the display forms and the
>> "data forms" so that some automated processes could exist that don't
>> allow one to change without checking the other.
>>
>> In either case, I come to the conclusion that we COULD provide this
>> radical view, but not with the models and records that we have today. So
>> this would entail the development of a new model that supports the
>> solution. I also have said that we probably cannot achieve this with the
>> cataloging rules that we have today. Essentially, new rules would need
>> to take into account the coordination between display and
>> "data/semantics." The current cataloging rules are still overly involved
>> in display and ignore machine processing functions to support retrieval,
>> comparison, and data mining.
>>
>> kc
>>
>> On 8/1/14, 9:18 AM, Robert Sanderson wrote:
>>>
>>> Dear all,
>>>
>>> In my experience, RDF and Linked Data can do both presentation based
>>> information (eg here is content to present directly to the user,
>>> without semantics eg [1]) and it can do semantic, descriptive
>>> information (here is a rich description of the resource, say a book or
>>> annotation eg [2]) but both at once is very challenging without simply
>>> repeating everything in a for-machines way and a for-humans way as per
>>> the current titleStatement, providerStatement, and one assumes
>>> authorStatement, subjectStatement, etc.
>>>
>>> Here are two radical ideas, for which the boat has probably long since
>>> sailed, but I'll throw them out there regardless.
>>>
>>> 1. Don't try to mix them up.  Have two completely separate
>>> descriptions, where one is intended for humans to read, and the other
>>> is intended for machines to reason upon and search.  A machine will
>>> only ever throw a transcribed string through to the user, so make it
>>> easy for them to do that by separating the non-semantic information
>>> from the semantic information, with links between them.
>>>
>>> 2.  Mix them up using the appropriate technology: HTML + RDFa.
>>> Instead of thinking about triples for everything, create the
>>> HTML that you want the user to see.  Then annotate that HTML with RDFa
>>> properties to add the semantics into the record (and really a record
>>> now, not a graph).  This way there's only one record to maintain that
>>> has both, but uses presentation technology for presenting things to
>>> users, and semantic technology for enabling machines to understand the
>>> information.
>>>
>>> Basically -- use the right tools for the job.  RDF has a hard time
>>> representing transcriptions outside of non-semantic strings because it
>>> was never intended to do that.  Order in RDF is a complete pain,
>>> because a graph is inherently unordered, but there are very real use
>>> cases that require order.  On the other hand, RDF is fantastic for
>>> controlled data as that is precisely its intended usage.  We should
>>> make the most appropriate use of the tools that we have available to
>>> us, rather than treating everything as a nail.
>>>
>>> Best,
>>>
>>> Rob
>>>
>>> [1].  The IIIF Presentation API is focused on this approach of giving
>>> information intended for a client to display, while still being useful
>>> linked data by referencing existing semantic descriptions and
>>> following REST and JSON-LD. http://iiif.io/api/presentation/2.0/
>>> [2].  The Open Annotation work is a rich data model that provides
>>> semantics for web annotation, but says almost nothing about
>>> presentation. http://www.openannotation.org/spec/core/
>>>
>>>
>>>
>>> -- 
>>> Rob Sanderson
>>> Technology Collaboration Facilitator
>>> Digital Library Systems and Services
>>> Stanford, CA 94305
>>

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet