As someone once* asked, "Is this rule necessary?"

For transcribed statements, a good place to begin the inquiry is to look at
the goals underlying the original rules, and to see if the rules chosen to
achieve those goals are adequate or redundant.

I would argue that the primary goal addressed by rigidly controlled
transcribed statements is to determine whether two descriptions are about
the same thing. This explains why there is a need to record information as
found (or modified by uniformly applied transforms).

There can be a limited amount of benefit to access if the transcribed
statements contain terms that would not otherwise be present, but this
purpose alone would not justify the specificity of the transcription rules.

If the lack of transcribed statements breaks record linkage badly, then the
rules are clearly necessary; however, this _necessity_ would seem to arise
from the act of description rather than the nature of what is described.

[Lubetzky's question is, in this situation, open to empirical testing: one
could take a large number of _records_, mutate them to simulate the records
that might have been created if the rule were not present, then compare
matching accuracy using e.g. Fellegi-Sunter linkage with weights estimated
by EM. It is of course important to account for violations of the
assumption of independence between fields.
The estimated weights for different fields / subfields in the unmodified
records may suggest which rules are important to match / non-match
determination.]
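
To make the weight-estimation step concrete, here is a rough sketch, not a
worked study. It assumes each pair of records has already been reduced to a
tuple of 0/1 agreement indicators, one per field, and it fits the basic
two-class Fellegi-Sunter model by EM under the independence assumption
mentioned above. The field names, the simulated data, and all function names
are hypothetical and only there for illustration.

```python
import math
import random


def simulate_patterns(n_pairs, p, m, u, seed=0):
    """Synthetic agreement patterns: each pair is a match with probability p;
    field k then agrees with probability m[k] (match) or u[k] (non-match)."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n_pairs):
        probs = m if rng.random() < p else u
        patterns.append(tuple(int(rng.random() < q) for q in probs))
    return patterns


def em_fellegi_sunter(patterns, n_iter=200, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate the match proportion p and per-field m/u probabilities by EM,
    assuming fields agree independently within each class."""
    n_fields = len(patterns[0])
    m = [m_init] * n_fields
    u = [u_init] * n_fields
    eps = 1e-6  # keeps estimates away from exactly 0 or 1

    def clamp(x):
        return min(max(x, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        g = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for k, agrees in enumerate(gamma):
                pm *= m[k] if agrees else 1.0 - m[k]
                pu *= u[k] if agrees else 1.0 - u[k]
            g.append(pm / (pm + pu))

        # M-step: re-estimate parameters from the soft assignments.
        total_m = sum(g)
        total_u = len(g) - total_m
        p = clamp(total_m / len(g))
        for k in range(n_fields):
            agree_m = sum(gi for gi, gamma in zip(g, patterns) if gamma[k])
            agree_u = sum(1.0 - gi for gi, gamma in zip(g, patterns) if gamma[k])
            m[k] = clamp(agree_m / total_m)
            u[k] = clamp(agree_u / total_u)
    return p, m, u


def field_weights(m, u):
    """Per-field (agreement, disagreement) weights in bits."""
    return [
        (math.log2(mk / uk), math.log2((1.0 - mk) / (1.0 - uk)))
        for mk, uk in zip(m, u)
    ]


if __name__ == "__main__":
    # Hypothetical fields and made-up probabilities, for illustration only.
    fields = ["title statement", "provider statement", "date", "extent"]
    data = simulate_patterns(
        5000, 0.1, [0.95, 0.9, 0.85, 0.8], [0.02, 0.1, 0.2, 0.3], seed=1
    )
    p_hat, m_hat, u_hat = em_fellegi_sunter(data)
    for name, (w_agree, w_disagree) in zip(fields, field_weights(m_hat, u_hat)):
        print(f"{name}: agree {w_agree:+.2f}, disagree {w_disagree:+.2f}")
```

Running the same estimation on the unmodified and on the mutated records,
and comparing the per-field weights and the resulting match decisions, would
be one way to carry out the comparison described above. Fields whose
agreement weight collapses after mutation would be the ones where the
transcription rule is doing real work.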

Simon // Is it not possible that it is possible that this is not a rule?
On Aug 1, 2014 2:31 PM, "Philip Schreur" <[log in to unmask]> wrote:

>  bf has some very complex use cases:
>
>     - it is the inheritor of MARC and will be expected to find a way of
> representing that data
>
>     - it will need to represent data created according to our latest set
> of cataloging rules (in this case, RDA)
>
>     - it should provide a light, extensible framework for representing all
> the data library patrons may have interest in (which is virtually
> everything) and all of the above as natively in RDF as possible
>
> At some point we have to acknowledge that not all these can be
> accommodated equally as well and compromises will need to be made.  But if
> the point of bf is to integrate this data into the web via RDF, it seems we
> should compromise in this aspect least.  Otherwise what is the point?  I
> think the conversations of the past few weeks have been very helpful in
> this regard.
>
> Philip
>
> On 8/1/2014 9:49 AM, [log in to unmask] wrote:
>
> Thanks, Rob.
>
>  But where is the "radical" thought which caught my eye? ;)
>
>  Here, the ship hasn't sailed yet.
>
>  Let's be radical: for my future work with bibliographical data, I will
> ignore systems that do not support the distinction of core data with RDF
> statements that can be processed by a machine, and the descriptive data,
> required for presentational services, with languages and rules how to
> describe data, e.g. on the web.
>
>  Let's drop all legacy OPACs and discovery systems now. Cataloging of
> URLs  - that's where it all started. The mix of all kinds of control and
> descriptive "web data" in the catalog.
>
>  It's not "MARC must die". It's "Bad data without clear semantics must
> die".
>
>  Just to add one minor thing: besides RDFa, it is also possible to embed
> JSON-LD in HTML; Google is using that:
>
>  http://manu.sporny.org/2013/json-ld-google-search/
>
>  Jörg
>
>
>  On Fri, Aug 1, 2014 at 6:18 PM, Robert Sanderson <[log in to unmask]>
> wrote:
>
>>
>> Dear all,
>>
>>  In my experience, RDF and Linked Data can do both presentation based
>> information (eg here is content to present directly to the user, without
>> semantics eg [1]) and it can do semantic, descriptive information (here is
>> a rich description of the resource, say a book or annotation eg [2]) but
>> both at once is very challenging without simply repeating everything in a
>> for-machines way and a for-humans way as per the current titleStatement,
>> providerStatement, and one assumes authorStatement, subjectStatement, etc.
>>
>>  Here are two radical ideas, for which the boat has probably long since
>> sailed, but I'll throw them out there regardless.
>>
>>  1. Don't try to mix them up.  Have two completely separate
>> descriptions, where one is intended for humans to read, and the other is
>> intended for machines to reason upon and search.  A machine will only ever
>> throw a transcribed string through to the user, so make it easy for them to
>> do that by separating the non-semantic information from the semantic
>> information, with links between them.
>>
>>  2.  Mix them up using the appropriate technology: HTML + RDFa.  Rather
>> than thinking about triples for everything, create the HTML that you
>> want the user to see.  Then annotate that HTML with RDFa properties to add
>> the semantics into the record (and really a record now, not a graph).  This
>> way there's only one record to maintain that has both, but uses
>> presentation technology for presenting things to users, and semantic
>> technology for enabling machines to understand the information.
>>
>>  Basically -- use the right tools for the job.  RDF has a hard time
>> representing transcriptions outside of non-semantic strings because it was
>> never intended to do that.  Order in RDF is a complete pain, because a
>> graph is inherently unordered, but there are very real use cases that
>> require order.  On the other hand, RDF is fantastic for controlled data as
>> that is precisely its intended usage.  We should make the most appropriate
>> use of the tools that we have available to us, rather than treating
>> everything as a nail.
>>
>>  Best,
>>
>>  Rob
>>
>>  [1].  The IIIF Presentation API is focused on this approach of giving
>> information intended for a client to display, while still being useful
>> linked data by referencing existing semantic descriptions and following
>> REST and JSON-LD.  http://iiif.io/api/presentation/2.0/
>> [2].  The Open Annotation work is a rich data model that provides
>> semantics for web annotation, but says almost nothing about presentation.
>> http://www.openannotation.org/spec/core/
>>
>>
>>
>>  --
>> Rob Sanderson
>> Technology Collaboration Facilitator
>> Digital Library Systems and Services
>> Stanford, CA 94305
>>
>
>
>
> --
> Philip E. Schreur
> Head, Metadata Department
> Stanford University
> 650-723-2454
> 650-725-1120 (fax)
>
>