On Fri, May 31, 2013 at 10:06 AM, Tom Morris <[log in to unmask]> wrote:

> On Thu, May 30, 2013 at 5:27 PM, Young,Jeff (OR) <[log in to unmask]> wrote:
>
>> I disagree. Knowing the name of something, its type(s), and a few other
>> seemingly mundane clues can be enough to identify a thing in a broader
>> context. RDF/Linked Data is not merely a variant record format. Patterns
>> exist in information that extend well beyond records, even if they are only
>> probabilistic. Don't underestimate Hadoop.
>>
> Probabilistic matching using text strings (i.e. literals) can be done using
> MARC too, but I agree with those who say an RDF graph of literals is no
> better than a MARC/XML file full of literals.  The power comes from having
> strong identifiers which, in the case of RDF, means URIs.   It's more work,
> but offers infinitely more value.
>

Probabilistic matching of entities is complicated. Approximate matching of
strings is also complicated. It can actually be easier to estimate m/u
weights for Fellegi-Sunter (F/S) matching, and transition properties for
edit-distance-based methods, using the record-as-utterance. [I believe that
record-as-utterance is an important part of the ontology of the
bibliographic universe.]
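
To make the m/u weights concrete, here is a minimal Python sketch of
Fellegi-Sunter style scoring. The field names and the m/u values are
made up for illustration (they are not estimated from any real data),
and the helper names are hypothetical. For each field, m is the
probability that the field agrees when two records describe the same
entity, and u is the probability that it agrees when they do not.

```python
import math

# Hypothetical (m, u) pairs per field; illustrative values only.
# m = P(field agrees | records refer to the same entity)
# u = P(field agrees | records refer to different entities)
FIELD_PARAMS = {
    "title": (0.95, 0.01),
    "author": (0.90, 0.05),
    "year": (0.85, 0.10),
}


def field_weight(m, u, agrees):
    """Log2 likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(agreement):
    """Sum the per-field weights for a dict of field -> agrees (bool)."""
    return sum(
        field_weight(*FIELD_PARAMS[field], agrees)
        for field, agrees in agreement.items()
    )


all_agree = match_score({"title": True, "author": True, "year": True})
all_differ = match_score({"title": False, "author": False, "year": False})
print(f"all fields agree:    {all_agree:.2f}")
print(f"all fields disagree: {all_differ:.2f}")
```

Agreement on a field with a small u (here, title) contributes a large
positive weight, while disagreement on a field with a large m
contributes a large negative one. The summed score is then compared
against a match threshold.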

Probabilistic matching of entities using strings can lead to *really,
really* bad things, so the match threshold needs to be set really high.
Also, the semantics of the reference model need to be extremely well
defined, and identity and equivalence criteria need to be well known and
strongly justified.
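
A small sketch of why the threshold matters, assuming Python's standard
difflib as a stand-in for whatever string comparator is actually in use
(its ratio is a similarity measure, not a true edit distance). The two
heading strings are invented for illustration and are meant to stand
for two different people.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] from difflib; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(a, b, threshold):
    """Declare a match only when similarity reaches the threshold."""
    return similarity(a, b) >= threshold


# Hypothetical headings for two distinct people differing by one digit.
heading_a = "Smith, John, 1950-"
heading_b = "Smith, John, 1960-"

score = similarity(heading_a, heading_b)
print(f"similarity: {score:.3f}")
print("match at 0.90:", is_match(heading_a, heading_b, 0.90))
print("match at 0.99:", is_match(heading_a, heading_b, 0.99))
```

With a permissive threshold the two headings are merged into one
entity; only a very strict threshold keeps them apart, and even then
the decision rests on the strings alone rather than on any stated
identity criteria.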

Simon