On Fri, May 31, 2013 at 10:06 AM, Tom Morris wrote:
On Thu, May 30, 2013 at 5:27 PM, Young,Jeff (OR) wrote:

I disagree. Knowing the name of something, its type(s), and a few other seemingly mundane clues can be enough to identify a thing in a broader context. RDF/Linked Data is not merely a variant record format. Patterns exist in information that extend well beyond records, even if they are only probabilistic. Don't underestimate Hadoop.

Probabilistic matching using text strings (i.e., literals) can be done using MARC too, but I agree with those who say an RDF graph of literals is no better than a MARC/XML file full of literals. The power comes from having strong identifiers, which, in the case of RDF, means URIs. It's more work, but offers infinitely more value.

Probabilistic matching of entities is complicated. Approximate matching of strings is also complicated. It can actually be easier to estimate m/u weights for Fellegi-Sunter (F/S) matching, and transition properties for edit-distance-based methods, using the record-as-utterance. [I believe that record-as-utterance is an important part of the ontology of the bibliographic universe.]
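To make the m/u terminology concrete, here is a minimal Python sketch of Fellegi-Sunter pair scoring. It assumes the m and u probabilities for each field are already known; the field names and values are hypothetical, not estimated from any real catalogue. It also treats field comparisons as exact and conditionally independent, and sums the log2 likelihood-ratio weights into one score per record pair.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | both records describe the same entity)
#   u = P(field agrees | the records describe different entities)
FIELD_PROBS = {
    "title": (0.95, 0.01),
    "author": (0.90, 0.05),
    "year": (0.85, 0.10),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log2 likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)              # agreement weight (positive when m > u)
    return math.log2((1 - m) / (1 - u))      # disagreement weight (negative when m > u)


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum of field weights, assuming the fields are conditionally independent."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        # A field only "agrees" if both records have it and the values are equal.
        agrees = a is not None and a == b
        total += field_weight(agrees, m, u)
    return total


if __name__ == "__main__":
    a = {"title": "Moby Dick", "author": "Melville, Herman", "year": "1851"}
    b = {"title": "Moby Dick", "author": "Melville, Herman", "year": "1852"}
    c = {"title": "Typee", "author": "Smith, John", "year": "1846"}
    print(round(match_score(a, b), 3))  # high: two fields agree strongly
    print(round(match_score(a, c), 3))  # negative: every field disagrees
```

In practice m and u would be estimated from data (for example with the EM algorithm), and the exact-equality test would usually be replaced by an approximate string comparison, which is where edit-distance methods come in.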

Probabilistic matching of entities using strings can lead to really, really bad things, so the match threshold needs to be set really high. Also, the semantics of the reference model need to be extremely well defined, and identity and equivalence criteria need to be well known and strongly justified.
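The threshold problem can be illustrated with a minimal sketch of approximate string matching using normalized Levenshtein similarity. The names and both thresholds below are hypothetical: at 0.8, two different people whose names differ by one letter are declared the same entity, while 0.95 rejects that false match but also rejects a legitimate spelling variant.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def is_match(a: str, b: str, threshold: float) -> bool:
    """Declare two name strings the same entity if similarity meets the threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical names: "Joan" is a different person, "Jon" a spelling variant.
    for other in ("Smith, Joan", "Smith, Jon"):
        for threshold in (0.8, 0.95):
            print(other, threshold, is_match("Smith, John", other, threshold))
```

Both comparisons score about 0.91, so string similarity alone cannot separate the false match from the true variant. A high threshold trades recall for precision, which is why well-defined identity criteria (or strong identifiers) matter more than tuning.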