On Fri, May 31, 2013 at 10:06 AM, Tom Morris wrote:
Probabilistic matching of entities is complicated. Approximate matching of strings is also complicated. It can actually be easier to estimate m/u weights for F/S (Fellegi-Sunter) matching, and transition properties for edit-distance-based methods, using the record-as-utterance. [I believe that record-as-utterance is an important part of the ontology of the bibliographic universe.]
Probabilistic matching of entities using strings can lead to really, really bad things - thus the match threshold needs to be set really high. Also, the semantics of the reference model need to be extremely well defined, and identity and equivalence criteria need to be well known and strongly justified.
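
To make the m/u weights and the threshold concrete, here is a minimal sketch of Fellegi-Sunter scoring. The field names, the m/u values, and both thresholds are hypothetical numbers I picked for illustration; they are not estimated from any real bibliographic data, and in practice they would come from training pairs or an EM fit.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not estimated from data):
#   m = P(field agrees | the two records describe the same entity)
#   u = P(field agrees | the two records describe different entities)
FIELDS = {
    "title":  (0.95, 0.01),
    "author": (0.90, 0.05),
    "year":   (0.85, 0.10),
}

# Hypothetical thresholds. The upper one is deliberately high so that
# only strong agreement is accepted as a match without review.
MATCH_THRESHOLD = 12.0
NON_MATCH_THRESHOLD = 0.0


def match_weight(agreements):
    """Composite weight: sum of log2 likelihood ratios over all fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if agreements[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(agreements):
    """Three-way Fellegi-Sunter decision: match, possible match, non-match."""
    weight = match_weight(agreements)
    if weight >= MATCH_THRESHOLD:
        return "match"
    if weight <= NON_MATCH_THRESHOLD:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    pair = {"title": True, "author": True, "year": False}
    print(classify(pair), round(match_weight(pair), 2))
```

With these made-up numbers, a pair that agrees on title and author but not on year lands in the "possible match" band rather than being accepted outright, which is the behaviour a high match threshold is meant to buy.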