I agree, we're heading toward a model where systems are asked not to
compare two heading strings but to compare two sets of properties and
relationships for two resources to determine sameness/difference. By
comparison, the algorithms for matching strings (how much normalization to
apply?) are fairly simple. We can see clustering algorithms doing this kind
of work in various systems already.  Some do it better than others, and
none do it perfectly with the kind of heterogeneous metadata we have; so
we'll want both flexible confidence levels and efficient mechanisms for
intervening to correct clustering errors.
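As a rough illustration of comparing property sets rather than heading strings, here is a minimal Python sketch. It is a toy built on assumptions, not any existing system's clustering algorithm. The `WorkDescription` fields, the `WEIGHTS`, and the 0.85/0.5 thresholds in `verdict` are all hypothetical. Titles are normalized (case, diacritics, punctuation) before comparison. Each property present on both sides contributes a weighted score, and the result is a graded confidence rather than a yes/no answer.

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Case-fold, strip diacritics and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_marks.casefold())
    return " ".join(no_punct.split())


@dataclass
class WorkDescription:
    """Hypothetical property set describing a work (field names are illustrative)."""
    title: str
    creators: set[str] = field(default_factory=set)
    language: str | None = None
    first_date: int | None = None


# Hypothetical weights: agreement on creators counts for more than agreement on title.
WEIGHTS = {"title": 0.3, "creators": 0.4, "language": 0.1, "first_date": 0.2}


def similarity(a: WorkDescription, b: WorkDescription) -> float:
    """Weighted agreement in [0, 1]; properties missing on either side are skipped."""
    scores = {"title": float(normalize(a.title) == normalize(b.title))}
    if a.creators and b.creators:
        ca = {normalize(c) for c in a.creators}
        cb = {normalize(c) for c in b.creators}
        scores["creators"] = len(ca & cb) / len(ca | cb)  # Jaccard overlap
    if a.language and b.language:
        scores["language"] = float(a.language == b.language)
    if a.first_date is not None and b.first_date is not None:
        scores["first_date"] = float(a.first_date == b.first_date)
    # Renormalize over the properties actually compared (title is always present).
    total = sum(WEIGHTS[k] for k in scores)
    return sum(WEIGHTS[k] * v for k, v in scores.items()) / total


def verdict(score: float, same: float = 0.85, maybe: float = 0.5) -> str:
    """Map a score to a graded label; the thresholds are assumptions."""
    if score >= same:
        return "same work"
    if score >= maybe:
        return "possibly same work"
    return "different work"


frost = WorkDescription("Selected poems", {"Frost, Robert"})
dickinson = WorkDescription("Selected Poems.", {"Dickinson, Emily"})
print(verdict(similarity(frost, dickinson)))  # different work
# Compared on normalized title alone, the two cannot be told apart:
print(verdict(similarity(WorkDescription("Selected poems"),
                         WorkDescription("Selected Poems."))))  # same work
```

The two "Selected poems" records come out as different works only because their creators disagree. On title alone they score as identical. Real data would need many more properties, fuzzier matching than exact equality, and a way for people to correct bad clusters.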

Stephen

On Tue, Jun 30, 2015 at 11:50 AM, Karen Coyle <[log in to unmask]> wrote:

>  Stephen, thanks for the explanation -- I hadn't understood that from
> Mark's reply. (Apologies to Mark.) This is something that I do worry about
> - what will it take to make the connection between manifestations and
> works, and is it worth the effort? What happens if those connections are
> not made by catalogers - can they later be derived using clustering
> algorithms? Surely there will be many work duplicates in the vast world of
> bibliographic data, just like there are duplicate records for
> manifestations today. How much effort should go into preventing this
> duplication?
>
> And I agree with Mac that title alone doesn't seem like the appropriate
> target. "Selected poems" anyone? Titles are not unique identifiers, never
> have been, and we've never treated them that way for duplicate detection.
>
> I have to say that my gut feeling is that we should minimize the amount of
> searching that humans have to do and accept that we'll often identify "same
> work" through algorithms that work in a non-binary way -- it won't be that
> this either is or is not the same work, but that there will be a lot of
> "this might be the same work" or "this is close". That itself isn't hard to
> do - what is hard is how to present that information to users. This is the
> logic behind ranking algorithms, though, so it's very common as a technique.
>
> kc
>
>
> On 6/29/15 8:32 AM, Stephen Hearn wrote:
>
> Mark's example was the first thing that came to my mind, too.  The added
> labor in the new policy is not just formulating a qualifier; it's searching
> every new title to determine whether a conflict exists at the work level.
> In that sense, it does have an impact on every cataloged item.  Such
> searching is necessarily different from the search for usable copy, which
> can narrowly specify its target.  Searching for a work conflict requires a
> broader search and larger result sets to sort through.
>
>  Stephen
>
>
>
> On Mon, Jun 29, 2015 at 9:39 AM, Karen Coyle <[log in to unmask]> wrote:
>
>>  Mark, I don't see, from your examples, the creation of a work title
>> entry for every cataloged item, which is what I was asking about. Trees.
>> Forest.
>>
>> kc
>>
>> On 6/28/15 11:21 AM, Ehlert, Mark K. wrote:
>>
>> But there **has** been a change from past practice for those following the LCRIs/LC-PCC PSs.  My comments here only refer to one aspect of monograph cataloging; there may be others.
>>
>>
>>  --
>> Karen [log in to unmask] http://kcoyle.net
>> m: +1-510-435-8234
>> skype: kcoylenet/+1-510-984-3600
>>
>>
>
>
>  --
>  Stephen Hearn, Metadata Strategist
> Data Management & Access, University Libraries
> University of Minnesota
> 160 Wilson Library
> 309 19th Avenue South
> Minneapolis, MN 55455
> Ph: 612-625-2328
> Fx: 612-625-3428
> ORCID:  0000-0002-3590-1242
>