I agree; we're heading toward a model where systems are asked not to compare two heading strings but to compare two sets of properties and relationships for two resources to determine sameness or difference. By comparison, the algorithms for matching strings (how much normalization to apply?) are fairly simple. We can see clustering algorithms doing this kind of work in various systems already. Some do it better than others, and none do it perfectly with the kind of heterogeneous metadata we have, so we'll want both flexible confidence levels and efficient mechanisms for intervening to correct clustering errors.
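
To make the idea concrete, here is a minimal sketch of comparing two resources by their sets of properties rather than by a heading string. Everything in it is an assumption for illustration: the (property, value) pairs, the use of Jaccard overlap as the similarity measure, the function names, and the two thresholds. It is not how any particular system clusters works.

```python
# Hypothetical sketch: score two resource descriptions by property overlap
# and map the score to a non-binary confidence band.

def jaccard(a: set, b: set) -> float:
    """Share of (property, value) pairs the two descriptions have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def classify(a: set, b: set, same_at: float = 0.8, maybe_at: float = 0.4):
    """Return (score, label); both thresholds are illustrative and adjustable."""
    score = jaccard(a, b)
    if score >= same_at:
        label = "same"
    elif score >= maybe_at:
        label = "possible"   # candidate for human review
    else:
        label = "different"
    return score, label

# Two descriptions that share a title but not a creator.
rec_a = {("title", "selected poems"), ("creator", "dickinson, emily"), ("language", "eng")}
rec_b = {("title", "selected poems"), ("creator", "frost, robert"), ("language", "eng")}

print(classify(rec_a, rec_a))
print(classify(rec_a, rec_b))
```

In this toy case the shared title and language are enough to flag the pair as "possible" but not as "same", which is the kind of result that would be routed to someone for correction. Real metadata would need weighting by property and normalization of values before a measure like this could be trusted.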

Stephen

On Tue, Jun 30, 2015 at 11:50 AM, Karen Coyle <[log in to unmask]> wrote:
Stephen, thanks for the explanation -- I hadn't understood that from Mark's reply. (Apologies to Mark.) This is something that I do worry about - what will it take to make the connection between manifestations and works, and is it worth the effort? What happens if those connections are not made by catalogers - can they later be derived using clustering algorithms? Surely there will be many work duplicates in the vast world of bibliographic data, just like there are duplicate records for manifestations today. How much effort should go into preventing this duplication?

And I agree with Mac that title alone doesn't seem like the appropriate target. "Selected poems" anyone? Titles are not unique identifiers, never have been, and we've never treated them that way for duplicate detection.

I have to say that my gut feeling is that we should minimize the amount of searching that humans should do and accept that we'll often identify "same work" through algorithms that work in a non-binary way -- it won't be that this either is or is not the same work, but that there will be a lot of "this might be the same work" or "this is close". That itself isn't hard to do - what is hard is how to present that information to users. This is the logic behind ranking algorithms, though, so it's very common as a technique.

kc


On 6/29/15 8:32 AM, Stephen Hearn wrote:
Mark's example was the first thing that came to my mind, too.  The added labor in the new policy is not just formulating a qualifier; it's searching every new title to determine whether a conflict exists at the work level.  In that sense, it does have an impact on every cataloged item.  Such searching is necessarily different from the search for usable copy, which can narrowly specify its target.  Searching for a work conflict requires a broader search and larger result sets to sort through.

Stephen



On Mon, Jun 29, 2015 at 9:39 AM, Karen Coyle <[log in to unmask]> wrote:
Mark, I don't see, from your examples, the creation of a work title entry for every cataloged item, which is what I was asking about. Trees. Forest.

kc


On 6/28/15 11:21 AM, Ehlert, Mark K. wrote:
But there *has* been a change from past practice for those following the LCRIs/LC-PCC PSs.  My comments here only refer to one aspect of monograph cataloging; there may be others.

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600



--
Stephen Hearn, Metadata Strategist
Data Management & Access, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
ORCID:  0000-0002-3590-1242
