On Sat, Jan 14, 2012 at 7:07 PM, Roy Tennant wrote:
> Sigh...I realize my original message was stated in such a way that it
> was possible people could construe that I am against all provenance
> information everywhere, when nothing could be further from the truth. My
> bad.
>
> I had been responding to a specific example of the use of provenance in
> terms of where the title information was taken for a bibliographic record,
> for example:
>
> >- title page title
> >- cover title
> >- title from jewel case insert
>
> And the end-user in me shouted “who cares!” Yes, I understand that there
> may be variances in those titles, but I wanted to not make the assumption
> that such variance would have a detrimental effect on end-user needs

I'm not sure that this usage of provenance is strictly correct. The title on the title page and the title on the cover are two different pieces of information; they can have different values but still have identical provenance.

The reason for not collapsing the two pieces of information is not primarily user display (although collapsing the distinction may lead to interruptions in sequential displays); the main reason for keeping the two properties distinct is that they serve as identity criteria. This is the case whether applying absolute identity (classical Leibniz's Law), relative identity (RLL), or probabilistic record linkage (e.g. Fellegi/Sunter).

I am not sure of the strength of this effect on F/S using maximum entropy weightings; if you could run the numbers for WorldCat, that would be very useful. BTW, a useful methodological approach to these questions might be to measure the effect of different proposed rule changes using this kind of metric. This is also a case where using quantitative models of user information seeking behavior could be effective in selecting which possible models and rules are worth testing with real users.
Since a lot of the data fields in MARC are not independent, using information-theoretic models to select hypotheses may be essential for cost reasons.
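
To make the identity-criteria point concrete, here is a rough sketch of Fellegi/Sunter-style match weights in which the title page title and the cover title are kept as separate comparison fields. The field names, the m/u probabilities, and the sample records are all made up for illustration; nothing here is estimated from WorldCat or any real data. Note also that summing per-field weights assumes the fields are conditionally independent, which is exactly the assumption that is doubtful for MARC.

```python
import math

# Hypothetical m/u probabilities, for illustration only (not estimated from data).
# m = P(field agrees | the two records describe the same thing)
# u = P(field agrees | the two records describe different things)
FIELD_PARAMS = {
    "title_page_title": (0.95, 0.01),
    "cover_title": (0.85, 0.02),
}


def field_weight(agrees, m, u):
    """Fellegi/Sunter log2 weight for a single field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum per-field weights over the fields present in both records.

    Summation assumes conditional independence between fields.
    """
    total = 0.0
    for field, (m, u) in params.items():
        if field in rec_a and field in rec_b:
            total += field_weight(rec_a[field] == rec_b[field], m, u)
    return total


# Made-up records: same title page title, different cover title.
a = {"title_page_title": "Kind of Blue", "cover_title": "Kind of Blue"}
b = {"title_page_title": "Kind of Blue", "cover_title": "Miles Davis: Kind of Blue"}

both_agree = match_weight(a, a)
cover_differs = match_weight(a, b)
print(f"both titles agree:  {both_agree:.2f}")
print(f"cover title differs: {cover_differs:.2f}")
```

With the two titles kept distinct, a disagreement on the cover title lowers the total weight without cancelling the agreement on the title page title. If the two were collapsed into a single "title" field, that evidence could not be weighted separately.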