When we have a personal name that conflicts with another heading, 
LCRI 22.17-22.20 tells us to add subfield $q if some part of the 
heading is an initial or other abbreviation and we have the full form 
for that initial or abbreviation. Note that both parts of this 
condition have to be fulfilled in order for us to add $q at this 
point: we not only have to know what the full name is, but we have to 
have an abbreviation or initial in subfield $a.

(Although it's somewhat beside the point, I can't resist mentioning 
at the outset the careful use in the RI of the terms "initial" and 
abbreviation."  These terms do not, I think, include shortened or 
familiar forms such as "Bill" or "Bea" or "Tom" or "Rudy" or "Greg" 
or "Steve", or nicknames such as "Bull" or "Red".  For shortened or 
familiar forms, $q is not authorized by clause 1a of the rule interpretation.)

If adding subfield $q giving the full form for an initial or 
abbreviation doesn't make the heading unique (or if we don't have 
full forms for abbreviations or initials, or if the name contains no 
abbreviations or initials), we add dates in $d if available.  (I'm 
quite aware that under the separate LCRI 22.17, we will actually have 
already added dates to a new heading if they are available; but this 
if anything reinforces the point I want to make about 22.17-20, and 
$q in particular.)

I have always assumed that this RI presents things in hierarchical 
fashion: you start at the top, and you stop as soon as the exercise 
of one of the possibilities produces a unique heading.  Were this not 
the case (i.e., if we're supposed to apply all of the possibilities 
even where not needed), then there wouldn't be any need for the 
explicit instruction to add $q for abbreviation/initials and $d for 
dates when both are available.

I'll throw in this aside for completeness and to avoid confusion: 
Later on in the rule interpretation, we're told that (if all of the 
above stuff has failed to produce a unique heading) we can add 
subfield $q for parts of the name not present in subfield $a even 
though abbreviations are not involved.  But we would only do this, I 
hasten to emphasize, if the application of foregoing instructions has 
not already given us a unique heading.  (The RI mentions yet other 
possibilities for disambiguating headings, which are beside the point 
of this diatribe.)
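The hierarchical reading described above can be sketched as a small decision procedure. This is a minimal sketch under stated assumptions, not a rendering of the RI itself: `Heading`, `build_heading`, and `has_initial_or_abbreviation` are hypothetical names; "conflict" is modeled as exact equality with an existing heading; a full stop in $a stands in for "initial or abbreviation"; and the separate LCRI 22.17 instruction to add dates up front is not modeled. Each option is tried in order, and the procedure stops at the first unique result.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Heading:
    """Hypothetical simplified personal-name heading."""
    a: str                   # subfield $a, e.g. "Strawn, Robert"
    q: Optional[str] = None  # subfield $q, fuller form, e.g. "(Robert Michael)"
    d: Optional[str] = None  # subfield $d, dates, e.g. "1946-"


def has_initial_or_abbreviation(a: str) -> bool:
    # Crude assumption: a full stop in $a signals an initial or abbreviation.
    return "." in a


def build_heading(a: str, fuller_form: Optional[str], dates: Optional[str],
                  existing: Set[Heading]) -> Optional[Heading]:
    """Try each option in order; stop at the first heading not in `existing`."""
    candidate = Heading(a)
    if candidate not in existing:
        return candidate
    # 1. $q for an initial/abbreviation, but only if the full form is known.
    if has_initial_or_abbreviation(a) and fuller_form:
        candidate = Heading(a, q=fuller_form)
        if candidate not in existing:
            return candidate
    # 2. Dates in $d, if available (keeping any $q added in step 1).
    if dates:
        candidate = Heading(a, q=candidate.q, d=dates)
        if candidate not in existing:
            return candidate
    # 3. Only now: $q for unused parts of the name, no abbreviation involved.
    if fuller_form and candidate.q is None:
        candidate = Heading(a, q=fuller_form, d=dates)
        if candidate not in existing:
            return candidate
    # The RI's further options are outside this sketch.
    return None
```

With an existing "Strawn, Robert" and "Strawn, Robert, 1915-" in the file, `build_heading("Strawn, Robert", "(Robert Michael)", "1946-", existing)` stops at step 2 and returns a heading with $d but no $q; step 3 is reached only if "Strawn, Robert, 1946-" is already taken.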

There's a reason we shouldn't use $q unless necessary, although I 
have no way of knowing whether this reason was part of the design of 
the RI: leaving $q out usually produces a better filing order.  To 
illustrate, let's assume we have an existing, neatly-ordered file 
along these lines (I'm making up this example to protect the guilty; 
assume that some of these entries represent 100 fields and some 
represent 400 fields, which is beside the point):

         Strawn, Robert, 1793-1872
         Strawn, Robert, 1915-
         Strawn, Robert, 1945-
         Strawn, Robert, 1949-2003
         Strawn, Robert, 1951-
         Strawn, Robert, 1978-
         Strawn, Robert A., 1936-
         Strawn, Robert B.
         Strawn, Robert C., 1947-
         Strawn, Robert Conrad, Mrs., 1895-1977
         <dozens more Robert Strawns here>

So far, so lovely.  Now we've got a new Robert Strawn, and at the 
time we're establishing the heading we know that his middle name is 
Michael, and that he was born in 1946.  Applying LCRI 22.17-22.20 
(or, to be more precise, not applying it, because in fact we've 
already added dates according to 22.17 so our heading is already 
unique), we will not add subfield $q, and we will end up with this 
heading, which falls very neatly into the above sequence:

         Strawn, Robert, 1946-

If, on the other hand, we were to throw into the heading everything 
we know about the person even if not necessary, we would end up with 
this heading, which is going to end up at some point in the list 
(given current sorting regimes) that is probably less than helpful:

         Strawn, Robert (Robert Michael), 1946-

(Warning: Don't even get me started on the sort order provided by the 
current group of library automation vendors.)
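To make the filing point concrete, here is a sketch using a deliberately simplified filing key. The key is an assumption for illustration only: it uppercases the heading, turns punctuation into spaces, and collapses whitespace. It is not the NACO normalization rules and not any vendor's sort. `filing_key` and `file_with` are hypothetical names, and the file is a subset of the made-up list above. Under this key the plain heading lands between 1915- and 1949-2003, while the $q heading files as though "Robert" were a middle name, after "Robert Conrad", far from the date sequence.

```python
import re
from typing import List


def filing_key(heading: str) -> str:
    """Hypothetical simplified filing key (not NACO normalization)."""
    key = re.sub(r"[^\w\s]", " ", heading.upper())  # punctuation -> space
    return re.sub(r"\s+", " ", key).strip()          # collapse spaces


# A subset of the made-up file above.
existing = [
    "Strawn, Robert, 1793-1872",
    "Strawn, Robert, 1915-",
    "Strawn, Robert, 1949-2003",
    "Strawn, Robert, 1951-",
    "Strawn, Robert A., 1936-",
    "Strawn, Robert B.",
    "Strawn, Robert C., 1947-",
    "Strawn, Robert Conrad, Mrs., 1895-1977",
]


def file_with(new_heading: str) -> List[str]:
    """Return the file, sorted by filing key, with the new heading added."""
    return sorted(existing + [new_heading], key=filing_key)


for new in ("Strawn, Robert, 1946-", "Strawn, Robert (Robert Michael), 1946-"):
    print(new, "-> position", file_with(new).index(new))
```

Digits sort before letters in this key, so every dated "Robert" heading precedes "Robert A.", and "ROBERT ROBERT MICHAEL" sorts after "ROBERT CONRAD".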

So far, so clear, I hope.  In the absence of an abbreviation/initial 
we don't use $q if we have $d, unless nothing else will serve to 
produce a unique heading; and that's for a good reason.

A recent traversal through new LC/NACO records issued to date in 2007 
turns up 284 cases of personal names that do not have a full stop in 
subfield $a (and are therefore assumed not to involve an abbreviation 
or initial) and contain both subfield $q and $d.  (I didn't consider 
name/title headings in this tabulation.  We're talking about name 
headings, so things with subject subdivisions don't come into the 
equation, either.)  My working assumption is that these 284 personal 
name headings were constructed in error.
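The screening test just described can be written as a short predicate. This is a minimal sketch under assumptions: a heading is a hypothetical list of (subfield code, value) pairs rather than any real MARC library's record object, and `is_likely_q_error` is an illustrative name. It excludes name/title headings ($t present) and flags headings that have both $q and $d while no $a contains a full stop.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]  # hypothetical: (code, value), e.g. ("a", "Strawn, Robert,")


def is_likely_q_error(subfields: List[Subfield]) -> bool:
    """Flag a heading with $q and $d but no full stop in $a."""
    codes = [code for code, _ in subfields]
    if "t" in codes:  # name/title headings are left out of the count
        return False
    a_values = [value for code, value in subfields if code == "a"]
    no_full_stop = all("." not in value for value in a_values)
    return no_full_stop and "q" in codes and "d" in codes
```

For example, `[("a", "Strawn, Robert,"), ("q", "(Robert Michael),"), ("d", "1946-")]` is flagged, while the same heading with $a "Strawn, Robert M.," is not. As the next paragraphs note, a flag is only a likely error until checked against the file.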

To make things easier (on me if not on you), I concentrated on the 
contributing institutions with 10 or more headings in the "likely 
error" pile; there were only 4 of these.  (No, I'm not going to tell 
you who they are, although 2 might be obvious enough.  The point here 
isn't to jump on any particular institution.)  I'll call them A, B, C 
and D.  I manually checked each of the likely errors for these four 
institutions against headings in the LC/NACO authority file.  I found 
that a few instances of co-occurring $q and $d were in fact warranted 
by existing headings.  (In other words, for a few of the "likely 
errors" we have two different people using the same basic name; these 
people were born in the same year but we do not have month and day of 
birth for either; and we know about some unused parts of the name for one 
of them.)  I removed these from my counts. (For institution A I 
discarded 3 reported potential errors; for institution D, I discarded 
2; none discarded for B and C.)
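The manual check for warranted cases amounts to asking whether another heading already uses the same $a and $d, so that dropping $q would leave a conflict. A minimal sketch, assuming headings are hypothetical ($a, $q, $d) tuples and that comparison is exact string matching (real conflict checking normalizes case and punctuation); `q_is_warranted` is an illustrative name.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical simplified heading: ($a, $q, $d)
Heading = Tuple[str, Optional[str], Optional[str]]


def q_is_warranted(new: Heading, file: Iterable[Heading]) -> bool:
    """True if some other heading shares this $a and $d exactly."""
    a, _, d = new
    return any(other != new and other[0] == a and other[2] == d
               for other in file)
```

Comparing the whole $d (not just the birth year) is deliberate: "1946-" and "1946-2003" already differ, so no $q would be needed to tell them apart.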

In the following tabulation, "contributed" records are new personal 
name records with no subfield $t.  What I'm trying to tease out is 
the ratio of erroneous headings to the total number of records 
created: the rate for this particular kind of error.  (The count of 
errors doesn't include "likely errors" that turned out to be correct.)

         Institution   Records contributed   Errors   Error rate
         A                          83,058       76      0.0915%
         B                             346       10      2.89%
         C                             957       11      1.149%
         D                          23,505       40      0.1702%

For these four institutions taken as a group, the average error rate 
(total errors divided by total records) is 0.127%.  So one large 
contributor is doing a bit better than average, another large 
contributor is not doing quite so well, and the two smaller 
contributors have error rates well above the average.  My impression, 
from spot-checking headings for institutions with a smaller number of 
likely errors (including those produced by my own institution, I 
hasten to add), is that the error rates for these would in most cases 
also prove to be above the average, because the records generated by 
institution A carry so much weight in it.
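The weighting effect is easy to check from the counts in the tabulation. This sketch uses only those figures; the variable and function names are illustrative. The pooled rate (total errors over total records) comes out at 0.127%, while a plain, unweighted mean of the four per-institution rates comes out near 1.08%, which shows how strongly institution A's 83,058 records pull the group average down.

```python
# Counts from the tabulation: institution -> (records contributed, errors)
counts = {
    "A": (83_058, 76),
    "B": (346, 10),
    "C": (957, 11),
    "D": (23_505, 40),
}


def rate_percent(records: int, errors: int) -> float:
    """Errors as a percentage of records contributed."""
    return 100.0 * errors / records


per_institution = {name: rate_percent(r, e) for name, (r, e) in counts.items()}

# Pooled rate: each institution counts in proportion to its records.
total_records = sum(r for r, _ in counts.values())
total_errors = sum(e for _, e in counts.values())
pooled = rate_percent(total_records, total_errors)

# Unweighted mean of the four rates, for comparison.
unweighted = sum(per_institution.values()) / len(per_institution)

for name, rate in per_institution.items():
    print(f"{name}: {rate:.4f}%")
print(f"pooled: {pooled:.3f}%   unweighted mean: {unweighted:.3f}%")
```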

So, finally, I come to my point: could we please restrict the use of 
subfield $q to those cases where it is necessary and called for by 
the rules we're supposed to be following?

Gary L. Strawn, Authorities Librarian, etc.
Northwestern University Library, 1970 Campus Drive, Evanston IL 60208-2300
voice: 847/491-2788   fax: 847/491-8306
Forsan et haec olim meminisse iuvabit. (Perhaps someday it will be a pleasure to remember even these things.)          BatchCat version: 2006.51.826