I'd like to just add a word or two to Jon and Michael's excellent discussion.
As far as the quote goes, it comes from Charles Jewett, librarian at the
Smithsonian Institution in the mid-19th century. Here is an accurate citation:
Charles C. Jewett, Smithsonian Report on the Construction of Catalogues of
Libraries and Their Publication by Means of Separate, Stereotyped Titles
(Washington, D.C.: Smithsonian Institution, 1853), p. 9.
And the quote:
"Now, even if the one [system] adopted were that of the worst of our
catalogues, if it were strictly followed in all alike, their uniformity
would render catalogues, thus made, far more useful than the present chaos
of irregularities."
Since stumbling on this, I have thought it a good basic sentiment to hold
while engaged in standards development.
As far as I can determine, both Jon and Michael have the same objective:
community-based encoding and content standards that will result in
consistent and uniform descriptive data across repositories. EAD takes us
some of the way there, but a great deal of community-based experimenting
and negotiating remains to be done. And, as Michael points out, economic
realities will always play a role in what we decide to do, much as we all
wish this were not the case.
As Jon points out, the archivists need to take the lead. But the vendors
need to be involved in the experimenting and negotiating, which is to say,
they need to participate in the community building.
Daniel
At 09:53 AM 10/16/1998 -0500, you wrote:
>Jon makes a very good point about the utility of common conventions for
>indexing and retrieval.
>
>But the issue is more complex than simply deciding which fields/elements
>will be searched and in what combination.
>
>We must all agree as to what data will be included in the record and
>what the level of markup will be. There is no benefit to agreeing to
>index element X unless we all actually include that data in our
>inventories.
>
>Let me draw an analogy to MARC cataloging of monographs.
>
>MARC field 505 provides for a formatted contents note. It has two
>possible content models. In one, the text is simply transcribed with
>minimal ISBD punctuation to separate sections of text. In the other
>model, there is formal content designation of statements of
>responsibility, title, etc., through the specific use of separate
>subfields.
>
>Including information from a book's table of contents in the catalog
>entry would no doubt enhance retrieval. Vendors could add these fields
>to their indexes. However, the benefit to users of MARBI adding this
>functionality to USMARC and vendors' indexing of it depends on two local
>decisions. Does a given library actually choose to transcribe this
>data from the book into the catalog record at all? If so, does the
>library choose to do the full markup or just string all the text into
>subfield a? While the community might bring some pressure to bear,
>suggesting that such work is highly beneficial, individual libraries
>will make the decision on what to do based on their own assessment of
>the cost/benefit ratio.
>
>Archives will act in the same way. Indexing schemes and user
>expectations must accommodate that reality and not be predicated on
>assumptions about uniformity of practice in areas where such is unlikely
>to occur. Better that we focus on making a strong case for inclusion
>and completeness in areas that constitute the "core" of archival
>description and to identify the benefits that will accrue from
>additional content and additional content designation beyond that. For
>example, my own personal view is that there would be greater benefit and
>a higher degree of implementation if the community standard were for
>every archives to include key access terms in authority-controlled form
>in a single <controlaccess> area of the description than if some chose
>to tag every occurrence of a personal name wherever it occurred, in the
>form in which it occurred, and others marked up none.
>
>Daniel has a wonderful quotation from Cutter or other famous person to
>the effect that the worst scheme of authority control, if universally
>implemented, would be better than what we have now.
>
>Michael
>
>Michael Fox
>Head of Processing
>Minnesota Historical Society
>345 Kellogg Blvd West
>St. Paul MN 55102-1906
>phone: 651-296-1014
>fax: 651-296-9961
>[log in to unmask]
>**NOTE NEW AREA CODE EFFECTIVE JULY 12, 1998**
>
>> ----------
>> From: Jon Riewe[SMTP:[log in to unmask]]
>> Sent: Friday, October 16, 1998 8:23 AM
>> To: Multiple recipients of list EAD
>> Subject: Re: Concern regarding number of "hits"...
>>
>> As an implementer, we've seen this question raised time and
>> again in various communities (e.g., museum, geospatial, social
>> science, bibliographic).
>>
>> Today within the EAD community Michael is correct in stating
>> that the quality of search results "depends entirely on the search
>> engine that one is using". But I would argue that this is the
>> wrong answer if the community's goal is to share information resources
>> interoperably across the community. Why should software vendors
>> determine the success of finding relevant information resources? It
>> would seem that the community would be better served by taking the
>> initiative to set standards that software vendors must comply with.
>>
>> While the EAD community has done a wonderful job in creating
>> the indexing structure for its resources, indexing is only one piece
>> of the information management and delivery process. Just as
>> important (perhaps more important) is community agreement on which
>> fields should be searchable and what attributes those fields have.
>> Once there is agreement on these high-level concepts, the community
>> (not the software vendor) is in control of the quality of results.
>>
>> Why is interoperability important? It's not if the goal is only
>> to be able to discover resources at a specific institution. However,
>> if the goal is to allow users to discover information across
>> institutions, then it becomes very important.
>>
>> For example, within the EAD community, one of the goals appears
>> to be that users should be able to submit a single request for a
>> desired resource that would simultaneously go to all servers
>> containing EAD resources located anywhere around the world. That
>> request would allow the user to specify a combination of specific
>> author, title, dates of publication, subject matter, etc. The
>> results would come back to the user in one consolidated list
>> including an indicator showing which institution's server provided
>> the results.
>>
>> All these capabilities are available today if there is
>> high-level community agreement on commonly searchable fields and what
>> attributes those fields have. Without that agreement,
>> interoperability is just not very practical.
>>
>> Last year we performed an analysis of the Iowa State Library's
>> implementation of a system that was designed to provide electronic
>> resource discovery among state, municipal and university libraries.
>> During the course of several dozen interviews with both librarians and
>> end-users, it became quite clear that the primary problem encountered
>> was a frustration with the incompatibility between vendors and getting
>> consistent results from searches.
>>
>> This occurred because the bibliographic community had not agreed
>> on commonly searchable fields and the attributes of those fields. As
>> a result, vendor A made fields 1, 2 and 3 searchable and vendor B made
>> fields 3, 4 and 5 searchable. Thus if you were at an institution that
>> installed vendor A software, you could not search on fields 4 and 5,
>> and if you were at an institution that installed vendor B software,
>> you would not be able to search on fields 1 and 2. In fact, the only
>> common field that could be searched between both institutions was
>> field 3.
>>
>> This incompatibility was the number one frustration that came
>> out of the Iowa study. I mention it because the same issue has the
>> potential for limiting cross-institution discovery for EAD as well.
>> And since EAD is in the initial stages of implementation, it has the
>> ability to establish the appropriate guidelines that will prevent a
>> similar situation from occurring within the EAD community.
>>
>> I would argue that the community would be well served by
>> creating information discovery standards which are appropriate for the
>> community as a whole and requiring software vendors to accommodate
>> those standards. The alternative to using such a standards-based
>> approach is continued reliance on individual software vendors that
>> will implement their own proprietary solutions. In most cases, those
>> solutions will be designed to maximize revenue by locking users into
>> the vendor's proprietary solution and will be at cross-purposes with
>> promoting cross-institutional discovery.
>>
>> Jon Riewe
>>
>> Blue Angel Technologies, Inc.
>> 1220 Valley Forge Road, Unit #44
>> P.O. Box 987
>> Valley Forge, PA 19482-0987
>> Phone: 610-917-9200
>> Fax: 610-917-9958
>> Email: [log in to unmask]
>> Web Site: www.blueangeltech.com
>>
>>
>> > ----------
>> > From: Fox, Michael[SMTP:[log in to unmask]]
>> > Sent: Thursday, October 15, 1998 3:01 PM
>> > To: Multiple recipients of list EAD
>> > Subject: Re: Concern regarding number of "hits"...
>> >
>> > An excellent question.
>> >
>> > The answer depends entirely on the search engine that one is using
>> > to access this EAD inventory. Consider a parallel question. Can
>> > your library online catalog find all the books published in
>> > Philadelphia? The data is there in the MARC record in field 260,
>> > subfield a. But can a Notis or Innovative Interfaces or GEAC or
>> > Dynix system search on this data? The answer is specific to the way
>> > each vendor has programmed search criteria into their system (often
>> > with some user customization possible).
>> >
>> > When we design and purchase online library catalogs, we have many
>> > years' experience in user requirements to know what features in
>> > this area we might want.
>> >
>> > Alas, we have no such body of knowledge- maybe some guesses- as to
>> > what would be useful for retrieving archival records. The other
>> > variability in search systems will be the extent to which we tag
>> > content in the EAD document. Do we mark up every instance of a
>> > personal name wherever it occurs in the text of the finding aid?
>> > With catalog records, MARC pretty much defines the level of
>> > granularity that we must apply to fields that are commonly thought
>> > of as access points- names, subjects, titles, etc. We have no
>> > consensus yet on the level of granularity for content designation
>> > within EAD.
>> >
>> > There are at least three issues that play out here.
>> >
>> > One is the chicken and egg situation- we don't know what works
>> > because we don't have anything to test because we don't know what
>> > works because we haven't encoded data because we don't know what's
>> > needed. A few brave institutions are venturing out there with
>> > search engines that are trying different approaches. Until the
>> > results are in, and I hope someone out there in archival studies
>> > programs is going to do some user testing of these systems, we must
>> > make some guesses. The University of Toronto for one has begun such
>> > an investigation.
>> >
>> > The other side of the coin is the economic aspect of this- what is
>> > the cost-benefit of more detailed markup? We have to consider more
>> > than just the first question- is detailed markup and detailed
>> > retrieval a good and useful thing? Lots of things are useful but is
>> > the benefit worth the added labor we would have to invest?
>> >
>> > Finally, there is the question of how we present the search options
>> > to users who may not understand the nature of the materials in the
>> > collection or the structure of finding aids. An OPAC search for a
>> > bibliographic title works for two reasons- the user has some idea of
>> > what a book title is, and book titles tend to be mostly unique and
>> > may be known in advance of the search. Few know what the concept of
>> > an archival series might mean and what the significance would be of
>> > limiting a search to the content of a single series. Search engines
>> > can do that now conceptually but how would we build a user
>> > interface for such an inquiry? Would it make any sense to the
>> > average user?
>> >
>> > One of the benefits of content markup like MARC or EAD is the
>> > possibility of more refined and integrated information retrieval.
>> > We need to offer something more useful than the strong arm approach
>> > that key word retrieval affords. As proof of that, I offer the Web.
>> >
>> > Michael
>> >
>> > Michael Fox
>> > Head of Processing
>> > Minnesota Historical Society
>> > 345 Kellogg Blvd West
>> > St. Paul MN 55102-1906
>> > phone: 651-296-1014
>> > fax: 651-296-9961
>> > [log in to unmask]
>> > **NOTE NEW AREA CODE EFFECTIVE JULY 12, 1998**
>> >
>> > > ----------
>> > > From: Yax, Maggie (YAXME)[SMTP:[log in to unmask]]
>> > > Sent: Wednesday, October 14, 1998 2:22 PM
>> > > To: Multiple recipients of list EAD
>> > > Subject: Concern regarding number of "hits"...
>> > >
>> > > Please forgive this theoretical, naive and possibly silly concern.
>> > > I am processing the Albert B. Sabin (developer of the live, oral
>> > > polio vaccine) papers at the Cincinnati Medical Heritage Center. I
>> > > have not yet begun to mark up my inventory but am anticipating
>> > > doing so when the processing of this large (ca. 400 l. ft.)
>> > > collection is completed. I have taken the EAD workshop and have
>> > > been lurking on this list for a while as well as having visited
>> > > sites with inventories in EAD. I understand that one of the
>> > > benefits of EAD is the precise retrieval the user will enjoy. When
>> > > I try to imagine how that might work for an inventory of this size
>> > > (being described at folder level detail), my mind boggles at the
>> > > number of "hits" (tho' precise) one might get when searching for,
>> > > say, poliomyelitis. This problem could be minimized if one could
>> > > search only one series or subseries. I have not been able to
>> > > determine if this is possible with EAD or if such a capability is
>> > > planned. It's quite possible (probable!) I don't understand this
>> > > well enough -- am I worried about nothing? Or is this a potential
>> > > problem for large collections described at folder level detail?
>> > > Many thanks for any light folks can shed on this.
>> > >
>> > > Maggie
>> > >
>> > > Maggie Yax, Albert B. Sabin Archivist
>> > > Cincinnati Medical Heritage Center
>> > > University of Cincinnati's Medical Center AIT&L
>> > > 121 Wherry Hall
>> > > Cincinnati, OH 45267-0574
>> > > Phone: (513) 558-5121
>> > > Fax: (513) 558-0472
>> > > Email: [log in to unmask]
>> > >
>> >
>>
>
>
Daniel V. Pitti Project Director
Institute for Advanced Technology in the Humanities
Alderman Library University of Virginia Charlottesville, Virginia 22903
Phone: 804 924-6594 Fax: 804 982-2363 Email: [log in to unmask]