Jon makes a very good point about the utility of common conventions for
indexing and retrieval.
But the issue is more complex than simply deciding which fields/elements
will be searched and in what combination.
We must all agree as to what data will be included in the record and
what the level of markup will be. There is no benefit to agreeing to
index element X unless we all actually include that data in our records.
Let me draw an analogy to MARC cataloging of monographs.
MARC field 505 provides for a formatted contents note. It has two
possible content models. In one, the text is simply transcribed with
minimal ISBD punctuation to separate sections of text. In the other
model, there is formal content designation of statements of
responsibility, title, etc., through the specific use of separate subfields.
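
To make the difference between the two content models concrete, here is a minimal Python sketch. It assumes a simplified in-memory representation of a 505 field (a second indicator plus a list of subfield code/value pairs); the titles and names are invented sample data, and the helper functions are hypothetical, not part of any cataloging system or library.

```python
# Illustrative only: each 505 field is modeled as (second indicator, subfields).
# The titles and names below are invented sample data, not from a real record.

# Basic model: second indicator blank, everything transcribed into subfield a.
BASIC_505 = (" ", [("a", "The road north / A. Author -- Winter letters / B. Writer.")])

# Enhanced model: second indicator 0, with titles in subfield t and
# statements of responsibility in subfield r.
ENHANCED_505 = ("0", [
    ("t", "The road north /"),
    ("r", "A. Author --"),
    ("t", "Winter letters /"),
    ("r", "B. Writer."),
])


def subfield_values(field, code):
    """Return the values of one subfield code, minus trailing ISBD separators."""
    _indicator2, subfields = field
    return [value.rstrip(" /-") for sub_code, value in subfields if sub_code == code]


def indexable_titles(field):
    """Titles a system could index separately; only subfield t identifies them."""
    return subfield_values(field, "t")


print(indexable_titles(BASIC_505))     # no subfield t, so nothing to index as a title
print(indexable_titles(ENHANCED_505))  # each title is separately addressable
```

Under these assumptions, a title index gets nothing from the basic note beyond keyword access to the whole string, while the enhanced note yields each title as its own entry.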
Including information from a book's table of contents in the catalog
entry would no doubt enhance retrieval. Vendors could add these fields
to their indexes. However, the benefit to users of MARBI adding this
functionality to USMARC and vendors' indexing of it depends on two local
decisions. Does a given library actually choose to transcribe this
data from the book into the catalog record at all? If so, does the
library choose to do the full markup or just string all the text into
subfield a? While the community might bring some pressure to bear,
suggesting that such work is highly beneficial, individual libraries
will make the decision on what to do based on their own assessment of
the cost/benefit ratio.
Archives will act in the same way. Indexing schemes and user
expectations must accommodate that reality and not be predicated on
assumptions about uniformity of practice in areas where such is unlikely
to occur. Better that we focus on making a strong case for inclusion
and completeness in areas that constitute the "core" of archival
description and to identify the benefits that will accrue from
additional content and additional content designation beyond that. For
example, my own personal view is that there would be greater benefit and
a higher degree of implementation if the community standard was for
every archives to include key access terms in authority controlled form
in a single <controlaccess> area of the description than if some choose
to tag every occurrence of a personal name wherever it occurred, in the
form in which it occurred, and others marked up none.
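
As a rough illustration of why a single <controlaccess> area is easy to index, the following Python sketch pulls the access terms out of a small EAD-like fragment using the standard library's XML parser. The fragment, the collection, and the headings are invented for the example, and access_terms is a hypothetical helper, not part of any EAD toolkit.

```python
import xml.etree.ElementTree as ET

# Invented sample fragment; the collection, names, and headings are hypothetical.
SAMPLE_EAD = """
<archdesc level="collection">
  <did><unittitle>Example family papers</unittitle></did>
  <scopecontent>
    <p>Letters from <persname>J. Smith</persname> about polio research.</p>
  </scopecontent>
  <controlaccess>
    <persname>Smith, John, 1900-1980</persname>
    <subject>Poliomyelitis</subject>
  </controlaccess>
</archdesc>
"""


def access_terms(ead_xml):
    """Collect (element name, heading) pairs found inside controlaccess areas."""
    root = ET.fromstring(ead_xml)
    terms = []
    for area in root.iter("controlaccess"):
        for element in area:
            text = "".join(element.itertext()).strip()
            # Skip headings and nested areas; nested areas are visited on their own.
            if text and element.tag not in ("head", "controlaccess"):
                terms.append((element.tag, text))
    return terms


for tag, heading in access_terms(SAMPLE_EAD):
    print(tag, "->", heading)
```

In this sketch the name tagged inline in the scope note ("J. Smith") is deliberately ignored; only the authority-controlled forms in the <controlaccess> area are returned, which is the one place an indexer could count on finding them if that were the community practice.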
Daniel has a wonderful quotation from Cutter or some other famous person to
the effect that the worst scheme of authority control, if universally
implemented, would be better than what we have now.
Head of Processing
Minnesota Historical Society
345 Kellogg Blvd West
St. Paul MN 55102-1906
[log in to unmask]
**NOTE NEW AREA CODE EFFECTIVE JULY 12, 1998**
> From: Jon Riewe[SMTP:[log in to unmask]]
> Sent: Friday, October 16, 1998 8:23 AM
> To: Multiple recipients of list EAD
> Subject: Re: Concern regarding number of "hits"...
> As an implementer, we've seen this question raised time and
> again in various communities (e.g., museum, geospatial, social
> bibliographic, etc.).
> Today within the EAD community Michael is correct in stating
> that the quality of search results "depends entirely on the search
> engine that one is using". But I would advocate that this is the wrong
> answer if the community's goal is to share information resources
> interoperably across the community. Why should software vendors
> determine the success of finding relevant information resources? It
> would seem that the community would be better served by taking the
> initiative to set standards that software vendors must comply with.
> While the EAD community has done a wonderful job in creating an
> indexing structure for its resources, indexing is only one piece of the
> information management and delivery process. Just as important (if not
> more important) is community agreement on which fields should be
> searchable and what attributes those fields have. Once there is
> agreement on these high level concepts, the community (not the
> vendor) is in control of the quality of results.
> Why is interoperability important? It's not if the goal is
> to be able to discover resources at a specific institution. However, if
> the goal is to allow users to discover information across institutions,
> then it becomes very important.
> For example, within the EAD community, one of the goals
> to be that users should be able to submit a single request for a
> resource that would simultaneously go to all servers containing EAD
> resources located anywhere around the world. That request would allow
> the user to specify a combination of specific author, title, dates of
> publication, subject matter, etc., etc. The results would come back to
> the user in one consolidated list including an indicator showing which
> institution's server provided the results.
> All these capabilities are available today if there is a high
> level community agreement on commonly searchable fields and what
> attributes those fields have. Without that agreement,
> interoperability is just not very practical.
> Last year we performed an analysis of the Iowa State Library's
> implementation of a system that was designed to provide electronic
> resource discovery among state, municipal and university libraries.
> During the course of several dozen interviews with both librarians and
> end-users, it became quite clear that the primary problem encountered
> was a frustration with the incompatibility between vendors and getting
> consistent results from searches.
> This occurred because the bibliographic community had not agreed
> on commonly searchable fields and the attributes of those fields. As a
> result, vendor A made fields 1, 2 and 3 searchable and vendor B made
> fields 3, 4 and 5 searchable. Thus if you were at an institution that
> installed vendor A software, you could not search on fields 4 and 5, and
> if you were at an institution that installed vendor B software, you
> would not be able to search on fields 1 and 2. In fact, the only
> field that could be searched between both institutions was field 3.
> This incompatibility was the number one frustration that came
> out of the Iowa study. I mention it because the same issue has the
> potential for limiting cross-institution discovery for EAD as well. But
> since EAD is in the initial stages of implementation, it has the
> opportunity to establish the appropriate guidelines that will prevent a similar
> situation from occurring within the EAD community.
> I would advocate that the community would be well served by
> creating information discovery standards which are appropriate for the
> community as a whole and requiring software vendors to accommodate those
> standards. The alternative to using such a standards-based approach is
> continued reliance on individual software vendors that will implement
> their own proprietary solutions. In most cases, those solutions will be
> designed to maximize revenue by locking users into the vendor's
> proprietary solution and will be at cross-purposes with promoting
> cross-institutional discovery.
> Jon Riewe
> Blue Angel Technologies, Inc.
> 1220 Valley Forge Road, Unit #44
> P.O. Box 987
> Valley Forge, PA 19482-0987
> Phone: 610-917-9200
> Fax: 610-917-9958
> Email: [log in to unmask]
> Web Site: www.blueangeltech.com
> > ----------
> > From: Fox, Michael[SMTP:[log in to unmask]]
> > Sent: Thursday, October 15, 1998 3:01 PM
> > To: Multiple recipients of list EAD
> > Subject: Re: Concern regarding number of "hits"...
> > An excellent question.
> > The answer depends entirely on the search engine that one is using to
> > access this EAD inventory. Consider a parallel question. Can your
> > library online catalog find all the books published in Philadelphia?
> > The data is there in the MARC record in field 260, subfield a. But can
> > a Notis or Innovative Interfaces or GEAC or Dynix system search on this
> > data? The answer is specific to the way each vendor has programmed
> > search criteria into their system (often with some user
> > possible).
> > When we design and purchase online library catalogs, we have many years'
> > experience in user requirements to know what features in this area we
> > might want.
> > Alas, we have no such body of knowledge- maybe some guesses- as to what
> > would be useful for retrieving archival records. The other variability
> > in search systems will be the extent to which we tag content in the EAD
> > document. Do we mark up every instance of a personal name where it
> > occurs in the text of the finding aid? With catalog records, MARC
> > pretty much defines the level of granularity that we must apply to
> > fields that are commonly thought of as access points- names,
> > titles, etc. We have no consensus yet on the level of granularity of
> > content designation within EAD.
> > There are at least three issues that play out here.
> > One is the chicken and egg situation- we don't know what works because
> > we don't have anything to test because we don't know what works because
> > we haven't encoded data because we don't know what's needed. A few
> > brave institutions are venturing out there with search engines that are
> > trying different approaches. Until the results are in, and I hope
> > someone out there in archival studies programs is going to do some user
> > testing of these systems, we must make some guesses. The University of
> > Toronto for one has begun such an investigation.
> > The other side of the coin is the economic aspect of this- what is the
> > cost-benefit of more detailed markup? We have to consider more than
> > just the first question- is detailed markup and detailed retrieval a
> > good and useful thing? Lots of things are useful but is the benefit
> > worth the added labor we would have to invest?
> > Finally, there is the question of how we present the search options to
> > users who may not understand the nature of the materials in the
> > collection or the structure of finding aids. An OPAC search for a
> > bibliographic title works for two reasons- the user has some idea of
> > what a book title is and the fact that book titles tend to be mostly
> > unique and may be known in advance of the search. Few know what the
> > concept of an archival series might mean and what the significance would
> > be of limiting a search to the content of a single series. Search
> > engines can do that now conceptually but how would we build a user
> > interface for such an inquiry? Would it make any sense to the average
> > user?
> > One of the benefits of content markup like MARC or EAD is the
> > possibility of more refined and integrated information retrieval. We
> > need to offer something more useful than the strong arm approach that
> > key word retrieval affords. As proof of that, I offer the Web.
> > Michael
> > Michael Fox
> > Head of Processing
> > Minnesota Historical Society
> > 345 Kellogg Blvd West
> > St. Paul MN 55102-1906
> > phone: 651-296-1014
> > fax: 651-296-9961
> > [log in to unmask]
> > **NOTE NEW AREA CODE EFFECTIVE JULY 12, 1998**
> > > ----------
> > > From: Yax, Maggie (YAXME)[SMTP:[log in to unmask]]
> > > Sent: Wednesday, October 14, 1998 2:22 PM
> > > To: Multiple recipients of list EAD
> > > Subject: Concern regarding number of "hits"...
> > >
> > > Please forgive this theoretical, naive and possibly silly concern. I am
> > > processing the Albert B. Sabin (developer of the live, oral polio
> > > vaccine) papers at the Cincinnati Medical Heritage Center. I have not
> > > yet begun to mark up my inventory but am anticipating doing so when the
> > > processing of this large (ca. 400 l. ft.) collection is completed. I
> > > have taken the EAD workshop and have been lurking on this list for a
> > > while as well as having visited sites with inventories in EAD. I
> > > understand that one of the benefits of EAD is the precise retrieval the
> > > user will enjoy. When I try to imagine how that might work for an
> > > inventory of this size (being described at folder level detail), my
> > > mind boggles at the number of "hits" (tho' precise) one might get when
> > > searching for, say, poliomyelitis. The problem
> > > could be minimized if one could search only one series or
> > > I have not been able to determine if this is possible with EAD or if
> > > such capability is planned. It's quite possible (probable!) I don't
> > > understand this well enough -- am I worried about nothing? Or is this
> > > a potential problem for large collections described at folder level
> > > detail? Many thanks for any light folks can shed on this.
> > >
> > > Maggie
> > >
> > > Maggie Yax, Albert B. Sabin Archivist
> > > Cincinnati Medical Heritage Center
> > > University of Cincinnati's Medical Center AIT&L
> > > 121 Wherry Hall
> > > Cincinnati, OH 45267-0574
> > > Phone: (513) 558-5121
> > > Fax: (513) 558-0472
> > > Email: [log in to unmask]
> > >