As implementers, we've seen this question raised time and
again in various communities (e.g., museum, geospatial, social science,
and bibliographic).
Today, within the EAD community, Michael is correct in stating
that the quality of search results "depends entirely on the search
engine that one is using". But I would argue that this is the wrong
answer if the community's goal is to share information resources
interoperably across the community. Why should software vendors
determine the success of finding relevant information resources? It
would seem that the community would be better served by taking the
initiative to set standards that software vendors must comply with.
While the EAD community has done a wonderful job in creating the
indexing structure for its resources, indexing is only one piece of the
information management and delivery process. Just as important (perhaps
more important) is community agreement on which fields should be
searchable and what attributes those fields have. Once there is
agreement on these high-level concepts, the community (not the software
vendor) is in control of the quality of results.
Why is interoperability important? It's not if the goal is only
to be able to discover resources at a specific institution. However, if
the goal is to allow users to discover information across institutions,
then it becomes very important.
For example, within the EAD community, one of the goals appears
to be that users should be able to submit a single request for a desired
resource that would simultaneously go to all servers containing EAD
resources located anywhere around the world. That request would allow
the user to specify a combination of specific author, title, dates of
publication, subject matter, etc. The results would come back to
the user in one consolidated list, including an indicator showing which
institution's server provided each result.
All these capabilities are available today, provided there is
high-level community agreement on commonly searchable fields and the
attributes of those fields. Without that agreement,
interoperability is just not very practical.
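To make the idea concrete, here is a minimal sketch, in Python, of a
single request being sent to several institutions and coming back as one
consolidated list. Everything in it is an assumption for illustration
only: the field names in AGREED_FIELDS, the in-memory "servers", the
sample records, and the function names are hypothetical and do not
describe any existing EAD system or protocol.

```python
# Hypothetical community profile: the fields every server agrees to support.
AGREED_FIELDS = {"author", "title", "date", "subject"}


def search_server(records, query):
    """Return the records in which every queried field contains its term."""
    return [
        record
        for record in records
        if all(
            term.lower() in str(record.get(field, "")).lower()
            for field, term in query.items()
        )
    ]


def federated_search(servers, query):
    """Send one query to every server and merge the results into one list."""
    unknown = set(query) - AGREED_FIELDS
    if unknown:
        # A field outside the agreed profile cannot be searched everywhere.
        raise ValueError(f"fields not in the agreed profile: {sorted(unknown)}")
    merged = []
    for institution, records in servers.items():
        for record in search_server(records, query):
            # Tag each hit with the institution whose server supplied it.
            merged.append({"institution": institution, **record})
    return merged


# Illustrative data only; these records are invented placeholders.
servers = {
    "Institution A": [
        {"author": "Example, A.", "title": "Papers", "subject": "Poliomyelitis"},
    ],
    "Institution B": [
        {"author": "Example, B.", "title": "Letters", "subject": "Poliomyelitis vaccine"},
        {"author": "Example, B.", "title": "Diaries", "subject": "Travel"},
    ],
}

results = federated_search(servers, {"subject": "poliomyelitis"})
for hit in results:
    print(hit["institution"], "-", hit["title"])
```

The check against AGREED_FIELDS is the point of the sketch: the request
can be handed to every server unchanged only because each one has
committed to the same searchable fields.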
Last year we performed an analysis of the Iowa State Library's
implementation of a system that was designed to provide electronic
resource discovery among state, municipal and university libraries.
During the course of several dozen interviews with both librarians and
end-users, it became quite clear that the primary problem encountered
was frustration with the incompatibility between vendors' systems and
the difficulty of getting consistent results from searches.
This occurred because the bibliographic community had not agreed
on commonly searchable fields and the attributes of those fields. As a
result, vendor A made fields 1, 2 and 3 searchable and vendor B made
fields 3, 4 and 5 searchable. Thus if you were at an institution that
installed vendor A software, you could not search on fields 4 and 5, and
if you were at an institution that installed vendor B software, you
would not be able to search on fields 1 and 2. In fact, the only common
field that could be searched at both institutions was field 3.
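The arithmetic of that example can be written out as a small Python
sketch. The field numbers are the placeholders used above, not real
field tags, and the function name is hypothetical.

```python
# Placeholder field numbers from the example above, not real field tags.
vendor_a = {1, 2, 3}
vendor_b = {3, 4, 5}


def common_fields(*profiles):
    """Return the fields that are searchable under every vendor profile."""
    return set.intersection(*profiles) if profiles else set()


shared = common_fields(vendor_a, vendor_b)  # searchable at both institutions
only_a = vendor_a - vendor_b                # lost to users of vendor B software
only_b = vendor_b - vendor_a                # lost to users of vendor A software

print("shared:", sorted(shared))
print("vendor A only:", sorted(only_a))
print("vendor B only:", sorted(only_b))
```

Each additional vendor with its own field list can only shrink the
shared set, which is why agreement has to come before implementation.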
This incompatibility was the number one frustration that came
out of the Iowa study. I mention it because the same issue has the
potential to limit cross-institution discovery for EAD as well. And
since EAD is in the initial stages of implementation, it has the ability
to establish the appropriate guidelines that will prevent a similar
situation from occurring within the EAD community.
I would argue that the community would be well served by
creating information discovery standards that are appropriate for the
community as a whole and requiring software vendors to accommodate those
standards. The alternative to such a standards-based approach is
continued reliance on individual software vendors that will implement
their own proprietary solutions. In most cases, those solutions will be
designed to maximize revenue by locking users into the vendor's
proprietary solution and will be at cross-purposes with promoting
cross-institutional discovery.
Jon Riewe
Blue Angel Technologies, Inc.
1220 Valley Forge Road, Unit #44
P.O. Box 987
Valley Forge, PA 19482-0987
Phone: 610-917-9200
Fax: 610-917-9958
Email: [log in to unmask]
Web Site: www.blueangeltech.com
> ----------
> From: Fox, Michael[SMTP:[log in to unmask]]
> Sent: Thursday, October 15, 1998 3:01 PM
> To: Multiple recipients of list EAD
> Subject: Re: Concern regarding number of "hits"...
>
> An excellent question.
>
> The answer depends entirely on the search engine that one is using to
> access this EAD inventory. Consider a parallel question. Can your
> library online catalog find all the books published in Philadelphia?
> The data is there in the MARC record in field 260, subfield a. But
> can a Notis or Innovative Interfaces or GEAC or Dynix system search
> on this data? The answer is specific to the way each vendor has
> programmed search criteria into their system (often with some user
> customization possible).
>
> When we design and purchase online library catalogs, we have many
> years' experience in user requirements to know what features in this
> area we might want.
>
> Alas, we have no such body of knowledge- maybe some guesses- as to
> what would be useful for retrieving archival records. The other
> variability in search systems will be the extent to which we tag
> content in the EAD document. Do we mark up every instance of a
> personal name wherever it occurs in the text of the finding aid?
> With catalog records, MARC pretty much defines the level of
> granularity that we must apply to fields that are commonly thought of
> as access points- names, subjects, titles, etc. We have no consensus
> yet on the level of granularity for content designation within EAD.
>
> There are at least three issues that play out here.
>
> One is the chicken and egg situation- we don't know what works because
> we don't have anything to test because we don't know what works
> because we haven't encoded data because we don't know what's needed.
> A few brave institutions are venturing out there with search engines
> that are trying different approaches. Until the results are in, and I
> hope someone out there in archival studies programs is going to do
> some user testing of these systems, we must make some guesses. The
> University of Toronto for one has begun such an investigation.
>
> The other side of the coin is the economic aspect of this- what is the
> cost-benefit of more detailed markup? We have to consider more than
> just the first question- is detailed markup and detailed retrieval a
> good and useful thing? Lots of things are useful but is the benefit
> worth the added labor we would have to invest?
>
> Finally, there is the question of how we present the search options to
> users who may not understand the nature of the materials in the
> collection or the structure of finding aids. An OPAC search for a
> bibliographic title works for two reasons- the user has some idea of
> what a book title is and the fact that book titles tend to be mostly
> unique and may be known in advance of the search. Few know what the
> concept of an archival series might mean and what the significance
> would be of limiting a search to the content of a single series.
> Search engines can do that now conceptually but how would we build a
> user interface for such an inquiry? Would it make any sense to the
> average user?
>
> One of the benefits of content markup like MARC or EAD is the
> possibility of more refined and integrated informational retrieval.
> We need to offer something more useful than the strong arm approach
> that key word retrieval affords. As proof of that, I offer the Web.
>
> Michael
>
> Michael Fox
> Head of Processing
> Minnesota Historical Society
> 345 Kellogg Blvd West
> St. Paul MN 55102-1906
> phone: 651-296-1014
> fax: 651-296-9961
> [log in to unmask]
> **NOTE NEW AREA CODE EFFECTIVE JULY 12, 1998**
>
> > ----------
> > From: Yax, Maggie (YAXME)[SMTP:[log in to unmask]]
> > Sent: Wednesday, October 14, 1998 2:22 PM
> > To: Multiple recipients of list EAD
> > Subject: Concern regarding number of "hits"...
> >
> > Please forgive this theoretical, naive and possibly silly concern.
> > I am processing the Albert B. Sabin (developer of the live, oral
> > polio vaccine) papers at the Cincinnati Medical Heritage Center. I
> > have not yet begun to markup my inventory but am anticipating doing
> > so when the processing of this large (ca. 400 l. ft.) collection is
> > completed. I have taken the EAD workshop and have been lurking on
> > this list for a while as well as having visited sites with
> > inventories in EAD. I understand that one of the benefits of EAD is
> > the precise retrieval the user will enjoy. When I try to imagine
> > how that might work for an inventory of this size (being described
> > at folder level detail), my mind boggles at the number of "hits"
> > (tho' precise) one might get when searching for, say, poliomyelitis.
> > This problem could be minimized if one could search only one series
> > or subseries. I have not been able to determine if this is possible
> > with EAD or if such a capability is planned. It's quite possible
> > (probable!) I don't understand this well enough -- am I worried
> > about nothing? Or is this a potential problem for large collections
> > described at folder level detail? Many thanks for any light folks
> > can shed on this.
> >
> > Maggie
> >
> > Maggie Yax, Albert B. Sabin Archivist
> > Cincinnati Medical Heritage Center
> > University of Cincinnati's Medical Center AIT&L
> > 121 Wherry Hall
> > Cincinnati, OH 45267-0574
> > Phone: (513) 558-5121
> > Fax: (513) 558-0472
> > Email: [log in to unmask]
> >
>