The search tool I'm totally sold on is sgrep. It's
free and implements "region-based" searching, similar
to what OpenText does. You can index HTML, SGML, or XML
documents and search by element, attribute, etc. It will
also retrieve keywords in context. It handles recursive
elements easily (which most other "tag-based" indexers won't
do), but your SGML must be well-formed. It's a search engine
only; you have to build your own display tool and web
interface. This is not hard to do, and there are a lot
of good examples out there to copy. Sgrep is described
in one of Charles Goldfarb's XML books; I don't remember
which one.
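
For anyone unfamiliar with the "region-based" idea, here is a minimal
sketch in Python. It is not sgrep's implementation or its query syntax;
the function names and the sample EAD-like fragment are made up for
illustration, and it assumes well-formed markup with no empty-element
tags. A region is just a (start, end) pair of character offsets, and a
query combines lists of regions, e.g. "every <c> element containing
this phrase". Because start tags are matched to end tags with a stack,
nested (recursive) elements each get their own region.

```python
import re
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) character offsets, end exclusive


def element_regions(text: str, tag: str) -> List[Region]:
    """One region per <tag>...</tag> element, including nested ones.

    Assumes well-formed markup and no empty-element tags like <tag/>.
    """
    token = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag))
    open_starts: List[int] = []
    regions: List[Region] = []
    for m in token.finditer(text):
        if m.group(1) == "":          # start tag: remember where it opened
            open_starts.append(m.start())
        elif open_starts:             # end tag: close the innermost open element
            regions.append((open_starts.pop(), m.end()))
    return sorted(regions)


def phrase_regions(text: str, phrase: str) -> List[Region]:
    """One region per literal occurrence of phrase."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(phrase), text)]


def containing(outer: List[Region], inner: List[Region]) -> List[Region]:
    """Regions from outer that fully enclose at least one region from inner."""
    return [o for o in outer
            if any(o[0] <= i[0] and i[1] <= o[1] for i in inner)]


def kwic(text: str, region: Region, width: int = 20) -> str:
    """Keyword in context: the region plus up to width characters each side."""
    start, end = region
    return text[max(0, start - width):end + width]


# Hypothetical fragment with a recursive <c> element.
sample = ("<c><unittitle>Family papers</unittitle>"
          "<c><unittitle>Letters to Muir</unittitle></c></c>")

components = element_regions(sample, "c")
for start, end in containing(components, phrase_regions(sample, "Muir")):
    print(sample[start:end])
```

Searching for "Muir" returns both the outer and the inner <c>, since
both enclose the phrase; searching for "Family" returns only the outer
one. A real indexer would precompute the region lists rather than
rescanning the text for every query.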
I recently indexed around 600 SGML documents totalling
about 70 MB, and searching was very fast. Sgrep is still
in beta.
http://www.cs.helsinki.fi/~jjaakkol/sgrep.html
Alvin Pollock
Lead Programmer
Online Archive of California
http://www.oac.cdlib.org
P.S. We're using Dynaweb for the OAC, not sgrep.
At 12:16 PM 07/30/1999 -0700, you wrote:
>Hello Bill,
>
>For what it's worth, here's another tool for searching EAD on the web. We
>use a search engine called Isite/Isearch at our museum to serve up EAD
>finding aids for our museum collections (see
>http://www.bampfa.berkeley.edu/search/collectionguides.html). It can handle
>XML-like documents, including searching fielded text, attribute values, and
>full-text. It's fairly rough in terms of interface and documentation, but
>that's partially because it's free :) We use the Unix version, but there is
>a Windows version out there too. See http://www.etymon.com/Isearch/ to get
>the software.
>
>I'm not sure I'd recommend it for a long-term sophisticated solution
>because of its lack of interface tools, and we're looking for a new tool,
>but it has served us well for a couple of years and helped us get into the
>game with very little cash cost.
>
>
>Richard Rinehart
>----------------
>Information Systems Manager & Education Technology Specialist
>Berkeley Art Museum/Pacific Film Archive
>@ University of California
>http://www.bampfa.berkeley.edu/
>----------------
>& President, Museum Computer Network, http://www.mcn.edu/
>
>
>
>> -----Original Message-----
>>From: Bill Sees [mailto:[log in to unmask]]
>>Sent: Friday, July 30, 1999 10:26 AM
>>To: Multiple recipients of list EAD
>>Subject: Search Engines For XML Documents
>>
>>Can someone tell me what, if any, Web search engines exist
>>that are specifically designed to search XML documents that
>>are stored on a Web server? I am specifically interested in
>>search engines that are not part of an expensive document
>>management production system, and that could be used on
>>NT's IIS. What is the present state of search software?
>>
>>Thanks,
>>
>>Bill Sees
>>[log in to unmask]