A couple of quick answers...
1. We do indeed keep the original MODS section intact within the METS
record. One of our disseminations is "show metadata", and this is done via
an XSLT applied over the MODS unit within the dmdSec. The only data we're
currently mapping into our SQL tables are the MODS sections. Once we have
a rights management solution, we'll probably map that as well. But the
fileSec and structMap sections work far too well in METS to even consider
converting.
2. I use PHP to maintain an array, keyed by id, of the ids found for the
terms queried. If we query multiple terms, each subsequent query adds its
found ids to that array, or increments the count for an id if it already
exists in the hash. By the end, I have a list of all found ids and the
number of terms through which each was found. For AND searches, I pull
those ids whose hash value is equal to the number of terms queried. For OR
searches, I pull all found ids, and for PHRASE searches, I take the id set
pulled for the AND search and use it as the basis of a final search of the
phrases table. Since the terms table contains unique single words as its
data, it's quite indexable and should perform well.
3. Didn't mean to diminish Cheshire II! It's now on the list of things to
look into. I hadn't seen it since a demonstration of Cheshire I several
years back and hadn't thought to see how it's evolved.
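
For anyone who wants to see the counting approach from answer 2 in concrete
form, here is a rough sketch. It is written in Python rather than the PHP
used in production, and it is not the actual code: the in-memory
dictionaries `postings` and `phrases` are hypothetical stand-ins for the
SQL terms and phrases tables, and all function names are made up for
illustration.

```python
from collections import Counter

# Hypothetical stand-in for the terms table: word -> ids of matching records.
postings = {
    "digital": [1, 2, 3],
    "library": [2, 3, 4],
    "brown": [3],
}

# Hypothetical stand-in for the phrases table: record id -> phrase text.
phrases = {
    2: "library of digital things",
    3: "Brown digital library",
    4: "library",
}


def count_term_hits(postings, terms):
    """Tally, for each record id, how many of the queried terms found it."""
    hits = Counter()
    for term in terms:
        # set() ensures a single term counts at most once per id.
        for record_id in set(postings.get(term, ())):
            hits[record_id] += 1
    return hits


def or_search(postings, terms):
    """OR: every id found by at least one term."""
    return sorted(count_term_hits(postings, terms))


def and_search(postings, terms):
    """AND: ids whose count equals the number of distinct terms queried."""
    unique_terms = set(terms)
    hits = count_term_hits(postings, unique_terms)
    return sorted(i for i, n in hits.items() if n == len(unique_terms))


def phrase_search(postings, phrases, phrase):
    """PHRASE: narrow to the AND candidates, then check the phrase text."""
    wanted = phrase.lower()
    candidates = and_search(postings, wanted.split())
    return [i for i in candidates if wanted in phrases.get(i, "").lower()]
```

With the sample data above, `or_search(postings, ["digital", "library"])`
returns every id seen, `and_search` keeps only the ids found by both terms,
and `phrase_search(postings, phrases, "digital library")` checks just those
AND candidates against the phrase text, so the expensive phrase comparison
runs over a small candidate set.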
patrick
Patrick M. Yott
Center for Digital Initiatives
Brown University Library