> Alvin Pollock wrote:
>
> [...] Our largest finding aids, ca. 10-15 Mb, usually require
> extensive structural manipulation.
> I'm curious to know whether anyone has applied XSLT or DOM to this
> problem.
>
        ------------

        The Access to Archives (A2A) project, an EAD XML application based
at the UK Public Record Office, so far encompasses the finding aids of 100
separate archival institutions. So, not surprisingly, we have already dealt
with a broad spectrum of cataloguing styles, practices and depths (and, as a
testament to EAD, have managed all of them).

        We have also had to confront the "large finding aid" conundrum on
several occasions, in some instances dealing with single finding aids nearly
50 Mb in size! Our approach so far has been two-fold:

        1.
        Like others, we have had to split or chunk these large catalogues
(at source) into smaller physical units. This process is in part automated
(by Perl script) but always also requires a not insignificant amount of
manual intervention. Our experience has shown that we cannot rely on any
single formula for the chunking mechanism (e.g. by specific level, by
specific number of bytes or by specific number of lines). The enormous
variation between finding aids means that we simply cannot predict which levels will
be present in any one finding aid, nor the quantity of data to be found in
any one level. Recently we had a finding aid with the following breakdown:

                 1    <ARCHDESC LEVEL="fonds">
                 1    <C LEVEL="otherlevel" OTHERLEVEL="sub-fonds">
                 2    <C LEVEL="series">
                66    <C LEVEL="otherlevel" OTHERLEVEL="sub-series">
                20    <C LEVEL="otherlevel" OTHERLEVEL="sub-sub-series">
               692    <C LEVEL="file">
              5532    <C LEVEL="item">

        The total size of this finding aid was 3.25 Mb, of which 2.9 Mb
(89%) was just *ONE* of the 692 <C LEVEL="file">s!
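A breakdown like the one above can be produced with a short script. The following is a minimal Python sketch, not the A2A tooling: it assumes lowercase EAD XML with plain (unnumbered) <c> components and no namespace, and it uses an invented toy finding aid. For each level it counts components and reports the largest single component of that level as a share of the whole file, which is how an outlier like the 2.9 Mb file above shows up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented toy finding aid; assumes lowercase EAD XML with plain <c> components.
SAMPLE = """<ead>
<eadheader><filedesc><titlestmt><titleproper>Estate papers</titleproper></titlestmt></filedesc></eadheader>
<archdesc level="fonds"><did><unittitle>Estate papers</unittitle></did>
<dsc>
 <c level="otherlevel" otherlevel="sub-fonds"><did><unittitle>Estate office</unittitle></did>
  <c level="series"><did><unittitle>Deeds</unittitle></did>
   <c level="file"><did><unittitle>Home farm</unittitle></did>
    <c level="item"><did><unitid>D/1/1</unitid><unittitle>Lease</unittitle>
     <unitdate>1701</unitdate></did><scopecontent><p>Lease of a water mill</p></scopecontent></c>
    <c level="item"><did><unitid>D/1/2</unitid><unittitle>Release</unittitle>
     <unitdate>1702</unitdate></did></c>
   </c>
  </c>
  <c level="series"><did><unittitle>Accounts</unittitle></did></c>
 </c>
</dsc>
</archdesc>
</ead>"""

def level_label(c):
    """Effective level name: the OTHERLEVEL value when LEVEL="otherlevel"."""
    level = c.get("level", "")
    return c.get("otherlevel", level) if level == "otherlevel" else level

def size_of(elem):
    """Approximate size in bytes of an element's serialized subtree."""
    return len(ET.tostring(elem, encoding="utf-8"))

def profile(root):
    """Count components per level and find the largest component of each level."""
    total = size_of(root)
    counts = Counter()
    largest = {}  # level -> (component, share of the whole file)
    for c in root.iter("c"):
        level = level_label(c)
        counts[level] += 1
        share = size_of(c) / total
        if level not in largest or share > largest[level][1]:
            largest[level] = (c, share)
    return counts, largest

root = ET.fromstring(SAMPLE)
counts, largest = profile(root)
for level, n in counts.items():
    comp, share = largest[level]
    print(f"{n:6}  {level:10} largest: {comp.findtext('did/unittitle')} ({share:.0%})")
```

Each component is re-serialized to measure it, which is simple but costs time proportional to size times depth; for a sketch over a single finding aid that is usually acceptable.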

        The other point to note about our chunking system is that we
duplicate and carry down all the salient top-level information (the "path")
to each individual chunked section, so that its context is preserved and it
can continue to be viewed as a valid sibling rather than an invalid orphan.
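The "carry down the path" idea can be sketched in Python as below. This is an illustration under stated assumptions, not the A2A Perl script: the function name `chunk_with_path` and the caller-supplied `select` predicate are hypothetical, components are assumed to be plain <c> elements inside <dsc>, and `select` is assumed never to match a component nested inside another match. Each chunk keeps shallow copies of its ancestors plus their descriptive children (eadheader, did, and so on), but drops sibling components.

```python
import copy
import xml.etree.ElementTree as ET

# Invented toy finding aid; assumes lowercase EAD XML with plain <c> components.
SAMPLE = """<ead>
<eadheader><filedesc><titlestmt><titleproper>Estate papers</titleproper></titlestmt></filedesc></eadheader>
<archdesc level="fonds"><did><unittitle>Estate papers</unittitle></did>
<dsc>
 <c level="otherlevel" otherlevel="sub-fonds"><did><unittitle>Estate office</unittitle></did>
  <c level="series"><did><unittitle>Deeds</unittitle></did>
   <c level="file"><did><unittitle>Home farm</unittitle></did>
    <c level="item"><did><unitid>D/1/1</unitid><unittitle>Lease</unittitle>
     <unitdate>1701</unitdate></did><scopecontent><p>Lease of a water mill</p></scopecontent></c>
    <c level="item"><did><unitid>D/1/2</unitid><unittitle>Release</unittitle>
     <unitdate>1702</unitdate></did></c>
   </c>
  </c>
  <c level="series"><did><unittitle>Accounts</unittitle></did></c>
 </c>
</dsc>
</archdesc>
</ead>"""

# Assumption: these tags hold the component hierarchy; everything else is descriptive.
STRUCTURAL = {"dsc", "c"}

def chunk_with_path(root, select):
    """Yield a standalone <ead> tree for every component where select(c) is true."""
    parent = {child: p for p in root.iter() for child in p}
    for comp in root.iter("c"):
        if not select(comp):
            continue
        chain = []  # ancestors of comp, collected bottom-up
        node = comp
        while node in parent:
            node = parent[node]
            chain.append(node)
        chain.reverse()  # now root first
        on_path = {id(n) for n in chain} | {id(comp)}
        new_root = tip = None
        for anc in chain:
            # Shallow copy of the ancestor, plus deep copies of its descriptive
            # children; sibling components and the path element itself are skipped.
            shell = ET.Element(anc.tag, dict(anc.attrib))
            for child in anc:
                if child.tag not in STRUCTURAL and id(child) not in on_path:
                    shell.append(copy.deepcopy(child))
            if tip is None:
                new_root = shell
            else:
                tip.append(shell)
            tip = shell
        tip.append(copy.deepcopy(comp))  # the chunked component, in full
        yield ET.ElementTree(new_root)

root = ET.fromstring(SAMPLE)
for chunk in chunk_with_path(root, lambda c: c.get("level") == "file"):
    print([t.text for t in chunk.getroot().iter("unittitle")])
```

In practice each chunk could be saved with `ElementTree.write(path, encoding="utf-8", xml_declaration=True)`. Choosing the `select` predicate is exactly the part that, as noted above, no single formula handles well, which is why a manual step remains.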

        2.
        Our second approach to solving the "large finding aid" problem
makes use of the DOM and XSLT.

        Since users always approach the A2A application via a search, we
view the whole access process in reverse and start with the hits as the
primary goal, rather than the whole finding aid as an entity in itself.

        Thus, via an XSLT stylesheet, we present the user with a 'filtered'
view of the entire finding aid that primarily displays the search hits but
also includes the immediate context of those hits (paragraph, level title,
reference and date) plus an indication of their wider context (the "path"
back from each hit to the top of the finding aid).

        In this way, a large single finding aid can be 'filtered' to a
couple of screens' worth of salient information (i.e. a few Kb rather than
Mb).
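The filtering logic described above can be sketched outside XSLT. The following Python version is only an illustration of the same idea, not the A2A stylesheet: `filtered_view` and `did_field` are hypothetical names, matching is a plain case-insensitive substring test, and only <c> components are searched. For each hit it returns the immediate context (reference, title, date and the matching text block) and the "path" of titles back to the top of the finding aid.

```python
import xml.etree.ElementTree as ET

# Invented toy finding aid; assumes lowercase EAD XML with plain <c> components.
SAMPLE = """<ead>
<eadheader><filedesc><titlestmt><titleproper>Estate papers</titleproper></titlestmt></filedesc></eadheader>
<archdesc level="fonds"><did><unittitle>Estate papers</unittitle></did>
<dsc>
 <c level="otherlevel" otherlevel="sub-fonds"><did><unittitle>Estate office</unittitle></did>
  <c level="series"><did><unittitle>Deeds</unittitle></did>
   <c level="file"><did><unittitle>Home farm</unittitle></did>
    <c level="item"><did><unitid>D/1/1</unitid><unittitle>Lease</unittitle>
     <unitdate>1701</unitdate></did><scopecontent><p>Lease of a water mill</p></scopecontent></c>
    <c level="item"><did><unitid>D/1/2</unitid><unittitle>Release</unittitle>
     <unitdate>1702</unitdate></did></c>
   </c>
  </c>
  <c level="series"><did><unittitle>Accounts</unittitle></did></c>
 </c>
</dsc>
</archdesc>
</ead>"""

def did_field(elem, name):
    """Text of elem's did/<name> child, or "" if it is absent."""
    node = elem.find(f"did/{name}")
    return "".join(node.itertext()).strip() if node is not None else ""

def filtered_view(root, term):
    """Return one record per <c> whose own (non-component) content contains term."""
    parent = {child: p for p in root.iter() for child in p}
    term = term.lower()
    hits = []
    for comp in root.iter("c"):
        # Search only this component's own children (did, scopecontent, ...),
        # not nested <c> elements, so a hit is reported where the text occurs.
        blocks = ["".join(ch.itertext()).strip() for ch in comp if ch.tag != "c"]
        matches = [b for b in blocks if term in b.lower()]
        if not matches:
            continue
        # Walk upward, collecting titles on the "path" to the top.
        path = []
        node = parent.get(comp)
        while node is not None:
            title = did_field(node, "unittitle")
            if title:
                path.append(title)
            node = parent.get(node)
        path.reverse()
        hits.append({
            "path": path,
            "reference": did_field(comp, "unitid"),
            "title": did_field(comp, "unittitle"),
            "date": did_field(comp, "unitdate"),
            "context": matches,
        })
    return hits

root = ET.fromstring(SAMPLE)
for hit in filtered_view(root, "mill"):
    print(" > ".join(hit["path"]))
    print(f"  {hit['reference']}  {hit['title']}  {hit['date']}")
    for text in hit["context"]:
        print(f"    {text}")
```

The output size depends on the number of hits rather than on the size of the finding aid, which is the property that makes this view practical for very large catalogues.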

        Of course, we also offer the traditional 'browsable' option for
viewing the whole finding aid or its table of contents (via second and third
stylesheets, respectively), but recent usage statistics indicated that fewer
than 8% of users actually made use of these options.

        To see both of these strategies in action, please visit our website
at http://www.a2a.pro.gov.uk
        Any keyword search will illustrate the 'filtering' strategy; most of
the British Library finding aids also illustrate our 'chunking' strategy.

        Matt Hillyard,
        A2A at the Public Record Office
        London