The Access to Archives (A2A) project based at the UK National Archives now
encompasses over 70,000 finding aids from all over England. We have had to
confront the "large finding aid" conundrum on several occasions, in some
instances dealing with single finding aids nearly 50 MB in size!
During the input stages (keying, checking, editing) we continue to treat the
finding aid as a single unit.
When it comes to online delivery, however (indexing, searching, retrieving),
we have to treat large finding aids differently, mindful of the restrictions
of bandwidth, speed and timely response. Our approach so far has been
twofold.
First, we split or chunk the largest catalogues (at source) into smaller
physical units. This process is in part automated (by Perl script) but always
also requires a significant amount of manual intervention. Our experience
has shown that we cannot rely on any single formula for the chunking
mechanism (e.g. by specific level, by specific number of bytes or by specific
number of lines). The enormous variation between finding aids means that we
simply cannot predict which levels will be present in any one finding aid,
nor the quantity of data to be found in any one level.
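
To make the chunking idea concrete, here is a rough sketch in Python (not
our Perl script, which is not reproduced here). It assumes a hypothetical
byte budget and an invented sample; only the element names (dsc, c01, did,
unittitle) are taken from EAD. It groups components under the budget and
flags any single component that is too big on its own, which is the point
where manual intervention would come in.

```python
import xml.etree.ElementTree as ET

# Invented sample data for illustration only; "Series C" is deliberately large.
SAMPLE = """<dsc>
  <c01><did><unittitle>Series A</unittitle></did><p>short</p></c01>
  <c01><did><unittitle>Series B</unittitle></did><p>short too</p></c01>
  <c01><did><unittitle>Series C</unittitle></did><p>{big}</p></c01>
  <c01><did><unittitle>Series D</unittitle></did><p>tiny</p></c01>
</dsc>""".format(big="x" * 500)


def chunk_components(dsc, max_bytes):
    """Group the direct children of dsc into chunks of at most max_bytes.

    Returns (chunks, oversized). Each chunk is a list of elements. A component
    that exceeds the budget by itself becomes a chunk of its own and is also
    listed in oversized, so that a person can decide how to split it further.
    """
    chunks, oversized = [], []
    current, current_size = [], 0
    for comp in dsc:
        # Serialised size in bytes (includes trailing whitespace after the tag).
        size = len(ET.tostring(comp, encoding="unicode").encode("utf-8"))
        if size > max_bytes:
            if current:
                chunks.append(current)
                current, current_size = [], 0
            chunks.append([comp])
            oversized.append(comp)
            continue
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(comp)
        current_size += size
    if current:
        chunks.append(current)
    return chunks, oversized


dsc = ET.fromstring(SAMPLE)
chunks, oversized = chunk_components(dsc, max_bytes=200)  # 200 is arbitrary
for number, chunk in enumerate(chunks, start=1):
    print(number, [c.findtext("did/unittitle") for c in chunk])
print("needs manual review:", [c.findtext("did/unittitle") for c in oversized])
```

A fixed budget like this is exactly the kind of single formula that does not
hold up across all finding aids, hence the separate list of components left
for a human to look at.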
We also duplicate and carry down all the salient top level information (the
"path") to each individual chunked section, so that its context remains and
it can continue to be viewed as a valid sibling rather than an invalid
Our second strategy involves the implementation of the DOM and XSLT.
Since users always approach the A2A application via a search, we view the
whole access process in reverse and start with the hits as the primary goal,
rather than the whole finding aid as an entity in itself.
Thus via an XSLT stylesheet, we present the user with a 'filtered' view of
the entire finding aid which primarily displays the search hits - but also
includes the immediate context of those hits (paragraph, level title,
reference and date) plus an indication of the wider context of those hits
(the "path" backwards from there to the top of the finding aid).
In this way, a large single finding aid can be 'filtered' to a couple of
screens' worth of salient information (i.e. a few KB rather than MB).
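
The filtered view can be illustrated with a small sketch. This is Python
rather than XSLT and is not our stylesheet; the sample record is invented,
the search is a naive substring match, and the function name filter_hits is
made up for the example. For each hit it collects the paragraph, the level
title, reference and date, plus the "path" of titles back to the top.

```python
import xml.etree.ElementTree as ET

# Invented sample record; element names follow EAD, the content is made up.
SAMPLE = """<ead>
  <archdesc level="fonds">
    <did><unittitle>Smith family papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unitid>SM/1</unitid><unittitle>Estate records</unittitle></did>
        <c02 level="item">
          <did><unitid>SM/1/1</unitid><unittitle>Lease</unittitle>
               <unitdate>1792</unitdate></did>
          <scopecontent><p>Lease of a windmill at Holt.</p></scopecontent>
        </c02>
        <c02 level="item">
          <did><unitid>SM/1/2</unitid><unittitle>Rental</unittitle>
               <unitdate>1801</unitdate></did>
          <scopecontent><p>Rental of farms.</p></scopecontent>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def filter_hits(root, keyword):
    """Return one dict per paragraph containing keyword, with its context."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    keyword = keyword.lower()
    hits = []
    for para in root.iter("p"):
        text = "".join(para.itertext()).strip()
        if keyword not in text.lower():
            continue
        # Immediate context: nearest enclosing element with its own <did>.
        node = parent.get(para)
        while node is not None and node.find("did") is None:
            node = parent.get(node)
        if node is None:
            continue
        # Wider context: titles of all ancestors, from the top downwards.
        path = []
        ancestor = parent.get(node)
        while ancestor is not None:
            title = ancestor.findtext("did/unittitle")
            if title:
                path.append(title)
            ancestor = parent.get(ancestor)
        path.reverse()
        hits.append({
            "path": path,
            "title": node.findtext("did/unittitle", default=""),
            "reference": node.findtext("did/unitid", default=""),
            "date": node.findtext("did/unitdate", default=""),
            "paragraph": text,
        })
    return hits


for hit in filter_hits(ET.fromstring(SAMPLE), "windmill"):
    print(" > ".join(hit["path"]))
    print(hit["reference"], hit["title"], hit["date"])
    print(hit["paragraph"])
```

Only the matching components and their context are returned, so the size of
what the user sees depends on the number of hits, not on the size of the
finding aid.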
And although we do still offer the traditional, 'browsable' option for
viewing the whole finding aid or its table of contents (via 2nd and 3rd
stylesheets respectively), usage statistics indicate that fewer than 4% of
users actually make use of these.
To see both of our strategies in action, please visit our website at
http://www.a2a.pro.gov.uk. Any keyword search will illustrate the
'filtering' strategy; most of the British Library finding aids also
illustrate our 'chunking' strategy.
A2A at the UK National Archives
From: Jodi Allison-Bunnell
Sent: 05 September 2003 18:17
Subject: Encoding really long finding aids
I am part of the Northwest Digital Archives (NWDA) project, and am looking
for some advice on encoding really long finding aids.
I am about to send a long document (a 1,200-page word processing file) to
our conversion vendor, and have the option of making this one long document,
or having the 36 series encoded separately and linked to the main document.
I can think of advantages and disadvantages both ways; obviously I need to
make up my mind. Our consortium guidelines do not have specific
recommendations either way, and I don't find any in other guidelines (OAC or
We have a single stylesheet for the consortium.
Thanks for any assistance/advice you can provide.
Archives Grant Administrator
Maureen and Mike Mansfield Library
The University of Montana
Missoula, MT 59812
"Books are easy! Ninety-five percent of them exist in multiple copies and
are now easily accessible through international databases. It is the
scholarly resources hidden in archives that we need to make more visible."
-David Stam, librarian emeritus, Syracuse University