EAD@LISTSERV.LOC.GOV



Subject:

Re: Long finding aids and when is a finding aid appropriate?

From:

Bill Parod <[log in to unmask]>

Reply-To:

Encoded Archival Description List <[log in to unmask]>

Date:

Tue, 9 Feb 2010 11:19:19 -0600


We're using a similar technology mix for a 3.6M finding aid made up of  
76 individual EAD files describing around 8500 photographs. The  
combined index/site includes a little over 10K EAD c0ns which also  
manifest as Fedora objects and Solr/Lucene documents.

Here's the site: http://www.library.northwestern.edu/africana/winterton

Here's a brief description of the moving parts:
There are 76 individual EAD files representing physical sections of
the archive (albums, folders, scrapbooks - http://repository.library.northwestern.edu/winterton/about.html).
Each of these EADs is ingested into Fedora, resulting initially in
76 EAD objects. Our EAD content model supports an access service
(bound as a Fedora disseminator) that indexes the file for text
extraction and encapsulates queries supporting a variety of structural
access methods. Here is a list of that disseminator's methods:

These methods require no parameters and return the associated  
structural material for the EAD object they're invoked on:
getEADHeader
getComponentTOC
getComponents
getArchDescNoComponents
getAsHTML
getChildrenAsHTML

These require a 'unitid' parameter and return the associated  
structural material for the corresponding 'c0n' having that unitid  
within the EAD object they're invoked on:
getComponent(unitid)
getComponentStructure(unitid)
getChildComponents(unitid)
getAncestorComponents(unitid)
getEmbeddedComponent(unitid)
getComponentAsMODS(unitid)
getComponentAsDC(unitid)
getComponentAsHTML(unitid)
getComponentAsEmbeddedHTML(unitid)
getComponentChildrenAsHTML(unitid)
getComponentChildrenAsJSON(unitid)
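
To make the unitid-based lookups concrete, here is a rough local approximation in Python. It is not the disseminator's actual implementation; it only assumes that each c0n carries its identifier in did/unitid and that the EAD is DTD-based (no namespace). The sample document and the unitid values in it are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature EAD; titles and unitid values are invented.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <dsc>
      <c01 level="series">
        <did><unitid>1</unitid><unittitle>Album 1</unittitle></did>
        <c02 level="item">
          <did><unitid>1-1</unitid><unittitle>Photograph A</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# EAD 2002 allows unnumbered <c> as well as <c01> through <c12>.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def get_component(root, unitid):
    """Return the first component whose own did/unitid equals unitid, or None."""
    for elem in root.iter():
        if elem.tag in COMPONENT_TAGS:
            found = elem.find("did/unitid")
            if found is not None and (found.text or "").strip() == unitid:
                return elem
    return None


def get_child_components(root, unitid):
    """Return the direct child components of the component with this unitid."""
    parent = get_component(root, unitid)
    if parent is None:
        return []
    return [child for child in parent if child.tag in COMPONENT_TAGS]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_EAD)
    item = get_component(root, "1-1")
    print(item.tag, item.findtext("did/unittitle"))
```

A linear scan like this is fine for a sketch; a real service would presumably rely on an index rather than walking the tree on every call.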

These are just general purpose xml queries that return a given element  
by xml:id attribute value or a set of elements of a given name:
getElementById(xmlid)
getElementsByName(name)
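
For anyone who wants to call these methods over HTTP, the following Python sketch builds dissemination URLs using the same pattern as the entity declarations in this message (base URL, object PID, the inu:sdef-ead service definition, method name, optional query parameters). The helper name dissemination_url is my own invention for illustration, and I am assuming the other methods follow the same URL shape as getComponent.

```python
from urllib.parse import urlencode

# Both values are taken from the entity declarations in this message.
FEDORA_BASE = "http://repository.library.northwestern.edu/fedora/get"
EAD_SDEF = "inu:sdef-ead"


def dissemination_url(pid, method, **params):
    """Build a dissemination URL for one EAD object and one method.

    Hypothetical helper: pid is the Fedora object PID, method is one of the
    disseminator method names, params are query parameters such as unitid.
    """
    url = f"{FEDORA_BASE}/{pid}/{EAD_SDEF}/{method}"
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    # A method with no parameters
    print(dissemination_url("inu:inu-ead-afri-wc01", "getComponentTOC"))
    # A method that takes a unitid
    print(dissemination_url("inu:inu-ead-afri-wc01", "getComponent", unitid=1))
```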

To build the combined EAD file, we use the getComponent(unitid) method  
for each of the finding aids, passing the unitid for each EAD's top  
level c01. These urls are used in xml entity declarations and then  
referenced in an xml file for the combined EAD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://www.library.northwestern.edu/ead/dtd/ead.dtd" [
<!ENTITY wc01 SYSTEM "http://repository.library.northwestern.edu/fedora/get/inu:inu-ead-afri-wc01/inu:sdef-ead/getComponent?unitid=1">
<!ENTITY wc02 SYSTEM "http://repository.library.northwestern.edu/fedora/get/inu:inu-ead-afri-wc02/inu:sdef-ead/getComponent?unitid=2">
<!ENTITY wc03 SYSTEM "http://repository.library.northwestern.edu/fedora/get/inu:inu-ead-afri-wc03/inu:sdef-ead/getComponent?unitid=3">
...
<!ENTITY wc76 SYSTEM "http://repository.library.northwestern.edu/fedora/get/inu:inu-ead-afri-wc76/inu:sdef-ead/getComponent?unitid=76">
]>
<ead>
	<eadheader .../>
	<archdesc>
	...
	<dsc>
	&wc01;
	&wc02;
	&wc03;
	...
	&wc76;
	</dsc>
	</archdesc>
</ead>
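
Writing 76 entity declarations by hand is tedious, so a wrapper like the one above could be generated. This is a minimal Python sketch, not our actual build script. It assumes that every file follows the naming shown for the first three (PID suffix equal to the entity name, unitid equal to the file number), and it leaves the eadheader and archdesc front matter as a placeholder comment because those are elided above. The function names are hypothetical.

```python
FEDORA_BASE = "http://repository.library.northwestern.edu/fedora/get"
DOCTYPE_PUBLIC = ("+//ISBN 1-931666-00-8//DTD ead.dtd "
                  "(Encoded Archival Description (EAD) Version 2002)//EN")
DOCTYPE_SYSTEM = "http://www.library.northwestern.edu/ead/dtd/ead.dtd"


def entity_name(n):
    """Entity name for file number n, e.g. 1 -> wc01 (assumed pattern)."""
    return f"wc{n:02d}"


def entity_declaration(n):
    """External entity pointing at getComponent for file n's top-level c01."""
    url = (f"{FEDORA_BASE}/inu:inu-ead-afri-{entity_name(n)}"
           f"/inu:sdef-ead/getComponent?unitid={n}")
    return f'<!ENTITY {entity_name(n)} SYSTEM "{url}">'


def combined_ead(count):
    """Return the text of a wrapper EAD that references `count` entities."""
    numbers = range(1, count + 1)
    declarations = "\n".join(entity_declaration(n) for n in numbers)
    references = "\n".join(f"\t&{entity_name(n)};" for n in numbers)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE ead PUBLIC "{DOCTYPE_PUBLIC}" "{DOCTYPE_SYSTEM}" [\n'
        f"{declarations}\n"
        "]>\n"
        "<ead>\n"
        "\t<!-- eadheader and archdesc front matter go here (elided) -->\n"
        "\t<archdesc>\n"
        "\t<dsc>\n"
        f"{references}\n"
        "\t</dsc>\n"
        "\t</archdesc>\n"
        "</ead>\n"
    )


if __name__ == "__main__":
    print(combined_ead(76))
```

Note that the entities are only expanded by a parser that resolves external entities, so whatever reads the wrapper needs that feature turned on and network access to the repository.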


The combined file is ingested in Fedora. Having the combined EAD file  
in Fedora, we can leverage the above disseminator, making all  
structural components for the entire archive available efficiently  
through the above access methods.

There are also Image and Crop objects in Fedora for each scan and crop
(when there are multiple photographs on a scan) in the collection. These
image objects are associated with their corresponding EAD c0n description
through a unitid/pid convention. Image/Crop objects get their
descriptions (DC and MODS) as disseminations
(getComponentAsMODS(unitid)) on the combined EAD object above. These
are also indexed in Solr/Lucene for searching in the site.  Image/Crop
objects have their own image disseminator bound to a jp2 service for
efficiently getting scaled/cropped views of the images.

This is a brief description, perhaps in shorthand that is a little too
packed. I didn't say anything about the Image/Crop disseminators or
the site's web presentation mechanisms. I can say more if there's
specific interest.

I just wanted to chime in on the 'divide and combine' approach with a  
description of the mechanisms we're using, especially when I heard  
Jennie's mention of Fedora and Lucene.

Thanks,
Bill






On Feb 9, 2010, at 8:26 AM, Jennie Levine Knies wrote:

> Ethan,
> Your question about how we plan to create the finding aid is a good  
> one.  We have your standard "Finding Aid" site at the University of  
> Maryland.  <http://www.lib.umd.edu/archivesum>. For that photograph  
> collection, we definitely wanted a record in that system.   
> Originally, we were thinking "traditional" finding aid.  Other  
> options available to us *right now* would be something like putting  
> in a basic "abstract" finding aid and linking out to a PDF or some  
> other form of the Access database.
>
> However, the bigger question we've been asking (perhaps just to  
> procrastinate? Although I like to think it's because we are trying  
> to be thorough... ;)) is when is the finding aid not enough?  We  
> have asked this question, as well as "when is the finding aid  
> appropriate?"  We have done a good job at UM getting people to  
> understand that ArchivesUM is where you go for archival finding  
> aids, but what about our rare book collections?  People don't always  
> understand that those are in the catalog, and the question has come  
> up asking if we couldn't put some of our non-archival special  
> collections into an EAD and include them in ArchivesUM for discovery.
>
> With the photograph collection, we have also asked ourselves if it  
> might not make sense instead to put the metadata for the folder  
> descriptions into our Fedora digital repository as discrete items.   
> That would boost our repository's size from a modest 10,000 or so  
> records to about 75,000 records.  The problem there is that we  
> obviously don't have the entire collection digitized, so would that  
> be confusing to people?  It seems with this type of photograph  
> collection, a true database, rather than an XML file, might be a  
> better form of discovery.
>
> I think I would like both.  With links between and levels of  
> discovery all over the place.  And I don't think we're too far away  
> from that, in the scheme of things.  All we need is some technical  
> support and a will to succeed.
>
> Some other comments - I agree (I forget who mentioned this), the  
> creation of the EAD is not so difficult. With this particular  
> photograph collection, the information is already in a database, and  
> we create our finding aids by starting from a database, so making  
> the actual XML file is trivial.  We could mount it online tomorrow.   
> And, as I type this, I am wondering why we haven't just gone ahead  
> with a stop-gap measure and used the abstract/PDF model to get  
> started, instead of waiting for everything to get perfect.  The  
> presentation is always the challenge. Our system works great for 95%  
> of our finding aids.  It's just the oddballs that keep us on our toes.
>
> Also, another comment/question - we use Lucene to index our finding  
> aids.  I forget what the limit is, but there apparently is a size  
> limit. We've known this since the beginning.  So, with our very  
> large finding aids, a search from within our site is going to miss  
> some of that stuff in the depths.  Maybe breaking down things into  
> separate files, as Ethan suggested, would be a way to get around  
> this.  Will have to experiment...
>
> Jennie
>
> ~*~
> Jennie Levine Knies
> Manager, Digital Collections
> 2216 Hornbake Library
> University of Maryland
> College Park, MD 20742
> (301)314-2558 TEL (301)314-2709 FAX
> [log in to unmask] E-MAIL
> http://www.lib.umd.edu/digital
>
> Ethan Gruber wrote, On 2/8/2010 3:30 PM:
>> I have found that Saxon processes anything that is 5 MB or under  
>> fairly efficiently, and load times aren't so bad as long as you're  
>> not on dialup.
>> Jennie,
>> Your photograph collection in an Access database--do you plan on  
>> making a traditional type of EAD finding aid that will go into a  
>> collection of other finding aids and served through a typical type  
>> of finding aid website, or do you want to create a site that puts  
>> emphasis on the item level?  I have done work on several projects  
>> where the focus is on item-level information.  I have gotten around  
>> the issue of having a 10 MB finding aid by making each item a  
>> standalone XML file that contains only a <c>.  The <c>'s can be  
>> reassembled into a full finding aid, if necessary, but processing  
>> is only done on the small, singular XML file that has only several  
>> kilobytes of information that describes an item.
>> I think dealing with massive finding aids is not such a big deal if  
>> you put aside the notion that all the data must reside in the same  
>> XML file at processing time.  As long as you can extract all the  
>> data into a single XML file at the time of migration, it doesn't  
>> really matter how you store the files under normal circumstances.
>> Ethan
>> On Mon, Feb 8, 2010 at 3:10 PM, Wick, Ryan <[log in to unmask]> wrote:
>>   Our finding aid for the Ava Helen and Linus Pauling Papers is
>>   currently at 13.8MB of XML.
>>   From very early on I put each series into its own XML file. They
>>   weren't intended to stand on their own so there was nothing "above"
>>   <c01>. There wasn't a specific link to them, and I just modified our
>>   stylesheet to pull them in where appropriate. Last year we switched
>>   to using XML's external entities referencing local files to "link"
>>   to the series and are happy with the results. See
>>   http://www.javacommerce.com/displaypage.jsp?name=entities.sql&id=18238
>>   for more information on XML's entities.
>>   For web delivery, we have always split the display of the finding
>>   aid into smaller pieces. We generate static HTML files and divide
>>   the series and box listings into smaller chunks for ease of
>>   navigation and retrieval. There is also an option to view the entire
>>   series in one file. (The 17 series pages total about 16.4 MB of
>>   HTML. The hundreds of smaller pages combined would have a greater
>>   total, but most of that is overhead of duplicate navigation). The
>>   majority of our traffic comes from search engines, so we've tried
>>   our best to make our content easily indexable.
>>   http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/index.html
>>   On another note, in 2006 we published a print version of the Pauling
>>   Papers. This included some additional content but the entire package
>>   ended up being 1800 pages in 6 volumes. http://paulingcatalogue.org/
>>   Mark, thanks for posting about UNC's Hugh Morton collection, I
>>   wasn't aware of it before.
>>   Ryan Wick
>>   Information Technology Consultant
>>   Special Collections
>>   Oregon State University Libraries
>>   http://osulibrary.oregonstate.edu/specialcollections
>

Bill Parod
Library Technology Division - Enterprise Systems
Northwestern University Library
[log in to unmask]
847 491 5368
