Hi Chris,
That is interesting. It sounds similar in scale to us: 18,000 descriptions, of which the majority are
collection-level and several hundred are multi-level. Do you not have any problems with the bots
following all the dynamically generated links within the interface, e.g. for us the refine search links,
the hyperlinked index terms and the browse links? My understanding was that this would mean they would
effectively be crawling through hundreds of thousands of pages.
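
To make the concern concrete: one way a site could keep bots on the description pages and away from
the generated links is a robots.txt that disallows those paths. What follows is only a rough sketch
in Python, not how our system is actually configured. The paths (/search, /browse, /index-terms,
/record) and the example.org host are made up for illustration; the sketch uses the standard
library's urllib.robotparser to check which URLs such rules would permit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. These paths are illustrative assumptions only,
# not the real URL layout of any system mentioned in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /browse
Disallow: /index-terms
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above would let the given bot fetch the URL."""
    return PARSER.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.org/record/123",             # a description page
        "http://example.org/search?refine=subject",  # a refine-search link
        "http://example.org/browse/a",               # a browse link
    ):
        print(url, is_crawlable(url))
```

With rules like these, a record page stays open to bots while the refine-search, browse and
index-term links are excluded, although this relies on the bots honouring robots.txt.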
cheers,
Jane.
Chris Prom wrote:
> Hi Jane,
>
> At the University of Illinois our system has been open to Google and
> other bots for several years. Over 7,000 collection-level records and
> several hundred full finding aids are routinely harvested by Google and
> other bots. Our system is a PHP-driven database application, not static
> HTML.
>
> We have never run into an issue with server overload. I suspect it
> would not be a problem for you, since server load is significantly
> higher to serve up a PHP page using our system than it would be to serve up
> an equivalent page in static HTML.
>
> Best,
>
> Chris Prom
> Fulbright Scholar
> University of Dundee
> United Kingdom
>
> Jane Stevenson wrote:
>> Hi all,
>>
>> >>Basically what I'm trying to do is get away from creating static
>> html pages to store on our server and just present the view and print
>> options through xml and xsl.
>>
>> This has prompted me to think about a rather different question -
>> we're actually thinking of creating static html pages in addition to
>> our XSL generated pages because we want our descriptions to be exposed
>> to Google. Alternatively we could create pre-generated searches. We
>> don't just open up our system to robots due to problems with
>> overloading the system. Has anyone had any experience of this kind of
>> thing? It would be useful to get your thoughts.
>>
>> cheers,
>> Jane.
>>
>> **************************************************************
>> Jane Stevenson
>>
>> Archives Hub Co-ordinator
>> Mimas
>> University of Manchester
>> Email: [log in to unmask]
>> http://www.archiveshub.ac.uk
>>
>>
>>
>>
>> Fox, Michael wrote:
>>> There is a proof of concept stylesheet on the EAD help pages that
>>> does this, namely use XSLT to generate an XML document in
>>> Formatting Objects syntax (XSL-FO) that an FO processor could
>>> subsequently convert to PDF. If you use Oxygen, the necessary
>>> tools (an XSLT engine and the FOP processor) are already bundled
>>> in. I believe the same is true of the XML Spy software.
>>>
>>>
>>>
>>> There is also another stylesheet at the same location that goes from
>>> EAD directly to WordML using a standard XSL transformation, though I
>>> do not know if the syntax of this stylesheet still reflects
>>> Microsoft's current schema for Word. After transformation, the
>>> output can be opened directly in a recent version of Word and edited
>>> or printed as required.
>>>
>>>
>>>
>>> In these scenarios, your EAD XML instance could truly serve as your
>>> canonical version.
>>>
>>>
>>>
>>> Finally, there is another option with one of the EAD Cookbook
>>> stylesheets that produces an HTML page that has no links and so could
>>> be imported into Word and printed from there. A bit messier but far
>>> easier to pull off.
>>>
>>>
>>>
>>> Michael Fox
>>>
>>>
>>>
>>>
>>>
>>> *From:* Encoded Archival Description List [mailto:[log in to unmask]] *On Behalf Of* Ethan Gruber
>>> *Sent:* Wednesday, December 09, 2009 3:37 PM
>>> *To:* [log in to unmask]
>>> *Subject:* Re: use of <otherfindaid> tag
>>>
>>>
>>>
>>> You can create XSLT stylesheets that contain Formatting Objects
>>> specifications, and then serialize to PDF dynamically with calls to
>>> the fop processor.
>>>
>>> Ethan
>>>
>>> On Wed, Dec 9, 2009 at 4:25 PM, Franks, Russell <[log in to unmask]> wrote:
>>>
>>> Thank you Jane and Michele for the clarifications and thoughts.
>>>
>>> Michele, I like your method of generating a print page via the style
>>> sheet. Is this done with javascript? Or is it an xsl template that
>>> dynamically creates a new printer friendly page?
>>>
>>> Basically what I'm trying to do is get away from creating static html
>>> pages to store on our server and just present the view and print
>>> options through xml and xsl.
>>>
>>> Thanks - Russ
>>>
>>>
>>> Russell Franks
>>> Librarian
>>> Special and Archival Collections
>>> Phillips Memorial Library
>>> Providence College
>>> 1 Cunningham Square
>>> Providence, RI 02918-0001
>>> 401-865-2578
>>> [log in to unmask] <mailto:[log in to unmask]>
>>> http://www.providence.edu/archives
>>>
>>> -----Original Message-----
>>> From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Jane Stevenson
>>> Sent: Wednesday, December 09, 2009 4:04 AM
>>> To: [log in to unmask]
>>> Subject: Re: use of <otherfindaid> tag
>>>
>>> Hi there,
>>>
>>> We (our contributors) use <archref> to link to separately described parts of a finding aid, such as
>>> where a description is extremely large and benefits from being divided up. This is in line with the
>>> guidance: 'Examples of such materials include a record group and one of its large series (which
>>> might have separate EAD-encoded finding aids)'
>>>
>>> However, our contributors can also use it to link to other parts of the same finding aid, which may
>>> not be strictly within the guidelines, but it seems to be the best choice for this.
>>> http://www.archiveshub.ac.uk/arch/archref.shtml. It is not totally straightforward for us to
>>> implement these links, due to the way the Archives Hub is set up as a distributed system with
>>> machine interfaces.
>>>
>>> We use <otherfindaid> to indicate other finding aids for the same material:
>>> http://www.archiveshub.ac.uk/arch/other.shtml
>>>
>>> So that differs from linking to other finding aids that are related but not representing the same
>>> material. The guidelines do say that it is for 'Information about additional or alternative guides
>>> to the described material'. When contributors use this tag, they are usually pointing to a more
>>> detailed resource rather than the same content in a different format, but I would assume that it
>>> could be the same.
>>>
>>> Jane.
>>>
>>> **************************************************************
>>> Jane Stevenson
>>>
>>> Archives Hub Co-ordinator
>>> http://www.archiveshub.ac.uk
>>>
>>>
>>> Michele R Combs wrote:
>>> > My understanding of archref is that it's for links to other collections of archival material. We
>>> > use archref to link to related collections. For example, we would use archref to link from the
>>> > finding aid for the papers of John Smith Jr. to the finding aid for the papers of his father,
>>> > John Smith Sr., or the papers of his son, John Smith III.
>>> >
>>> > My understanding of otherfindaid is that it's for links to finding aids that are different in
>>> > content, not just in file format. For example, we might include in the otherfindaid section a
>>> > link to an Excel spreadsheet that provides a finer level of detail for a set of John Smith Jr's
>>> > photographs, or a link to a published catalog of John Smith Jr's letters, or similar.
>>> >
>>> > To simply point to another version of the online finding aid, we have a link at the top of each
>>> > one that says "Printer friendly version." This link is generated by our XSL style sheet and is
>>> > not hard-coded into our EAD.
>>> >
>>> > Michele
>>> >
>>> > (be green - don't print this email!)
>>> > ~~~~~~~~~~~~~~~~~~
>>> > Michele Combs
>>> > Manuscripts Librarian
>>> > Special Collections Research Center
>>> > Syracuse University Libraries
>>> > 222 Waverly Ave.
>>> > Syracuse, NY 13244
>>> > 315-443-2081
>>> > [log in to unmask]
>>> > ~~~~~~~~~~~~~~~~~~
>>> >
>>> >
>>> >
>>> >
>>> > -----Original Message-----
>>> > From: Encoded Archival Description List [mailto:[log in to unmask]] On Behalf Of Franks, Russell
>>> > Sent: Tuesday, December 08, 2009 4:00 PM
>>> > To: [log in to unmask]
>>> > Subject: use of <otherfindaid> tag
>>> >
>>> > Hello,
>>> >
>>> > Is anyone using the <otherfindaid> tag to describe or point to another version of a finding aid,
>>> > such as a PDF version of the finding aid for patrons to dl or print?
>>> >
>>> > According to the tag library <otherfindaid> "is used to indicate the existence of additional
>>> > finding aids;" and that "The <archref> element may be used to give a formal citation to the other
>>> > finding aid or to link to an online version of it."
>>> >
>>> > It doesn't appear that the <otherfindaid> tag is limited to other finding aids created by
>>> > differing institutions or to legacy versions of the same finding aid.
>>> >
>>> > Also do I have to use the <archref> tag to link to the PDF? Since the PDF version of the finding
>>> > aid is not "separately described archival materials of special interest", it seems to me that the
>>> > use of the <extref> would be better suited for this purpose.
>>> >
>>> >
>>> > Thanks in advance for your thoughts -
>>> >
>>> > Russell Franks
>>> > Librarian
>>> > Special and Archival Collections
>>> > Phillips Memorial Library
>>> > Providence College
>>> > 1 Cunningham Square
>>> > Providence, RI 02918-0001
>>> > 401-865-2578
>>> > [log in to unmask]
>>> > http://www.providence.edu/archives
>>> >
>>>
>>>
>>>
>
>