On Thu, Dec 10, 2009 at 8:44 AM, Jane Stevenson
<[log in to unmask]> wrote:
Hi Chris,
That is interesting. It sounds similar in scale to us: 18,000 descriptions, of which the majority are collection-level and several hundred are multi-level. Do you have any problems with the bots following all of the dynamically generated links within the interface? For us that would be the refine-search links, the hyperlinked index terms and the browse links. My understanding is that this would mean they would effectively be crawling through hundreds of thousands of pages.
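What I had been wondering is whether a robots.txt along these lines would be enough to keep the bots out of the dynamic links while still letting them reach the descriptions themselves (the paths below are invented purely for illustration, and I gather the Allow and Sitemap lines are extensions that Google honours rather than part of the original standard):

User-agent: *
# keep crawlers out of the dynamically generated navigation
Disallow: /search/refine
Disallow: /browse/
Disallow: /indexterms/
# but let them fetch the descriptions themselves
Allow: /descriptions/

# a sitemap we would still have to generate, listing one URL per description
Sitemap: http://www.archiveshub.ac.uk/sitemap.xml

The alternative I have seen suggested is putting rel="nofollow" on the refine and browse links, which would tackle much the same problem from the other end.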
cheers,
Jane.
Chris Prom wrote:
Hi Jane,
At the University of Illinois our system has been open to Google and other bots for several years. Over 7,000 collection-level records and several hundred full finding aids are routinely harvested. Our system is a PHP-driven database application, not static HTML.
We have never run into an issue with server overload, even though the load involved in serving a dynamically generated PHP page from our system is significantly higher than it would be for an equivalent static HTML page. I suspect it would not be a problem for you either.
Best,
Chris Prom
Fulbright Scholar
University of Dundee
United Kingdom
Jane Stevenson wrote:
Hi all,
>>Basically what I'm trying to do is get away from creating static html pages to store on our server and just present the view and print options through xml and xsl.
This has prompted me to think about a rather different question: we're actually thinking of creating static HTML pages in addition to our XSL-generated pages, because we want our descriptions to be exposed to Google. Alternatively, we could create pre-generated searches. The reason we don't simply open our system up to robots is the risk of overloading it. Has anyone had any experience of this kind of thing? It would be useful to get your thoughts.
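If we do go down the static route, I imagine we would just run each EAD file through a small stylesheet in a nightly batch job; something along these lines, where the file names are made up, the element names assume a plain, un-namespaced EAD 2002 record, and the output is deliberately minimal:

<?xml version="1.0" encoding="UTF-8"?>
<!-- ead-to-static.xsl: turn one EAD description into a simple, crawlable HTML page -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/ead">
    <html>
      <head>
        <title><xsl:value-of select="archdesc/did/unittitle"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="archdesc/did/unittitle"/></h1>
        <p>Reference: <xsl:value-of select="archdesc/did/unitid"/></p>
        <!-- plain-text scope and content note so the crawler has something to index -->
        <xsl:for-each select="archdesc/scopecontent/p">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

run with something like "xsltproc ead-to-static.xsl description.xml > description.html" in a loop over the files.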
cheers,
Jane.
**************************************************************
Jane Stevenson
Archives Hub Co-ordinator
Mimas
University of Manchester
Email: [log in to unmask]
http://www.archiveshub.ac.uk