Just an update for everyone. Before tweaking the server, I contacted the system administrator for the VM host and asked him about any potential bottlenecks. He found that there wasn't enough CPU power available, so he bumped the VM from 2 to 4 vCPUs and from 4GB to 6GB of RAM. Things are very speedy now. Thanks all!
Mike
________________________________________
From: Michael Beccaria
Sent: Monday, July 08, 2013 12:07 PM
To: Data, API, website, and code of the Chronicling America website
Subject: RE: Slow Page Load Times and Production Settings
You folks have been very helpful. I will try the django-debug-toolbar app and the other suggestions shortly.
As for images, we are rendering PNG files. Load times seemed similar to standard JPEGs and much faster than TIFFs, and PNGs gave higher quality for the bi-tonal images we are serving. We don't have the fast JPEG2000 decoder libraries. I personally haven't experienced very slow image load times so far. I don't think it's as fast as LOC, but I assumed that was because of the JPEG2000 library you use to render the images.
I have 4GB of RAM allocated to this server VM and currently (using the top command) it says I have 2.2GB used and 1.7GB free, with 4GB of swap of which ~2GB is being used. I allocated 1.5GB of RAM to Jetty via the JAVA_OPTIONS config settings. This used to be lower, but Solr choked when loading a large batch at one point so I bumped it up.
MySQL has the defaults:
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
query_cache_limit = 1M
query_cache_size = 16M
I added some lines to my.cnf after finding that the default values were WAY too low for InnoDB tables:
innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size = 2048M
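As a rough sanity check on the settings above, the arithmetic can be sketched in a few lines of Python. The Jetty heap, InnoDB, key buffer, and query cache figures come from this message; the 512 MB for the OS and web stack is purely an assumed placeholder, and a JVM heap limit is a ceiling rather than guaranteed resident memory, so treat the result as an indication only.

```python
# Rough memory budget for the 4 GB VM described in this message (values in MiB).
# Only "os_and_web_stack" is an assumption; the rest are quoted from the thread.
total_ram_mib = 4 * 1024

reserved_mib = {
    "jetty_heap": 1536,                # 1.5GB given to Jetty/Solr via JAVA_OPTIONS
    "innodb_buffer_pool": 2048,        # innodb_buffer_pool_size = 2048M
    "innodb_additional_mem_pool": 16,  # innodb_additional_mem_pool_size = 16M
    "key_buffer": 16,                  # key_buffer = 16M
    "query_cache": 16,                 # query_cache_size = 16M
    "os_and_web_stack": 512,           # ASSUMED: kernel, Apache/Django, etc.
}


def headroom_mib(total, reserved):
    """Return RAM left over after all reservations (negative means overcommitted)."""
    return total - sum(reserved.values())


if __name__ == "__main__":
    print("headroom (MiB):", headroom_mib(total_ram_mib, reserved_mib))
```

A negative or near-zero result would be consistent with the swap usage reported by top, though it does not by itself prove that memory is the bottleneck.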
It's still slow, but I'm starting to think it's a MySQL index/memory/configuration performance issue. Running vmstat 5 shows occasional swap activity in the si/so columns when I try to view the slow page. Innotop shows a really long query with a state of "Copying to tmp". Innotop doesn't wrap the query text, so I can't see the whole thing. I don't have time now, but I'll dig further into the particular queries that are triggering this swapping.
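For anyone wanting to capture the si/so observation rather than eyeball it, here is a minimal Python sketch that picks out the vmstat samples showing swap-in or swap-out activity. The sample output below is made up for illustration (it is not from this server), and the function name swap_activity is hypothetical.

```python
# Flag vmstat samples with non-zero swap-in (si) or swap-out (so).
# SAMPLE is invented illustrative output, NOT captured from the real server.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 2048000 1700000  50000 300000    0    0    10    20  100  200  5  2 90  3
 2  1 2050000 1650000  50000 300000  120  340   900   400  300  500 20 10 40 30
"""


def swap_activity(vmstat_output):
    """Return a list of (si, so) pairs for samples where either value is non-zero."""
    active = []
    columns = None
    for line in vmstat_output.splitlines():
        fields = line.split()
        # The column-name row is the one that contains both "si" and "so".
        if "si" in fields and "so" in fields:
            columns = fields
            continue
        # Skip banner rows and anything that is not a full numeric sample.
        if columns is None or len(fields) != len(columns):
            continue
        if not all(field.isdigit() for field in fields):
            continue
        row = dict(zip(columns, (int(field) for field in fields)))
        if row["si"] or row["so"]:
            active.append((row["si"], row["so"]))
    return active


if __name__ == "__main__":
    print(swap_activity(SAMPLE))
```

Piping the real output of vmstat 5 into a file while loading the slow page, then passing the file contents to this function, would show which samples coincide with swapping.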
I don't know of any other relevant settings that would indicate RAM usage. Are there any?
Any other MySQL advice?
Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
Become a friend of Paul Smith's Library on Facebook today!
-----Original Message-----
From: Data, API, website, and code of the Chronicling America website [mailto:[log in to unmask]] On Behalf Of Summers, Ed
Sent: Monday, July 08, 2013 9:42 AM
To: [log in to unmask]
Subject: Re: Slow Page Load Times and Production Settings
Hi Michael,
In addition to the good advice you got from David and Chris, you might want to try the Django Debug Toolbar:
https://github.com/django-debug-toolbar/django-debug-toolbar
Once enabled, you get handy information about database queries, template rendering, etc. as you browse through your web app. It is a generally useful tool for any Django project, if you happen to have others.
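A minimal sketch of what enabling the toolbar looked like in a settings file for Django releases of that era is shown below. The two placeholder tuples stand in for whatever the project's settings already define; newer Django and toolbar releases use MIDDLEWARE instead of MIDDLEWARE_CLASSES and also expect a URL include, so check the toolbar's installation notes for the version in use.

```python
# settings.py fragment (sketch only, not Chronam's actual settings).
# The initial tuples are placeholders for the project's existing values.
DEBUG = True  # the toolbar is meant for debugging, not for a public site

INSTALLED_APPS = (
    "django.contrib.staticfiles",  # placeholder for the existing app list
)
MIDDLEWARE_CLASSES = (
    "django.middleware.common.CommonMiddleware",  # placeholder
)

# Additions for django-debug-toolbar.
INSTALLED_APPS += ("debug_toolbar",)
MIDDLEWARE_CLASSES += ("debug_toolbar.middleware.DebugToolbarMiddleware",)

# The toolbar is only shown to requests coming from these addresses.
INTERNAL_IPS = ("127.0.0.1",)
```

With this in place, the toolbar's SQL panel lists each query and its duration for the page being viewed, which is the quickest way to spot a single slow query behind a slow page.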
One thing I noticed is that images take a long time to render as well. This can prove to be a problem even when you put Varnish in front of Chronam, since researchers often interact with a long tail of content. Are you using the TIFF files? Also, how much memory does your server make available to Chronam?
//Ed
-----Original Message-----
From: Data, API, website, and code of the Chronicling America website [mailto:[log in to unmask]] On Behalf Of Michael Beccaria
Sent: Tuesday, July 02, 2013 6:50 PM
To: [log in to unmask]
Subject: Slow Page Load Times and Production Settings
I'm using the new code base and have a collection of newspapers totaling over 200,000 pages. I know the LOC uses caching software (Varnish) to speed up page loads, and we currently don't have that running. On our site, the load time for the newspaper list page is probably close to 5 minutes (http://nyshistoricnewspapers.org/newspapers/), but the other pages load relatively quickly.
Is there something wrong in the code, or is that expected given the queries Django is trying to execute, meaning I need to install a cache to make it faster?
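To illustrate what a cache buys in this situation, here is a small self-contained Python sketch of time-based caching for an expensive listing. It is not Chronam code and not how Varnish works internally; the names cached and newspaper_list are hypothetical, and the 300-second lifetime is an arbitrary assumption. In a real Django site the built-in cache framework (for example the cache_page view decorator) or an HTTP cache such as Varnish would normally play this role.

```python
import time


def cached(ttl_seconds, clock=time.monotonic):
    """Cache the result of a zero-argument function for ttl_seconds."""
    def decorator(func):
        state = {"value": None, "expires": None}

        def wrapper():
            now = clock()
            # Recompute only on the first call or after the entry has expired.
            if state["expires"] is None or now >= state["expires"]:
                state["value"] = func()
                state["expires"] = now + ttl_seconds
            return state["value"]

        return wrapper
    return decorator


calls = []  # records how many times the expensive work really ran


@cached(ttl_seconds=300)  # ASSUMED lifetime, chosen only for illustration
def newspaper_list():
    """Stand-in for the slow query behind the newspaper list page."""
    calls.append(1)
    return ["title-a", "title-b"]


if __name__ == "__main__":
    newspaper_list()
    newspaper_list()
    print("expensive runs:", len(calls))
```

A cache only hides the cost for repeat visits. The first request after expiry still pays the full price, so a 5-minute page is worth profiling even if a cache is added.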
This is part of a bigger question. What general recommendations do any of you have for putting this software into production with multi-million page collections? These could be server/network specific (RAM, virtualization, multi-server, etc.) or software specific (caching, settings, etc.). We want to release this site to the public relatively soon and want it ready to handle real user traffic.
Thanks so much for your suggestions and guidance.
Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376