I would love to store all this stuff with the open-oni GitHub organization.

We could start another repo to collect scripts; I'd be happy to create one that others could push to. Any thoughts on organization? If we create a repo for scripts, it will get its own wiki, which you could use to start documentation, or just to link to documentation elsewhere.

(for those wondering what open-oni is, see Karen Estlund's June 30 email to this group about the meeting where we created this new github organization and fork of the chronam software.)

Karin Dalziel
Digital Design/Development Specialist
Center for Digital Research in the Humanities, University of Nebraska-Lincoln
[log in to unmask]

On Tue, Aug 4, 2015 at 9:38 AM, Michael Bolton <[log in to unmask]> wrote:

Thanks for the update!

I now know of two institutions using locally developed scripts and procedures for ingesting local content.  I think we may have an opportunity here.  I am thinking we can build on these local scripts and come up with a package, of sorts, that helps prepare batches for ingest.

I have been researching this for a while and the information on the Guidelines and Resources page at LOC has been very helpful ( http://www.loc.gov/ndnp/guidelines/ ).  I also have been reading "Guidelines for Digital Newspaper Preservation Readiness" by Katherine Skinner and Matt Schultz ( downloaded from Educopia Institute http://educopia.org/publications/gdnpr ).  I like the way the document lays out the digitization process from identifying and inventorying content all the way to packaging.  They make a number of recommendations and suggestions as well as pointing out tools that would help at each stage of the process.  As a starting point, I would suggest we use the paper as a guide for developing a workflow.  

Stephanie, I would be interested in seeing a sample of the spreadsheet you use to prepare the batches. Using spreadsheets seems to be a common way of collecting metadata. I would also be interested in seeing how you convert that to METS files.  I have a copy of the process used at UO and am reviewing it now.  I think they use XML files to prepare the batches.  I will follow up on that.

If we think it will help, I will also see about starting a Google Doc to keep up with what we find.

And thanks again for volunteering.  I think this is going to be a fun project.

On Mon, Aug 3, 2015 at 10:27 AM, Williams, Stephanie <[log in to unmask]> wrote:
Hi, Michael!

We're in the same boat here in NC, I think--we're not NDNP awardees, but we're creating batches on our own, according to NDNP standards.

We do have some scripts to help us with this process, but (fortunately or unfortunately, depending on your perspective*) it all starts with batch-level spreadsheets. These serve as the base for generating METS files, issue-level directories, and batch manifests.  We've never done any updating of the chronam MySQL database by hand, because we're still letting chronam pull in MARC data and populate its own lists.  This isn't without problems, but it's ok. The one major change we've made there is to use a WorldCat API as the source of MARC data instead of chroniclingamerica.loc.gov--most of our newspapers fall outside of NDNP selection guidelines and aren't represented there. For items without LCCNs (student newspapers, small community papers, corporate papers) we assign them: we're lucky to have a very helpful cataloging department one floor up.
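[For list readers curious what the spreadsheet-to-METS step looks like in practice, here is a minimal sketch of the general idea: read batch-level rows and emit one skeletal METS document per issue. This is not NC's actual script--the column names (lccn, issue_date, edition) and the structure are placeholder assumptions; real NDNP issue METS embeds full MODS sections per the LOC profile.]

```python
import csv
import io
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

def rows_to_mets(csv_text):
    """Build one minimal METS stub per spreadsheet row.

    Returns a dict mapping issue date -> serialized XML string.
    Column names (lccn, issue_date, edition) are hypothetical.
    """
    docs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Clark notation ({ns}tag) lets ElementTree handle the namespace.
        mets = ET.Element("{%s}mets" % METS_NS,
                          {"LABEL": "%s, %s" % (row["lccn"], row["issue_date"])})
        dmd = ET.SubElement(mets, "{%s}dmdSec" % METS_NS, {"ID": "issueModsBib"})
        # A real NDNP METS file embeds MODS here; this stub just records a note.
        ET.SubElement(dmd, "note").text = "edition %s" % row["edition"]
        docs[row["issue_date"]] = ET.tostring(mets, encoding="unicode")
    return docs

sample = """lccn,issue_date,edition
sn99999999,1912-05-04,1
"""

mets_docs = rows_to_mets(sample)
print(mets_docs["1912-05-04"])
```

[The point of the pattern is the one Stephanie describes: the spreadsheet stays the source of truth, and the XML is always regenerated rather than hand-edited.]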

If hearing any more about our process sounds like it might be helpful, please contact me--we'd be happy to talk.

Thanks, and good luck,

Stephanie Williams
North Carolina Digital Heritage Center
[log in to unmask]

*It works for us. It's time-intensive, but we've been experimenting with tools to help us generate page-level data while we scan, which is a huge help. We preserve the spreadsheets alongside the end-result batches; when changes are made, we make them in the spreadsheets and regenerate the METS/manifests rather than edit by hand.

From: Data, API, website, and code of the Chronicling America website [[log in to unmask]] on behalf of Michael Bolton [[log in to unmask]]
Sent: Monday, August 03, 2015 10:49 AM
To: [log in to unmask]
Subject: Deploying Chronam for local holdings

Hello All,

The Texas A&M University Libraries is working on a project to digitize our campus newspapers and we believe Chronam would be a great system for viewing and managing the collection.  We have the viewer installed and have ingested a couple of sample batches and the system appears to be working very well.  We would now like to start adding our local content.

We are looking for some guidance on how to prepare batches for a local ingest, that is, a non-NDNP submission, as I have learned it's called.  I would be interested in hearing how other institutions prepare their batches and just what is required for an ingest of a batch.  All our experience has been with sample batches downloaded from LOC.  We have been using the technical guidelines for the NDNP project as a roadmap and those have been very helpful.

We are starting with TIFFs and based on the information from the guidelines, we are creating the compressed JPEG2000 files as well as the OCR files.  If there are scripts or programs that help with this process, such as appending the metadata to the JP2 files or creating the METS files, I would be happy to hear about them.  I also believe we probably need to update the MySQL database with information for our site, possibly the "titles" table.  The folks at the University of Oregon Libraries have been very helpful and they suggested I post to this list for any additional information.
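[For the TIFF-to-JP2 step, a common approach is to script the OpenJPEG encoder over a directory of masters. Here is a small sketch that builds the opj_compress command line for one file; the -r ratio and -n resolution values are placeholders, not the NDNP JPEG 2000 profile--check the LOC technical guidelines for the required settings.]

```python
from pathlib import Path

def jp2_command(tiff_path, out_dir):
    """Build an opj_compress invocation for one master TIFF.

    Flag values are illustrative placeholders; consult the NDNP
    guidelines for the actual compression profile.
    """
    tiff = Path(tiff_path)
    jp2 = Path(out_dir) / tiff.with_suffix(".jp2").name
    return [
        "opj_compress",
        "-i", str(tiff),
        "-o", str(jp2),
        "-r", "8",   # placeholder compression ratio
        "-n", "6",   # placeholder number of resolution levels
    ]

cmd = jp2_command("batch/0001.tif", "jp2_out")
print(" ".join(cmd))
```

[Run each command with subprocess.run (or dump them to a shell script) so the conversion settings live in one reviewable place rather than being retyped per batch.]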


Michael W. Bolton  |  Assistant Dean, Digital Initiatives
Sterling C. Evans Library  |  Texas A&M University
5000 TAMU  |  College Station, TX  77843-5000
