Following up on this, I’ve been working every now and again on another way to convert an Excel spreadsheet to a DSC container list. Here it is:
Unfortunately, I still haven’t written a document that explains this process in full, with step-by-step instructions. Since it’s just one XSLT file that converts the flat file into a hierarchical EAD document, though, you don’t need anything like oXygen to run it; you only need an XSLT processor such as Saxon, whose free Home Edition is available at http://sourceforge.net/projects/saxon/files/Saxon-HE/. The process takes advantage of the fact that Excel can save a workbook as XML: save the file as an XML Spreadsheet 2003, then convert it to EAD with the XSLT.
The way the hierarchy is created is a bit unusual. The approach doesn’t impose a fixed hierarchy, so you can go as deep or as shallow as you want and mix the two in a single file without issue. To do this, though, you have to enter the desired depth, from 1 to 12, in the first column; those numbers correspond to the c01-c12 levels in EAD. Of course, EAD can go deeper than 12 levels if you use unnumbered “c” elements instead of c01-c12, but since I’ve never seen a finding aid go deeper than level 10, I kept it at 12 levels (to go deeper, all you’d need to do is change the Excel file to allow numbers higher than 12 in that column).
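The level-number idea can be illustrated outside XSLT. The following is a minimal Python sketch, not the author’s stylesheet, that assumes each spreadsheet row has already been read as a (level, title) pair; `rows_to_dsc` and `MAX_DEPTH` are hypothetical names. It keeps a stack of open components so that a row at level n becomes a child of the most recent row at level n - 1:

```python
import xml.etree.ElementTree as ET

MAX_DEPTH = 12  # c01 through c12, as described in the post


def rows_to_dsc(rows):
    """Build a nested <dsc> from (level, title) rows.

    A row at level n becomes a c0n/c1n child of the most recent row
    at level n - 1; a level-1 row attaches directly to <dsc>.
    """
    dsc = ET.Element("dsc")
    stack = [dsc]  # stack[i] is the open parent for a level-(i + 1) component
    for level, title in rows:
        if not 1 <= level <= MAX_DEPTH:
            raise ValueError(f"level {level} is outside 1-{MAX_DEPTH}")
        if level > len(stack):
            # e.g. a jump from level 1 straight to level 3 has no level-2 parent
            raise ValueError(f"level {level} skips a level after {len(stack) - 1}")
        del stack[level:]  # close any deeper components that are still open
        comp = ET.SubElement(stack[-1], f"c{level:02d}")
        did = ET.SubElement(comp, "did")
        ET.SubElement(did, "unittitle").text = title
        stack.append(comp)
    return dsc


rows = [(1, "Series 1"), (2, "Folder A"), (2, "Folder B"), (1, "Series 2")]
print(ET.tostring(rows_to_dsc(rows), encoding="unicode"))
```

With those four rows the result has two c01 elements, the first containing two c02 children. Rejecting a skipped level is a choice made for this sketch; the post doesn’t say how the stylesheet handles one.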
Besides allowing hierarchies of varying depth, this process also preserves “inline tagging.” You can open the Excel document to see some examples, but here are a few:
· To wrap something in <emph render="bold">, just put it in bold in the Excel file.
· To wrap something in a <title> element, change the font color to red.
· To create paragraphs in a scopecontent note, put a double line break in the cell (press Alt+Enter twice).
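Excel records each cell’s bold and font-color formatting in the saved XML, and line breaks typed with Alt+Enter are stored as line-feed characters. The post doesn’t show how the stylesheet reads that formatting, so this Python sketch assumes a cell has already been split into hypothetical (text, bold, color) runs and that “red” means the color value #FF0000. It maps the three conventions above onto EAD markup:

```python
from xml.sax.saxutils import escape

RED = "#FF0000"  # assumption: the red used to flag titles


def wrap_run(text, bold, color):
    """Wrap one piece of a formatted run in the EAD tags its styling implies."""
    markup = escape(text)
    if color and color.upper() == RED:
        markup = f"<title>{markup}</title>"
    if bold:
        markup = f'<emph render="bold">{markup}</emph>'
    return markup


def runs_to_paragraphs(runs):
    """Turn (text, bold, color) runs into a list of <p> strings.

    A double line break inside any run starts a new paragraph; the run's
    formatting is reapplied on each side of the break.
    """
    paragraphs = [[]]
    for text, bold, color in runs:
        for i, piece in enumerate(text.split("\n\n")):
            if i > 0:
                paragraphs.append([])  # double line break: new paragraph
            if piece:
                paragraphs[-1].append(wrap_run(piece, bold, color))
    return ["<p>" + "".join(parts) + "</p>" for parts in paragraphs if parts]


cell = [
    ("Letters from ", False, None),
    ("Walden", False, "#FF0000"),
    (".\n\nIncludes ", False, None),
    ("drafts", True, None),
]
for p in runs_to_paragraphs(cell):
    print(p)
```

When a run is both bold and red, this sketch puts <emph> outside <title>; that nesting order is arbitrary here.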
Again, this is pretty early and rough around the edges (hint: hide any columns that you don’t need!), but it’s come in handy for me quite a number of times, especially with auto-numbering containers since Excel is really good with stuff like that.
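In the spreadsheet, container numbering is just Excel’s fill handle or a formula. As an illustration only, here is a Python sketch of one common scheme, assuming (the post doesn’t say) that folder numbers restart at 1 in each box; `number_folders` is a hypothetical name:

```python
def number_folders(boxes):
    """Assign folder numbers that restart at 1 whenever the box changes.

    `boxes` lists the box number of each folder row, in spreadsheet order.
    Returns (box, folder) pairs.
    """
    numbered = []
    previous_box = None
    folder = 0
    for box in boxes:
        folder = folder + 1 if box == previous_box else 1
        numbered.append((box, folder))
        previous_box = box
    return numbered


print(number_folders([1, 1, 1, 2, 2, 3]))
```

Repositories that number folders continuously across boxes would simply drop the reset.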
If anyone uses this approach and has any questions, feedback, suggestions, etc., do let me know!
Users enter folder (or item) data in the indicated fields, without punctuation. The date range should be entered as four-digit years only; a date expression is needed only when it should differ from the date range (for example, to say "circa" or "March-April 1878").
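A minimal Python sketch of that date rule, assuming output as an EAD 2002 <unitdate> with an ISO 8601 normal attribute (the post doesn't say whether the Harvard spreadsheet emits one); `unitdate` is a hypothetical helper. The four-digit years always drive the normalized value, and an expression, when present, replaces only the displayed text:

```python
import re
from xml.sax.saxutils import escape

YEAR = re.compile(r"\d{4}")


def unitdate(begin, end=None, expression=None):
    """Return an EAD <unitdate> string built from four-digit years.

    The displayed text defaults to the range itself; an explicit
    expression (e.g. "circa 1878") overrides only the display.
    """
    end = end or begin
    for year in (begin, end):
        if not YEAR.fullmatch(year):
            raise ValueError(f"{year!r} is not a four-digit year")
    if end < begin:  # safe as a string comparison for four-digit years
        raise ValueError(f"range {begin}-{end} runs backwards")
    normal = begin if begin == end else f"{begin}/{end}"
    text = expression or (begin if begin == end else f"{begin}-{end}")
    return f'<unitdate normal="{normal}">{escape(text)}</unitdate>'


print(unitdate("1878", "1880"))
print(unitdate("1878", expression="March-April 1878"))
```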
After entering the data, users copy the appropriate shaded cells and paste them into an XML editor at the point where the folder or item list belongs. (Unlike previous versions, no intermediate step is needed to remove extra tab characters or empty elements.)
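What the shaded cells produce is, in effect, one ready-to-paste component per row. This Python sketch builds that kind of fragment; the function name, the level="file" attribute, and the element order are assumptions for illustration, not the Harvard template. Empty fields are left out entirely, so no cleanup is needed before pasting, and a note becomes a single paragraph:

```python
from xml.sax.saxutils import escape


def folder_fragment(level, box, folder, title, dates=None, note=None):
    """Build one folder-level component as a string ready to paste.

    `dates` is already-formatted date text; `note`, if given, becomes a
    single <p>. Omitted fields produce no empty elements.
    """
    tag = f"c{level:02d}"
    parts = [
        f'<{tag} level="file"><did>',
        f"<container>Box {box}, Folder {folder}</container>",
        f"<unittitle>{escape(title)}</unittitle>",
    ]
    if dates:
        parts.append(f"<unitdate>{escape(dates)}</unitdate>")
    parts.append("</did>")
    if note:
        parts.append(f"<scopecontent><p>{escape(note)}</p></scopecontent>")
    parts.append(f"</{tag}>")
    return "".join(parts)


print(folder_fragment(2, 1, 3, "Letters", "1878-1880"))
```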
· This spreadsheet does a little error-checking (see far right columns) for Harvard University Archives' workflow purposes.
· A note is output as a single paragraph only. Any further customization or markup is done in the XML editor.
· The "Container" is rendered as a single element with the labels as part of its content, which works well for Harvard University Archives stuff in OASIS but which we would change for AT or ArchivesSpace ingest.
· For items that have unit IDs (such as our photographs, which have photo numbers), we would have to make an alternative spreadsheet, but it would not be difficult.
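To illustrate the container difference, here are two hypothetical Python helpers. The first produces the single-element form described above; the second produces separate typed <container> elements, which EAD 2002 allows through the type attribute and which is the shape more likely to map cleanly onto AT or ArchivesSpace containers on import (an assumption; check the importer's documentation):

```python
from xml.sax.saxutils import escape


def containers_single(box, folder):
    """One <container> with the labels in its content (the OASIS-style output)."""
    return f"<container>Box {escape(str(box))}, Folder {escape(str(folder))}</container>"


def containers_typed(box, folder):
    """Separate <container> elements whose type attribute carries the label."""
    return (
        f'<container type="Box">{escape(str(box))}</container>'
        f'<container type="Folder">{escape(str(folder))}</container>'
    )


print(containers_single(1, 3))
print(containers_typed(1, 3))
```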
Collections Services Archivist
Harvard University Archives
Cambridge, MA 02138
voice: (617) 384-7787
fax: (617) 495-8011