Back to basics -- this general requirement needs to be broken down
into at least two cases, with a wide grey zone between them:
1. Generating EAD or MARC from an Excel spreadsheet (or any
spreadsheet or tabular, i.e. relational, database) that happens to
have (accidentally or by design) all the necessary information and
meta-information correctly captured and encoded. I say
"meta-information" since, crucially, this includes information not
only regarding the records to be converted, but about their relations
(groupings, links etc.).
2. Generating EAD or MARC out of information that is not so
controlled, and requires "judgment" to determine how it should be
represented (in EAD or MARC).
The first thing you need to do is determine which of these you have,
since 1 can be more or less easy, and is supported by lots of tools
and technologies that may not require "programming" to operate, while
2 is not. In the worst cases of 2, automated conversion is not
possible or practical, but has to be supplemented with work on the
data set itself (to make it more like case 1).
(I put "programming" in quotes here since the line between programming
and not programming is itself at issue. On the one had, I am not a
"programmer" according to many programmers; on the other, much of what
archivists actually do -- defining procedures to exploit regularity --
could be described as a kind of programming.)
Essentially Mark's point is "since you need Case 1 in any case, why
not start with a format, e.g. EAD, that gives you the control you need
from the outset, and spares you having to convert it". I agree that
this is preferable whenever possible.
If, however, you're stuck with data in Excel and you need to get it
into some other form, how to do so is a legitimate question -- with
lots of answers, depending critically on where you are on the spectrum
between 1 and 2.
In the optimistic case, a pathway has already been built for you and
all you need to do is figure out which buttons to push. But these
solutions are at the far end of Case 1 -- data converted into EAD or
MARC from Excel files already designed and controlled for that
purpose. (Some of the advice we've seen so far entails how to use such
If you're in the grey zone, Excel itself supports mapping spreadsheet
data into a lightweight XML format, suitable for further processing.
Or export the Excel data to CSV (comma-separated-values) or variants
such as tab-separated or pipe-separated values, which (since they're
plain text) are very generic. Tools for dealing with CSV have also
been mentioned in this thread. How good this conversion is will depend
on how well-fitted the data is to the requirements (expectations) of
the tool, how well the tool is configured, and how good is the data
Since spreadsheet dumps are structurally flat, however, and your
target format generally has some hierarchy, more work is often
necessary to build the pathway for "upconversion" (as we call it).
I.e., "programming". Then too, you almost inevitably have to work
around anomalies that complicate the conversion.
(I know all these points are all elementary to most readers of this
list. But it is worthwhile revisiting elementary principles from time
XSLT and XQuery are both suitable technologies for this work. When I
have a job like this, these days I usually get the data into some form
of XML first (using Excel, or a CSV-mapping tool like the one in the
oXygen XML IDE, or code I write myself), then load it into an XML
database such as BaseX (free and open source), then use XQuery or XSLT
from there to convert it into the target format. (The database is nice
since it makes analysis easier, but strictly speaking unnecessary;
sometimes it's only a development scaffold especially if I'm building
a pipeline for future use.)
So the bad news is you (actually or hypothetically) probably need a
"programmer" to help. It's sort of like financial planning: while
there are some basics that apply generally, making an actual plan
requires knowledge peculiar to you. So either you learn enough about
finances to do the job for yourself, or you provide a financial
planner who works for you with the information they need to do a good
The good news is that the problems and solutions are well understood,
the knowledge is available, and many tools are easily acquired.
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
On Mon, Dec 2, 2013 at 1:44 PM, Mark R. Carlson
<[log in to unmask]> wrote:
> Hello Marsha
> The question I would have is why you can't just encode you data in a
> structured way to begin with (i.e. XML)? I can't imagine that would be any
> less complex than trying to establish complex relationships and having to
> create delimiters for multiple values in a single cell (and verify that they
> are all correct). I realize that Excel is more "user-friendly" to those
> that don't understand cataloging, series/subseries or hierarchies, but my
> experience is that those type of people are not going to be able to create
> anything other than a simple flat container list anyways.
> On 11/29/2013 4:15 PM, Marsha Maguire wrote:
> Greetings, all. If I may, I wanted to revisit a thread from a couple of
> weeks ago (I was out of town when it first came up).
> Kate Bowers said, "Starting with a spreadsheet, you can generate both MARC
> records and EAD components from the same source data."
> But how, exactly (I ask as a non-programmer), is this done? Spreadsheet
> structure being flat and seemingly one-to-one (unless a lot of data is
> repeated in succeeding rows), I'm wondering how to indicate in the
> spreadsheet that a given title (say, of a book or a sound recording)
> described in a spreadsheet row has more than one creator or subject. In
> other words, how should a spreadsheet be set up to handle one-to-many
> relationships, so that its data can be exported and converted to EAD XML or
> MARC XML? May I ask anyone who is kind enough to consider answering this to
> please be specific or provide specific examples?
> Apologies if this is obvious to many of you. Thanks so much.
> Marsha Maguire
> [log in to unmask]