I've been working with data from the Smithsonian Institution Archives (SIA) and with example files published by the Staatsbibliothek, and I have looked at SNAC as well. I've written some Ruby code that parses EAC-CPF files to retrieve names and some bibliographic information. I expected the standard to make this work easy, but I'm finding that I need special-case code that parses names differently depending on where the EAC-CPF file was created. The standard does not specify certain details; it leaves them to be defined by whoever creates the file.
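As a rough illustration of this kind of extraction, here is a minimal sketch assuming Ruby's REXML library. It is an illustration, not the project code mentioned at the end of this post. It collects every `<part>` inside each `<nameEntry>` as a `[localType, text]` pair. The method name `name_entries` is hypothetical, and matching on local element names (REXML's `Element#name` omits any prefix) is a choice made so the sketch works whether or not the file uses a namespace prefix.

```ruby
require "rexml/document"

# Hypothetical helper: return one array per <nameEntry>, each holding
# [localType, text] pairs for its <part> children in document order.
# localType is optional in EAC-CPF, so the first element may be nil.
def name_entries(xml)
  doc = REXML::Document.new(xml)
  entries = []
  doc.each_recursive do |el|
    next unless el.name == "nameEntry" # local name, prefix ignored
    parts = el.children.select { |c| c.is_a?(REXML::Element) && c.name == "part" }
    entries << parts.map { |part| [part.attributes["localType"], part.text.to_s.strip] }
  end
  entries
end
```

Run against the Yale snippet wrapped in a complete record, this returns `[[["100a", "Cadell, T.,"]]]`; a `<part>` without a localType comes back with `nil` as its type, which is exactly where the per-source special cases start.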
Here are some examples that I hope will illustrate this unexpected challenge:
An EAC-CPF file from Yale helpfully separates the first name, last name, and birth/death dates, labeling these parts with localType values based on MARC codes:
<nameEntry scriptCode="Latn" xml:lang="eng">
<part localType="100a">Cadell, T.,</part>
An EAC-CPF file from SIA also separates the first name, last name, and birth/death dates, but labels these parts with descriptive English words:
<part localType="forename">Charles D.</part>
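One way to cope with variations like these is a crosswalk that maps each source's localType values onto a single internal vocabulary. The sketch below is hypothetical: only `100a` and `forename` appear in the examples above, while `100d` (MARC 100 $d, dates associated with a name), `surname`, and `dates` are assumed values, and the target symbols (`:name`, `:forename`, and so on) are an invented vocabulary, not part of EAC-CPF.

```ruby
# Hypothetical crosswalk from localType values seen in different sources
# to one shared vocabulary. Only "100a" (Yale) and "forename" (SIA) come
# from the examples above; the other keys are illustrative assumptions.
LOCAL_TYPE_CROSSWALK = {
  "100a"     => :name,     # MARC 100 $a: personal name
  "100d"     => :dates,    # MARC 100 $d: dates associated with a name
  "forename" => :forename,
  "surname"  => :surname,
  "dates"    => :dates
}.freeze

# Map [localType, text] pairs to [normalized_type, text].
# Missing (nil) or unrecognized localType values become :unknown
# instead of raising, so no part is silently dropped.
def normalize_parts(parts)
  parts.map do |local_type, text|
    key = local_type.to_s.strip.downcase
    [LOCAL_TYPE_CROSSWALK.fetch(key, :unknown), text]
  end
end
```

Keeping the mapping in a single table means supporting a new source is a data change rather than another branch of special-case parsing code.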
These are just two examples; I've found quite a few other variations in my explorations. In the documentation of the standard, the localType attribute is optional, and I have not seen any recommendations that would lead people or programs that generate EAC-CPF files to conform to a particular naming convention.
For now, I just read the parts and pass each localType through as-is, but it seems to me that establishing some conventions would be much more useful: these records would then become truly machine-readable, useful for automated analysis, and easy to display sensibly to humans.
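If parts were normalized to an agreed vocabulary, building a display string would become straightforward. This sketch assumes hypothetical part types `:surname`, `:forename`, and `:dates`; anything else, including parts with no recognized type, is joined in document order as a fallback, and trailing MARC-style commas are trimmed. The method name `display_name` and the output format are illustrative choices, not an established convention.

```ruby
# Hypothetical display builder over normalized [type, text] pairs.
# Uses "Surname, Forename" when both are present; otherwise joins the
# non-date parts in document order. Dates, if any, are appended last.
def display_name(parts)
  texts_by_type = parts.group_by(&:first).transform_values { |ps| ps.map(&:last) }
  surname  = texts_by_type.fetch(:surname, []).join(" ")
  forename = texts_by_type.fetch(:forename, []).join(" ")
  dates    = texts_by_type.fetch(:dates, []).join(" ")

  base =
    if !surname.empty? && !forename.empty?
      "#{surname}, #{forename}"
    else
      parts.reject { |type, _| type == :dates }.map(&:last).join(" ")
    end
  base = base.sub(/[,\s]+\z/, "") # drop trailing MARC punctuation such as "T.,"

  dates.empty? ? base : "#{base} #{dates}"
end
```

For example, `display_name([[:surname, "Doe"], [:forename, "Jane"], [:dates, "1900-1980"]])` returns `"Doe, Jane 1900-1980"`, and `display_name([[:name, "Cadell, T.,"]])` returns `"Cadell, T."`.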
I'm relatively new to the archive and library world, so please correct me if I've misunderstood anything. I would be interested in hearing how others are solving this problem, since I would guess that others must have run into it before.
Thanks in advance,
P.S. If anyone is interested, the code I'm working on can be found here