The Social Networks and Archival Context (SNAC) project is pleased to announce the release of open source code that extracts names and related identifying information from MARC21 XML archival descriptions and assembles Encoded Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF) identity records. We are also pleased to announce a Web Service that performs the same function: it takes MARC21 or MARC21 XML records as input and returns EAC-CPF records that can be downloaded for local use.
The SNAC project has been generously funded by the U.S. National Endowment for the Humanities (2010-2012) and the Andrew W. Mellon Foundation (2012-2014).
The MARC to EAC-CPF code can be downloaded here: https://github.com/twl8n/snac_eac_cpf_utils
The MARC to EAC-CPF Web Service will be found here: http://socialarchive.iath.virginia.edu/dev/
The open source code and Web Service are based on code developed to process the 2.2 million MARC-encoded archival descriptions from WorldCat that OCLC made available to the SNAC project. From these WorldCat records, the SNAC project has created 4.5 million EAC-CPF records. Because the code processes each MARC record independently, many of the resulting EAC-CPF records describe the same entity (duplicates). The records extracted and assembled in this first stage of SNAC processing will now be matched against one another and against records in the Virtual International Authority File (VIAF). Records deemed to describe the same entity will be combined in this process, and data from matching VIAF records will be used to enhance the data extracted from WorldCat. Finally, the resulting set of records will be made available in the public SNAC prototype. The SNAC project is in the final stages of refining the merge/combine processing, so records created from the WorldCat data will not be available for at least a few weeks. The project will announce the availability of the data in the SNAC prototype, and we would very much appreciate your exploring the prototype and providing feedback.
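For readers unfamiliar with this kind of processing, the following is a minimal sketch in Python of record-by-record name extraction. It is an illustration only and is not the SNAC code. The choice of MARC fields (100, 110, 700, 710) and subfields (a, d), the sample data, and the function names extract_names and to_eac_cpf are all assumptions made for this example, and the EAC-CPF output is deliberately reduced to an identity element; a complete record would also need a control section and other required elements.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"   # MARC21 XML namespace
EAC_URI = "urn:isbn:1-931666-33-4"            # EAC-CPF namespace
MARC_NS = {"marc": MARC_URI}

# Assumed, illustrative subset of MARC name fields and their entity types.
NAME_TAGS = {
    "100": "person", "700": "person",
    "110": "corporateBody", "710": "corporateBody",
}

# Invented sample data: the same person appears in two different records.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Doe, Jane,</subfield>
      <subfield code="d">1900-1980.</subfield>
    </datafield>
    <datafield tag="710" ind1="2" ind2=" ">
      <subfield code="a">Example Historical Society.</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Doe, Jane,</subfield>
      <subfield code="d">1900-1980.</subfield>
    </datafield>
  </record>
</collection>"""


def extract_names(marc_xml):
    """Return (entity_type, name) pairs, one per name field, record by record."""
    root = ET.fromstring(marc_xml)
    names = []
    for record in root.iter("{%s}record" % MARC_URI):
        for field in record.findall("marc:datafield", MARC_NS):
            entity_type = NAME_TAGS.get(field.get("tag"))
            if entity_type is None:
                continue  # not one of the name fields handled in this sketch
            parts = [
                sf.text.strip()
                for sf in field.findall("marc:subfield", MARC_NS)
                if sf.get("code") in ("a", "d") and sf.text
            ]
            names.append((entity_type, " ".join(parts)))
    return names


def to_eac_cpf(entity_type, name):
    """Build a skeletal EAC-CPF element tree (identity only, not a valid full record)."""
    root = ET.Element("{%s}eac-cpf" % EAC_URI)
    description = ET.SubElement(root, "{%s}cpfDescription" % EAC_URI)
    identity = ET.SubElement(description, "{%s}identity" % EAC_URI)
    ET.SubElement(identity, "{%s}entityType" % EAC_URI).text = entity_type
    entry = ET.SubElement(identity, "{%s}nameEntry" % EAC_URI)
    ET.SubElement(entry, "{%s}part" % EAC_URI).text = name
    return root
```

Calling extract_names(SAMPLE) yields one entry per name field, so the person who appears in both sample records is returned twice. This is the duplication that a later matching and merging stage has to resolve.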
Currently under development is code to convert agency history records that were created using the MARC bibliographic format at a time when no dedicated format for such records existed. Although only a few archives and libraries participated in this activity, the SNAC project will also release the code developed to convert these MARC records into EAC-CPF records as open source and will provide a Web Service. The Smithsonian Institution and the New York State Archives provided the MARC-encoded records used to develop this code.
Also under development is code that extracts names and related identifying information from EAD instances and assembles EAC-CPF records. This code, when completed, will also be made available as open source.
For more information on the SNAC project, please visit http://socialarchive.iath.virginia.edu/index.html
Institute for Advanced Technology in the Humanities
University of Virginia