I am interested in creating a collection of EAD 2002 (DTD or W3C schema) XML specimens, made available under an open source license, for use in testing EAD systems. I'd like to capture specimens that represent a wide diversity of valid encoding practice.
In 2012 CDL is going to be working on XTF 4.0, and I'd like to totally overhaul the EAD display XSLT, basing it on the XSLT we use for the Online Archive of California. I was looking around for EAD files that I could include in a test suite, and those from the Library of Congress seem to be the only ones that are openly available to include in an open source project, because they are the product of the US Government and no copyright applies. The problem with those EAD files is that they are too consistent for my purposes.
Anticipating that potential contributors of EAD specimens would not want to see their actual finding aids online at random sites on the internet, I created a Python script that finds all the nouns in the text() nodes of an XML document and replaces them with scrambled nonsense (either pig Latin or a scheme mbklein suggested on the code4lib listserv). I would run any contributions through this greeking procedure. Only the greeked versions, not the original EAD, would be made available as part of the open source EAD specimen collection.
( the greeking script: https://github.com/tingletech/greeker.py )
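For illustration, here is a minimal sketch of the greeking idea described above. It is not the code in greeker.py: it assumes only the Python standard library, and because it has no part-of-speech tagger it scrambles every alphabetic word rather than only the nouns. The names pig_latin, greek_text and greek_xml are hypothetical, chosen for this example.

```python
import re
import xml.etree.ElementTree as ET

VOWELS = "aeiouAEIOU"
WORD = re.compile(r"[A-Za-z]+")


def pig_latin(word):
    """Return a simple pig Latin form of one alphabetic word."""
    if word[0] in VOWELS:
        result = word + "way"
    else:
        # Move the leading consonant cluster to the end, then add "ay".
        i = 0
        while i < len(word) and word[i] not in VOWELS:
            i += 1
        result = word[i:] + word[:i] + "ay"
    result = result.lower()
    # Keep an initial capital so names still look like names.
    return result.capitalize() if word[0].isupper() else result


def greek_text(text):
    """Scramble every alphabetic word in a text node (None stays None)."""
    if text is None:
        return None
    return WORD.sub(lambda m: pig_latin(m.group(0)), text)


def greek_xml(xml_string):
    """Greek all text content; element names and attributes are untouched."""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        element.text = greek_text(element.text)  # text before the first child
        element.tail = greek_text(element.tail)  # text after the closing tag
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = '<ead><unittitle id="x1">Smith family papers</unittitle></ead>'
    print(greek_xml(sample))
```

Digits, punctuation and whitespace pass through unchanged, so dates and extents keep their shape. Note that ElementTree does not carry comments or the DOCTYPE declaration through to its output, so a real tool would need to handle those separately.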
If you are interested in collaborating on this project by contributing specimens, or if you have any feedback or suggestions, please email me at [log in to unmask]
Thanks -- Brian