I get the feeling that my response won't help if you're not interested in
helping people work with bibliographic data outside of a particular ILS.
Here are the 4 lines of code I included in my previous email--Python reads
pretty well, so it's not surprising you didn't see them previously:
#!/usr/bin/env python
from urllib import urlopen
from sys import argv
url = 'http://lccn.loc.gov/%s/marcxml' % argv[1]
print urlopen(url).read()
Ok, so with whitespace and the shebang it's 7 lines :-) The point isn't that
this code is somehow wonderful or special, but that it's possible to perform
similar operations in many different languages using standard libraries.
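For what it's worth, here's a sketch of the same thing in Python 3, where urllib was split up (urlopen now lives in urllib.request) and print became a function. The helper names are mine, not part of any library:

```python
from urllib.request import urlopen  # Python 3 home of urlopen


def marcxml_url(lccn):
    # lccn.loc.gov exposes the MARCXML for a record at a predictable path
    return 'http://lccn.loc.gov/%s/marcxml' % lccn


def fetch_marcxml(lccn):
    # network call: returns the raw MARCXML bytes for the given LCCN
    return urlopen(marcxml_url(lccn)).read()
```

Still nothing but the standard library, which is the point.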
As with HTTP, there are so many tools available for processing XML it's
sometimes overwhelming. What data gets extracted would depend on the
application I'm writing, so how long extraction would take isn't a question
I can answer in general.
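As one small example of what extraction can look like, here's pulling a title out of a MARCXML record using nothing but the standard library's ElementTree. The record below is a made-up fragment, and I'm assuming the MARC21/slim namespace that these records use:

```python
import xml.etree.ElementTree as ET

# the namespace used by MARCXML records
NS = {'marc': 'http://www.loc.gov/MARC21/slim'}

# a made-up fragment for illustration, not a real LC record
record = '''<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>'''


def title(marcxml):
    # find the 245 field's $a subfield (the title proper)
    root = ET.fromstring(marcxml)
    return root.findtext(
        'marc:datafield[@tag="245"]/marc:subfield[@code="a"]',
        namespaces=NS)
```

A different application might want the 100 field, or the 6xx's--same pattern, different path expression.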
As for diacritics, everything seems to be in its right place--at least xmllint
says so. I was surprised to see ISO-8859-1 being used with XML character
entities--but it seems fine. At least it's not MARC-8 :-) Take a look at
these if you are curious:
http://lccn.loc.gov/75960069/marcxml
http://lccn.loc.gov/2004386351/marcxml
I'm not suggesting this new lccn service replaces or is even better than
Z39.50 or SRU. It's clearly just a means for identifying a bibliographic
record with a predictable URL--whereas Z39.50 and SRU are about querying a
set of bibliographic records.
I hope that this didn't muddy the waters more :-)
//Ed