Matt, is this one big set, with the receiver sorting out the ones they want based on URI? Sally
-----Original Message-----
From: LC Linked Data Service Discussion List <[log in to unmask]> On Behalf Of Miller, Matthew
Sent: Friday, October 25, 2019 2:10 PM
To: [log in to unmask]
Subject: Re: [ID.LOC.GOV] New bulk LCSH export pilot
Hi Steven,
I've added a new download to http://id.loc.gov/download/ called "Cataloging Vocabularies *NEW Pilot*", which contains all the vocabularies listed under "Cataloging" on the front page as bulk downloads in MADS, SKOS, and combined MADS & SKOS, each serialized as NT, XML, and JSON-LD. Let me know if you see any problems or have any questions.
Thanks,
Matt
-----Original Message-----
From: LC Linked Data Service Discussion List <[log in to unmask]> On Behalf Of Steven Michael Folsom
Sent: Thursday, October 24, 2019 2:34 PM
To: [log in to unmask]
Subject: Re: [ID.LOC.GOV] New bulk LCSH export pilot
Not to nag, but do you think this download option for the "smaller" vocabs might happen in the foreseeable future? We're starting to get more and more requests for these in the QA service.
If not, can we forward some of the use cases we're hearing to improve the LC APIs? E.g. folks are asking for display values for http://id.loc.gov/vocabulary/descriptionConventions to include both the label and code. This is something we could do in QA through the label (plus other context information) we provide.
On 10/2/19, 12:17 PM, "Steven Michael Folsom" <[log in to unmask]> wrote:
Yes, exactly. One archive would be great!
On 10/2/19, 11:26 AM, "LC Linked Data Service Discussion List on behalf of Miller, Matthew" <[log in to unmask] on behalf of [log in to unmask]> wrote:
Thanks for the feedback. When you say smaller vocabularies, are you referring to, for example, the vocabularies under the "Cataloging" section on the homepage? http://id.loc.gov/
I can look into gathering these smaller ones, which aren't currently on the download page, into one downloadable archive.
Thanks,
Matt
-----Original Message-----
From: LC Linked Data Service Discussion List <[log in to unmask]> On Behalf Of Steven Michael Folsom
Sent: Tuesday, October 01, 2019 3:46 PM
To: [log in to unmask]
Subject: Re: [ID.LOC.GOV] New bulk LCSH export pilot
Hi Matt,
Yay to more frequent downloads... now we just need to be ready to act on them locally. :)
The efforts to make the files more legible are helpful too.
Along with more frequent downloads, are you considering adding some/all of the smaller vocabularies as downloads? I don't know if others have a need to have these as downloads, or if because we're trying to provide normalized lookup services for these datasets (and others) via QA (https://github.com/samvera/questioning_authority) our needs are unique.
Thanks for progressing id.loc.gov,
Steven
On 9/18/19, 12:19 PM, "LC Linked Data Service Discussion List on behalf of Matt Miller" <[log in to unmask] on behalf of [log in to unmask]> wrote:
Hello,
We are testing a new bulk export process for LCSH and would like to hear feedback from anyone who uses the bulk downloads. The new bulk files can be found at http://id.loc.gov/download/ under the titles LC Subject Headings (LCSH) *NEW Pilot*
New:
- New compacted JSON-LD serialization
- The JSON-LD and XML files are now newline-delimited, meaning each line in the file is a completely self-contained record
- There are void files for each download containing the date the export was created, the title, a description, and the MD5 hash of the unzipped download.
- The N-Triples file now has record separators as comments; each group of triples starts with “# Start of sh12345678”
- Increased update frequency: the files should be refreshed when new LCSH updates are released monthly.
The same:
- The new LCSH export will contain the same data as before, including broader and narrower relationships, but is slightly more verbose.
- It is available in MADSRDF, SKOS, and combined MADSRDF & SKOS, in all serializations.
Thinking of removing:
- The current XML dump is one large XML file; the new XML file puts each record's RDF/XML on its own line. The current XML file could be used for bulk loading into a triple store, but the current and future NT files could be used in the same way. Is anyone using the current XML dump file for bulk loading?
- The Turtle serialization
Samples:
The first 10 records for MADSRDF & SKOS in all serializations:
https://gist.github.com/thisismattmiller/0691f815478a5dc337e2e140becfc549
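For anyone planning to consume the new dumps, the record-per-line layout and the N-Triples separator comments make per-record processing straightforward. The sketch below (Python, with hypothetical local file paths) shows one way to iterate records in the newline-delimited JSON-LD file, group triples by the “# Start of …” comments in the NT file, and compute an MD5 digest to compare against the hash published in the void file. This is an illustrative sketch, not code provided by LC.

```python
import hashlib
import json


def iter_jsonld_records(path):
    """Yield one parsed record per line from a newline-delimited JSON-LD dump."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def iter_nt_records(path):
    """Yield (record_id, triples) groups from an N-Triples dump whose records
    are delimited by '# Start of <id>' comment lines."""
    record_id, triples = None, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("# Start of "):
                if record_id is not None:
                    yield record_id, triples
                record_id, triples = line[len("# Start of "):], []
            elif line.strip():
                triples.append(line)
    if record_id is not None:
        yield record_id, triples


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, to compare against the hash
    published in the download's void file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `iter_nt_records("lcsh.madsrdf.nt")` would let a loader process one heading's triples at a time instead of parsing the whole dump into memory.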
Thanks for any feedback,
Matt Miller