ID@LISTSERV.LOC.GOV


ID December 2011

Subject:

Re: LCNAF & HTTP requests to id.loc.gov

From:

Ross Singer <[log in to unmask]>

Reply-To:

Authorities and Vocabularies Service Discussion List <[log in to unmask]>

Date:

Mon, 12 Dec 2011 21:14:45 -0500

Content-Type:

text/plain


While we're asking questions first and shooting later...

Can we also get another dump of the NAF that is formatted in the same
way as the SKOS dumps?

As it stands, you need to load the entire dataset into something
(memory, database, something) to get all of the variant labels (and
whatnot) as a result of the blank nodes.

Ok, so it doesn't need to be SKOS, necessarily, but can we get
something that we can sort on subject URI and stream?

Thanks,
-Ross.
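
The request above can be illustrated with a small sketch: if every triple for a record hangs directly off its subject URI (no blank nodes) and the file is sorted on that URI, a consumer can handle one record at a time and never hold the whole dataset in memory. This assumes N-Triples input; the example.org URIs, the sample data, and the function names are hypothetical and not taken from id.loc.gov.

```python
import io
from itertools import groupby


def subject_of(line: str) -> str:
    # In N-Triples the subject is the first whitespace-delimited term.
    return line.split(None, 1)[0]


def stream_records(lines):
    """Yield (subject, [triple lines]) from N-Triples sorted by subject.

    Only the triples of the current subject are held in memory at once.
    """
    stripped = (ln.strip() for ln in lines)
    triples = (ln for ln in stripped if ln and not ln.startswith("#"))
    for subject, group in groupby(triples, key=subject_of):
        yield subject, list(group)


# Hypothetical sample data, already sorted on subject URI.
SAMPLE = """\
<http://example.org/names/n1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Name One" .
<http://example.org/names/n1> <http://www.w3.org/2004/02/skos/core#altLabel> "One, Name" .
<http://example.org/names/n2> <http://www.w3.org/2004/02/skos/core#prefLabel> "Name Two" .
"""

records = list(stream_records(io.StringIO(SAMPLE)))
for subject, triples in records:
    print(subject, len(triples))
```

Note that `groupby` only merges adjacent lines, so the sketch depends on the sort order. It also shows why blank nodes get in the way: a variant label attached to a blank node would sort under the blank node's identifier rather than under the record's URI.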

On Mon, Dec 12, 2011 at 5:40 PM, Ford, Kevin <[log in to unmask]> wrote:
> Trevor,
>
> The story here is you need to inquire first and code second.
>
> You saw a deficiency in the bulk downloads.  That's a good thing, and something that had been missed on our end.  But, instead of inquiring about this, you unleashed an irresponsible amount of traffic at ID.  And, quizzically, based on my reading of your email, you believe one of the better solutions is to slow your crawl versus engaging us and passing on the very valuable information that blocking your heedless crawl unintentionally elicited.  You note the "slow" nature of crawling; you should be mindful of the numbers: it would take 23 days of retrieving 4 names per second to crawl the entire Names file at ID.  That strikes me as inefficient, even if a computer is doing all the work.
>
> This is in contrast to the three days, or so, that it takes to generate the bulk downloads, which it is high time we did.  And, I assure you, learning this *before* we do that work means that it'll get fixed in a timely manner.  That is why communicating a problem with the ID service and/or data should be the first course of action, and is often the best course.  I'm confident other users of the bulk download files will benefit from our addressing this issue also.
>
> So, thanks for drawing our attention to this problem, even if it was in a rather circuitous manner requiring gigabytes of network traffic and far, far more effort than exchanging a couple of emails would have.  You'll be unblocked at some point in the next couple of days.
>
>
> Kevin
>
> --
> Kevin Ford
> Network Development and MARC Standards Office
>
> Library of Congress
>
>
>> -----Original Message-----
>> From: Ford, Kevin
>> Sent: Monday, December 12, 2011 4:05 PM
>> To: Ford, Kevin
>> Subject: RE: [ID.LOC.GOV] LCNAF & HTTP requests to id.loc.gov
>>
>>
>>
>> From: Authorities and Vocabularies Service Discussion List
>> [mailto:[log in to unmask]] On Behalf Of Trevor Thornton
>> Sent: Monday, December 12, 2011 2:59 PM
>> To: [log in to unmask]
>> Subject: [ID.LOC.GOV] LCNAF & HTTP requests to id.loc.gov
>>
>> To Whom It May Concern,
>>
>> (Apologies in advance if you receive 2 versions of this message - I
>> submitted it via your web form also)
>>
>> I am an applications developer at the New York Public Library. We are
>> creating a tool to assist our metadata catalogers in using terms from
>> authorized sources (currently just LC authorities and Getty thesauri).
>> The first step is to get all of the terms into a centralized database.
>>
>> I've been working from your RDF downloads, and have been able to get
>> all of the information I need for LCSH and LCGFT from those. I had a
>> problem, however, with the data included with the LCNAF downloads. The
>> MADS/RDF file does not include the type of name (e.g. personal,
>> corporate, conference, title, etc.). This is specified in the
>> individual records with a distinct wrapper element (e.g.
>> 'madsrdf:CorporateName').
>>
>> It's important for us to be able to easily differentiate between types
>> of names, therefore I need to record this in our database. Since it is
>> not included in the download, I've been using the LCNAF to VIAF RDF
>> file as a sort of manifest, and sending an HTTP request for each LCNAF
>> URI to retrieve the full record, then extracting the name type by
>> evaluating the name of the wrapper element.
>>
>> This was working pretty well, though it was slow. Today I tried to
>> multithread this task, effectively doubling the number of hits to your
>> server. This resulted in my being blocked. I was afraid that would
>> happen, and I'm sorry that I did not notify you in advance.
>>
>> So I have 2 questions:
>>
>> 1. Is there a better way to retrieve name types for the records in
>> LCNAF, one that doesn't involve fetching each individual record from
>> id.loc.gov? Perhaps another extract of the data that isn't listed on
>> the site?
>>
>> 2. If there is no better way to get this data, can you unblock me, on
>> the condition that I go back to my slow, single-thread procedure? My IP
>> address is 65.88.88.234.
>>
>> Please let me know if you need any more info from me, and thanks in
>> advance for your help.
>>
>> Sincerely,
>> Trevor Thornton
>>
>> --
>> Trevor Thornton
>> Applications Developer, Information Technology Group
>> The New York Public Library
>> phone: 212-621-0607
>> email: [log in to unmask]
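
The per-record step described in the original message (reading the name type from the record's wrapper element) and the crawl-time figure from the reply can be sketched as follows. This is a minimal sketch, not the code actually used: the sample record, its example.org URI, and the helpers `name_type` and `crawl_days` are hypothetical, and it assumes the record is RDF/XML with a typed wrapper element such as `madsrdf:CorporateName` as a direct child of `rdf:RDF`. It makes no HTTP requests.

```python
import xml.etree.ElementTree as ET

MADSRDF_NS = "http://www.loc.gov/mads/rdf/v1#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical record for illustration; a real record carries far more data.
SAMPLE_RECORD = f"""\
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:madsrdf="{MADSRDF_NS}">
  <madsrdf:CorporateName rdf:about="http://example.org/names/n0000001">
    <madsrdf:authoritativeLabel>Example Corporation</madsrdf:authoritativeLabel>
  </madsrdf:CorporateName>
</rdf:RDF>
"""


def name_type(rdf_xml: str):
    """Return the local name of the first MADS/RDF wrapper element, or None."""
    root = ET.fromstring(rdf_xml)
    for child in root:
        # ElementTree spells qualified tags as "{namespace}localname".
        namespace, _, local = child.tag[1:].partition("}")
        if namespace == MADSRDF_NS:
            return local
    return None


def crawl_days(records: int, per_second: float) -> float:
    """Days needed to fetch `records` items at a fixed request rate."""
    return records / per_second / 86_400


print(name_type(SAMPLE_RECORD))
print(crawl_days(7_948_800, 4))
```

The second call inverts the figure quoted in the reply: 23 days at 4 requests per second corresponds to roughly 7.9 million records, which is why a bulk file carrying the name type is the more practical route.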
