Dear All,

 

New bulk downloads for LCSH have been generated and will be posted this week.  We'll let you know when this happens. We're working on generating new downloads for Names.

 

Looking to the future, we are strongly considering removing derived relationships, such as narrower relationships in LCSH, from the bulk downloads.  Expected benefits include (but are not necessarily limited to):

 

                1) More sustainable design.  It is worth noting that the original MARC source records for the larger datasets restrict most relationships to a single direction.  For example, the MARC source records for LCSH only contain "broader" relationships.

                2) Easier and more frequently produced bulk downloads.

                3) Slimmer downloads, which, while still retaining all that is needed to generate derived relationships, will be more manageable to consumers who may not be interested in those elements of the dataset.

                4) An update feed that accurately reflects which resources have been updated.

 

The last point requires more explanation:  Since we derive the 'narrower' relationship as a matter of convenience for the community, there is no mechanism in place to track which 'broader' Concepts are affected whenever a related 'narrower' Concept changes.  For example, if a new dog breed Concept is added to LCSH, the new Concept will include a 'broader' relationship to the more general Concept "Dogs."  The update feed will inform you that a new concept (the new dog breed) now exists but it will not communicate that the "Dogs" resource has also changed (it should now have a new 'narrower' relationship to the new dog breed).
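To make the gap concrete, here is a minimal Python sketch of the dog breed example. The identifiers (ex:Dogs, ex:NewBreed, and so on) and the function name are hypothetical and do not reflect real ID.LOC.GOV URIs or the actual feed format; the sketch only shows which 'broader' Concepts change implicitly without appearing in the feed.

```python
# Hypothetical 'broader' links, keyed by the narrower concept.
# These identifiers are illustrative only, not real LCSH URIs.
BROADER = {
    "ex:Beagle": ["ex:Dogs"],
    "ex:NewBreed": ["ex:Dogs"],
    "ex:Dogs": ["ex:DomesticAnimals"],
}


def implicitly_changed(feed_entries, broader):
    """Return concepts whose derived 'narrower' set is touched by the
    concepts listed in the feed, but which the feed itself does not list."""
    affected = set()
    for concept in feed_entries:
        # Each 'broader' target gains (or changes) a derived 'narrower' link.
        affected.update(broader.get(concept, []))
    return affected - set(feed_entries)


# The feed reports only the new breed; "ex:Dogs" changed too, silently.
changed = implicitly_changed(["ex:NewBreed"], BROADER)
print(sorted(changed))
```

A consumer who relies on the derived triples would have to perform a lookup like this for every feed entry to learn which other resources need refreshing.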

 

Including these derived relationships in the bulk downloads means that the downloads must be regenerated in their entirety each time rather than maintained incrementally.  Although we have not fully decided how we'll produce bulk downloads in the future, the need to include derived relationships currently restricts our options significantly.  This is one of the main reasons we would like to stop including them in the bulk downloads.

 

To address this change, we will make it clear that the bulk downloads do not contain these derived triples and augment the documentation at ID with SPARQL queries that will permit users to derive the relationships themselves.   We may also explore providing the derived relationships as separate downloads, but that is yet to be determined.
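The SPARQL queries mentioned above are not written yet, so the following is only a rough Python sketch of the same idea: a 'narrower' relationship is simply the inverse of a 'broader' one, so consumers can rebuild it from the slimmer download. The input format (pairs of concept identifiers) and the identifiers themselves are assumptions made for illustration, not the format of the actual bulk downloads.

```python
from collections import defaultdict


def derive_narrower(broader_pairs):
    """Invert (narrower_concept, broader_concept) pairs into a mapping
    from each broader concept to the set of its narrower concepts."""
    narrower = defaultdict(set)
    for child, parent in broader_pairs:
        narrower[parent].add(child)
    return dict(narrower)


# Hypothetical 'broader' statements as they might be read from a download.
pairs = [
    ("ex:Beagle", "ex:Dogs"),
    ("ex:NewBreed", "ex:Dogs"),
    ("ex:Dogs", "ex:DomesticAnimals"),
]

narrower_index = derive_narrower(pairs)
print(sorted(narrower_index["ex:Dogs"]))
```

Because the inversion is a single pass over the 'broader' statements, rebuilding the derived relationships locally should be inexpensive even for a dataset the size of LCSH.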

 

Obviously, these changes most affect LCSH, but any derived relationships in Names and other large datasets would receive similar treatment.

 

The current web-based service would remain unchanged.

 

Are there users of ID.LOC.GOV who would be negatively impacted by these changes to the bulk downloads and who would not consider more frequent bulk downloads an adequate trade-off?

 

All the best,

Kevin

 

--

Kevin Ford

Network Development and MARC Standards Office

Library of Congress

Washington, DC