First of all, very well said Roy! 

I think we need to quickly get past the MARC21 conversion conversation and start to understand what metadata management looks like when we have a clear focus on systems that serve users' convenience in finding things that match their needs.  Let's stop obsessing over this or that format and recognize that we are going to have to experiment with many expressions of the data for the different consumers we care about.

That doesn't contradict Roy's emphasis on entities; on the contrary it supports it--let's entify the data and then figure out the right channels to most effectively expose it.

I think a focus on this answers most of James' questions below.  I don't think there is a magical linked library catalog with lots of flowing connections from print to video to data sets--that's just incremental improvement on the tool that isn't used much anymore for discovery. 

Building our data in such a way that it can be consumed on the web seems like the most useful way to get the most value out of library collections.  We should assume that, for the moment, discovery happens in social networks and on the web.  The library catalog will remain a useful source for harvesting data and as the system of record for inventory--so it can continue in its role of satisfying offers. 

If we can expose library collection data appropriately library users will continue to use their preferred search engines and the catalog can continue to be an agent for fulfillment.  These are legitimate roles, but we can forget about building magical catalogs that will bring back unicorns to our midst.  ;-)

-Ted

James Weinheimer
February 22, 2016 at 8:33 AM
On 2/19/2016 10:15 PM, Tennant,Roy wrote:
You created a plausible outline that I'm afraid is missing a rather large and important step. For the lack of a better term I'll call it "entification," which is what we call it around here.
...
I get the sense sometimes that the library community doesn't fully grasp the nature of this transition yet, and it worries me. We need to shake off the shackles of our record-based thinking and think in terms of an interlinked Bibliographic Graph. As long as we keep talking about translating records from one format to another we simply don't understand the meaning of linked data and both the transformative potential it has for our workflows and user interfaces as well as the plain difficult and time consuming work that will be required to get us there.

Sure, we at OCLC are a long way down a road that should do a lot to help our member libraries make the transition, but there will be plenty of work to go around. The sooner we fully grasp what that work will be, the better off we will all be in this grand transition. No, let's call it what it really is: a bibliographic revolution. Before this is over there will be broken furniture and blood on the floor. But at least we will be free of the tyrant.

I completely agree that the library community doesn't fully grasp the nature of the transition. We are only at the beginning of a "long, strange trip"--and the resources of some libraries (and librarians themselves!) are almost exhausted already.

All of this is in the pursuit of a highly abstract goal: an interlinked bibliographic graph. I haven't come across that term before, but I guess it is a take on Tim Berners-Lee's "Giant Global Graph," which many people consider to be the ultimate goal of linked data. To achieve this goal of an interlinked bibliographic graph, much will have to be sacrificed, but the revolution will be worthwhile because we will be free of the "tyrant". Once again, I am not sure precisely what you mean here, but I assume the tyrant is the MARC record, which is a "unified bibliographic record" that contains all of the information for a bibliographic item. (I prefer to call it the "unit record" or the traditional catalog card, which was made to deal with the 19th-century transition from the earlier book catalogs, which were structured quite differently.)

The unified bibliographic record found in MARC must undergo "entification," which again, I assume means to turn as much as possible of the current, unified bibliographic record into entities, i.e. URIs, that in turn can be linked to--by anyone, I guess. (that is, if it is to be linked OPEN data. Linked closed data is an entirely different matter) In any case, if all this is done, I completely agree that the data that is now in our bibliographic records will become almost infinitely flexible.

There are a few questions of course. Chief among them, the obvious one:

1) Is this what libraries signed up for? What will be the final costs in terms of budgets, careers, redoing so much yet again? And how long will it take?

2) It remains to be seen whether any of this is what the public wants. I guess I'm just an old-fashioned kind of guy, or maybe just naive, but it seems to me that when people come to a library (either virtually or physically) they come to use the items in the collection, and not to use the catalog. In other words, people do not come to a library, or the library's website, just to look up something in the catalog and then.... go home. They use the catalog to get into the materials in the library's collection. If they already know what they want and where it is, they ignore the catalog. (Maybe they shouldn't but they do)

The best catalogs are those that I can use as quickly and as easily as possible so that I can spend the least amount of time with the catalog and spend the most amount of time in the items I find in the collection. This is why I personally prefer Google. It is not that I spend a great deal of time on Google, but paradoxically, I spend the *least* amount of time there compared to the other search engines. That's why I prefer it.

So, even if we make the "100% entified, interlinked bibliographic graph tool" that brings in information from hither and yon, that gives me charts from the IMF and images from Flickr, videos from YouTube, the latest news from Bing, plus of course, all the Wikipedia info, along with the library materials--and I'll assume here that it will even be on the specific topics I want, that might be great. Pardon my skepticism: I think lots of people would still like to see it in action before concluding that it really is great.

It may be that the idea is to get rid of or replace the catalog completely, but I think the public will continue to demand a quick and easy-to-use list to get into the materials in a library's collection. The proposed linked data tools do not provide this; they only add complexity to the catalog by adding more and more material to a search result. It seems to me that we can entify things until Doomsday and it still won't make it one bit easier for the public to find materials in library collections.

The problem is: our catalogs have never been easy to use, and they became even harder when they went online with keyword searching. There are tons of problems, and those issues have yet to be addressed. But just because the public doesn't like to use library catalogs doesn't mean that they do not want a "listing of materials" in the collection they are using. And that list should be made as simple to use as possible. Such a listing is also called a catalog. A lot could be done to make it easier to use than it is today, but nobody seems to be talking about that.

But maybe I'm wrong. Maybe the public doesn't want an easy-to-use listing of materials in a library's collection. Like I said, maybe I'm just an old-fashioned kind of guy, or just naive.

James Weinheimer [log in to unmask]
First Thus http://blog.jweinheimer.net
First Thus Facebook Page https://www.facebook.com/FirstThus
Personal Facebook Page https://www.facebook.com/james.weinheimer.35
Google+ https://plus.google.com/u/0/+JamesWeinheimer
Cooperative Cataloging Rules http://sites.google.com/site/opencatalogingrules/
Cataloging Matters Podcasts http://blog.jweinheimer.net/cataloging-matters-podcasts
The Library Herald http://libnews.jweinheimer.net/

Tennant,Roy
February 19, 2016 at 4:15 PM
Eric,
You created a plausible outline that I'm afraid is missing a rather large and important step. For the lack of a better term I'll call it "entification," which is what we call it around here. This might encompass the creation of your own linked data entities or the use of those created by others (such as, dare I say it, OCLC). In other words, Step 5 is deceptively simple when in fact it is devilishly complex.

We witnessed this recently when we took a look at some BIBFRAME records produced by a large research university and they were punting on the entification. That is, by simply taking records in MARC and translating them to BIBFRAME in a one-to-one operation, you are basically left with a BIBFRAME record that really isn't linked data at all. You have assertions that are basically meaningless, as they link to nothing and nothing links to them. How many URIs do you think Washington, DC should have? I would argue one, at the very least within your own dataset, but that isn't what you end up with without taking a great deal of time and trouble to do the entification step -- whether using your own data or reconciling your data against someone else's entities, such as LCSH.
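Roy's question about how many URIs "Washington, DC" should have can be illustrated with a toy reconciliation pass: variant strings drawn from many records are normalized and collapsed onto a single identifier. This is only a sketch of the idea; the authority table and the example.org URI are illustrative placeholders, and real entification would reconcile against an actual authority file such as LCSH or FAST.

```python
# Toy entification sketch: collapse variant place strings onto one URI.
# The AUTHORITY table and its URI are hypothetical, not real identifiers.
AUTHORITY = {
    "washington dc": "http://example.org/place/washington-dc",
}

def normalize(literal: str) -> str:
    """Lower-case and strip punctuation so that variant forms compare equal."""
    kept = "".join(ch for ch in literal.lower() if ch.isalnum() or ch == " ")
    return " ".join(kept.split())  # collapse runs of whitespace

def reconcile(literal: str):
    """Return the single URI for a place literal, or None if unmatched."""
    return AUTHORITY.get(normalize(literal))
```

With this, "Washington, DC", "Washington (D.C.)", and "washington d.c." all resolve to the same URI, while an unmatched string stays unreconciled (a literal) -- which is exactly the failure mode Roy describes in one-to-one MARC-to-BIBFRAME conversion.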

I get the sense sometimes that the library community doesn't fully grasp the nature of this transition yet, and it worries me. We need to shake off the shackles of our record-based thinking and think in terms of an interlinked Bibliographic Graph. As long as we keep talking about translating records from one format to another we simply don't understand the meaning of linked data and both the transformative potential it has for our workflows and user interfaces as well as the plain difficult and time consuming work that will be required to get us there.

Sure, we at OCLC are a long way down a road that should do a lot to help our member libraries make the transition, but there will be plenty of work to go around. The sooner we fully grasp what that work will be, the better off we will all be in this grand transition. No, let's call it what it really is: a bibliographic revolution. Before this is over there will be broken furniture and blood on the floor. But at least we will be free of the tyrant.
Roy Tennant
OCLC Research




Eric Lease Morgan
February 19, 2016 at 3:15 PM


Very interesting. Thank you, and based on this input, I’ve outlined a possible workflow for creating, maintaining, and exposing bibliographic description in the form of BIBFRAME linked data:

1. Answer the questions, "What is bibliographic
description, and how does it help facilitate the goals
of librarianship?"

2. Understand the concepts of the Semantic Web,
specifically, the ideas behind Linked Data.

3. Embrace & understand the strengths & weaknesses of
BIBFRAME as a model for bibliographic description.

4. Design or identify and then install a system for
creating, storing, and editing your bibliographic data.
This will be some sort of database application whether
it be based on SQL, non-SQL, XML, or a triple store. It
might even be your existing integrated library system.

5. Using the database system, create, store, import/edit
your bibliographic descriptions. For example, you might
simply use your existing integrated library for these
purposes, or you might transform your MARC data into
BIBFRAME and pour the result into a triple store.

6. Expose your bibliographic description as Linked Data
by writing a report against the database system. This
might be as simple as configuring your triple store, or
as complicated as converting MARC/AACR2 from your
integrated library system to BIBFRAME.

7. Facilitate the discovery process, ideally through
the use of a triple store/SPARQL combination, or
alternatively directly against integrated library
system.

8. Go to Step #5 on a daily basis.

9. Go to Step #1 on an annual basis.

If the profession continues to use its existing integrated library systems for maintaining bibliographic data (Step #4), then the hard problem to solve is transforming and exposing the bibliographic data as linked data in the form of BIBFRAME. If the profession designs a storage and maintenance system rooted in BIBFRAME to begin with, then the problem is accurately converting existing data into BIBFRAME and then designing mechanisms for creating/editing the data. I suppose the latter option is “better”, but the former option is more feasible and requires less retooling.
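Steps 5 through 7 of the outline can be mocked end to end in a few lines, with plain tuples standing in for a triple store and a list comprehension standing in for SPARQL. Everything here is a simplification for illustration: the field names, the example.org work URI, and the abbreviated bf: predicates are placeholders, not the official BIBFRAME vocabulary, and a real pipeline would load an actual triple store.

```python
# Minimal sketch of Steps 5-7: derive triples from a MARC-like record,
# "store" them, and "query" them. Tuples stand in for a triple store;
# the bf: terms are shorthand placeholders, not real BIBFRAME URIs.
def marc_to_triples(record: dict) -> list:
    """Turn a simplified MARC-like dict into (subject, predicate, object) triples."""
    work = f"http://example.org/work/{record['id']}"
    triples = [(work, "rdf:type", "bf:Work"),
               (work, "bf:title", record["245a"])]
    if "100a" in record:
        triples.append((work, "bf:contribution", record["100a"]))
    return triples

def query(triples, predicate):
    """Stand-in for a SPARQL SELECT: every object having a given predicate."""
    return [o for (_, p, o) in triples if p == predicate]

record = {"id": "b1", "245a": "Moby Dick", "100a": "Melville, Herman"}
store = marc_to_triples(record)
```

Note that this sketch is exactly the "one-to-one translation" Roy warns about: the objects are still literals, so the hard entification work of Step 5 remains.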


Eric Lease Morgan
Joy Nelson
February 19, 2016 at 1:12 PM
Eric-
I am starting to explore this same issue.  It seems that there are two (probably more) 'road humps' in the process of moving data from MARC/MARCXML to RDF triples in BIBFRAME.  The first is the idea of garbage in/garbage out.  If the data isn't clean to begin with, the transformation to triples will fail (values remain as literals, not URIs).  The first step in the process should probably involve cleaning the data.
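That cleanup step can be as simple as stripping trailing ISBD punctuation and collapsing stray whitespace so that field values have a chance of matching an authority entry later. This is a minimal sketch of the idea; real MARC cleanup involves much more (subfield handling, non-filing characters, encoding repair).

```python
import re

# Sketch of a pre-conversion cleanup pass: collapse whitespace and strip
# trailing ISBD punctuation so values can later match authority entries.
def clean_field(value: str) -> str:
    value = re.sub(r"\s+", " ", value).strip()   # collapse runs of whitespace
    return re.sub(r"\s*[/:;,.]+$", "", value)    # drop trailing ISBD marks
```

Run before transformation, this turns "Moby Dick /" into "Moby Dick" and "Melville,  Herman," into "Melville, Herman" -- small differences, but exactly the kind that make a string fail to reconcile and remain a literal.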

Secondly, there is the issue of which URIs to use.  Will your system create its own URIs for works, or will you reference an existing URI at LOC?  The second benefits the LOC with 'pingback' but doesn't benefit your institution.  It seems that you would be creating your own URIs for works and instances and using existing URIs for things like authors, publishers, etc.  But the choice of URIs will be an issue. 
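Joy's split -- mint local URIs for works and instances, reuse external authority URIs for agents where a match exists -- might be sketched like this. Every URI below is a hypothetical placeholder (the base URL, the authority table, and the fallback pattern are all invented for illustration); a production system would look agents up against a real service rather than a hard-coded dict.

```python
import hashlib

# Sketch of mixed URI policy: local URIs for works, external authority URIs
# for agents when known. All URLs here are illustrative placeholders.
LOCAL_BASE = "https://opac.example.edu/entity"
KNOWN_AGENTS = {
    "Melville, Herman": "http://authority.example.org/names/melville-herman",
}

def mint_work_uri(title: str) -> str:
    """Deterministically mint a local work URI from a cleaned title string."""
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()[:12]
    return f"{LOCAL_BASE}/work/{digest}"

def agent_uri(name: str) -> str:
    """Prefer an existing authority URI; otherwise fall back to a local one."""
    return KNOWN_AGENTS.get(name) or f"{LOCAL_BASE}/agent/{name.replace(' ', '-')}"
```

Minting deterministically (hashing the title rather than using a counter) means re-running the conversion yields the same local URIs, which matters once other triples start linking to them.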

In our system we store MARC as MARC and as MARCXML.  In my initial thinking about this process, I'm wondering if the system just needs to become more 'agnostic' about the data format.  If I provide BIBFRAME in RDF/XML, then the system should be able to pull out the bits it needs for display.  We would need some logic in the inner workings to deal with various types of XML data.  And using an indexer on the system that can handle various XML formats would help users search.  (I'm thinking Elasticsearch here.)  Right now I tend to think of the BIBFRAME descriptions as distinct units that would be similar to a MARCXML record.  It is conceivable that there would be an additional layer on top that would store ALL the triples and use some kind of SPARQL querying/searching?  I don't know about that yet.  An ILS needs a relational database structure since it is transactional.  But... could there be a component that is a graph database? 

All in all, it's a fun concept to play around with in my head. Where it ends up or how it looks will be an interesting journey.


Joy Nelson
Director of Migrations
ByWater Solutions






Eric Lease Morgan
February 19, 2016 at 12:10 PM


Use case? Hmm… Okay. Say I’m a library that is convinced that BIBFRAME is the way to go. How might I get from where I am with my MARC/AACR2 data to a discovery system rooted in a triple store and a “kewl” SPARQL front-end?

Maybe I could put my question a different way. Of the folks who have created sample implementations, what was the process used? [1] Actually, I can (sort of) answer my own question by reading the implementation descriptions. That said, I believe the process to create and maintain new triples for new content is/will be a difficult one.

[1] sample implementations - https://www.loc.gov/bibframe/implementation/register.html


Eric Morgan
Still Lost In Philadelphia