"And with linked data, I am very skeptical about the usefulness of mixing content data with our directional data"

I often hear that mixing content data (full text) with directional data (I understand this as descriptions of full text) leads to bad results.

The question, then, is why Google started its Google Book Scan project and invested many millions of dollars instead of relying merely on the catalog data that librarians compiled for over a hundred years and handed over to it. Google has had access to library catalogs, and it seems the catalogs are in such bad shape for the Web that they still do not appear in Google's services today. Google is just one example.

We know about the importance of text & data mining (TDM). Librarians want the right to unrestricted TDM. Only "full text" can give the full scope for contextualizing the information stored in libraries. Without context, information is useless. The challenge is that today's catalogs still follow the model of Callimachus' ancient pinakes: codes for inventory lists, designed for librarians, often lacking contextual information and public exposure. That is good for scholars and experts who need to identify items, but not so good for patrons who are looking for extra knowledge and services that come from linking library items to the Web. With RDF, all kinds of statements / assertions can be recorded: assertions about things in the full text, or in the descriptions, or things on the Web, or descriptions about things on the Web.
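
As a minimal sketch of the point about RDF, assuming made-up example.org URIs and `ex:` property names (none of them come from a real vocabulary or from any catalog), the different kinds of assertions can sit side by side as subject-predicate-object triples. This uses plain Python tuples rather than an RDF library, so it only illustrates the data model:

```python
# All URIs and "ex:" predicates below are hypothetical placeholders.
BOOK = "http://example.org/book/123"
WAR = "http://example.org/topic/war-of-the-spanish-succession"
BATTLE = "http://example.org/event/some-battle"
WEB_PAGE = "http://example.org/web/article-on-the-war"

triples = [
    # Assertion taken from the catalog description (directional data)
    (BOOK, "ex:subject", WAR),
    # Assertion about a thing found in the full text (content data)
    (BOOK, "ex:mentions", BATTLE),
    # Assertion about a thing on the Web
    (WEB_PAGE, "ex:about", WAR),
    # Assertion by a Web resource describing the library item
    (WEB_PAGE, "ex:describes", BOOK),
]


def statements_about(resource, graph):
    """Return every (predicate, object) pair whose subject is `resource`."""
    return [(p, o) for s, p, o in graph if s == resource]


def linked_to(resource, graph):
    """Return every subject that has `resource` as the object of a triple."""
    return [s for s, p, o in graph if o == resource]
```

Calling `linked_to(WAR, triples)` returns both the catalog item and the Web page, which is the kind of linking between library items and the Web I have in mind.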


On Sun, Mar 8, 2015 at 8:07 PM, James Weinheimer wrote:
On 3/7/2015 9:37 PM, Martynas Jusevičius wrote:
I find these statements hard to believe. Data is just data. Data,
metadata - there is no difference.

People are using RDF to describe proteins, semiconductor products,
horoscope signs, antique coins and who knows what else. What makes you
think libraries are special? Again, I mean real technical limitations
-- all the history and the "traditional ways of doing things" are
irrelevant here.

There are different types of data, and we experience it in all kinds of ways every day. I have gone into greater detail in those podcasts and presentations I mentioned, but I'll try to redo a little of it here. The differences are subtle, but clear.

Before I begin, however: what you have described as history and traditional ways of doing things is not history at all. Whether we like it or not, what I described is the way libraries still work. It is what users are supposed to do when they use a library, and if people don't do it, they will get bad results. Of course, few people do it, and this explains a lot of the frustration that people currently have with library catalogs.

The solution that libraries have tried is called "information literacy" and "bibliographic instruction", which, instead of fixing library tools to work in a modern environment, means teaching everybody how to use our tools the way they are. In my own opinion, this hasn't worked and everything needs to be rethought, but what I described is not history--unfortunately it is still happening today.

About catalog data: it isn't that it is special, but it is different from the other types of data that you point out. When someone comes to a library, they don't come specifically to search the catalog (or at least, those who do are exceedingly rare). Instead, the vast majority are there because they have a question and want information. My example has been "What were the causes of the War of the Spanish Succession?" The catalog does not contain the information I want--the information that can answer my question is contained in the books, journal articles, and other materials in the collection--but if I use the catalog correctly, it can direct me to the resources that have the information I want. In this way, the information found in a catalog is similar to information found on ... traffic signs.

If you want to drive from Rome to Paris, you need signs to help you get there. The better the signs, the better, the easier, and the more enjoyable the trip. Poor signs, or the absence of them (which happens in Italy all the time), can lead to frustration, anger or even disasters.

So, people want and need decent and reliable road signs, but they are very rarely interested in the signs themselves: who made them, where and when, what materials they are composed of and so on. Still, those in charge of the road signs need to know that information, so that they can replace them, update them, add to them, etc.

Using this same reasoning with catalogs and how things are changing, consider the person who is interested in the "War of the Spanish Succession" and searches the library catalog. They can sit there, quite literally, all day long and not learn anything about the War. All they see are *catalog records*, and if they are to learn about the war itself, they need to get into the books of the collection. But when they search Google, in just half an hour they have gotten some real information. This leads them to expect that library tools will work similarly to what works (apparently) so easily and simply on Google, which seems logical but is completely wrong.

Google works with a different type of information: content. Library catalogs work by giving people directional information, so even when the searcher does everything correctly, all they see are directions: for general books on the War, look here; for books on the politics, look there; for battles, look here; etc.

Those who use catalogs incorrectly are practically doomed to disaster. For them it is similar to being a driver who hasn't seen a road sign for hours and ends up at the end of a road in the middle of a field at midnight.

Believe me, this happens to students all the time when they are researching their papers at the last minute! Both end up in tears and/or almost screaming.

Catalogers see this difference in information clearly because they work with the actual materials that people want: the books, the recordings, the maps, etc. all go through their hands. The mistake that many catalogers make (again in my opinion) is that they believe that people who care about the information in the collection (i.e. who want to learn about the War of the Spanish Succession) also care about the catalog records the catalogers make. Of course, for the public, these records are the equivalent of road signs that help them get where they want to go. They don't care about the road signs, and once they reach their destination, they completely forget about all the helpful ones. I confess I remember only the frustrations and anger during the trips that had lousy signs. I think the same thing happens with catalog records.

While our methods still "work" in a sense, they are strange for people in the 21st century. They need to be, in a sense, translated so that they work in today's environment.

So, all data is definitely not equal. I think there is still a need for our type of data, but it needs to be reconsidered. Tools that work well for content data don't work so well with directional data. And with linked data, I am very skeptical about the usefulness of mixing content data with our directional data. Nevertheless, we should try it, to find out what happens. I would be very happy to be proven wrong.

There are other options, too.