Ross, in this case you're operating a good old data silo, with an RDF export capability.

Besides solving the integration issue, RDF also opens a way to gradually move business rules and user interaction logic from (imperative) code to (declarative) data. It enables new generic software design patterns, which I hesitate to go into on this list. But you can take a look at our SWAT4LS paper about the Graphity platform architecture: http://ceur-ws.org/Vol-1320/paper_30.pdf

Martynas

On Tue, Apr 14, 2015 at 9:21 AM, Ross Singer <[log in to unmask]> wrote:
> Martynas,
>
> You have to understand, *we abandoned using a triple store after using one for many years*. Advancing RDF and Linked Data was the primary tenet of the company, and everything we produced used the triple store in some capacity. This wasn't an issue of not understanding RDF or not having used SPARQL in anger.
>
> But in the end we abandoned using a native triple store because the things that they're good at (storing and retrieving data from various sources with unbounded shapes, and performing ad hoc queries on it) accounted for less than 1% of our use cases.
>
> For the other 99%, triple stores proved inefficient and awkward and required far more operational scaffolding to scale.
>
> To answer your question about querying our data, we pre-compute the vast majority of our common joins into multi-describe graphs, somewhat analogous to an RDBMS view. We also pre-compute tabular data out of the graphs for SPARQL SELECT-like functionality. The rest we can just query in Mongo if we want to, but we rarely have a need to (tracking down a support problem, maybe). It does not support SPARQL, no. There's no use case for it.
>
> The point is that our data is all modeled to ingest and export RDF quads, so we're not locked into anything, and when we need to run ad hoc queries, we can ingest our data into a Fuseki instance we run, but that's for a specific need, which is completely different from the general workflow and operation of everything else.
>
> -Ross.
>
> On Monday, April 13, 2015, Martynas Jusevičius <[log in to unmask]> wrote:
>>
>> Ross,
>>
>> I wonder how you query your MongoDB store? I don't suppose it supports SPARQL?
>>
>> On Mon, Apr 13, 2015 at 11:51 PM, Ross Singer <[log in to unmask]> wrote:
>> > Kelley,
>> >
>> > I stick by the notion that as long as a system can ingest/export data as well-formed RDF graphs, how you store it internally makes no difference. As I've said before, our biggest product is modeled entirely using RDF, but we store the data in a MongoDB document database, because it fits our actual needs better than a triple store.
>> >
>> > It is possible, btw, to produce an ordered list for author/editor names in RDF, but it's horribly ugly: you can use rdf:Seq http://www.w3.org/TR/rdf-schema/#ch_seq or rdf:List http://www.w3.org/TR/rdf-schema/#ch_list. They each have their pluses and minuses: rdf:List is absolutely awful to work with in any serialization except Turtle (where it's super easy! see: http://www.w3.org/2007/02/turtle/primer/#L2986), but it has the upside of being semantically closed. That is, you can definitively say "these are *all* of the authors and there are no more".
>> >
>> > rdf:Seq (which is an rdf:Container) is considered open (i.e. you cannot rule out that additional members of the current container are asserted somewhere else) and, unfortunately, it has no syntactic sugar like Collections in Turtle.
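>> >
>> > For what it's worth, a minimal Turtle sketch of both options looks something like this (the ex: names and URIs are just made up for illustration):
>> >
>> >     @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> >     @prefix ex:  <http://example.org/> .
>> >
>> >     # rdf:List via Turtle's collection sugar: order is explicit and the list is closed
>> >     ex:book1 ex:authorList ( ex:author1 ex:author2 ex:author3 ) .
>> >
>> >     # The same ordering as an rdf:Seq: the membership properties carry the order,
>> >     # but nothing says these three are the only members
>> >     ex:book1 ex:authorSeq [
>> >         a rdf:Seq ;
>> >         rdf:_1 ex:author1 ;
>> >         rdf:_2 ex:author2 ;
>> >         rdf:_3 ex:author3
>> >     ] .
>> >
>> > The collection syntax hides the rdf:first/rdf:rest plumbing; with the Seq you spell the positions out as rdf:_1, rdf:_2, and so on.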
>> >
>> > Containers and Collections being such major pain points in RDF, JSON-LD threw all of it away for a *much* simpler implementation: http://www.w3.org/TR/json-ld/#sets-and-lists
>> >
>> > All that said, as long as you can serialize your author lists as one of these, model it in whatever way best suits your regular workflows and needs.
>> >
>> > -Ross.
>> >
>> > On Mon, Apr 13, 2015 at 6:12 AM Kelley McGrath <[log in to unmask]> wrote:
>> >>
>> >> Although much of the discussion on storing bibframe data went over my head, some things have been niggling at me for a while that may be related to this thread.
>> >>
>> >> I get that it would be good for us to publish our data as linked data. I get that it would be good for us to consume linked data. I get that we should re-use other people's URIs in our data to save time and reduce maintenance. I get that we should match our identifiers to other people's URIs in order to connect more information.
>> >>
>> >> However, it has not been clear to me that it makes sense for us to store and maintain our data as linked data. And yet, I don't see any alternative plan being developed.
>> >>
>> >> This may be sacrilege, but from what I understand there seem to be things that linked data isn't good at. For example, retaining the order of things like authors' names, or connecting a specific place and publisher written on the title page of a book. Sometimes when this has been discussed on this list, I get the impression that we shouldn't want to do those things; that they're somehow obsolete.
>> >>
>> >> I can't get my head around that. Maybe you don't need those things for linking, but I don't think linking is the only thing that we want to do with our data. For example, it emerged recently, when MPOW changed to a discovery layer that initially didn't do such a good job with this, that the ability to generate citations is hugely important to a significant portion of our patrons. If you want to generate an accurate citation, you need to know the order of the authors' names.
>> >>
>> >> It has been suggested to me that we shouldn't be generating citations, but rather storing them as strings. However, again I seem to be missing something, because that doesn't seem very optimal to me. Do you store a separate string for every format: APA, MLA, Chicago, etc.? What do you do when a style guide gets updated? It might not be very easy to update millions of strings. What if a new citation style is invented and becomes popular? It just seems more flexible and powerful to store the individual pieces of data and generate the citations. On the other hand, publishing citations as strings might be okay for most use cases.
>> >>
>> >> MARC records are a single unit. If a record has been edited by multiple parties, you can't tell who changed what when, which is a challenge for troubleshooting and quality control. Linked data statements are atomistic, but it sounds to me like it is still hard to say anything much *about* a statement other than maybe the domain name used by whoever made it. It would be useful to track more about individual statements, such as when they were made and whether or not they are currently considered valid (one of the problems with bad data in the OCLC master record environment is that even if you take erroneous information out, all it takes is one batchload to put it right back in).
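>> >>
>> >> (The standard workaround here, RDF reification, shows how clumsy this gets. A rough Turtle sketch, where the ex: properties are invented for illustration:)
>> >>
>> >>     @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> >>     @prefix dcterms: <http://purl.org/dc/terms/> .
>> >>     @prefix ex: <http://example.org/> .
>> >>
>> >>     # The statement we want to talk about: book1 has author person1
>> >>     ex:book1 ex:author ex:person1 .
>> >>
>> >>     # A separate resource that describes that statement
>> >>     ex:stmt42 a rdf:Statement ;
>> >>         rdf:subject   ex:book1 ;
>> >>         rdf:predicate ex:author ;
>> >>         rdf:object    ex:person1 ;
>> >>         dcterms:created "2015-04-13"^^<http://www.w3.org/2001/XMLSchema#date> ;
>> >>         ex:assertedBy <http://example.org/agency/XYZ> ;
>> >>         ex:currentlyValid true .
>> >>
>> >> (Named graphs/quads are the other common way to hang provenance on statements; either way, saying something about a statement does not come for free.)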
>> >>
>> >> As some of you know, I have been working on a project to crowdsource the parsing of film credits in catalog records (help us out at http://olac-annotator.org/ ). One result of this is that we have links between transcribed names in records and their authorized form. It occurs to me that this might be a useful thing to record proactively. For example, even in a world of identifiers, we still need to choose one of many possible versions of a name to display to the user (unless you're going to display them all at once in some kind of cluster, which is not very user-friendly in many situations). In library cataloging, traditionally, the most common or most recent variation of a person's name is chosen as the preferred one. However, if the math changes, you have to wait for a person with NACO powers to notice this and fix it. This doesn't always happen in a timely fashion. In his earliest movies, Laurence Fishburne was credited as Larry Fishburne, so this is how his name was established. It then persisted in library catalogs as Larry Fishburne for long, long after he made the change (I think ten years). If you had data like this, the computer could do the math and display the most current form.
>> >>
>> >> Name on piece        Year  Work
>> >> Larry Fishburne      1984  The Cotton Club
>> >> Larry Fishburne      1985  The Color Purple
>> >> Laurence Fishburne   1993  Searching for Bobby Fischer
>> >> Laurence Fishburne   1999  The Matrix
>> >> Laurence Fishburne   2006  Mission: Impossible III
>> >>
>> >> (If you look at IMDb's Laurence Fishburne page, they do track all this, along with the names of the characters he played: http://www.imdb.com/name/nm0000401/ )
>> >>
>> >> With linked data, you can say
>> >>
>> >> Movie1 -- has actor -- LF123
>> >> Movie1 -- has actor's name credited as -- "Laurence Fishburne"
>> >> LF123 -- has been credited as -- "Laurence Fishburne"
>> >>
>> >> But you can't get all three of those things to connect up, at least not without using blank nodes, which then makes your data not so shareable. So far as I can see, anytime you want to connect the dots between more than two pieces of information, or say something about a statement, it doesn't work so well with triples. This might not be such a problem for linking, but I think there are other things we want to do with our data where we may want this ability.
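>> >>
>> >> (For what it's worth, the usual workaround is to promote the relationship itself to a resource, an intermediate "credit" node. A rough Turtle sketch, where the ex: names are invented for illustration, not any actual vocabulary:)
>> >>
>> >>     @prefix ex: <http://example.org/> .
>> >>
>> >>     # A credit resource ties the movie, the person, and the transcribed
>> >>     # form of the name together, and can carry the year as well
>> >>     ex:credit987 a ex:Credit ;
>> >>         ex:inWork        ex:Movie1 ;
>> >>         ex:creditedAgent ex:LF123 ;
>> >>         ex:creditedAs    "Larry Fishburne" ;
>> >>         ex:year          1984 .
>> >>
>> >> (Giving the credit its own URI rather than a blank node keeps the data shareable, but it does mean minting and maintaining a lot of extra identifiers.)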
>> >>
>> >> What happens if we implement bibframe and we don't store and maintain our data as bibframe triples? We could just keep generating bibframe from MARC records, but then we haven't really replaced MARC or gotten more flexible, structured data than we already have.
>> >>
>> >> Alternatively, ILS vendors could come up with another internal format for us to store data in. However, I don't know that they have the right expertise for this, nor any economic incentive. If this happened, we would also end up with much less portable data. Imagine if bib records were like item records and every system had its own proprietary format and unique combination of fields. Anytime you do an ILS migration, there is a lot of item data that can't be moved to the new system, either because it's structured differently or because there is no equivalent field.
>> >>
>> >> This may be completely wrong-headed, and I think I'm using the wrong vocabulary in some places, but I thought I'd throw it out there in case someone can enlighten me.
>> >>
>> >> Kelley