Just mucking about, I seem to get 100-200 triples from each non-extreme
MARC->BIBFRAME record. And my recent count of potential "data elements"
in MARC is around 1400 if you include fixed fields. All but a few
elements are repeatable. It definitely adds up.
That said, not all libraries are OCLC-scale. A medium-sized public
library's catalog of 500,000 records would probably have some tens of millions
of triples. That's assuming, of course, that we continue with the local
The other consideration is the low level of redundancy in some areas of
our data (titles especially). We've always designed for certain kinds of
high-impact searches on that data.
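
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. It assumes the 100-200 triples-per-record range from my own conversions holds across a whole catalog, which is only a guess. The function name and variables are made up for illustration. The 300 million figure is the lower bound from Ralph's message below, and it leaves out article and holdings records, since I have no per-record triple count for those.

```python
def estimate_triples(records, per_record_low=100, per_record_high=200):
    """Return a (low, high) estimate of total triples for a catalog.

    The per-record defaults are the informal 100-200 range quoted above,
    not a measured average.
    """
    return records * per_record_low, records * per_record_high


# Medium-sized public library catalog (figure from the text above).
public_library = estimate_triples(500_000)

# OCLC bibliographic records only; ">300M" is treated as exactly 300M here.
oclc_bibs = estimate_triples(300_000_000)

print(f"Medium public library: {public_library[0]:,} to {public_library[1]:,} triples")
print(f"OCLC bib records only: {oclc_bibs[0]:,} to {oclc_bibs[1]:,} triples")
```

Under those assumptions the public library lands at 50 to 100 million triples and the OCLC bib records alone at 30 to 60 billion, so the two cases differ by nearly three orders of magnitude.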
On 4/10/15 7:32 AM, LeVan,Ralph wrote:
> Millions of triples? Puhleeze.
> At OCLC we've got >300M bib records, around a billion article records and billions of holdings records. That's going to be a *lot* of triples.
> -----Original Message-----
> From: Bibliographic Framework Transition Initiative Forum [mailto:[log in to unmask]] On Behalf Of Martynas Jusevicius
> Sent: Friday, April 10, 2015 4:25 AM
> To: [log in to unmask]
> Subject: Re: [BIBFRAME] How is Bibframe data stored?
> Usage patterns do matter, I agree. But if we're only talking about an order of millions of triples, there is no reason to believe that a triplestore could not perform adequately in real-world usage scenarios. This has been done already many times.
> To my knowledge, BIBFRAME was designed as an RDF vocabulary and intended for Linked Data use. So it's not me prescribing RDF. RDF is the natural data model of choice here, while an RDBMS is a source of data from legacy systems. Putting RDF data into an RDBMS makes no sense.
> On Fri, Apr 10, 2015 at 11:09 AM, Ross Singer <[log in to unmask]> wrote:
>> Martynas, the number of triples doesn't matter. It's the usage pattern
>> that does.
>> To answer your question about "Because when you decide a DESCRIBE is
>> not enough and you want a custom CONSTRUCT instead, you simply do that
>> with the triplestore and you're stuck in the RDBMS setup that you suggest."
>> In that scenario, export to a triplestore and do what you need. The
>> vast majority of operations (and libraries) will never need or want to
>> do this, though.
>> And I disagree about using a triplestore for everything, and I think it
>> will be a very long time (if ever) before you'll see it embraced by the enterprise.
>> Regardless, it's counter-productive to prescribe a specific technology
>> if the goal is to increase adoption. Especially if you don't know
>> what the needs and constraints are.
>> On Fri, Apr 10, 2015 at 8:58 AM Martynas Jusevičius
>> <[log in to unmask]> wrote:
>>> A question I've already asked here, but received no answer to: what's
>>> the ballpark number of triples we're talking about?
>>> Here's a list of large triplestore setups to compare with:
>>> On Fri, Apr 10, 2015 at 10:47 AM, Bernhard Eversberg
>>> <[log in to unmask]> wrote:
>>>> 10.04.2015 09:18, Ross Singer:
>>>>> But if you're just using DESCRIBEs, why bother with a triplestore?
>>>>> Why bother storing it natively as RDF at all?
>>>>> BIBFRAME only touches part of a library's data, and it doesn't
>>>>> make much sense to model the rest as RDF. ... Even more
>>>>> unnecessary if its sole purpose is to enable queries that are not
>>>>> even particularly useful.
>>>> It is unfortunate that not much can be said up to now about the use
>>>> cases to be expected. Though it was very likely deliberate by LC to
>>>> not specify anything about use scenarios when they commissioned the
>>>> development of BIBFRAME - so as not to anticipate anything that
>>>> might be too library specific or backward.
>>>> But one thing is clear: storage methods will have to scale well.
>>>> Very well indeed. Both in terms of data volume and access traffic volume.
>>>> What's the attitude of OCLC in that regard, and the vendors'? I
>>>> mean, they should have some views, based on their experience.
[log in to unmask] http://kcoyle.net