On 14.07.2014 at 22:34, [log in to unmask] wrote:
> For implementations regarding the processing of identifiers, there are
> several steps to take care of:
> - parsing library catalog data for identifiers for finding equivalences.
> Bibliographic identifiers could be found from before the World Wide Web
> epoch and do not obey known semantics, like ISBN, originating from early
> 1970s on, and identifiers which do obey strict semantics and are genuine
> "new" web resources (as denoted by URIs)
Just for the record:
ISBNs, especially since 2007, have very strict semantics, at least if
you ask the ISBN agency (their FAQ will gladly tell you).
Sadly, libraries in particular played their part in blurring those
semantics, but that is a different story.
Real world identifiers like ISBN play an important role and it would be
bold to deny that or to assume that this role has become obsolete because
there is now the new and shiny semantic web and its URIs solving any
problems the legacy identifiers are stuck with.
In a sense, real world identifiers are quite different from semantic
web identifiers. ISBNs for instance have an internal structure
reflecting the delegation of the identifier space to agencies and
publishers, and they have a check digit for processing. And contrary
to semantic web principles these properties are "leaked" on purpose
to the community. This helps in recognizing ISBNs in different contexts,
validating them and hunting down resources - very practical tasks
in the real world but of no relevance to the semantic web.
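
To make the "check digit for processing" point concrete, here is a minimal sketch of recognizing and validating ISBN strings. The function names are my own invention for illustration, and the sketch deliberately ignores the agency/publisher ranges, which would need the agency's range data.

```python
import re

def clean(raw: str) -> str:
    """Drop hyphens and whitespace and upper-case a trailing 'x'."""
    return re.sub(r"[\s-]", "", raw).upper()

def isbn10_check_digit(first9: str) -> str:
    # Weights 10 down to 2 over the first nine digits; the check digit
    # makes the full weighted sum divisible by 11 ('X' stands for 10).
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def isbn13_check_digit(first12: str) -> str:
    # Alternating weights 1 and 3; the check digit makes the full
    # weighted sum divisible by 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def is_valid_isbn(raw: str) -> bool:
    """True if raw looks like an ISBN-10 or ISBN-13 with a correct check digit."""
    s = clean(raw)
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return isbn10_check_digit(s[:9]) == s[9]
    if re.fullmatch(r"97[89][0-9]{10}", s):
        return isbn13_check_digit(s[:12]) == s[12]
    return False
```

Nothing comparable can be done with an opaque URI: there is no structure to recognize and nothing to validate short of dereferencing it.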
> - the canonicalization of identifiers; this challenge is not new, it is the
> problem of building internal representations of identifiers in a computer,
> for example so they can be used as keys denoting the same value
Emphasis on *internal* representation. Obviously life gets much
easier when I settle on one form of ISBN or make a decision
to use info:isbn or urn:isbn URIs for my processing purposes. But
since these different representation systems do exist, canonicalization
cannot be universal.
This is connected with the advice formulated last week on this list,
namely not to make statements with "foreign" identifiers in
The "semantic web" way of communications seems to be aligned according
to the following pattern:
1. Hi, this is facebook(tm) speaking:
   < http://www.facebook.com/X > is a user (account) with some
   associated properties as follows
2. Hi, this is Yahoo speaking:
   < http://www.yahoo.com/Y > is a mail user (account) with some
3. Hi, this is me speaking:
   I interpret < http://www.facebook.com/X > and < http://www.yahoo.com/Y >
   as foaf:Persons and they should be considered the same.
Last week, this was also the recommended way to deal with "books",
i.e. bibframe expressions representing FRBR expression records
for typical library holdings: Even if the LC URI for a
resource is known, library X should not add custom statements with
that URI in subject position, but rather craft a URI under its
own control, express the statements with that URI and then make
a statement to express equivalence of that URI with the LC one.
If I recall correctly, the main motivation for that "indirect" way
was concern about graph pollution in the absence of universally
employed mechanisms for keeping track of provenance.
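
As a rough illustration of that "indirect" pattern, the following sketch emits N-Triples lines: the local statements use only the library's own URI as subject, and one additional statement links it to the foreign URI. Using owl:sameAs for the equivalence is my assumption (the recommendation did not fix a property), and the function name and all example.org URIs are made up.

```python
# Assumption: owl:sameAs is used to express the equivalence.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def local_statements(local_uri, foreign_uri, statements):
    """Return N-Triples lines with local_uri as the only subject.

    statements is a list of (predicate_uri, object_term) pairs, where
    object_term is an already serialized N-Triples term such as
    '"a literal"' or '<http://example.org/thing>'.
    """
    lines = [f"<{local_uri}> <{pred}> {obj} ." for pred, obj in statements]
    # The single connecting statement; the foreign URI never appears
    # in subject position.
    lines.append(f"<{local_uri}> <{OWL_SAME_AS}> <{foreign_uri}> .")
    return lines

if __name__ == "__main__":
    for line in local_statements(
        "http://example.org/library-x/resource/1",   # hypothetical local URI
        "http://example.org/lc/resource/42",         # hypothetical stand-in for the LC URI
        [("http://purl.org/dc/terms/title", '"Some title"')],
    ):
        print(line)
```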
ISBN talk in contrast to this:
* I created a resource and assigned ISBN 123 to it.
* Please notice that the resource with ISBN 123 is now
available through me
* We acquired the resource with ISBN 123 for immediate use
* Please obtain the resource with ISBN 123 for me
The real world identifier here serves the purpose of lifting
the individual identifiers (in the publisher's database, the
distributor's, the library's - I'm certain they do exist)
beyond their narrow context, and it does this without recourse
to connecting statements like 3. above: By using ISBNs in
communications you adhere to the general semantics issued by
the ISBN agency and implicitly accept the blanket statement
that all resources with that ISBN are considered equivalent
in their aspects corresponding to FRBR expressions.
The challenge for semantic web applications is now to model
this quite flexible behaviour of real world identifier systems
without twisting reality (like stating that ISBNs are URIs
or that only URI representations of ISBNs are proper identifiers ...)
> Not every identifier must represent a resource. With help of the Semantic
> Web, identifiers can easily be dereferenced and can uniquely identify
> resources on the web. Such a resource may consist of all these
> parsing/canonicalizing/formatting. So it is possible to build algorithms to
> reach consensus about identity of resources on the web.
Not sure that I can follow you:
* In the semantic web there may be resources without URIs,
but what do URIs (they have got an R inside?) identify if
they do not represent a resource?
* Equally there are ISBNs reserved for a publisher but not yet
assigned and therefore one can say that this ISBN does not
* On the other hand there are namespaces and number spaces, thus
future URIs and real world identifiers from some identifier
scheme will fit into some pattern: A feature important for
real world identifiers and meaningless for URIs (don't even try
to infer from the URI that a resource implicitly belongs to
a certain dataset)
* Algorithms stating the equivalence of differently formatted
numbers can only apply to real world identifiers, or strings
not acting as URIs.
* If one would like to translate between urn:isbn URIs and
the corresponding info:isbn URIs one should set up a web
service feigning a gigantic store of individual statements
Probably that's the way to go, maybe even in a centralized
way: Whenever an identifier system is common enough but
does not have a universally accepted way of expressing it
exclusively in URIs a "bibframe ecosystem" could provide
the transformations necessary to perform comparisons between
> Not surprisingly, we can apply a consensus algorithm also to legacy
> identifiers, but with weaker semantics. For example, it is possible to
> implement recognizers for all known forms of ISBN, and write conversions
> from one form to another, without losing the context that all forms shall
> describe the same resource. This is not Semantic Web, just theoretical
> computer science.
Not sure how computer science enters the stage: Isn't dealing
with real world identifiers in a semantic web context a problem
of (applied) semiotics?
I really think that there is an abstract ISBN identifier space
(as a discrete, bounded or at least enumerable set) allowing
different string representations (think of abstract numbers and
their representation in the decimal system or the binary system
with certain flavors: how to provide dots and commas in different
cultures, or endianness issues in binary representations). These
representations may happily coexist as long as there is not one
string corresponding to two different abstract identifiers depending
on the representation scheme.
These abstract identifiers can also be used to parameterize the URI
space some agent uses to denote the appropriate resources.
Thus a mapping between abstract identifiers and URIs is possible.
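
A minimal sketch of that idea, under my own assumptions: the bare 13-digit string serves as the internal stand-in for the abstract identifier, several string forms are mapped onto it, and URIs are generated from it. The function names are hypothetical, and check digits of the input are not verified here.

```python
import re

def _isbn13_check_digit(first12: str) -> str:
    # Alternating weights 1 and 3, check digit completes the sum to a
    # multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def canonical_isbn(raw: str) -> str:
    """Map a string form of an ISBN to a bare 13-digit key.

    Accepts hyphenated or spaced ISBN-10/ISBN-13, an optional 'ISBN'
    label, and urn:isbn: / info:isbn/ URIs. Input check digits are
    NOT validated in this sketch.
    """
    s = raw.strip()
    s = re.sub(r"^(urn:isbn:|info:isbn/|isbn[:\s]*)", "", s, flags=re.IGNORECASE)
    s = re.sub(r"[\s-]", "", s).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        # ISBN-10 maps into the 978 range; the check digit is recomputed.
        body = "978" + s[:9]
        return body + _isbn13_check_digit(body)
    if re.fullmatch(r"97[89][0-9]{10}", s):
        return s
    raise ValueError(f"not recognized as an ISBN: {raw!r}")

def as_urn(key: str) -> str:
    return "urn:isbn:" + key

def as_info(key: str) -> str:
    return "info:isbn/" + key

def same_isbn(a: str, b: str) -> bool:
    """Two strings denote the same abstract ISBN if their keys agree."""
    return canonical_isbn(a) == canonical_isbn(b)
```

The choice of the 13-digit form as key is purely internal; any other representation that is injective on the abstract identifier space would do just as well.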
I think the abstraction introduced by bf:Identifier goes exactly
in the right direction; however, there are still major points to solve
before it becomes useful. Namely:
* URIs should also be permitted as identifierValue, not only literals
* identifierAssigner and/or identifierScheme are important when it
comes to deciding whether I want to or can compare two bf:Identifiers.
Thus we need a bibframe registry for this, or do we delegate this to
the existing ISIL registry or authority files of the user's choice?
* identifierStatus presupposes that all business cases for an
otherwise unknown number of identifier systems are known in
advance and thoroughly understood by us, and that we can therefore
create a bf:vocabulary for that element?
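
To illustrate the second point, here is a small sketch of a comparison that first asks whether two identifiers are comparable at all. The class and the rule are my own assumptions, not anything bibframe defines: identifiers are comparable only if both name the same scheme and, where both name an assigner, the same assigner. The scheme and assigner strings in the usage below are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Identifier:
    """Loosely modelled on bf:Identifier; field names are illustrative."""
    value: str
    scheme: Optional[str] = None
    assigner: Optional[str] = None

def comparable(a: Identifier, b: Identifier) -> bool:
    # Assumed rule: without a known, shared scheme the values say nothing
    # about each other; differing known assigners also block comparison.
    if a.scheme is None or b.scheme is None or a.scheme != b.scheme:
        return False
    if a.assigner is not None and b.assigner is not None:
        return a.assigner == b.assigner
    return True

def same_identifier(a: Identifier, b: Identifier) -> Optional[bool]:
    """True/False if comparable, None if the question cannot be decided."""
    if not comparable(a, b):
        return None
    return a.value == b.value
```

For example, same_identifier(Identifier("123", "local", "DE-1"), Identifier("123", "local", "DE-2")) yields None rather than True: equal strings from different assigners are simply not evidence of anything.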
Certainly this path also has some dangers: It is way outside the scope
of bibframe to construct a comprehensive model for the library universe
with all its agents and services as a consistent dataset in the
semantic web just to deal with ISBNs the proper way...
> An interesting challenge for library catalogs would be if e.g. the
> publishing industry started to move their ISBN numbering system into the
> web and introduced URIs, also for existing ISBNs. Then we'd have two
> aspects of the same thing - the web resource of the ISBN and the legacy
> ISBN thing, the "string thing". In such cases, a skilled programmer must
> perform the "heavy lifting" so that a catalog still can ensure that
> equivalent ISBNs of whatever semantics are still identifying the same thing
> - since it is the user of the library catalog that is looking for the one
> and only result that may be denoted by many different forms of an
A more interesting question would be why the ISBN agency never introduced
official URIs for their numbers or endorsed the URN:ISBN scheme: Maybe
they too feared some kind of pollution: Imagine what would happen if
publishers started to print URIs on the book jackets, probably in
addition to the existing representations as strings and bar codes, and
the potential for confusion in the presence of misspellings, typos or
general cluelessness. Also I think that current business applications
would not gain from URI versions of ISBNs: They already /know/ what they
are dealing with.
> My perception is that Bibframe is not providing any consensus mechanisms
> except for "bibframe entities", which are only a small excerpt of the
> Semantic Web. An open story is when e.g. catalogs are used outside the
> library community scope, or merged into new data pools, and entities have
> to be matched, Bibframe and non-Bibframe ones. This is not a new topic and
> it is not specific to Semantic Web, but it is a strong advantage of the
> Semantic Web. Maybe there is high hope for improved library catalog data
> consensus by using Bibframe, but I am about to lose my optimism.
Bibframe will not be in a position to cut the bonds between libraryland
and real world phenomena. Thus even if it could impose exactly one kind of
string or - even better - URI representation for ISBNs and other identifiers
for all libraries in the world - what would be gained by that? Publishers
and patrons will continue to confront us with real world forms of real
world identifiers, and continuously transforming between different
representations of identifiers or denominations of resources will remain
one of the main tasks of our applications.