Thomas,

You seem to have missed part of the discussion about identifiers. I point you to:

http://listserv.loc.gov/cgi-bin/wa?A2=ind1407&L=bibframe&T=0&P=8029

The thread begins here, but unfortunately the archive is not working correctly and some posts (esp. those from LC) do not display:

http://listserv.loc.gov/cgi-bin/wa?A2=ind1407&L=bibframe&T=0&X=39B4FE6748B566AFC0&Y=lists%40kcoyle.net&P=1926

And one of those from LC, 7/11/14, from Ray Denenberg, states:
"I believe it has been clearly demonstrated by this discussion that a URI should not be one of the “identifier schemes” for bf:Identifier."

And in the thread that begins on 7/10/14 with a post by Karen Smith-Yoshimura, I believe we demonstrated that using a third party's URI as the subject of a statement does NOT imply that the statement was made by that party. This is one of the fundamental "truths" of the semantic web: anyone can say anything about anything (AAA), and the URI does NOT indicate the provenance of the statement (triple).

kc

On 7/16/14, 8:41 AM, Thomas Berger wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joerg,

On 14.07.2014 22:34, [log in to unmask] wrote:
For implementations regarding the processing of identifiers, there are
several steps to take care of:

- parsing library catalog data for identifiers in order to find equivalences.
Some bibliographic identifiers predate the World Wide Web and do not obey
known semantics, like ISBN, which originated in the early 1970s; others do
obey strict semantics and are genuine "new" web resources (as denoted by
URIs)
Just for the record:
ISBNs, especially since 2007, have very strict semantics, at least if
you ask the ISBN agency (their FAQ will gladly tell you).

Sadly, libraries in particular played their part in blurring those
semantics, but that is a different story.

Real world identifiers like ISBN play an important role and it would be
bold to deny that or to assume that this role has become obsolete because
there is now the new and shiny semantic web and its URIs solving any
problems the legacy identifiers are stuck with.

In a sense, real world identifiers are quite different from semantic
web identifiers. ISBNs for instance have an internal structure
reflecting the delegation of the identifier space to agencies and
publishers, and they have a check digit for processing. And contrary
to semantic web principles these properties are "leaked" on purpose
to the community. This helps in recognizing ISBNs in different contexts,
validating them and hunting down resources - very practical tasks
in the real world but of no relevance to the semantic web.
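
As an illustration of the check digit mentioned above, here is a minimal
Python sketch of the usual ISBN-10 and ISBN-13 check rules. The function
names are my own, and the input is assumed to be a bare string without
hyphens or prefixes.

```python
def isbn10_is_valid(digits: str) -> bool:
    """Check a bare 10-character ISBN-10 (the last character may be 'X')."""
    if len(digits) != 10 or not digits.isascii() or not digits[:9].isdigit():
        return False
    last = digits[9]
    if not (last.isdigit() or last in "Xx"):
        return False
    values = [int(c) for c in digits[:9]] + [10 if last in "Xx" else int(last)]
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def isbn13_is_valid(digits: str) -> bool:
    """Check a bare 13-digit ISBN-13."""
    if len(digits) != 13 or not digits.isascii() or not digits.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(digits)) % 10 == 0
```

This only checks the arithmetic; it says nothing about whether the number
has actually been assigned by an agency or publisher.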


- the canonicalization of identifiers; this challenge is not new, it is the
problem of building internal representations of identifiers in a computer,
for example so they can be used as keys denoting the same value
Emphasis on *internal* representation. Obviously life gets much
easier when I settle on one form of ISBN or decide to use info:isbn
or urn:isbn URIs for my processing purposes. But since these
different representation systems do exist, canonicalization
cannot be universal.
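
A sketch of what such an *internal* canonical form might look like in
Python. The list of accepted prefixes (including the exact spelling
"info:isbn/") and the choice of a bare upper-case string as the key are
assumptions made for this example, not an agreed convention.

```python
import re

# Prefixes this sketch strips; the set is an assumption, not a standard list.
_PREFIX = re.compile(r"^(urn:isbn:|info:isbn/|isbn(-1[03])?:?)\s*", re.IGNORECASE)


def internal_isbn_key(raw: str) -> str:
    """Reduce one of several written forms of an ISBN to a bare string key."""
    text = _PREFIX.sub("", raw.strip())
    # Remove hyphens and spaces used as group separators.
    text = re.sub(r"[\s-]", "", text)
    # Upper-case so that a trailing 'x' check character compares equal to 'X'.
    return text.upper()


# All of these collapse to the same internal key.
forms = ["ISBN 0-306-40615-2", "urn:isbn:0306406152", "0 306 40615 2"]
keys = {internal_isbn_key(f) for f in forms}
```

The key is only meaningful inside one application; another system may
just as well settle on a different form.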

This is connected with the advice formulated last week on this list,
namely not to make statements with "foreign" identifiers in
subject position.

The "semantic web" way of communications seems to be aligned according
to the following pattern:

1.
Hi, this is facebook(tm) speaking:
< http://www.facebook.com/X > is a user (account) with some
   associated properties as follows

2.
Hi, this is Yahoo speaking:
< http://www.yahoo.com/Y > is a mail user (account) with some
   associated properties

3.
Hi, this is me speaking:
I interpret < http://www.facebook.com/X > and < http://www.yahoo.com/Y >
as foaf:Persons and they should be considered the same.


Last week, this was also the recommended way to deal with "books",
i.e. bibframe expressions representing FRBR expression records
for typical library holdings: Even if the LC URI for a
resource is known, library X should not add custom statements with
that URI in subject position, but rather mint a URI under its
own control, express the statements with that URI and then make
a statement expressing the equivalence of that URI with the LC one.
If I recall correctly, the main motivation for that "indirect" way
was concern about graph pollution in the absence of universally
employed mechanisms for keeping track of provenance.
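
The "indirect" pattern can be sketched with plain tuples as triples.
Everything concrete here is assumed for illustration: the URIs and the
shelf mark predicate are hypothetical, and owl:sameAs is just one
possible way to state the equivalence (the discussion did not fix a
predicate).

```python
# owl:sameAs is an assumed choice of equivalence predicate.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def describe_locally(local_uri, external_uri, statements):
    """Return triples whose subject is always the library's own URI."""
    triples = [(local_uri, predicate, obj) for predicate, obj in statements]
    # One extra statement links the local URI to the external one.
    triples.append((local_uri, OWL_SAME_AS, external_uri))
    return triples


# Hypothetical URIs and predicate, for illustration only.
triples = describe_locally(
    "http://library-x.example/resource/42",
    "http://lc.example/resource/abc",
    [("http://library-x.example/vocab/shelfMark", "QA 76.9")],
)
```

The external URI appears only in object position, so none of library X's
statements have it as their subject.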

ISBN talk, in contrast, goes like this:

a.
Publisher:
I created a resource and assigned ISBN 123 to it.

b.
Distributor:
Please notice that the resource with ISBN 123 is now
available through me

c.
Library:
We acquired the resource with ISBN 123 for immediate use
in circulation

d.
Me:
Please obtain the resource with ISBN 123 for me

The real world identifier here serves to transcend the individual
identifiers (in the publisher's database, the distributor's, the
library's - I'm certain they do exist) and their narrow contexts,
and it does so without recourse to connecting statements like 3.
above: By using ISBNs in communications you adhere to the general
semantics issued by the ISBN agency and implicitly accept the
blanket statement that all resources with that ISBN are considered
equivalent in their aspects corresponding to FRBR expressions.

The challenge for semantic web applications is now to model
this quite flexible behaviour of real world identifier systems
without twisting reality (like stating that ISBNs are URIs
or that only URI representations of ISBNs are proper identifiers ...)


Not every identifier must represent a resource. With help of the Semantic
Web, identifiers can easily be dereferenced and can uniquely identify
resources on the web. Such a resource may consist of all these
parsing/canonicalizing/formatting. So it is possible to build algorithms to
reach consensus about identity of resources on the web.
Not sure that I can follow you:
* In the semantic web there may be resources without URIs,
  but what do URIs (they have got an R inside, don't they?)
  identify if they do not represent a resource?
* Equally there are ISBNs reserved for a publisher but not yet
  assigned and therefore one can say that this ISBN does not
  (yet) exist
* On the other hand there are namespaces and number spaces thus
  future URIs and real world identifiers from some identifier
  scheme will fit into some pattern: A feature important for
  real world identifiers and meaningless for URIs (don't even try
  to infer from the URI that a resource implicitly belongs to
  a certain dataset)
* Algorithms stating the equivalence of differently formatted
  numbers can only apply to real world identifiers, or strings
  not acting as URIs.
* If one would like to translate between urn:isbn URIs and
  the corresponding info:isbn URIs, one should set up a web
  service feigning a gigantic store of individual statements.
  Probably that's the way to go, maybe even in a centralized
  way: Whenever an identifier system is common enough but
  does not have a universally accepted way of expressing it
  exclusively in URIs, a "bibframe ecosystem" could provide
  the transformations necessary to perform comparisons between
  different identifiers.


Not surprisingly, we can apply a consensus algorithm also to legacy
identifiers, but with weaker semantics. For example, it is possible to
implement recognizers for all known forms of ISBN, and write conversions
from one form to another, without losing the context that all forms shall
describe the same resource. This is not Semantic Web, just theoretical
computer science.
Not sure how computer science enters the stage: Isn't dealing
with real world identifiers in a semantic web context a problem
of (applied) semiotics?

I really think that there is an abstract ISBN identifier space
(as a discrete, bounded or at least enumerable set) allowing
different string representations (think of abstract numbers and
their representation in the decimal system or the binary system,
with certain flavors: how to provide dots and commas in different
cultures, or endianness issues in binary representations). These
representations may happily coexist as long as there is not one
string corresponding to two different abstract identifiers depending
on the representation scheme.
These abstract identifiers can also be used to parameterize the URI
space some agent uses to denote the appropriate resources.
Thus a mapping between abstract identifiers and URIs is possible.
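
A small Python sketch of this idea: the 13-digit form stands in for the
"abstract" identifier, the ISBN-10 string is treated as another
representation of it (via the usual 978 prefix conversion), and a URI is
derived from the abstract form. The function names and the base URI
"http://catalog.example/isbn/" are hypothetical.

```python
def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit for the first 12 digits."""
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def abstract_isbn(digits: str) -> str:
    """Map a bare ISBN-10 or ISBN-13 string to one 13-digit form."""
    if len(digits) == 13:
        return digits
    if len(digits) == 10:
        # Drop the old check character, prepend 978, recompute the check digit.
        body = "978" + digits[:9]
        return body + isbn13_check_digit(body)
    raise ValueError("expected 10 or 13 characters")


def uri_for(isbn13: str, base: str = "http://catalog.example/isbn/") -> str:
    """Parameterize one agent's (hypothetical) URI space by the identifier."""
    return base + isbn13


# Two string representations, one abstract identifier, one URI.
same = uri_for(abstract_isbn("0306406152")) == uri_for(abstract_isbn("9780306406157"))
```

Each agent can pick its own base URI; the mapping works as long as every
string form resolves to exactly one abstract identifier.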

I think the abstraction introduced by bf:Identifier goes exactly
in the right direction, however there are still major points to
solve before it becomes useful. Namely:

* also URIs should be permitted as identifierValue, not only literals

* identifierAssigner and/or identifierScheme are important when it
  comes to deciding whether I want to or can compare two bf:Identifiers.
  Thus we need a bibframe registry for this or delegate this to
  the existing ISIL registry or authority files of the user's choice?

* identifierStatus presupposes that all business cases for an
  otherwise unknown number of identifier systems are known in
  advance and thoroughly understood by us and therefore we can
  create a bf:vocabulary for that element?

Certainly this path also has some dangers: It is way outside the scope
of bibframe to construct a comprehensive model for the library universe
with all its agents and services as a consistent dataset in the
semantic web just to deal with ISBNs the proper way...


An interesting challenge for library catalogs would be if e.g. the
publishing industry started to move their ISBN numbering system into the
web and introduced URIs, also for existing ISBNs. Then we'd have two
aspects of the same thing - the web resource of the ISBN and the legacy
ISBN thing, the "string thing". In such cases, a skilled programmer must
perform the "heavy lifting" so that a catalog still can ensure that
equivalent ISBNs of whatever semantics are still identifying the same thing
- since it is the user of the library catalog that is looking for the one
and only result that may be denoted by many different forms of an
identifier.
A more interesting question would be why the ISBN agency never introduced
official URIs for their numbers or endorsed the URN:ISBN scheme: Maybe
they too feared some kind of pollution: Imagine what would happen if
publishers started to print URIs on the book jackets, probably in
addition to the existing representations as strings and bar codes, and
the potential for confusion in the presence of misspellings, typos or
general cluelessness. Also I think that current business applications
would not gain from URI versions of ISBNs: They already /know/ what they
are dealing with.


My perception is that Bibframe is not providing any consensus mechanisms
except for "bibframe entities", which are only a small excerpt of the
Semantic Web. An open story is when e.g. catalogs are used outside the
library community scope, or merged into new data pools, and entities have
to be matched, Bibframe and non-Bibframe ones. This is not a new topic and
it is not specific to Semantic Web, but it is a strong advantage of the
Semantic Web. Maybe there is high hope for improved library catalog data
consensus by using Bibframe, but I am about to lose my optimism.
Bibframe will not be in a position to cut the ties between libraryland and
real world phenomena. Thus even if it could impose exactly one kind of
string or - even better - URI representation for ISBNs and other identifiers
on all libraries in the world - what would be gained by that? Publishers
and patrons will continue to confront us with real world forms of real
world identifiers, and continuously transforming between different
representations of identifiers or denominations of resources will remain
one of the main tasks of our applications.

Best regards
Thomas Berger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iJwEAQECAAYFAlPGnQ4ACgkQYhMlmJ6W47O9YQQAsgm8/Ox/YlSrk354/GEgwUXm
1v1hMe6wHjgWnOo2WTGAYxIJMHDAQGuIuQQxmpJZzG9okcJAO642M+wvFCAwDXzA
eoscJ+ylPl7AdGLI1km3edYU2PHce6hAO0A8OaXQ7iuYAd/PQN+7N9YUA+CpGT4a
29nP5NyjyphnjFF/K+Q=
=DIlX
-----END PGP SIGNATURE-----

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet