BIBFRAME Archives

BIBFRAME@LISTSERV.LOC.GOV


BIBFRAME  July 2014

Subject: Re: BibFrame and Linked Data: Identifiers
From: Thomas Berger <[log in to unmask]>
Reply-To: [log in to unmask]
Date: Wed, 16 Jul 2014 17:41:02 +0200


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joerg,

On 14.07.2014 22:34, [log in to unmask] wrote:
> For implementations regarding the processing of identifiers, there are
> several steps to take care of:
>
> - parsing library catalog data for identifiers for finding equivalences.
> Bibliographic identifiers could be found from before the World Wide Web
> epoch and do not obey known semantics, like ISBN, originating from early
> 1970s on, and identifiers which do obey strict semantics and are genuine
> "new" web resources (as denoted by URIs)

Just for the record:
ISBNs, especially since 2007, have very strict semantics, at least when
you ask the ISBN agency (their FAQ will gladly tell you).

Sadly, libraries in particular played their part in blurring those
semantics, but that is a different story.

Real world identifiers like ISBN play an important role and it would be
bold to deny that or to assume that this role has become obsolete because
there is now the new and shiny semantic web and its URIs solving any
problems the legacy identifiers are stuck with.

In a sense, real world identifiers are quite different from semantic
web identifiers. ISBNs for instance have an internal structure
reflecting the delegation of the identifier space to agencies and
publishers, and they have a check digit for processing. And contrary
to semantic web principles these properties are "leaked" on purpose
to the community. This helps with recognizing ISBNs in different contexts,
validating them and hunting down resources - very practical tasks
in the real world but of no relevance to the semantic web.
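
The check digit mentioned above can be sketched in a few lines of
Python. This is only an illustration of the publicly documented
ISBN-10 (modulus 11) and ISBN-13 (modulus 10, weights 1 and 3)
checks; the function names are made up for this example and it does
not look at the agency/publisher structure of the number at all.

```python
import re


def isbn10_check_char(first9: str) -> str:
    """Check character for the first nine digits of an ISBN-10."""
    # Weights run from 10 down to 2; the check value may be 10, written "X".
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first twelve digits of an ISBN-13."""
    # Weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn(text: str) -> bool:
    """True if text is a well-formed ISBN-10 or ISBN-13 with a correct check digit."""
    compact = re.sub(r"[\s-]", "", text).upper()  # drop hyphens and spaces
    if re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        return isbn10_check_char(compact[:9]) == compact[9]
    if re.fullmatch(r"[0-9]{13}", compact):
        return isbn13_check_digit(compact[:12]) == compact[12]
    return False


print(is_valid_isbn("0-306-40615-2"))
print(is_valid_isbn("978-0-306-40615-7"))
```

Nothing here needs a lookup in any dataset: the properties "leaked"
by the identifier are enough to reject most typos on the spot.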


> - the canonicalization of identifiers; this challenge is not new, it is the
> problem of building internal representations of identifiers in a computer,
> for example so they can be used as keys denoting the same value

Emphasis on *internal* representation. Obviously life gets much
easier when I settle on one form of ISBNs or make a decision
to use info:isbn or urn:isbn URIs for my processing purposes. But
since these different representation systems do exist, canonicalization
cannot be universal.
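
As an illustration of such a purely internal decision, here is a
minimal sketch that settles on the unhyphenated 13-digit form as the
key inside one application. The choice of that form, the function
name and the sample data are assumptions for the example; another
application could just as well settle on a different form.

```python
import re


def canonical_isbn13(text: str) -> str:
    """Internal key: 13 digits, no hyphens. Does not verify the check digit."""
    compact = re.sub(r"[\s-]", "", text).upper()
    if re.fullmatch(r"[0-9]{13}", compact):
        return compact
    if re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        # ISBN-10 -> ISBN-13: prefix 978, drop the old check character,
        # then recompute the check digit with weights 1, 3, 1, 3, ...
        body = "978" + compact[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
        return body + str((10 - total % 10) % 10)
    raise ValueError(f"not recognizable as an ISBN: {text!r}")


# Different external spellings collapse onto one internal key.
holdings = {}
for seen in ("0-306-40615-2", "978 0 306 40615 7", "9780306406157"):
    holdings.setdefault(canonical_isbn13(seen), []).append(seen)

print(holdings)
```

The key never has to leave the application, which is the point: it
is a processing convenience, not a claim about the one true form.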

This is connected with the advice formulated last week on this list,
namely not to make statements with "foreign" identifiers in
subject position.

The "semantic web" way of communications seems to be aligned according
to the following pattern:

1.
Hi, this is facebook(tm) speaking:
< http://www.facebook.com/X > is a user (account) with some
   associated properties as follows

2.
Hi, this is Yahoo speaking:
< http://www.yahoo.com/Y > is a mail user (account) with some
   associated properties

3.
Hi, this is me speaking:
I interpret < http://www.facebook.com/X > and < http://www.yahoo.com/Y >
as foaf:Persons and they should be considered the same.


Last week, this was also the recommended way to deal with "books",
i.e. bibframe expressions representing FRBR expression records
for typical library holdings: Even if the LC URI for a
resource is known, library X should not add custom statements with
that URI in subject position, but rather craft a URI under its
own command, express the statements with that URI and then make
a statement to express equivalence of that URI with the LC one.
If I recall correctly the main motivation for that "indirect" way
was concern about graph pollution in the absence of universally
employed mechanisms for keeping track of provenance.
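
A rough sketch of that "indirect" pattern, written out as N-Triples
lines from plain Python. All URIs below are invented placeholders,
and the use of owl:sameAs as the equivalence statement is my
assumption; the advice on the list did not prescribe a particular
property.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def local_statements(local_uri: str, external_uri: str, properties: dict) -> list:
    """Build N-Triples lines that only ever use local_uri as subject."""
    lines = []
    for predicate, literal in properties.items():
        # Minimal escaping for a plain string literal.
        escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{local_uri}> <{predicate}> "{escaped}" .')
    # The foreign URI appears in object position only.
    lines.append(f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> .")
    return lines


for line in local_statements(
    "http://library-x.example/resource/42",          # hypothetical local URI
    "http://lc.example/resource/abc",                # hypothetical LC URI
    {"http://library-x.example/vocab/shelfMark": "A 12"},  # hypothetical property
):
    print(line)
```

Every custom statement can then be traced to library X simply by
looking at the subject, even without any provenance mechanism.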

ISBN talk in contrast to this:

a.
Publisher:
I created a resource and assigned ISBN 123 to it.

b.
Distributor:
Please notice that the resource with ISBN 123 is now
available through me

c.
Library:
We acquired the resource with ISBN 123 for immediate use
in circulation

d.
Me:
Please obtain the resource with ISBN 123 for me

The real world identifier here serves the purpose of carrying
the individual identifiers (in the publisher's database, the
distributor's, the library's - I'm certain they do exist)
beyond their narrow context and does this without recourse
to connecting statements like 3. above: By using ISBNs in
communications you adhere to the general semantics issued by
the ISBN agency and implicitly accept the blanket statement
that all resources with that ISBN are considered equivalent
in their aspects corresponding to FRBR expressions.

The challenge for semantic web applications is now to model
this quite flexible behaviour of real world identifier systems
without twisting reality (like stating that ISBNs are URIs
or only URI representations of ISBNs are proper identifiers ...)


> Not every identifier must represent a resource. With help of the Semantic
> Web, identifiers can easily be dereferenced and can uniquely identify
> resources on the web. Such a resource may consist of all these
> parsing/canonicalizing/formatting. So it is possible to build algorithms to
> reach consensus about identity of resources on the web.

Not sure that I can follow you:
* In the semantic web there may be resources without URIs,
  but what does a URI (it has got an R inside, after all) identify
  if it does not represent a resource?
* Equally there are ISBNs reserved for a publisher but not yet
  assigned and therefore one can say that this ISBN does not
  (yet) exist
* On the other hand there are namespaces and number spaces thus
  future URIs and real world identifiers from some identifier
  scheme will fit into some pattern: A feature important for
  real world identifiers and meaningless for URIs (don't even try
  to infer from the URI that a resource implicitly belongs to
  a certain dataset)
* Algorithms stating the equivalence of differently formatted
  numbers can only apply to real world identifiers, or strings
  not acting as URIs.
* If one would like to translate between urn:isbn URIs and
  the corresponding info:isbn URIs one should set up a web
  service feigning a gigantic store of individual statements.
  Probably that's the way to go, maybe even in a centralized
  way: Whenever an identifier system is common enough but
  does not have a universally accepted way of expressing it
  exclusively in URIs, a "bibframe ecosystem" could provide
  the transformations necessary to perform comparisons between
  different identifiers.
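
The core of such a translation service could be as small as the
following sketch: it computes the answer on demand instead of
storing one statement per ISBN. I assume here that the two URI forms
look like "urn:isbn:<number>" and "info:isbn/<number>"; the function
names are invented, and a real service would also have to decide how
to treat ISBN-10 versus ISBN-13.

```python
import re

# Assumed shapes of the two URI forms (case-insensitive scheme part).
_URN = re.compile(r"urn:isbn:([0-9Xx-]+)", re.IGNORECASE)
_INFO = re.compile(r"info:isbn/([0-9Xx-]+)", re.IGNORECASE)


def isbn_from_uri(uri: str):
    """Return the bare ISBN string carried by the URI, or None."""
    for pattern in (_URN, _INFO):
        match = pattern.fullmatch(uri)
        if match:
            return match.group(1).replace("-", "").upper()
    return None


def to_urn(isbn: str) -> str:
    return "urn:isbn:" + isbn


def to_info(isbn: str) -> str:
    return "info:isbn/" + isbn


def same_isbn(uri_a: str, uri_b: str) -> bool:
    """Pretend to look up an equivalence statement; actually just compute it."""
    a, b = isbn_from_uri(uri_a), isbn_from_uri(uri_b)
    return a is not None and a == b


print(same_isbn("urn:isbn:0-306-40615-2", "info:isbn/0306406152"))
print(to_info(isbn_from_uri("URN:ISBN:0306406152")))
```

Note that this only works because the code peeks into the URIs,
which is exactly what one is not supposed to do with opaque URIs.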


> Not surprisingly, we can apply a consensus algorithm also to legacy
> identifiers, but with weaker semantics. For example, it is possible to
> implement recognizers for all known forms of ISBN, and write conversions
> from one form to another, without losing the context that all forms shall
> describe the same resource. This is not Semantic Web, just theoretical
> computer science.

Not sure how computer science enters the stage: Isn't dealing
with real world identifiers in a semantic web context a problem
of (applied) semiotics?

I really think that there is an abstract ISBN identifier space
(as a discrete, bounded or at least enumerable set) allowing
different string representations. Think of abstract numbers and
their representation in the decimal system or the binary system
with certain flavors (how to place dots and commas in different
cultures, or endianness issues in binary representations). These
representations may happily coexist as long as there is not one
string corresponding to two different abstract identifiers depending
on the representation scheme.
These abstract identifiers can also be used to parameterize the URI
space some agent uses to denote the appropriate resources.
Thus a mapping between abstract identifiers and URIs is possible.

I think the abstraction introduced by bf:Identifier goes exactly
in the right direction, however there are still major points to solve
before it becomes useful. Namely

* also URIs should be permitted as identifierValue, not only literals

* identifierAssigner and/or identifierScheme are important when it
  comes to deciding whether I want to or can compare two bf:Identifiers.
  Thus we need a bibframe registry for this or delegate this to
  the existing ISIL registry or authority files of the user's choice?

* identifierStatus presupposes that all business cases for an
  otherwise unknown number of identifier systems are known in
  advance and thoroughly understood by us and therefore we can
  create a bf:vocabulary for that element?
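
To make the second point concrete, here is a small sketch of a
comparison that first asks whether two identifiers are comparable at
all. The class merely borrows the bibframe property names as field
names; the rule "same scheme and same assigner" and the scheme labels
"isbn" and "local" are my assumptions for the example, not anything
bibframe defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    identifier_value: str                      # cf. identifierValue
    identifier_scheme: str                     # cf. identifierScheme
    identifier_assigner: Optional[str] = None  # cf. identifierAssigner


def compare(a: Identifier, b: Identifier) -> Optional[bool]:
    """True or False if the two are comparable, None if they are not."""
    if a.identifier_scheme != b.identifier_scheme:
        return None  # e.g. an ISBN against a local system number
    if a.identifier_assigner != b.identifier_assigner:
        return None  # same scheme label, but different number spaces
    return a.identifier_value == b.identifier_value


isbn_a = Identifier("9780306406157", "isbn")
isbn_b = Identifier("9780306406157", "isbn")
local = Identifier("9780306406157", "local", "library-x")

print(compare(isbn_a, isbn_b))
print(compare(isbn_a, local))
```

The three-valued result matters: equal strings from different
schemes are not "the same identifier", they are simply not
comparable without further knowledge from some registry.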

Certainly this path also has some dangers: It is way outside the scope
of bibframe to construct a comprehensive model for the library universe
with all its agents and services as a consistent dataset in the
semantic web just to deal with ISBNs the proper way...


> An interesting challenge for library catalogs would be if e.g. the
> publishing industry started to move their ISBN numbering system into the
> web and introduced URIs, also for existing ISBNs. Then we'd have two
> aspects of the same thing - the web resource of the ISBN and the legacy
> ISBN thing, the "string thing". In such cases, a skilled programmer must
> perform the "heavy lifting" so that a catalog still can ensure that
> equivalent ISBNs of whatever semantics are still identifying the same thing
> - since it is the user of the library catalog that is looking for the one
> and only result that may be denoted by many different forms of an
> identifier.

A more interesting question would be why the ISBN agency never introduced
official URIs for their numbers or endorsed the URN:ISBN scheme: Maybe
they too feared some kind of pollution: Imagine what would happen if
publishers started to print URIs on the book jackets, probably in
addition to the existing representations as strings and bar codes, and
the potential for confusion in the presence of misspellings, typos or
general cluelessness. Also I think that current business applications
would not gain from URI versions of ISBNs: They already /know/ what they
are dealing with.


> My perception is that Bibframe is not providing any consensus mechanisms
> except for "bibframe entities", which are only a small excerpt of the
> Semantic Web. An open story is when e.g. catalogs are used outside the
> library community scope, or merged into new data pools, and entities have
> to be matched, Bibframe and non-Bibframe ones. This is not a new topic and
> it is not specific to Semantic Web, but it is a strong advantage of the
> Semantic Web. Maybe there is high hope for improved library catalog data
> consensus by using Bibframe, but I am about to lose my optimism.

Bibframe will not be in a position to cut the bonds between libraryland
and real world phenomena. Thus even if it could impose exactly one kind of
string or - even better - URI representation for ISBNs and other identifiers
for all libraries in the world - what would be gained by that? Publishers
and patrons will continue to confront us with real world forms of real
world identifiers, and continuously transforming between different
representations of identifiers or denominations of resources will remain
one of the main tasks of our applications.

Best regards
Thomas Berger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iJwEAQECAAYFAlPGnQ4ACgkQYhMlmJ6W47O9YQQAsgm8/Ox/YlSrk354/GEgwUXm
1v1hMe6wHjgWnOo2WTGAYxIJMHDAQGuIuQQxmpJZzG9okcJAO642M+wvFCAwDXzA
eoscJ+ylPl7AdGLI1km3edYU2PHce6hAO0A8OaXQ7iuYAd/PQN+7N9YUA+CpGT4a
29nP5NyjyphnjFF/K+Q=
=DIlX
-----END PGP SIGNATURE-----
