BIBFRAME Archives

BIBFRAME@LISTSERV.LOC.GOV


BIBFRAME July 2014

Subject:

Re: BibFrame and Linked Data: Identifiers

From:

LAURA DAWSON

Reply-To:

Bibliographic Framework Transition Initiative Forum

Date:

Wed, 16 Jul 2014 11:49:58 -0400


Thomas, this is actually the most cogent description of real world
identifiers vs URIs that I've ever read. This will help me explain some
things internally, and I really appreciate it.

The waters muddy further when 3rd parties assign ISBNs on behalf of
publishers who refuse to use them. This is a new development since 2007,
and it doesn't happen very often, but it does happen.

The new ISNI (for authors, illustrators, publishers, & other creators)
removes a lot of the internal semantics from the number string (except
there is a check digit, which occasionally renders as an X). There's no
"publisher prefix", for example - because ISNIs get assigned by the
central assignment agency at the request of organizations that need to use
them. So they really are dumb numbers, which means that they are ideal for
identification. We've rendered ISNIs as URIs with the following
standardized protocol: http://isni.org/isni/1234567891234567, but of
course each organization will use different URIs incorporating the ISNI
number for their own purposes.
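
To make the check digit and the URI rendering concrete, here is a minimal sketch. It assumes the check character is computed with ISO 7064 MOD 11-2 (which is why it can come out as an X); the function names are made up for illustration, and only the http://isni.org/isni/ prefix comes from the paragraph above.

```python
ISNI_URI_PREFIX = "http://isni.org/isni/"  # URI pattern quoted in the message


def isni_check_character(first15: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def isni_to_uri(isni: str) -> str:
    """Strip spaces and hyphens, verify the check character, build the URI."""
    compact = isni.replace(" ", "").replace("-", "").upper()
    head = compact[:15]
    if len(compact) != 16 or not (head.isascii() and head.isdigit()):
        raise ValueError(f"not a 16-character ISNI: {isni!r}")
    if isni_check_character(head) != compact[15]:
        raise ValueError(f"check character mismatch: {isni!r}")
    return ISNI_URI_PREFIX + compact


print(isni_to_uri("1234 5678 9123 4567"))
```

Apart from the check character, nothing in the string is interpreted, which matches the "dumb number" point: the code can validate an ISNI but cannot learn anything about the party it identifies.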



On 7/16/14, 11:41 AM, "Thomas Berger" wrote:

>Joerg,
>
>On 14.07.2014 at 22:34, [log in to unmask] wrote:
>> For implementations regarding the processing of identifiers, there are
>> several steps to take care of:
>>
>> - parsing library catalog data for identifiers for finding equivalences.
>> Bibliographic identifiers could be found from before the World Wide Web
>> epoch and do not obey known semantics, like ISBN, originating from early
>> 1970s on, and identifiers which do obey strict semantics and are genuine
>> "new" web resources (as denoted by URIs)
>
>Just for the record:
>ISBNs, especially since 2007, have very strict semantics, at least if
>you ask the ISBN agency (their FAQ will gladly tell you).
>
>Sadly, libraries in particular played their part in blurring those
>semantics, but that is a different story.
>
>Real world identifiers like ISBN play an important role and it would be
>bold to deny that or to assume that this role has become obsolete because
>there is now the new and shiny semantic web and its URIs solving any
>problems the legacy identifiers are stuck with.
>
>In a sense, real world identifiers are quite different from semantic
>web identifiers. ISBNs, for instance, have an internal structure
>reflecting the delegation of the identifier space to agencies and
>publishers, and they have a check digit for processing. And contrary
>to semantic web principles, these properties are "leaked" on purpose
>to the community. This helps with recognizing ISBNs in different
>contexts, validating them and hunting down resources - very practical
>tasks in the real world but of no relevance to the semantic web.
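
As an illustration of the "check digit for processing" mentioned above, here is a minimal sketch of ISBN validation. It uses the standard weighting rules (modulo 11 for ISBN-10, modulo 10 with alternating weights 1 and 3 for ISBN-13); the function names are hypothetical, and the input is assumed to be already stripped of hyphens and spaces.

```python
import re


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, sum divisible by 11; 'X' stands for 10."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    values = [10 if ch == "X" else int(ch) for ch in isbn]
    total = sum((10 - i) * v for i, v in enumerate(values))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
    return total % 10 == 0


print(is_valid_isbn10("0306406152"), is_valid_isbn13("9780306406157"))
```

This only checks the arithmetic; it says nothing about whether the number has actually been assigned by an agency or publisher.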
>
>
>> - the canonicalization of identifiers; this challenge is not new, it is
>> the problem of building internal representations of identifiers in a
>> computer, for example so they can be used as keys denoting the same
>> value
>
>Emphasis on *internal* representation. Obviously life gets much
>easier when I settle on one form of ISBNs or make a decision
>to use info:isbn or urn:isbn URIs for my processing purposes. But
>since these different representation systems do exist, canonicalization
>cannot be universal.
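
A minimal sketch of such an internal canonicalization follows. The choice of the bare 13-digit form as the key is an assumption of this sketch, not something the thread prescribes, and the function names are hypothetical. The sketch accepts hyphenated strings, "ISBN"-prefixed strings, and urn:isbn or info:isbn URIs, and it deliberately does not verify incoming check digits.

```python
import re

# Prefixes this sketch strips before looking at the digits.
_PREFIX = re.compile(r"^\s*(urn:isbn:|info:isbn/|isbn(-1[03])?:?)", re.IGNORECASE)


def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit for the first 12 digits."""
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(first12))
    return str((10 - total % 10) % 10)


def canonical_isbn13(raw: str) -> str:
    """Map several common spellings of an ISBN to one 13-digit key."""
    compact = re.sub(r"[\s-]", "", _PREFIX.sub("", raw)).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        core = "978" + compact[:9]  # ISBN-10 maps into the 978 range
        return core + isbn13_check_digit(core)
    if re.fullmatch(r"[0-9]{13}", compact):
        return compact
    raise ValueError(f"unrecognized ISBN form: {raw!r}")


forms = ["ISBN 0-306-40615-2", "urn:isbn:0306406152", "info:isbn/9780306406157"]
print({canonical_isbn13(f) for f in forms})
```

The key is only meaningful inside the application that chose it; another system may just as well settle on urn:isbn URIs, which is the point about canonicalization not being universal.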
>
>This is connected with the advice formulated last week on this list,
>namely not to make statements with "foreign" identifiers in
>subject position.
>
>The "semantic web" way of communicating seems to follow
>this pattern:
>
>1.
>Hi, this is facebook(tm) speaking:
>< http://www.facebook.com/X > is a user (account) with some
> associated properties as follows
>
>2.
>Hi, this is Yahoo speaking:
>< http://www.yahoo.com/Y > is a mail user (account) with some
> associated properties
>
>3.
>Hi, this is me speaking:
>I interpret < http://www.facebook.com/X > and < http://www.yahoo.com/Y >
>as foaf:Persons and they should be considered the same.
>
>
>Last week, this was also the recommended way to deal with "books",
>i.e. bibframe expressions representing FRBR expression records
>for typical library holdings: Even if the LC URI for a
>resource is known, library X should not add custom statements with
>that URI in subject position, but rather craft a URI under its
>own command, express the statements with that URI and then make
>a statement to express equivalence of that URI with the LC one.
>If I recall correctly, the main motivation for that "indirect" way
>was concern about graph pollution in the absence of universally
>employed mechanisms for keeping track of provenance.
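
The indirect pattern described above can be sketched with plain tuples standing in for RDF triples. Everything here is illustrative: both URIs are placeholders under example.org, the property names are invented, and the use of owl:sameAs as the equivalence statement is an assumption of this sketch (the thread only says "a statement to express equivalence").

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def describe_locally(local_uri, foreign_uri, properties):
    """Attach statements to a locally minted URI and link it to the foreign one.

    The foreign URI only ever appears in object position.
    """
    triples = [(local_uri, prop, value) for prop, value in properties.items()]
    triples.append((local_uri, OWL_SAME_AS, foreign_uri))
    return triples


triples = describe_locally(
    "http://library-x.example.org/resource/42",   # hypothetical local URI
    "http://lc.example.org/resource/abc",         # placeholder for the LC URI
    {"http://library-x.example.org/vocab/shelfMark": "QA 76.9"},
)
for triple in triples:
    print(triple)
```

Because every subject is under the library's own namespace, a consumer can tell which statements came from library X even without a separate provenance mechanism.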
>
>ISBN talk, in contrast, goes like this:
>
>a.
>Publisher:
>I created a resource and assigned ISBN 123 to it.
>
>b.
>Distributor:
>Please notice that the resource with ISBN 123 is now
>available through me
>
>c.
>Library:
>We acquired the resource with ISBN 123 for immediate use
>in circulation
>
>d.
>Me:
>Please obtain the resource with ISBN 123 for me
>
>The real world identifier here serves the purpose of lifting
>the individual identifiers (in the publisher's database, the
>distributor's, the library's - I'm certain they do exist)
>beyond their narrow context, and it does this without recourse
>to connecting statements like 3. above: By using ISBNs in
>communications you adhere to the general semantics issued by
>the ISBN agency and implicitly accept the blanket statement
>that all resources with that ISBN are considered equivalent
>in their aspects corresponding to FRBR expressions.
>
>The challenge for semantic web applications now is to model
>this quite flexible behaviour of real world identifier systems
>without twisting reality (like stating that ISBNs are URIs
>or that only URI representations of ISBNs are proper identifiers ...)
>
>
>> Not every identifier must represent a resource. With help of the
>> Semantic Web, identifiers can easily be dereferenced and can uniquely
>> identify resources on the web. Such a resource may consist of all these
>> parsing/canonicalizing/formatting. So it is possible to build
>> algorithms to reach consensus about identity of resources on the web.
>
>Not sure that I can follow you:
>* In the semantic web there may be resources without URIs,
> but URIs (they have got an R inside?) which do not represent
> a resource identify what?
>* Equally there are ISBNs reserved for a publisher but not yet
> assigned, and therefore one can say that such an ISBN does not
> (yet) exist
>* On the other hand there are namespaces and number spaces, thus
> future URIs and real world identifiers from some identifier
> scheme will fit into some pattern: A feature important for
> real world identifiers and meaningless for URIs (don't even try
> to infer from the URI that a resource implicitly belongs to
> a certain dataset)
>* Algorithms stating the equivalence of differently formatted
> numbers can only apply to real world identifiers, or strings
> not acting as URIs.
>* If one would like to translate between urn:isbn URIs and
> the corresponding info:isbn URIs, one should set up a web
> service feigning a gigantic store of individual statements.
> Probably that's the way to go, maybe even in a centralized
> way: Whenever an identifier system is common enough but
> does not have a universally accepted way of expressing it
> exclusively in URIs, a "bibframe ecosystem" could provide
> the transformations necessary to perform comparisons between
> different identifiers.
>
>
>> Not surprisingly, we can apply a consensus algorithm also to legacy
>> identifiers, but with weaker semantics. For example, it is possible to
>> implement recognizers for all known forms of ISBN, and write conversions
>> from one form to another, without losing the context that all forms
>> shall describe the same resource. This is not Semantic Web, just
>> theoretical computer science.
>
>Not sure how computer science enters the stage: Isn't dealing
>with real world identifiers in a semantic web context a problem
>of (applied) semiotics?
>
>I really think that there is an abstract ISBN identifier space
>(as a discrete, bounded or at least enumerable set) allowing
>different string representations (think of abstract numbers and
>their representation in the decimal system or the binary system
>with certain flavors: how to provide dots and commas in different
>cultures, or endianness issues in binary representations). These
>representations may happily coexist as long as there is not one
>string corresponding to two different abstract identifiers depending
>on the representation scheme.
>These abstract identifiers can also be used to parameterize the URI
>space some agent uses to denote the appropriate resources.
>Thus a mapping between abstract identifiers and URIs is possible.
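
One way to picture the abstract identifier with several coexisting representations is a small value type, sketched below. The class name, the choice of 13 digits as the internal value, and the agent URI template are all assumptions made for illustration; the urn:isbn: and info:isbn/ prefixes follow the two URI schemes named earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractIsbn:
    """An ISBN as an abstract value, independent of how it is written down."""

    digits: str  # 13 digits, the internal form chosen for this sketch

    @classmethod
    def from_uri(cls, uri: str) -> "AbstractIsbn":
        """Parse a urn:isbn or info:isbn URI carrying a 13-digit ISBN."""
        for prefix in ("urn:isbn:", "info:isbn/"):
            if uri.lower().startswith(prefix):
                return cls(uri[len(prefix):].replace("-", ""))
        raise ValueError(f"not an ISBN URI: {uri!r}")

    def as_urn(self) -> str:
        return f"urn:isbn:{self.digits}"

    def as_info_uri(self) -> str:
        return f"info:isbn/{self.digits}"

    def as_agent_uri(self, template: str) -> str:
        """Parameterize some agent's own URI space with the abstract value."""
        return template.format(isbn=self.digits)


a = AbstractIsbn.from_uri("urn:isbn:978-0-306-40615-7")
b = AbstractIsbn.from_uri("info:isbn/9780306406157")
print(a == b, a.as_agent_uri("http://library-x.example.org/isbn/{isbn}"))
```

Equality is decided on the abstract value, so the two URI spellings compare equal as identifiers even though they are different strings and, as URIs, different names.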
>
>I think the abstraction introduced by bf:Identifier goes exactly
>in the right direction; however, there are still major points to solve
>before it becomes useful. Namely:
>
>* URIs should also be permitted as identifierValue, not only literals
>
>* identifierAssigner and/or identifierScheme are important when it
> comes to deciding whether I want to or can compare two bf:Identifiers.
> Thus we need a bibframe registry for this, or delegate this to
> the existing ISIL registry or authority files of the user's choice?
>
>* identifierStatus presupposes that all business cases for an
> otherwise unknown number of identifier systems are known in
> advance and thoroughly understood by us, and therefore we can
> create a bf:vocabulary for that element?
>
>Certainly this path also has some dangers: It is way outside the scope
>of bibframe to construct a comprehensive model for the library universe
>with all its agents and services as a consistent dataset in the
>semantic web just to deal with ISBNs the proper way...
>
>
>> An interesting challenge for library catalogs would be if e.g. the
>> publishing industry started to move their ISBN numbering system into the
>> web and introduced URIs, also for existing ISBNs. Then we'd have two
>> aspects of the same thing - the web resource of the ISBN and the legacy
>> ISBN thing, the "string thing". In such cases, a skilled programmer must
>> perform the "heavy lifting" so that a catalog still can ensure that
>> equivalent ISBNs of whatever semantics are still identifying the same
>> thing - since it is the user of the library catalog that is looking for
>> the one and only result that may be denoted by many different forms of
>> an identifier.
>
>A more interesting question would be why the ISBN agency never introduced
>official URIs for their numbers or endorsed the URN:ISBN scheme: Maybe
>they too feared some kind of pollution: Imagine what would happen if
>publishers started to print URIs on the book jackets, probably in
>addition to the existing representations as strings and bar codes. And
>the potential for confusion in the presence of misspellings, typos or
>general cluelessness. Also I think that current business applications
>would not gain from URI versions of ISBNs: They already /know/ what they
>are dealing with.
>
>
>> My perception is that Bibframe is not providing any consensus mechanisms
>> except for "bibframe entities", which are only a small excerpt of the
>> Semantic Web. An open story is when e.g. catalogs are used outside the
>> library community scope, or merged into new data pools, and entities
>> have to be matched, Bibframe and non-Bibframe ones. This is not a new
>> topic and it is not specific to Semantic Web, but it is a strong
>> advantage of the Semantic Web. Maybe there is high hope for improved
>> library catalog data consensus by using Bibframe, but I am about to lose
>> my optimism.
>
>Bibframe will not be in a position to cut the bonds between libraryland
>and real world phenomena. Thus even if it could impose exactly one kind of
>string or - even better - URI representation for ISBNs and other
>identifiers for all libraries in the world - what would be gained by that?
>Publishers and patrons will continue to confront us with real world forms
>of real world identifiers, and continuously transforming between different
>representations of identifiers or denominations of resources will remain
>one of the main tasks of our applications.
>
>Best regards
>Thomas Berger
