Thomas, this is actually the most cogent description of real world
identifiers vs URIs that I've ever read. This will help me explain some
things internally, and I really appreciate it.
The waters muddy further when 3rd parties assign ISBNs on behalf of
publishers who refuse to use them. This is a new development since 2007,
and it doesn't happen very often, but it does happen.
The new ISNI (for authors, illustrators, publishers, and other creators)
removes a lot of the internal semantics from the number string (except
that there is a check digit, which occasionally renders as an X). There's no
"publisher prefix", for example, because ISNIs get assigned by the
central assignment agency at the request of organizations that need to use
them. So they really are dumb numbers, which makes them ideal for
identification. We've rendered ISNIs as URIs with the following
standardized pattern: http://isni.org/isni/1234567891234567, but of
course each organization will use different URIs incorporating the ISNI
number for their own purposes.
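To make the "dumb number plus check digit" point concrete, here is a minimal Python sketch of validating an ISNI and building the URI pattern above. It assumes that the ISNI check character is computed with ISO 7064 MOD 11-2 (a remainder of 10 is written as X) and that the URI takes the compact 16-character form without spaces; the function names are illustrative only.

```python
def isni_check_char(base15: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # 10 cannot be written as a single digit, so it renders as "X".
    return "X" if result == 10 else str(result)


def compact_isni(value: str) -> str:
    """Drop the spaces/hyphens used for display and upper-case a trailing x."""
    return "".join(ch for ch in value if ch not in " -").upper()


def is_valid_isni(value: str) -> bool:
    compact = compact_isni(value)
    return (
        len(compact) == 16
        and compact[:15].isdigit()
        and isni_check_char(compact[:15]) == compact[15]
    )


def isni_uri(value: str) -> str:
    """Build a URI following the http://isni.org/isni/<16 characters> pattern."""
    compact = compact_isni(value)
    if not is_valid_isni(compact):
        raise ValueError(f"not a valid ISNI: {value!r}")
    return f"http://isni.org/isni/{compact}"


print(isni_uri("1234 5678 9123 4567"))
# prints http://isni.org/isni/1234567891234567
```

Note that nothing in the sketch inspects the digits for meaning: apart from the check character, the number is opaque, which is exactly what makes it a good identifier.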
On 7/16/14, 11:41 AM, "Thomas Berger" <[log in to unmask]> wrote:
>Joerg,
>
>On 14.07.2014 22:34, [log in to unmask] wrote:
>> For implementations regarding the processing of identifiers, there are
>> several steps to take care of:
>>
>> - parsing library catalog data for identifiers in order to find
>> equivalences. The bibliographic identifiers found there include those
>> that predate the World Wide Web and do not obey known semantics, like
>> the ISBN, in use since the early 1970s, as well as identifiers that do
>> obey strict semantics and are genuinely "new" web resources (as denoted
>> by URIs)
>
>Just for the record:
>ISBNs, especially since 2007, have very strict semantics, at least when
>you ask the ISBN agency (their FAQ will gladly tell you).
>
>Sadly, libraries in particular have played their part in blurring those
>semantics, but that is a different story.
>
>Real world identifiers like ISBN play an important role, and it would be
>bold to deny that or to assume that this role has become obsolete because
>there is now the new and shiny semantic web, whose URIs solve all the
>problems the legacy identifiers are stuck with.
>
>In a sense, real world identifiers are quite different from semantic
>web identifiers. ISBNs, for instance, have an internal structure
>reflecting the delegation of the identifier space to agencies and
>publishers, and they have a check digit for processing. And contrary
>to semantic web principles, these properties are "leaked" to the
>community on purpose. This helps in recognizing ISBNs in different
>contexts, validating them and hunting down resources: very practical
>tasks in the real world, but of no relevance to the semantic web.
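The check digit is the part of the "leaked" structure that anyone can use without consulting the agency. The following Python sketch shows the validation, assuming the standard weighting schemes (weights 10 down to 2, modulo 11, for ISBN-10; alternating weights 1 and 3, modulo 10, for ISBN-13). Splitting out the registration group and publisher prefix would additionally need the agency's range tables, which this sketch does not model.

```python
def isbn13_check_digit(first12: str) -> str:
    """ISBN-13: weights alternate 1, 3, 1, 3, ...; the check completes a multiple of 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_char(first9: str) -> str:
    """ISBN-10: weights 10 down to 2; the check completes a multiple of 11 (10 is written X)."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn(value: str) -> bool:
    """Validate an ISBN-10 or ISBN-13 after removing hyphens and spaces."""
    compact = value.replace("-", "").replace(" ", "").upper()
    if len(compact) == 13 and compact.isdigit():
        return isbn13_check_digit(compact[:12]) == compact[12]
    if len(compact) == 10 and compact[:9].isdigit():
        return isbn10_check_char(compact[:9]) == compact[9]
    return False


print(is_valid_isbn("978-0-306-40615-7"), is_valid_isbn("0-8044-2957-X"))
# prints True True
print(is_valid_isbn("978-0-306-40615-8"))
# prints False
```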
>
>
>> - the canonicalization of identifiers; this challenge is not new, it is
>> the problem of building internal representations of identifiers in a
>> computer, for example so they can be used as keys denoting the same value
>
>Emphasis on *internal* representation. Obviously life gets much
>easier when I settle on one form of ISBN or decide
>to use info:isbn or urn:isbn URIs for my processing purposes. But
>since these different representation systems do exist, canonicalization
>cannot be universal.
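A purely internal canonical form might look like the following Python sketch, which settles on the unhyphenated ISBN-13 as the local key. It accepts bare ISBN-10 and ISBN-13 strings plus two prefixed URI forms; the exact prefix spellings (`urn:isbn:`, `info:isbn/`) and the function names are assumptions for illustration. In line with the point above, this is one local decision, not a universal canonical form.

```python
import re

# Assumed spellings of the URI prefixes; adjust to the forms actually received.
URI_PREFIXES = ("urn:isbn:", "info:isbn/")


def _isbn13_check(first12: str) -> str:
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def _isbn10_check(first9: str) -> str:
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def canonical_isbn(value: str) -> str:
    """Map one ISBN form to a local key: the unhyphenated ISBN-13."""
    s = value.strip().lower()
    for prefix in URI_PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    s = re.sub(r"[\s-]", "", s).upper()
    if len(s) == 10 and s[:9].isdigit() and _isbn10_check(s[:9]) == s[9]:
        core = "978" + s[:9]  # every ISBN-10 maps into the 978 prefix
        return core + _isbn13_check(core)
    if len(s) == 13 and s.isdigit() and _isbn13_check(s[:12]) == s[12]:
        return s
    raise ValueError(f"not a recognizable ISBN: {value!r}")


forms = ["0-306-40615-2", "978-0-306-40615-7", "URN:ISBN:978-0-306-40615-7"]
print({canonical_isbn(f) for f in forms})
# prints {'9780306406157'}
```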
>
>This is connected with the advice formulated last week on this list,
>namely not to make statements with "foreign" identifiers in
>subject position.
>
>The "semantic web" way of communicating seems to follow
>this pattern:
>
>1.
>Hi, this is facebook(tm) speaking:
>< http://www.facebook.com/X > is a user (account) with some
> associated properties as follows
>
>2.
>Hi, this is Yahoo speaking:
>< http://www.yahoo.com/Y > is a mail user (account) with some
> associated properties
>
>3.
>Hi, this is me speaking:
>I interpret < http://www.facebook.com/X > and < http://www.yahoo.com/Y >
>as foaf:Persons and they should be considered the same.
>
>
>Last week, this was also the recommended way to deal with "books",
>i.e. bibframe expressions representing FRBR expression records
>for typical library holdings: even if the LC URI for a
>resource is known, library X should not add custom statements with
>that URI in subject position, but rather mint a URI under its
>own control, express the statements with that URI, and then make
>a statement expressing the equivalence of that URI with the LC one.
>If I recall correctly, the main motivation for this "indirect" approach
>was concern about graph pollution in the absence of universally
>employed mechanisms for keeping track of provenance.
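What a consuming agent has to do with this pattern can be sketched in Python: collect the equivalence statements, compute which URIs are linked (union-find over `owl:sameAs`), and merge what was said about each of them. Triples are plain tuples here rather than objects from an RDF library. The `ex:` properties, the shelf mark, and the `library-x.example` and `lc.example` URIs are hypothetical placeholders; the facebook/yahoo URIs come from the example above.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

triples = [
    # Statements 1 to 3 from the example, with placeholder properties.
    ("http://www.facebook.com/X", "rdf:type", "ex:UserAccount"),
    ("http://www.yahoo.com/Y", "rdf:type", "ex:MailAccount"),
    ("http://www.facebook.com/X", SAME_AS, "http://www.yahoo.com/Y"),
    # Library X mints its own URI, says things about it, and links it to LC's.
    ("http://library-x.example/work/42", "ex:shelfMark", "QA76 .B47"),
    ("http://library-x.example/work/42", SAME_AS, "http://lc.example/resource/7"),
]


def equivalence_classes(statements):
    """Union-find over owl:sameAs links; returns a list of sets of URIs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for s, p, o in statements:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    classes = defaultdict(set)
    for uri in list(parent):
        classes[find(uri)].add(uri)
    return list(classes.values())


def merged_description(statements, uri):
    """All non-sameAs (property, value) pairs stated about uri or its equivalents."""
    members = next((c for c in equivalence_classes(statements) if uri in c), {uri})
    return {(p, o) for s, p, o in statements if s in members and p != SAME_AS}


# Contains both rdf:type statements, although only one was made about Y itself.
print(merged_description(triples, "http://www.yahoo.com/Y"))
```

Note that every equivalence has to be stated explicitly before anything can be merged; this is the cost of the "indirect" approach.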
>
>ISBN talk in contrast to this:
>
>a.
>Publisher:
>I created a resource and assigned ISBN 123 to it.
>
>b.
>Distributor:
>Please notice that the resource with ISBN 123 is now
>available through me
>
>c.
>Library:
>We acquired the resource with ISBN 123 for immediate use
>in circulation
>
>d.
>Me:
>Please obtain the resource with ISBN 123 for me
>
>The real world identifier here serves to transcend the narrow
>contexts of the individual identifiers (in the publisher's database,
>the distributor's, the library's; I'm certain they do exist),
>and it does so without recourse to connecting statements like 3.
>above: by using ISBNs in communications you adhere to the general
>semantics issued by the ISBN agency and implicitly accept the blanket
>statement that all resources with that ISBN are considered equivalent
>in their aspects corresponding to FRBR expressions.
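In contrast to the connecting statement 3. in the first pattern, the ISBN case can be sketched in Python as a plain grouping: no party states a link to anyone else's local key, yet agreement on the ISBN alone brings the records together. The records and local IDs are invented for illustration, and the key function only strips separators (a fuller canonicalization would also map ISBN-10 to ISBN-13).

```python
from collections import defaultdict

# Hypothetical records: each party keeps its own local key and merely
# records the ISBN; nobody states a link to anyone else's local key.
records = [
    ("publisher", "P-0001", "978-0-306-40615-7"),
    ("distributor", "D/77812", "9780306406157"),
    ("library", "acq-2014-553", "978 0 306 40615 7"),
    ("library", "acq-2014-554", "978-0-8044-2957-3"),
]


def isbn_key(isbn: str) -> str:
    # Only separators are dropped; ISBN-10/ISBN-13 equivalence is ignored here.
    return "".join(ch for ch in isbn if ch.isalnum()).upper()


def group_by_isbn(rows):
    """Group (party, local_id) pairs under the shared ISBN key."""
    groups = defaultdict(list)
    for party, local_id, isbn in rows:
        groups[isbn_key(isbn)].append((party, local_id))
    return dict(groups)


for isbn, holders in group_by_isbn(records).items():
    print(isbn, holders)
# prints
# 9780306406157 [('publisher', 'P-0001'), ('distributor', 'D/77812'), ('library', 'acq-2014-553')]
# 9780804429573 [('library', 'acq-2014-554')]
```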
>
>The challenge for semantic web applications is now to model
>this quite flexible behaviour of real world identifier systems
>without twisting reality (like stating that ISBNs are URIs,
>or that only URI representations of ISBNs are proper identifiers ...)
>
>
>> Not every identifier must represent a resource. With the help of the
>> Semantic Web, identifiers can easily be dereferenced and can uniquely
>> identify resources on the web. Such a resource may encapsulate all of
>> this parsing/canonicalizing/formatting. So it is possible to build
>> algorithms to reach consensus about the identity of resources on the web.
>
>Not sure that I can follow you:
>* In the semantic web there may be resources without URIs,
> but what do URIs (they have got an R inside, after all) identify
> if they do not represent a resource?
>* Equally there are ISBNs reserved for a publisher but not yet
> assigned and therefore one can say that this ISBN does not
> (yet) exist
>* On the other hand, there are namespaces and number spaces, so
> future URIs and real world identifiers from some identifier
> scheme will fit into some pattern: a feature important for
> real world identifiers and meaningless for URIs (don't even try
> to infer from the URI that a resource implicitly belongs to
> a certain dataset)
>* Algorithms stating the equivalence of differently formatted
> numbers can only apply to real world identifiers, or strings
> not acting as URIs.
>* If one would like to translate between urn:isbn URIs and
> the corresponding info:isbn URIs, one should set up a web
> service feigning a gigantic store of individual statements.
> Probably that's the way to go, maybe even in a centralized
> way: whenever an identifier system is common enough but
> does not have a universally accepted way of expressing it
> exclusively in URIs, a "bibframe ecosystem" could provide
> the transformations necessary to perform comparisons between
> different identifiers.
>
>
>> Not surprisingly, we can also apply a consensus algorithm to legacy
>> identifiers, but with weaker semantics. For example, it is possible to
>> implement recognizers for all known forms of ISBN, and write conversions
>> from one form to another, without losing the context that all forms
>> shall describe the same resource. This is not Semantic Web, just
>> theoretical computer science.
>
>Not sure how computer science enters the stage: Isn't dealing
>with real world identifiers in a semantic web context a problem
>of (applied) semiotics?
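The "recognizers plus conversions" idea from the quoted paragraph can be sketched in Python. This sketch only recognizes hyphenated or unseparated ISBNs (space-separated forms are not handled), requires a 978 or 979 prefix for 13-digit candidates, and uses hypothetical function names. Every recognized form is mapped to ISBN-13, so the context that all forms denote the same resource is kept, and ISBN-13s with the 978 prefix can be converted back to ISBN-10.

```python
import re

# Hyphenated or unseparated ISBN candidates; space-separated forms are not handled.
CANDIDATE = re.compile(r"(?<![\w-])\d[\d-]{8,15}[\dXx](?![\w-])")


def isbn13_check(first12: str) -> str:
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check(first9: str) -> str:
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def to_isbn13(isbn10: str) -> str:
    core = "978" + isbn10[:9]
    return core + isbn13_check(core)


def to_isbn10(isbn13: str) -> str:
    if not isbn13.startswith("978"):
        raise ValueError("only 978-prefixed ISBN-13s have an ISBN-10 form")
    core = isbn13[3:12]
    return core + isbn10_check(core)


def recognize_isbns(text: str):
    """Yield (form as written, ISBN-13) for each candidate that passes its check digit."""
    for m in CANDIDATE.finditer(text):
        compact = m.group().replace("-", "").upper()
        if (len(compact) == 10 and compact[:9].isdigit()
                and isbn10_check(compact[:9]) == compact[9]):
            yield m.group(), to_isbn13(compact)
        elif (len(compact) == 13 and compact.isdigit()
                and compact[:3] in ("978", "979")
                and isbn13_check(compact[:12]) == compact[12]):
            yield m.group(), compact


text = "Cited as ISBN 0-306-40615-2, later as 978-0-306-40615-7."
print(list(recognize_isbns(text)))
# prints [('0-306-40615-2', '9780306406157'), ('978-0-306-40615-7', '9780306406157')]
print(to_isbn10("9780306406157"))
# prints 0306406152
```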
>
>I really think that there is an abstract ISBN identifier space
>(a discrete, bounded or at least enumerable set) allowing
>different string representations (think of abstract numbers and
>their representation in the decimal or the binary system,
>with certain flavors such as how dots and commas are used in
>different cultures, or endianness issues in binary representations).
>These representations may happily coexist as long as there is no
>string corresponding to two different abstract identifiers depending
>on the representation scheme.
>These abstract identifiers can also be used to parameterize the URI
>space some agent uses to denote the appropriate resources.
>Thus a mapping between abstract identifiers and URIs is possible.
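The idea of an abstract identifier space with coexisting representations can be illustrated with a small Python sketch. Here the abstract ISBN is modeled, as an assumption, by the number formed by the twelve digits before the check digit, and each representation scheme is a pair of format and parse functions. The scheme names and the `http://example.org/isbn/` URI template are hypothetical stand-ins for "the URI space some agent uses". `parse_any` enforces the condition stated above: a string that would map to two different abstract identifiers is rejected.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AbstractISBN:
    """A point in the abstract ISBN-13 space: the 12 digits before the check digit."""
    core: int

    @property
    def digits(self) -> str:
        first12 = f"{self.core:012d}"
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
        return first12 + str((10 - total % 10) % 10)


def _parse_digits(s: str) -> Optional[AbstractISBN]:
    if len(s) != 13 or not s.isdigit():
        return None
    candidate = AbstractISBN(int(s[:12]))
    return candidate if candidate.digits == s else None


def _prefixed(prefix: str) -> Callable[[str], Optional[AbstractISBN]]:
    def parse(s: str) -> Optional[AbstractISBN]:
        return _parse_digits(s[len(prefix):]) if s.startswith(prefix) else None
    return parse


# Representation schemes: name -> (format, parse). The example.org template is hypothetical.
AGENT_PREFIX = "http://example.org/isbn/"
SCHEMES = {
    "digits": (lambda a: a.digits, _parse_digits),
    "urn": (lambda a: "urn:isbn:" + a.digits, _prefixed("urn:isbn:")),
    "agent-uri": (lambda a: AGENT_PREFIX + a.digits, _prefixed(AGENT_PREFIX)),
}


def parse_any(s: str) -> Optional[AbstractISBN]:
    """Map a string in any known scheme to its abstract identifier."""
    results = {parse(s) for _, parse in SCHEMES.values()} - {None}
    if len(results) > 1:  # one string, two abstract identifiers: forbidden
        raise ValueError(f"ambiguous representation: {s!r}")
    return results.pop() if results else None


isbn = parse_any("urn:isbn:9780306406157")
print({name: fmt(isbn) for name, (fmt, _) in SCHEMES.items()})
```

Adding another scheme (another agent's URI space, say) means adding one format/parse pair; the abstract value stays the same.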
>
>I think the abstraction introduced by bf:Identifier goes exactly
>in the right direction; however, there are still major points to solve
>before it becomes useful, namely:
>
>* URIs, not only literals, should also be permitted as identifierValue
>
>* identifierAssigner and/or identifierScheme are important when it
> comes to deciding whether I want to or can compare two bf:Identifiers.
> So do we need a bibframe registry for this, or should we delegate this
> to the existing ISIL registry or to authority files of the user's choice?
>
>* identifierStatus presupposes that all business cases for an
> otherwise unknown number of identifier systems are known in
> advance and thoroughly understood by us, and that we can therefore
> create a bf:vocabulary for that element?
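The first two points can be sketched in Python as an identifier record whose value may be a literal or a URI, and a comparison that declines to decide unless both sides name the same scheme from a shared registry. The field names only echo the bf:Identifier properties and are not BIBFRAME's own; `KNOWN_SCHEMES` is a hypothetical stand-in for the registry question raised above, and the "local" scheme, which only makes sense per assigner, is likewise invented for illustration. identifierStatus is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the registry of schemes discussed above.
KNOWN_SCHEMES = {"isbn", "isni", "local"}


@dataclass(frozen=True)
class Identifier:
    """Loosely echoes bf:Identifier; these field names are not BIBFRAME's."""
    value: str                      # identifierValue, literal or URI
    value_is_uri: bool = False      # the first point: allow URIs as values
    scheme: Optional[str] = None    # identifierScheme
    assigner: Optional[str] = None  # identifierAssigner


def same_identifier(a: Identifier, b: Identifier) -> Optional[bool]:
    """True/False when a comparison is meaningful, None when it is not.

    Values are assumed to be canonicalized already within their scheme.
    """
    if a.scheme is None or a.scheme != b.scheme or a.scheme not in KNOWN_SCHEMES:
        return None
    if a.scheme == "local" and a.assigner != b.assigner:
        return None  # local numbers only mean something per assigner
    if a.value_is_uri != b.value_is_uri:
        return None  # a URI and a literal would need a mapping first
    return a.value == b.value


lit = Identifier("9780306406157", scheme="isbn")
also = Identifier("9780306406157", scheme="isbn", assigner="library-x")
unknown = Identifier("9780306406157")
print(same_identifier(lit, also), same_identifier(lit, unknown))
# prints True None
```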
>
>Certainly this path also has some dangers: It is way outside the scope
>of bibframe to construct a comprehensive model for the library universe
>with all its agents and services as a consistent dataset in the
>semantic web just to deal with ISBNs the proper way...
>
>
>> An interesting challenge for library catalogs would be if e.g. the
>> publishing industry started to move their ISBN numbering system into the
>> web and introduced URIs, also for existing ISBNs. Then we'd have two
>> aspects of the same thing: the web resource of the ISBN and the legacy
>> ISBN thing, the "string thing". In such cases, a skilled programmer must
>> perform the "heavy lifting" so that a catalog can still ensure that
>> equivalent ISBNs of whatever semantics are still identifying the same
>> thing, since it is the user of the library catalog who is looking for
>> the one and only result that may be denoted by many different forms of
>> an identifier.
>
>A more interesting question would be why the ISBN agency never introduced
>official URIs for their numbers or endorsed the URN:ISBN scheme. Maybe
>they too feared some kind of pollution: imagine what would happen if
>publishers started to print URIs on book jackets, probably in
>addition to the existing representations as strings and bar codes, and
>the potential for confusion in the presence of misspellings, typos or
>general cluelessness. Also, I think that current business applications
>would not gain from URI versions of ISBNs: they already /know/ what they
>are dealing with.
>
>
>> My perception is that Bibframe is not providing any consensus mechanisms
>> except for "bibframe entities", which are only a small excerpt of the
>> Semantic Web. An open story is when e.g. catalogs are used outside the
>> library community scope, or merged into new data pools, and entities have
>> to be matched, Bibframe and non-Bibframe ones. This is not a new topic and
>> it is not specific to Semantic Web, but it is a strong advantage of the
>> Semantic Web. Maybe there is high hope for improved library catalog data
>> consensus by using Bibframe, but I am about to lose my optimism.
>
>Bibframe will not be in a position to cut the ties between libraryland
>and real world phenomena. So even if it could impose exactly one kind of
>string or, even better, URI representation for ISBNs and other identifiers
>for all libraries in the world, what would be gained by that? Publishers
>and patrons will continue to confront us with real world forms of real
>world identifiers, and continuously transforming between different
>representations of identifiers or denominations of resources will remain
>one of the main tasks of our applications.
>
>Best regards,
>Thomas Berger