Thomas,

You seem to have missed part of the discussion about identifiers. I 
point you to:

http://listserv.loc.gov/cgi-bin/wa?A2=ind1407&L=bibframe&T=0&P=8029

The thread begins here, but unfortunately the archive is not working 
correctly and some posts (esp. those from LC) do not display:

http://listserv.loc.gov/cgi-bin/wa?A2=ind1407&L=bibframe&T=0&X=39B4FE6748B566AFC0&Y=lists%40kcoyle.net&P=1926

And one of those from LC, 7/11/14, from Ray Denenberg, states:
"I believe it has been clearly demonstrated by this discussion that a 
URI should not be one of the “identifier schemes” for bf:Identifier."

And in the thread that begins on 7/10/14 with a post by Karen 
Smith-Yoshimura, I believe that we demonstrate that using as subject a 
URI from a third party does NOT imply that the statement was made by that 
party. This is one of the fundamental "truths" of the semantic web - 
that anyone can say anything about anything (AAA), and the URI does NOT 
indicate provenance of the statement (triple).

kc

On 7/16/14, 8:41 AM, Thomas Berger wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Joerg,
>
> On 14.07.2014 22:34, [log in to unmask] wrote:
>> For implementations regarding the processing of identifiers, there are
>> several steps to take care of:
>>
>> - parsing library catalog data for identifiers for finding equivalences.
>> Bibliographic identifiers could be found from before the World Wide Web
>> epoch and do not obey known semantics, like ISBN, originating from early
>> 1970s on, and identifiers which do obey strict semantics and are genuine
>> "new" web resources (as denoted by URIs)
> Just for the record:
> ISBNs, especially since 2007, have very strict semantics, at least if
> you ask the ISBN agency (their FAQ will gladly tell you).
>
> Sadly, libraries in particular played their part in blurring those
> semantics, but that is a different story.
>
> Real world identifiers like ISBN play an important role and it would be
> bold to deny that or to assume that this role has become obsolete because
> there is now the new and shiny semantic web and its URIs solving any
> problems the legacy identifiers are stuck with.
>
> In a sense, real world identifiers are quite different from semantic
> web identifiers. ISBNs for instance have an internal structure
> reflecting the delegation of the identifier space to agencies and
> publishers, and they have a check digit for processing. And contrary
> to semantic web principles these properties are "leaked" on purpose
> to the community. This helps in recognizing ISBNs in different contexts,
> validating them and hunting down resources - very practical tasks
> in the real world but of no relevance to the semantic web.
>
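
A minimal sketch of the check-digit validation mentioned above, in
Python. It assumes the commonly documented rules (ISBN-10: weighted sum
modulo 11 with "X" standing for 10; ISBN-13: alternating weights 1 and 3,
modulo 10) and input that has already been stripped of hyphens. The
function names are made up for illustration and do not come from any
ISBN agency tool.

```python
def isbn10_is_valid(digits: str) -> bool:
    """Check an unhyphenated ISBN-10; the last character may be 'X'."""
    if len(digits) != 10 or not digits.isascii():
        return False
    body, last = digits[:9], digits[9].upper()
    if not body.isdigit() or not (last.isdigit() or last == "X"):
        return False
    # Weights run from 10 down to 2 for the first nine digits.
    total = sum((10 - i) * int(c) for i, c in enumerate(body))
    total += 10 if last == "X" else int(last)
    return total % 11 == 0


def isbn13_is_valid(digits: str) -> bool:
    """Check an unhyphenated ISBN-13."""
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... over all thirteen digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(digits))
    return total % 10 == 0
```

Neither function says anything about whether the number was ever
assigned; it only shows that a string is well-formed under these rules.
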
>
>> - the canonicalization of identifiers; this challenge is not new, it is the
>> problem of building internal representations of identifiers in a computer,
>> for example so they can be used as keys denoting the same value
> Emphasis on *internal* representation. Obviously life gets much
> easier when I settle on one form of ISBNs or make a decision
> to use info:isbn or urn:isbn URIs for my processing purposes. But
> since these different representation systems do exist, canonicalization
> cannot be universal.
>
> This is connected with the advice formulated last week on this list,
> namely not to make statements with "foreign" identifiers in
> subject position.
>
> The "semantic web" way of communicating seems to follow this
> pattern:
>
> 1.
> Hi, this is facebook(tm) speaking:
> < http://www.facebook.com/X > is a user (account) with some
>     associated properties as follows
>
> 2.
> Hi, this is Yahoo speaking:
> < http://www.yahoo.com/Y > is a mail user (account) with some
>     associated properties
>
> 3.
> Hi, this is me speaking:
> I interpret < http://www.facebook.com/X > and < http://www.yahoo.com/Y >
> as foaf:Persons and they should be considered the same.
>
>
> Last week, this was also the recommended way to deal with "books",
> i.e. bibframe expressions representing FRBR expression records
> for typical library holdings: Even if the LC URI for a
> resource is known, library X should not add custom statements with
> that URI in subject position, but rather craft a URI under its
> own command, express the statements with that URI and then make
> a statement to express equivalence of that URI with the LC one.
> If I recall correctly, the main motivation for that "indirect" way
> was concern about graph pollution in the absence of universally
> employed mechanisms for keeping track of provenance.
>
> ISBN talk, in contrast, goes like this:
>
> a.
> Publisher:
> I created a resource and assigned ISBN 123 to it.
>
> b.
> Distributor:
> Please notice that the resource with ISBN 123 is now
> available through me
>
> c.
> Library:
> We acquired the resource with ISBN 123 for immediate use
> in circulation
>
> d.
> Me:
> Please obtain the resource with ISBN 123 for me
>
> The real world identifier here serves the purpose of lifting
> the individual identifiers (in the publisher's database, the
> distributor's, the library's - I'm certain they do exist)
> beyond their narrow context, and it does this without recourse
> to connecting statements like 3. above: By using ISBNs in
> communications you adhere to the general semantics issued by
> the ISBN agency and implicitly accept the blanket statement
> that all resources with that ISBN are considered equivalent
> in their aspects corresponding to FRBR expressions.
>
> The challenge for semantic web applications now is to model
> this quite flexible behaviour of real world identifier systems
> without twisting reality (like stating that ISBNs are URIs
> or that only URI representations of ISBNs are proper identifiers ...)
>
>
>> Not every identifier must represent a resource. With help of the Semantic
>> Web, identifiers can easily be dereferenced and can uniquely identify
>> resources on the web. Such a resource may consist of all these
>> parsing/canonicalizing/formatting. So it is possible to build algorithms to
>> reach consensus about identity of resources on the web.
> Not sure that I can follow you:
> * In the semantic web there may be resources without URIs,
>    but URIs (they have got an R inside?) which do not represent
>    a resource identify what?
> * Equally there are ISBNs reserved for a publisher but not yet
>    assigned and therefore one can say that this ISBN does not
>    (yet) exist
> * On the other hand there are namespaces and number spaces thus
>    future URIs and real world identifiers from some identifier
>    scheme will fit into some pattern: A feature important for
>    real world identifiers and meaningless for URIs (don't even try
>    to infer from the URI that a resource implicitly belongs to
>    a certain dataset)
> * Algorithms stating the equivalence of differently formatted
>    numbers can only apply to real world identifiers, or strings
>    not acting as URIs.
> * If one would like to translate between urn:isbn URIs and
>    the corresponding info:isbn URIs one should set up a web
>    service feigning a gigantic store of individual statements.
>    Probably that's the way to go, maybe even in a centralized
>    way: Whenever an identifier system is common enough but
>    does not have a universally accepted way of expressing it
>    exclusively in URIs, a "bibframe ecosystem" could provide
>    the transformations necessary to perform comparisons between
>    different identifiers.
>
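
A rough Python sketch of the urn:isbn / info:isbn translation described
in the last bullet above. The point is that the "gigantic store of
individual statements" can be computed on demand rather than stored. The
exact URI syntaxes used here (urn:isbn:<number> and info:isbn/<number>)
are assumptions for illustration, as are the function names; nothing
here is a Bibframe or ISBN agency API, and no check digit is verified.

```python
import re

# Assumed URI templates; the info:isbn form in particular is a guess
# modelled on other info: URIs and is not confirmed by this thread.
_URI_FORMS = {
    "urn": "urn:isbn:{}",
    "info": "info:isbn/{}",
}

_ISBN_URI = re.compile(r"^(?:urn:isbn:|info:isbn/)([0-9X-]+)$", re.IGNORECASE)


def isbn_from_uri(uri: str) -> str:
    """Extract the bare ISBN (no hyphens, upper-case X) from either URI form."""
    match = _ISBN_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"not a recognized ISBN URI: {uri!r}")
    return match.group(1).replace("-", "").upper()


def translate_isbn_uri(uri: str, target: str) -> str:
    """Re-express an ISBN URI in the target form ('urn' or 'info')."""
    if target not in _URI_FORMS:
        raise ValueError(f"unknown target form: {target!r}")
    return _URI_FORMS[target].format(isbn_from_uri(uri))
```

A web service along the lines suggested above could answer each request
by calling translate_isbn_uri instead of looking up a stored statement.
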
>
>> Not surprisingly, we can apply a consensus algorithm also to legacy
>> identifiers, but with weaker semantics. For example, it is possible to
>> implement recognizers for all known forms of ISBN, and write conversions
>> from one form to another, without losing the context that all forms shall
>> describe the same resource. This is not Semantic Web, just theoretical
>> computer science.
> Not sure how computer science enters the stage: Isn't dealing
> with real world identifiers in a semantic web context a problem
> of (applied) semiotics?
>
> I really think that there is an abstract ISBN identifier space
> (as a discrete, bounded or at least enumerable set) allowing
> different string representations (think of abstract numbers and
> their representation in the decimal system or the binary system
> with certain flavors (how to provide dots and commas in different
> cultures or endianness issues in binary representations)). These
> representations may happily coexist as long as there is not one
> string corresponding to two different abstract identifiers depending
> on the representation scheme.
> These abstract identifiers can also be used to parameterize the URI
> space some agent uses to denote the appropriate resources.
> Thus a mapping between abstract identifiers and URIs is possible.
>
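
One way to picture the "abstract identifier with several string
representations" idea is the Python sketch below. It treats the 13-digit
form as a stand-in for the abstract identifier, maps a few printed forms
onto it (using the usual 978 prefix to convert ISBN-10), and then
parameterizes a URI space with the result. The base URI, the set of
recognized forms and the function names are all hypothetical, and the
check digit of the input is not verified.

```python
import re

# Optional printed label such as "ISBN", "ISBN-10:" or "ISBN-13:".
_LABEL = re.compile(r"^\s*ISBN(?:-1[03])?:?\s*", re.IGNORECASE)


def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit for the first twelve digits."""
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def canonical_isbn(text: str) -> str:
    """Map a printed ISBN-10 or ISBN-13 form onto one 13-digit string."""
    s = _LABEL.sub("", text).replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        # ISBN-10: drop the old check digit, prefix 978, recompute.
        first12 = "978" + s[:9]
        return first12 + isbn13_check_digit(first12)
    if re.fullmatch(r"[0-9]{13}", s):
        return s
    raise ValueError(f"not a recognizable ISBN form: {text!r}")


def local_uri(text: str, base: str = "http://example.org/isbn/") -> str:
    """Mint a URI in some agent's own (hypothetical) URI space."""
    return base + canonical_isbn(text)
```

Two strings then denote the same abstract identifier exactly when
canonical_isbn returns the same value for both, whatever form they were
printed in.
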
> I think the abstraction introduced by bf:Identifier goes exactly
> in the right direction, however there are still major points to
> resolve before it becomes useful, namely:
>
> * URIs should also be permitted as identifierValue, not only literals
>
> * identifierAssigner and/or identifierScheme are important when it
>    comes to deciding whether I want to or can compare two bf:Identifiers.
>    Thus we need a bibframe registry for this, or delegate this to
>    the existing ISIL registry or authority files of the user's choice?
>
> * identifierStatus presupposes that all business cases for an
>    otherwise unknown number of identifier systems are known in
>    advance and thoroughly understood by us and therefore we can
>    create a bf:vocabulary for that element?
>
> Certainly this path also has some dangers: It is way outside the scope
> of bibframe to construct a comprehensive model for the library universe
> with all its agents and services as a consistent dataset in the
> semantic web just to deal with ISBNs the proper way...
>
>
>> An interesting challenge for library catalogs would be if e.g. the
>> publishing industry started to move their ISBN numbering system into the
>> web and introduced URIs, also for existing ISBNs. Then we'd have two
>> aspects of the same thing - the web resource of the ISBN and the legacy
>> ISBN thing, the "string thing". In such cases, a skilled programmer must
>> perform the "heavy lifting" so that a catalog still can ensure that
>> equivalent ISBNs of whatever semantics are still identifying the same thing
>> - since it is the user of the library catalog that is looking for the one
>> and only result that may be denoted by many different forms of an
>> identifier.
> A more interesting question would be why the ISBN agency never introduced
> official URIs for their numbers or endorsed the URN:ISBN scheme: Maybe
> they too feared some kind of pollution: Imagine what would happen if
> publishers started to print URIs on the book jackets, probably in
> addition to the existing representations as strings and bar codes. And
> the potential for confusion in the presence of misspellings, typos or
> general cluelessness. Also I think that current business applications
> would not gain from URI versions of ISBNs: They already /know/ what they
> are dealing with.
>
>
>> My perception is that Bibframe is not providing any consensus mechanisms
>> except for "bibframe entities", which are only a small excerpt of the
>> Semantic Web. An open story is when e.g. catalogs are used outside the
>> library community scope, or merged into new data pools, and entities have
>> to be matched, Bibframe and non-Bibframe ones. This is not a new topic and
>> it is not specific to Semantic Web, but it is a strong advantage of the
>> Semantic Web. Maybe there is high hope for improved library catalog data
>> consensus by using Bibframe, but I am about to lose my optimism.
> Bibframe will not be in a position to cut the bonds between libraryland and
> real world phenomena. Thus even if it could impose exactly one kind of
> string or - even better - URI representation for ISBNs and other identifiers
> for all libraries in the world - what would be gained by that? Publishers
> and patrons will continue to confront us with real world forms of real
> world identifiers and continuously transforming between different
> representations of identifiers or denominations of resources will remain
> one of the main tasks of our applications.
>
> Best regards
> Thomas Berger
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iJwEAQECAAYFAlPGnQ4ACgkQYhMlmJ6W47O9YQQAsgm8/Ox/YlSrk354/GEgwUXm
> 1v1hMe6wHjgWnOo2WTGAYxIJMHDAQGuIuQQxmpJZzG9okcJAO642M+wvFCAwDXzA
> eoscJ+ylPl7AdGLI1km3edYU2PHce6hAO0A8OaXQ7iuYAd/PQN+7N9YUA+CpGT4a
> 29nP5NyjyphnjFF/K+Q=
> =DIlX
> -----END PGP SIGNATURE-----

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet