Thomas,

You seem to have missed part of the discussion about identifiers. I
point you to:

http://listserv.loc.gov/cgi-bin/wa?A2=ind1407&L=bibframe&T=0&P=8029

The thread begins here, but unfortunately the archive is not working
correctly and some posts (esp. those from LC) do not display:

http://listserv.loc.gov/cgi-bin/wa?A2=ind1407&L=bibframe&T=0&X=39B4FE6748B566AFC0&Y=lists%40kcoyle.net&P=1926

And one of those from LC, 7/11/14, from Ray Denenberg, states:

"I believe it has been clearly demonstrated by this discussion that a
URI should not be one of the “identifier schemes” for bf:Identifier."

And in the thread that begins on 7/10/14 with a post by Karen
Smith-Yoshimura, I believe that we demonstrated that using as subject a
URI from a third party does NOT imply that the statement was made by
that party. This is one of the fundamental "truths" of the semantic web -
that anyone can say anything about anything (AAA), and the URI does NOT
indicate provenance of the statement (triple).

kc

On 7/16/14, 8:41 AM, Thomas Berger wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Joerg,
>
> On 14.07.2014 22:34, [log in to unmask] wrote:
>> For implementations regarding the processing of identifiers, there are
>> several steps to take care of:
>>
>> - parsing library catalog data for identifiers for finding equivalences.
>> Bibliographic identifiers could be found from before the World Wide Web
>> epoch and do not obey known semantics, like ISBN, originating from early
>> 1970s on, and identifiers which do obey strict semantics and are genuine
>> "new" web resources (as denoted by URIs)
> Just for the record:
> ISBNs, especially since 2007, have very strict semantics, at least when
> you ask the ISBN agency (their FAQ will gladly tell you).
>
> Sadly, libraries in particular played their part in blurring those
> semantics, but that is a different story.
>
> Real world identifiers like ISBN play an important role and it would be
> bold to deny that or to assume that this role has become obsolete because
> there is now the new and shiny semantic web and its URIs solving any
> problems the legacy identifiers are stuck with.
>
> In a sense, real world identifiers are quite different from semantic
> web identifiers. ISBNs for instance have an internal structure
> reflecting the delegation of the identifier space to agencies and
> publishers, and they have a check digit for processing. And contrary
> to semantic web principles these properties are "leaked" on purpose
> to the community. This helps in recognizing ISBNs in different contexts,
> validating them and hunting down resources - very practical tasks
> in the real world but of no relevance to the semantic web.
>
>
>> - the canonicalization of identifiers; this challenge is not new, it is the
>> problem of building internal representations of identifiers in a computer,
>> for example so they can be used as keys denoting the same value
> Emphasis on *internal* representation. Obviously life gets much
> easier when I settle on one form of ISBNs or make a decision
> to use info:isbn or urn:isbn URIs for my processing purposes. But
> since these different representation systems do exist, canonicalization
> cannot be universal.
>
> This is connected with the advice formulated last week on this list,
> namely not to make statements with "foreign" identifiers in
> subject position.
>
> The "semantic web" way of communication seems to follow
> this pattern:
>
> 1.
> Hi, this is facebook(tm) speaking:
> < http://www.facebook.com/X > is a user (account) with some
> associated properties as follows
>
> 2.
> Hi, this is Yahoo speaking:
> < http://www.yahoo.com/Y > is a mail user (account) with some
> associated properties
>
> 3.
> Hi, this is me speaking:
> I interpret < http://www.facebook.com/X > and < http://www.yahoo.com/Y >
> as foaf:Persons and they should be considered the same.
>
>
> Last week, this was also the recommended way to deal with "books",
> i.e. bibframe expressions representing FRBR expression records
> for typical library holdings: Even if the LC URI for a
> resource is known, library X should not add custom statements with
> that URI in subject position, but rather craft a URI under its
> own command, express the statements with that URI and then make
> a statement to express equivalence of that URI with the LC one.
> If I recall correctly the main motivation for that "indirect" way
> was concern about graph pollution in the absence of universally
> employed mechanisms for keeping track of provenance.
>
> ISBN talk, in contrast, goes like this:
>
> a.
> Publisher:
> I created a resource and assigned ISBN 123 to it.
>
> b.
> Distributor:
> Please notice that the resource with ISBN 123 is now
> available through me
>
> c.
> Library:
> We acquired the resource with ISBN 123 for immediate use
> in circulation
>
> d.
> Me:
> Please obtain the resource with ISBN 123 for me
>
> The real world identifier here serves the purpose of transcending
> the individual identifiers (in the publisher's database, the
> distributor's, the library's - I'm certain they do exist)
> and their narrow contexts, and does this without recourse
> to connecting statements like 3. above: By using ISBNs in
> communications you adhere to the general semantics issued by
> the ISBN agency and implicitly accept the blanket statement
> that all resources with that ISBN are considered equivalent
> in their aspects corresponding to FRBR expressions.
>
> The challenge for semantic web applications is now to model
> this quite flexible behaviour of real world identifier systems
> without twisting reality (like stating that ISBNs are URIs
> or only URI representations of ISBNs are proper identifiers ...)
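
The points above about the check digit and about settling on one *internal* form can be made concrete with a small sketch. This is only an illustration in Python, not code from this thread: the function name canonical_isbn13 and the choice of the 13-digit string as the internal key are assumptions, and the prefixes it strips (urn:isbn:, info:isbn/, a leading "ISBN") are just the forms mentioned in this discussion.

```python
import re


def _isbn10_is_valid(s: str) -> bool:
    """ISBN-10 check: weights 10..1, 'X' means 10 and may only be last."""
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def _isbn13_check_digit(first12: str) -> str:
    """ISBN-13 check digit: alternating weights 1 and 3, modulo 10."""
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def canonical_isbn13(raw: str) -> str:
    """Hypothetical helper: map one written form of an ISBN to 13 bare digits.

    Raises ValueError if the string is not a valid ISBN-10 or ISBN-13.
    """
    s = raw.strip()
    # Strip the URI-style or label prefixes discussed in the thread.
    s = re.sub(r"^(urn:isbn:|info:isbn/|isbn[:\s]*)", "", s, flags=re.IGNORECASE)
    # Hyphens and spaces are presentation only.
    s = re.sub(r"[\s-]", "", s).upper()
    if len(s) == 10 and _isbn10_is_valid(s):
        body = "978" + s[:9]  # ISBN-10 maps into the 978 prefix
        return body + _isbn13_check_digit(body)
    if re.fullmatch(r"\d{13}", s) and _isbn13_check_digit(s[:12]) == s[12]:
        return s
    raise ValueError(f"not a valid ISBN: {raw!r}")


print(canonical_isbn13("ISBN 0-306-40615-2"))
print(canonical_isbn13("urn:isbn:9780306406157"))
```

Both calls yield the same key, which is the whole point of an internal canonical form. The sketch says nothing about which form should be exchanged between systems, and it does not check registration-group or publisher ranges.
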
>
>
>> Not every identifier must represent a resource. With help of the Semantic
>> Web, identifiers can easily be dereferenced and can uniquely identify
>> resources on the web. Such a resource may consist of all these
>> parsing/canonicalizing/formatting. So it is possible to build algorithms to
>> reach consensus about identity of resources on the web.
> Not sure that I can follow you:
> * In the semantic web there may be resources without URIs,
>   but URIs (they have got an R inside?) which do not represent
>   a resource identify what?
> * Equally there are ISBNs reserved for a publisher but not yet
>   assigned and therefore one can say that this ISBN does not
>   (yet) exist
> * On the other hand there are namespaces and number spaces, thus
>   future URIs and real world identifiers from some identifier
>   scheme will fit into some pattern: A feature important for
>   real world identifiers and meaningless for URIs (don't even try
>   to infer from the URI that a resource implicitly belongs to
>   a certain dataset)
> * Algorithms stating the equivalence of differently formatted
>   numbers can only apply to real world identifiers, or strings
>   not acting as URIs.
> * If one would like to translate between urn:isbn URIs and
>   the corresponding info:isbn URIs one should set up a web
>   service feigning a gigantic store of individual statements.
>   Probably that's the way to go, maybe even in a centralized
>   way: Whenever an identifier system is common enough but
>   does not have a universally accepted way of expressing it
>   exclusively in URIs, a "bibframe ecosystem" could provide
>   the transformations necessary to perform comparisons between
>   different identifiers.
>
>
>> Not surprisingly, we can apply a consensus algorithm also to legacy
>> identifiers, but with weaker semantics.
>> For example, it is possible to
>> implement recognizers for all known forms of ISBN, and write conversions
>> from one form to another, without losing the context that all forms shall
>> describe the same resource. This is not Semantic Web, just theoretical
>> computer science.
> Not sure how computer science enters the stage: Isn't dealing
> with real world identifiers in a semantic web context a problem
> of (applied) semiotics?
>
> I really think that there is an abstract ISBN identifier space
> (as a discrete, bounded or at least enumerable set) allowing
> different string representations (think of abstract numbers and
> their representation in the decimal system or the binary system
> with certain flavors (how to provide dots and commas in different
> cultures or endianness issues in binary representations)). These
> representations may happily coexist as long as there is not one
> string corresponding to two different abstract identifiers depending
> on the representation scheme.
> These abstract identifiers can also be used to parameterize the URI
> space some agent uses to denote the appropriate resources.
> Thus a mapping between abstract identifiers and URIs is possible.
>
> I think the abstraction introduced by bf:Identifier goes exactly
> in the right direction, however there are still major points to solve
> before making it useful. Namely
>
> * also URIs should be permitted as identifierValue, not only literals
>
> * identifierAssigner and/or identifierScheme are important when it
>   comes to deciding whether I want or can compare two bf:Identifiers.
>   Thus we need a bibframe registry for this or delegate this to
>   the existing ISIL registry or authority files of the user's choice?
>
> * identifierStatus presupposes that all business cases for an
>   otherwise unknown number of identifier systems are known in
>   advance and thoroughly understood by us and therefore we can
>   create a bf:vocabulary for that element?
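
To illustrate the idea of an abstract identifier that can be rendered in several URI forms, and of the scheme deciding whether two identifiers are comparable at all, here is a rough Python sketch. It is not BIBFRAME code: the class Identifier and its fields scheme and value are hypothetical stand-ins for identifierScheme and identifierValue, and the scheme label "isbn" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# URI forms named in this thread; anything else is out of scope here.
_ISBN_PREFIXES = ("urn:isbn:", "info:isbn/")


@dataclass(frozen=True)
class Identifier:
    scheme: str  # hypothetical stand-in for identifierScheme
    value: str   # hypothetical stand-in for identifierValue (literal or URI)


def abstract_value(ident: Identifier) -> str:
    """Reduce a written form to a bare value, per scheme-specific rules."""
    v = ident.value.strip()
    if ident.scheme == "isbn":
        for prefix in _ISBN_PREFIXES:
            if v.lower().startswith(prefix):
                v = v[len(prefix):]
        v = v.replace("-", "").replace(" ", "").upper()
    return v


def same_identifier(a: Identifier, b: Identifier) -> Optional[bool]:
    """Return None when the schemes differ: the values are not comparable."""
    if a.scheme != b.scheme:
        return None
    return abstract_value(a) == abstract_value(b)


def as_uris(ident: Identifier) -> Dict[str, str]:
    """Parameterize both URI spaces with the same abstract value."""
    if ident.scheme != "isbn":
        raise ValueError("only the 'isbn' scheme is sketched here")
    v = abstract_value(ident)
    return {"urn": "urn:isbn:" + v, "info": "info:isbn/" + v}


a = Identifier("isbn", "urn:isbn:0-306-40615-2")
b = Identifier("isbn", "info:isbn/0306406152")
print(same_identifier(a, b))
print(same_identifier(a, Identifier("issn", "0306406152")))
print(as_uris(b))
```

The sketch deliberately refuses to compare across schemes rather than returning False, since a matching string under a different scheme says nothing either way. It does not validate check digits or equate ISBN-10 with ISBN-13 forms.
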
>
> Certainly this path also has some dangers: It is way outside the scope
> of bibframe to construct a comprehensive model for the library universe
> with all its agents and services as a consistent dataset in the
> semantic web just to deal with ISBNs the proper way...
>
>
>> An interesting challenge for library catalogs would be if e.g. the
>> publishing industry started to move their ISBN numbering system into the
>> web and introduced URIs, also for existing ISBNs. Then we'd have two
>> aspects of the same thing - the web resource of the ISBN and the legacy
>> ISBN thing, the "string thing". In such cases, a skilled programmer must
>> perform the "heavy lifting" so that a catalog still can ensure that
>> equivalent ISBNs of whatever semantics are still identifying the same thing
>> - since it is the user of the library catalog that is looking for the one
>> and only result that may be denoted by many different forms of an
>> identifier.
> A more interesting question would be why the ISBN agency never introduced
> official URIs for their numbers or endorsed the URN:ISBN scheme: Maybe
> they too feared some kind of pollution: Imagine what would happen if
> publishers started to print URIs on the book jackets, probably in
> addition to the existing representations as strings and bar codes. And
> the potential for confusion in the presence of misspellings, typos or
> general cluelessness. Also I think that current business applications
> would not gain from URI versions of ISBNs: They already /know/ what they
> are dealing with.
>
>
>> My perception is that Bibframe is not providing any consensus mechanisms
>> except for "bibframe entities", which are only a small excerpt of the
>> Semantic Web. An open story is when e.g. catalogs are used outside the
>> library community scope, or merged into new data pools, and entities have
>> to be matched, Bibframe and non-Bibframe ones.
>> This is not a new topic and
>> it is not specific to Semantic Web, but it is a strong advantage of the
>> Semantic Web. Maybe there is high hope for improved library catalog data
>> consensus by using Bibframe, but I am about to lose my optimism.
> Bibframe will not be in a position to cut the bonds of libraryland with
> real world phenomena. Thus even if it could impose exactly one kind of
> string or - even better - URI representation for ISBNs and other identifiers
> for all libraries in the world - what would be gained by that? Publishers
> and patrons will continue to confront us with real world forms of real
> world identifiers and continuously transforming between different
> representations of identifiers or denominations of resources will remain
> one of the main tasks of our applications.
>
> Best regards
> Thomas Berger
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iJwEAQECAAYFAlPGnQ4ACgkQYhMlmJ6W47O9YQQAsgm8/Ox/YlSrk354/GEgwUXm
> 1v1hMe6wHjgWnOo2WTGAYxIJMHDAQGuIuQQxmpJZzG9okcJAO642M+wvFCAwDXzA
> eoscJ+ylPl7AdGLI1km3edYU2PHce6hAO0A8OaXQ7iuYAd/PQN+7N9YUA+CpGT4a
> 29nP5NyjyphnjFF/K+Q=
> =DIlX
> -----END PGP SIGNATURE-----

--
Karen Coyle
[log in to unmask] http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet