
When implementing the processing of identifiers, there are several steps to
take care of:

- parsing library catalog data for identifiers in order to find
equivalences. Catalogs contain both bibliographic identifiers from before
the World Wide Web epoch that do not obey any known semantics, like the
ISBN, which dates from the early 1970s, and identifiers that do obey strict
semantics and are genuinely "new" web resources (as denoted by URIs)

- canonicalizing identifiers; this challenge is not new: it is the problem
of building internal representations of identifiers in a computer, for
example so that they can be used as keys denoting the same value

- creating formatted output for computation, for presentation, or for
preservation of identifiers, following different formatting rules
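A minimal sketch of these three steps in Python, assuming ISBNs embedded in free catalog text. The names (`CANDIDATE`, `canonicalize`, `fmt`, `group_equivalents`) are hypothetical. The candidate pattern is deliberately loose, and a proper hyphenated display form would need the ISBN agency's range tables, which are not modelled here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Step 1 (parsing): ISBN-like tokens of 10 or 13 characters (digits, the last
# one possibly X/x), optionally separated by hyphens or blanks. The pattern
# over-matches on purpose; canonicalize() filters out invalid candidates.
CANDIDATE = re.compile(r"\b(?:\d[\s-]?){9,12}[\dXx]\b")


def _isbn10_ok(d: str) -> bool:
    # ISBN-10 check: weights 10..1, sum divisible by 11; "X" = 10, last position only.
    if not re.fullmatch(r"\d{9}[\dX]", d):
        return False
    return sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(d)) % 11 == 0


def _isbn13_ok(d: str) -> bool:
    # ISBN-13 (EAN-13) check: alternating weights 1 and 3, sum divisible by 10.
    if not re.fullmatch(r"\d{13}", d):
        return False
    return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(d)) % 10 == 0


def canonicalize(raw: str) -> Optional[str]:
    # Step 2 (canonicalization): drop separators, upper-case "x", validate.
    compact = re.sub(r"[\s-]", "", raw).upper()
    return compact if _isbn10_ok(compact) or _isbn13_ok(compact) else None


def fmt(key: str, target: str) -> str:
    # Step 3 (formatting): "compact" for computation, "display" for people.
    # Real hyphenation would need the ISBN range tables (not modelled here).
    if target == "compact":
        return key
    if target == "display":
        return "ISBN " + key
    raise ValueError(f"unknown target: {target}")


def group_equivalents(text: str) -> Dict[str, List[str]]:
    # Collect every raw form found in `text` under its canonical key.
    groups: Dict[str, List[str]] = defaultdict(list)
    for m in CANDIDATE.finditer(text):
        key = canonicalize(m.group(0))
        if key is not None:
            groups[key].append(m.group(0))
    return dict(groups)
```

In this sketch, "1-59158-509-0" and "1 59158 509 0" end up under the same key. The ISBN-10 and ISBN-13 forms of the same book still end up under different keys, so relating them would need a conversion step as well.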

Not every identifier has to represent a resource. With the help of the
Semantic Web, identifiers can easily be dereferenced and can uniquely
identify resources on the web. Such a resource may itself provide all of
this parsing, canonicalizing, and formatting. So it is possible to build
algorithms that reach consensus about the identity of resources on the web.

Not surprisingly, we can also apply a consensus algorithm to legacy
identifiers, albeit with weaker semantics. For example, it is possible to
implement recognizers for all known forms of ISBN and write conversions
from one form to another without losing the context that all forms
describe the same resource. This is not Semantic Web, just theoretical
computer science.
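To illustrate such recognizers and conversions, here is a Python sketch. It accepts a few common ISBN forms (compact, dashed, blank-separated, with an "ISBN" prefix, or with a lowercase "x"), verifies the check digit, and uses the ISBN-13 as the canonical key. The function names are hypothetical. Only the standard ISBN-10 (mod 11) and EAN-13 (mod 10) check-digit rules are assumed.

```python
import re
from typing import Optional


def _check13(first12: str) -> str:
    # EAN-13 check digit over the first 12 digits (weights 1, 3, 1, 3, ...).
    s = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(first12))
    return str((10 - s % 10) % 10)


def _check10(first9: str) -> str:
    # ISBN-10 check digit (weights 10..2, modulo 11); 10 is written as "X".
    s = sum((10 - i) * int(c) for i, c in enumerate(first9))
    r = (11 - s % 11) % 11
    return "X" if r == 10 else str(r)


def to_isbn13(isbn10: str) -> str:
    # Prefix "978", keep the 9 payload digits, recompute the check digit.
    body = "978" + isbn10[:9]
    return body + _check13(body)


def to_isbn10(isbn13: str) -> Optional[str]:
    # Only 978-prefixed ISBN-13s have an ISBN-10 counterpart.
    if not isbn13.startswith("978"):
        return None
    body = isbn13[3:12]
    return body + _check10(body)


def recognize(raw: str) -> Optional[str]:
    """Return the ISBN-13 key for a recognized ISBN form, else None."""
    s = re.sub(r"[\s-]", "", raw).upper()
    if s.startswith("ISBN"):
        s = s[4:]
    if re.fullmatch(r"\d{9}[\dX]", s):
        return to_isbn13(s) if _check10(s[:9]) == s[9] else None
    if re.fullmatch(r"97[89]\d{10}", s) and _check13(s[:12]) == s[12]:
        return s
    return None


def same_isbn(a: str, b: str) -> bool:
    # Two strings denote the same ISBN if they map to the same valid key.
    ka, kb = recognize(a), recognize(b)
    return ka is not None and ka == kb
```

For example, `same_isbn("1591585090", "978-1-59158-509-1")` holds, while a string with a wrong check digit is not recognized at all.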

An interesting challenge for library catalogs would arise if, e.g., the
publishing industry started to move its ISBN numbering system onto the
web and introduced URIs, also for existing ISBNs. Then we'd have two
aspects of the same thing: the web resource of the ISBN and the legacy
ISBN, the "string thing". In such cases, a skilled programmer must do the
"heavy lifting" so that a catalog can still ensure that equivalent ISBNs,
whatever their semantics, identify the same thing, since it is the user
of the library catalog who is looking for the one and only result that
may be denoted by many different forms of an identifier.
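Below is a sketch of that heavy lifting. It assumes a purely hypothetical ISBN URI namespace (`https://isbn.example.org/`), which is invented for this example and is not a real service. The sketch is a catalog index that files records under one ISBN-13 key, whether the identifier arrives in its web form or as a legacy string. `isbn_key` and `Catalog` are illustrative names, not an existing API.

```python
import re
from typing import Dict, List, Optional

# HYPOTHETICAL base URI for ISBNs on the web; invented for this illustration.
ISBN_URI_BASE = "https://isbn.example.org/"


def _ean_check(first12: str) -> str:
    # EAN-13 check digit (weights 1, 3, 1, 3, ...).
    s = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(first12))
    return str((10 - s % 10) % 10)


def isbn_key(identifier: str) -> Optional[str]:
    """Map both the web aspect and the "string thing" to one ISBN-13 key."""
    s = identifier.strip()
    if s.startswith(ISBN_URI_BASE):  # web aspect: drop the hypothetical base
        s = s[len(ISBN_URI_BASE):]
    s = re.sub(r"[\s-]", "", s).upper()
    if re.fullmatch(r"\d{9}[\dX]", s):  # legacy ISBN-10: validate, convert
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        if total % 11:
            return None
        s = "978" + s[:9] + _ean_check("978" + s[:9])
    if re.fullmatch(r"\d{13}", s) and _ean_check(s[:12]) == s[12]:
        return s
    return None


class Catalog:
    """Toy index: records are filed under the key, whatever form was used."""

    def __init__(self) -> None:
        self._records: Dict[str, List[str]] = {}

    def add(self, identifier: str, record: str) -> None:
        key = isbn_key(identifier)
        if key is None:
            raise ValueError(f"not an ISBN: {identifier!r}")
        self._records.setdefault(key, []).append(record)

    def lookup(self, identifier: str) -> List[str]:
        # The user gets one result set, however the identifier was written.
        key = isbn_key(identifier)
        return list(self._records.get(key, [])) if key else []
```

A record added under "1-59158-509-0" and another under the hypothetical URI for 9781591585091 are both returned for a lookup of "978 1 59158 509 1".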

My perception is that Bibframe does not provide any consensus mechanisms
except for "bibframe entities", which are only a small excerpt of the
Semantic Web. It remains an open question what happens when, e.g., catalogs
are used outside the scope of the library community, or merged into new
data pools, and entities, Bibframe and non-Bibframe ones alike, have to be
matched. This is not a new topic and it is not specific to the Semantic
Web, but it is a field where the Semantic Web has a strong advantage. Maybe
there is high hope for improved consensus in library catalog data through
Bibframe, but I am about to lose my optimism.

Best,

Jörg


On Sat, Jul 12, 2014 at 3:32 AM, Thomas Berger wrote:

>
> On 11.07.2014 23:49, Denenberg, Ray wrote:
>
> > * I agree with Jeff Young, who said (if it really was Jeff; hard to tell):
> > ‘The abandoned "info" URI effort leaves me skeptical that non-HTTP URIs
> > can be systematically described in general.’
> > (This is a battle that I fought for years, but I long ago accepted
> > defeat.) And I honestly think we should treat isbn, issn, etc., even
> > fully formulated URNs, as string identifiers and not try to turn these
> > into actionable URIs.
>
> Perhaps with emphasis on "we".
>
> To give two examples:
>
> The Gemeinsame Normdatei (GND) of the German National Library clearly
> has identifiers. To many of us they are known in the form of the
> example string "123799465". However, in the MARC community they are
> known as "(DE-588)123799465". The DNB website and the MARC21
> representations of the authority record give
> "http://d-nb.info/gnd/123799465" as an "other standard identifier", to be
> considered either as a "weblink" or as an identifier sourced from some
> "uri" identifier system.
>
> Following Ray's comment, maybe we have outwitted ourselves by "knowing"
> that <http://d-nb.info/gnd/123799465> is the URI for the identifier
> "123799465" and that "(DE-588)" and "http://d-nb.info/gnd/" are anyhow
> just some namespacey way to identify the identifier system for the real
> identifier following these prefixes.
>
> Maybe all three forms are just string representations of some abstract
> GND identifier of the resource: equivalent with respect to the
> resource they identify, and distinct when it comes to the different
> contexts where their use is encouraged or not permissible:
> * "(DE-588)123799465" is mandatory in MARC contexts
> * "http://d-nb.info/gnd/123799465" is the string representation for
>   the /official/ URI <http://d-nb.info/gnd/123799465>
> * "http://d-nb.info/gnd/123799465" is the string representation for
>   an officially provided actionable URI / URL
>   <http://d-nb.info/gnd/123799465>
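One way to express Thomas's point in code: parse each GND string form into one abstract (system, local identifier) pair, then render that pair back into whatever form a given context requires. This is only a sketch. The pattern for the local part is an assumption, not the official GND syntax. A bare "123799465" can be read as GND here only because the context says so.

```python
import re
from typing import NamedTuple, Optional


class AbstractId(NamedTuple):
    system: str  # label of the identifier system, e.g. "GND"
    local: str   # the identifier proper, e.g. "123799465"


# Assumed syntax of the local part (digits, possibly "X" or "-"); an
# illustrative approximation, not the official GND identifier syntax.
_LOCAL = r"(\d[\dX-]*)"

# The string forms discussed above, treated as declared-equivalent patterns.
_GND_FORMS = [
    re.compile(r"\(DE-588\)" + _LOCAL),              # MARC form
    re.compile(r"http://d-nb\.info/gnd/" + _LOCAL),  # official URI / URL form
    re.compile(_LOCAL),                              # bare form (GND by context)
]


def parse_gnd(s: str) -> Optional[AbstractId]:
    # Try each known form; all of them yield the same abstract identifier.
    for pattern in _GND_FORMS:
        m = pattern.fullmatch(s.strip())
        if m:
            return AbstractId("GND", m.group(1))
    return None


def render(aid: AbstractId, context: str) -> str:
    # Each usage context gets the one form that is permissible there.
    forms = {
        "marc": "(DE-588)" + aid.local,
        "uri": "http://d-nb.info/gnd/" + aid.local,
        "bare": aid.local,
    }
    if aid.system != "GND" or context not in forms:
        raise ValueError(f"cannot render {aid} for context {context!r}")
    return forms[context]
```

The equivalence lives in the rule table, which mirrors the idea that the body operating the identifier system declares it. Nothing is inferred from the strings themselves.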
>
> The important point is that these equivalences, transformations and
> interpretations are properties of that particular identifier system,
> and their validity is declared, guaranteed and technically maintained
> by some body (DNB) responsible for "operating" this identifier system.
> This body issues statements that identifiers like "123799465" and
> "http://d-nb.info/gnd/123799465" pertain to the same resource, that they
> may be turned into actionable URLs, and how this can be done. (One might
> argue that the equivalence of "123799465" and "(DE-588)123799465"
> is a statement issued by LC in its role as the MARC standards body,
> and there especially as maintainer of the list of organizational codes.
> Or, since these codes are defined to be ISILs, a joint statement
> of LC and the ISIL agency maintaining "DE-588" as the identifier for the
> GND as such.)
>
> [Note that the "prefix URI" <http://d-nb.info/gnd/> is not web
> actionable, and there is no evidence that this URI was ever used
> to identify the GND as a database or web application, the dataset
> of all concepts covered by individual GND records, the set of
> all GND identifiers issued so far, or the space of all possible
> GND identifiers or GND URIs.]
>
>
> Now GND and VIAF are among the few identifier systems that provide
> us with official URIs and actionable URLs at all. Many more systems
> do not have these properties, even quite recent ones like ISIL or ISNI.
>
> Consider ISBNs as another example:
>
> * There is the "old" form "1-59158-509-0" of an ISBN and the "new"
>   (EAN) form "978-1-59158-509-1" (I've chosen a publication from
>   2007 for my example to avoid discussions that one should be
>   preferred over the other).
> * If I recall correctly, the ISBN agency states that ISBNs shall
>   be used (imprinted) with dashes and the prefix "ISBN" followed by
>   a space: "ISBN 1-59158-509-0" resp. "ISBN 978-1-59158-509-1".
> * Then there are the forms "1 59158 509 0" and "978 1 59158 509 1"
>   commonly printed into the resource by US publishers.
> * Not to forget the forms "1591585090" and "9781591585091" as
>   recorded in 020$a of MARC21 records.
>
> * And there are URN:ISBN:1-59158-509-0 by RFC 3187 and
>   info:isbn/1591585090 from the "info" URI scheme/registry.
>   To my knowledge, neither of the two approaches has ever been
>   acknowledged or endorsed by the ISBN agency.
>
> All these strings are equivalent identifiers when considered /as/
> ISBNs, but again, in different usage contexts only certain representations
> are allowed: MARC21 does not allow recording "ISBN 978-1-59158-509-1"
> in field 020, although the ISBN agency declares this to be /the/ official
> form.
>
> In this situation we have many communities issuing equivalence
> statements for string representations of "abstract" ISBNs:
> - the agency (ISBN-10 <-> ISBN-13 transformation of the dashed forms)
> - (some) librarians (MARC21 form)
> - (some) publishers ("blanked" forms)
> - ??? (ubiquitous equivalence of ...-x and ...-X)
> - IETF (URN:ISBN scheme)
> - OCLC (info:isbn scheme)
> ...
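Under simplifying assumptions, these equivalence statements could be kept as separate rules, each tagged with the community that issues it, so that a system applies only the statements it trusts. The sketch below does this. The rule bodies and issuer labels are my own hypothetical simplifications of the list above.

```python
import re
from typing import Callable, FrozenSet, List, Tuple

Rule = Tuple[str, Callable[[str], str]]


def _to13(s: str) -> str:
    # ISBN-10 -> ISBN-13 (no check-digit validation in this sketch).
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return s
    body = "978" + s[:9]
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)


# Equivalence statements tagged with their (assumed) issuers, applied in order.
RULES: List[Rule] = [
    ("IETF", lambda s: re.sub(r"(?i)^urn:isbn:", "", s)),   # URN:ISBN form
    ("OCLC", lambda s: re.sub(r"(?i)^info:isbn/", "", s)),  # info:isbn form
    ("agency", lambda s: re.sub(r"^ISBN ", "", s)),         # "ISBN " prefix
    ("publishers", lambda s: s.replace(" ", "")),           # blanked forms
    ("librarians", lambda s: s.replace("-", "")),           # MARC21 form
    ("unknown", lambda s: s.upper()),                       # ...-x vs ...-X
    ("agency", _to13),                                      # ISBN-10 -> 13
]


def normalize(s: str, trusted: FrozenSet[str]) -> Tuple[str, FrozenSet[str]]:
    """Apply only trusted issuers' rules; report which rules changed `s`."""
    used = set()
    for issuer, rule in RULES:
        if issuer in trusted:
            new = rule(s)
            if new != s:
                used.add(issuer)
            s = new
    return s, frozenset(used)


def equivalent(a: str, b: str, trusted: FrozenSet[str]) -> Tuple[bool, FrozenSet[str]]:
    # Equal keys mean equivalent; also return whose statements were relied on.
    ka, ua = normalize(a.strip(), trusted)
    kb, ub = normalize(b.strip(), trusted)
    return ka == kb, ua | ub
```

Removing "IETF" from the trusted set is enough to make URN:ISBN:1-59158-509-0 and 9781591585091 look unrelated. That is the kind of detection failure described below.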
>
> I don't think bibframe will ever be able to enforce the usage of
> one of these representation styles as preferred over all of the others;
> even the ISBN agency has not been able to enforce the official
> form. And it would not be desirable to always supply the complete zoo
> of equivalent strings for every resource.
>
> Thus there will be systems (as there are people) that will not
> be able to detect that the identifier strings presented by two
> resources are equivalent within the ISBN context and actually
> represent the same (abstract) ISBN. And neither bibframe itself
> nor the kind of reasoning or dereferencing currently available in
> the semantic web will be able to remedy that.
>
> To conclude:
> - Many of our favourite identifiers are and will remain strings
>
> - Since not all of these strings are URIs, we'll have to indicate
>   what identifier system they belong to (bibframe might provide
>   a registry providing URIs for the identifier systems in a
>   vocabulary-like manner)
>
> - Also, strings which look like URIs should be accompanied by
>   information about which identifier system they are to be
>   associated with when used as identifiers: distilling a common
>   prefix from uniformly built URIs is not a permitted operation,
>   /and/ we would not know whether the thus extracted URI
>   "URN:ISBN:" should represent the ISBN identifier system as
>   such or the dataset of all resources identified by ISBNs
>   (and we probably cannot afford to neglect that distinction)
>
> - Most of our identifier systems have specific and non-trivial
>   equivalence rules for the strings (considering them to be opaque,
>   as demanded for URIs, won't be of any help), often reflecting common
>   usage in different communities. Not even the maintainers of
>   the identifier systems will have knowledge about all of these
>   convenience forms, let alone bibframe.
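The sketch below illustrates these conclusions. Identifiers stay strings. Each one is explicitly typed with its identifier system rather than having the system distilled from a prefix, and identifiers are compared only under their own system's equivalence rules. The registry URIs under `http://example.org/idsystem/` are hypothetical placeholders, and the two rule functions are simplified assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

# HYPOTHETICAL registry URIs for identifier systems; placeholders only.
ISBN = "http://example.org/idsystem/isbn"
GND = "http://example.org/idsystem/gnd"


@dataclass(frozen=True)
class TypedId:
    system: str  # URI of the identifier system, stated explicitly
    value: str   # identifier string exactly as recorded


# Per-system equivalence rules (simplified assumptions): ISBN ignores hyphens,
# blanks and the case of "x"; GND treats "(DE-588)" as an optional MARC prefix.
EQUIVALENCE: Dict[str, Callable[[str], str]] = {
    ISBN: lambda v: re.sub(r"[\s-]", "", v).upper(),
    GND: lambda v: re.sub(r"^\(DE-588\)", "", v),
}


def same_identifier(a: TypedId, b: TypedId) -> bool:
    # Strings from different systems are never considered equivalent, even if
    # they look alike; nothing is inferred from URI-like prefixes.
    if a.system != b.system:
        return False
    # Unknown systems: no rules are known, so fall back to exact comparison.
    canon = EQUIVALENCE.get(a.system, lambda v: v)
    return canon(a.value) == canon(b.value)
```

The string "123799465" typed as an ISBN and the same string typed as a GND identifier are kept apart. That separation is the point of carrying the system along with every identifier.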
>
> Best regards
> Thomas Berger
>