As a bit of history, I'll review how we've treated URIs in MARC. I'm not
sure how much bearing it has on this discussion (although we do want to
maintain compatibility), but it does point out the complexities here.
Field 856 was defined in MARC 21 as "Electronic Location" in the early
1990s, actually before the URL was finalized. We chose a holdings field
for a reason: this was considered equivalent to 852 (Location),
which is used to record the repository/holding institution in the physical
sense. The thought was that as a location it could be used either in a
bibliographic record or a holdings record. (Now a number of institutions
use it as a holdings field, but I won't go into that.)
Several years later LC was interested in recording persistent names for
electronic items. Since the name needed to be associated with the
particular location, we used two subfields of 856: $d (Path) and $f
(Electronic name). (Many subfields were defined because in the early days
we didn't know whether we wanted to parse the pieces of the URL.) We used
$d for the "aggregate name", actually a directory that brought together
different files of the same intellectual object, and $f for the particular
filename;
together these served as a persistent ID (we knew we were going to move
the files from one server to another, so didn't want to use $u for the
URL). As things progressed and we decided to use handles for persistent
names (and we considered a handle a URN, even though it wasn't officially
registered), we took a proposal to define a subfield of 856 for a URN
($g). Shortly after that the MARC Advisory Committee considered a proposal
to make $g obsolete and redefine $u as "URI", because it was argued (quite
strongly by a prominent W3C member) that the distinction between URL and
URN was not needed and all should be considered URIs. Since then LC has
supplied in its records handles that are resolvable by attaching an http:
proxy server name (which is essentially a URL with a persistent name
attached) and those have been recorded in 856$u.
The 024 field is used for "Other Standard Identifier" (i.e. other than
those that have their own fields) and includes various kinds of
identifiers, such as SICI, International Standard Recording Code, etc. A
proposal recently approved specified using this field for the
International Standard Text Code (ISTC) when appropriate. There is no
definition in that field now for recording URIs that are persistent names.
One could argue that there should be, given Ray's statements below.
Ray's arguments make a lot of sense, but I am mainly concerned about the
ability of the person creating the metadata to distinguish between a URI
as persistent name and a URI as a locator. This is not immediately
apparent by looking at the URI string. Or if you don't know, would you
always record it twice? That brings up the problem of redundancy.
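To make the "record it twice" case concrete, here is a rough sketch in
Python of what a record might look like under Ray's proposal below. It is
not real MODS: namespaces are omitted, the attribute name sourceKind on
<location> is a made-up placeholder for whatever the physical/electronic
attribute ends up being called, and the handle-style URI is invented.

```python
import xml.etree.ElementTree as ET


def build_record(uri, is_identifier, is_locator):
    """Build a simplified MODS-like record for one URI.

    The cataloger has to decide up front whether the URI is a persistent
    name, a locator, or both; nothing in the string itself tells us.
    """
    mods = ET.Element("mods")
    if is_identifier:
        # Persistent-name role: <identifier> of type URI.
        ident = ET.SubElement(mods, "identifier", {"type": "uri"})
        ident.text = uri
    if is_locator:
        # Access role: <location> flagged as electronic.
        # "sourceKind" is a hypothetical attribute name, not part of MODS.
        loc = ET.SubElement(mods, "location", {"sourceKind": "electronic"})
        loc.text = uri
    return mods


# Hypothetical URI serving both roles, so the same string appears twice.
record = build_record("http://hdl.example.org/demo/123", True, True)
print(ET.tostring(record, encoding="unicode"))
```

If the cataloger cannot tell which role applies, passing True for both is
the only safe choice, which is exactly the redundancy I am worried about.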
I'd be interested in further thoughts about this issue.
Rebecca
On Thu, 11 Sep 2003, Ray Denenberg, Library of Congress wrote:
> I'd like to propose for consideration a MODS change, to be applied in 3.0.
> (I think this is an important change, and that the impact on the schema is
> fairly small.)
>
> I suppose I had thought that if you have a URL to access an item and you want
> to include it in a MODS record for that item, you could put it in the
> <location> element. Well, you can't. <location> is essentially
> physical. It's defined as sourceType with an authority attribute for an
> organization code. The authority can be omitted in which case it's just a
> string, but there isn't any way to indicate it's a URL. It appears that the
> prescribed way to code a URL is as an identifier (the <identifier> element)
> of type URI. Recent discussion of 'date accessed' has brought this to my
> attention. (I think Bruce brought it up. But I should have realized this
> long ago.)
>
> Coding a URL as an identifier, when the intent is to provide a URL for
> access, is a big mistake. I'm willing to elaborate profusely on this point
> if anyone needs to be convinced.
>
> To be clear: if the intent of supplying a URI is to provide an identifier --
> even when that string also happens to be a URL that can be used to access
> the resource -- by all means, put it in the <identifier> element and call it
> an identifier. But if the intent is also to provide location information, we
> need somewhere in addition to put it (if that means putting an identical
> string in two places, so be it), and the logical place would be <location>, I
> think.
>
> My suggestion is to add an attribute to <location> to indicate if it's a
> physical or electronic source (values 'physical' and 'electronic' or please
> suggest alternative values); in the latter case a URL would be assumed.
>
> This will take a little fiddling with the definition and references to
> sourceType, but not much.
>
> Please comment soon on this proposal, as we want to get 3.0 out.
>
> --Ray
>