>>>>> "RG" == Rebecca S Guenther <[log in to unmask]> writes:

RG> There are various ways to approach this. One problem is that in terms
RG> of a MARC to MODS conversion where the data being converted has used
RG> the transcription rules of AACR2, it is unfortunately not that
RG> predictable the way the data will appear. So the simple case is
RG> "c1999", but there are all kinds of variations in what would be in the
RG> 260$c. If you look at the format examples there are things like "1979
RG> printing, c1975", "April 15 1977". You could also have "ca. 1820". So
RG> it may not be a good idea to fool with the data and try to take out the
RG> "c", since the program would have to look at the data and will only
RG> catch some of them.

[sample options deleted]

One would not blindly remove any "c" in an AACR2 date, but rather write a
parser to cover a variety of cases.  This is something of a nightmare,
given that the date in AACR2 is little better than free text, but the
majority of cases can be identified and handled.  Currently, those of us
who need to do date comparisons on exported MARC data have to write these
parsers anyhow.  Here are a few examples of dates I had to parse for a
recent project:

    1857
    183-
    c1856
    ca. 1852
    [between 1842 and 1844]
    [not before 1852]
    1842?
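
For illustration only, a parser for just these seven patterns might look
something like the Python sketch below.  The function name, the qualifier
labels, and the choice to join a range with a slash are my own assumptions,
not anything defined by AACR2 or MODS, and real data will need many more
patterns than this:

```python
import re

# Hypothetical pattern table: (regular expression, qualifier label).
# Each pattern must match the whole string, so order is not significant.
PATTERNS = [
    (r"\[between ([0-9]{4}) and ([0-9]{4})\]", "between"),
    (r"\[not before ([0-9]{4})\]", "not before"),
    (r"ca\. ([0-9]{4})", "circa"),
    (r"c([0-9]{4})", "copyright"),
    (r"([0-9]{4})\?", "probable"),
    (r"([0-9]{3})-", "within decade"),
    (r"([0-9]{4})", "exact"),
]


def parse_aacr2_date(text):
    """Return (value, qualifier), or None if no pattern matches."""
    s = text.strip()
    for pattern, qualifier in PATTERNS:
        m = re.fullmatch(pattern, s)
        if m is None:
            continue
        if qualifier == "between":
            # Join the two years as a start/end pair.
            value = m.group(1) + "/" + m.group(2)
        elif qualifier == "within decade":
            # "183-" becomes the first year of the decade.
            value = m.group(1) + "0"
        else:
            value = m.group(1)
        return value, qualifier
    return None
```

Anything the table does not recognize comes back as None, so the caller
can decide whether to keep the original string for display only.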

From this perspective, it would be nice if MODS were more predictable in
its date formats.  As Mr. Tennant points out, additional information, if
deemed necessary, could be provided by attributes.  The conventions below
have not been carefully thought through and are for illustration only:

    <dateIssued encoding="iso8601">1857</dateIssued>
    <dateIssued encoding="iso8601" inferred="within decade">1830</dateIssued>
    <dateIssued encoding="iso8601" type="copyright">1856</dateIssued>
    <dateIssued encoding="iso8601" inferred="circa">1852</dateIssued>
    <dateIssued encoding="iso8601" inferred="between">1842/1844</dateIssued>
    <dateIssued encoding="iso8601" inferred="not before">1852</dateIssued>
    <dateIssued encoding="iso8601" inferred="probable">1842</dateIssued>

When the 260$c contains multiple dates, they are separated by commas and
can be reliably split into multiple <dateIssued> elements.  The above
example of "1979 printing, c1975" becomes something like:

    <dateIssued type="printing">1979</dateIssued>
    <dateIssued type="copyright">1975</dateIssued>
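
A rough Python sketch of that split follows.  It assumes each
comma-separated piece is a bare year, "YYYY printing", or "cYYYY"; the
function name split_260c is hypothetical, and anything else raises an
error rather than being guessed at:

```python
import re
import xml.etree.ElementTree as ET


def split_260c(field):
    """Split a 260$c value on commas into <dateIssued> strings."""
    results = []
    for part in field.split(","):
        part = part.strip()
        element = ET.Element("dateIssued")
        printing = re.fullmatch(r"([0-9]{4}) printing", part)
        copyright_ = re.fullmatch(r"c([0-9]{4})", part)
        plain = re.fullmatch(r"[0-9]{4}", part)
        if printing:
            element.set("type", "printing")
            element.text = printing.group(1)
        elif copyright_:
            element.set("type", "copyright")
            element.text = copyright_.group(1)
        elif plain:
            # A bare year gets no type attribute.
            element.text = part
        else:
            raise ValueError("unrecognized date piece: %r" % part)
        # encoding="unicode" returns a str with no XML declaration.
        results.append(ET.tostring(element, encoding="unicode"))
    return results
```

Calling split_260c("1979 printing, c1975") yields the two elements shown
above, one string per element.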

Correct AACR2 terms and punctuation could then be supplied for display by
an XSLT stylesheet, if desired.  This goes further than the guidelines
currently do in moving responsibility for ISBD punctuation into the
stylesheets, but the payoff is functional date parsing.

If absolutely necessary, extremely nasty, unparsable date info could
be marked so that programs can treat it literally for display, and avoid it
like the plague for date matching:

    <dateIssued type="unstructured">1953 [1935]</dateIssued>

(Though here there could, if needed, be some other way to record a
misprinted date.)
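
On the consuming side, a date-matching program could then skip such
elements outright.  The Python sketch below assumes un-namespaced
<dateIssued> elements wrapped in some parent element and the illustrative
type="unstructured" marker above; the function name is hypothetical, and
real MODS records carry an XML namespace that this ignores:

```python
import re
import xml.etree.ElementTree as ET


def comparable_years(record_xml):
    """Return the four-digit years that are safe to use for matching."""
    root = ET.fromstring(record_xml)
    years = []
    for element in root.iter("dateIssued"):
        if element.get("type") == "unstructured":
            # Display-only text: never use it for comparisons.
            continue
        text = (element.text or "").strip()
        if re.fullmatch(r"[0-9]{4}", text):
            years.append(int(text))
    return years
```

Ranges and other non-year values are simply left out of the result here;
a fuller program would handle them according to their attributes.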

The point is that a well-defined date format is an asset of MODS,
especially when we consider that applications will need to do date
processing on MODS data.  Conformance to such date formats should be
explicitly encouraged if not required.  Any need to support AACR2 date
formatting and a MARC to MODS crosswalk should be secondary to support for
this more basic need.


Tod A. Olson <[log in to unmask]>     "How do you know I'm mad?" said Alice.
Sr. Programmer / Analyst            "If you weren't mad, you wouldn't have
The University of Chicago Library    come here," said the Cat.