You almost had me, up until the "baby and bathwater" statement. MARC
is the bathwater; the baby is our data. There is no need to lose any
data as we move forward, but we can lose some of the oddities of MARC
that are making it hard to add new information to our record format.
For example, we do not have a way in MARC to associate an identifier
with a particular set of subfields within a field. Although a $0 has
been added to the MARC format so that it can accept some of the RDA
data, the subfield remains ambiguous in some fields, and therefore
isn't usable in the intended way (substituting an identifier for a
particular data element).

ISO 2709, the thing that is structured with a Leader, directory and a
character string, is a very neat data transportation format -- genius,
really. (All hail Henriette! We were so lucky she came to work for
libraries.) What we have put into that format, however, is now rife
with inconsistencies and ambiguities. If we could have a MARC do-over
within ISO 2709 that would be great. However, we would once again find
that over time ISO 2709 doesn't scale. We have fields that have used
$a-$z and have nowhere to go. Should we have 2-character subfield
codes? That doesn't solve the problem of ordering which seems to
plague some systems. The 3-digit limitation on tags is also a
hindrance, especially since we have designated tag areas to somewhat
align with ISBD areas. Some systems have used letters in their tags to
expand the record.
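
To make those fixed widths concrete, here is a minimal Python sketch of the ISO 2709 layout as MARC 21 uses it: a 24-character Leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), and then the character string. The function names and the sample field data are made up for illustration; this is not a full MARC library and it skips character-set handling.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Serialize [(tag, data_bytes), ...] into one ISO 2709 record (MARC 21 widths)."""
    directory = b""
    body = b""
    for tag, data in fields:
        if len(tag) != 3:
            raise ValueError("a directory entry has room for exactly 3 tag characters")
        data = data + FT
        if len(data) > 9999:
            raise ValueError("field length must fit in 4 digits")
        # Directory entry: tag (3) + field length (4) + starting position (5).
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1   # Leader + directory + directory terminator
    total = base + len(body) + 1     # plus the record terminator
    if total > 99999:
        raise ValueError("record length must fit in 5 digits")
    # Leader: length (5), 'nam a', indicator count 2, subfield code length 2,
    # base address of data (5), 3 blanks, entry map '4500'.
    leader = f"{total:05d}nam a22{base:05d}   4500".encode("ascii")
    return leader + directory + FT + body + RT


def parse_record(raw):
    """Walk the directory and return [(tag, data_bytes), ...]."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Slice the field out of the character string, dropping its terminator.
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return fields


def subfields(data):
    """Split a data field into its indicators and (code, value) pairs."""
    indicators, *chunks = data.split(SF)
    # Each chunk starts with a single-character code: hence the $a-$z ceiling.
    return indicators.decode(), [(c[:1].decode(), c[1:].decode()) for c in chunks]


# Made-up sample data, for illustration only.
sample = [
    ("245", b"10" + SF + b"aA made-up title"),
    ("700", b"1 " + SF + b"aSurname, Forename" + SF + b"tA made-up work"),
]
record = build_record(sample)
```

Every limit mentioned above shows up as a hard-coded width: a 4-character tag or a second subfield-code character would change the size of every directory entry or every subfield, so existing parsers would misread the record.
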
I think we should pay less attention to the physical format of our
data and more to the CONTENT. I've been working on an analysis of MARC
content for a while as a kind of hobby. If we define our
content clearly, then we can choose a serialization (or two or three)
that simply carries our data; it wouldn't define the data's structure,
nor would it limit its growth.
MARC as Data: A start. Code4Lib Journal.
Futurelib wiki. MARC analysis.
Quoting Jeffrey Trimble:
> I think the biggest crux of the problem is the eventual replacement
> for MARC--if that is even absolutely necessary.
> Working in the IT environment, and seeing different types of data,
> has actually given me an appreciation for the MARC record
> structure as a data exchange format. (Notice I didn't say user format.)
> I'm currently working with about 6 different data storage types and
> getting the data between different systems has been a real
> problem. Everyone immediately says "oh, this is in an XML format".
> Let me tell you, XML is a markup language only, and not
> a real storage or data exchange format. The problem is that I'm
> spending a considerable amount of time writing mapping software
> just to load and manipulate data. Our vendors (like SCT Sungard,
> Novell, Cisco, Microsoft) are anything but sympathetic.
> We've lost lots of minute pieces of data that have nowhere to go, and
> attempting to remap is making me rethink a MARC replacement.
> We should remember that Henriette Avram headed the MARC Pilot
> project in the 1960s and she was a programmer. But more important,
> this was not commercial software, or commercial standards, but
> custom software for LC and LC's standards developed as the format
> I wonder today what our programmers would come up with as a
> replacement. Our IT folks on our campus were comparing our current
> storage format (MARC21) with our SCT Sungard Banner system and quite
> frankly said they were 'envious'. The folks that wrote
> Banner are curious too since we load our Patron data in a MARC
> format--and actually can store it this way. I'll let you know how
> the meeting goes since Banner wants to look at this thing called 'MARC'.
> I wouldn't be so quick just to throw it [MARC] out before having a
> format in place that surpasses the usability of the MARC data exchange
> format. That would include compact storage to exchange the data, a
> user interface that is easy to work with and not cumbersome like XML,
> and vendors that will adopt the format **before** implementing the new
> standard (vendors like III, Ex Libris, VTLS, OCLC, etc.).
> The cost of changing over to a new format may prove to be too
> prohibitive for most. The commercial ILS vendors are just not going
> to re-write their systems and give them to the customer free. This
> could be one of the most expensive things that the library community
> will undertake, more expensive than any migration from AACR2 to RDA
> could ever be. If I
> were an automation vendor, I'd probably see this as a goldmine--"you
> have to buy my new ILS I'm selling for $ 3 million if you want to
> get off MARC". That's what I would be thinking as a business person.
> (Oh Boy, I need to change careers and get ready to make lots of money!)
> So, when will this post-MARC environment happen? With the economy
> in the dumps, and my institution in a financial crisis (who isn't)
> it may be some time before this all happens--we all may be retired.
> I would say, if the replacement isn't as robust as MARC, then it is
> doomed to fail.
> I'd like to propose that an examination of the MARC format be looked
> at again--could it be expanded? (The LDR and Directory lead one to
> believe so)
> Who said we can't have over 999 tag numbers? (Expand it to 4 digits.)
> If the storage size is limited to 9,999 characters, expand it in
> the LDR.
> What about subfield codes? Redefine it to three
> characters--combined subfield codes. This format is precise,
> compact and completely expandable.
> Don't go throwing out the baby with the bath water.
> Jeff Trimble
> On Sep 15, 2011, at 11:47 AM, Karen Coyle wrote:
>> What I haven't seen discussed here is the frequency with which this
>> data is needed. When I post about making place of publication an
>> actual place data element, I'm told there is rarely a need for it.
>> How often is a precise comparison of title pages of essence? Is it
>> worth making copies of all title pages for that number of
>> instances? Does this apply to all works, or is there a niche where
>> this has more use than, say, currently published trade books?
>> What this comes down to is a need to look at all of our data
>> practices and ask ourselves:
>> - who needs this?
>> - what is the context in which they need it?
>> - how often is it used?
>> - is there a more efficient way to provide this information?
>> - is there a better way to achieve this goal?
>> If I were being asked to create a new metadata scheme for Widgets,
>> Inc., those are among the many questions I would ask of the
>> providers and users of the information.
>> One of the big difficulties that I see for this effort is that most
>> of us come to the task with deeply ingrained practices and
>> assumptions. We won't go very far forward if we can't re-visit all
>> of these and decide what REALLY is needed today. This is why I
>> recommend that there be some non-librarian IT folks consulted. As I
>> said in a blog post, in fact that is exactly how MARC was developed
>> - by an IT person who (fortunately!) was very good at listening to
>> Quoting Ed Jones:
>>> Actually, I was thinking more of page images, trying to look at
>>> the two kinds of data. I was viewing transcribed data as serving
>>> the function of "Is this what you were looking for?" in which
>>> case, as Robert points out, transcription is inexact, and a page
>>> image would be more faithful. In a world where keyword searching
>>> is the default mode for most of us, I see structured access
>>> points, etc.--the other kind of data--as means of slicing and
>>> dicing the result set and triggering related-entity searches. The
>>> whole text would indeed be present in any contemporary e-text
>>> file--and even as imperfect OCR in digitized older resources--to
>>> facilitate keyword searching, but I wasn't thinking of any
>>> accompanying metadata. I wanted to try to look at the question
>>> purely in terms of the two kinds of data--three, if one includes
>>> the jumble of extracted text--and ask whether, if the purpose of
>>> the transcribed sort is really to answer this question-- "Is this
>>> what you were looking for?"--whether a page image or two serves
>>> the purpose better.
>>> Sent from my iPad
>>> On Sep 14, 2011, at 4:38 PM, "Mark Ehlert" wrote:
>>>> J. McRee Elrod wrote:
>>>>> Ed Jones wrote:
>>>>>> Would transcription still be necessary if a title page (or analogous
>>>>>> source for other types of resource) image were routinely included ...
>>>>> We include "thumbnails" of cover images for a major client (30,000
>>>>> records so far). But they are images, and can not be keyword
>>>> Ed's not referring to keyword searching an image with text. He's
>>>> referring to, say, an ePub or PDF file of text (title page or whole
>>>> work) within the coding of which is metadata that can be searched on
>>>> or extracted and put into a database. You might be familiar with EXIF
>>>> and image metadata, which is somewhat similar.
>>>> Mark K. Ehlert
>>>> Coordinator
>>>> Bibliographic & Technical Services (BATS) Unit
>>>> Phone: 612-624-0805
>>>>
>>>> Minitex
>>>> University of Minnesota
>>>> 15 Andersen Library
>>>> 222 21st Avenue South
>>>> Minneapolis, MN 55455-0439
>> Karen Coyle
>> http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
> Jeffrey Trimble
> System Librarian
> William F. Maag Library
> Youngstown State University
> 330.941.2483 (Office)
> "For he is the Kwisatz Haderach..."
http://kcoyle.net