On Sep 25, 2011, at 12:17 PM, Karen Coyle wrote:
> Jeffrey,
>
> You almost had me, up until the "baby and bathwater" statement. MARC is the bathwater; the baby is our data. There is no need to lose any data as we move forward, but we can lose some of the oddities of MARC that are making it hard to add new information to our record format. For example, we do not have a way in MARC to associate an identifier with a particular set of subfields within a field. Although a $0 has been added to the MARC format so that it can accept some of the RDA data, the subfield remains ambiguous in some fields, and therefore isn't usable in the intended way (substituting an identifier for a particular data element).
>
> ISO 2709, the thing that is structured with a Leader, directory and a character string, is a very neat data transportation format -- genius, really. (All hail Henriette! We were so lucky she came to work for libraries.) What we have put into that format, however, is now rife with inconsistencies and ambiguities. If we could have a MARC do-over within ISO 2709 that would be great. However, we would once again find that over time ISO 2709 doesn't scale. We have fields that have used $a-$z and have nowhere to go. Should we have 2-character subfield codes? That doesn't solve the problem of ordering which seems to plague some systems. The 3-digit limitation on tags is also a hindrance, especially since we have designated tag areas to somewhat align with ISBD areas. Some systems have used letters in their tags to expand the record.
>
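Karen's point about ISO 2709 being a neat transport format is easy to see in code. Here is a minimal sketch (stdlib only, with a hypothetical one-field record for illustration) of how the leader, directory, and data area fit together:

```python
# A sketch, not production code: read the leader and directory of one
# ISO 2709 / MARC record. Offsets follow the standard: leader bytes
# 00-04 = record length, 12-16 = base address of data; 12-byte
# directory entries of 3-digit tag + 4-digit length + 5-digit start.

FT = b"\x1e"  # field terminator
SF = b"\x1f"  # subfield delimiter

def parse_2709(raw: bytes):
    """Return (leader, {tag: field-bytes}) for one ISO 2709 record."""
    leader = raw[:24]
    base = int(leader[12:17])             # base address of data
    directory = raw[24:base - 1]          # entries, minus the 0x1E terminator
    fields = {}
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3].decode()
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        fields[tag] = raw[base + start:base + start + length].rstrip(FT)
    return leader, fields

# Hypothetical minimal record: leader + one directory entry + a 245 field.
raw = (b"00048nam  2200037   4500"
       b"245001000000\x1e"
       b"10\x1faTitle\x1e\x1d")
leader, fields = parse_2709(raw)
```

The directory makes random access cheap: you can jump straight to any field without scanning the data area, which is exactly the "genius" of the design.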
So let's spend the money and the time to revise the MARC structure a little to make sense of things. We've done this before: we had format integration in 1993/1994. And Henriette Avram even admitted that her biggest mistake was creating the separate bibliographic formats first and the authority format afterward. If she had it to do all over again, she would have created the Authority MARC format first and then a Bibliographic MARC format.
Now to the limitations. I herewith make a proposal, and I will even be so bold as to say this is something we should take to MARBI. As for ISO 2709: let's change it rather than let it box us in. I propose:
1. Record length. Adjust the Leader so that bytes 00-12 hold the record length, pushing the current bytes 05-23 out to positions 13-31. Thirteen digits gives you a record of just under 10 TB. That's one hell of a record. Do you think you have enough content for that large a record? You could include the actual printed book.
2. Expand the MARC tag to 4 numeric characters, running from 0001 to 9999. That too is quite big: room for many repeated fields, and many more fields to define. Oh boy, can we define fields.
3. Indicator count. Again, expand it to 3. We may not use it, but let's get rolling.
4. Subfield code count. Again, expand it to 3, so that after the delimiter ($) the computer knows to expect a 1- or 2-byte subfield code. I can see us using $aa, $ab, $ac (or, if you go to a 4-character count, something like $a-b and $d-a, or even $a$b and $d$a, or a different delimiter sign as a secondary delimiter).
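To make the four points above concrete, here is a sketch of how a field in the expanded structure might be parsed. Every width here (3 indicators, 2-character subfield codes) is hypothetical, taken only from the proposal itself, not from any standard:

```python
# Hypothetical "expanded MARC" field layout sketched from the proposal:
#   - 3 indicator positions instead of 2
#   - 2-character subfield codes after the 0x1F delimiter ($aa, $ab, ...)
# These widths are illustrative only; nothing here is standardized.

SF = b"\x1f"  # subfield delimiter

def parse_expanded_field(field: bytes):
    """Split one hypothetical expanded field into (indicators, subfields)."""
    indicators = field[:3].decode()          # 3 indicator bytes
    subfields = []
    for chunk in field[3:].split(SF)[1:]:    # skip the bytes before first 0x1F
        code, value = chunk[:2].decode(), chunk[2:].decode()
        subfields.append((code, value))
    return indicators, subfields

ind, subs = parse_expanded_field(b"10 \x1faaMain title\x1fabA subtitle")
```

Note how little changes for a parser: the delimiter logic is identical to today's MARC, only the slice widths differ, which is why the conversion programs would be straightforward to write.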
So you want more content? I've just answered your question, plain and simple, with little disruption to the current structure. We could easily write conversion programs to deal with current MARC records.
We did something similar for the Y2K glitch in 1999/2000. Most mainframes at that time stored only the last two digits of the year in their VSAM records. What was the answer? Well, one option was to spend trillions of dollars, throw out the mainframe, and buy a Unix box (Unix stores all 4 digits of the year, at least on BSD and AIX). Instead, what most shops did was address the VSAM record storage issue and expand the field by 2 bytes. That was no easy task, but it was cheaper than buying new software. (Oh yes, IBM was happy to sell you AIX, and Sun told you that you needed to get off the mainframe, that Y2K was going to make you lose your hardware and it wasn't fixable.)
Now, what I've proposed is simple and straightforward, and most of our ILS vendors and OCLC could do it in a matter of months, maybe a year more. We'd buy ourselves several decades of time, until technology is so advanced we don't even need to worry about the printed word.
I'm no Luddite, but in my experience as a programmer, MARC works and XML is just crap. Every time I have to deal with XML, I start charging customers more (in this case, I start to whine a lot at my place of business).
Institutional Repositories have been using XML with limited success. In fact, DSpace software now allows you to contribute using an Excel spreadsheet because the XML coding is so difficult for the end user.
We've stopped using XML here at YSU for DSpace contribution; it's Excel and then into Postgres. I'm finishing up a daemon to take an OCLC export and send it over to DSpace, directly into the Postgres database, skipping the XML step entirely. Much simpler, less work, and our staff are much happier.
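The core of that daemon idea is just a mapping step: take a parsed MARC value and emit a parameterized INSERT. A hedged sketch, with entirely hypothetical table and column names (not DSpace's actual schema, which the real daemon would have to target):

```python
# Sketch of the export-to-Postgres mapping: one MARC-derived value
# becomes one parameterized INSERT. "repository_metadata" and its
# columns are hypothetical placeholders for illustration only.

def to_insert(handle: str, title: str):
    """Build (sql, params) for inserting one title value."""
    sql = ("INSERT INTO repository_metadata (item_handle, element, text_value) "
           "VALUES (%s, %s, %s)")
    return sql, (handle, "dc.title", title)

sql, params = to_insert("ysu/1234", "Main title")
# With a Postgres driver such as psycopg2 you would then run:
#   cursor.execute(sql, params)
```

Using parameterized queries (rather than string concatenation) keeps arbitrary cataloging text, quotes and all, from breaking the SQL.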
I'm not trying to derail LC and its move; what I'm saying is: think long and hard. This is a very expensive move that will make RDA seem like peanuts, and we already know how much RDA is disliked by many in our community.
Finally, I have to remind us all that we aren't even using all of the current MARC features, and yet we want to replace it. How do we know it needs to be replaced when we haven't even scratched the surface of enlarging it, restructuring it, changing it up? It was originally a communications format, not an end-user input format. That said, I can't wait to see some poor cataloger given a blank OCLC screen to input an original record and type in XML coding by hand. Directors will really want to get rid of catalogers then, because that is really kludgy.
I'm really glad we are having this conversation. It is long overdue. We need to continue the dialogue, with respect, and we need to begin asking the simple question "How do we know if MARC is dying if we haven't attempted to push it further?" And right now, I haven't seen a serious push to restructure and expand it.
Best wishes in programming.
--Jeff Trimble
> I think we should pay less attention to the physical format of our data and more to the CONTENT. I've been working on an analysis of MARC content [1] [2] for a while as a kind of hobby. If we define our content clearly, then we can choose a serialization (or two or three) that simply carries our data; it wouldn't define the data's structure, nor would it limit its growth.
>
> kc
> [1] MARC as Data: A start. Code4lib journal. http://journal.code4lib.org/articles/5468
> [2] Futurelib wiki. MARC analysis. http://futurelib.pbworks.com/w/page/29114548/MARC%20elements
>
Jeffrey Trimble
System Librarian
William F. Maag Library
Youngstown State University
330.941.2483 (Office)
[log in to unmask]
http://www.maag.ysu.edu
http://digital.maag.ysu.edu
"For he is the Kwisatz Haderach..."