On Sep 25, 2011, at 10:46 PM, Karen Coyle wrote:
>> Yes, are you also speaking of a VOC $ that links it to the authority record?
> Not sure what you mean... a subfield? It could be a subfield, or it could be a defined "unit" that has its own divisions within it (since MARC has only tags and subfields there are only two levels of organization - it could be that more are needed).
VOC is an arrangement whereby the bib record stores only an internal linking ID # for the authority record (but displays the real heading). When you change the 1XX in an authority record, presto,
the bib displays the new 1XX. No global change is needed. But you now have to "assemble" the bibliographic record to display all of its parts. In the public display, this actually
makes better sense because of the efficiencies gained from real-time changes.
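
A minimal sketch of that linking idea, in Python. The record layout, the ID "a1001", and the function name assemble_display are all hypothetical and are not taken from any particular ILS; the point is only that the bib record holds an ID and the heading is looked up at display time.

```python
# Hypothetical in-memory stores; a real system would keep these in database tables.
authority_records = {
    "a1001": {"heading": "Twain, Mark, 1835-1910"},
}

# Each bib record stores only the linking ID, never the heading text itself.
bib_records = [
    {"title": "Adventures of Huckleberry Finn", "author_id": "a1001"},
    {"title": "The Innocents Abroad", "author_id": "a1001"},
]


def assemble_display(bib, authorities):
    """Build the display form of a bib record by resolving its authority link."""
    heading = authorities[bib["author_id"]]["heading"]
    return f"{heading}. {bib['title']}"


before = [assemble_display(b, authority_records) for b in bib_records]

# Change the 1XX heading once, in the authority record only.
authority_records["a1001"]["heading"] = "Clemens, Samuel Langhorne, 1835-1910"

# Every linked bib record now displays the new heading; no global change was run.
after = [assemble_display(b, authority_records) for b in bib_records]
```

The cost is the lookup in assemble_display on every display, which is the "assembly" step mentioned above.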
>>> - that it uses data where possible, not text
>> Please explain what the difference between data and text is.
> There are some gray areas, for sure, but essentially this is what I mean by text, although you could also refer to it as the display form of the information:
> xii, 356 p., 23 cm.
> and this is one possible way to express that as data (this is conceptual, obviously, but could be coded in XML or JSON or RDF):
> first pagination: xii
>    coded: roman numerals
> second pagination: 356
>    coded: (code or identifier for: arabic numerals)
> unit: pages
> height: 23
>    unit: (code or identifier for: centimeters)
> Data is coded specifically for computer manipulation; plain text is intended to be read and interpreted by humans.
> One way to move from plain text to data is to use codes or identifiers for all controlled vocabularies so that you are manipulating the identity, not a text string. That allows you to vary the display form without modifying the meaning of your data, for example using displays in different languages.
> It's a pretty basic IT concept, which intends to make as much of your information "processable" as possible. If you want to be able to sort your address book by zip code, zip code can't be buried in a single string that represents the address. If you want to put books in order by their size (height) or size (number of pages) it is darned difficult to do given the text version, above. Essentially, library records tend to have their information in the form in which it should display, rather than in a form that can be acted on by a machine. You can derive the display form from the more heavily coded data form much more easily than you can derive data from the display form.
> And before anyone objects, yes, you can write algorithms that pull out this data, at least from any fields that aren't too complex or that have problems with punctuation. But that assumes that you have the information in your possession and will be able to normalize it before use. If we move our data onto the web and try to implement connections between our data and other peoples' data, we need to present *already usable* data, not something that has to be munged with a complex algorithm before it can be used. Basically, the underlying stuff of our records has to be more rigorous, and we can't expect others to jump through hoops to try to figure out how tall a book is.
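
To make the quoted distinction concrete, here is a rough Python sketch of the "data" form of that physical description. The class names, field names, and the small label table are illustrative assumptions, not part of MARC or any proposed format; plain strings such as "pages" and "cm" stand in for real codes or identifiers.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """One pagination sequence, e.g. 'xii' coded as roman numerals."""
    value: str
    numeral_system: str  # "roman" or "arabic"; stand-ins for real identifiers


@dataclass
class PhysicalDescription:
    paginations: list
    extent_unit: str  # stand-in identifier, here "pages"
    height: int
    height_unit: str  # stand-in identifier, here "cm"


# Display labels keyed by language; the data itself never changes.
UNIT_LABELS = {
    "en": {"pages": "p.", "cm": "cm."},
    "de": {"pages": "S.", "cm": "cm"},
}


def display(desc, lang="en"):
    """Derive the display string from the coded data."""
    labels = UNIT_LABELS[lang]
    pages = ", ".join(p.value for p in desc.paginations)
    return f"{pages} {labels[desc.extent_unit]}, {desc.height} {labels[desc.height_unit]}"


book = PhysicalDescription(
    paginations=[Extent("xii", "roman"), Extent("356", "arabic")],
    extent_unit="pages",
    height=23,
    height_unit="cm",
)
taller = PhysicalDescription([Extent("90", "arabic")], "pages", 31, "cm")

# Ordering by size is a one-liner once height is a number rather than text.
by_height = sorted([taller, book], key=lambda d: d.height)
```

Here display(book) gives back the text form quoted above, and display(book, "de") varies the labels without touching the data.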
Yes, I agree that pulling the data out can be done. Is this efficient? Well, computer processing is cheap, but you bring up an important point regarding the usability of data. Our patron data is stored with the address in one complete field. Is this easy to retrieve? Yes. Is it easy to sort? Not really. I usually have to pull the data into an Excel spreadsheet or write code to do this. This is where the MARC record would need more definition, and how you code the data would have to be thought out logically. Of course the same applies in a non-MARC environment.
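
As an illustration of what "pulling the data out" looks like, here is a small Python sketch that parses only the simple collation pattern quoted above. The pattern and the function name parse_collation are my own assumptions; real 300 fields vary far more than this handles, which is exactly the fragility being discussed.

```python
import re

# Handles only strings shaped like "xii, 356 p., 23 cm." (an assumed, simplified pattern).
PATTERN = re.compile(
    r"^(?:(?P<prelim>[ivxlcdm]+),\s*)?"  # optional roman-numeral preliminary pages
    r"(?P<pages>\d+)\s*p\.,\s*"          # main pagination in arabic numerals
    r"(?P<height>\d+)\s*cm\.?$",         # height in centimeters
    re.IGNORECASE,
)


def parse_collation(text):
    """Return a dict of extracted values, or None if the text does not fit the pattern."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "prelim": match.group("prelim"),
        "pages": int(match.group("pages")),
        "height_cm": int(match.group("height")),
    }


simple = parse_collation("xii, 356 p., 23 cm.")
messy = parse_collation("1 v. (various pagings) : ill. ; 28 cm.")
```

The simple statement parses cleanly; the second one, an ordinary enough description, returns None and would need yet another rule.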
I'm not so sure our end users really care about the exactness of collation statements. I know from our LibQual surveys and other surveys that most of our users want the title and the call #. They couldn't care less about the # of pages or the size of the item. But for collection development purposes (and research purposes) it is important. (And we all know that 300 $c is used to determine whether an item is oversize; in our case we have three locations based on size: under 19 cm, over 28 cm, and everything in between.) Though this is a "down in the weeds" view, not one looking far above at the bigger issues.
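
For what it's worth, the size rule is trivial once the height is a number. A quick Python sketch; the location names are made up, and treating exactly 19 cm and exactly 28 cm as regular stacks is my assumption about the boundaries.

```python
def shelving_location(height_cm):
    """Pick one of three size-based locations from the height in 300 $c."""
    if height_cm < 19:
        return "small"     # hypothetical name for the under-19 cm location
    if height_cm > 28:
        return "oversize"  # hypothetical name for the over-28 cm location
    return "main"          # everything from 19 cm through 28 cm


locations = [shelving_location(h) for h in (15, 23, 31)]
```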
>>> I'm sure there is a lot more, but we have to be clear on our goals before we select or modify a data format.
>> This is great that we are talking about "that which must-not-be-spoken". I know we are just tossing out ideas. I hope from this, we can create the synergy necessary to move this into reality.
>> Jeffrey Trimble
>> System Librarian
>> William F. Maag Library
>> Youngstown State University
>> 330.941.2483 (Office)
>> "For he is the Kwisatz Haderach..."
> Karen Coyle
> http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
William F. Maag Library
Youngstown State University
"For he is the Kwisatz Haderach..."