Quoting Jeffrey Trimble:
>
> Yes, are you also speaking of a VOC $ that links it to the authority record?
Not sure what you mean... a subfield? It could be a subfield, or it
could be a defined "unit" that has its own divisions within it (since
MARC has only tags and subfields there are only two levels of
organization - it could be that more are needed).
>> - that it uses data where possible, not text
> Please explain what the different data and text is.
There are some gray areas, for sure, but essentially this is what I
mean by text, although you could also refer to it as the display form
of the information:
xii, 356 p., 23 cm.
and this is one possible way to express that as data (this is
conceptual, obviously, but could be coded in XML or JSON or RDF):
pagination
    first pagination: xii
        coded: roman numerals
    second pagination: 356
        coded: (code or identifier for: arabic numerals)
    unit: pages
extent
    height: 23
    unit: (code or identifier for: centimeters)
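As a rough sketch of how that conceptual outline might look as JSON, here is one possible encoding built in Python. The identifier strings (numerals:roman, unit:pages and so on) and the key names are made up for illustration; they are not taken from any existing vocabulary or format.

```python
import json

# Hypothetical identifiers standing in for controlled-vocabulary entries.
ROMAN = "numerals:roman"
ARABIC = "numerals:arabic"
PAGES = "unit:pages"
CENTIMETERS = "unit:centimeters"

# One possible nesting of the outline: two pagination sequences plus height.
physical_description = {
    "pagination": {
        "sequences": [
            {"value": "xii", "coded": ROMAN},
            {"value": "356", "coded": ARABIC},
        ],
        "unit": PAGES,
    },
    "extent": {"height": 23, "unit": CENTIMETERS},
}

# Serialize to JSON text and read it back to show nothing is lost.
serialized = json.dumps(physical_description, indent=2)
restored = json.loads(serialized)
print(serialized)
```

The point is only that each value sits in its own labelled slot, so a program can reach the height or the page count without reading a sentence.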
Data is coded specifically for computer manipulation; plain text is
intended to be read and interpreted by humans.
One way to move from plain text to data is to use codes or identifiers
for all controlled vocabularies so that you are manipulating the
identity, not a text string. That allows you to vary the display form
without modifying the meaning of your data, for example using displays
in different languages.
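A minimal sketch of that idea, assuming a small label table keyed by identifier: the stored data holds only the identifier, and the display label is looked up per language. The identifiers, the table, and the function names are all hypothetical.

```python
# Hypothetical identifiers and label table; a real system would draw
# these from a published, shared vocabulary.
LABELS = {
    "unit:pages": {"en": "p.", "de": "S."},
    "unit:centimeters": {"en": "cm", "de": "cm"},
}


def label(identifier, lang):
    """Return the display label for an identifier in the given language."""
    return LABELS[identifier][lang]


def display_extent(count, unit_id, lang):
    """Build a display string from a number and a unit identifier."""
    return f"{count} {label(unit_id, lang)}"


print(display_extent(356, "unit:pages", "en"))
print(display_extent(356, "unit:pages", "de"))
```

The stored value (356 plus the identifier for pages) never changes; only the lookup does.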
It's a pretty basic IT concept: the aim is to make as much of your
information "processable" as possible. If you want to be able to sort
your address book by zip code, zip code can't be buried in a single
string that represents the address. If you want to put books in order
by their size (height) or size (number of pages) it is darned
difficult to do given the text version, above. Essentially, library
records tend to have their information in the form in which it should
display, rather than in a form that can be acted on by a machine. You
can derive the display form from the more heavily coded data form much
more easily than you can derive data from the display form.
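To illustrate, here is a small sketch in which height and page count are stored as numbers, so sorting is a one-liner and the display string is generated on demand. The Book class, its field names, and the three sample records are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    preliminary_pages: str  # roman-numeral sequence, kept as recorded
    pages: int
    height_cm: int

    def display(self):
        """Derive the familiar display form from the coded values."""
        return f"{self.preliminary_pages}, {self.pages} p., {self.height_cm} cm."


# Invented sample records, for illustration only.
books = [
    Book("A", "xii", 356, 23),
    Book("B", "iv", 120, 28),
    Book("C", "x", 410, 19),
]

# Because the values are numbers, ordering by either measure is trivial.
by_height = sorted(books, key=lambda b: b.height_cm)
by_pages = sorted(books, key=lambda b: b.pages)

print([b.title for b in by_height])
print(books[0].display())
```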
And before anyone objects, yes, you can write algorithms that pull out
this data, at least from any fields that aren't too complex and don't
have problems with punctuation. But that assumes that you have the
information in your possession and will be able to normalize it before
use. If we move our data onto the web and try to implement connections
between our data and other people's data, we need to present *already
usable* data, not something that has to be munged with a complex
algorithm before it can be used. Basically, the underlying stuff of
our records has to be more rigorous, and we can't expect others to
jump through hoops to try to figure out how tall a book is.
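As an illustration of how brittle that kind of extraction is, here is a sketch of a regular expression that handles the exact display string used earlier and nothing else. The pattern, the function name, and the second test string (a variant with an illustrations statement and different punctuation) are assumptions made up for this example.

```python
import re

# Matches only the precise shape "xii, 356 p., 23 cm." and no variants.
PATTERN = re.compile(
    r"^(?P<prelim>[ivxlcdm]+), (?P<pages>\d+) p\., (?P<height>\d+) cm\.$"
)


def parse(text):
    """Return the extracted values, or None if the text does not fit."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return {
        "prelim": match.group("prelim"),
        "pages": int(match.group("pages")),
        "height": int(match.group("height")),
    }


print(parse("xii, 356 p., 23 cm."))
# A hypothetical variant with extra content and different punctuation
# does not fit the pattern, so nothing is extracted.
print(parse("xii, 356 p. : ill. ; 23 cm."))
```

Every consumer of the data would have to write, and keep extending, something like this for each variation they meet.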
kc
>>
>> I'm sure there is a lot more, but we have to be clear on our goals
>> before we select or modify a data format.
>>
> This is great that we are talking about "that which
> must-not-be-spoken". I know we are just tossing out ideas. I hope
> from this, we can create the synergy necessary to move this into
> reality.
>
>
> Jeffrey Trimble
> System Librarian
> William F. Maag Library
> Youngstown State University
> 330.941.2483 (Office)
> http://www.maag.ysu.edu
> http://digital.maag.ysu.edu
> "For he is the Kwisatz Haderach..."
>
--
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet