EAD@LISTSERV.LOC.GOV


Subject: Re: MYSQL and EAD
From: Elizabeth Shaw <[log in to unmask]>
Reply-To: Encoded Archival Description List <[log in to unmask]>
Date: Wed, 22 Dec 2004 11:33:26 -0500


Mike Ferrando wrote:
> Friends,
> One thing that really troubles me about the database approach is the
> attributes. I find that most people that use the database approach
> simply do not use attributes in their code at all.

This is a design problem and a human problem (data entry) - not a problem
with the technology itself. If a database doesn't provide you with that
information, it is a fault of the design - not of the database.

People can do horrible XML markup too. In EAD, which is a loose DTD, you
can have a valid document that looks like this:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ead SYSTEM "ead.dtd">
<ead>
        <eadheader>
                <eadid/>
                <filedesc>
                        <titlestmt>
                                <titleproper/>
                        </titlestmt>
                </filedesc>
        </eadheader>
        <archdesc level="collection">
                <did>
                        <abstract/>
                </did>
        </archdesc>
</ead>

Yes - the only thing I *had* to enter other than mark-up was the level 
attribute. And I have a valid document. Again, this is a human problem - 
not a technology problem.
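
As a rough illustration of how one might catch this kind of "valid but
empty" markup outside the DTD, here is a minimal sketch in Python using
only the standard library. The function name empty_leaf_elements is
hypothetical, and the sample omits the XML declaration and DOCTYPE so
that it parses without the DTD file; it checks content only, not
validity against ead.dtd.

```python
import xml.etree.ElementTree as ET

# The skeleton from above, without the XML declaration and DOCTYPE.
MINIMAL_EAD = """<ead>
  <eadheader>
    <eadid/>
    <filedesc>
      <titlestmt>
        <titleproper/>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did>
      <abstract/>
    </did>
  </archdesc>
</ead>"""


def empty_leaf_elements(xml_text):
    """Return tag names of elements with no children and no non-blank text."""
    root = ET.fromstring(xml_text)
    return [
        el.tag
        for el in root.iter()  # document order
        if len(el) == 0 and not (el.text or "").strip()
    ]


print(empty_leaf_elements(MINIMAL_EAD))
```

A check like this is a local best-practice rule layered on top of the
DTD, which is exactly the point: the DTD alone does not enforce it.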

In fact, relational databases have datatyping (which is part of why
searching and indexing can be so efficient). Data entry can also be set
up so that the user is forced to use the correct datatype, or so that
input is validated against a regular expression. For example, you can
specify that the titleproper contain at least 5 characters and no numbers.
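
A minimal sketch of that titleproper rule, assuming SQLite through
Python's sqlite3 module (chosen only because it is self-contained; it is
not a system discussed in this thread). The table name finding_aid and
the helper try_insert are hypothetical. SQLite has no built-in regular
expression function, so the "no numbers" rule is written here with a
GLOB pattern in a CHECK constraint instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE finding_aid (
        id INTEGER PRIMARY KEY,
        titleproper TEXT NOT NULL
            CHECK (length(titleproper) >= 5          -- at least 5 characters
                   AND titleproper NOT GLOB '*[0-9]*')  -- no digits anywhere
    )
    """
)


def try_insert(title):
    """Return True if the database accepts the title, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO finding_aid (titleproper) VALUES (?)", (title,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


for candidate in ("Papers of Jane Doe", "Doe", "Papers 1999"):
    print(repr(candidate), "accepted" if try_insert(candidate) else "rejected")
```

The rule lives in the schema, so every data entry form or script that
writes to the table is held to it.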

XML Schema allows for this sort of thing - but since EAD isn't
officially in XML Schema and significant work would have to be done to
develop the required datatypes, it isn't really of much use yet in
forcing normalized practice.
So indeed, if normalization is your goal, databases win hands down.

Almost all encoding in EAD has to be done as established best practice 
at a local level.

In summary, in either case this is a human design problem - not a problem
with the fundamental technology. And XML doesn't provide a more rigorous
solution and may in fact provide fewer tools than an RDBMS to enforce
solutions.

> 
> Those that do usually have fields that are designated by some type of
> standard. For us that would be MARC. However, even if that is a
> better scenario, I cringe to think of typing in dates twice or even
> three times in order to create a normalized string.
> 

In a database, in fact, you can represent a date only once - you don't
have to provide a lot of different forms. A date is stored as the
datatype DATE, which is a numeric representation of the date (not a
string). Retrieval on dates entered as DATE types is incredibly fast
(computers like numbers better than strings :)). On output you transform
the date to the form that you wish to display or manipulate. One entry
and infinite display possibilities: MARC, AACR2, ISO 8601, anything you
can possibly imagine. And the tools to format the date are usually
already a part of any halfway functional RDBMS, so you don't have to
bake your own.
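
The "one entry, many outputs" idea can be sketched without any database
at all. The example below uses Python's datetime.date as a stand-in for
a DATE column; the function display_forms and its output labels are
hypothetical, and the formats are only illustrations, not
implementations of any cataloguing rule.

```python
from datetime import date

# Spelled out here so the output does not depend on the system locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def display_forms(d):
    """Derive several display strings from a single stored date value."""
    month = MONTHS[d.month - 1]
    return {
        "iso8601": d.isoformat(),
        "long": f"{month} {d.day}, {d.year}",
        "year_first": f"{d.year} {month} {d.day}",
        "compact": d.strftime("%Y%m%d"),
    }


stored = date(2004, 12, 22)  # entered once
for label, text in display_forms(stored).items():
    print(f"{label:>10}: {text}")
```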


> If these types of attributes are to be done as the data is entered,
> this would put an incredible burden on the data collector.
> 
> Further, without the EAD tag set, it seems to me that a search engine
> would really have to do double duty to get to the data (AACR2 date;
> ISO 8601 date).
>


And not in XML??? Currently, if you want your dates machine processable
(thus searchable, sortable etc.) you have to either add a normal
attribute or place a normalized form in the element itself:

<unitdate>2000-12-22</unitdate>
or
<unitdate normal="2004-12-22">December 22, 2004</unitdate>

In the first instance you will probably find yourself writing a 
stylesheet (somewhat complex) to parse the date and display it for humans.

So RDBMS - one date - many outputs.
In XML - many dates (some for humans and some for machines) *or* snarly
XSLT to make it look human readable. And you cannot use the DTD to
make sure that the user entered the date correctly (but a DB schema can).
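
To show what a consumer of the XML ends up doing, here is a minimal
sketch in Python (standard library only; date.fromisoformat needs
Python 3.7 or later). The sample fragment and the function name
machine_date are hypothetical. The idea is to take the normal attribute
when it is present, fall back to the element text, and accept that some
entries simply will not parse.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical fragment: all three unitdate forms pass a DTD check,
# but only the first two carry a machine-processable date.
SAMPLE = """<did>
  <unitdate>2004-12-22</unitdate>
  <unitdate normal="2003-01-05">January 5, 2003</unitdate>
  <unitdate>circa 1950</unitdate>
</did>"""


def machine_date(unitdate):
    """Return a date from @normal or the element text, or None if neither parses."""
    candidate = unitdate.get("normal") or (unitdate.text or "").strip()
    try:
        return date.fromisoformat(candidate)
    except ValueError:
        return None


root = ET.fromstring(SAMPLE)
parsed = [machine_date(el) for el in root.iter("unitdate")]
sortable = sorted(d for d in parsed if d is not None)

print(parsed)
print(sortable)
```

The third entry comes back as None: the markup is acceptable to the
DTD, but nothing in the DTD could have told the encoder that the date
was not usable for sorting.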


> Finally, I would think that crawling through a proprietary software
> would be much more difficult than an XML document.
> 

Not sure I understand this.

First, not all relational databases are proprietary.

Second, as Richard points out, they have very robust and capable
indexing and searching. If you start thinking about large quantities of
data, relational databases beat XML databases hands down.

True, you can't open up a database in any old text editor because the
data is (usually) stored in a binary format that isn't readable in a text
editor (that doesn't mean it is proprietary). And in point of fact, you
can't open up an XML database file in a text editor either - it will also
be in a binary format that facilitates searching and retrieval.

But reading the raw data isn't why people use databases in the first
place. They are used to create efficiencies in data entry, reduce
redundancy, provide normalization and provide efficient and effective
searching of the data.

> These are my reasons for sticking with XML rather than databases. I
> see a separation between data collection software and mark up/display
> of that data. Mapping the datatypes seems to be the key, but context
> (hierarchy) conveys information I would not want to try to capture in
> a database format.
> 

I can agree that context is important to represent in archival
collections. And I think it is a very hard schema design problem to
capture the richness that EAD can capture. But it is possible. I am not
saying it is the best solution - only that it is possible.
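
One common way to carry hierarchy in a relational schema is a
self-referencing parent column. The sketch below is only an
illustration under stated assumptions: it uses SQLite through Python's
sqlite3 module, the component table, its columns, the sample rows and
the build function are all hypothetical, and the output is EAD-like
nesting rather than valid EAD (a real export would need the dsc wrapper
and the other required elements).

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE component (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES component(id),  -- NULL for the top level
        level     TEXT NOT NULL,
        unittitle TEXT NOT NULL,
        position  INTEGER NOT NULL                   -- order among siblings
    );
    INSERT INTO component VALUES
        (1, NULL, 'collection', 'Example Family Papers', 1),
        (2, 1,    'series',     'Correspondence',        1),
        (3, 1,    'series',     'Photographs',           2),
        (4, 2,    'file',       'Letters, 1901-1910',    1);
    """
)


def build(row_id, tag="c"):
    """Rebuild one component and all of its descendants as nested elements."""
    level, title = conn.execute(
        "SELECT level, unittitle FROM component WHERE id = ?", (row_id,)
    ).fetchone()
    el = ET.Element(tag, level=level)
    did = ET.SubElement(el, "did")
    ET.SubElement(did, "unittitle").text = title
    children = conn.execute(
        "SELECT id FROM component WHERE parent_id = ? ORDER BY position",
        (row_id,),
    ).fetchall()
    for (child_id,) in children:
        el.append(build(child_id))
    return el


xml_text = ET.tostring(build(1, tag="archdesc"), encoding="unicode")
print(xml_text)
```

The context is not lost; it is stored as relationships between rows and
turned back into nesting when the data is serialized.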

I think it is important to make a clear distinction between the
technology, the format, and the particular instances of schema or DTD
design.

A relational database is a set of technologies that optimize storage and 
retrieval and transactions of data.


The data that is entered into a database could also be stored in a CSV 
file (which could be read by a text editor) just as XML is stored in a 
file that can be read by a text editor. However we generally store the 
data indexed in a binary format.

A particular database schema that provides the set of data fields, their 
relations and their datatypes may be well designed or poorly designed - 
but it doesn't negate the underlying capabilities of RDBMS.


For our purposes, XML is a format that can be read by a text editor - 
not a set of technology tools (any XML geek right now would cringe over 
this simplification but for our purposes it will do).

There are XML databases that have similar (though less efficient at the 
moment) capabilities - for storage and retrieval - but their files too 
are in a binary format. In fact, if you enter data directly into the XML 
database, it may never be represented as a "readable" XML file until 
exported. In that sense it is just like the data in the relational 
database - until serialized as text.

You can have a good XML Schema/DTD or a bad one. You can have one that 
enforces a fair amount of rigor - but until XML schemas are widely used 
you will not have one that enforces datatyping.

Relational Database Management Systems can be compared to XML Database 
systems in terms of capabilities.

DB Schemas can be compared to DTDs and XML Schemas.

But comparing XML to an RDBMS doesn't make sense. It is comparing apples
and oranges.

Liz Shaw
PS - Richard - my address book example was indeed too simple. Perhaps a
personnel database would have been more apt.






> Mike Ferrando
> Library Technician
> Music Division
> Library of Congress
> Washington, DC
> 202-707-4454
> 
> --- Richard Davis <[log in to unmask]> wrote:
> 
> 
>>Hi again
>>
>>Liz's post was very clear and interesting. I just wanted to elaborate a
>>couple of points:
>>
>>
>>Elizabeth Shaw wrote:
>>
>>>It is not that archival data is particularly unique but it is true
>>>that highly nested linear data stored across many fields in a
>>>database is more difficult to retrieve and reconstruct.
>>
>>It hadn't occurred to me that anyone might think of storing every last
>><emph> in its own field or row. As you suggest, it doesn't sound like
>>something to be recommended, nor does it seem very relational.
>>
>>
>>
>>>But I would argue that perhaps the archival community should move
>>>away from the notion of thinking of its data as a linear document.
>>>If you move away from that notion, then storing the discrete data
>>>elements that describe a collection and its component parts in a
>>>database begins to make more sense.
>>
>>This is the approach I've taken, and still favour, at least for the time
>>being. The finding aids I've dealt with are ISAD(G) based, and all
>>seemed strongly field-oriented. Within each component field, greater
>>granularity is preserved by using the markup for the equivalent EAD
>>element. At the moment, little further use is made of this markup,
>>except for transformation to HTML. But valid and meaningful EAD can
>>easily be reconstituted, offline or on-the-fly, for transmission over the
>>web, for indexing, or for when the ultimate killer EAD app is ready to
>>migrate to.
>>
>>
>>
>>>Although data may be indexed by elements for searching purposes, it
>>>is usually retrieved as a chunk (with all the internal tagging
>>>intact).
>>
>>This point is important, and often overlooked. Indexing is fundamental
>>to any DBMS (including XML). None of it works at all, except in theory,
>>without indexing.
>>
>>In modern RDBMS, indexing is exceedingly well implemented: for that
>>speed and reliability alone, it's likely to be worth compromising the
>>absolute integrity of a logical design. And, lurching back to the topic,
>>MySQL's indexing features work extremely well, and include the option of
>>"fulltext" indexing, which makes it very attractive for storing chunks
>>of markup.
>>
>>MySQL has long lacked some core relational features. For example, I've
>>had to implement referential integrity at application level, which
>>(like Bartleby) I'd prefer not to. MySQL makes up for that by being
>>fast, and free, and well-supported - though I understand PostgreSQL is
>>also free, and fully relational, and performs well.
>>
>>
>>
>>>Depends on what you want to do. I will put my address book in a
>>>relational database any day but when I want to search Shakespeare give
>>>me XML.
>>
>>At first I agreed with you, but then I had second thoughts: XML suits
>>address books very well, probably more so than a heavyweight RDBMS -
>>unless your address book is Yellow Pages! On the other hand, a complex
>>network of multi-level descriptive records seems eminently suitable for
>>the relational treatment.
>>
>>Seasonal regards!
>>
>>Richard
>>
>>
>>--
>>/
>>\ Richard M Davis
>>/ Digital Archives Specialist
>>\ University of London Computer Centre
>>/ Tel: +44 (0) 20 7692 1350
>>\ mailto: [log in to unmask]
>>/
>>
> 
> 
> 
> 
> 
