Thomas Habing wrote:
>Jerome McDonough wrote:
>>On Apr 22, 2005, at 2:35 PM, Stephen Abrams wrote:
>>
>>> Was it intended that METS can only encapsulate valid XML as
>>><FContent>? (And I suppose the same question could be asked about
>>><mdWrap>.) If not, then I think that the processContent property for
>>>the <xmlData> element should be set explicitly to "lax" so that
>>>fragments containing Schema references can be validated while letting
>>>fragments without Schema references get by being merely well-formed.
>>
>>
>>There was an intention on my part to try to insure that only valid XML
>>would be included,
>>but in retrospect, that was probably the control-freak part of my
>>personality manifesting itself
>>in an unfortunate manner. How do other people feel about this? Are
>>others encountering the
>>same problem?
>
>I would be inclined to use lax for encapsulated XML. There are a number
>of XML dialects that lack XML Schema, the XML serialization of RDF being
>a good example. Other XML dialects may be validateable but not with XML
>Schema, but could be validated with Schematron, RelaxNG, or good ole DTDs.

One of the ways we're using METS is as a container for the metadata and,
sometimes, for the pieces of digital objects in an OAIS. The digital
objects come from different Producers, not all of whom provide schemas
for their XML. Lax encapsulation would accommodate their files more
accurately and a whole lot more efficiently than strict. We wouldn't need
to create extra schemas, and we wouldn't have to make them available as
part of the archived object or its packaging information.

For long-term preservation, all the schemas for the XML files used in an
archive should be available in the archive itself. Creating, storing, and
managing all the custom schemas for snippets and single-use XML files
could become a nightmare, both for the current stewards and for the folks
in the future who will someday have to unpack and recreate the objects.
We could insist on all files being referenced with FLocat, but then the
metadata starts to take on some of the attributes of content. A neater,
tighter archival object, with less administrative overhead, could be made
if FContent and, especially, mdWrap were lax.
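
To make the proposal concrete: the change under discussion comes down to
one attribute on the wildcard inside the xmlData declaration. The fragment
below is only a sketch of what such a declaration might look like, not the
text of the METS schema; the xsd prefix and the occurrence constraints are
assumptions made for illustration.

```xml
<!-- Sketch only: an xmlData declaration with a lax wildcard.
     Prefix and occurrence constraints are illustrative assumptions. -->
<xsd:element name="xmlData" minOccurs="0">
  <xsd:complexType>
    <xsd:sequence>
      <!-- processContents="lax": validate child elements for which a
           declaration can be found; otherwise require only that they
           be well-formed. The XML Schema default is "strict". -->
      <xsd:any namespace="##any" processContents="lax"
               maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

With "lax", a fragment that has no XML Schema (RDF/XML, for example) would
get by on well-formedness, while a fragment whose schema the processor can
locate would still be validated.
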
Bill
------------------------------------------------------------------
William R. Kehoe
Cornell University Library
-- Digital Library and Information Technologies (D-LIT)
503 Olin Library, Cornell University
Ithaca, New York 14853, USA
+1 (607) 254-8220
-------------------------------------------------------------------