> From: Houghton,Andrew [mailto:[log in to unmask]]
> Sent: Monday, April 01, 2002 10:42 AM
>
> Actually, I think we all agree that mixing #PCDATA _between_
> elements, ala HTML, would _not_ be a good content model. I
> can foresee communities that do not want to specify "type" nor
> "description". They just want something plain and simple like
> Dublin Core. So I can see a use for:
>
> <!ELEMENT creator (#PCDATA|(type,name,description))>
> <!ELEMENT name (#PCDATA)>
>
I wanted to expand a little on my previous message, since I
didn't have time yesterday. I'm _proposing_ that all MODS
elements use a conceptual content model of:
<!ENTITY % creatorRefinement "">
<!ELEMENT creator (#PCDATA%creatorRefinement;)>
where you can substitute any element name for "creator". Using
this content model provides several advantages. First, it gives
non-library communities that don't want or need detailed
metadata the option to specify just the textual content for an
element. Second, it allows those metadata communities that want
more detailed metadata to add it within the confines of a
framework. For example, LC could define a
series of refinements that would allow progressively more
detailed markup so eventually you could do full MARC21 to MODS
to MARC21 conversions without the loss of information or the
"displacing" of information. Lastly, it would also aid in
interoperability.
We can code the quoted creator example, above, by redefining
the creatorRefinement entity as:
<!ENTITY % creatorRefinement "|(type,name,description)">
<!ELEMENT creator (#PCDATA%creatorRefinement;)>
<!ENTITY % typeRefinement "">
<!ELEMENT type (#PCDATA%typeRefinement;)>
<!ENTITY % nameRefinement "">
<!ELEMENT name (#PCDATA%nameRefinement;)>
<!ENTITY % descriptionRefinement "">
<!ELEMENT description (#PCDATA%descriptionRefinement;)>
Metadata communities can specify either the textual information
for creator or the elements type, name and description. When
specifying type, name or description, the metadata community
could supply a string or further refine an existing MODS
refinement element.
Any metadata community that wished to add their own refinements
could just declare their own namespace and use the MODS
specification as a base for those refinements. Their refinements,
however, must conform to the same content "refinement" model.
Let's say first that the above creatorRefinement was the way
MODS was specified. We will call this hypothetical MODS.
Further, let's say that in my community we prefer that the
hypothetical MODS name element, above, should provide more
detail. We can declare a namespace called "my" and define
the following refinements:
<!ENTITY % nameRefinement "|(my:given,my:family)">
<!ENTITY % my_givenRefinement "">
<!ELEMENT my:given (#PCDATA%my_givenRefinement;)>
<!ENTITY % my_familyRefinement "">
<!ELEMENT my:family (#PCDATA%my_familyRefinement;)>
This allows "my" community to refine the base hypothetical
MODS standard while reusing its framework for my metadata
description.
This "refinement" concept demonstrates the first two points I
mentioned above. But what about interoperability? If anyone
can refine the base MODS specification, how does this help
interoperability? Interoperability is easily achieved if you
understand some technical concepts of XML, XMLDOM and XSLT.
Let's say that some organization just received a bunch of those
hypothetical MODS records from "my" community. However, they
use the base MODS specification, while "my" records contain
refinements. How do they resolve "my" refinements? What they
can do is dumb down "my" records, either by using the XMLDOM or
by writing an XSLT transform. The XMLDOM allows you to retrieve
all the textual information from a given node down the tree,
effectively collapsing all XML elements out of the data stream.
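As a rough illustration of the XMLDOM approach, here is a minimal
Python sketch using the standard xml.dom.minidom module. The sample
record, the example.org namespace URI and the collapse_text helper
are all made up for illustration; they are not part of MODS.

```python
from xml.dom import minidom


def collapse_text(node):
    """Return all text found at or below node, in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Elements contribute the text of their children; comments and
    # other childless nodes contribute nothing.
    return "".join(collapse_text(child) for child in node.childNodes)


# Hypothetical record using the "my" refinements of the name element.
# The namespace URI is a placeholder, not a real MODS namespace.
RECORD = (
    '<creator xmlns:my="http://example.org/my">'
    "<type>personal</type>"
    "<name><my:given>Jane</my:given> <my:family>Doe</my:family></name>"
    "<description>author</description>"
    "</creator>"
)

doc = minidom.parseString(RECORD)
name = doc.getElementsByTagName("name")[0]

# Dumb down the refined name element to plain text.
collapsed_name = collapse_text(name)
print(collapsed_name)
```

A receiver that only understands the base specification could
replace the children of name with this string and never need to
know that "my" refinements existed.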
You can also accomplish the same thing using XSLT, where you can
collapse out the "recognized" or unrecognized XML elements. I
stress "recognized" since with XSLT you could do what I will
call a smart collapse. Note that in "my" community's refinement
I specified that the given name must precede the family name.
However, if in your catalog you expected the more formal AACR2
form, the XSLT transform could be smart enough to swap the
contents of the given and family elements and add the comma at
the end of the family element's content.
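The smart collapse could be sketched as follows. This is Python
with the standard xml.etree.ElementTree module standing in for the
XSLT transform, not XSLT itself. The namespace URI, the sample data
and the smart_collapse_name helper are assumptions for illustration
only.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the hypothetical "my" refinements.
MY_NS = "http://example.org/my"


def smart_collapse_name(name_elem):
    """Collapse a name element to text, inverting recognized parts."""
    given = name_elem.find(f"{{{MY_NS}}}given")
    family = name_elem.find(f"{{{MY_NS}}}family")
    if given is not None and family is not None:
        # Recognized refinement: emit "family, given".
        family_text = "".join(family.itertext()).strip()
        given_text = "".join(given.itertext()).strip()
        return f"{family_text}, {given_text}"
    # Unrecognized or unrefined content: plain collapse to text.
    return "".join(name_elem.itertext())


refined = ET.fromstring(
    '<name xmlns:my="http://example.org/my">'
    "<my:given>Jane</my:given> <my:family>Doe</my:family>"
    "</name>"
)
plain = ET.fromstring("<name>Jane Doe</name>")

inverted = smart_collapse_name(refined)
untouched = smart_collapse_name(plain)
print(inverted)
print(untouched)
```

The fallback branch matters: a name with no recognized refinements
still collapses to its plain text, so nothing is lost.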
The three points I initially brought up are all made possible by
the simple concept that elements can contain either #PCDATA or
one or more elements, exclusively. In theory, XML elements are
just refined textual content. In other words, XML elements are
specialized markup that allows you to describe specific aspects
of the content. For example, one could declare just a document
element in XML with its content model being just #PCDATA: just
one blob of textual information. Further markup, then, is just
providing more structure to that blob of textual information. So
by association, if you remove the markup, i.e. the structure,
then you are left with the original blob of textual information.
This concept always gives you a fallback, i.e. dumbing down, for
compatibility and interoperability.
Andy.