> From: Geoff Mottram [mailto:[log in to unmask]]
> Sent: Tuesday, April 02, 2002 4:15 PM
>
> I agree with the above statement as a general design philosophy
> provided you permit the occasional exception.
>
> I have been having some discussions with MARC listserv regarding
> non-sorting portions of a title and I believe there needs to be one
> exception to not mixing PCDATA and elements. Assume you add a
> "nonsort" element as a method of marking up the portions of a title
> that are not to be sorted.  If you don't allow it to intermix with
> PCDATA within the title, you will have problems distinguishing
> between the parts of the title.  For example:
>
>     <title>
>       <part>first part of the title</part>
>       <nonsort>I belong to first title part but should be ignored</nonsort>
>       <part>second part of the title</part>
>     </title>
>
> With the above approach, you can't tell which part of the title the
> "nonsort" element belongs to.  It should be marked up as follows to
> avoid any ambiguity:
>
>     <title>
>       <part>first part of the title
>           <nonsort>I belong to first title part but should be ignored</nonsort>
>       </part>
>       <part>second part of the title</part>
>     </title>

Actually, I disagree that you can't tell which <part> the <nonsort>
element belongs to.  XML nodes (elements) are kept in sequence order.
In the first example, the <nonsort> element falls between the
first <part> and the second.  Since it occurs before the second,
XML's sequencing of nodes says it belongs to the first.
Using XML's built-in sequencing solves the problem.
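
To make that reading concrete, here is a minimal sketch in Python using
the standard-library xml.etree.ElementTree (my choice of tool for
illustration, nothing from this thread depends on it).  It walks the
children of <title> in document order and attaches each <nonsort> to
the most recent <part>.  The function name attach_nonsort and the
shape of its result are assumptions.

```python
import xml.etree.ElementTree as ET

# The first example from the quoted message, with <nonsort> as a sibling of <part>.
TITLE_XML = """
<title>
  <part>first part of the title</part>
  <nonsort>I belong to first title part but should be ignored</nonsort>
  <part>second part of the title</part>
</title>
"""

def attach_nonsort(title):
    """Group each <nonsort> with the <part> that precedes it in document order."""
    groups = []
    for child in title:  # iterating an Element yields its children in document order
        text = (child.text or "").strip()
        if child.tag == "part":
            groups.append({"part": text, "nonsort": []})
        elif child.tag == "nonsort":
            if not groups:
                # Hypothetical policy: a leading <nonsort> has no <part> to belong to.
                raise ValueError("<nonsort> appears before any <part>")
            groups[-1]["nonsort"].append(text)
    return groups

groups = attach_nonsort(ET.fromstring(TITLE_XML))
```

The only rule applied is "belongs to the nearest preceding <part>",
which is the convention argued for above.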

This also goes along with your original thought that you can pull
out all the <part> elements and that's your sort string.  What
you didn't point out is that this works because of XML's node
sequencing.  When I pull out all <part> elements with XPath, XPath
will always give me the elements in sequence order unless I
specify otherwise.
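
A rough sketch of pulling out the sort string, again in Python with
xml.etree.ElementTree, whose findall() accepts a limited XPath subset
and returns matches in document order.  The helper name sort_string
and the choice to join parts with a single space are assumptions.

```python
import xml.etree.ElementTree as ET

def sort_string(title_xml):
    """Concatenate the text of every <part> child, in document order."""
    title = ET.fromstring(title_xml)
    parts = title.findall("part")  # limited XPath; results come back in document order
    # Collapse internal whitespace so indentation in the markup does not leak through.
    return " ".join(" ".join((p.text or "").split()) for p in parts)

example = """
<title>
  <part>first part of the title</part>
  <nonsort>I belong to first title part but should be ignored</nonsort>
  <part>second part of the title</part>
</title>
"""

key = sort_string(example)
```

Because <nonsort> is a sibling rather than a child of <part>, it never
shows up in the result.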

Your second example can easily be converted to the non-mixed
content model I described in my earlier message.  It's my
personal opinion that mixed-content modeling is a result of
not fully thinking through what you are marking up.  At the cost of
adding one additional element, which makes the markup clearer,
you could transform your example to:

  <title>
    <part>
      <sort>first part of the title</sort>
      <nonsort>
        I belong to first title part but should be ignored
      </nonsort>
    </part>
    <part>second part of the title</part>
  </title>

Note the above still follows my original content rule: every
element should contain either #PCDATA or one or more refinement
elements, never both.  In the case of the first <part>, I
added the refinement elements <sort> and <nonsort>.  In the
case of the second <part>, I decided it was not necessary to
refine, thus implicitly saying that the whole <part> is of
element <sort>.  You could, if you really wanted to, enclose
the entire contents of the second <part> with a <sort>
element.
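
One way to check markup against that rule is sketched below in Python
with xml.etree.ElementTree.  An element counts as mixed when it has
child elements and also carries non-whitespace text of its own.  The
function names are hypothetical, and treating whitespace-only text as
insignificant is my assumption.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(elem):
    """True if elem has child elements AND non-whitespace character data."""
    if len(elem) == 0:
        return False  # text-only or empty element: fine under the rule
    if (elem.text or "").strip():
        return True   # text before the first child
    # Text after a child is stored on that child's .tail attribute.
    return any((child.tail or "").strip() for child in elem)

def mixed_elements(xml_text):
    """List the tags of all elements that break the either/or rule."""
    root = ET.fromstring(xml_text)
    return [e.tag for e in root.iter() if has_mixed_content(e)]

MIXED = """
<title>
  <part>first part of the title
    <nonsort>I belong to first title part but should be ignored</nonsort>
  </part>
  <part>second part of the title</part>
</title>
"""

REFINED = """
<title>
  <part>
    <sort>first part of the title</sort>
    <nonsort>I belong to first title part but should be ignored</nonsort>
  </part>
  <part>second part of the title</part>
</title>
"""

mixed_report = mixed_elements(MIXED)
refined_report = mixed_elements(REFINED)
```

The quoted markup is flagged at its first <part>; the refined version
passes, including the unrefined second <part>.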

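BTW, the content model for the <part> as I just described
above would be, in spirit (a strict XML 1.0 DTD only accepts
#PCDATA inside a mixed group of the form (#PCDATA|sort|nonsort)*,
so read this as shorthand for the either/or rule rather than a
declaration a validating parser will take as written):

  <!ENTITY % partRefinement "|(sort|nonsort)+">
  <!ELEMENT part (#PCDATA%partRefinement;)>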

With due sincerity, no exceptions are needed.


Andy.