(Glad to see that my post made it through even though I got a bounce
message. My apologies if duplicates were received).
My post wasn't terribly clear. By "random badness" I was trying to
contrast a structured test suite or devised set of examples that
demonstrate a known range of errors with a set of collected files
that exhibit unusual combinations of problems or problems not
necessarily anticipated by the test organizers. Examples collected from
the wild, as it were, rather than created are a helpful supplement to a
formal test suite. Not a terribly profound observation, just a reminder
that using more than one approach is helpful.
-----Original Message-----
From: Metadata Encoding and Transmission Standard [mailto:[log in to unmask]]
On Behalf Of Erik Hetzner
Sent: Monday, March 12, 2007 3:18 PM
To: [log in to unmask]
Subject: Re: [METS] Schema testing (was Re: [METS] METS schema in RNG?)
At Mon, 12 Mar 2007 09:39:26 -0400,
Evan Owens <[log in to unmask]> wrote:
> Test suites are tremendously useful. If I understand this thread
> correctly, there are two different questions here:
>
> 1) the METS schema itself, and any new versions compared to previous
> versions
>
> 2) whether a RNG or other version of the METS schema enforces exactly
> the same set of constraints, no more and no less
There are constraints in RNG that cannot be expressed in W3C schema, and
W3C schema can provide default values for attributes (though I don't
think that METS uses this). So, in general, it is not possible to
guarantee this. In the specific case of METS I think it may be.
> For 2) I would start by a careful examination of the converted schema
> and determine whether it appears to express the same constraints as
> judged by someone conversant in both schema expression languages. If
> it passes the human test, then move on to parallel software tests
> using a suite of known good and bad files.
The trick is finding somebody conversant in both languages. I myself
have examined these files, but not that carefully.
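For the parallel software test, here is a rough sketch in Python of what the harness might look like. It runs every validator over a suite of files with known expected verdicts and reports any file where the validators disagree with each other or with the expectation. The two validator functions are toy stand-ins (a well-formedness check and a root-element check), not real W3C schema or RNG validators, and the names `compare_validators`, `SUITE` and `VALIDATORS` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formed(xml_text):
    """Stand-in validator 1: accepts anything that parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def has_mets_root(xml_text):
    """Stand-in validator 2: also requires the root element to be 'mets'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # Drop a '{namespace}' prefix, if any, before comparing the local name.
    return root.tag.split("}")[-1] == "mets"


def compare_validators(suite, validators):
    """Return (file, reason, verdicts) for every file that needs a human look."""
    problems = []
    for file_name, (xml_text, expected) in sorted(suite.items()):
        verdicts = {name: check(xml_text) for name, check in validators.items()}
        if len(set(verdicts.values())) > 1:
            problems.append((file_name, "validators disagree", verdicts))
        elif any(verdict != expected for verdict in verdicts.values()):
            problems.append((file_name, "unexpected verdict", verdicts))
    return problems


# Hypothetical suite: file name -> (content, expected to be valid?)
SUITE = {
    "good.xml": ("<mets/>", True),
    "broken.xml": ("<mets>", False),       # unclosed element
    "wrong-root.xml": ("<other/>", False),  # parses, but not a METS document
}
VALIDATORS = {"well_formed": well_formed, "has_mets_root": has_mets_root}

for file_name, reason, verdicts in compare_validators(SUITE, VALIDATORS):
    print(file_name, reason, verdicts)
```

In a real comparison the two stand-ins would be replaced by calls to a W3C schema validator and an RNG validator; the reporting loop would stay the same.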
> I would suggest that test files be created that exhibit only one fault
> per file, and then known combinations of faults, and finally a lot of
> random badness. Ideally you work backwards from the original business
> requirements that guided the development of the schema and make sure
> that you have a test for each requirement. The randomly bad files
> should be created by as many different people as possible.
This sounds like a good idea, but a lot of work. I'm not quite sure what
is meant by 'random badness', however.
> After you think that you have tested everything that you can possibly
> test, then change the tools that you are using to do the testing and
> try again and hope that you get the same results. I remember the early
> days of SGML when two well-known SGML parsers disagreed about what was
> and was not a valid SGML file. I trust that XML tools are better, but
> I would still verify. Judging from a recent thread on XML-DEV the test
> suites for XML parsers are not perfect.
Fortunately XML is a bit easier to parse! Your point is well taken,
however; some tools recognize ID/IDREF mismatches, for example, while
others do not.
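To illustrate the ID/IDREF case, here is a small sketch in Python of a check that a plain well-formedness parser will not do for you. It collects every ID value in a document and lists references that point at nothing. The standard-library parser knows nothing about attribute types, so the attribute names treated as IDs and references here (`ID`, `FILEID`, `DMDID`, `ADMID`) are assumptions passed in by hand, and `dangling_references` and `SAMPLE` are made-up names.

```python
import xml.etree.ElementTree as ET


def dangling_references(xml_text, id_attr="ID",
                        ref_attrs=("FILEID", "DMDID", "ADMID")):
    """Return (element tag, attribute, target) for each unresolved reference."""
    root = ET.fromstring(xml_text)
    # Every value of the ID attribute anywhere in the document.
    ids = {el.attrib[id_attr] for el in root.iter() if id_attr in el.attrib}
    missing = []
    for el in root.iter():
        for ref_attr in ref_attrs:
            # An IDREFS-style attribute may hold several space-separated targets.
            for target in el.attrib.get(ref_attr, "").split():
                if target not in ids:
                    missing.append((el.tag, ref_attr, target))
    return missing


# Simplified, namespace-free sample: the second fptr points at a missing file.
SAMPLE = """<mets>
  <fileSec><fileGrp><file ID="F1"/></fileGrp></fileSec>
  <structMap><div><fptr FILEID="F1"/><fptr FILEID="F2"/></div></structMap>
</mets>"""

print(dangling_references(SAMPLE))
```

The sample parses without complaint, which is exactly the situation where a tool that only checks well-formedness and one that resolves references would give different answers.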
best,
Erik Hetzner