clarification (I hope) - using catalog.xml and the other ixml-tests catalogs

Mail from an implementor has made me aware that my note
about the ixml-tests catalogs for Steven Pemberton’s two
collections of tests left a little too much to the imagination.

If you are not currently trying to build a test harness or use
the test catalogs I have prepared, you may skip this mail and
use the next five minutes in some other useful activity.

If you are currently trying to use the catalogs, my thanks —
and my apologies.  

Things are a little more confused than they ideally should be, partly
because of the difficulties Steven encountered uploading empty files
to github (which led to uploading tests.zip and syntaxtests.zip
instead of just adding tests and syntaxtests directories), and partly
because in developing test catalogs for those tests I have not been
completely consistent.

I initially imagined making catalogs that could simply be added and
pointed at SP's tests, without changing anything else.  But the *.req files in
tests.zip and syntaxtests.zip are not XML documents, and the test
catalogs need to point at XML documents as the expected output.  And
doubtless other things went wrong.  My apologies for the ensuing
confusion.

If you want to use the test catalogs I have made, here is what I think
you need to know.  

(1) The catalog in ixml/tests/steven/catalog.xml points to test
catalogs in subdirectories named syntaxtests and tests-SP-MSM.

To get the syntaxtests directory, unzip SP-syntaxtests-package.zip in
place.  If you have already unzipped syntaxtests.zip, this will add
some catalog files to it, and overwrite the *.ixml files with
identical *.ixml files.

The tests-SP-MSM directory should appear when you pull in changes from
github.
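
For anyone who would rather script the unpacking step than run unzip
by hand, here is a minimal Python sketch.  It assumes the zip file
sits in the same directory as catalog.xml (that location, and the
helper name unzip_in_place, are illustrative assumptions and not part
of the repository).

```python
import zipfile
from pathlib import Path


def unzip_in_place(zip_path):
    """Extract zip_path into the directory that contains it.

    Files that already exist under the same names are overwritten,
    which matches the behaviour described above for the *.ixml files.
    Returns the list of member names found in the archive.
    """
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(zip_path.parent)
        return zf.namelist()


# Hypothetical usage; adjust the path to your own checkout:
# unzip_in_place("ixml/tests/steven/SP-syntaxtests-package.zip")
```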


(2) The zip file SP-syntaxtests-package.zip should unzip to a
syntaxtests/ directory containing Steven's tests from syntaxtests.zip
(just the *.ixml files, which are the only ones required by the
catalog), and three catalog files which use those ixml files as tests
in different ways.  The readme file and the catalogs themselves will
explain the differences.

(3) The directory tests-SP-MSM contains a catalog plus a revision of
the tests from Steven's tests.zip of 21 December.  For a given test
foo, the foo.ixml and foo.inp files are either the same as those in
tests.zip or modified versions of them; the foo.req files (which
contain the grammar, the input
string, and the expected output) have been replaced by foo.output.xml
files which contain only the expected output.

If my query counted correctly, I have modified 45 of the 59 tests.
The most frequent modification is stripping ungrammatical trailing
whitespace from input files so that they are sentences generated by the
grammar of the test.  The second most frequent modification is the
elimination of nonsignificant whitespace from XML results, so that
instead of

    <a>
        <b/>
        <c/>
    </a>

the output file will have either

    <a><b/><c/></a>

or (for longer files especially)

    <a
        ><b
        /><c
    /></a>

which is an unfamiliar format for most of us, but retains the
indentation and avoids introducing whitespace not generated by the
grammar or the input. It thus enables a simpler comparison using
something like the XPath function deep-equal().  Test harnesses that
don't have access to deep-equal() will presumably have to run the
processor's output and the expected output through a canonicalizer in
any case.  
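
As an illustration of the canonicalizer route, here is a minimal
Python sketch.  It uses the standard library's C14N 2.0 support
(xml.etree.ElementTree.canonicalize, Python 3.8 or later) and not
deep-equal() itself; the helper name same_xml and the sample strings
are assumptions made for the example.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(actual, expected):
    """Compare two XML strings after canonicalizing both.

    Whitespace inside tags disappears in canonical form, but
    whitespace between tags is character data and is kept.
    """
    return canonicalize(xml_data=actual) == canonicalize(xml_data=expected)


compact = "<a><b/><c/></a>"
folded = "<a\n    ><b\n    /><c\n/></a>"        # line breaks inside the tags
indented = "<a>\n    <b/>\n    <c/>\n</a>"      # line breaks between the tags

print(same_xml(compact, folded))    # True
print(same_xml(compact, indented))  # False
```

The second comparison fails because the conventionally indented form
contains whitespace text nodes that the grammar never produced, which
is the reason for removing them from the expected results.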

There may be better ways of dealing with indentation in XML test
results, and if anyone knows one I hope they will share the joyful
news with the rest of us.

The third most frequent modification has been the addition of
alternate output in cases of ambiguity or potential ambiguity.  The
effect of the changes is to make the test suite agnostic over whether
the empty string is ambiguous when parsed against a grammar like

    s = 'a'*; 'b'*. 

or

    s = ()?.

That is, both <s/> and <s xmlns:ixml="..." ixml:state="ambiguous"/>
are currently accepted by the test catalog in tests-SP-MSM.  This
allows both Steven's processor (which wants to mark these as
ambiguous) and my parser (which does not) to claim success, until the
group gets around to deciding just how many angels are dancing on this
particular pin and requires one of us to change, or decides we are
both right.
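
A harness can treat such a test as passed when the processor's output
matches any one of the listed alternatives.  The following Python
sketch shows the idea; the function name matches_any is hypothetical,
and PLACEHOLDER_NS is a stand-in for the namespace elided as "..."
above, not the real value.

```python
from xml.etree.ElementTree import canonicalize

# Stand-in for the namespace name elided in the text; NOT the real one.
PLACEHOLDER_NS = "urn:example:ixml-placeholder"


def matches_any(actual, alternatives):
    """Return True if actual equals at least one expected alternative.

    Both sides are canonicalized first so that differences such as
    <s/> versus <s></s> do not matter.
    """
    canonical_actual = canonicalize(xml_data=actual)
    return any(canonical_actual == canonicalize(xml_data=alt)
               for alt in alternatives)


plain = "<s/>"
ambiguous = '<s xmlns:ixml="%s" ixml:state="ambiguous"/>' % PLACEHOLDER_NS
expected = [plain, ambiguous]

print(matches_any("<s></s>", expected))   # True
print(matches_any(ambiguous, expected))   # True
print(matches_any("<t/>", expected))      # False
```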

I hope this helps.

Michael

Received on Friday, 31 December 2021 17:58:34 UTC