[Library update] Position now returned by all tests that apply at the markup level

Hi mobileOK Checker task force and other library users,

Tests that apply at the markup level usually return a code extract, but 
not the position of the code extract in the source code, which would be 
extremely useful from a user perspective.

The position is missing because it is lost when the source is parsed 
to build an XML tree: tests are run against the XML tree, not against 
the source.

I updated the code of the library to preserve and return the position 
(well, only the line number in the end) whenever possible. I committed 
the changes to CVS. I will need to re-validate the whole test suite but 
I wanted to make sure I had not missed something before I do that. Hence 
this email. Reactions or comments?


Externally visible changes
-----
- The index of the line (starting at 1) where each node appears in the 
source document now also appears in the HTML tree representation in the 
moki (within the "docContent" element). A "line" attribute in the moki 
namespace is added to each HTML node. The "line" attribute is in the 
moki namespace to prevent collisions with any (existing or not!) HTML 
attribute and to make it easy to remove the attribute when e.g. the node 
is serialized to a string to report a code extract.

- A "tidied" attribute is now added to the "docContent" element in the 
moki representation. When set to "true", it means the mobileOK Checker 
had to tidy the resource under test before it could parse it. It also 
means that the positions may not be accurate since they represent 
positions in the tidied document, not in the original one.

- Tests that return a code extract were updated to also return the 
position where the code may be found using the usual "position" element. 
The "tidied" attribute is set to "true" when the position comes from a 
tidied version of the resource under test. The following tests were 
updated to return the position: AUTO_REFRESH, CACHING-3 and CACHING-6, 
DEFAULT_INPUT_MODE, IMAGE_MAPS, IMAGES_SPECIFY_SIZE, 
LINK_TARGET_FORMAT-3, NO_FRAMES, NON_TEXT_ALTERNATIVES, 
OBJECTS_OR_SCRIPT, PROVIDE_DEFAULTS, STYLE_SHEETS_USE

- Code extracts are now limited to about 50 characters in size. The 
Checker could sometimes return a whole section of the document as a code 
extract, which did not really help pinpoint what was wrong.
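
To illustrate how a consumer might use the "line" attribute and the 
shortened extracts described above, here is a minimal sketch. It is not 
the library's code: the class and method names (CodeExtractSketch, 
lineOf, stripLine, truncate, codeExtract) are hypothetical, the JDK's 
built-in DOM parser and serializer stand in for whatever the Checker 
uses, and the exact truncation rule (cut at 50 characters and append 
"...") is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CodeExtractSketch {
    public static final String MOKI_NS = "http://www.w3.org/2007/05/moki";
    // Assumed limit: the email only says "about 50 characters".
    public static final int MAX_EXTRACT_LENGTH = 50;

    /** Reads the moki "line" attribute, or returns -1 when it is absent. */
    public static int lineOf(Element element) {
        String value = element.getAttributeNS(MOKI_NS, "line");
        return value.isEmpty() ? -1 : Integer.parseInt(value);
    }

    /** Removes the moki "line" attribute from an element and its descendants. */
    public static void stripLine(Element element) {
        element.removeAttributeNS(MOKI_NS, "line");
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child instanceof Element) {
                stripLine((Element) child);
            }
        }
    }

    /** Keeps at most max characters, appending "..." when text was cut. */
    public static String truncate(String text, int max) {
        return text.length() <= max ? text : text.substring(0, max) + "...";
    }

    /** Serializes a copy of the element without "line" attributes, then truncates. */
    public static String codeExtract(Element element) throws Exception {
        // Work on a deep copy so that the original tree keeps its line numbers.
        Element copy = (Element) element.cloneNode(true);
        stripLine(copy);
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(copy), new StreamResult(out));
        return truncate(out.toString(), MAX_EXTRACT_LENGTH);
    }

    /** Namespace-aware parse of an XML string; returns the root element. */
    public static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element img = parse("<img xmlns='http://www.w3.org/1999/xhtml'"
                + " xmlns:ns0='" + MOKI_NS + "' ns0:line='12' src='logo.png'/>");
        System.out.println("line " + lineOf(img) + ": " + codeExtract(img));
    }
}
```

Because the attribute lives in its own namespace, a single 
removeAttributeNS call per element is enough to drop it without 
touching any genuine HTML attribute.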


Internal changes
-----
The main change is that Saxon's TinyTree's DOM implementation is used to 
parse the document under test with line numbering activated. The line 
number is then added to the moki serialization (see methods 
XhtmlContent.parse and XhtmlContent.toMokiNode).
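
For readers unfamiliar with the idea of keeping line numbers while 
building a tree, the sketch below shows the general technique with 
plain JDK classes. Note that it deliberately swaps Saxon's TinyTree for 
a SAX parser with a Locator, so it is not what XhtmlContent.parse 
does. The class name LineNumberSketch, the method parseWithLines and 
the "moki" prefix are all choices made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LineNumberSketch {
    public static final String MOKI_NS = "http://www.w3.org/2007/05/moki";

    /** DOM methods expect null, not "", for "no namespace". */
    private static String ns(String uri) {
        return (uri == null || uri.isEmpty()) ? null : uri;
    }

    /** Parses XML into a DOM tree, tagging each element with its line number. */
    public static Document parseWithLines(String xml) throws Exception {
        final Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        final Deque<Node> open = new ArrayDeque<>();
        open.push(doc);

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                Element element = doc.createElementNS(ns(uri), qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    element.setAttributeNS(ns(atts.getURI(i)),
                            atts.getQName(i), atts.getValue(i));
                }
                // The locator reports where the start tag ends, which is the
                // tag's own line as long as the tag does not span lines.
                element.setAttributeNS(MOKI_NS, "moki:line",
                        String.valueOf(locator.getLineNumber()));
                open.peek().appendChild(element);
                open.push(element);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                open.peek().appendChild(
                        doc.createTextNode(new String(ch, start, length)));
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithLines("<html>\n<body>\n<p>hi</p>\n</body>\n</html>");
        Element p = (Element) doc.getElementsByTagName("p").item(0);
        System.out.println("p is on line " + p.getAttributeNS(MOKI_NS, "line"));
    }
}
```

The sketch ignores comments, processing instructions and entity 
handling, which a real parser for the Checker would need to keep.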

The use of Saxon's DOM implementation triggered a couple of bugs related 
to the fact that instances of DOM nodes are created on the fly by Saxon 
when needed, and cannot be compared with "==". They must be compared 
with the DOM "Node.isSameNode" method (see e.g. changes in 
ObjectResourceExtractor).
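
The comparison pattern can be sketched as follows. This is only an 
illustration, not the actual change made in ObjectResourceExtractor: 
the class SameNodeSketch and the helper containsNode are hypothetical, 
and the demo uses the JDK's default DOM implementation, where "==" 
happens to work as well. The point is that Node.isSameNode is the 
comparison the DOM API defines for this purpose, so it is the one to 
use when the implementation may hand out a fresh wrapper object for 
the same underlying node.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SameNodeSketch {
    /** Returns true when the list already refers to the given DOM node. */
    public static boolean containsNode(List<Node> candidates, Node node) {
        for (Node candidate : candidates) {
            // Do not use "candidate == node": two distinct Java objects
            // may stand for the same node of the document.
            if (candidate.isSameNode(node)) {
                return true;
            }
        }
        return false;
    }

    /** Parses an XML string with the JDK's default DOM implementation. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/></a>");
        Node first = doc.getDocumentElement().getFirstChild();
        List<Node> seen = Collections.singletonList(first);
        // Fetch the same child again through a second navigation.
        Node again = doc.getDocumentElement().getFirstChild();
        System.out.println(containsNode(seen, again));
    }
}
```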


Notes
-----
- Newer versions of Saxon would also make it possible to preserve the 
column, but we cannot switch to newer versions for licensing reasons 
(the mobileOK Checker uses extension functions which are not included 
in Saxon-HE, AFAICT).

- The line number seems to stay accurate when the source is tidied: the 
library the Checker uses to tidy up the source does not seem to add or 
remove lines. This shouldn't be relied upon, though. The column number 
would also not stay accurate.

- In the moki, the introduction of the "line" attribute in HTML elements 
triggers the definition of a "ns0" prefix for the moki namespace defined 
in the HTML root, e.g.:
  <html xmlns="http://www.w3.org/1999/xhtml" lang="en" 
xmlns:ns0="http://www.w3.org/2007/05/moki" ns0:line="2">
That's technically correct, although visually ugly. I would have 
preferred to control the serialization and generate a "moki" (or "m") 
prefix, but I could not figure out any easy way to do that in Java.


Related "bugs"
-----
5006: Does a "tidied" element or attribute exist?
6962: Code extracts: closing tag and tag content are often useless
9538: Improve code references
9583: Return code position consistently across the tests that output 
code extracts
These bugs are visible using:
http://www.w3.org/Bugs/Public/show_bug.cgi


Francois.

Received on Monday, 26 April 2010 13:37:14 UTC