Re: JTidy new line processing

> 
> >>I am running the Java version of HtmlTidy.  When the Html input looks
> >>like the one below , Tidy replaces the ^M with nothing, resulting in two
> >>separate words being combined (see Tidy output below also).  I do not
> >>know what product was used to create the offending Html. 

We appear to have a similar bug, though I suspect it has a different origin.

When JTidy is called through its main method, i.e.

java org.w3c.tidy.Tidy -asxml  file.html

the output is fine.

When it is called through its parse method, i.e.

// in: InputStream with the HTML; out: OutputStream for the result
Tidy tidy = new Tidy();
tidy.setMakeClean(true);
// tidy.setXmlTags(true);
tidy.setXHTML(true);
tidy.parse(in, out);


spaces are deleted from the output,

e.g.

density was calculatedboth for anion A and B and the most


instead of 

density was calculated both for anion A and B and the most

The missing space is NOT at an end-of-line marker; it is in the middle
of a line. In a file of about 10,000 characters, perhaps 50-100 spaces
are deleted. We think only spaces are lost, and not other characters.
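
To put a number on the lost spaces, one could compare the text before and
after Tidy and list the output words that are really two adjacent input
words run together. The following is only a minimal sketch under the
assumption that both strings have already been reduced to plain text (tags
stripped); the class and method names are invented for illustration and are
not part of JTidy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of JTidy: reports words in the output
// that look like two adjacent input words joined without a space.
class MergedWordFinder {

    static List<String> findMergedWords(String before, String after) {
        String[] inputWords = before.trim().split("\\s+");

        // Every word that legitimately occurs in the input.
        Set<String> known = new HashSet<>(Arrays.asList(inputWords));

        // Every concatenation of two neighbouring input words.
        Set<String> joinedPairs = new HashSet<>();
        for (int i = 0; i + 1 < inputWords.length; i++) {
            joinedPairs.add(inputWords[i] + inputWords[i + 1]);
        }

        // An output word is suspicious if it is not an input word
        // but matches one of the joined pairs.
        List<String> merged = new ArrayList<>();
        for (String word : after.trim().split("\\s+")) {
            if (!known.contains(word) && joinedPairs.contains(word)) {
                merged.add(word);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String before = "density was calculated both for anion A and B and the most";
        String after = "density was calculatedboth for anion A and B and the most";
        System.out.println(findMergedWords(before, after));
    }
}
```

Run on the example above, this flags "calculatedboth" and nothing else. It
only detects the symptom; it says nothing about where in JTidy the space is
dropped.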

If tidy.setXmlTags(true); is uncommented,
it produces extra line breaks
but does not delete spaces.

I might add that other versions of Tidy process the same file with no 
problem.

Since "calculatedboth" has a different meaning from "calculated both",
we consider this a serious problem.

Any comments on the origin of the problem?

Received on Wednesday, 14 June 2000 13:25:30 UTC