problem with Tidy and some Frontpage html

First I must say that we greatly appreciate Tidy and, as best as I can
enforce it, we use it to clean up the pages of MathSciNet from the Math
Society.

I hesitate to call this a bug but Tidy is doing bad by some bad html from
Frontpage 4.0.

Tidy throws away a name-anchor when it is represented as:
"<P><A NAME="clipboard"><H3>Clipboard</H3></A><P>"

That becomes "<H3>Clipboard</H3>".

I'm running as new a version as I could easily find.  For completeness I'm
including a) how it runs, b) the test input file and c) the output file.


||||||||||||||||||||||||||||||||||||||||||||||||||||||||

   Drew Burton
   drb@ams.org 
   Ann Arbor, Michigan
   American Mathematical Society

||||||||||||||||||||||||||||||||||||||||||||||||||||||||


a) how it runs
Tidy (vers 30th April 2000) Parsing "frontpage.html"
line 10 column 24 - Warning: missing </a> before <h3>
line 10 column 24 - Warning: trimming empty <p>
line 10 column 42 - Warning: discarding unexpected </a>

"frontpage.html" appears to be HTML 4.01 Transitional
3 warnings/errors were found!


b) 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="HTML Tidy, see www.w3.org">
<META NAME="generator" CONTENT="Microsoft FrontPage 4.0">
<TITLE>MATHSCINET SEARCH RESULTS</TITLE>
</HEAD>
<BODY>
A fragment of htmp created by Frontpage.....
<P><A NAME="clipboard"><H3>Clipboard</H3></A><P>

lots more stuff deleted....

</BODY>
</HTML>


c) output file

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta name="generator" content="Microsoft FrontPage 4.0">
<title>MATHSCINET SEARCH RESULTS</title>
</head>
<body>
A fragment of htmp created by Frontpage..... 

<h3>Clipboard</h3>

<p>lots more stuff deleted....</p>
</body>
</html>

Received on Tuesday, 5 September 2000 13:54:40 UTC