HtmlDiff
- myobie's htmldiff: solid Ruby library with a lot of forks
- PHP port of myobie's htmldiff: includes some improvements. Not too fast, but precise
- HTML Diff Web Service: built on the myobie and rashid2538 libraries
- DaisyDiff
- Bert Bos's htmldiff-0.4: fast, but doesn't handle radical changes well
- Python script by Ian Bicking: quite slow for large files, but handles radical changes very well
- htmldiff.py on GitHub: based on Ian Bicking's original script, with some improvements to whitespace and script handling
- Python script by Aaron Swartz (GPL): unusably slow for large files
- HTML::Diff Library
- diffmk by Norm Walsh
- Online W3C HTML Diff service based on Perl script
- Alternate online HTML Diff service based on a different version of the same script (related: source code)
- lxml.html.diff in Python: seems to work pretty well
See also
It would be nice to have a comparison of the various tools to see how well each handles cases such as: large sections being moved, sections being rewritten, only minor differences between versions, changes visible only through view-source (such as new attributes), and changes to whitespace but nothing else.
I ran
- Aaron Swartz's Python script
- the Perl script that the Online W3C HTML Diff service is based on (the source references http://www.themacs.com, but that link doesn't work at the moment; http://htmlwg.mn.aptest.com/viewcvs/viewcvs.cgi/htmldiff/ now seems to be the home of the source)
on two HTML files like this:
<html> <body> <h1>Lorem Ipsum</h1> <p class="c1"> Lorem ipsum dolor sit amet... </p> </body> </html>
- File 1 had <p class="c1"> (shown above)
- File 2 had <p class="c2">
It's not clear what an htmldiff tool should do with such (or similar) input.
- The python script produced
...<h1>Lorem Ipsum</h1> <del class="diff modified"><p class="c1"></del><ins class="diff modified"><p class="c2"></ins> Lorem ipsum dolor sit amet...
The difference was picked up, but not in a usable way: how is this HTML going to be processed?
- The perl script produced
<html> <body> <h1> Lorem Ipsum </h1> <p class="c2"> Lorem ipsum dolor sit amet...
The difference was not picked up, but maybe this output is more usable.
Other attribute changes would presumably be handled the same way.
Maybe user-friendly inline diff is not possible for HTML? What if the class, or other attribute change, doesn't make any difference in the final display? No diff-tool is going to be able to tell. On the other hand, if there is no visible difference, the user is not going to know either when just shown two apparently identical blocks. Showing the user the class-difference itself would confuse many people.
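One way to at least detect this kind of change, separately from the question of how to display it, is to compare the start tags of the two files. The following is a minimal sketch using only the Python standard library; the names `TagCollector`, `collect_tags` and `attribute_changes` are made up for illustration and are not taken from any of the tools listed above. It assumes both files contain the same sequence of tags and gives up (returns `None`) otherwise.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect (tag, attributes) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def collect_tags(html):
    """Parse an HTML string and return its list of (tag, attributes) pairs."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def attribute_changes(old_html, new_html):
    """Return (tag, attribute, old_value, new_value) tuples for changed attributes.

    Start tags are paired by position. If the two tag sequences differ,
    this sketch does not attempt an alignment and returns None.
    """
    old_tags = collect_tags(old_html)
    new_tags = collect_tags(new_html)
    if [tag for tag, _ in old_tags] != [tag for tag, _ in new_tags]:
        return None
    changes = []
    for (tag, old_attrs), (_, new_attrs) in zip(old_tags, new_tags):
        for name in sorted(set(old_attrs) | set(new_attrs)):
            if old_attrs.get(name) != new_attrs.get(name):
                changes.append((tag, name, old_attrs.get(name), new_attrs.get(name)))
    return changes
```

For the two test files above, this reports the single `class` change on the `p` element. Whether and how to show that to the user is still the open question.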
Maybe use an outright "diff only on the text and images" approach and indicate differences in layout using IFRAMEs (side-by-side view, not inline highlights).
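The "text and images only" part could look roughly like the sketch below. It is only an illustration built on the standard library (`html.parser` and `difflib`), not the method of any tool in the list; the names `VisibleContent`, `content_tokens` and `content_diff` are hypothetical. Text is split into words, each image is reduced to a token holding its `src`, and all other markup is ignored.

```python
import difflib
from html.parser import HTMLParser


class VisibleContent(HTMLParser):
    """Collect words of text and image sources, ignoring all other markup."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            # Represent an image by a single token carrying its src value.
            self.tokens.append("[img " + (dict(attrs).get("src") or "") + "]")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Whitespace is normalized away by splitting into words.
        if not self._skip:
            self.tokens.extend(data.split())


def content_tokens(html):
    """Return the list of word and image tokens found in an HTML string."""
    parser = VisibleContent()
    parser.feed(html)
    parser.close()
    return parser.tokens


def content_diff(old_html, new_html):
    """Return the non-equal opcodes as (operation, old_tokens, new_tokens)."""
    old, new = content_tokens(old_html), content_tokens(new_html)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

With this approach the `c1`/`c2` test files produce an empty diff, since only an attribute differs, so attribute and layout changes would have to be shown some other way, for example in the side-by-side view.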
I didn't look at the other tools in the list. -- swehner, 2007-11-30T02:20:09Z