99.99999% of the Web was invalid HTML. The W3C pretended it didn't exist. That wasn't a workable solution.
Taking your tag soup and turning it into valid HTML 4.01 or valid XHTML 1.0 gave the developer no new features and no benefit besides a gold star and a pat on the back
(well, except for consistent parsing, a reliable DOM, and easier scripting and styling and document maintenance)
FAQ on HTML Invited Experts, need a better name
The two serialisations are exactly equivalent: both are produced from the same DOM, and each can be converted into the other.
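
To make the "same DOM, two serialisations" point concrete, here is a minimal sketch in Python. It assumes a tiny, well-formed fragment and uses the standard library's `xml.dom.minidom` as a stand-in DOM. The helpers `to_html`, `from_html` and `DomBuilder`, and the short `VOID_ELEMENTS` list, are hypothetical names invented for illustration. This is not the HTML5 parsing or serialisation algorithm, only a toy showing one tree written out in both syntaxes and read back again.

```python
from html import escape
from html.parser import HTMLParser
from xml.dom import minidom

# Assumption: a small illustrative subset of the elements that the HTML
# syntax writes without an end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def to_html(node):
    """Serialise a minidom element or text node using HTML syntax."""
    if node.nodeType == node.TEXT_NODE:
        return escape(node.data, quote=False)
    attrs = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in node.attributes.items()
    )
    if node.tagName in VOID_ELEMENTS:
        return f"<{node.tagName}{attrs}>"
    children = "".join(to_html(child) for child in node.childNodes)
    return f"<{node.tagName}{attrs}>{children}</{node.tagName}>"


class DomBuilder(HTMLParser):
    """Build a minidom tree from well-formed HTML syntax (sketch only)."""

    def __init__(self):
        super().__init__()
        self.doc = minidom.Document()
        self.stack = [self.doc]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        element = self.doc.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value or "")
        self.stack[-1].appendChild(element)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))


def from_html(text):
    """Parse a single-rooted HTML fragment into a minidom element."""
    builder = DomBuilder()
    builder.feed(text)
    builder.close()
    return builder.doc.documentElement


# One DOM, two serialisations.
root = minidom.parseString('<p class="note">Line one<br/>Line two</p>').documentElement
xml_text = root.toxml()    # XML syntax: <br/>
html_text = to_html(root)  # HTML syntax: <br>

# Reading the HTML form back in yields a tree with the same XML form.
round_trip = from_html(html_text).toxml()
```

The only difference between the two outputs in this toy case is how the empty `br` element is written, and the tree behind them is the same.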
Pretending to base HTML on SGML is no longer funny
Need a transition strategy, and a bridge so that the more capable forms processors can also read 'classic HTML' forms. Cue next talk ...