This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25871 - What's the "character that caused the state machine…" when it was a branch on EOF? What about the transition from markup declaration open state which does lookahead of multiple characters, so what's "the" character?
Summary: What's the "character that caused the state machine…" when it was a branch on...
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-23 03:02 UTC by contributor
Modified: 2014-08-05 19:34 UTC

Description contributor 2014-05-23 03:02:14 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html
Multipage: http://www.whatwg.org/C#bogus-comment-state
Complete: http://www.whatwg.org/c#bogus-comment-state
Referrer: http://www.whatwg.org/specs/web-apps/current-work/multipage/

Comment:
What's the "character that caused the state machine…" when it was a branch
on EOF? What about the transition from markup declaration open state which
does lookahead of multiple characters, so what's "the" character?

Posted from: 86.30.234.99 by geoffers@gmail.com
User agent: Mozilla/5.0 (X11; Linux x86_64; rv:29.0) Gecko/20100101 Firefox/29.0
Comment 1 Ian 'Hixie' Hickson 2014-05-23 19:52:30 UTC
EOF is a character. See the last paragraph of:
 http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream

What's the case where you enter the bogus comment state from a string?
Comment 2 Geoffrey Sneddon 2014-05-23 21:54:09 UTC
Then how do we include an EOF character in a comment token? Surely every UA is non-conforming given they don't include the EOF character in that case?

And the second case was me misreading the spec, so ignore that.
Comment 3 Ian 'Hixie' Hickson 2014-05-27 06:10:50 UTC
Does the parenthetical at the end of the paragraph not sufficiently cover this? Maybe I should remove the parentheses?
Comment 4 Geoffrey Sneddon 2014-05-27 22:27:25 UTC
That sounds to me like it's meant to be restating what has already been stated, and therefore contradicting the above, given the token should include the character that caused the transition (i.e., EOF) while also being empty.
Comment 5 Ian 'Hixie' Hickson 2014-05-28 05:37:33 UTC
Well, right now the other text says "the characters starting from and including the character that caused the state machine to switch into the bogus comment state, up to and including the character immediately before the last consumed character", which is a negative-length string in the case of "<!" followed by EOF. The parenthetical was trying to clean that up.

But I can try to improve this.
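
To illustrate the range described by the quoted text, here is a minimal Python sketch. It is not taken from the spec or from any UA. The names bogus_comment_range and comment_data are hypothetical, EOF is modelled as a sentinel appended to the input (following Comment 1's "EOF is a character"), and which index counts as "the character that caused the switch" is supplied by the caller, since that is exactly what this bug questions. As a simplification, scanning starts at that index, and the NULL replacement the spec also performs is left out.

```python
EOF = None  # hypothetical sentinel standing in for the EOF "character"


def bogus_comment_range(text, start):
    """Return inclusive (first, last) indices of the comment data.

    `start` is the index of the character assumed to have caused the
    switch into the bogus comment state (an assumption of this sketch).
    """
    stream = list(text) + [EOF]
    i = start
    # Consume up to and including the first ">" or EOF, whichever is first.
    while stream[i] != ">" and stream[i] is not EOF:
        i += 1
    last_consumed = i
    # Data runs up to the character immediately before the last consumed one.
    return start, last_consumed - 1


def comment_data(text, start):
    """Return the comment data; an inverted range yields the empty string."""
    first, last = bogus_comment_range(text, start)
    stream = list(text) + [EOF]
    return "".join(stream[first:last + 1])


if __name__ == "__main__":
    # "<!" followed by EOF, assuming EOF (index 2) caused the switch.
    print(bogus_comment_range("<!", 2))
    print(repr(comment_data("<!", 2)))
    print(repr(comment_data("<?php>", 1)))
```

With "<!" followed by EOF, and assuming the EOF at index 2 is the character that caused the switch, the range ends at index 1, before it starts. That inverted range is what the parenthetical resolves by saying the token is empty.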
Comment 6 Ian 'Hixie' Hickson 2014-08-05 19:33:57 UTC
Please reopen if the fix isn't sufficient.
Comment 7 contributor 2014-08-05 19:34:51 UTC
Checked in as WHATWG revision r8709.
Check-in comment: Try to avoid negative-length strings and strings with EOF characters in them...
http://html5.org/tools/web-apps-tracker?from=8708&to=8709