This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 14032 - Attribute values cannot contain ambiguous ampersands
Summary: Attribute values cannot contain ambiguous ampersands
Status: VERIFIED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5: The Markup Language (editor: Michael[tm] Smith)
Version: unspecified
Hardware: All Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Michael[tm] Smith
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-04 22:31 UTC by Filipus Klutiero
Modified: 2012-10-06 14:19 UTC

Description Filipus Klutiero 2011-09-04 22:31:11 UTC
As stated in http://dev.w3.org/html5/spec/syntax.html#attributes-0:

Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.

The reference does not mention this restriction. It says only that:

attribute values can contain text and character references, with additional restrictions depending on whether they are unquoted attribute values, single-quoted attribute values, or double-quoted attribute values. Also, the HTML elements section of this reference describes further restrictions on the allowed values of particular attributes, and attributes must have values that conform to those restrictions.

http://dev.w3.org/html5/markup/syntax.html#syntax-attributes


Note also that the reference's definition of an ambiguous ampersand differs from the specification's. The specification says:

An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section.
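
The specification's wording can be read as a small check. The following is a minimal Python sketch, not the specification's own algorithm. It assumes that the table in Python's standard html.entities.html5 is an adequate stand-in for the named character references section, and the function name ambiguous_ampersands is made up for illustration.

```python
import re
from html.entities import html5  # keys look like "amp;" and "lt;"

# "&", then one or more ASCII letters or digits, then ";".
_CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")


def ambiguous_ampersands(text):
    """Return substrings that are ambiguous ampersands under the quoted wording."""
    return [
        m.group(0)
        for m in _CANDIDATE.finditer(text)
        # Assumption: html5 stands in for the spec's table of names.
        if m.group(1) + ";" not in html5
    ]


print(ambiguous_ampersands("a &amp; b &bogus; c &foo d"))
```

Under this reading only "&bogus;" is reported: "&amp;" matches a known name, and "&foo" is not followed by a semicolon.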

The reference says:

An ambiguous ampersand is an "&" character that is followed by some text other than a space character, a "<" character, or another "&" character.
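
To see how much broader the reference's wording is, here is a minimal Python sketch of a literal reading of that sentence. It rests on two assumptions: that "space character" means HTML whitespace (space, tab, LF, FF, CR), and that character references are not excluded beforehand, which the quoted sentence does not settle. The function name ampersands_flagged_by_reference is hypothetical.

```python
# Assumption: "space character" means HTML whitespace (space, tab, LF, FF, CR).
_SPACE_CHARACTERS = " \t\n\f\r"


def ampersands_flagged_by_reference(text):
    """Return offsets of each "&" followed by a character that is not
    a space character, "<", or another "&"."""
    excluded = _SPACE_CHARACTERS + "<&"
    return [
        i
        for i in range(len(text) - 1)  # a trailing "&" has nothing after it
        if text[i] == "&" and text[i + 1] not in excluded
    ]


print(ampersands_flagged_by_reference("AT&T &foo a & b"))
```

Read this way, the "&" in "AT&T" and the one in "&foo" are both flagged (offsets 2 and 5), although neither is followed by a semicolon and so neither is an ambiguous ampersand under the specification's definition.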

Comment 1 Michael[tm] Smith 2012-10-06 10:18:32 UTC
I updated the doc so that it now matches the spec:

http://dev.w3.org/html5/markup/syntax.html#syntax-ambiguous-ampersand