14548 – Grouping Content: algorithm for incrementing value (OL->LI @value) does not match any current user agent

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	All All

Importance:	P2 minor
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

Reported:	2011-10-23 14:03 UTC by theimp
Modified:	2011-11-05 02:59 UTC
CC List:	4 users

Attachments
Initial post in HTML markup due to heavy pseudo-markup and tables (12.75 KB, text/html) 2011-10-23 14:03 UTC, theimp

Description theimp 2011-10-23 14:03:41 UTC

Created attachment 1036
Initial post in HTML markup due to heavy pseudo-markup and tables

When I investigated this, I tested how many user agents currently handle list numbering, and none of them matches the current spec. Very few come close to the spec. or to each other, and no two major ones match each other exactly.

Results:
* Lynx comes the closest to what the HTML5 spec. currently says, but is the only browser that performs incorrectly on the (invalid) initial positive sign symbol (U+002B PLUS SIGN).
* Only Opera and Lynx support negative numbers at all (except through unhandled overflows).
* IE and Webkit do not support the value 0.
* An empty string is treated according to the spec. by Gecko, Webkit, Presto and Lynx. It always returns 0 for w3m and Dillo. It always returns 1 for IE.
* There's a three-way split in how invalid values that begin with valid characters are currently interpreted. Gecko ignores the entire value, treating it as if no value was specified (incrementing the previous value by 1). IE ignores the whole value but returns an absolute value of 1. The rest, per the spec., read all valid characters up to the first invalid character, and use that.
* As a special case of the above, when the first invalid character is whitespace (that is, the value contains valid digits, then one or more whitespace characters, then any other non-whitespace characters), all browsers use the characters up to the first invalid character, as above.
* As a special exception to the above, if the first invalid character is a period (U+002E FULL STOP) and valid numeric digit characters follow it, Gecko uses those digits as the number, ignoring the period.
* Presto, w3m and Dillo will treat invalid values as a 0. IE will treat invalid values as a 1. All others treat invalid values as an increment of the previous value.
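
For illustration only, the three-way split described above for values such as "3a" (valid digits followed by an invalid character) can be modelled in a few lines of Python. The function name `resolve_partial_value` and the policy labels are hypothetical and not taken from any engine, and the whitespace and full-stop exceptions noted above are deliberately left out.

```python
import re


def resolve_partial_value(raw, previous, policy):
    """Return the ordinal for a value such as "3a" under one of three policies.

    "prefix": use the leading digits (the spec. and most user agents).
    "ignore": drop the whole value and auto-increment (reported for Gecko).
    "one":    drop the whole value and return 1 (reported for IE).
    """
    match = re.fullmatch(r"([0-9]+)(.*)", raw, re.DOTALL)
    if match is None:
        raise ValueError("this sketch only handles values that begin with a digit")
    digits, rest = match.groups()
    if rest == "" or policy == "prefix":
        # A fully valid value is used as-is under every policy.
        return int(digits)
    if policy == "ignore":
        return previous + 1
    if policy == "one":
        return 1
    raise ValueError("unknown policy: " + policy)
```

With a previous item numbered 1, the value "3a" gives 3, 2 and 1 under "prefix", "ignore" and "one" respectively.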

                   Sample Ordered List Value Attributes                   
Markup                HTML5*  IE   Gecko  Webkit  Presto  Lynx  w3m  Dillo
<li            ></li>   1      1     1       1       1      1    1     1  
<li value="0"  ></li>   0      2     0       2       0      0    0     0  
<li value="1"  ></li>   1      1     1       1       1      1    1     1  
<li value="3a" ></li>   3      1     3       3       3      3    3     3  
<li value="3b" ></li>   3      1     4       3       3      3    3     3  
<li value="2  "></li>   2      2     2       2       2      2    2     2  
<li value="2 1"></li>   2      2     2       2       2      2    2     2  
<li value="2.1"></li>   2      2     2       2       2      2    2     2  
<li value=".2" ></li>   3      3     2       3       0      3    0     0  
<li value="-3" ></li>  -3      3     0       4      -3     -3    0     0  
<li value="+3" ></li>   3      3     3       3       3     -2    3     3  
<li value="-3" ></li>  -3      4     0       4      -3     -3    0     0  
<li value=""   ></li>  -2      1     1       5       0     -2    0     0  
<li            ></li>  -1      2     2       6       1     -1    1     1  
<li value=""   ></li>   0      1     3       7       0      0    0     0  
<li value="4"  ></li>   4      4     4       4       4      4    4     4  
<li value="5"  ></li>   5      5     5       5       5      5    5     5  
<li            ></li>   6      6     6       6       6      6    6     6  
<li value="c"  ></li>   7      1     7       7       0      7    0     0  
*The current HTML5 algorithm
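
The HTML5 column can be reproduced with a short Python sketch. It is not the spec. text, only a model of the behavior the column implies, under these assumptions: leading whitespace is skipped, an optional sign is accepted, digits are read up to the first non-digit, and a value with no leading integer (or no value attribute at all) falls back to the previous ordinal plus one. The names `parse_value`, `ordinals` and `TABLE_VALUES` are made up for this example.

```python
import re

# Optional ASCII whitespace, an optional sign, then at least one ASCII digit.
# Anything after the digits is ignored.
_INTEGER_PREFIX = re.compile(r"[ \t\n\f\r]*([+-]?)([0-9]+)")


def parse_value(raw):
    """Return the integer at the start of raw, or None if there is none."""
    match = _INTEGER_PREFIX.match(raw)
    if match is None:
        return None
    sign, digits = match.groups()
    return -int(digits) if sign == "-" else int(digits)


def ordinals(values, start=1):
    """Number list items; None stands for an <li> with no value attribute."""
    result = []
    expected = start
    for raw in values:
        parsed = None if raw is None else parse_value(raw)
        current = expected if parsed is None else parsed
        result.append(current)
        expected = current + 1
    return result


# The value attributes from the table, top to bottom.
TABLE_VALUES = [None, "0", "1", "3a", "3b", "2  ", "2 1", "2.1", ".2",
                "-3", "+3", "-3", "", None, "", "4", "5", None, "c"]
print(ordinals(TABLE_VALUES))
```

Run as written, this prints the same sequence as the HTML5 column: 1, 0, 1, 3, 3, 2, 2, 2, 3, -3, 3, -3, -2, -1, 0, 4, 5, 6, 7.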

Details:

IE
* By default, the value is an automatic increment of the previous number
* Values greater than 2147483647 return 2147483647
* Values less than 0 return an automatic increment of the previous number
* An empty string returns 1
* A string not beginning with any numeric characters (including U+002B PLUS SIGN, U+002D HYPHEN-MINUS) returns 1
* Note: automatic increments of the previous value are 1 if there is no previous value, but overflow to -2147483648 if the previous value was 2147483647

Webkit
* By default, the value is an automatic increment of the previous number
* Values greater than 2147483647 return an automatic increment of the previous number
* Values less than 0 return an automatic increment of the previous number
* An empty string returns an automatic increment of the previous number
* A string not beginning with any numeric characters (including U+002B PLUS SIGN, U+002D HYPHEN-MINUS) returns an automatic increment of the previous number
* Note: automatic increments of the previous value are 1 if there is no previous value, but overflow to -2147483648 if the previous value was 2147483647

Gecko
* By default, the value is an automatic increment of the previous number
* Values greater than 2147483647 return an automatic increment of the previous number
* Values less than 0 return 0, unless it's also less than -2147483647, in which case it returns an automatic increment of the previous number
* An empty string returns an automatic increment of the previous number
* A string not beginning with any numeric characters (including U+002B PLUS SIGN, U+002D HYPHEN-MINUS) returns an automatic increment of the previous number
* Note: strings beginning with U+002E FULL STOP ignore that character
* Note: automatic increments of the previous value are 1 if there is no previous value, but overflow to -2147483648 if the previous value was 2147483647

Opera
* By default, the value is an automatic increment of the previous number
* Values greater than 536870911 return 536870911
* Values less than -536870912 return -536870912
* An empty string returns 0
* A string not beginning with any numeric characters (including U+002B PLUS SIGN, U+002D HYPHEN-MINUS) returns 0
* Note: automatic increments of the previous value are 1 if there is no previous value

Dillo
* By default, the value is an automatic increment of the previous number
* Values greater than 2147483647 return 2147483647
* Values less than 0 return 0
* An empty string returns 0
* A string not beginning with any numeric characters (including U+002B PLUS SIGN, U+002D HYPHEN-MINUS) returns 0
* Note: automatic increments of the previous value are 1 if there is no previous value

Lynx
* By default, the value is an automatic increment of the previous number
* Values greater than 2147483647 return 2147483647
* Values less than -29997 return -29997
* Note: This behavior is documented: http://lynx.isc.org/current/lynx2-8-8/lynx_help/Lynx_users_guide.html#Lists
* An empty string returns an automatic increment of the previous number
* A string not beginning with any numeric characters (including U+002D HYPHEN-MINUS) returns an automatic increment of the previous number
* Note: automatic increments of the previous value are 1 if there is no previous value

w3m
* By default, the value is an automatic increment of the previous number
* Values greater than 2147483647 return 2147483647
* Values less than 0 return 0
* An empty string returns 0
* A string not beginning with any numeric characters (including U+002B PLUS SIGN, U+002D HYPHEN-MINUS) returns 0
* Note: automatic increments of the previous value are 1 if there is no previous value
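
For the user agents above that simply cap out-of-range explicit values (Opera, Dillo, Lynx and w3m), the reported limits can be written down as data. This is a rough Python sketch built only from the figures listed above; `REPORTED_CAPS` and `cap_explicit_value` are hypothetical names. IE, Webkit and Gecko are left out because they fall back to an automatic increment for at least some out-of-range values.

```python
# (lowest, highest) explicit value each user agent was reported to display;
# anything outside the pair is capped to the nearer end.
REPORTED_CAPS = {
    "Opera": (-536870912, 536870911),  # equal to -2**29 and 2**29 - 1
    "Dillo": (0, 2147483647),
    "Lynx": (-29997, 2147483647),
    "w3m": (0, 2147483647),
}


def cap_explicit_value(user_agent, value):
    """Cap an explicitly set value to the range reported for user_agent."""
    lowest, highest = REPORTED_CAPS[user_agent]
    return max(lowest, min(highest, value))
```

For example, `cap_explicit_value("w3m", -3)` gives 0 and `cap_explicit_value("Lynx", -3)` gives -3, matching the value="-3" rows of the table.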

Notes:

* The user agent versions tested included the latest versions, all commonly-used versions, and many others, including extremely old versions (for example, IE was tested from version 3 to version 9). Occasional variations exist only in extremely old versions (for example, IE version 2 behaves differently from all other versions). No other browsers, or old versions of browsers, seem to be worth reporting on, given their small number of active deployments.

* The value 2147483647 is the highest value that can be stored in a signed 32-bit integer. The value -2147483648 is the lowest value that can be stored in a signed 32-bit integer. In principle, software compiled using a different int width may have different, corresponding limits; although in practice this is very rare.

* Also, note that the current HTML5 algorithm is slightly inconsistent with CSS (which explicitly allows U+002B PLUS SIGN to indicate positive integers). I don't know if mapping this attribute to CSS is a concern.
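
The overflow from 2147483647 to -2147483648 mentioned in the per-browser notes is what signed 32-bit two's-complement arithmetic produces. A minimal Python sketch of that wrap-around follows; Python integers do not overflow on their own, so the wrap is written out explicitly, and `wrap_int32` is a name invented for this example.

```python
INT32_MIN = -2**31     # -2147483648
INT32_MAX = 2**31 - 1  # 2147483647


def wrap_int32(n):
    """Reduce n to the signed 32-bit range, as two's-complement overflow would."""
    return (n - INT32_MIN) % 2**32 + INT32_MIN


# Incrementing the largest 32-bit value wraps to the smallest one.
print(wrap_int32(INT32_MAX + 1))
```

Values already inside the range pass through unchanged, so `wrap_int32(5)` is still 5.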

***

Personally, I don't have a solution to propose; this is purely informative.

Comment 1 Ian 'Hixie' Hickson 2011-10-25 05:00:41 UTC

Yeah, I discovered much the same thing in my own research on this topic. The spec represents what I thought was the best compromise possible.

I'm open to changing it if there's something specific that should be changed, but I'm not sure what to do otherwise. Is the spec as it stands today sufficient?

Comment 2 Daniel.S 2011-10-25 15:41:16 UTC

(In reply to comment #1)
> I'm open to changing it if there's something specific that should be changed,
> but I'm not sure what to do otherwise. Is the spec as it stands today
> sufficient?

1. Mozilla implemented the HTML5 algorithm in Firefox 9. It passes the tests of the initial reporter. So I'm not sure what "Versions of user agents included the latest versions" refers to.

2. Although some browsers do not support negative li@value values, they *all* support negative ol@start values, so negative counters are no real problem.

3. Opera 11.5 only fails 3 tests (value=".2", value="" and value="c").

IE10 PP2 doesn't yet support non-positive li@value values, but MS knows about the issue. And their algorithm is probably as old as the whole list implementation.
Chrome 14 also doesn't support non-positive values for li@value.

I think the current state of the spec is a sane one that can be implemented.
The big problems seem to be error recovery and non-positive values. The latter can be easily fixed as seen in Gecko. I don't know if the former is really influencing real world websites.

Comment 3 theimp 2011-10-25 18:14:40 UTC

Errata to comment #0:

The first five values for Gecko in the table:
> 1,0,1,3,4
  1,0,1,2,3

IE
> * Values less than 0 return an automatic increment of the previous number
  * Values less than 1 return an automatic increment of the previous number

Note:
When the phrase "Values greater than" or "Values less than" is used, it could be more clearly written as "Value attributes explicitly set to" followed by "greater than" or "less than", respectively.

***

> Firefox 9
> IE10 PP2
> Chrome 14

I was only testing release versions.

I don't know if @value is being reintroduced for backwards-compatibility with pages written to take advantage of older user agents, or if it is intended to be used with its new behavior going forward to unify HTML with various author requirements. If it's the former, then the behavior of browsers of the future is less critical. Also note that the current behavior, as I reported it, has been stable and predictable for over 10 years in these browsers - I usually went back to the very first versions. IE 2 was the only major exception to browser "families" in their whole existence, and that's not a browser that *anyone* is trying to be compatible with.

> Opera 11.5 only fails 3 tests

I think the table reflects that, yes? Most browsers only failed about three tests; just not the same ones :-(

***

Personally, I think that allowing negative numbers is useful, and the value zero is very important. The most consistent results are obtained by forbidding them; even so, compatibility with the (currently draft) CSS Lists Module requires allowing them. HTML4 allowed 0, but not negatives; that fact is mostly academic, though.

The positive sign symbol is, I think, the most difficult part to reconcile. Personally, I'd either make it fully conforming (Lynx would have to change its behavior, and as an important and stable browser, that's hard), which would also gain the benefits of compatibility with CSS; or else, forbid it entirely. I think that it's not good to specify the behavior but forbid the usage, when the behavior is not universal.

Do you have any idea what the actual usage of otherwise-valid integer values prepended with U+002B PLUS SIGN actually is? I would guess it would be extremely rare.

I wonder if "conforming limits" might be useful. Implementation limits are essential, because some kind of limit is unavoidable. Realistically, I can't think of a scenario where a list is practical beyond a few thousand entries, or is useful with starting numbers beyond a few million. I think the CSS Lists Module group are considering whether a two- (about 32,000) or four-byte (about 2 billion) minimum limit is acceptable. Most browsers, even extremely old or mobile browsers, support 4-byte integers (though Opera only uses the 31 least-significant bits).

Also, whatever the limit is - even if unspecified - the spec. should describe the behavior when exceeded. Should the value be capped within the limit (probably the best idea, and also the most compatible), or should it be treated as an erroneous value (with an implied automatic increment)? I agree it's mostly a rare event, but spec. ambiguity seems to be the result of more wasted effort than everything else combined. I believe that ensuring that user agents are consistent is a primary goal of HTML5; if I understand correctly, reliability is at least as important as functionality for authors, in the opinion of the HTML5 stakeholders.
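
The two options in the previous paragraph can be put side by side in a small Python sketch. The function `apply_limit` and the policy names "cap" and "ignore" are hypothetical; the limits are passed in as parameters, so no particular range is assumed.

```python
def apply_limit(explicit, previous, lowest, highest, policy):
    """Resolve an explicitly set value against an implementation limit.

    "cap":    clamp the value to the nearer end of the range.
    "ignore": treat the value as erroneous and auto-increment instead.
    """
    if lowest <= explicit <= highest:
        return explicit
    if policy == "cap":
        return lowest if explicit < lowest else highest
    if policy == "ignore":
        return previous + 1
    raise ValueError("unknown policy: " + policy)
```

With a 32-bit range and a previous ordinal of 5, an explicit value of 2147483648 becomes 2147483647 under "cap" and 6 under "ignore". Both outcomes appear in the original report: IE is listed as capping such values and Webkit as auto-incrementing.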

I do wonder if it might be better to require either completely-conforming values or else discard the entire sequence. I don't really think that a value like "324fq!n6ireb" should return "324". How commonly is this relied upon by authors? Is this for some kind of DOM- (as opposed to source-) level safety catch for authors that set attributes with strings instead of integers?

Really, I would imagine that the best idea of what to do, exactly, would come from considering the question: is this mostly for the benefit of legacy documents, or is there some reconsideration of where list values fit in the content/structure/presentation triumvirate?

If it is expected that the value attribute will almost entirely be for the benefit of legacy documents, and that new documents will use the CSS Lists Module, then it probably should match either where there is current consensus among major browsers, or where there was once (basically, IE, which was made to be mostly compatible with Netscape Navigator).

If, however, it is designed to complement the CSS Lists Module, by providing alternate means of providing the same (at least numerical) information in a way that is more robust in old browsers or unusual interchanges, I would favor compatibility with CSS. This would mostly imply that the positive sign symbol should be permitted.

If it's intended to be a method of marking up lists with complex values that are actually structure, as an alternative to presentation, then it needs different rules altogether. Probably starting with a requirement to use it on all LI elements in an OL block, and no automatic incrementing. Although I understand that some people believe this is reasonable (frequently, people used to using document publishing software to create complex lists), I think that lists where the sequence counters are structure are really tabular information. So, I don't recommend this idea.

If it's intended to follow the previous standard, then neither the negative nor positive sign symbols are permitted, and neither are empty values, but the value zero is. Handling of invalid values is undefined. It might make a good compromise, but I don't think anyone actually wants this.

In all cases, I think that a reasonable minimum implementation range should be specified, and error handling for values that exceed this range should also be specified.

> I think the current state of the spec is a sane one that can be implemented.

Absolutely, although the question that I'm asking is: what is the spec trying to achieve, given that it's currently another standard to add to the half-dozen other de facto standards that can be seen now. Is it supposed to unify behavior going forward, or is it supposed to bridge behavior for those catching up? Or both? That will pretty much indicate what is better.

Comment 4 Daniel.S 2011-10-26 16:38:58 UTC

(In reply to comment #3)
> I was only testing release versions.

I admit, it's difficult to stay up to date in the current situation. But I think it's important to take into account even the latest changes.

> I don't know if @value is being reintroduced for backwards-compatibility with
> pages written to take advantage of older user agents, or if it is intended to
> be used with its new behavior going forward to unify HTML with various author
> requirements. If it's the former, then the behavior of browsers of the future
> is less critical. Also note that the current behavior, as I reported it, has
> been stable and predictable for over 10 years in these browsers - I usually
> went back to the very first versions. IE 2 was the only major exception to
> browser "families" in their whole existence, and that's not a browser that
> *anyone* is trying to be compatible with.

A part of the web authoring community, including myself, feels that the start and value attributes have erratically been removed from HTML 4 Strict. There are cases where numbering is important, for example if an ordered list is shortened, you still want the index to be the same.

Besides, all browsers already allow negative numbering, so lifting the restriction on value makes sense.

Browsers are stable here, because the issue simply isn't really a big deal.
Apparently interoperability wasn't necessary in the past.

> I think the table reflects that, yes? Most browsers only failed about three
> tests; just not the same ones :-(

Yes, sorry for repetitions.

> Do you have any idea what the actual usage of otherwise-valid integer values
> prepended with U+002B PLUS SIGN actually is? I would guess it would be
> extremely rare.

I'd guess the same.

> Really, I would imagine that the best idea of what to do, exactly, would come
> from considering the question: is this mostly for the benefit of legacy
> documents, or is there some reconsideration of where list values fit in the
> content/structure/presentation triumvirate?

See above :) There is some content value in li@value.

Comment 5 theimp 2011-10-27 04:40:16 UTC

> I admit, it's difficult to stay up to date in the current situation. But I think it's important to take into account even the latest changes.

Yes, I agree that testing betas is usually important. But what I meant was, no-one is writing documents based upon these new versions, yet. What I think we have to worry about is authors who have, over the last 15+ years, relied upon some particular behavior. We don't really have to worry as much about how far a given vendor is from implementing the spec. (unless they're going to refuse altogether).

It's a new spec.; we can do whatever we want. But vendors will not (properly) implement any spec. that messes too much with what users/authors expect (wherever those expectations have come from). Raising this bug was just an exercise in caution. I have no stake in this, and will be just as happy with WONTFIX, or even INVALID, as any other result.

> A part of the web authoring community, including myself, feels that the start and value attributes have erratically been removed from HTML 4 Strict. There are cases where numbering is important, for example if an ordered list is shortened, you still want the index to be the same.

I'm not here to debate correct usage. I would, with caveats that do not belong here, agree that sometimes list values are content. But that's neither here nor there (from my point of view).

The value/start/type attributes don't come close to the flexibility of the CSS Lists Module. I don't know how much there is to gain even by changing it from the HTML4 standard (that, looking at the table, no vendor ever quite implemented correctly anyway). But, again, I'm not saying that this should be done any particular way.

> Besides, all browsers already allow negative numbering, so lifting the restriction on value makes sense.

Only if you start at a negative number, and count in one direction, and don't skip any numbers until you get to the positives. But I guess that is, after all, by far the most likely use case for negative numbers.

> the issue simply isn't really a big deal.

Well, I guess I won't disagree.

> Apparently interoperability wasn't necessary in the past.

On the other hand, I would hope we want to do better in the future. Did it not matter because there was no use case, or because lack of use was caused by the impossibility of doing what one wanted? To find out, we'd have to look at every other structure (inline text, tables, etc.) and determine "would they likely have used the OL structure if it could render zero/negative numbers on all user agents?".

I am simply concerned that leaving room for user agents to do what they want in edge cases will cause problems later. Sure, this example is trivial, but the number of "trivial" cases that have been exploited to compound or evade other less trivial bugs is a large part of the history of the web, given that authors are always pushing the limits of what browsers can do. And worse, once it starts being (ab)used in such a way, it's extremely painful to specify exact behavior later because a massive number of pages depend upon it. What about a theoretical script that sniffs for the user agent by looking at how it handles limits/overflows?

Comment 6 Ian 'Hixie' Hickson 2011-11-01 17:15:21 UTC

> I wonder if "conforming limits" might be useful. Implementation limits are
> essential, because some kind of limit is unavoidable.

In general I've tried to avoid specifying such limits because over time, different limits become possible. e.g. what is a reasonable limit on a 32bit system is not on a 64bit system. Some UAs may get some benefit from using one or two of the higher-order bits for some internal state, making the ideal number for some browsers different than others. I think authors understand that when they push the limits, the results won't be the same everywhere; furthermore, as the "correct" behaviour at any particular number is clear, the risk of us eventually relying on a particular vendor's error handling for these cases is limited compared to other situations on the platform.


> Realistically, I can't
> think of a scenario where a list is practical beyond a few thousand entries, or
> is useful with starting numbers beyond a few million.

What about a list that talks about who owns what dollars of the US debt? One could easily imagine a list with a few list items with values in the trillions. Or a list where the values are distances from earth; one could then imagine a list with truly astronomical numbers if the units used are meters.


> Also, whatever the limit is - even if unspecified - the spec. should describe
> the behavior when exceeded.

The behaviour is described. It just keeps going. :-)


> I do wonder if it might be better to require either completely-conforming
> values or else discard the entire sequence. I don't really think that a value
> like "324fq!n6ireb" should return "324". How commonly is this relied upon by
> authors? Is this for some kind of DOM- (as opposed to source-) level safety
> catch for authors that set attributes with strings instead of integers?

I'm pretty sure we can't change this, for legacy compat reasons.


> Really, I would imagine that the best idea of what to do, exactly, would come from
> considering the question: is this mostly for the benefit of legacy documents,
> or is there some reconsideration of where list values fit in the
> content/structure/presentation triumvirate?

Both.


> Absolutely, although the question that I'm asking is: what is the spec trying
> to achieve, given that it's currently another standard to add to the half-dozen
> other de facto standards that can be seen now. Is it supposed to unify behavior
> going forward, or is it supposed to bridge behavior for those catching up? Or
> both? That will pretty much indicate what is better.

Both.

Comment 7 theimp 2011-11-02 09:52:37 UTC

> In general I've tried to avoid specifying such limits because over time, different limits become possible.

I was thinking limits like: 
"Vendors should use at least a 16-bit signed integer to store a list counter; this is an implementation requirement. Authors should avoid using numbers beyond the range -32768 to 32767, because they may not be counted predictably; if larger number ranges are needed, authors must test the values with all software that they require compatibility with."

The practical limit on the size of lists is unlikely to change much. It's just not likely to ever be realistic for humans to read lists with counts of two billion or so.

If software needs to manipulate such lists, then that's fine: that software can just make sure it uses as many bits as it needs (if authors are coding with specific software in mind, we're really not talking about general-purpose HTML anymore).

More significantly, though, how long will it be before HTML6? (I understand about HTML version numbers; I just mean, how long before the limit can be realistically revised?) 15 years? A given limit does not need to extend unto eternity. I believe that the current position is more-or-less that the spec. de jure is not even meaningful if all user agents decide to do something different; so it wouldn't even take a spec. update for authors to get the benefit of user agents reliably supporting other values. The spec. is just a recommended starting point to try to avoid the balkanization that was common when vendors decided to define their implementations unilaterally.

> e.g. what is a reasonable limit on a 32bit system is not on a 64bit system. Some UAs may get some benefit from using one or two of the higher-order bits for some internal state, making the ideal number for some browsers different than others.

Vendors are likely to use the same limit for the entire user agent family (rendering core), irrespective of the target hardware. Any user agent that can be compiled for both a 32-bit and 64-bit environment will likely use the exact same data structure. Opera's desktop limit, for example, might plausibly arise from limits due to the unified codebase for Opera Mobile/Opera Mini, etc.

Lynx has an arbitrary limit for negative numbers, probably related to Roman Numeral rendering concerns. This is a different kind of limit, but one that authors still need to know about (at least it's documented).

JavaScript (currently) has an arbitrary range. CSS does not, but for practical purposes an author would never notice, because the only values that can be so large will also be realistically unusable (currently). Where this is not true (for example, the CSS Lists Module), there is consideration for including a realistic minimum limit.

> I think authors understand that when they push the limits, the results won't be the same everywhere;

Does an author understand what's so special about the number 2147483647 that they get bizarre results when it's exceeded? Do they understand that not all browsers will behave the same at some limit or another (but they usually won't document where and how, so you'll have to test them all yourself, including past and future versions), and that they should consider another structure if it is critical that it renders identically?

If a major vendor used a four-bit counter, and claimed that this was okay because 99.999% of lists use only positive numbers and stop before 16, would this be an appropriate limit?

How does one assess conformance with an impossible limit? If it's okay to say "we use the HTML5 algorithm, with implementation-specific constraints", then why can't another say the exact same thing but explain: "But we use an unsigned integer, so while we follow the algorithm, logical constraints prevent negative numbers"? It's not that they're not following the spec.; it's that the limits of binary arithmetic don't allow negative numbers within the constraints that they have imposed. We've seen stranger claims.

In fact, you could also define an integer to allow only negative numbers. Or only even numbers. Or any arbitrary set of numbers. "Data is what you define it to be". If the spec. doesn't define it appropriately, that falls to the user agent developers, and they could define it as practically anything. Of course, they won't do anything silly; but the point is that it really shouldn't be up to anyone else, if authors are to get predictable results.

> furthermore, as the "correct" behaviour at any particular number is clear, the risk of us eventually relying on a particular vendor's error handling for these cases is limited compared to other situations on the platform.

The CSS Box Model was as clear as can be, but probably half of the DIV elements and lines of CSS ever written were to force a certain user agent to behave in a certain way.

> What about a list that talks about who owns what dollars of the US debt? One could easily imagine a list with a few list items with values in the trillions. Or a list where the values are distances from earth; one could then imagine a list with truly astronomical numbers if the units used are meters.

Have you ever seen such a list (formatted as a list)? I haven't.

I'm not exactly a semantic purist, but if I understand your examples, then it's an extremely bad way to use the OL structure. The list counters become content; the examples are data that belongs in a two-dimensional list (DL) or a table.

(Also ignoring that most such examples will require symbols such as $ or %, or will require fractions, or need markup such as ABBR, or otherwise be unrealistic with the spartan list structure in HTML.)

I will rephrase: "I can't think of a scenario where a list is practical beyond a few thousand entries, or is useful with starting numbers beyond a few million, when the counters do nothing other than count the entries (i.e. are not content)."

The realistic use cases are so few that I can't think of one (and that's not because I haven't tried).

If the counters are content, the "list" almost certainly belongs in a definition list or table. If the "counters" are presentational markers, then that's a job for CSS. If they're structure, the practical limits are what humans can use and not how many bits a developer can allocate.

This is why negative values, or multi-billion numeric counts, though perhaps useful in some extreme scenarios, are not really needed: counters should count, nothing more. OL allocates counters because it expresses the semantic difference from UL that the order of the list matters. It is not really for numbering *per se*, in a word-processing sense; that's mostly presentation, handled by CSS. It's not for organizing two-dimensional content; other structures, such as definition lists and tables, do that.

All that you truly need, structurally, is that entries that come before a given element are numbered smaller, and entries that come after a given element are numbered larger. Still, @value has much usefulness even if only from the point of view that it allows you to use CSS such as: 

li[value=1] { list-style-image: url("http://www.example.com/number1.png") } 

and this makes it reasonable to use the actual numeric values that are equivalent to your presentation, because this keeps the list consistent for accessible access, for example. So arbitrary numbering is not automatically bad, even when using CSS.
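
As a rough illustration of the numbering that @value feeds into, here is a minimal sketch. It assumes the simplified rule that an item with a usable value takes that number and every other item takes the previous number plus one. The function name and the list-of-optional-integers input are hypothetical, chosen for illustration only, and parsing of the attribute string is left out.

```python
from typing import List, Optional


def ordinal_values(values: List[Optional[int]], start: int = 1) -> List[int]:
    """Assign a number to each list item.

    `values` holds the already-parsed value attribute of each LI,
    or None when the item has no usable value attribute.
    """
    numbers = []
    current = start
    for value in values:
        if value is not None:
            # An explicit value overrides the running count.
            current = value
        numbers.append(current)
        # Items without a value continue from the previous item.
        current += 1
    return numbers


# Third item carries value=5, fifth item carries value=2.
print(ordinal_values([None, None, 5, None, 2, None]))  # [1, 2, 5, 6, 2, 3]
```

Note that nothing in this rule forces the numbers to keep increasing: an explicit value can jump backwards as easily as forwards.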

But even if you disagree with the content/structure/presentation argument, you'd have to agree that HTML, without CSS, cannot come close to what authors would like from a presentational point of view. You can't enter unnumbered entries, for example. Or compound subvalues (such as "2b" for the second child of the second child). Such use cases (while impossible due to IDL compatibility constraints) are plentiful and significant, while negative numbers and multi-billion numeric counts are features that almost no-one has ever asked for. Almost any use case is possible with CSS, but if you're relying on CSS then why even change the behavior from HTML4? Or even what most browsers implemented?

I can't see the point, but please understand that this is no reason to not allow something. I am not challenging the spec., just seeking to understand it so that possible issues are dealt with. Since you have answers that seem to satisfy you, to leave it unchanged, that's fine with me.

> The behaviour is described. It just keeps going. :-)

Behavior that is not currently, has not ever been, will not ever be, and cannot ever be, implemented by anyone. Instead, the substitute behavior varies (some cap, some overflow), kicks in at different limits, and also differs depending upon how the limit is reached.

Even if developers attempt to have no "hard" limits, and have the limits imposed at compile-time by the configuration, or at run-time by the amount of memory, these kinds of requirements make testing extremely difficult, and bugs easy to miss until they cause spectacular havoc. This could also affect authoring software and even automatic data processing software. Okay, so it's not the spec. author's duty to ensure that software does not contain bugs, but I wonder if pragmatism might be beneficial. Software bugs don't just affect software developers, after all.
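
To make the "some cap, some overflow" difference concrete, here is a minimal sketch of the two strategies. The signed 32-bit width is an assumption picked for illustration; it is not a claim about which limit any particular user agent uses, and the function names are hypothetical.

```python
# Assumed counter width for illustration only: signed 32-bit.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def clamp_int32(n: int) -> int:
    """Capping: out-of-range values stick at the nearest limit."""
    return max(INT32_MIN, min(INT32_MAX, n))


def wrap_int32(n: int) -> int:
    """Overflow: out-of-range values wrap around (two's complement)."""
    return (n - INT32_MIN) % 2**32 + INT32_MIN


for n in (2**31 - 1, 2**31, 2**32 + 7):
    print(n, clamp_int32(n), wrap_int32(n))
```

The same out-of-range input renders as the largest representable number under one strategy and as a negative or small number under the other, which is the kind of divergence an author cannot predict.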

I'll admit that I agree with the philosophy in principle, and that the empirical limits are already unlikely to actually be encountered in any useful application. So we can scratch this idea if you think it's better.

> I'm pretty sure we can't change this, for legacy compat reasons.

Well, I guess that if authors behave themselves, it's not a problem (big "if"!).

I admit, I looked hard and couldn't find an example or a likely reason, but I'm not going to challenge you on that.

***

Summary:

The initial positive sign symbol (U+002B PLUS SIGN):
Pros: Compatible with CSS numbering, allowing trivial machine transformations for cases where the @value attribute is presentational or where documents are converted.
Cons: Causes problems for a major, widely-deployed client that is very infrequently upgraded and which is typically used in environments (such as remote shell sessions, or in conjunction with accessible input/output hardware) where it cannot be upgraded directly by the user and where another browser cannot be used. Forbidden by previous HTML specs.

Negative numbers:
Pros: Occasionally useful. Compatible with CSS numbering.
Cons: Widely unsupported currently. Widely unsupported historically. Forbidden by previous HTML specs.

The value zero:
Pros: Occasionally useful. Compatible with CSS numbering. Permitted by previous HTML specs.
Cons: Only partially supported currently. Only partially supported historically.

Implementation limits:
Pros: Unavoidable in practice. Makes rendering more predictable for authors. Makes testing more practical for developers. Makes conformance more conclusive.
Cons: Previously not specified. Constrains usage artificially.


Basically every vendor has to fix at least one problem with their list implementation under the current spec, and no behavior can be defined that is compatible with everything unless it matches the one vendor that has not committed to changing anything (naming no names). That's not a great idea, because the behavior is both quirky and doesn't match the previous specs., so, on the balance of everything, it's probably best to emphasize future usefulness.

I'd make the initial positive sign symbol (U+002B PLUS SIGN) valid for authors to use. If it has to be supported anyhow, you lose nothing by allowing it, but gain complete CSS compatibility for all legal values. I see no reason to forbid it, and there's less fuss that way. I understand that it is unlikely to encounter such values in CSS, either. Recommend that it not be used, if that is a major concern.

If you think this is not a good idea, I presume that's because a major user agent is currently incompatible with it. But then, negative numbers cause much bigger problems, and they're in the new spec.

Comment 8 Ian 'Hixie' Hickson 2011-11-02 21:38:55 UTC

It would be helpful if you could split out your concerns into separate bugs, if you're going to post comments that big.

In brief:

> If a major vendor used a four-bit counter, and claimed that this was okay
> because 99.999% of lists use only positive numbers and stop before 16, would
> this be an appropriate limit?

Sure. Using that limit might mean that on your particular hardware platform, you are more competitive. Or, it might be that using such a limit makes you very _non_competitive and so you lose market share.


> The CSS Box Model was as clear as can be, but probably half of the DIV elements
> and lines of CSS ever written were to force a certain user agent to behave in a
> certain way.

The CSS box model is orders of magnitude more complicated than this case.


At the end of the day, I just don't believe that this limit matters in practice.


> I'd make the initial positive sign symbol (U+002B PLUS SIGN) valid for
> authors to use. If it has to be supported anyhow, you lose nothing by allowing
> it, but gain complete CSS compatibility for all legal values. I see no reason
> to forbid it, and there's less fuss that way. I understand that it is unlikely
> to encounter such values in CSS, either. Recommend that it not be used, if that
> is a major concern.

I don't really see any reason to allow it. It doesn't do anything useful. By making it not allowed we let authors know they can omit it.

Comment 9 theimp 2011-11-03 03:23:52 UTC

> Importance: normal

Sorry, it was only "minor" when I first created it, and it didn't look like anyone thought it was even a minor issue.

> It would be helpful if you could split out your concerns into separate bugs, if you're going to post comments that big.

Sorry, being so verbose is a bad habit of mine.

> I don't really see any reason to allow it. It doesn't do anything useful. By making it not allowed we let authors know they can omit it.

Well, authors don't seem to have used it much yet, even though it's been supported by every major browser, bar Lynx, for years. I don't think author confusion is a concern.

On the other hand, consider:

<style>
    ol { padding-left: 1em; }
    li { list-style-type: none; }
    li.five { counter-reset: five 5; }
    li.five:before { content: counter(five) "." "\000009"; }
</style>
...
<ol>
<li class="five" value="5">FIVE</li>
</ol>


If this was used:

    li.five { counter-reset: five +5; }

then it would still be valid CSS2.1, but using +5 in the value attribute would be invalid HTML, despite both values having the same meaning and the same result.

A Content Management System developer might allow users to input the value, and use just a simple test to ensure that it evaluates to a number in CSS. This might be stored in a database. When retrieved, the CMS might use the value for both the CSS (which would be valid) and the value attribute (say, to support clients that do not process stylesheets). This might even have already happened, except that support for outputting the value attribute has not been added yet (due to being deprecated); when it is added to the CMS at a later date, some pages will become invalid.
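
A minimal sketch of that mismatch, assuming the CSS2.1 integer syntax (an optional "+" or "-" followed by digits) and the HTML syntax as discussed in this bug (an optional "-" followed by digits). The pattern names and helper functions are hypothetical and stand in for whatever "simple test" such a CMS might use.

```python
import re

# Assumed grammars, per the discussion in this bug:
CSS_INTEGER = re.compile(r"[+-]?[0-9]+")   # CSS2.1 <integer>
HTML_INTEGER = re.compile(r"-?[0-9]+")     # valid integer for @value


def is_css_integer(text: str) -> bool:
    return CSS_INTEGER.fullmatch(text) is not None


def is_html_integer(text: str) -> bool:
    return HTML_INTEGER.fullmatch(text) is not None


def to_html_value(text: str) -> str:
    """Rewrite a CSS-valid integer so it is also valid for @value."""
    if not is_css_integer(text):
        raise ValueError(f"not a CSS integer: {text!r}")
    # Only the leading plus sign differs between the two syntaxes.
    return text[1:] if text.startswith("+") else text


stored = "+5"  # accepted by a CSS-only check and saved to the database
print(is_css_integer(stored), is_html_integer(stored), to_html_value(stored))
```

A CMS that only ran the first check would need a normalising step like `to_html_value` before it could start emitting the value attribute without producing invalid pages.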

While admittedly likely to be a minor case, it might have an impact on what has been done or might be done.

I, personally, don't see much reason to forbid it unless you were trying to simplify the attribute to just digits; but the negative sign is allowed (for probably no more benefit), so I would think it better to allow for consistency if nothing else.

***

Well, if you're satisfied that this is not worth changing the spec. for (even though it does not affect user agents because they already have to handle it, and users shouldn't be writing serious HTML5 pages at this early stage), then I don't really have much else to add. I wasn't necessarily looking to have anything changed when I reported the bug, anyhow.

Please accept my apologies, again.

Comment 10 Ian 'Hixie' Hickson 2011-11-04 17:07:04 UTC

I think there's a (minor) benefit to having the positive-only numbers only allow digits, and there's a(n even more minor) benefit to making the allowed syntax for signed numbers be just the positive-only syntax with an optional "-", but I agree it's a very minor issue in either case.

Anyway. Thank you for doing such detailed research, it is much appreciated! I don't think we need to make a change here, though it's mostly a judgement call. Part of it is I try to minimise churn, so unless there's a strong reason to make a change like this, I'd rather avoid doing it.


EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: see above