28148 – Improve parsing of numbers for coords attribute

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 28148 - Improve parsing of numbers for coords attribute

Summary: Improve parsing of numbers for coords attribute

Status:	RESOLVED MOVED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	PC Windows NT

Importance:	P1 normal
Target Milestone:	---
Assignee:	This bug has no owner yet - up for the taking
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:	whatwg-resolved
Keywords:

Depends on:
Blocks:

Reported:	2015-03-05 21:35 UTC by Travis Leithead [MSFT]
Modified:	2016-04-29 14:56 UTC
CC List:	7 users

See Also:


Description Travis Leithead [MSFT] 2015-03-05 21:35:59 UTC

For floating point arguments between 0 and 1 without a leading 0, the floating point value simply drops the decimal.  So for example, <area shape="rect" coords=".100, .100, 101, 101"> will result in a rect at "100, 100, 101, 101" rather than something more expected like "0, 0, 101, 101".
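To see how ".100" can turn into 100, here is a heavily simplified model of the integer-list rules, written for illustration rather than copied from the spec. It assumes only two things: characters before an entry's first digit are skipped, and any non-digit after a digit ends the number (the truncation mentioned later in this thread). Negative numbers, letters and entries without digits are deliberately left out, and the function name is hypothetical.

```javascript
// Simplified model (not the spec text) of how the HTML5 "rules for parsing
// a list of integers" treat a token such as ".100". Negative numbers,
// letters and entries without digits are deliberately left out.
function parseIntegerListModel(input) {
  const isSeparator = (c) => c === " " || c === "," || c === ";";
  const isDigit = (c) => c >= "0" && c <= "9";
  const numbers = [];
  let i = 0;
  while (i < input.length) {
    // Skip the separators between entries.
    while (i < input.length && isSeparator(input[i])) i++;
    if (i >= input.length) break;

    let value = 0;
    let gotNumber = false;
    let finished = false;
    // Consume one entry, up to the next separator.
    for (; i < input.length && !isSeparator(input[i]); i++) {
      const c = input[i];
      if (isDigit(c)) {
        if (!finished) {
          value = value * 10 + Number(c);
          gotNumber = true;
        }
      } else if (gotNumber) {
        // Punctuation after digits ("1.5"): ignore the rest, i.e. truncate.
        finished = true;
      }
      // Punctuation before any digit (the "." in ".100") is simply skipped.
    }
    if (gotNumber) numbers.push(value);
  }
  return numbers;
}

console.log(parseIntegerListModel(".100, .100, 101, 101")); // [ 100, 100, 101, 101 ]
```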

On the other hand, there's no reason why the coords attribute shouldn't accept a list of floating point values rather than integers, which could also circumvent the problem.

Comment 1 Boris Zbarsky 2015-03-06 02:30:24 UTC

> For floating point arguments between 0 and 1 without a leading 0, the floating
> point value simply drops the decimal.

Do UAs implement that?  Because I don't think Gecko does: we seem to chunk up the list on whitespace+commas and then just atoi the things in between, so .100 turns into 0.
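The Gecko behavior described here can be sketched the same way. This is a rough model that assumes atoi() behaves like JavaScript's parseInt() with a fallback to 0; the helper name is hypothetical, and Gecko's real handling of leading or trailing separators is not modeled (empty chunks are simply dropped).

```javascript
// Rough model of "chunk up the list on whitespace+commas, then atoi each
// chunk". parseInt() stands in for atoi(): both read an optional sign and
// leading digits, and stop at the first other character.
function geckoLikeCoords(input) {
  return input
    .split(/[\s,]+/)
    .filter((chunk) => chunk !== "") // simplification, see lead-in
    .map((chunk) => parseInt(chunk, 10) || 0); // atoi() yields 0 when no digits
}

console.log(geckoLikeCoords(".100, .100, 101, 101")); // [ 0, 0, 101, 101 ]
```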

> there's no reason why the coords attribute shouldn't accept a list of floating
> point values

Except for possible compat issues, right?  Again, would be good to know what actual UAs do here.

Comment 2 Simon Pieters 2015-03-06 08:16:24 UTC

The spec is intended to match IE exactly for the ASCII range. Presto implemented the spec.

It seems Blink and WebKit parse more like Gecko.

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3439

Also see https://lists.w3.org/Archives/Public/public-html/2009Jan/0086.html for earlier compat analysis.

I suppose we could change the spec to be more like Gecko/Blink/WebKit, but at least when I studied this for Presto's implementation it seemed like IE's parsing worked better for actual Web content.

Comment 3 Travis Leithead [MSFT] 2015-03-27 20:43:22 UTC

Only IE has this behavior (and is thus compliant with the wording in the spec), test page: http://jsfiddle.net/6mr5s457/

Chrome/FF’s behavior is noncompliant on two accounts. One, they don’t implement the algorithm as specified or they’d have the same behavior as IE. Two, they are accepting floating point values even though the spec says to use integer parsing.

Even assuming we change the spec to allow coords to accept floating point values (which we should), it’s strange that Chrome/FF don’t do any normalization in this parsing – e.g. ".100" stays as such, and isn’t normalized to ".1" or "0.1".

So I recommend a few things:
1. Fix the integer list parsing algorithm so it doesn’t just drop leading decimals (instead these should become 0, for both positive and negative, in order to match the truncating behavior that other floating point numbers get in this algorithm):
http://www.w3.org/TR/html5/infrastructure.html#rules-for-parsing-a-list-of-integers

2. Rephrase coords to allow a valid list of floating point numbers. I don’t think there’s an existing algorithm for this (there’s an algorithm for floating point numbers, and a list of integers, but not a list of floating point numbers):
http://www.w3.org/TR/html5/embedded-content-0.html#attr-area-coords

3. I didn’t look very hard at the floating point number parsing algorithm, but if it doesn’t specify normalization (e.g. trim trailing 0’s and enforce a digit before the decimal) that might be worthwhile to discuss.
http://www.w3.org/TR/html5/infrastructure.html#rules-for-parsing-floating-point-number-values

4. There’s a lot of ambiguity around coords whether the requirements are for the UA or the author. Would be good to clarify so these are actually testable.
http://www.w3.org/TR/html5/embedded-content-0.html#attr-area-shape-circle
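Recommendation 1 can be illustrated for a single token. The helper below is hypothetical and only models the proposed outcome: a value with a leading decimal point truncates to 0 whatever its sign, just as "1.5" truncates to 1. It does not attempt the rest of the list-parsing algorithm, and returning 0 for a token with no number at all is an assumption of this sketch.

```javascript
// Hypothetical per-token helper for recommendation 1 (not spec text):
// read the token's leading number, including a fraction that starts with
// ".", and truncate toward zero.
function truncatingInteger(token) {
  const match = /^-?(\d+(\.\d*)?|\.\d+)/.exec(token.trim());
  if (match === null) return 0; // assumption: no number means 0
  // Math.trunc drops the fraction; "|| 0" turns -0 into 0.
  return Math.trunc(Number(match[0])) || 0;
}

console.log([".100", "-.100", "1.5", "-1.5", "101"].map(truncatingInteger));
// [ 0, 0, 1, -1, 101 ]
```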

Comment 4 Simon Pieters 2015-03-30 20:37:05 UTC

(In reply to Travis Leithead [MSFT] from comment #3)
> Only IE has this behavior (and is thus compliant with the wording in the
> spec), test page: http://jsfiddle.net/6mr5s457/
> 
> Chrome/FF’s behavior is noncompliant on two accounts.  One, they don’t
> implement the algorithm as specified or they’d have the same behavior as IE.
> Two, they are accepting floating point values even though the spec says to
> use integer parsing.
> 
> Even assuming we change the spec to allow coords to accept floating point
> values (which we should), it’s strange that Chrome/FF don’t do any
> normalization in this parsing – e.g. ".100" stays as such, and isn’t
> normalized to ".1" or "0.1".

Normalized where? The parsed value is not serialized anywhere per spec. .coords just reflects as a plain DOMString, i.e. same as getAttribute.

> So I recommend a few things:
> 1. Fix the integer list parsing algorithm so it doesn’t just drop leading
> decimals (instead these should become 0, for both positive and negative, in
> order to match the truncating behavior that other floating point numbers get
> in this algorithm):
> http://www.w3.org/TR/html5/infrastructure.html#rules-for-parsing-a-list-of-
> integers

OK.

> 2. Rephrase coords to allow a valid list of floating point numbers.  I don’t
> think there’s an existing algorithm for this (there’s an algorithm for
> floating point numbers, and a list of integers, but not a list of floating
> point numbers):
> http://www.w3.org/TR/html5/embedded-content-0.html#attr-area-coords

Should parsing of this list try to be compatible with IE's coords parsing (modulo the necessary differences to support floats), or Gecko/Blink? (I guess we could re-check the web compat situation.)

> 3. I didn’t look very hard at the floating point number parsing algorithm,
> but if it doesn’t specify normalization (e.g. trim trailing 0’s and enforce
> a digit before the decimal) that might be worthwhile to discuss.
> http://www.w3.org/TR/html5/infrastructure.html#rules-for-parsing-floating-
> point-number-values

Again normalization implies serialization...

> 4. There’s a lot of ambiguity around coords whether the requirements are for
> the UA or the author.  Would be good to clarify so these are actually
> testable.
> http://www.w3.org/TR/html5/embedded-content-0.html#attr-area-shape-circle

I don't see what's ambiguous? See https://html.spec.whatwg.org/multipage/infrastructure.html#conformance-classes

Comment 5 Michael[tm] Smith 2015-06-16 10:25:14 UTC

Making this a higher priority so we actively seek more feedback from implementers and webdevs.

Comment 6 Michael[tm] Smith 2015-06-16 10:26:22 UTC

Noting that this is blocked on waiting for more info from Travis.

Comment 7 Simon Pieters 2015-12-09 14:01:29 UTC

Travis, what does Edge do here now?

Comment 8 Travis Leithead [MSFT] 2015-12-09 18:35:55 UTC

(In reply to Simon Pieters from comment #7)
> Travis, what does Edge do here now?

Just tested in my latest Edge build, and its behavior is unchanged from comment #1. I'm not a fan of the behavior. Seems like Edge should change to match Gecko/Chrome.

Comment 9 Simon Pieters 2016-01-13 22:58:26 UTC

Blink's coords parsing is at https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/platform/Length.cpp&sq=package:chromium&type=cs&l=71&rcl=1452688339

It seems to replace all characters other than [0-9\.-] with spaces and then split on spaces, so e.g. coords="0x0x,x,10 10" is parsed the same as coords="0,0,10,10".
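The Blink behavior just described can be approximated as follows. This is a sketch, not Blink's code: parseFloat() is an assumption standing in for whatever per-piece conversion Blink actually performs, and the helper name is hypothetical.

```javascript
// Rough model: every character other than a digit, "." or "-" becomes a
// space, then the string is split on spaces. parseFloat() approximates the
// per-piece number conversion; odd pieces such as "1-2" are not modeled.
function blinkLikeCoords(input) {
  return input
    .replace(/[^0-9.-]/g, " ")
    .split(" ")
    .filter((piece) => piece !== "") // runs of spaces leave empty pieces
    .map((piece) => parseFloat(piece) || 0);
}

console.log(blinkLikeCoords("0x0x,x,10 10")); // [ 0, 0, 10, 10 ]
console.log(blinkLikeCoords("0,0,10,10"));    // [ 0, 0, 10, 10 ]
```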

Gecko's coords parsing is described in https://lists.w3.org/Archives/Public/public-html/2009Jan/0086.html . Jonas notes that Gecko has some bugs filed, e.g. for not ignoring a leading comma.

In webdevdata I found some interesting cases, in particular this one from babyneo.de:

<map name='header1'>
  <area shape="rect" alt="Zur Startseite" coords="='69,8,153,86' " href="http://www.babyneo.de/" title="Zur Startseite" >
  <area shape="rect" alt="Zur Startseite" coords="='5,85,223,140' " href="http://www.babyneo.de/" title="Zur Startseite" >
  <area shape="rect" alt="Warenkorb anzeigen" coords="='824,49,950,74'  " href="http://www.babyneo.de/shopping_cart.php" title="Warenkorb anzeigen" >
</map>

In Gecko the first value of each gets parsed as 0, but IE/Blink/WebKit/Presto parse it as the given number, ignoring the leading garbage, which appears to more closely match what was intended.
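The difference on the first babyneo.de value can be reproduced with two small illustrative helpers (hypothetical names, not browser source): an atoi()-style read, modeled with parseInt(), and a read that first skips leading garbage. Splitting the value on commas is a simplification for this one input.

```javascript
// atoi()-style: read an optional sign and digits, give up at anything else.
const atoiLike = (token) => parseInt(token, 10) || 0;
// Skip characters before the first digit, "." or "-", then read a number.
const skipLeadingGarbage = (token) =>
  parseFloat(token.replace(/^[^0-9.-]+/, "")) || 0;

// Attribute value from the babyneo.de markup: ='69,8,153,86' plus a space.
const tokens = "='69,8,153,86' ".trim().split(",");
console.log(tokens.map(atoiLike));           // [ 0, 8, 153, 86 ]
console.log(tokens.map(skipLeadingGarbage)); // [ 69, 8, 153, 86 ]
```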

I didn't see any pages using ; in coords, but I did see some with consecutive commas or leading/trailing garbage.

I implemented the spec's algorithm in JS, wrote a new, different implementation [1], compared their output on real-world coords values [2], and checked the results against some of the actual pages to see what would be an improvement in terms of web compat. I believe a mix of what all browsers do gives the best results, and I ended up with the following:

function newCoords(input) {
  var numbers = [];
  // split
  var tokens = input.split(/[\s,]+/);
  // for each token in tokens
  for (var i = 0; i < tokens.length; ++i) {
    var token = tokens[i];
    // replace garbage with spaces
    token = token.replace(/[^\d\.-]/g, ' ');
    // parse as float (parseFloat skips the leading spaces); add to numbers
    numbers.push(parseFloat(token) || 0);
  }
  // return numbers
  return numbers;
}

[1] https://gist.github.com/zcorpan/baa697e081a3e1aa5da0
[2] https://gist.github.com/zcorpan/37050c2b556c5d0b5b88

Comment 10 Simon Pieters 2016-01-14 08:05:23 UTC

Forgot about ignoring leading commas:

function newCoords(input) {
  var numbers = [];
  // trim leading separators
  input = input.replace(/^[\s,]+/, '');
  // split
  var tokens = input.split(/[\s,]+/);
  // for each token in tokens
  for (var i = 0; i < tokens.length; ++i) {
    var token = tokens[i];
    // replace garbage with spaces
    token = token.replace(/[^\d\.-]/g, ' ');
    // parse as float (parseFloat skips the leading spaces); add to numbers
    numbers.push(parseFloat(token) || 0);
  }
  // return numbers
  return numbers;
}

\s here should be HTML's "space characters".

I could go either way with ; as separator.

Maybe also support scientific notation? It would be easy to trim only leading garbage in each token, and let HTML's float parser deal with trailing garbage.
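Putting the three remarks above together, a variant of newCoords might look like the sketch below. It is an illustration under stated assumptions, not the algorithm from the pull request: the separator set uses HTML's space characters (U+0020, U+0009, U+000A, U+000C, U+000D) plus ",", with ";" behind a hypothetical option; only leading garbage is stripped from each token; and parseFloat() stands in for HTML's floating-point parser, which it resembles but does not match exactly.

```javascript
// Hypothetical variant combining the remarks above; not the spec algorithm.
function coordsVariant(input, { semicolonIsSeparator = false } = {}) {
  // HTML's space characters plus ",", and optionally ";".
  // Unlike JavaScript's \s, this set does not include U+00A0 NO-BREAK SPACE.
  const separators = semicolonIsSeparator
    ? /[ \t\n\f\r,;]+/
    : /[ \t\n\f\r,]+/;
  return input
    .split(separators)
    .filter((token) => token !== "") // drop empty leading/trailing tokens
    .map((token) => {
      // Strip only leading garbage; parseFloat() ignores trailing garbage
      // and accepts exponents such as "1e2".
      const number = parseFloat(token.replace(/^[^0-9.-]+/, ""));
      // NaN and parseFloat's "-Infinity" fall back to 0.
      return Number.isFinite(number) ? number : 0;
    });
}

console.log(coordsVariant("1e2, .5 ='69"));                        // [ 100, 0.5, 69 ]
console.log(coordsVariant("1;2"));                                  // [ 1 ]
console.log(coordsVariant("1;2", { semicolonIsSeparator: true })); // [ 1, 2 ]
console.log(coordsVariant("1\u00A02"));                             // [ 1 ]
```

Because empty tokens are filtered out, a trailing separator adds nothing here, whereas newCoords above would append a 0 for it.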

Comment 11 Simon Pieters 2016-01-14 09:28:38 UTC

https://github.com/whatwg/html/pull/514

Comment 12 Simon Pieters 2016-01-15 14:01:08 UTC

Travis, I filed https://connect.microsoft.com/IE/feedbackdetail/view/2245506/implement-area-coords-parsing-per-spec FYI

Comment 13 Arron Eicholz 2016-04-27 18:25:13 UTC

HTML5.1 Bugzilla Bug Triage: Fixed in the latest draft

If this resolution is not satisfactory, please copy the relevant bug details/proposal into a new issue at the W3C HTML5 Issue tracker: https://github.com/w3c/html/issues/new where it will be re-triaged. Thanks!

Comment 14 Simon Pieters 2016-04-29 13:20:54 UTC

Can you point to the commit?

Comment 15 Travis Leithead [MSFT] 2016-04-29 14:56:35 UTC

Apologies--double checked, and this does **not** appear to be in the latest draft. Filed: https://github.com/w3c/html/issues/319 to track the integration.

Changed resolution to MOVED.