This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24769 - SVG Path BNF is ambigious
Summary: SVG Path BNF is ambigious
Status: NEW
Alias: None
Product: SVG
Classification: Unclassified
Component: Paths (show other bugs)
Version: SVG 2.0
Hardware: All All
: P2 normal
Target Milestone: Test Suite
Assignee: Doug Schepers
QA Contact: SVG Public List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-21 19:40 UTC by Joe Gregorio
Modified: 2014-02-24 18:21 UTC (History)
1 user (show)

See Also:


Attachments

Description Joe Gregorio 2014-02-21 19:40:00 UTC
http://www.w3.org/TR/SVG2/single-page.html#paths-PathDataBNF

There is no description in the text on how this BNF is to be used. For example, are the rules for the BNF alternatives to be parsed as 'first match wins'?

If so, then the following rules won't produce good results, as the may match part of a floating point number as an integer before matching the floating-point-constant rule:

nonnegative-number:
    integer-constant
    | floating-point-constant
number:
    sign? integer-constant
    | sign? floating-point-constant

If the BNF is intended to be 'first match wins' then these rules should be written as:

nonnegative-number:
    floating-point-constant
    | integer-constant
number:
    sign? floating-point-constant
    | sign? integer-constant

This is also true of SVG 1.1 and 1.2 Tiny.
Comment 1 Joe Gregorio 2014-02-24 18:21:54 UTC
OK, so I think part of the problem here is my use of the ambiguous terms
first-match vs longest-match. What we are really seeing here is a difference
between a PEG (parsing expression grammar) and a CFG (context free grammar).

http://en.wikipedia.org/wiki/Parsing_expression_grammar
http://en.wikipedia.org/wiki/Context-free_grammar
http://en.wikipedia.org/wiki/Comparison_of_parser_generators

The BNF supplied is a valid CFG, parsers built for CFGs do backtracking, which
is one way to handle the issue around the ambiguity in the "nonnegative-number" rule. The other way to handle it is to rearrange the grammar so that it is non-ambiguous.

From the page http://en.wikipedia.org/wiki/Parsing_expression_grammar:

"""Syntactically, PEGs also look similar to context-free grammars (CFGs), 
 but they have a different interpretation: the choice operator selects 
 the first match in PEG, while it is ambiguous in CFG."""

This isn't a theoretical problem, I am building a parser for SVG paths and as a start I copied the supplied BNF into a parser generator, unfortunately the parser generator I am using is a PEG and not a CFG parser generator:

  https://github.com/dmajda/pegjs#expression1--expression2----expressionn

I am apparently not the only one to get tripped up by this, as this parser
also has the same problem:

  https://www.npmjs.org/package/svg-path-parser

If you change the following line in their example:

    , 'A25,35 -80 1,1 450,220 Z'

to:

    , 'A25.0,35 -80 1,1 450,220 Z'

then that parser will crash on what should be a valid path, since the 25 gets 
parsed as an"integer-constant" and leaves .0 on the input, which fails to parse as anything in the grammar.

In this case I think the best solution is to rearrange the grammar slightly as suggested, as it makes it valid for both PEG and CFG. The alternative would be to add a sentence to the spec that the BNF
is a CFG and not to be used in a PEG.