progress on BNF conversion

Hi,

in the spirit of making small incremental steps, I'd like to propose 
that we first fix the BNF so that it actually becomes parseable with a 
parser written for the 822/2616 BNF format.

The current issues I'm aware of are:

1) missing whitespace, such as in

    Accept-Charset = "Accept-Charset" ":"
            1#( ( charset | "*" )[ ";" "q" "=" qvalue ] )

which should be

    Accept-Charset = "Accept-Charset" ":"
            1#( ( charset | "*" ) [ ";" "q" "=" qvalue ] )

2) multi-line prose values, such as

     field-content  = <the OCTETs making up the field-value
                      and consisting of either *TEXT or combinations
                      of token, separators, and quoted-string>

3) prose values containing prose delimiters, such as

     qdtext         = <any TEXT except <">>

4) illegal characters in rule names, such as in

     http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

5) duplicate rule names (which are case-insensitive), such as with

    trailer        = *(entity-header CRLF)
    Trailer  = "Trailer" ":" 1#field-name

6) attempts to do something in BNF that just does not work :-):

     chunk          = chunk-size [ chunk-extension ] CRLF
                      chunk-data CRLF
     chunk-size     = 1*HEX
     last-chunk     = 1*("0") [ chunk-extension ] CRLF

     chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
     chunk-ext-name = token
     chunk-ext-val  = token | quoted-string
     chunk-data     = chunk-size(OCTET)

(note the chunk-data rule in the last line).

7) editorial nit: Bill's ABNF parser (BAP) prefers all rule names to be 
indented by the same amount; I think this is just a matter of editorial 
quality, and we should simply make the indentation consistent.
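
As a rough illustration of how issues 4 and 5 could be detected mechanically, 
here is a minimal Python sketch. It is not BAP and not an existing tool; the 
names RULE_DEF and lint_rules are hypothetical, and the regular expression 
assumes that a rule definition is a line of the form "name = ...". It only 
flags underscores in rule names and names that collide when compared 
case-insensitively.

```python
import re
from collections import defaultdict

# Assumption: a rule definition is a line that starts (after optional
# indentation) with a name followed by "=". Continuation lines do not match.
RULE_DEF = re.compile(r"^(\s*)([A-Za-z][A-Za-z0-9_-]*)\s*=")

def lint_rules(text):
    """Return (illegal_names, duplicates) for the BNF text given."""
    seen = defaultdict(list)   # lower-cased name -> spellings encountered
    illegal = []               # rule names containing "_"
    for line in text.splitlines():
        m = RULE_DEF.match(line)
        if not m:
            continue
        name = m.group(2)
        if "_" in name:
            illegal.append(name)
        seen[name.lower()].append(name)
    # Rule names are case-insensitive, so more than one spelling is a clash.
    duplicates = [names for names in seen.values() if len(names) > 1]
    return illegal, duplicates
```

Run against the http_URL and trailer/Trailer examples above, this should 
report http_URL as an illegal name and trailer/Trailer as a duplicate pair.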

Here are my proposals to fix the individual issues:

1) just fix it.

2) try to get rid of prose values; where that is not possible, replace 
them with a shorter one and add the remaining text as a BNF comment.

3) Use DQUOTE instead of <">

4) Replace "_" with "-". Where we currently import rules from other, 
older specs, we can write this as:

    abs-path = <abs_path defined in ...>

5) Keep the canonical rule names for the header productions and rename 
the others.

6) just fix it.

7) fix the indentation.
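
Proposals 3 and 4 are mechanical enough that they could be sketched as a 
small rewrite pass. The following Python is only an illustration under 
stated assumptions: normalize_line and PROTECTED are hypothetical names, 
the input is processed one line at a time, and quoted literals and prose 
values are assumed not to span lines. Prose values are deliberately left 
untouched so that an import such as <abs_path defined in ...> keeps the 
original name.

```python
import re

# Quoted literals and <prose values> must not be rewritten.
PROTECTED = re.compile(r'("[^"]*"|<[^>]*>)')

def normalize_line(line):
    """Apply proposals 3 and 4 to a single line of BNF."""
    # Proposal 3: use DQUOTE instead of the nested prose delimiter <">.
    line = line.replace('<">', 'DQUOTE')
    # With one capturing group, re.split returns the protected spans at
    # odd indices and the text between them at even indices.
    parts = PROTECTED.split(line)
    for i in range(0, len(parts), 2):
        # Proposal 4: replace "_" with "-" outside protected spans.
        parts[i] = parts[i].replace("_", "-")
    return "".join(parts)
```

Applied to the http_URL line above, this renames http_URL and abs_path to 
http-URL and abs-path while leaving the quoted literals alone.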

(Would it make sense to open a separate issue for this collection of 
problems?)

Best regards, Julian

Received on Tuesday, 13 November 2007 15:35:12 UTC