RE: [widgets] Potential bug in Rule for Identifying the Media Type of a File

A few corrections to the grammar:
- removed the dot from the safe-char-no-dot rule
- allowed a file name to end with a dot (file-extension rule) [Is an empty extension a correct file extension?]

file-name                 = file-name-with-extension / file-name-no-extension

file-name-with-extension  = base-name file-extension

base-name                 = *allowed-char

file-extension            = "." [1*allowed-char-no-dot]

allowed-char-no-dot       = safe-char-no-dot / utf8-char

safe-char-no-dot          = ALPHA / DIGIT / SP / "$" / "%"
                           / "'" / "-" / "_" / "@"
                           / "~" / "(" / ")" / "&" / "+"
                           / "," / "=" / "[" / "]"

file-name-no-extension    = base-name-no-ext

base-name-no-ext          = 1*allowed-char-no-dot
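
To sanity-check the corrected grammar, it can be transcribed into regular expressions. The following is only a rough sketch, not part of P&C: the name split_file_name is made up for illustration, utf8-char is approximated as "any non-ASCII character", and allowed-char is assumed to be allowed-char-no-dot plus ".".

```python
import re

# safe-char-no-dot from the grammar above, plus an approximation of utf8-char.
# Assumption: utf8-char is treated as any non-ASCII character.
NO_DOT = r"(?:[A-Za-z0-9 $%'\-_@~()&+,=\[\]]|[^\x00-\x7F])"

# Assumption: allowed-char is allowed-char-no-dot plus ".".
ALLOWED = rf"(?:{NO_DOT}|\.)"

# file-name-with-extension = base-name file-extension
# file-extension           = "." [1*allowed-char-no-dot]
WITH_EXT = re.compile(rf"(?P<base>{ALLOWED}*)(?P<ext>\.{NO_DOT}*)")

# file-name-no-extension = 1*allowed-char-no-dot
NO_EXT = re.compile(rf"{NO_DOT}+")


def split_file_name(name):
    """Return (base_name, file_extension), or None if the name does not match.

    file_extension includes the leading "." and is None when absent.
    """
    m = WITH_EXT.fullmatch(name)
    if m:
        return m.group("base"), m.group("ext")
    if NO_EXT.fullmatch(name):
        return name, None
    return None
```

Because the extension cannot contain a dot, the split always happens at the last dot: ".jpg" has an empty base-name and the extension ".jpg", while "a.b.c" has the base-name "a.b" and the extension ".c".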


________________________________________
From: public-webapps-request@w3.org [public-webapps-request@w3.org] On Behalf Of Marcin Hanclik [Marcin.Hanclik@access-company.com]
Sent: Tuesday, September 29, 2009 7:15 PM
To: marcosc@opera.com; public-webapps
Subject: RE: [widgets] Potential bug in Rule for Identifying the Media Type of a File

Hi Marcos,

Good spot!

>>2. If file has a file-extension, attempt to match the file-extension
>>to one in the file extensions column in the file identification table.
>>If there is a match, then return the media type value. (returns
>>"image/jpeg")
I think file-extension would not be matched here; only base-name would.

I do not think the grammar is ambiguous about which rules would be matched.
The problem is that, at present, .jpg would have no file extension at all.
A greedy parser would match only base-name and leave file-extension empty, since it is optional.
So we need to modify the grammar to specify clearly what the extension is.
The current grammar has a second problem: "." is also allowed inside file-extension, as part of allowed-char.
A parser may therefore be unable to tell which dot is the "." from the file-extension rule (I am not sure whether such a parser can be developed at all).
So file-extension itself is underspecified. I assume that file extensions do not contain dots and that the dot is only the delimiter.
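
To illustrate the greedy-parser point, here is a small sketch. It assumes the current rule has roughly the shape file-name = base-name ["." file-extension], with "." among the allowed characters; that shape is inferred from this thread, not quoted from the spec. The names greedy_parse and is_allowed are hypothetical, and utf8-char is approximated as any non-ASCII character.

```python
# Assumed allowed-char set for the current grammar; note that "." is included.
SAFE_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " $%'-_@~()&+,.=[]"
)


def is_allowed(ch):
    # Assumption: utf8-char is approximated as any non-ASCII character.
    return ch in SAFE_CHARS or ord(ch) > 0x7F


def greedy_parse(name):
    """Match base-name greedily, then the optional extension, without backtracking.

    Returns (base_name, file_extension) or None if the name does not match.
    """
    i = 0
    while i < len(name) and is_allowed(name[i]):
        i += 1
    base, rest = name[:i], name[i:]

    ext = None
    # The optional part needs a "." here, but every "." was already
    # consumed as part of base-name, so this branch is never taken.
    if rest.startswith("."):
        j = 1
        while j < len(rest) and is_allowed(rest[j]):
            j += 1
        ext, rest = rest[1:j], rest[j:]

    if not base or rest:
        return None
    return base, ext
```

Under these assumptions both ".jpg" and "photo.jpg" end up entirely in base-name, with no file extension at all.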

What about modifying the ABNF to:

file-name                 = file-name-with-extension / file-name-no-extension

file-name-with-extension  = base-name file-extension

base-name                 = *allowed-char

file-extension            = "." 1*allowed-char-no-dot

allowed-char-no-dot       = safe-char-no-dot / utf8-char

safe-char-no-dot          = ALPHA / DIGIT / SP / "$" / "%"
                           / "'" / "-" / "_" / "@"
                           / "~" / "(" / ")" / "&" / "+"
                           / "," / "." / "=" / "[" / "]"

file-name-no-extension    = base-name-no-ext

base-name-no-ext          = 1*allowed-char-no-dot

This would make the base-name optional.
.jpg is a valid file name, particularly on Linux platforms.
With this change, .jpg would have only a file extension, and the prose of P&C would probably not need to be changed.

Thanks,
Marcin

Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

-----Original Message-----
From: public-webapps-request@w3.org [mailto:public-webapps-request@w3.org] On Behalf Of Marcos Caceres
Sent: Tuesday, September 29, 2009 4:51 PM
To: public-webapps
Subject: [widgets] Potential bug in Rule for Identifying the Media Type of a File

Hi, I think I found another bug :(

The current ABNF for a zip relative path allows the first character of
a file name to be a ".".

So, imagine you have a file in the zip archive called ".jpg" which is
actually a text file.

In the Rule for Identifying the Media Type of a File, it reads:

1. Let file be the file to be processed. (in this case, ".jpg")
2. If file has a file-extension, attempt to match the file-extension
to one in the file extensions column in the file identification table.
If there is a match, then return the media type value. (returns
"image/jpeg")
3. If file extension is absent, the media type of a file is determined
by using the rules set forth in the [SNIFF] specification.

So, the rule has incorrectly matched the type and returns "image/jpeg".

Options:

1. Disallow "." in the base-name of a file (this means that files
named "a...b...c." will be ignored, and so will any file starting with
a ".", such as ".foobar").
2. Modify 2 above to say:
  " If file has a file-extension and a base-name, ... "
  And modify 3, to say "Otherwise, the media type of a file is
determined by using the rules set forth in the [SNIFF] specification."
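
The difference between the rule as quoted and option 2 can be sketched as follows. This is an illustration only: the table is a small made-up subset of the file identification table, sniff is a hypothetical stand-in for the [SNIFF] rules, and splitting at the last "." is an assumption about how an implementation would find the extension.

```python
# Illustrative subset of the file identification table (not the full table).
FILE_IDENTIFICATION_TABLE = {
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".html": "text/html",
}

SNIFFED = "<determined by SNIFF>"


def sniff(name):
    """Hypothetical stand-in for the rules in the [SNIFF] specification."""
    return SNIFFED


def split_at_last_dot(name):
    # Assumption: the extension is the last "." and everything after it.
    dot = name.rfind(".")
    if dot == -1:
        return name, ""
    return name[:dot], name[dot:]


def media_type_current(name):
    """Steps 2 and 3 as quoted above."""
    _base, ext = split_at_last_dot(name)
    if ext:
        # The quoted steps do not say what happens when an extension is
        # present but unmatched; this sketch returns None in that case.
        return FILE_IDENTIFICATION_TABLE.get(ext)
    return sniff(name)


def media_type_option2(name):
    """Option 2: require both a base-name and a file-extension, otherwise sniff."""
    base, ext = split_at_last_dot(name)
    if ext and base:
        match = FILE_IDENTIFICATION_TABLE.get(ext)
        if match is not None:
            return match
    return sniff(name)
```

With this sketch, media_type_current(".jpg") returns "image/jpeg", which is the bug described above, while media_type_option2(".jpg") falls through to sniffing.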

However, because of the ambiguity caused by allowing "." in base
names, it is basically not possible to determine if the "file
extension" of a file is in fact a file extension or a base name.

Unsure how to proceed as it is likely that ".filename" type files will
end up in widget packages.... it might be safe for user agents to
ignore those files.

--
Marcos Caceres
http://datadriven.com.au


________________________________________

Access Systems Germany GmbH
Essener Strasse 5  |  D-46047 Oberhausen
HRB 13548 Amtsgericht Duisburg
Geschaeftsfuehrer: Michel Piquemal, Tomonori Watanabe, Yusuke Kanda

www.access-company.com

CONFIDENTIALITY NOTICE
This e-mail and any attachments hereto may contain information that is privileged or confidential, and is intended for use only by the
individual or entity to which it is addressed. Any disclosure, copying or distribution of the information by anyone else is strictly prohibited.
If you have received this document in error, please notify us promptly by responding to this e-mail. Thank you.

Received on Tuesday, 29 September 2009 19:10:09 UTC