Re: Review of 22 Jan editor's draft from Thomas Roessler on 2008-01-29 (public-appformats@w3.org from January 2008)

From: Thomas Roessler <tlr@w3.org>
Date: Wed, 30 Jan 2008 00:46:21 +0100
To: Anne van Kesteren <annevk@opera.com>
Cc: "WAF WG (public)" <public-appformats@w3.org>
Message-ID: <20080129234621.GH3549@iCoaster.does-not-exist.org>
On 2008-01-27 15:38:48 +0100, Anne van Kesteren wrote:

> Many thanks for your comments! Some inline responses below with
> questions for you and the rest of the WG...

I'll mostly respond to your questions in this message.  I have not yet
reviewed the updated editor's draft.

>> - There's a lot of very detailed stuff about white space
>>   separated lists going on in section 2; I'd rather see this
>>   dropped, and grammars and useful language used closer to the
>>   parsing steps.

> I rather not try to change this at this point at the risk of
> introducing errors.

I disagree.  The over-precise description that's currently in there
probably has a *greater* risk of leading to errors than a more
abstract specification style.

>> - The security considerations should ideally be a discussion of
>>   security effects, i.e., "can trigger GET, and here's why this
>>   is harmless"; "we care about POST, because".  Instead,
>>   there's a lot of normative material clumped together in this
>>   section that would better go to the places where actual
>>   processing is described.

> I don't really see how. The actual security implications of the
> processing defined by this specification is defined in the
> processing sections already.

The clauses that were - in the 22 Jan version - listed in the bullet
points under 3 are, to my knowledge, not repeated in the processing
model.

Please don't mix the processing model into the security
considerations.

>> - "Authors sharing content with domains that are on shared hosting
>>   environments" rather misses the point by just talking about ports:
>>   Namely, that -- because we assume that the protocol / technology
>>   that hosts the access-control framework uses a same-origin policy
>>   -- authorizations can only be given with a granularity of origins.
>>   Anything below that is futile.
>
> Origin includes ports (and schemes).

We're talking past each other here.  You are calling out a specific
way in which policy authors may extend authorizations to more
origins than intended.  I was concerned about a section that should
explain the limitations of what an origin is.

>> - "Authors are encouraged to check the Referer-Root HTTP header" --
>>   this should be somewhere in the processing model, not a side
>>   remark in the security considerations.  It *is* an additional
>>   policy enforcement point, and should be called out clearly.

> The processing model is not for authors.

Policy authors *must* be able to understand the processing model --
otherwise, they won't understand the effects that their policies
have, or how to deploy them.

(However, your statement explains the current style, and is rather
worrisome.)

>> - The design seems a bit inconsistent about IDNs: The syntax permits
>>   them, but HTTP doesn'tl the latter is called out in a note.  I'd
>>   rather see that done consistently.  When speaking about IDNs, it
>>   might be useful to adapt the A-label and U-label terminology from
>>   this I-D:
>>  http://tools.ietf.org/html/draft-klensin-idnabis-issues-05

> Do you have any concrete suggestions?

Have you actually followed the reference?

>> - '"method" rule' is oddly phrased.

> I couldn't figure out where this was referring to.

What you call "method rule" isn't actually a rule, but rather a
phrase or a clause -- pick one, but don't call it rule.

>> - "The referrer root URI ..." assumes an HTTP-like URI syntax.
>>   That's not necessarily present everywhere.  Needs clean-up!

> I'm not sure what this comment is referring to. As far as I can
> tell referer root URI is defined as it should be.

"The referrer root URI is the scheme followed by ://, followed by
the domain without any trailing ...." and so on -- that construction
assumes that the URI syntax in use is HTTP-like.  Not every possible
URI syntax is.

So, this needs to either be generalized, or it needs to be said that
the referrer root URI is undefined for URI schemes that different
from http, ftp, and file.

(Example: The referrer might be a URN or an about: URI. What now?)

>> - Much of the processing model is phrased in terms of forward
>>   references to generic steps.  I find this pseudo-code like
>>   configuration style extremely hard to read, and suspect that it'll
>>   make useful security review more difficult than necessary.

> Generic steps were before the actual processing model at some
> point, but a set of commenters felt that forward referencing
> would be better. I'm not inclined to revert this change.

I would rather see the processing model phrased in more natural
language than the current one.

>> - Why is the authorization request cache mandatory?

> Is there a good reason to make it optional?

The algorithm can function without it.  That suggests optional.
Also, the way the cache handling is specified is suboptimal to say
the least (see below).

>> - The authorization request cache isn't actually an authorization
>>   request cache, but an authorization decision cache.  The current
>>   name is confusing at least.

> I think the current name is consistent with "authorization
> request" and I rather keep it that way.

The specification defines authorization requests.  These requests
are *not* what is cached.  Therefore, "authorziation request cache"
is simply the wrong name for that thing.

>> - There is no discussion as to how Vary or Cache-control headers on
>>   HTTP responses that were received are handled.  How do these
>>   interact with the separate caching model specified here

> OPTIONS is not cached so those headers are not relevant.

Consider an XML document with the processing instruction that has
its own set of cache-control headers.  What assumptions are we
making about the interaction between the different caches?

>>   The current specification material around redirects looks like
>>   it's pseudo-code ripped out of context; this needs more work to be
>>   comprehensible, and a clear explanation what the expectations are
>>   for a hosting specification.  Either the processing model or the
>>   security considerations should explain very clearly what tradeoffs
>>   a hosting specification faces in specifying any behavior
>>   concerning redirects.

> What do you have in mind?

Well, simply that a discussion of how a hosting protocol deals with
these things, in generic terms, would be very useful.

>> - The access control check algorithm goes to an excrutiating level
>>   of detail, while confusing the reader.  It is probably much easier
>>   to write up how to parse the various headers into the mapping from
>>   origins to methods, and how to deal with that.

> At least for me, it wasn't.

Well, there have been several proposals to phrase this differently.

>> - Once more, we have forward references to generic material,
>>   undeclared variables used to pass around information between
>>   different sections, and a general lack of readability.  For
>>   example, temp method list isn't temporary, not introduced before
>>   its first appearance, and only specified in the "allow list check"
>>   section.

> "temp method list" is introduced well in advance.

It's not introduced in a useful way.  In particular, there is no
indication in section 5.2.1 that the "allow list check" step changes
temp method list as a side effect.  If this was code, it would be
seriously bad programming style.  Here, it's seriously bad
spec-writing style.

>> - "parse ... using a streaming XML parser" -- I'm pretty sure you
>>   don't mean to prescribe use of a streaming XML parser, but rather
>>   want to allow use of one, right?

> I do mean to prescribe the use of an XML parser. Otherwise
> results would be inconsistent with different clients picking
> different strategies.

The emphasis is on *streaming*.  Please spell out what requirements
you specifically expect such a parser to fulfill, beyond being an
XML Processor.

  http://www.w3.org/TR/xml/#dt-xml-proc

>> - In the allow list check, item 3 of the algorithm looks like it's
>>   wrong.  This actually prunes the list of methods that are added to
>>   the temp method list depending on the current request's method.

> I don't think the specification is wrong here. Could you elaborate?

The authorization decision cache is updated when an access check is
performed.  This update maps an (origin, request URI) pair to a
(method list, expiry) pair.

The method list is calculated in the access control check.

Consider this access control policy:

	allow a.com method POST PUT		(1)
	allow a.com method DELETE		(2)

An access control check is performed for a cross-origin POST from
a.com.  In this case, step 3 will cause rule (2) to be skipped.  The
authorization decision cache will therefore record (1) only.

A subsequent DELETE will *remove* the entry for POST and PUT and
cause a new access control check.  Now, the policy for DELETE will
be stored.

A subsequent PUT will *remove* the entry for DELETE, and cause a new
access control check.

In short, the handling in step 3 is inconsistent with the aim of the
cache, an actually suggests that the cache design needs some serious
re-thinking.

E.g., if a caching model needs to be specified, then it might be
more useful to use the triple (origin, request URI, method) as a key
for the cache.

>> - Having atomic steps like "set the allow access flag to true"
>>   (point 5) might be a useful technique in programming.  In
>>   English text, it doesn't actually help understand the
>>   algorithm.

> The algorithm requires careful reading, yes.

I'm sorry, but that doesn't address my comment.  Please refrain from
writing pseudo-code and write English language instead.  The
specification's suitability for review is an important piece of
specification quality.

>> - Starting at item 10 of the access item check algorithm, we go
>>   into defining how domain names are parsed and compared.  That
>>   can be said in much shorter terms by referring to terms from
>>   the relevant specs. Roughly: Origin and item are converted to
>>   ASCII.  They are compared string-insensitively, with the
>>   additional property that the leftmost label of item might be
>>   "*", and can match an arbitrary number of labels.  (Or
>>   something like this.)

> I rather not make changes unless mistakes are identified at the
> risk of introducing errors.

The algorithm given re-specifies parts of the URI and DNS
specifications, is opaque, and requires review for correctness from
the perspective of these specifications.  A high-level description
which says that the domain names are compared, with the "*"
wildcard, is simply not subject to this class of errors.

>> - The requirements tend to confuse authentication and authorization.
>>   E.g., under 1, you're really talking about deployments that base
>>   their authorization decisions exclusively on somebody being on the
>>   right side of a firewall.

> The requirements recently got changed. Is this still an issue?

Yes.

>> - In the part that talks about cross-site POST, it might be useful
>>   to speak of UPNP as a possible target.

> I'm not sure what you mean here.

http://www.google.com/search?q=upnp+xsrf

>> - I continue to disagree with requirement 3, "must be deployable to
>>   existing..."; this is highly dependent on the cricumstances of a
>>   particular deployment.  I suggest saying clearly what is really
>>   meant.  E.g., what abilities should be sufficient in order to
>>   deploy the thing -- like, ability to write to XML files.

> I'm not sure I understand.

"Must be deployable to existing, commonly used, servers without
requiring actions by the server administrator in a typical
configuration."

This is a negative, phrased in terms of an uncertain set of actions
by the server administrator.  Please replace this by a positive
statement of the capabilities that are deemed acceptable
preconditions to being able to deploy an access-control policy.

>> - requirement 4 (more "easily deploy" that I actually disagree with)
>>   only holds for XML content.  Please qualify this requirement.

> Requirement 4 is actually not about XML content specifically.

Please elaborate.

>> - req 6 could use some elaboration. The current text could be
>>   misread to say that the authorized party should be identified with
>>   resource-level granularity, which we know is a bad idea.

> This was reworded.

Actually, it wasn't.  Please address.

>> - req 9 is somewhere between an implementation requirement and a use
>>   case.  Strikes me as somewhat wierdly phrased..

> This was reworded as well.

The rewording on this one (comparing the 27 Jan version to the 22
Jan version that I had reviewed) consists of changing "wrongly" to
"incorrectly" and does *not* address the concern.

Please don't falsely claim that you have addressed a comment when
you actually haven't.

>> - req 10 effectively says "APIs for cross-site data access
>>   shouldn't differ from these for same-origin data access"; I'd
>>   suggest changing to that

> I changed this.

The real requirement still isn't usefully expressed.

Regards,
-- 
Thomas Roessler, W3C  <tlr@w3.org>
Received on Tuesday, 29 January 2008 23:46:31 UTC