Re: Web-keys (was: Re: Draft minutes of TAG teleconference of 21 January 2010) from Tyler Close on 2010-02-02 (www-tag@w3.org from February 2010)

From: Tyler Close <tyler.close@gmail.com>
Date: Mon, 1 Feb 2010 17:46:40 -0800
To: noah_mendelsohn@us.ibm.com
Cc: www-tag@w3.org
Message-ID: <5691356f1002011746ld12199ek640ac602aacc4bca@mail.gmail.com>

Hi Noah,

First, thank you for taking the time to produce a thorough response covering
your thoughts on web-keys. This looks like the beginning of the discussion
I've been hoping to have with the TAG. I believe I have excellent answers
for all of the issues raised in your email and look forward to discussing
them with you and the rest of the TAG. (Responding inline to HTML email
doesn't always work out well, so I'll top-post my reply. Please let me know
if I missed any of your arguments.)

1. We're talking about cross-domain access control

I think the most important thing to clear up is the difference between
access control within a single administrative domain versus access control
across administrative domains. The former includes what may be controlled
within a single computer, while the latter covers what can be controlled on
the Web. You've pointed out your experience consulting on a capability-based
operating system. This experience may be relevant to the single domain case,
but may be misleading for the cross-domain case. For example, in your email,
you go on at length about how the read permission I granted to the TAG
mailing list became widely propagated and how you believe this to be a
failing. It is not. It is the correct and expected behavior in the
cross-domain case. Within a single computer, it may be possible to control
the propagation of information, since the computer's operating system may be
able to control all the lines of communication. In the cross-domain case,
this control over the lines of communication does not exist and so it is not
possible for one domain to prevent another domain from sharing information,
regardless of the access-control model or mechanism. Giving users the
impression that they have such control, when such control is impossible, is
a serious logical flaw in an access control mechanism. You were quick to
realize that the web-key does not claim to prevent the impossible. That is a
positive feature of the web-key design.

Though you chose not to discuss it, I did point out in my email that I knew
I was sharing the read permission widely, that I did so voluntarily, and
that I knew I had the option of keeping it more closely held. I did keep
the corresponding write permission closely held. Though everyone has read
permission to that document, they do not have write permission. You are
unable to modify the contents of the document. It is again a positive
feature that a capability URL allowed me to easily share a document, and so
drop its confidentiality but still keep its integrity under tighter control.

In several parts of your email, you seem to believe that web-keys require
that user-agents place new limitations on how users may voluntarily share
information. This is not the case. It is not the responsibility of
user-agents to restrict a user's access to web-keys. It is only necessary
that user-agents not automatically violate their user's wishes on the
transmission of information.  For example, it would be a bad thing if my
email user-agent automatically broadcast all my email to the world, but it
should not prevent me from sending an email to everyone. I think this
constraint is in keeping with the normal expectations of users.

Similarly, this distinction between single domain versus cross-domain also
overturns your assessment of the suitability of the URI, HTTP, and TLS
specifications as the basis for a distributed object-capability protocol.
These RFCs meet the requirements needed to implement distributed
object-capabilities. This is not only my assessment, but also that of other
projects that have used URIs and TLS and even HTTP in their implementations.

2. We're talking about secure resources

The web-key is a mechanism for controlling access to HTTPS resources.
Problems applying the technique to plain HTTP resources are not relevant.
These are strawman arguments.

3. There is no defined security model for the Web

Your email argues that RFCs do not anticipate the web-key technique and that
therefore it's fine for user-agents to do things that would break the
security of a web-key. I can make the same argument against any other access
control mechanism for the Web. For example, where is the normative statement
that says a user-agent must not automatically broadcast to third-parties all
viewed representations? Without this constraint there is no confidentiality
for any Web resource. The reality of the status quo is that no security
model has been standardized. The ad hoc mechanisms that exist are
error-prone, damaging to the core functionality of the Web, and fragile in the
face of unbounded "innovation" in the user-agent.

The web-key paper explains how the deployed HTTPS infrastructure can better
support security with better webarch properties. As you say, we are in need
of alternatives and web-key provides a compelling one. Does it make sense
to prohibit this technique because it is not anticipated, especially when
the RFCs have so little to say on security?

4. Novel use of the fragment

If you're terribly concerned about violating the letter of the law for
fragment identifiers, then it is perfectly feasible for the JavaScript code
in the page to add an <A> tag to the document with an "id" attribute
containing the fragment text.
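
As a rough sketch of what that could look like (TypeScript; the function
name and the small DocumentLike/ElementLike types are made up here for
illustration, shaped so that a browser's own "document" should fit, and so
that the logic can also be exercised without a browser):

```typescript
// Hypothetical, minimal view of the DOM: only what this sketch needs.
interface ElementLike {
  id: string;
}
interface DocumentLike {
  createElement(tagName: string): ElementLike;
  body: { appendChild(node: ElementLike): unknown };
}

// Adds an <a> element whose "id" is the fragment text of `href`, so that
// the fragment names an element in the document. Returns the fragment
// text, or null when `href` carries no fragment.
function addFragmentAnchor(doc: DocumentLike, href: string): string | null {
  const hashIndex = href.indexOf("#");
  if (hashIndex < 0 || hashIndex === href.length - 1) {
    return null; // no fragment, or an empty one: nothing to add
  }
  const fragment = href.slice(hashIndex + 1);
  const anchor = doc.createElement("a");
  anchor.id = fragment;
  doc.body.appendChild(anchor);
  return fragment;
}
```

In a page this would presumably be called from the onload handler as
addFragmentAnchor(document, document.location.href).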

5. Dynamically constructed DOM

A web-key does not do anything out of the ordinary for an AJAX page.
Updating the DOM based on an XHR request is the essence of AJAX.
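
For concreteness, here is a sketch of the URL handling involved, following
the description in the paper: the script reads the key from the fragment
and builds the https URL for the follow-up GET with the key as a query
string argument. This is TypeScript using the standard URL class; the
function name and the "key" parameter name are assumptions made for
illustration, not something the paper or this thread specifies.

```typescript
// Hypothetical helper: the name and the default "key" query parameter
// are illustrative only.
// Turns a web-key URL (key in the fragment) into the URL for the
// follow-up GET (key in the query string). Returns null if the URL is
// not https or carries no key.
function requestUrlForWebKey(
  webKeyHref: string,
  paramName: string = "key"
): string | null {
  const page = new URL(webKeyHref);
  if (page.protocol !== "https:") {
    return null; // the technique is only meant for HTTPS resources
  }
  const key = page.hash.slice(1); // drop the leading "#"
  if (key === "") {
    return null;
  }
  // Same origin and path as the skeleton page, with the key moved
  // from the fragment into the query string.
  const target = new URL(page.origin + page.pathname);
  target.searchParams.set(paramName, key);
  return target.href;
}
```

For the example URL from the paper,
requestUrlForWebKey("https://www.example.com/app/#mhbqcmmva5ja3") gives
"https://www.example.com/app/?key=mhbqcmmva5ja3". In the page, the onload
handler would pass document.location.href and hand the result to
XMLHttpRequest's open("GET", ...), then update the DOM from the response.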

Conclusion:

You wrote: "My personal bottom line for the moment is:  it's inappropriate
to assume that clients, email systems, and the like will in general limit
the distribution of or protect the storage of URIs."

That's fine, I'm not asking for any changes there. There's only a major
problem if a user-agent acts as above without the explicit consent of the
user, essentially stealing data from the user.

You wrote: "So, those using URIs as capabilities should be responsible for
the risk that those URIs will wind up in unintended places. "

The resource host is always responsible for the risks, regardless of the
mechanism used. I'm asking that you not create new risks, or encourage them,
by cautioning against the use of this technique. There's a viable technique
here with valuable properties. Why discourage it or, worse, encourage its
ruin?

Also, realize that the TAG has nothing better to offer in terms of a
security model. The "password-for-a-site + cookies + same origin policy"
model doesn't actually work for scenarios involving more than two parties.
CSRF, clickjacking and other Confused Deputy attacks aren't bugs; they're
design flaws in the security model.

Honestly, the web-key is the only viable fix for access control on the Web.
Discouraging or prohibiting the technique would be a great failure.

--Tyler

On Mon, Feb 1, 2010 at 8:04 AM, <noah_mendelsohn@us.ibm.com> wrote:

>
> Tyler:
>
> Here are some comments on web-keys and the web-key paper [1] including a
> response to your email [2].  I'll be mixing in quotes from your email and
> the paper.  I'll leave for a separate email responses to your specific
> concerns about our TAG findings and their advice on secrets in URIs.
>
> By the way, it may be of passing interest that a very long time ago, I
> spent nearly two years consulting on a capability-based operating system [3]
> at the Stanford CS department.  So, while I don't claim any expertise on
> recent developments, I did at one point have a pretty good understanding of
> the fundamental principles of capability-based computing, and some
> experience designing an actual system.
>
> *Is there a need for some new access and protection model for the Web?*
>
> A lot of the web-key paper is devoted to justifying the need for something
> other than the password-for-a-site + cookies + same origin policy that are
> in widespread use on the Web.  While I don't necessarily agree with every
> point made, I do agree with the conclusions:  those mechanisms have serious
> drawbacks, CSRF is a serious problem, fine-grained access control is
> desirable, etc.  So, looking for alternative security mechanisms is indeed
> worthwhile.
>
> *Is the Web a good foundation for a capability-based system?*
>
> The fundamental premise of the web-key work is that the Web, used
> carefully, can approximate the characteristics of a capability-based system.
> That is, web-keys use URIs as "capabilities", which can be informally
> described as tokens that simultaneously provide addressing of, and convey
> permission to access, a resource.
>
> The classic security model for a capability system involves using
> mechanisms of the system itself to grant or transfer capabilities in a
> protected way.  That is, capability tokens are usually managed by the
> protected kernel of the system, much as file handles are managed in Unix,
> and transmission of a capability from one user to another is mediated by
> system mechanisms that ensure capabilities are given only to those who
> should have them.  Indeed, in the purest form of capability system, the
> ability to transfer a capability is itself a capability.
>
> The RFCs that define URIs, HTTP, HTTPS and the associated http and https
> URI schemes do not, by my reading, provide such an architecture or such
> guarantees.  Unlike a classic capability-based system, in which capabilities
> are managed by a protected kernel or similar hardware/software, URIs
> retrieved from servers (e.g. links in a Web page), are returned in clear
> text to user agents that are free to copy those URIs most anywhere.
>  Transmission using HTTPS is reasonably well protected, but subsequent
> manipulation or redistribution by user agents mostly isn't.
>
> Ironically, the web-key paper makes just this point, discussing some of the
> pertinent RFC's:
>
> > Both RFC 2616 on HTTP/1.1 [HTTP] and RFC 3986 on the URI [URI]
> > provide security guidance advising against the inclusion of
> > sensitive information in a URI. The text from Section 7.5 of
> > RFC 3986 provides a good summary of the arguments presented:
>
> > "URI producers should not provide a URI that contains a
> > username or password that is intended to be secret. URIs are
> > frequently displayed by browsers, stored in clear text
> > bookmarks, and logged by user agent history and intermediary
> > applications (proxies)."
>
> The problem is, the web-key itself (the hash) is a secret, and is therefore
> subject to this advice.  The whole model for web-keys is:  use them when
> you're confident that the URIs won't wind up in unintended places.  The
> above quotes make clear that user agents are, with some important
> exceptions, not responsible for protecting the text of links that they
> receive.  While the particular secrets mentioned are usernames and
> passwords, the advice about URIs being displayed, stored in clear text,
> etc., clearly applies to all information that must be protected.  Also: RFC
> 2818 [4] is the URI scheme registration for the https scheme, and as far as
> I can see it does not mandate or even suggest limiting the transmission or
> transcription of https scheme URIs.
>
> The web-keys paper goes on to try and deal with some of the particular
> exposures by cases:  it points out that with care, browser caches can be
> emptied promptly (which only reduces but does not eliminate risk),
> shoulder-surfing isn't always a real issue, etc. The problem is that one can
> only deal with security by cases if the cases in question are known and
> bounded.  That's not true of the Web.  User agents can, without violating
> RFCs, do all sorts of problematic things we might not have even thought of.
>  Ironically, the web-key paper in an earlier section discusses just the sort
> of unbounded innovation that's perfectly consistent with pertinent RFCs, but
> problematic for the web-key security model:
>
> > Many modern browsers include an option to report each visited
> > URL to a central phishing detection service. The IE7
> > implementation of this feature first truncates the URL to omit
> > the query string. The IEblog indicates this approach was taken
> > to protect user privacy and security [phishing filter].
> > Unfortunately, this precaution is not taken in other browsers.
> > Users of these other browsers who enable online phishing
> > detection must trust that confidentiality is adequately
> > maintained by the remote service. Automatically extracting data
> > from an end-to-end encrypted communications channel and
> > transmitting it to a third party defeats the intent of the
> > encryption. Hopefully this iatrogenic security flaw can be
> > fixed in future releases of these other browsers.
>
> The paper claims that these systems represent a security "flaw";  I view
> them somewhat differently.  In particular, I'm not aware of any normative
> specification that they violate, and it's not clear that what they're doing
> represents bad practice; there's nothing in Web architecture that requires a
> centralized implementation of a user-agent.  The browsers mentioned above
> are choosing to delegate an aspect of their security logic to an off-site
> service -- seems to me that's fine if it meets their needs.
>
> In the future, other user agents may do different things in this spirit.  I
> don't think you can stamp them out "by cases";  you have to admit that the
> Web architecture does not bound, very much, what user agents do with the
> links that they find in pages retrieved from the Web.
>
> Furthermore, we can't even bound the cases of interest to links retrieved
> using HTTP or HTTPS;  URIs are normally dereferenced using HTTP or similar
> Web protocols, but they are also passed around in many other ways, including
> in filesystems, and in emails.  Ironically, the email to which I am replying
> has triggered an example of just this exposure:
>
> > The TAG understands that unguessable URLs are used for access-
> > control by many of the most popular sites on the Web. For
> > example, this email contains a Google Docs URL [1] for a
> > document I have chosen to make readable by all readers of this
> > mailing list, even those who have never used Google Docs.
>
> This says that the intention is to give access "to readers of [the
> www-tag@w3.org] mailing list", but it so happens that the owners of the
> www-tag mailing list have chosen to archive it publicly.  Not surprisingly,
> search engines can find the email there, and so it turns out that the Google
> Doc [1] is available not just to "all readers of this mailing list", but to
> everyone on the Web.  To see this, try a Bing search for "web-key google
> docs unguessable" [5] (note, there's no reference to the mailing list in the
> query).  You'll see that the email comes up, and of course it has the link
> to the Google Doc. Indeed, the email with the capability is available
> directly from the Bing cache [6], without going anywhere near W3C servers.
>
> Of course, by emailing the web-key to me and to others, you have also
> (atypically of more rigorous capability-based systems) delegated to us the
> ability to pass on the capability.  All I would have to do is cc: some other
> list in this email, post the link on my Web blog, etc.
>
> Again, all of this is in conformance with applicable RFCs;  there's nothing
> that should be "fixed", and it's not clear that these things can be fixed at
> this late date.
>
> *Are web-key style capabilities a bad idea?*
>
> Several commentators have pointed out, correctly, that web-keys or similar
> techniques have been deployed on the Web, and the Google Doc example
> illustrates this.  So, someone thinks they're a good idea and is getting
> value out of them.
>
> True, but we've also just shown that the semantics are, in some ways,
> fragile.  I presume that the engineers at Google and similar sites are aware
> of these limitations, and they find that web-keys provide some added
> protection anyway.  That's fine, but I don't think we can then claim that
> user agents, anti-phishing schemes, etc. are "broken" just because they are
> inconvenient for the web-key security model.
>
> I'll respond separately with more specifics regarding the Metadata In URI
> finding, but roughly what I'd suggest is:
>
> * Stick with the suggestion that one "should not" put secrets into URIs.  I
> think that's good advice, and the RFC's quoted above support it.
>
> * We could/should change the finding to indicate that, although the Web
> does not in general guarantee the confidentiality or careful management of
> link URIs, there are often circumstances in which the practical risk of
> leakage may be sufficiently low that encoding capabilities in a URI may in
> fact be a useful tradeoff.  Such implementations are to that degree
> acceptable, but the burden is on the designers to deal with associated
> risks; it is not anticipated that a large scale effort will be made to
> manage the distribution of URIs more tightly than is the case today.
>
> *Use of fragment identifiers*
>
> The web-key paper suggests:
>
> "Putting the unguessable permission key in the fragment segment produces an
> https URL that looks like: <https://www.example.com/app/#mhbqcmmva5ja3>."
>
> The normative specification of fragment identifiers in RFC 3986 [7] says:
>
> "The fragment identifier component of a URI allows indirect identification
> of a secondary resource by reference to a primary  resource and additional
> identifying information.  The identified secondary resource may be some
> portion or subset of the primary resource, some view on representations of
> the primary resource, or some other resource defined or described by those
> representations."
>
> So far, so good, I think.
>
> "The semantics of a fragment identifier are defined by the set of
> representations that might result from a retrieval action on the primary
> resource.  The fragment's format and resolution is therefore dependent on
> the media type [RFC2046] of a potentially retrieved  representation, even
> though such a retrieval is only performed if the URI is dereferenced."
>
> So, now we have to look at the media-type of the retrieved representations:
>
> "For some set of resources, all issued web-keys use the same path and
> differ only in the fragment. The representation served for the corresponding
> Request-URI is a skeleton HTML page specifying an onload event handler. When
> invoked, the onload handler extracts the key from the document.location
> provided by the DOM API. The handler then constructs a new https URL that
> includes the key as a query string argument. This new URL is made the target
> of a GET request sent using the XMLHttpRequest API. The response to this
> request is a representation of the referenced resource. "
>
> The media type registration for text/html is at [8].  The pertinent section
> says:
>
> " For documents labeled as text/html, the fragment identifier designates
> the correspondingly named element; any element may be named with the "id"
> attribute, and A, APPLET, FRAME, IFRAME, IMG and MAP elements may be named
> with a "name" attribute.  This is described in detail in [HTML40] section
> 12."
>
> I think it's fair to say that web-key's use of fragment identifiers to
> designate external documents is therefore in violation of the pertinent
> normative RFCs, so that's a concern.  I understand that Ajax applications in
> general are putting some stress on the architecture of fragment ids, and
> web-keys are somewhat in that spirit.  Indeed, the TAG has done one round of
> work to explore client-side use of fragment ids in AJAX applications;  maybe
> we should expand that to consider server-side innovations like webkeys as
> well.  Nonetheless, I believe that the use of fragids in webkeys is, at
> least for now, nonconforming to the pertinent RFCs, and in that way makes
> the Web less self-describing.
>
> Another drawback of web-key fragids is that one loses the ability to use
> fragment-ids for their intended purpose in HTML, I.e. to directly reference
> some element within the document.
>
> *The Dynamically constructed DOM*
>
> The suggested webkey  implementation, in which Javascript dynamically
> updates the DOM of the root document to reflect the contents of the
> dynamically retrieved particular document also represents complexity that I
> find somewhat unfortunate.   That is, the HTML document retrieved by the
> "URL that includes the key as a query string argument" is not treated
> according to the usual rules of text/html, but rather is grafted into an
> existing DOM.  I can't point to specific breakage that results from this,
> but it's at least a bit troubling that this document is not processed in the
> usual manner.  I can't quite decide whether I think this is a serious
> concern.
>
> *Conclusions*
>
> I've tried to be somewhat careful and specific in setting out these
> concerns.  I hope that won't be viewed as inflammatory or piling on.  As I
> said at the top of my note, web-keys do address a real need.  The widely
> used mechanisms on the Web do have problems.  My personal bottom line for
> the moment is:  it's inappropriate to assume that clients, email systems,
> and the like will in general limit the distribution of or protect the
> storage of URIs.  The normative RFCs don't require that they do (Tyler does
> point to one admonition in RFC 2616, but it covers a quite narrow case).
>  So, those using URIs as capabilities should be responsible for the risk
> that those URIs will wind up in unintended places.  I think the advice in
> the Metadata finding that one "should not" put secrets in URIs appropriately
> reflects these risks, and the advice in the normative RFCs.  I would have no
> objection to keeping the existing good practice note, but adding a section
> indicating that in practice some systems do get value out of putting secrets
> into URIs, but that the burden is on those systems to do so only when the
> risks are deemed acceptable for the purpose.
>
> I also remain somewhat troubled by the fact that the web-key trick of using
> the fragment id seems to be in violation of the pertinent specifications and
> by the loss of use of the fragid for its intended purpose;  I'm also at least a bit
> concerned about the dynamic construction of the DOM.
>
> I hope these comments are useful.  Again, I'm speaking for myself;  I don't
> believe that the TAG as a whole has taken a position on web-keys.
>
> Noah
>
> [1] http://waterken.sf.net/web-key
> [2] http://lists.w3.org/Archives/Public/www-tag/2010Jan/0100.html
> [3]
> ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/945/CS-TR-83-945.pdf
> [4] http://tools.ietf.org/html/rfc2818
> [5]
> http://www.bing.com/search?q=web-key+google+docs+unguessable&form=OSDSRC
> [6]
> http://cc.bingj.com/cache.aspx?q=%22web+key%22+google+docs+unguessable&d=346695414695&mkt=en-US&setlang=en-US&w=3723de06,f1f4d797
> [7] http://tools.ietf.org/html/rfc3986#section-3.5
> [8] http://www.rfc-editor.org/rfc/rfc2854.txt
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
>   *Tyler Close <tyler.close@gmail.com>*
> Sent by: www-tag-request@w3.org
>
> 01/23/2010 05:24 AM
>
>         To:        noah_mendelsohn@us.ibm.com
>         cc:        www-tag@w3.org
>         Subject:        Re: Draft minutes of TAG teleconference of 21
> January 2010
>
>
> I understand that sometimes meaning is lost in email and especially in
> meeting transcripts, so I just want to check that I understand the
> current status of the discussion on ACTION-278.
>
> 1. The TAG does not dispute any of the arguments made in my web-key
> paper <http://waterken.sf.net/web-key>.
>
> 2. The TAG understands that unguessable URLs are used for
> access-control by many of the most popular sites on the Web. For
> example, this email contains a Google Docs URL [1] for a document I
> have chosen to make readable by all readers of this mailing list, even
> those who have never used Google Docs. Had I not so chosen, these
> readers would not have access and I could have shared access with a
> smaller group of people, or no one at all.
>
> 3. Some members of the TAG believe that an unguessable https URL is a
> "password in the clear", but that sending someone a URL and a separate
> password to type into the web page is not a "password in the clear".
>
> 4. The TAG is currently sticking to its finding that prohibits use of
> the web-key technique because Noah Mendelsohn says: "I don't like
> that". There are no other substantive arguments that I could attempt
> to refute.
>
> 5. The TAG does not dispute my argument that the current finding is
> self-contradictory.
>
> I'm hoping there is some significant nuance I have missed. If so,
> please point out which of the above statements is false and exactly
> why, so that I can engage with that part of the discussion.
>
> --Tyler
>


-- 
"Waterken News: Capability security on the Web"
http://waterken.sourceforge.net/recent.html
Received on Tuesday, 2 February 2010 01:47:14 UTC