This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 23946 - Lift the ban on query parts in “blob:” URIs
Summary: Lift the ban on query parts in “blob:” URIs
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: File API
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Arun
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-30 22:08 UTC by Manuel Strehl
Modified: 2014-02-27 22:49 UTC


Description Manuel Strehl 2013-11-30 22:08:54 UTC
The blob: URI scheme explicitly forbids query parts. RFC 3986, on the other hand, explicitly allows query parts: <http://tools.ietf.org/html/rfc3986#section-3.4>. (In my reading, the “(if any)” refers to the existence of a naming authority.)

So the forbidding of query parts makes blob: a non-URI in the sense of that RFC. Are there any hard reasons to forbid query parts? Just from reading the spec, that move seems arbitrary. There might be useful applications, though, e.g., parametrising dynamically generated JS in a blob.
Comment 1 Masatoshi Kimura 2013-12-01 02:18:42 UTC
Banning the query part doesn't contradict the generic URI syntax. The generic syntax only defines the syntax; it doesn't affect the scheme-specific semantics.

A Blob URL containing a query part is invalid, just like an HTTP URL containing a userinfo part. Although the fragment part is forbidden from having scheme-specific semantics [1], the query part is not.

BTW, the File API refers to the URL Standard [2] rather than RFC 3986, although the result is the same regarding blob URL parsing.

[1] http://tools.ietf.org/html/rfc3986#section-3.5
    > Fragment identifier semantics are independent of the
    > URI scheme and thus cannot be redefined by scheme specifications.
[2] http://url.spec.whatwg.org/
Comment 2 Boris Zbarsky 2013-12-01 02:19:52 UTC
> So the forbidding of query parts makes blob: a non-URI in the sense of that RFC.

It's not clear to me why.  RFC 3986 section 1.1.1 says:

   As such, the URI syntax is a federated and extensible naming
   system wherein each scheme's specification may further restrict the
   syntax and semantics of identifiers using that scheme.

so it seems ok to have a scheme that uses the RFC 3986 parser and then says that only URIs which do not have a query section are valid.

As a side note, RFC 3986 isn't compatible with how URIs are actually used on the web, so browsers don't actually implement it, and RFC 2396 is much more clear that "the URI syntax does not require that the scheme-specific-part have any general structure or set of semantics which is common among all URI" (RFC 2396, section 3).

The argument that there might be use cases for allowing a query here is much more interesting than an argument based on RFC 3986.
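
The point made in this comment (parse with the generic syntax, then let the scheme restrict what is valid) can be sketched roughly as follows. This is only an illustration: the regular expression is the one from RFC 3986, Appendix B, but `UriParts`, `splitUri`, `isValidUnderNoQueryRule` and the sample identifiers are hypothetical names invented here, and the "no query" rule is a stand-in for the restriction under discussion, not text from the File API spec.

```typescript
// Generic URI components, as split by the regular expression in RFC 3986, Appendix B.
interface UriParts {
  scheme: string | undefined;
  authority: string | undefined;
  path: string;
  query: string | undefined;
  fragment: string | undefined;
}

// Regular expression from RFC 3986, Appendix B.
const RFC3986_SPLIT = /^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Step 1: generic parsing. No scheme-specific knowledge is involved here.
function splitUri(uri: string): UriParts {
  const m = RFC3986_SPLIT.exec(uri);
  if (m === null) {
    throw new Error("unreachable: every group in the pattern is optional");
  }
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5] ?? "",
    query: m[7],
    fragment: m[9],
  };
}

// Step 2: a hypothetical scheme-specific restriction layered on top of step 1.
// The identifier still splits fine generically; the scheme merely declares
// identifiers with a query component to be invalid.
function isValidUnderNoQueryRule(uri: string): boolean {
  const parts = splitUri(uri);
  return parts.scheme?.toLowerCase() === "blob" && parts.query === undefined;
}

// Made-up identifiers, for illustration only.
const sampleWithoutQuery = "blob:550e8400-e29b-41d4-a716-446655440000#frag";
const sampleWithQuery = "blob:550e8400-e29b-41d4-a716-446655440000?x=1#frag";

console.log(splitUri(sampleWithQuery));
console.log(isValidUnderNoQueryRule(sampleWithoutQuery));
console.log(isValidUnderNoQueryRule(sampleWithQuery));
```

Both sample strings are split successfully by the generic step; only the second step distinguishes them, which is why a scheme-level ban does not by itself make the identifier a non-URI.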
Comment 3 Manuel Strehl 2013-12-01 06:10:43 UTC
The problem with use cases is often that they arise only after implementation is completed... One possible future use, however, that I just remembered: The SVG WG thought about parametrized SVG in 2010, where the query part determines part of the layout of an SVG image. Something along the lines of

    image.svg?mycolor=red

    <rect fill="{$mycolor}"/>

or similar. I haven't heard about writing that to a spec since, but this or other uses of static content listening to or depending on the query may come to be in the future. (The probability rises the more we slide into a server-less time.) Then blob URLs are all of a sudden the IE6 of URI schemes, so to speak.
Comment 4 Manuel Strehl 2013-12-02 11:59:28 UTC
What bugs me most here is the difference from the data: URI scheme (which is, by the way, under-specified in this context). I'd love to see both schemes on par concerning queries and fragments, since the circumstances in which they appear (especially dynamic creation / manipulation in JS) are often somewhat similar.
Comment 5 Boris Zbarsky 2013-12-02 12:30:46 UTC
> What bugs me most here is the difference to the data: URI scheme 

data: doesn't support a query section.  Simple testcase:

  data:text/html,<script>alert(location.search)</script>aaa?bbb

Shows "aaa?bbb" as the text in browsers and alerts empty string.
Comment 6 Arun 2013-12-02 16:11:24 UTC
(In reply to Manuel Strehl from comment #3)
> The problem with use cases is often that they arise only after
> implementation is completed... One possible future use, however, that I just
> remembered: The SVG WG thought about parametrized SVG in 2010, where the
> query part determines part of the layout of an SVG image. Something along
> the lines of
> 
>     image.svg?mycolor=red
> 
>     <rect fill="{$mycolor}"/>
> 
> or similar. I haven't heard about writing that to a spec since, but this or
> other uses of static content listening or depending on the query may come to
> be in the future. (The probability rises the more we slide into a server-less
> time.) Then blob URLs are all of a sudden the IE6 of URI schemes, so to speak.


The use case above is still hypothetical at best.  If a compelling use case comes up for which there's active demand, we might be able to revisit this.  Bear in mind, however, that the majority use case for blob: URLs (not counting mediastream: and friends) is for files which user agents have a temporary (readonly) reference to.
Comment 7 Arun 2013-12-03 16:15:14 UTC
In general, with blob: URLs there has been a trade-off between including the entirety of the range of HTTP behavior (by including more request-response options, for example) and keeping a limited protocol with tangible use cases in mind.

The argument for having a versatile protocol (e.g. adding all of HTTP's headers, or even including stuff from HTTP URLs, like query) really hinges on future uses, which haven't emerged yet.  Additionally, it comes with the risk of introducing security implications that HTTP has resolved over much use.

On balance, I prefer use-case driven bugs as a pragmatic and safe approach to expanding the blob: protocol.  We're free to introduce changes over time.  I'm marking this closed.  Please re-open or file a different bug when the need arises.
Comment 8 Anne 2013-12-03 20:57:01 UTC
(In reply to Boris Zbarsky from comment #5)
> data: doesn't support a query section.  Simple testcase:
> 
>   data:text/html,<script>alert(location.search)</script>aaa?bbb
> 
> Shows "aaa?bbb" as the text in browsers and alerts empty string.

Alerts "?bbb" in Safari and per http://url.spec.whatwg.org/

The same should happen for blob URLs in my opinion. We should only look at scheme data though when processing blob URLs and ignore the query component.
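
For readers who want to check the "per http://url.spec.whatwg.org/" part without loading the testcase in a browser, here is a small sketch using the `URL` API, which implements the URL Standard's parser. It only shows how the string is split into components; it says nothing about what a given browser does when navigating to the data: URL. The variable names are made up for this illustration.

```typescript
// Boris's testcase, parsed with the URL API instead of being loaded in a browser.
const dataUrl = new URL("data:text/html,<script>alert(location.search)</script>aaa?bbb");

// Under the URL Standard, the first "?" ends the path and starts the query,
// even for a scheme such as data: that has no hierarchical structure.
const dataUrlQuery: string = dataUrl.search;
const dataUrlBody: string = dataUrl.pathname;

console.log(dataUrlQuery);                 // "?bbb"
console.log(dataUrlBody.endsWith("aaa"));  // true: "?bbb" is not part of the data
```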
Comment 9 Arun 2013-12-03 22:25:37 UTC
(In reply to Anne from comment #8)
> (In reply to Boris Zbarsky from comment #5)
> > data: doesn't support a query section.  Simple testcase:
> > 
> >   data:text/html,<script>alert(location.search)</script>aaa?bbb
> > 
> > Shows "aaa?bbb" as the text in browsers and alerts empty string.
> 
> Alerts "?bbb" in Safari and per http://url.spec.whatwg.org/
> 
> The same should happen for blob URLs in my opinion. We should only look at
> scheme data though when processing blob URLs and ignore the query component.

If we should *ignore* the query component when processing blob: URLs, why exactly is this bug reopened? What is your proposed spec. change?
Comment 10 Jonas Sicking (Not reading bugmail) 2013-12-03 23:46:17 UTC
(In reply to Anne from comment #8)
> (In reply to Boris Zbarsky from comment #5)
> > data: doesn't support a query section.  Simple testcase:
> > 
> >   data:text/html,<script>alert(location.search)</script>aaa?bbb
> > 
> > Shows "aaa?bbb" as the text in browsers and alerts empty string.
> 
> Alerts "?bbb" in Safari and per http://url.spec.whatwg.org/

That seems like a very bad idea. Treating "?" in a data URL seems surprising and a source of subtle bugs. It seems much less useful to be able to pass "search" parameters to a page by prepending "?..." to a data: URL than to be able to use '?' characters inside the URL itself.

However I think data-URLs are special (probably together with javascript URLs). I have no problem with supporting query parameters in data URLs.
Comment 11 Anne 2013-12-04 11:55:37 UTC
Jonas, did you mean to say that data URLs are special and that you have no problem with having query parameters in blob URLs?

The way HTML defines javascript URLs btw they also have a query and fragment component.

We could special case them all in the URL parser, but subtle bugs come from special cases, so I'm not sure that would be better.
Comment 12 Anne 2013-12-04 11:58:56 UTC
Arun, I don't think it makes sense to forbid the query component. I would also define createObjectURL() more clearly in the specification as emitting a serialization of a URL that has scheme "blob", scheme data {generated string}, and no query or fragment (forgot whether they have to be null or the empty string for that). If you do that a lot of the other text you currently have about blob URLs does not seem to be needed.
Comment 13 Arun 2013-12-04 16:24:55 UTC
(In reply to Anne from comment #12)
> Arun, I don't think it makes sense to forbid the query component. 

Well, if it's ignored, it doesn't make sense to have it, but I suppose we can say query components can be ignored.

>I would
> also define createObjectURL() more clearly in the specification as emitting
> a serialization of a URL that has scheme "blob", scheme data {generated
> string}, and no query or fragment (forgot whether they have to be null or
> the empty string for that). If you do that a lot of the other text you
> currently have about blob URLs does not seem to be needed.


I agree with this.
Comment 14 Jonas Sicking (Not reading bugmail) 2013-12-04 21:19:52 UTC
(In reply to Anne from comment #11)
> Jonas, did you mean to say that data URLs are special and that you have no
> problem with having query parameters in blob URLs?

Yes.

> The way HTML defines javascript URLs btw they also have a query and fragment
> component.
> 
> We could special case them all in the URL parser, but subtle bugs come from
> special cases, so I'm not sure that would be better.

Like I said, I think that treating "?" special, by treating it as query delimiter, in data URLs will lead to more surprising behavior and subtle bugs than not doing so. But it's a judgement call I agree.
Comment 15 Manuel Strehl 2013-12-04 21:58:04 UTC
> Like I said, I think that treating "?" special, by treating it as query
> delimiter, in data URLs will lead to more surprising behavior and subtle
> bugs than not doing so. But it's a judgement call I agree.
Well, the _hash_ not being parsed as a fragment in data: URIs in Chrome certainly puzzled some; at least, there are already three duplicates of this bug:

http://code.google.com/p/chromium/issues/detail?id=123004

I'd like to think that query parts will be no different once there's a use case for reacting to them from within the data:-URI-referenced resource. (That is, I think many developers simply haven't yet experienced whether '?' is or is not special in data: URIs.)

The remaining question is twofold, with a fundamental and a pragmatic part:

1. What does a generic URL look like: scheme:any-stuff or scheme:hier?query#fragment? It's my understanding so far that the first is the basic definition of a URI, which also allows, e.g., URNs like isbn:, while the second is what URLs look like. Hence, blob-_URLs_ should handle '?' and '#' specially. Of course, if I'm misguided here, that point is invalid.

2. How do developers use blob: URLs? E.g., will there be significant use cases, what would be the more surprising behaviour, ... With my anecdotal evidence drawn from the data: URL / fragment identifier analogy above, I'd still argue for treating '?' specially, but I haven't got more profound data to back this. Just my gut.
Comment 16 Arun 2013-12-05 22:28:32 UTC
(In reply to Anne from comment #12)
> Arun, I don't think it makes sense to forbid the query component. I would
> also define createObjectURL() more clearly in the specification as emitting
> a serialization of a URL that has scheme "blob", scheme data {generated
> string}, and no query or fragment (forgot whether they have to be null or
> the empty string for that). If you do that a lot of the other text you
> currently have about blob URLs does not seem to be needed.

OK, I've made the definitions of URL.createObjectURL and URL.createFor explicit about what they emit. The WHATWG URL specification doesn't specify that fragment and query have to be null or empty to not be present, so I simply say "no fragment" etc., in keeping with what the URL spec says -- jump in if wrong on this count.

It turns out, however, that although URL.createObjectURL and URL.createFor don't explicitly emit fragments and query components, we DO allow fragment (by popular demand and by use case):

var blobAnchorRef = URL.createFor(file) + "#applicationData";

where #applicationData could be a named element (e.g. anchor link) within an HTML file.

By explicitly allowing that, we have to state clearly what happens if a query is appended.  I've spec'd that it be ignored, which doesn't make it illegal to have one.  We can start paying attention to query in blob: URLs when we figure out what to do with them, just like we've specified what a fragment might mean in a Blob URL.  
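
A rough sketch of what "fragment allowed, query ignored" looks like at the parsing level, using the `URL` API. The string assigned to `emitted` is a made-up value standing in for whatever `URL.createObjectURL` (or `URL.createFor`) would return; the exact shape of the generated string is an assumption here, and the variable names are hypothetical.

```typescript
// Hypothetical stand-in for the string emitted by URL.createObjectURL(file):
// scheme "blob", a generated string as scheme data, no query, no fragment.
const emitted = "blob:https://example.org/9115d58c-bcda-ff47-86e5-083e9a215304";

// Appending a fragment, as in the blobAnchorRef example above.
const withFragment = new URL(emitted + "#applicationData");

// Appending a query as well. Parsing succeeds; the query is simply a separate
// component that blob: processing can leave unused.
const withQuery = new URL(emitted + "?mycolor=red#applicationData");

console.log(withFragment.hash);    // "#applicationData"
console.log(withQuery.search);     // "?mycolor=red"

// The part that identifies the Blob is the same in both cases.
console.log(withFragment.pathname === withQuery.pathname);  // true
```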

(In reply to Jonas Sicking from comment #10)
> That seems like a very bad idea. Treating "?" in a data URL seems surprising
> and a source of subtle bugs. 

Separate bugs should be filed about data URLs.


(In reply to Manuel Strehl from comment #15)

> 
> The remaining question is twofold, with a fundamental and a pragmatic part:
> 
> 1. What does a generic URL look like: scheme:any-stuff or
> scheme:hier?query#fragment? It's my understanding so far that the first is
> the basic definition of a URI, which also allows, e.g., URNs like isbn:,
> while the second is what URLs look like. 

I defer to http://url.spec.whatwg.org/

>Hence, blob-_URLs_ should handle
> '?' and '#' specially. Of course, if I'm misguided here, that point is
> invalid.


I *think* that we don't have to special case "#".  I think ignoring query for now isn't quite a special case.


> 
> 2. How do developers use blob: URLs? E.g., will there be significant use
> cases, what would be the more surprising behaviour, ... With my anecdotal
> evidence drawn from the data: URL / fragment identifier analogy above, I'd
> still argue for treating '?' specially, but I haven't got more profound data
> to back this. Just my gut.


I think the lion's share of use cases will involve short-lived identifiers to file references.  When use cases emerge that really clamor for query, we can accommodate them.  Of course, if that happens, I think we should break out blob: URLs from File and deal with them properly, along with mediastream:.
Comment 17 Arun 2013-12-06 16:21:51 UTC
(Relevant spec. update to review: http://dev.w3.org/2006/webapi/FileAPI/#creating-revoking)
Comment 18 Anne 2013-12-09 14:54:06 UTC
Yes, you have the creation bit, and the fetch bit. The fetch bit for blob URLs should simply look at the "scheme data" component of the parsed URL (by the time you established it is a blob URL you already checked the "scheme" and found it saying "blob"). By just looking at "scheme data" you don't have to say anything about query (ignored) or fragment (ignored for the purposes of fetching, used by other specifications).
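
The "only look at scheme data" approach to the fetch bit could be sketched like this. `BlobUrlStore`, its methods and the sample values are hypothetical and are not how any user agent implements blob URL resolution; the sketch also assumes that `pathname` in the `URL` API is an adequate stand-in for the "scheme data" component of a parsed blob URL.

```typescript
// Hypothetical in-memory blob URL store, keyed by scheme data only.
class BlobUrlStore {
  private readonly entries = new Map<string, string>();

  // Creation: emit a URL with scheme "blob", the given scheme data,
  // and no query or fragment.
  register(schemeData: string, contents: string): string {
    this.entries.set(schemeData, contents);
    return `blob:${schemeData}`;
  }

  // Fetch: parse, confirm the scheme, then look up by scheme data alone.
  // Query and fragment are never consulted, so nothing needs to be said
  // about them here.
  resolve(url: string): string | undefined {
    const parsed = new URL(url);
    if (parsed.protocol !== "blob:") {
      return undefined;
    }
    return this.entries.get(parsed.pathname);
  }
}

const store = new BlobUrlStore();
const storedUrl = store.register(
  "https://example.org/9115d58c-bcda-ff47-86e5-083e9a215304", // made-up value
  "file contents",
);

console.log(store.resolve(storedUrl));                         // "file contents"
console.log(store.resolve(storedUrl + "?mycolor=red#anchor")); // "file contents"
```

With this shape, a fragment remains available to other specifications (for example, for scrolling to a named element), while the lookup itself is unaffected by anything after the scheme data.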
Comment 19 Arun 2014-01-20 19:50:59 UTC
(Filed Bug 24338 for Fetch for Blob URLs in File API, which is the "fetch" bit).
Comment 20 Arun 2014-02-27 22:49:20 UTC
I think this is fixed.

There isn't an "active ban" now.

1. The query isn't defined (http://dev.w3.org/2006/webapi/FileAPI/#DefinitionOfScheme) but isn't "banned."
2. What is emitted by the methods URL.createFor and URL.createObjectURL (http://dev.w3.org/2006/webapi/FileAPI/#createRevokeMethodsParams) and what is consumed by parse and fetch (http://dev.w3.org/2006/webapi/FileAPI/#lifeTime and http://dev.w3.org/2006/webapi/FileAPI/#requestResponseModel) have been more tightly defined.

See url.spec.whatwg.org and fetch.spec.whatwg.org for future updates on parsing and fetching blob URLs.