This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 13173 - WF2: <input type="url"> should accept URLs with protocol omitted
Summary: WF2: <input type="url"> should accept URLs with protocol omitted
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: contributor
QA Contact: HTML WG Bugzilla archive list
Reported: 2011-07-07 15:47 UTC by Marat Tanalin | tanalin.com
Modified: 2011-08-17 04:01 UTC

Description Marat Tanalin | tanalin.com 2011-07-07 15:47:26 UTC
<input type="url"> should accept URLs with protocol omitted

http://dev.w3.org/html5/spec/states-of-the-type-attribute.html#url-state

Websites often allow users to enter a URL without a protocol, e.g. "example.com" instead of the full "http://example.com". (The "http://" protocol is assumed in such cases.)

This is important for improving usability.

Even browsers tend not to display the protocol in the location bar, since it is redundant.

Currently, <input type="url"> forces the user to always enter the protocol explicitly, so the usability of such form fields is low.

So it makes sense to allow entering a URL without a protocol in <input type="url">. Otherwise, this input type will simply be ignored by web developers in practice, and the regular <input type="text"> will be used instead. For example, I personally use <input type="email"> in my blog, but not <input type="url"> -- for exactly this reason.

(It's unlikely that <input type="url"> was invented to be ignored.)

Thanks.
Comment 1 Aryeh Gregor 2011-07-07 17:28:03 UTC
The spec doesn't require particular UI for the new input types.  Browsers are allowed to add protocols to user input if they want, as long as the value visible to scripts and the value actually submitted are absolute URLs:

"""
User agents may allow the user to set the value to a string that is not a valid absolute URL, but may also or instead automatically escape characters entered by the user so that the value is always a valid absolute URL (even if that isn't the actual value seen and edited by the user in the interface).
"""
http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#url-state

Here as in several other cases, it might be a good idea to suggest some ways browsers might want to convert user input to a URL.  But the philosophy so far has been that we don't want to *require* that browsers do specific transformations on user input, if those transformations aren't visible to script.
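
As a rough illustration of one such conversion (a sketch only, not something the spec prescribes): the hypothetical helper toAbsoluteUrl below assumes "http" as the default scheme, treats input as already absolute only when it begins with "scheme://", and relies on the standard URL constructor for parsing. The helper name, the default scheme, and the scheme check are all assumptions made for this example.

```javascript
// Hypothetical helper: not part of any browser API or of the HTML spec.
// Turns what the user typed into an absolute URL string, or null if it
// cannot be parsed even after a default scheme is added.
function toAbsoluteUrl(typed, defaultScheme = "http") {
  const trimmed = typed.trim();
  // Simplification: only input starting with "scheme://" counts as absolute.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : `${defaultScheme}://${trimmed}`;
  try {
    // The URL constructor throws a TypeError on unparsable input.
    return new URL(candidate).href;
  } catch {
    return null;
  }
}

console.log(toAbsoluteUrl("example.com"));           // http://example.com/
console.log(toAbsoluteUrl("https://example.com/a")); // https://example.com/a
```

Under these assumptions the user can type "example.com" while scripts and the server only ever see "http://example.com/".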

Bug 11579 is a similar idea.
Comment 2 Marat Tanalin | tanalin.com 2011-07-07 17:34:36 UTC
(In reply to comment #1)

> it might be a good idea to suggest some ways
> browsers might want to convert user input to a URL.

The idea of my report is that <input type="url"> should _accept_ URLs without a protocol -- just like <input type="text"> -- and send the URL exactly as it was typed -- _without_ any transformations.
Comment 3 Aryeh Gregor 2011-07-07 21:37:20 UTC
That's a bad idea.  It makes it harder for authors to deal with the URL -- it will no longer resolve using any normal means.  The format of the URL submitted to the server should be consistent, so that it's as easy to use as possible.
Comment 4 Marat Tanalin | tanalin.com 2011-07-07 23:42:14 UTC
(In reply to comment #3)
> That's a bad idea.  It makes it harder for authors to deal with the URL -- it
> will no longer resolve using any normal means.  The format of the URL submitted
> to the server should be consistent, so that it's as easy to use as possible.

We all know that client-side validation is solely for the user, not for the developer. Client-side validation just makes the user experience better by providing fast validation-error messages without the need to reload the page and wait for the server's response; such validation does not ensure that the data the server receives is actually correct. Data coming from the browser should be revalidated on the server side anyway, in line with good security practice.

To make <input type="url"> _usable_ for the user, there should be an _explicit_ way (understandable by user agents) to make specifying the protocol optional. This is necessary because most URLs typed into form fields on the internet are HTTP URLs, and forcing the user to always specify the protocol explicitly is redundant, decreases usability, and forces the developer to use the old <input type="text"> instead.

A possible solution is to make the protocol in <input type="url"> optional by default. Another solution is to provide attributes for setting options specific to the input's type.

For example, in this case, this may be a boolean option attribute "type-protocol-required":

<input type="url" type-protocol-required>

which requires the user to specify the protocol (if it's optional by default),

or "type-protocol-optional":
<input type="url" type-protocol-optional>

which makes protocol optional (if it's required by default).

"type-" here could be a common prefix (sort of namespace like "data-") for attributes intended to store specific input type options.

A relatively acceptable alternative is to add a new, separate input type: "http-url" or "httpurl":
<input type="http-url">

where the URL value can only use the "http://" protocol -- either explicitly entered or omitted.
Comment 5 Jonas Sicking (Not reading bugmail) 2011-07-08 00:08:09 UTC
What problem would that solve? Compared to a proper implementation of
<input type="url"> which didn't require the user to type "http://"?
Comment 6 Marat Tanalin | tanalin.com 2011-07-08 22:57:29 UTC
(In reply to comment #5)
> What problem would that solve? Compared to a proper implementation of
> <input type="url"> which didn't require the user to type "http://"?

What do you mean by "that"? This bug is exactly about <input type="url"> not requiring the user to type "http://".
Comment 7 Jonas Sicking (Not reading bugmail) 2011-07-08 23:43:20 UTC
Sorry, let me try to phrase it more clearly:

The problem you have raised is that it's a pain to ask users to add "http://" when entering URLs.

There are two solutions suggested in this bug:

1. Fix browsers such that when they see markup like <input type=url> they provide UI that allows the user to enter a string which does not start with "http://" but still conform to the current HTML5 drafts. I.e. HTMLInputElement.value and the submitted string still return values that start with "http://" and thus are valid URIs.

2. What you suggest in comment 4.


What advantages does solution 2 have over solution 1?

The advantage that I can see with solution 1 is that it requires less of the page author.
Comment 8 Marat Tanalin | tanalin.com 2011-07-09 00:06:22 UTC
(In reply to comment #7)
Comment 4 contains several possible solutions. One characteristic they have in common is that the user should not be forced to type "http://"; this protocol should be assumed by default.

Additionally, as a web developer I would prefer browsers to submit exactly the string typed by the user anyway (i.e., without transformations such as automatically adding "http://")--just to have full control of input data, though this is a secondary thing.

The primary thing this bug is about is usability for the user--he should be free not to type "http://", since this protocol is usually assumed by default.
Comment 9 Marat Tanalin | tanalin.com 2011-07-09 00:19:20 UTC
As far as I know, currently only Opera _partially_ accepts URLs without a protocol, by adding "http://" when an <input type="url"> with an incomplete URL is _blurred_.

Other browsers (at least Firefox and Chrome) do not allow URLs without a protocol in <input type="url"> at all.

This situation is very likely caused by the current ambiguity in the HTML spec, and this ambiguity should be resolved.
Comment 10 Aryeh Gregor 2011-07-10 20:09:59 UTC
(In reply to comment #8)
> Comment 4 contains several possible solutions. One characteristic they have in
> common is that the user should not be forced to type "http://"; this protocol
> should be assumed by default.

So does Jonas' solution 1.

> Additionally, as a web developer I would prefer browsers to submit exactly the
> string typed by the user anyway (i.e., without transformations such as
> automatically adding "http://")--just to have full control of input data,
> though this is a secondary thing.

Why do you want this?  In what specific situation would you prefer to have the exact thing the user typed, instead of a cleaned-up version?

> The primary thing this bug is about is usability for the user--he should be
> free not to type "http://", since this protocol is usually assumed by default.

So Jonas' solution 1 is okay with you?
Comment 11 Julian Reschke 2011-07-10 20:20:11 UTC
(In reply to comment #10)
> ...
> > Additionally, as a web developer I would prefer browsers to submit exactly the
> > string typed by the user anyway (i.e., without transformations such as
> > automatically adding "http://")--just to have full control of input data,
> > though this is a secondary thing.
> 
> Why do you want this?  In what specific situation would you prefer to have the
> exact thing the user typed, instead of a cleaned-up version?
> ...

One reason that comes to mind is showing accurate diagnostics to the user.
Comment 12 Marat Tanalin | tanalin.com 2011-07-10 20:34:20 UTC
(In reply to comment #10)
> > Additionally, as a web developer I would prefer browsers to submit exactly the
> > string typed by the user anyway (i.e., without transformations such as
> > automatically adding "http://")--just to have full control of input data,
> > though this is a secondary thing.
> 
> Why do you want this?  In what specific situation would you prefer to have the
> exact thing the user typed, instead of a cleaned-up version?

Quoting myself: "To have full control of input data". In development, full control is always preferable over partial control when something is unknown. Full control provides more flexibility regardless of a specific case.

> > The primary thing this bug is about is usability for the user--he should be
> > free not to type "http://", since this protocol is usually assumed by default.
> 
> So Jonas' solution 1 is okay with you?

Yes, this is the main idea/purpose of the bug. Though, as a web developer, I would prefer to have full control over input data.
Comment 13 Aryeh Gregor 2011-07-10 20:41:58 UTC
(In reply to comment #11)
> One reason that comes to mind is showing accurate diagnostics to the user.

More specifically?

(In reply to comment #12)
> Quoting myself: "To have full control of input data". In development, full
> control is always preferable over partial control when something is unknown.
> Full control provides more flexibility regardless of a specific case.

<input type=text> gives full control.  Why do you want to use <input type=url>, if you want it to be able to contain things that aren't URLs?  Something without a protocol is not a URL, and will not work in practically any scenario where a URL is needed -- e.g., <a href> or other types of links embedded in content.  The only place where it will usually behave like a URL is if you type it into a navigation bar.
Comment 14 Marat Tanalin | tanalin.com 2011-07-10 20:44:05 UTC
(In reply to comment #11)
> > Why do you want this?  In what specific situation would you prefer to have the
> > exact thing the user typed, instead of a cleaned-up version?
> > ...
> 
> One reason that comes to mind is showing accurate diagnostics to the user.

Yes, for example, a web developer may show a dynamic JS tip like "You have not entered a protocol, so "http://" is currently assumed".
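
A minimal sketch of such a tip, assuming the script can read the string exactly as the user typed it. protocolHint is a hypothetical name, and the "scheme://" check is a simplification chosen for this example:

```javascript
// Hypothetical helper: returns the tip text when the typed value has no
// explicit protocol, or null when no tip is needed.
function protocolHint(typed) {
  const trimmed = typed.trim();
  if (trimmed === "") return null; // nothing typed yet, nothing to explain
  // Simplification: a protocol counts as present only in the form "scheme://".
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  return hasScheme
    ? null
    : 'You have not entered a protocol, so "http://" is currently assumed.';
}

console.log(protocolHint("example.com"));         // prints the tip text
console.log(protocolHint("https://example.com")); // null
```

If the browser has already rewritten the value before the script reads it, the check always sees a protocol and the tip never appears.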

A web developer should know exactly what was entered into the form field, simply for flexibility, without any disservice from the browser such as unauthorized transformation of user input.
Comment 15 Julian Reschke 2011-07-10 20:51:42 UTC
(In reply to comment #13)
> (In reply to comment #11)
> > One reason that comes to mind is showing accurate diagnostics to the user.
> 
> More specifically?

If you do not know what the user typed, you can't tell him/her what was wrong.

(Yes, that's a general problem with client-side validation when what the user types is not what's being sent to the server)

> (In reply to comment #12)
> > Quoting myself: "To have full control of input data". In development, full
> > control is always preferable over partial control when something is unknown.
> > Full control provides more flexibility regardless of a specific case.
> 
> <input type=text> gives full control.  Why do you want to use <input type=url>,
> if you want it to be able to contain things that aren't URLs?  Something
> without a protocol is not a URL, and will not work in practically any scenario
> where a URL is needed -- e.g., <a href> or other types of links embedded in
> content.  The only place where it will usually behave like a URL is if you type
> it into a navigation bar.

Not true. <a href> takes a reference, not what the HTML spec calls a "URL". So, yes, it will absolutely work with something like "foo.html", and thus this input type doesn't help at all for inserting hrefs.
Comment 16 Marat Tanalin | tanalin.com 2011-07-10 21:01:42 UTC
(In reply to comment #13)
> (In reply to comment #11)
> > One reason that comes to mind is showing accurate diagnostics to the user.
> 
> More specifically?

See comment 14.

> (In reply to comment #12)
> > Quoting myself: "To have full control of input data". In development, full
> > control is always preferable over partial control when something is unknown.
> > Full control provides more flexibility regardless of a specific case.
> 
> <input type=text> gives full control.

That's why I'm currently ignoring <input type="url"> completely.

> Why do you want to use <input type=url>,
> if you want it to be able to contain things that aren't URLs?

There is a big difference between an arbitrary string and a URL whose protocol is merely omitted and assumed to be the most widely used one, "http://". HTTP is the default protocol in most cases (for example, most websites use HTTP, as in http://example.com/, not HTTPS or, say, FTP).

> The only place where it will usually behave like a URL is if you type
> it into a navigation bar.

A URL entered by the user does not have to _behave_ like a URL. It is just an input string that conforms to a certain format.

By the way, to make it easier for client-side scripts to get the full URL (with the protocol included regardless of whether the user entered it), there could be a special JavaScript method on the <input type="url"> element--for example, getNormalizedUrl() or getFullUrl() (this is quite different from, and better than, the browser transforming input.value itself).
Comment 17 Marat Tanalin | tanalin.com 2011-07-11 01:14:11 UTC
To summarize just in case:

1. The protocol in <input type="url"> should be optional. E.g., "example.com" should be considered a valid value; the user should not be forced to type "http://example.com". This behavior could be controlled by a boolean attribute such as type-protocol-required or type-protocol-optional (see comment 4). (type-protocol-optional may be more backwards-compatible with existing HTML5 implementations, since most of them require the protocol to be typed explicitly.)

2. It is desirable (though not required) that the browser submits (and returns to JavaScript) the exact string value as typed by the user, as this gives the web developer more flexibility.

3. If the browser autocorrects the URL before submission (by adding the protocol automatically when the user has omitted it), then the default protocol should be "http://".
Comment 18 Marat Tanalin | tanalin.com 2011-07-11 01:24:18 UTC
And even more briefly:
compared to current implementations, it would be enough to just allow the protocol in <input type="url"> to be omitted, without any transformation (such as automatically adding "http://" before the form is submitted) of the string value typed by the user.
Comment 19 Aryeh Gregor 2011-07-11 17:13:50 UTC
(In reply to comment #14)
> Yes, for example, a web developer may show a dynamic JS tip like "You have not
> entered a protocol, so "http://" is currently assumed".

Why would you want to show the user that?  Are there existing applications that show such tooltips, or other evidence of author demand?  In what specific real-world case would such a tooltip help the user?

> A web developer should know exactly what was entered into the form field,
> simply for flexibility, without any disservice from the browser such as
> unauthorized transformation of user input.

There's always a tradeoff between flexibility and utility.  If we make a feature handle more use-cases, it will become less suited to each use-case it does handle.  <input type=url> is meant to be useful for a limited range of specific use-cases, not a general-purpose tool for handling any possible kind of URL input for any conceivable purpose.  It's meant to handle 80% of use-cases well, not all use-cases passably.  If you want complete flexibility, again, use <input type=text>.

So requests that input types should be more flexible on general principle miss the point.  You need to give very specific real-world use-cases if you expect to get changes made.  "Real-world" here means not just conceivable or theoretical, it means use-cases where it's clear (preferably provable) that a significant percentage of authors would want the functionality.

(In reply to comment #15)
> (In reply to comment #13)
> > More specifically?
> 
> If you do not know what the user typed, you can't tell him/her what was wrong.

That's not "specific".  I'm asking for specific real-world use-cases.  That means preferably actual existing websites that accept URL input and would be able to switch to <input type=url> except for this problem.  Failing that, you need to at least sketch a specific scenario and argue convincingly that it's likely to be common, even if you can't find actual occurrences.

> Not true. <a href> takes a reference, not what the HTML spec calls a "URL".
> So, yes, it will absolutely work with something like "foo.html", and thus this
> input type doesn't help at all for inserting hrefs.

<input type=url> is of course not useful for the general case of inserting <a href=>.  It's intended for typical cases where you're supposed to enter a URL in a form, like "Home page" in a profile.  The user might enter "example.com", expecting it to work the same as "http://example.com".

If the browser only ever submits a valid absolute URL, the author can just output it into <a href=> without worrying about anything except XSS.  If the browser submits "example.com" when the user enters that, you don't want to output <a href=example.com>, since it will have a totally incorrect effect.  Thus you'll need to add URL manipulation logic into your application, which is the kind of complexity these input types are supposed to remove.
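
To illustrate the "totally incorrect effect" (a sketch using the standard URL constructor; the page address http://forum.example/users/42 is made up for this example): a scheme-less href is resolved relative to the page that contains the link, not treated as a host name.

```javascript
// Assumed address of the page that outputs the link; purely illustrative.
const pageUrl = "http://forum.example/users/42";

// Resolve an href value the way a link on that page would be resolved.
function resolveHref(href) {
  return new URL(href, pageUrl).href;
}

console.log(resolveHref("example.com"));        // http://forum.example/users/example.com
console.log(resolveHref("http://example.com")); // http://example.com/
```

The first link points back into the author's own site, which is why an application receiving "example.com" would need its own URL-fixing logic before output.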

(Of course, you still have to deal with untransformed user input for the near term anyway, when old browsers are still around.  But we should be planning for the future here.  Eventually, authors should be able to escape input to avoid XSS and not have to worry about other invalidity on the server side.  "Eventually" could be soon in the presence of JS shims.)
Comment 20 Marat Tanalin | tanalin.com 2011-07-11 19:23:50 UTC
(In reply to comment #19)
> > Yes, for example, a web developer may show a dynamic JS tip like "You have not
> > entered a protocol, so "http://" is currently assumed".
> 
> Why would you want to show the user that?  Are there existing applications that
> show such tooltips, or other evidence of author demand?  In what specific
> real-world case would such a tooltip help the user?

For example, a URL field in a web interface (such as the "URL" field in Bugzilla's bug-reporting form) may accept non-HTTP URLs ("http://example.com", "https://example.com", and "ftp://example.com" are all valid), while HTTP is the default protocol in that interface (since it is the most often used) and may therefore be omitted (the URL can be just "example.com"). Not all users distinguish between "https://example.com" and "http://example.com", so a usable interface is quite likely to include a tip that clarifies the difference.

> > A web developer should know exactly what was entered into the form field,
> > simply for flexibility, without any disservice from the browser such as
> > unauthorized transformation of user input.
> 
> If you want complete flexibility, again, use <input type=text>.

When _complete_ flexibility is needed, <input type="text"> is of course the most suitable. But if we just need to free the user from typing a redundant protocol (while still treating the value as a URL, and keeping native URL validation in particular), then the _semantic_ <input type="url"> element (which semantically _means_ that the string is a URL) is more suitable.

All we need here is to make the protocol optional. No other changes or additions to the browser's behavior and/or UI are needed. It is very simple to describe in the spec, and very easy to implement.

No form field type has its value transformed by the browser before submission, and there is no need for <input type="url"> to be different.

But again, even if browsers add "http://" when the protocol is omitted from user input, we could at least begin using <input type="url"> in the real world without harming usability. Whether browsers add "http://" before submission or not is quite a secondary thing.

Additionally, I personally would prefer that, if browsers do add "http://" automatically, it be added only to the _data_ before it is sent, _not_ inserted into the field itself immediately after, say, the field loses focus.

> If the browser only ever submits a valid absolute URL, the author can just
> output it into <a href=> without worrying about anything except XSS.

No, the author (well, a _good_ author) cannot. Browser validation does not guarantee at all that the data sent to the server is actually valid (and does not prevent XSS in particular) -- simply because an attacker can send invalid data directly to the server without using a browser at all. Input data should be validated on the server side ANYWAY.

The purpose of browser validation is ONLY to make the USER experience better by providing _fast_ validation messages without a page reload. Browser validation is NOT a security measure at all; to consider otherwise is a mistake.
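
A minimal sketch of such a server-side re-check, assuming a JavaScript server and assuming that only http and https URLs are acceptable (both are assumptions made for illustration; isAcceptableSubmittedUrl is a hypothetical name):

```javascript
// Assumed policy for this sketch: accept only these URL schemes.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// Hypothetical server-side check applied to the raw submitted field value,
// regardless of what any browser claims to have validated.
function isAcceptableSubmittedUrl(submitted) {
  if (typeof submitted !== "string") return false;
  let parsed;
  try {
    parsed = new URL(submitted); // throws if the value is not an absolute URL
  } catch {
    return false;
  }
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}

console.log(isAcceptableSubmittedUrl("http://example.com"));  // true
console.log(isAcceptableSubmittedUrl("example.com"));         // false
console.log(isAcceptableSubmittedUrl("javascript:alert(1)")); // false
```

A check like this only decides whether to accept the value; the value still has to be escaped when it is written into a page.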
Comment 21 Michael[tm] Smith 2011-08-04 05:36:08 UTC
mass-move component to LC1
Comment 22 Ian 'Hixie' Hickson 2011-08-17 04:01:53 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: 

<input type="url"> can already accept URLs with protocol omitted, it's up to the UA.

A UA can in fact use whatever UI it wants here. It could have a UI consisting of a large full-screen selection of the user's three most-visited sites, and only allow the user to select one of those three sites' URLs, without the user ever seeing the URL, only ever seeing screenshots of those sites. It could require that the site dictate the URL character by character. It could prevent the user from giving http:// URLs altogether, always forcing the domain and path given by the user to be prefixed by https://. It could provide a complicated multi-segment editor where you get to select the scheme by drop-down, then type in the domain, then select the port using a spinner control. It could allow you to select the URL only by asking you to type in keywords which it then uses to perform Bing searches from which it uses the URL of the first result. It could require you to mime the description of the page whose URL you want to give. It could display pretty icons for the scheme and not show it at all.

If you want something different -- for example, if you want the browser to just send to the server the exact string that the user typed -- then you should not use type=url, you should use type=text.

Note that nothing requires that the UA show to the user that it is automatically prefixing an incomplete URL with "http://". Indeed, nothing requires that the UA show the URL to the user at all. The UA could just always show "http://foo.example.com/" and never show the user's input to the user, instead requiring the user to type blindly.