Currently, the HTML spec (http://dev.w3.org/html5/spec/Overview.html#text-rendered-in-native-user-interfaces) states that "text from elements (either attribute values or the contents of elements) is expected to be rendered in a manner that honors the directionality of the element from which the text was obtained." While this is usually what one wants, there are cases where it does not suit the placeholder attribute.
We had a similar problem with the title attribute in https://www.w3.org/Bugs/Public/show_bug.cgi?id=10818: the title's value sometimes needs to have one direction while the element needs another. With the title attribute, however, we at least had a workaround: wrap the element in a span, move the title to the span, and set the dir attribute on both elements as each needs. This does not work for placeholder because placeholder does not work on a <span>.
Two possible solutions to this problem are:
1. define a placeholderdir attribute for <input>.
2. always display the placeholder as if it had dir=auto.
The second possibility is not perfect, but at least setting the placeholder's direction explicitly is more easily done by prefixing it with an LRM (&lrm;) or RLM (&rlm;) than by wrapping it in LRE/RLE and PDF characters.
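A rough sketch of how the current rule and option 2 would differ. It assumes a simplified classifier: only ASCII letters, Hebrew letters (U+05D0 to U+05EA), Arabic letters (U+0620 to U+064A), and the LRM/RLM marks count as strong characters, whereas a real implementation would consult the full Unicode Bidi_Class property. All function names are hypothetical. Under the current text the placeholder takes the element's directionality. Under option 2 it would take the direction of its own first strong character, which is why a single leading LRM or RLM would be enough to pin it down.

```typescript
type Direction = "ltr" | "rtl";

// Simplified strong-character test (assumption: a real implementation
// would look up the Unicode Bidi_Class of every code point).
function strongDirection(ch: string): Direction | null {
  const cp = ch.codePointAt(0)!;
  if (cp === 0x200e) return "ltr"; // LRM has Bidi_Class L
  if (cp === 0x200f) return "rtl"; // RLM has Bidi_Class R
  if ((cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a)) return "ltr";
  if (cp >= 0x05d0 && cp <= 0x05ea) return "rtl"; // Hebrew letters
  if (cp >= 0x0620 && cp <= 0x064a) return "rtl"; // Arabic letters
  return null; // digits, spaces, punctuation are not strong
}

// First-strong-character heuristic, as used for dir=auto; defaults to ltr.
function firstStrongDirection(text: string): Direction {
  for (const ch of text) {
    const dir = strongDirection(ch);
    if (dir !== null) return dir;
  }
  return "ltr";
}

// Current rule (simplified, ignores inheritance from the parent): the
// placeholder uses the element's directionality. With dir=auto that is
// computed from the value, so an empty input always ends up ltr.
function placeholderDirCurrent(dirAttr: Direction | "auto", value: string): Direction {
  return dirAttr === "auto" ? firstStrongDirection(value) : dirAttr;
}

// Option 2: lay out the placeholder as if it had dir=auto.
function placeholderDirOption2(placeholder: string): Direction {
  return firstStrongDirection(placeholder);
}
```

With these definitions, an empty `dir="auto"` input gives `"ltr"` under the current rule no matter what the placeholder says, while option 2 gives `"rtl"` for a Hebrew placeholder such as `"\u05E9\u05DD foo"`, even one starting with digits like `"123 \u05E9\u05DD"`, since digits are not strong. An author who wants the opposite result prefixes `"\u200E"` or `"\u200F"`.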
Why would it be garbled? dir=rtl isn't an override, just an embedding. The most it would do is change the positioning of punctuation (easily fixed with explicit embedding information) or the alignment.
I guess I don't mind if we make it dir=auto, but I really find this allergy to Unicode bidi formatting characters to be getting out of control. This is an extreme edge case; we shouldn't add an attribute for it.
> Why would it be garbled? [...] The most
> it would do is change the positioning of punctuation
Well, first of all, misplaced punctuation is already sufficiently annoying.
But it can certainly get worse. For example, let's say that I have a site named "foo", with my own set of user accounts. The account names are limited to Latin letters, numbers, periods, underscores, and dashes. I support user interfaces in several languages, including RTL ones. Since "foo" is my brand, it remains "foo" in all locales.
Where the English UI has an <input type="text" placeholder="your foo username">, an RTL one has it as <input type="text" dir="ltr" placeholder="YOUR foo USERNAME">. (I am using the convention of uppercase Latin for RTL characters here to make this example intelligible to all readers.)
Why did I make it dir="ltr"? So that a username like john.doe will not go through the stage of looking like ".john" while being typed, with the caret jumping around and/or being displayed in strange places.
To be intelligible, my placeholder has to be displayed RTL, as:
EMANRESU foo RUOY
Instead, because of the dir=ltr, it is displayed in LTR, as
RUOY foo EMANRESU
This is as intelligible as "username OOF your" would be in English.
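The reordering in this example can be reproduced with a toy version of the Unicode Bidirectional Algorithm. This sketch is an illustration, not a conforming implementation: it follows the thread's convention (uppercase Latin stands in for RTL letters), treats lowercase letters as LTR and everything else as neutral, and leaves out explicit embeddings, the number rules, and mirroring. `toVisual` is a hypothetical name. It resolves each neutral to the direction of its strong neighbours when they agree and to the paragraph direction otherwise, assigns implicit levels, and then reverses runs from the highest level down to level 1 (rule L2).

```typescript
// Toy bidi model: uppercase Latin letters stand in for RTL letters.
// Not modeled: explicit embeddings, numbers (rules W1-W7), mirroring.
type BidiClass = "L" | "R" | "N";

function classify(ch: string): BidiClass {
  if (ch >= "A" && ch <= "Z") return "R"; // stand-in for RTL letters
  if (ch >= "a" && ch <= "z") return "L";
  return "N"; // spaces and punctuation (digits would need extra rules)
}

function toVisual(text: string, base: "ltr" | "rtl"): string {
  const chars = Array.from(text);
  const types = chars.map(classify);
  const baseClass: BidiClass = base === "ltr" ? "L" : "R";

  // Nearest strong type before (step -1) or after (step +1) position `from`;
  // the paragraph edges count as the base direction.
  const nearestStrong = (from: number, step: number): BidiClass => {
    for (let j = from + step; j >= 0 && j < types.length; j += step) {
      if (types[j] !== "N") return types[j];
    }
    return baseClass;
  };

  // Neutral resolution (N1/N2): agreeing neighbours win, else base direction.
  const resolved = types.map((t, i): BidiClass => {
    if (t !== "N") return t;
    const before = nearestStrong(i, -1);
    const after = nearestStrong(i, 1);
    return before === after ? before : baseClass;
  });

  // Implicit levels (I1/I2): an LTR paragraph is level 0, an RTL one level 1.
  const levels = resolved.map((t) =>
    base === "ltr" ? (t === "R" ? 1 : 0) : t === "L" ? 2 : 1
  );

  // Reordering (L2): from the highest level down to 1, reverse every
  // maximal run of characters at that level or higher.
  const maxLevel = levels.reduce((m, l) => Math.max(m, l), 0);
  for (let level = maxLevel; level >= 1; level--) {
    let i = 0;
    while (i < chars.length) {
      if (levels[i] < level) {
        i++;
        continue;
      }
      let j = i;
      while (j < chars.length && levels[j] >= level) j++;
      chars.splice(i, j - i, ...chars.slice(i, j).reverse());
      levels.splice(i, j - i, ...levels.slice(i, j).reverse());
      i = j;
    }
  }
  return chars.join("");
}
```

`toVisual("YOUR foo USERNAME", "rtl")` yields `EMANRESU foo RUOY`, while the LTR base imposed by `dir="ltr"` yields `RUOY foo EMANRESU`. A trailing colon shows the punctuation problem as well: with the LTR base it stays at the far right (`RUOY foo EMANRESU:`), which an RTL reader sees as the start of the sentence.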
So, I try to fix it by making my input dir="auto". It does not help, since the spec says that the value of any attribute has to be displayed in the element's *directionality*. This is either "ltr" or "rtl", never "auto". And for an empty <input> (which it has to be for the placeholder to be displayed), the dir=auto evaluates to "ltr" directionality.
> This is an extreme edge case
Not at all. My guess is that a very significant percentage of inputs are for data that has to be LTR, such as numeric data (e.g. phone number, age, item count) and always-LTR text data like the username above. In a well-designed RTL page, these should all be marked with dir=ltr. And once dir=auto becomes available in more browsers, most of the rest should be marked with dir=auto. In either case, the placeholder will be displayed LTR, and thus will be garbled in an RTL page if it (besides containing some RTL words):
- starts with a number, or
- ends with punctuation, or
- contains an LTR word (e.g. a brand name)
> easily fixed with explicit embedding information
There is nothing easy about using LRE/RLE + PDF for the average human being. By and large, users do not even know that they exist. They cannot generate them on their keyboards, and even if they could, their invisibility makes it a challenge to edit the placeholder later. And if they type them as entities, the entities wind up discombobulated. For example, here is what "‪hello‬" looks like once I substitute actual RTL characters for "hello":
Having said all this and hopefully shown that the problem is real, I must admit that I do not know of a solution that really makes me happy.
This bug was cloned to create bug 17814 as part of operation convergence.
Silvia: assigning to you since there actually were changes made to the spec (added examples) for this.
Aharon: can you indicate if you are happy with the examples?
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the Editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the Tracker Issue; or you may create a Tracker Issue yourself, if you are able to do so. For more details, see this
Change Description: applied patch
Rationale: adopted resolution by WHATWG
(In reply to comment #5)
The examples could be made a bit more clear by adding a dir="ltr" to each of the <input> tags (as would be typically done in an RTL page). Otherwise, they are fine.
I am still unhappy about having to use LRE/RLE/PDF to indicate the overall directionality of a placeholder. An attribsdir attribute as proposed by https://www.w3.org/Bugs/Public/show_bug.cgi?id=16160 (a.k.a. https://www.w3.org/Bugs/Public/show_bug.cgi?id=17829) would be a better solution to this problem.