This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Comment from the i18n review of: http://dev.w3.org/html5/spec/ Comment 3 At http://www.w3.org/International/reviews/html5-bidi/ Editorial/substantive: S Tracked by: AL Location in reviewed document: undefined [http://dev.w3.org/html5/spec/spec.html#contents] Comment: This is part of the proposals made by the "Additional Requirements for Bidi in HTML" W3C First Public Working Draft. For a full description of the use cases, please see <http://www.w3.org/International/docs/html-bidi-requirements/#reporting-direction>. Here is the proposal made there: Support a new attribute, tentatively named submitdir, in <input> and <textarea>. Its presence will specify that when the element is a "successful control" (i.e. its value is to be included in the form submission), then the value of the element's computed direction (at submission time) is also to be included in the submission, as an additional "successful control". (Reminder: the computed direction is the bottom-line "ltr" or "rtl" being used to display the element; it never takes on any other value. It is available as the value of the CSS direction property for the element.) The additional control's name is to be the element's control name suffixed with "_dir". If the form contains other control(s) with the same control name as the additional control, the additional control will still be submitted alongside them; it is up to the application to sort out what the different control values mean. The value of the submitdir attribute is immaterial; it would normally be an empty string (when the attribute is present without a value) or "submitdir". 
For example, let's assume that a dir attribute value to indicate direction estimation is "auto", and an RTL page contains the following form: <form action="foo" method="get"> <input type="text" name="mytest" dir="auto" submitdir /> </form> Then, if the user typed in the LTR value "hello", the submission URL would be "foo?mytest=hello&mytest_dir=ltr".
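As an illustration of what a server could do with the extra field, here is a minimal Python sketch. It assumes the submission URL from the example above; the function name `render_submitted_value` and the choice of a `<span>` wrapper are hypothetical and not part of the proposal.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def render_submitted_value(url, control):
    """Return an HTML snippet showing a submitted value in its reported direction."""
    fields = parse_qs(urlsplit(url).query)
    value = fields.get(control, [""])[0]
    # Per the proposal, the direction arrives under the control name plus "_dir".
    direction = fields.get(control + "_dir", [""])[0]
    if direction not in ("ltr", "rtl"):
        # Direction missing or unexpected: emit no dir attribute at all.
        return "<span>{}</span>".format(escape(value))
    return '<span dir="{}">{}</span>'.format(direction, escape(value))


snippet = render_submitted_value("foo?mytest=hello&mytest_dir=ltr", "mytest")
print(snippet)
```

If the browser does not support the attribute, the "_dir" field is simply absent and the sketch falls back to markup without a dir attribute.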
Why is this needed? I.e. What problem does it solve? (keeping in mind I don't know much about bidi). Firefox used to have various preferences which controlled how rtl text should be submitted, but they were largely unused by users and over time stopped working as originally intended. Eventually they were completely removed since it was unclear if they were used at all. So far no one has complained about their removal so it does indeed seem like they were unused. However these prefs were very different from what is proposed here, so it might be that the suggested feature is much more useful. Should the submitted value be affected by CSS rules that affect the direction of the control? What happens if the control itself has dir=ltr but only contains rtl text? What if it contains both rtl and ltr text?
(In reply to comment #1) > Why is this needed? I.e. What problem does it solve? (keeping in mind I don't > know much about bidi). The rationale is explained here: <http://www.w3.org/International/docs/html-bidi-requirements/#reporting-direction> But here is a typical example. Let's say I enter something like "Firefox IS A GOOD WEB BROWSER" (where uppercase text represents text in a RTL language). If the direction of the textbox is ltr, this shows up like this on the screen: Firefox BROWSER WEB GOOD A IS which is wrong. Then, I set my user agent to switch the direction of the text field (using Ctrl+Shift+X in Firefox, for example), which makes the text appear like: BROWSER WEB GOOD A IS Firefox which is what an RTL native speaker would expect. Now, I submit the form, and let's say that the server's job is to just generate some HTML to display the entered value on a web page. If the server has no way of knowing about the direction change that I made while editing, that information is effectively lost, and the resulting HTML will appear as below on the screen. Firefox BROWSER WEB GOOD A IS But with the submitdir attribute, this information will be preserved in the form submission process, and the server can generate the correct HTML code based on that. > Firefox used to have various preferences which controlled how rtl text should > be submitted, but they were largely unused by users and over time stopped > working as originally intended. Eventually they were completely removed since > it was unclear if they were used at all. So far no one has complained about > their removal so it does indeed seem like they were unused. However these prefs > were very different from what is proposed here, so it might be that the > suggested feature is much more useful. Those preferences were actually doing something different: they were used to change the *encoding* of the text between encodings with logical and visual ordering. This is not relevant here. 
> Should the submitted value be affected by CSS rules that affect the direction > of the control? Yes. > What happens if the control itself has dir=ltr but only contains rtl text? Because @dir=ltr maps to |direction: ltr;| in CSS, the submitted direction should be "ltr". Please note that in this proposal, the actual contents of the form control do not affect the submitted direction. > What if it contains both rtl and ltr text? Same as above.
Is this bidi information only needed some of the time? Or does effectively all form submission need this information to correctly process submitted text? If it's always needed, it seems weird that an opt-in through a separate attribute is needed. It'd be great to find a solution which didn't require that. Like, should we always include information if the user has specifically modified the bidi direction? For example by adding a control character in the beginning of the value or some such? (As i understand it there are unicode characters that modify the text direction?)
(In reply to comment #3) > Is this bidi information only needed some of the time? Or does effectively all > form submission need this information to correctly process submitted text? Bidi information is only necessary for certain kinds of processing. It can affect how content is displayed, especially in a context with a different base directionality. But most processing is "direction agnostic". Lack of direction information does not invalidate the data. However, lack of the direction information does mean that the data may not be represented (rendered) correctly later, as with the example of text inclusion into an HTML page following submission. > > If it's always needed, it seems weird that an opt-in through a separate > attribute is needed. It'd be great to find a solution which didn't require > that. > > Like, should we always include information if the user has specifically > modified the bidi direction? For example by adding a control character in the > beginning of the value or some such? (As i understand it there are unicode > characters that modify the text direction?) If the user has modified the field's direction, that's usually a signal that the information is particularly important. The original direction wasn't working for the user. Including the Unicode bidi controls into the data, though, is potentially problematic. The bidi controls are "just characters" and can interfere with, for example, identity matching with a data source. For example, if I say my name is "ABCD", but the browser submits "<control>ABCD", the backing database may not find my record. Stripping and adding controls becomes complicated, since both manually inserted and automatic controls may be involved. Providing an attribute is a better solution than munging the data in my opinion.
It seems to me that this doesn't need to be an attribute, just that the browser generates the direction metadata for any field for which the user has changed direction, without having the author having to opt-in to that behavior. For scripted client-only use cases, a readonly IDL attribute can be added for <input> and <textarea>.
(In reply to comment #5) > It seems to me that this doesn't need to be an attribute, just that the browser > generates the direction metadata for any field for which the user has changed > direction, without having the author having to opt-in to that behavior. That's not a good choice, because of possible cases like this: <input type="hidden" name="foo_dir" value="bar"> <input type="text" name="foo"> This form will submit "foo_dir=bar" if the browser does not support this proposal, but if it does and the user submits the form with the direction of the second input changed, the form will submit "foo_dir=rtl" or "foo_dir=ltr".
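To make the collision concrete, here is a small Python sketch. The query string is hypothetical: it assumes a supporting browser appends the direction field after the author's own fields, which the thread does not specify.

```python
from urllib.parse import parse_qsl

# Hypothetical submission if direction were reported without any opt-in:
# the author's hidden "foo_dir" field and the browser's generated one collide.
query = "foo_dir=bar&foo=hello&foo_dir=rtl"

pairs = parse_qsl(query)
foo_dir_values = [value for name, value in pairs if name == "foo_dir"]
print(foo_dir_values)
```

An existing application that reads only one value for "foo_dir" could see either "bar" or "rtl" depending on how its framework resolves duplicate names.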
I haven't examined the proposal in detail yet, but before I do: do we have experimental implementations or any implementors committed to supporting this? At this stage I'd really rather not add new features without a clear commitment from user agent implementors, given how close we are to LC.
Mozilla is interested in implementing this, but we do not have an experimental implementation yet.
(In reply to comment #6) > (In reply to comment #5) > > It seems to me that this doesn't need to be an attribute, just that the browser > > generates the direction metadata for any field for which the user has changed > > direction, without having the author having to opt-in to that behavior. > > That's not a good choice, because of possible cases like this: > > <input type="hidden" name="foo_dir" value="bar"> > <input type="text" name="foo"> > > This form will submit "foo_dir=bar" if the browser does not support this > proposal, but if it does and the user submits the form with the direction of > the second input changed, the form will submit "foo_dir=rtl" or "foo_dir=ltr". You could make the submitted metadata be something that the author cannot generate. (For instance, using HTTP headers.)
(In reply to comment #9) > (In reply to comment #6) > > (In reply to comment #5) > > > It seems to me that this doesn't need to be an attribute, just that the browser > > > generates the direction metadata for any field for which the user has changed > > > direction, without having the author having to opt-in to that behavior. > > > > That's not a good choice, because of possible cases like this: > > > > <input type="hidden" name="foo_dir" value="bar"> > > <input type="text" name="foo"> > > > > This form will submit "foo_dir=bar" if the browser does not support this > > proposal, but if it does and the user submits the form with the direction of > > the second input changed, the form will submit "foo_dir=rtl" or "foo_dir=ltr". > > You could make the submitted metadata be something that the author cannot > generate. (For instance, using HTTP headers.) True, but that would be really strange, as it would be different to any other type of form data that is usually submitted.
I would actually say the opposite. The directionality information is metadata about the actual form value. Currently no metadata is sent in the form of additional fields. For example filenames are not sent as a separate field, but rather as metadata within the same control.
(In reply to comment #11) > I would actually say the opposite. The directionality information is metadata > about the actual form value. Currently no metadata is sent in the form of > additional fields. For example filenames are not sent as a separate field, but > rather as metadata within the same control. We do, however, submit the click coordinates for input type=image like other form values... That being said, I don't have a strong preference either way, personally.
(In reply to comment #9) > (In reply to comment #6) > > (In reply to comment #5) > > > It seems to me that this doesn't need to be an attribute, just that the browser > > > generates the direction metadata for any field for which the user has changed > > > direction, without having the author having to opt-in to that behavior. > > > > That's not a good choice, because of possible cases like this: > > > > <input type="hidden" name="foo_dir" value="bar"> > > <input type="text" name="foo"> > > > > This form will submit "foo_dir=bar" if the browser does not support this > > proposal, but if it does and the user submits the form with the direction of > > the second input changed, the form will submit "foo_dir=rtl" or "foo_dir=ltr". > > You could make the submitted metadata be something that the author cannot > generate. (For instance, using HTTP headers.) An interesting suggestion. Would anyone care to suggest a specific header and syntax? Wouldn't the HTML spec have to include this to get interoperability?
A drawback to using an HTTP header is that in the case of GET, the metadata doesn't survive if the URL is copied and pasted to another context.
Wouldn't a better way of solving this be to submit the text with Unicode bidi formatting characters? That would mean that the bug gets automatically fixed on any site that today uses UTF-8 as the submission charset.
(In reply to comment #15) > Wouldn't a better way of solving this be to submit the text with Unicode bidi > formatting characters? That would mean that the bug gets automatically fixed on > any site that today uses UTF-8 as the submission charset. That might be ok for natural language, i.e. unformatted text. But input boxes are also used for things that do have some sort of machine-understood syntax, and the software that parses what the user has entered may not support these characters. For this reason, adding formatting characters to the control values is the last thing that I would want to happen.
Can you give an example of a field on a Web site that accepts UTF-8 submissions where if I enter such Unicode bidi formatting characters the processing on the server breaks?
(In reply to comment #17) > Can you give an example of a field on a Web site that accepts UTF-8 submissions > where if I enter such Unicode bidi formatting characters the processing on the > server breaks? Yes, this site. If I insert the LRM character <‎> at the beginning of the username when logging in, it will take it as part of the username and not let me in. I assume the same thing will happen for LRE.
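The username failure described above can be reproduced with a short Python sketch. The account table and the `find_account` helper are hypothetical stand-ins for whatever lookup a real site performs.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible bidi formatting character

# Hypothetical account table keyed by the exact username string.
accounts = {"alice": "account-1"}


def find_account(username):
    """Exact-match lookup, as a typical login handler might do."""
    return accounts.get(username)


plain = find_account("alice")
with_mark = find_account(LRM + "alice")
print(plain, with_mark)
```

The two strings look identical on screen, but the second one is one code point longer, so an exact-match lookup does not find the account.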
Hm, yes, usernames are a good example of something where this wouldn't work. This definitely argues for an opt-in solution. However, it's still not clear to me that having multiple submission fields is a good idea. If the user has opted-in to getting this information, is there any harm at that point in "simply" making the submitted data have appropriate bidi formatting characters?
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html Status: Did Not Understand Request Change Description: no spec change Rationale: see comment 19
(In reply to comment #19) > Hm, yes, usernames are a good example of something where this wouldn't work. > > This definitely argues for an opt-in solution. However, it's still not clear to > me that having multiple submission fields is a good idea. If the user has > opted-in to getting this information, is there any harm at that point in > "simply" making the submitted data have appropriate bidi formatting characters? Yes, there is. 1. The app needs the direction metadata as a separate piece of information. For example, when displaying the data, the direction needs to be indicated via the dir attribute, not formatting characters, in order to comply with existing W3C recommendations that highly discourage the appearance of formatting characters in HTML (except for contexts that do not allow mark-up, e.g. inside the title element). Thus, the app will have to strip away the formatting characters added by the user agent because of submitdir (and store the direction metadata separately). Or, to continue with the example of this site's user names, let's say that the site had submitdir on the username input in the page for defining a new account, so that the user can indicate (in the usual manner) the correct way to display his or her username. The app will then have to strip off the formatting characters because the user should not have to enter the username with the formatting characters every time he or she logs in. The very need of having to extract metadata from the data - and then to strip the metadata out of the data - makes the suggested approach clumsy. 2. Detecting and stripping away the formatting characters would be more difficult than one might think. The correct way to indicate direction in formatting characters is to wrap the data in an LRE or RLE at the beginning and a PDF character at the end. 
However, it is not good enough to simply check whether the first character is LRE or RLE and the last character is PDF, since that would misunderstand (and garble) the string "[LRE]css[PDF] IS MORE FUN THAN [LRE]html[PDF]". 3. There is no way to tell whether the formatting characters were added by the user agent as metadata or entered by the user to be a permanent part of the data. To continue with the example of this site's user names, if it was the user that entered the formatting characters into the username for some strange reason, and intends to enter them at every log-in, stripping them off is not good.
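Point 2 can be sketched in Python. The two helper names are hypothetical, and the scan only tracks nesting depth; it still cannot answer point 3, i.e. who inserted the characters.

```python
LRE, RLE, LRO, RLO, PDF = "\u202a", "\u202b", "\u202d", "\u202e", "\u202c"
OPENERS = {LRE, RLE, LRO, RLO}


def naive_is_wrapped(text):
    """Check only the first and last characters."""
    return len(text) >= 2 and text[0] in (LRE, RLE) and text[-1] == PDF


def is_single_embedding(text):
    """True only if the embedding opened by the first character
    is the one closed by the last character."""
    if not naive_is_wrapped(text):
        return False
    depth = 0
    for index, char in enumerate(text):
        if char in OPENERS:
            depth += 1
        elif char == PDF:
            depth -= 1
            if depth == 0 and index != len(text) - 1:
                # The outer embedding closed before the end of the string.
                return False
    return depth == 0


mixed = LRE + "css" + PDF + " IS MORE FUN THAN " + LRE + "html" + PDF
wrapped = RLE + "BLAH blah BLAH" + PDF
print(naive_is_wrapped(mixed), is_single_embedding(mixed))
print(naive_is_wrapped(wrapped), is_single_embedding(wrapped))
```

The naive check accepts both strings, while the full scan accepts only the second, which is why stripping based on the first and last character alone would garble the first one.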
That all seems to boil down to one reason: that we suggest that you shouldn't use bidi formatting characters. But why do we suggest that? Wouldn't a simpler solution be to just not suggest that?
I observe that 1) This feature is only useful if the application is willing to do significant work to opt in. Using it is not trivial if the information is submitted as a separate piece of info -- you'd have to track that extra info and store it out-of-band somehow, which would be very intrusive. 2) If bug 10821 is fixed, so that JavaScript can reliably tell what the direction of the input is (even accounting for the user manually switching the direction), then this feature could be emulated by JavaScript. This wouldn't take much more work than adapting an application that uses the feature to begin with -- the JavaScript to submit the info (either as invisible characters or out-of-band) is trivial. I therefore suggest that bug 10821 is fixed in some way, so that direction detection can be done from JavaScript. If it turns out that authors use it commonly and consistently enough to warrant a non-scripted method to do this, we can consider that then. In particular, we'll be able to tell empirically whether authors would prefer automatic insertion of control characters, or some out-of-band data, or if they don't use the feature at all. (Although directionality control characters are normally evil, I suspect they're actually the lesser evil here. Storing the direction out-of-band would be much more complicated.)
(Inserting a control character won't work for multi-paragraph input, would it? You'd have to parse the input for paragraph breaks and insert a control character for each paragraph. This is probably not quite trivial, actually.)
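A rough Python sketch of the per-paragraph wrapping follows. It assumes every line break in the value starts a new paragraph and that one direction applies to the whole field; the function name `wrap_paragraphs` is hypothetical.

```python
import re

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
_BREAK = re.compile(r"(\r\n|\r|\n)")


def wrap_paragraphs(text, direction):
    """Wrap each non-empty paragraph in an embedding for the given direction."""
    opener = RLE if direction == "rtl" else LRE
    wrapped = []
    # The capture group makes split() keep the line breaks as separate items.
    for part in _BREAK.split(text):
        if part == "" or _BREAK.fullmatch(part):
            wrapped.append(part)  # keep breaks and empty paragraphs untouched
        else:
            wrapped.append(opener + part + PDF)
    return "".join(wrapped)


result = wrap_paragraphs("one\r\ntwo", "rtl")
print(repr(result))
```

Wrapping is the easy half; undoing it later runs into the detection problems discussed in the earlier comments.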
(In reply to comment #22) > That all seems to boil down to one reason: that we suggest that you shouldn't > use bidi formatting characters. But why do we suggest that? Wouldn't a simpler > solution be to just not suggest that? I think it boils down to more than that, but let me answer you. LRE, RLE, LRO, RLO and PDF are evil in HTML for many reasons, but one of them is that it is impossible to give a reasonable definition of how they should interact with direction specified by mark-up, especially given that the PDF can come in a different element than the opening character, or not at all, without making it an invalid HTML document.
(In reply to comment #23) > I observe that > > 1) This feature is only useful if the application is willing to do significant > work to opt in. Yes. > Using it is not trivial if the information is submitted as a > separate piece of info -- you'd have to track that extra info and store it > out-of-band somehow, which would be very intrusive. It is not intrusive if that's what the app wants to do. And if the app wants to store it by adding control characters to the input value, it can do that too - that's their business. Adding control characters is a lot easier than stripping them off. > > 2) If bug 10821 is fixed, so that JavaScript can reliably tell what the > direction of the input is (even accounting for the user manually switching the > direction), then this feature could be emulated by JavaScript. This wouldn't > take much more work than adapting an application that uses the feature to begin > with -- the JavaScript to submit the info (either as invisible characters or > out-of-band) is trivial. As explained in the proposal, this is only so when the page can use script. What if the page is being sent as HTML-mail, where script is not allowed - but forms are.
(In reply to comment #26) > It is not intrusive if that's what the app wants to do. It's very intrusive to store out-of-band directionality info for each submitted item. Applications normally pass around user input as strings. If suddenly you have to pass them around as strings-plus-direction, you have to change all your functions to recognize this format throughout the code. > And if the app wants to > store it by adding control characters to the input value, it can do that too - > that's their business. Adding control characters is a lot easier than stripping > them off. It's not trivial, though, if the input is multiple paragraphs, right? > As explained in the proposal, this is only so when the page can use script. > What if the page is being sent as HTML-mail, where script is not allowed - but > forms are. We should aim to cover the most common use-cases first. If it's possible using JavaScript, that means it's possible 95% of the time. If authors do this in JavaScript often enough, that's when we should consider a declarative, non-scripted way of doing it. If it turns out that authors don't do it in JavaScript even when they can, then it's not worth adding the feature for. (Do e-mails really allow forms in practice? I can't recall ever seeing a form in an e-mail in my life.)
(In reply to comment #27) > (In reply to comment #26) > It's very intrusive to store out-of-band directionality info for each submitted > item. Applications normally pass around user input as strings. If suddenly > you have to pass them around as strings-plus-direction, you have to change all > your functions to recognize this format throughout the code. Applications won't suddenly have to do anything. If they want to use the explicit direction information, they would turn on submitdir and provide a place to store the direction bit. Most string processing functions are unaffected by the direction and would not have to change. Those that do care would have to get a new parameter. > > > And if the app wants to > > store it by adding control characters to the input value, it can do that too - > > that's their business. Adding control characters is a lot easier than stripping > > them off. > > It's not trivial, though, if the input is multiple paragraphs, right? It is not trivial, but multiple paragraphs are more difficult to deal with whether stripping or adding the control characters. Either way, it's easier to add than to strip. > (Do e-mails really allow forms in practice? I can't recall ever seeing a form > in an e-mail in my life.) Definitely yes. Google spreadsheet forms use them. > We should aim to cover the most common use-cases first. If it's possible using > JavaScript, that means it's possible 95% of the time. If so, why did all the browsers bother to provide the built-in direction selection feature? Applications could always do it in script. > If authors do this in > JavaScript often enough, that's when we should consider a declarative, > non-scripted way of doing it. If it turns out that authors don't do it in > JavaScript even when they can, then it's not worth adding the feature for. The problem is that the round trip takes something like five years. I have an application for it right now.
(In reply to comment #28) > Applications won't suddenly have to do anything. If they want to use the > explicit direction information, they would turn on submitdir and provide a > place to store the direction bit. Sure, but I'm saying it's not really any more effort to emulate it in JavaScript as well, once you're doing that much work. > If so, why did all the browsers bother to provide the built-in direction > selection feature? Applications could always do it in script. If authors want to preserve directionality on forms, and they can do so in script with not much more work than with a dedicated declarative feature, you can expect them to do so. If authors don't want to preserve directionality on forms, nobody can force them to, so it's moot. By contrast, users want to be able to select direction regardless of whether the author wrote any script to do it. Browsers provide the feature because that way it works even if the author doesn't know or care about directionality at all. That's not possible in this case, because the author's application has to explicitly opt in to the directionality info no matter what. > The problem is that the round trip takes something like five years. I have an > application for it right now. The same is true of countless other features that people want to be part of the web platform. They need to be prioritized somehow. One guideline for that is that we don't provide declarative features for things that can be easily emulated in script unless it's a very common or error-prone use-case.
(In reply to comment #25) > I think it boils down to more than that, but let me answer you. LRE, RLE, LRO, > RLO and PDF are evil in HTML for many reasons, but one of them is that it is > impossible to give a reasonable definition of how they should interact with > direction specified by mark-up That's not true, since we in fact define everything in terms of these characters in CSS. It's not only possible, it's literally the only way it is done. It seems like the simplest solution here, especially considering <textarea>s and multiple paragraphs with different directionality, is to have an attribute that, if present, causes the user agent to include the relevant bidi formatting characters in the submission of the control's value. I don't really see how else we could do it... I mean, we could submit a second value that just had a list of character ranges labeled as ltr or rtl, but that would be even harder to manage, as far as I can tell (and easier to implement incorrectly e.g. you could trick a site by sending overlapping ranges).
(In reply to comment #30) > (In reply to comment #25) > > LRE, RLE, LRO, > > RLO and PDF are evil in HTML for many reasons, but one of them is that it is > > impossible to give a reasonable definition of how they should interact with > > direction specified by mark-up > > That's not true, since we in fact define everything in terms of these > characters in CSS. It's not only possible, it's literally the only way it is > done. Just because C++ is implemented in terms of machine language commands does not mean that programmers should be encouraged to insert snippets of machine code into their C++ programs (even if the compiler does support that), or that an IDE, when asked to "create getter/setter", should code up ones in assembler. Here are just some reasons why these formatting characters are like machine code, and should be highly discouraged: 1. It is very easy for these characters (where the PDF is the "closing parenthesis" to the others) to get out of balance and become completely nonsensical, e.g. [PDF][LRE]. Of course, the same can be said for HTML's opening and closing tags, but if the tags are out of balance, the document is invalid, and your authoring tools will help you prevent that from happening. The formatting characters, being just text, do not affect the validity of the document, and you are completely on your own. 2. Similarly, these characters, while being perfectly balanced on their own, can very, very easily become "entangled" between the scopes of the document's tags. For example, what exactly is the browser to make of <span dir=rtl> ... [LRE] ... </span> ... [PDF]? Once again, the same can be said of HTML opening and closing tags, but if you get those wrong, the document is invalid, but the snippet above is perfectly valid HTML. To see just how easily that can happen, consider the case where text containing formatting characters is displayed by an app with added mark-up it adds, e.g. to indicate the search hits in it. 3. 
Speaking of CSS, how exactly should the formatting characters - if encouraged - interact with the direction-dependent CSS, e.g. text-align:start? For example, consider: [RLE]<div style="text-align:start">blah blah</div>[PDF] Should the direction CSS property be rtl for the div? Should it be aligned to the right? What if the [RLE] and [PDF] were inside the div? It's pretty clear to me that (just as is the case today) the answer should be "no" in all cases. However, the fact remains that in many cases, opposite-direction text gathered from the user is best displayed aligned to its start edge. So, to get that, I will still need to make the *div* say dir=rtl, and not leave it up to the text inside the div. And in order to do that after the browser has stuck the formatting characters into text (because the user entering it indicated its direction), the server side of my app will need to parse the text in order to figure out that indeed it is wrapped in formatting characters. And when I say "parse", I really mean parse: while the formatting characters in "[RLE]BLAH blah BLAH[PDF]" might (!) have been inserted by the mechanism you are proposing, the formatting characters in "[RLE]BLAH[PDF] blah [RLE]BLAH[PDF]" definitely were not, and to understand that, the app will need to scan right through the whole string. I think this use case also makes it clear that one really needs the direction data out-of-band. > It seems like the simplest solution here, especially considering <textarea>s > and multiple paragraphs with different directionality, is to have an attribute > that, if present, causes the user agent to include the relevant bidi formatting > characters in the submission of the control's value. I don't really see how > else we could do it... I mean, we could submit a second value that just had a > list of character ranges labeled as ltr or rtl, but that would be even harder > to manage, as far as I can tell (and easier to implement incorrectly e.g. 
you > could trick a site by sending overlapping ranges). As I said, I have no intention of using submitdir to support the use case of the user indicating the direction of individual paragraphs inside a textarea (as opposed to indicating the direction of all the paragraphs in a textarea at once). While there are some plain-text editors, e.g. gedit, that support per-paragraph *direction auto-estimation* (and that is why we want autodirmethod=plaintext), as far as I know none support per-paragraph *user control* over directionality. That functionality has only been done in rich text editors (including browser-based rich-text editors like TinyMCE). So, I don't see why we have to try to figure out how we can make the puny textarea, which is not even a full-featured plain text editor, do it.
(In reply to comment #31) > Here are just some reasons why these formatting characters are like machine > code, and should be highly discouraged: I mean in HTML, and specifically in those parts of HTML where mark-up is allowed. Unfortunately, we have no choice but to use formatting characters in <title> and <option>, as well as attributes like title and alt.
(In reply to comment #31) > The formatting characters, being just text, do not affect the validity of the > document, and you are completely on your own. We could make bogus uses of the formatting characters invalid.
As indicated in bug 10821, the use cases there go beyond those that would be solved by submitdir.
(In reply to comment #33) > (In reply to comment #31) > > The formatting characters, being just text, do not affect the validity of the > > document, and you are completely on your own. > > We could make bogus uses of the formatting characters invalid. You could, and it might help, but this would be a first. I am not aware of HTML treating any other part of text content as potentially making a document invalid. If you think about it, you would actually be treating them as a kind of pseudo-mark-up. In any case, though, this does not address part 3 of my comment, which I think is the most important one.
(In reply to comment #35) > You could, and it might help, but this would be a first. I am not aware of HTML > treating any other part of text content as potentially making a document > invalid. Sure it does: http://validator.nu/?doc=data%3Atext%2Fhtml%2C%3C!doctype+html%3E%3Cmeta+charset%3Dutf-8%3E%3Ctitle%3E%3C%2Ftitle%3E%2500 """ Text must consist of Unicode characters. Text must not contain U+0000 characters. Text must not contain permanently undefined Unicode characters (noncharacters). Text must not contain control characters other than space characters. Extra constraints are placed on what is and what is not allowed in text based on where the text is to be put, as described in the other sections. """ http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#text-1
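As a rough illustration of the constraints quoted above, here is a sketch of a checker. It is an assumption-laden approximation, not the validator's actual logic: the function names are hypothetical, "control characters" is taken to mean Unicode general category Cc, "space characters" is taken to mean tab, LF, FF, CR and space, and surrogates are not examined.

```python
import unicodedata

# Assumed set of HTML "space characters" (tab, LF, FF, CR, space).
HTML_SPACE = {"\t", "\n", "\f", "\r", " "}


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def text_violations(text):
    """Return a list of (index, reason) pairs for characters that break
    the quoted rules. Hypothetical helper for illustration only."""
    problems = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0:
            problems.append((i, "U+0000"))
        elif is_noncharacter(cp):
            problems.append((i, "noncharacter"))
        elif unicodedata.category(ch) == "Cc" and ch not in HTML_SPACE:
            problems.append((i, "control character"))
    return problems
```

Note that the bidi formatting characters (category Cf, not Cc) pass this check, which matches the point made earlier in the thread that they currently do not affect validity.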
(In reply to comment #36) You are right! It would be nice to make such improper use of bidi formatting characters invalid. One way of defining it might be something like "A document containing text that includes the characters LRE (U+202A), RLE (U+202B), LRO (U+202D), RLO (U+202E), or PDF (U+202C), or their corresponding entities, is invalid if it would be invalid with all LRE, RLE, LRO, and RLO characters replaced with a <span> tag and all PDF characters replaced with a </span>." I can file a separate bug if you like. However, as stated above, this would not have much impact on this bug.
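A much simplified sketch of such a check follows. It is an assumption, not the proposed rule itself: it only tests the formatting characters against each other within one run of text (every PDF closes an earlier LRE, RLE, LRO or RLO, and nothing is left open), and it does not model how they interleave with real tags, which the <span> substitution above would also catch. The function name is hypothetical.

```python
# Bidi formatting characters named in the comment above.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def bidi_controls_balanced(text):
    """Return True if every PDF matches an earlier opener and every
    opener is closed, treating openers like <span> and PDF like </span>."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # a PDF with nothing to close, e.g. [PDF][LRE]
            depth -= 1
    return depth == 0
```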
(In reply to comment #31) > 1. It is very easy for these characters (where the PDF is the "closing > parenthesis" to the others) to get out of balance and become completely > nonsensical, e.g. [PDF][LRE]. Of course, the same can be said for HTML's > opening and closing tags, but if the tags are out of balance, the document is > invalid, and your authoring tools will help you prevent that from happening. > The formatting characters, being just text, do not affect the validity of the > document, and you are completely on your own. We should definitely make them affect the validity if it's a concern that people will use them incorrectly and could benefit from validator tools flagging these problems. Please file a bug suggesting this if you think it would help. > 2. Similarly, these characters, while being perfectly balanced on their own, > can very, very easily become "entangled" between the scopes of the document's > tags. For example, what exactly is the browser to make of <span dir=rtl> ... > [LRE] ... </span> ... [PDF]? What should happen is defined by CSS, which defines all of the bidi formatting rules in terms of bidi formatting characters. > 3. Speaking of CSS, how exactly should the formatting characters - if > encouraged - interact with the direction-dependent CSS, e.g. text-align:start? > For example, consider: > > [RLE]<div style="text-align:start">blah blah</div>[PDF] > > Should the direction CSS property be rtl for the div? Should it be aligned to > the right? What if the [RLE] and [PDF] were inside the div? The meaning of 'start' is entirely based on the 'direction' property and nothing else. This is all defined in the CSS spec. > However, the fact remains that in many cases, > opposite-direction text gathered from the user is best displayed aligned to its > start edge. So, to get that, I will still need to make the *div* say dir=rtl, > and not leave it up to the text inside the div. Why not just use dir=auto? 
If the first character is a bidi formatting character, that'll work as intended, no? > And in order to do that after > the browser has stuck the formatting characters into text (because the user > entering it indicated its direction), the server side of my app will need to > parse the text in order to figure out that indeed it is wrapped in formatting > characters. And when I say "parse", I really mean parse: while the formatting > characters in "[RLE]BLAH blah BLAH[PDF]" might (!) have been inserted by the > mechanism you are proposing, the formatting characters in "[RLE]BLAH[PDF] blah > [RLE]BLAH[PDF]" definitely were not, and to understand that, the app will need > to scan right through the whole string. How would a user ever end up submitting text in this latter state? Why would you not use dir=rtl in this case anyway? > As I said, I have no intention of using submitdir to support the use case of > the user indicating the direction of individual paragraphs inside a textarea > (as opposed to indicating the direction of all the paragraphs in a textarea at > once). That seems a bit limited, but if it's really not something people want to do, fair enough. Anyway, I can see the appeal (in terms of simplicity) of out-of-band direction indication. I'll look into the feasibility of just having a boolean attribute on <input> and <textarea> that results in a separate field in the submission.
(In reply to comment #38) > (In reply to comment #31) > > 1. It is very easy for LRE, RLE, LRO, RLO, and PDF [...] > > to get out of balance [...]. > > We should definitely make them affect the validity if it's a concern that > people will use them incorrectly and could benefit from validator tools > flagging these problems. Please file a bug suggesting this if you think it > would help. Will do. > Anyway, I can see the appeal (in terms of simplicity) of out-of-band direction > indication. I'll look into the feasability of just having a boolean attribute > on <input> and <textarea> that results in a separate field in the submission. Thank you. If that is the bottom line, you can ignore my answers below to the stuff that preceded this. > > 2. Similarly, these characters, while being perfectly balanced on their own, > > can very, very easily become "entangled" between the scopes of the document's > > tags. For example, what exactly is the browser to make of <span dir=rtl> ... > > [LRE] ... </span> ... [PDF]? > > What should happen is defined by CSS, which defines all of the bidi formatting > rules in terms of bidi formatting characters. If so, the text between the </span> and the PDF will come out RTL, since the </span> is equivalent to a PDF, which would be interpreted by the UBA to match the LRE, thus closing it, and reverting to the RTL direction defined by the <span dir=rtl>. How much sense does that make - the <span dir=rtl> was supposed to end with the </span>, and the bidi formatting character was LRE, not RLE! If one had equivalently entangled end tags of elements, e.g. <i>A<b>B</i>C</b>, most browsers will attempt to display it the way the user intended it - with the C bold, not italic. I am not saying your interpretation of what should happen is bad, only that there is no good interpretation of this mess. > > 3. Speaking of CSS, how exactly should the formatting characters - if > > encouraged - interact with the direction-dependent CSS, e.g. 
text-align:start? > > For example, consider: > > > > [RLE]<div style="text-align:start">blah blah</div>[PDF] > > > > Should the direction CSS property be rtl for the div? Should it be aligned to > > the right? What if the [RLE] and [PDF] were inside the div? > > The meaning of 'start' is entirely based on the 'direction' property and > nothing else. This is all defined in the CSS spec. I know. The point is that the formatting characters will not have any effect on the CSS - and that effect is vital if you want things to work well. I am just trying to demonstrate why in HTML you need to use mark-up (dir=), not the bidi formatting characters. > > However, the fact remains that in many cases, > > opposite-direction text gathered from the user is best displayed aligned to its > > start edge. So, to get that, I will still need to make the *div* say dir=rtl, > > and not leave it up to the text inside the div. > > Why not just use dir=auto? If the first character is a bidi formatting > character, that'll work as intended, no? 1. Deciding it is RTL simply because the first character is RLE is definitely wrong: consider "[RLE]JOE[PDF] likes to eat." It is an English sentence, LTR, not RTL. In RTL, it would be displayed as ".likes to eat EOJ" instead of the correct "EOJ likes to eat." 2. Unfortunately, the standard UBA algorithm (first-strong) ignores formatting characters. We would have to twiddle with it a little to make it support them (e.g. ignore the stuff inside them too, except for the case when the whole string is wrapped in them, in which case return the direction they indicate). > > And in order to do that after > > the browser has stuck the formatting characters into text (because the user > > entering it indicated its direction), the server side of my app will need to > > parse the text in order to figure out that indeed it is wrapped in formatting > > characters. 
And when I say "parse", I really mean parse: while the formatting > > characters in "[RLE]BLAH blah BLAH[PDF]" might (!) have been inserted by the > > mechanism you are proposing, the formatting characters in "[RLE]BLAH[PDF] blah > > [RLE]BLAH[PDF]" definitely were not, and to understand that, the app will need > > to scan right through the whole string. > > How would a user ever end up submitting text in this latter state? By pasting from some HTML page that uses bidi formatting characters :-) > Why would you not use dir=rtl in this case anyway? Let me make the example clearer with real text instead of blahs: [RLE]JOE[PDF] intends to call [RLE]SUSAN[PDF] This is an English sentence that happens to use some names in an RTL script. It is thus LTR. It needs to be displayed as EOJ intends to call NASUS which will only happen if it is displayed LTR. In RTL, it will be displayed as NASUS intends to call EOJ which actually reverses the meaning. > > As I said, I have no intention of using submitdir to support the use case of > > the user indicating the direction of individual paragraphs inside a textarea > > (as opposed to indicating the direction of all the paragraphs in a textarea at > > once). > > That seems a bit limited, but if it's really not something people want to do, > fair enough. I didn't say that people don't want it. I am saying that no one has figured out a way to give it to them, even in a full-featured plain text editor.
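To make the point about first-strong estimation concrete, here is a minimal sketch. It is an approximation under stated assumptions, not the UBA's full paragraph-level rules: the function name is hypothetical, and it simply returns the direction of the first character whose bidi class is L, R or AL, skipping everything else (including LRE, RLE and PDF).

```python
import unicodedata


def first_strong_direction(text):
    """Return 'ltr' or 'rtl' from the first strong character, or None
    if the text has no strong character at all. Embedding characters
    (LRE, RLE, PDF, ...) are not strong, so they are skipped."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None
```

With a real Hebrew name in place of "JOE", a value like RLE + name + PDF + " likes to eat." is estimated as rtl because the name supplies the first strong character, even though the sentence is English and should be LTR. LRM and RLM, by contrast, are themselves strong (classes L and R), so they do steer this estimate.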
(In reply to comment #39) > (In reply to comment #38) > > (In reply to comment #31) > > > 1. It is very easy for LRE, RLE, LRO, RLO, and PDF [...] > > > to get out of balance [...]. > > > > We should definitely make them affect the validity if it's a concern that > > people will use them incorrectly and could benefit from validator tools > > flagging these problems. Please file a bug suggesting this if you think it > > would help. > > Will do. Filed as bug 11234.
We need a better name than "submitdir". I mentioned it on IRC and the first two people who commented interpreted it as being a "submit directory" (more like "action"), and the third person interpreted it as being something that takes a value (and controls the input direction, presumably, i.e. like dir="") rather than controlling whether the element submits something. I'll need to think about this some more.
Maybe "dirname", with the value being the name of the field in which to put the direction; etymology being an amalgamation of "dir" and "name", the two attributes that it is most closely related to?
1. dirname has the same problem as submitdir, i.e. that to most people, "dir" is a synonym for directory, not direction. In fact, it has that problem to a higher degree, since directories do have names, whereas a person is at least likely to wonder what the heck a submit directory would be. 2. When I originally proposed submitdir about a year ago, it was with the same semantics: its value was the name under which the dir would be submitted. The feedback I got was: - No one will ever be sure whether they are supposed to create a hidden input with that name, or whether the control is created for you automatically in the submission. - The name of the input and the name of the control to be added to the submission are bound to be intimately related, e.g. <input name="foo" submitdir="foordir">. Why force the author to type in what we know he will type in anyway? A standard naming scheme should be good enough. 3. If you do want to specify a complete name, there is also a completely different alternative: pull instead of push. We could have an attribute named dirof, or even directionof, and its value would be the name of another control in the same form whose computed direction ('ltr' or 'rtl') would provide this control's value at submit time. For example: <input name=foo type=text dir=auto /> <input name=foodir directionof=foo /> The default stylesheet would make all inputs with directionof hidden by default. And there is no need to specify a type value for a directionof input - it must be text. This way, there is no magic creation of a submission control, and no room to wonder whether there is one or not.
(In reply to comment #43) > 3. If you do want to specify a complete name, there is also a completely > different alternative: pull instead of push. We could have an attribute named > dirof, or even directionof, and its value would be the name of another control > in the same form whose computed direction ('ltr' or 'rtl') would provide this > control's value at submit time. For example: > > <input name=foo type=text dir=auto /> > <input name=foodir directionof=foo /> > > The default stylesheet would make all inputs with directionof hidden by > default. And there is no need to specify a type value for a directionof input - > it must be text. > > This way, there is no magic creation of a submission control, and no room to > wonder where if there is or not. I like this proposal. Except that 1. it's a lot of typing. 2. To work best with older browsers, you need to manually hide it anyway, so, no gain there. Here's yet another proposal: <input name=foo type=text xxx=xxx> where xxx is whatever attribute name we come up with. What this does, though, is instruct the agent to include exactly one of U+200E or U+200F at the beginning of the submitted text, depending on the resolved direction of the text.
(In reply to comment #44) > > Here's yet another proposal: > > <input name=foo type=text xxx=xxx> > > where xxx is whatever attribute name we comeup with. What this does though, is > that it will instruct the agent to included exactly one of U+200E or U+200F at > the beginning of the submitted text, depending on the resolved direction of the > text. That's what I was proposing earlier, but everyone seemed to think it was a terrible idea. :-) Regarding the other point, it's not that I want to be able to provide an explicit name (I'd be fine even with just saying that dirname with no value defaults to name+".dir") but that "submitdir" isn't understood. "dirname" is certainly not ideal, but it has the advantage of precedent: "dir" and "name" are the two attributes most closely related to what we're doing here. I'm certainly open to better names, but people think "submitdir" means "the directory you submit to" (similar to "action"; someone even suggested renaming it "actiondir"), so that's just not going to fly. It's unclear what you would misinterpret dirname="foodir" as being, if you didn't know what it was. Not knowing what it is is far less of a problem than being confident that it is something different than what it really is.
(In reply to comment #44) > (In reply to comment #43) > > 3. If you do want to specify a complete name, there is also a completely > > different alternative: pull instead of push. We could have an attribute named > > dirof, or even directionof, and its value would be the name of another control [...] > > The default stylesheet would make all inputs with directionof hidden by > > default. > > I like this proposal. Except that 1. it's a lot of typing. Agreed, but that does not particularly bother me. > 2. To work best > with older browsers, you need to manually hide it anyway, so, no gain there. True. > Here's yet another proposal: > > <input name=foo type=text xxx=xxx> > > where xxx is whatever attribute name we comeup with. What this does though, is > that it will instruct the agent to included exactly one of U+200E or U+200F at > the beginning of the submitted text, depending on the resolved direction of the > text. The characters being proposed here are LRM and RLM. The discussion above had dealt with LRE and RLE (and a PDF at the end), which are both similar and different. One difference is that there is no objection to the use of LRM and RLM in HTML, so in this sense this proposal is better than wrapping in LRE|RLE and PDF. Another difference, however, is that although the first-strong estimation algorithm would in fact estimate a string starting with LRM|RLM to be in the intended direction, LRM and RLM do not declare direction. They are just invisible strong-directional characters, like an invisible A and an invisible alef. They do not guarantee that the remainder of the string is displayed in the intended direction, and in fact their normal use case is for strictly local effect (e.g. "[LRM]10 main street IS THE ADDRESS.", which is intended to be RTL overall - the leading LRM just makes the LTR address get displayed as intended, instead of as "main street 10"). 
Thus, when displaying a string obtained using the proposed feature in another HTML page, the application would either have to wrap it in an element with dir=auto, or check for the leading character, optionally remove it, and wrap the string in an element with dir=ltr|rtl. When displaying the string in plain text, where dir=auto is not an option, the application would have no choice but to check for the leading character, optionally remove it, and wrap the string in LRE|RLE and PDF. The advantage of using LRE|RLE and PDF - that you can just leave the formatting characters in place and get the intended display wherever you happen to plop the string - is gone. In this sense, this proposal is even worse than wrapping in LRE|RLE and PDF. And, in fact, removing the LRM|RLM before including the string in output is pretty much essential. If it is not removed, and the user copy/pastes it along with the string, it will eventually cause problems: editing a string containing invisible characters is always fun, and the effects of the LRM|RLM when the string is reused are unpredictable, since, as already noted above, LRM|RLM does not declare the direction of anything. And if it is not removed before being used as the default value of an input in another page, the added formatting characters would build up ad infinitum. Furthermore, please note that it is impossible, given a string of unknown provenance, to tell whether its leading LRM|RLM was put there by the proposed feature or put there by the string's author for a purpose very different than indicating the string's overall direction. Thus, the proposed feature would have to *always* add the LRM|RLM to the string reported in the submission, so the application knows that whatever it gets always has a leading LRM|RLM that indicates the overall direction. 
Given that the application would then proceed to strip the LRM|RLM for the reasons indicated above, I propose that adding a leading LRM|RLM is roughly the same as adding any other prefix to the string, even a visible one like either "ltr" or "rtl". On the other hand, it is much easier to abuse than an out-of-band method like another form control, as originally proposed.
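The server-side handling described in this comment (check for the leading character, remove it, and wrap the string in an element with an explicit dir) could look roughly like the sketch below. The function names are hypothetical, the use of a <span> wrapper is an assumption, and the dir="auto" fallback assumes that attribute value ends up being available.

```python
import html

LRM, RLM = "\u200e", "\u200f"  # U+200E and U+200F


def split_leading_mark(value):
    """Return (direction, text) where direction is 'ltr', 'rtl', or None
    when the value does not start with LRM or RLM. The mark is stripped
    so it cannot accumulate when the text is resubmitted."""
    if value.startswith(LRM):
        return "ltr", value[1:]
    if value.startswith(RLM):
        return "rtl", value[1:]
    return None, value


def render_span(value):
    """Wrap a submitted value in a span whose dir reflects the leading mark."""
    direction, text = split_leading_mark(value)
    escaped = html.escape(text)
    if direction is None:
        return '<span dir="auto">' + escaped + "</span>"
    return '<span dir="' + direction + '">' + escaped + "</span>"
```

Note that this sketch cannot tell a mark added by the user agent from one typed or pasted by the author, which is exactly the ambiguity raised above.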
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html Status: Partially Accepted Change Description: see diff given below Rationale: There seems to be a good use case. I used dirname="" rather than submitdir="" for the reasons described above, though.
Checked in as WHATWG revision r5676. Check-in comment: Add dirname='' feature (may still be renamed or changed if someone comes up with a better solution) http://html5.org/tools/web-apps-tracker?from=5675&to=5676
(In reply to comment #48) Several problems: 1. I am not in love with the "dirname" name. How about "add-direction-as"? 2. Spec should say what happens if an actual input element with the name given by dirname exists in this form. Suggested behavior was to add it on without overriding. 3. Spec should say what happens when dirname is given with no value. Can't we have a reasonable default, like the name value suffixed with "_dir"? 4. Spec should say that the value is either "ltr" or "rtl". Specifically, it is never "auto", but the estimated direction.
Aharon, just one question: is "as" a necessary part of the new attribute name? That is, is there a reason it needs to be "add-direction-as" instead of "add-direction"? Best, --C. E. Whitehead
(In reply to comment #49) > 1. I am not in love with the "dirname" name. How about "add-direction-as"? CE is right, the "as" isn't necessary. Of course, there is no precedent for a dash or underscore in HTML attributes, so it would actually be addDirection (case insensitive, as usual). > 4. Spec should say that the value is either "ltr" or "rtl". Specifically, it is > never "auto", but the estimated direction. This is not strictly necessary, since the spec refers to "directionality", which is defined in the spec as being either ltr or rtl - not auto. However, it would not hurt to mention this.
(In reply to comment #49) > > 1. I am not in love with the "dirname" name. How about "add-direction-as"? I'm not in love with "dirname" either, but it has some distinct advantages: it's short, its name is formed from the names of the two attributes to which it is most closely related, and it is accurate (it gives the "name" of the "dir", just like "name" gives the "name" of the field). I don't think either "add-direction-as" or "adddirection" is better; they're longer, don't fit the style of HTML attribute names (insofar as there is a style), and are no more intuitive. > 2. Spec should say what happens if an actual input element with the name given > by dirname exists in this form. Suggested behavior was to add it on without > overriding. As far as I can tell this is completely defined already. > 3. Spec should say what happens when dirname is given with no value. Can't we > have a reasonable default, like the name value suffixed with "_dir"? Currently, giving no value is not valid. We could allow it and say that it automatically generates a field name, but given how unintuitive this is already, I'm not sure adding magic here is a good idea. > 4. Spec should say that the value is either "ltr" or "rtl". Specifically, it is > never "auto", but the estimated direction. Everywhere where it lists what the value will be, it lists all the values and doesn't list "auto". Where would you add text saying that it's never "auto"?
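For readers following the behavior being discussed, here is a hedged sketch of how a form data set with dirname might be assembled. It is an illustration, not the spec's algorithm: the function name and the dictionary layout are hypothetical, the direction is supplied by the caller rather than computed, and placing the direction entry immediately after the control's own entry is an assumption.

```python
from urllib.parse import urlencode


def build_form_data(controls):
    """Build a list of (name, value) pairs from hypothetical control dicts.

    Each control has 'name', 'value' and 'direction' ('ltr' or 'rtl',
    never 'auto'), plus an optional non-empty 'dirname'. The direction
    entry is appended, never merged, so an existing field with the same
    name is not overridden.
    """
    entries = []
    for control in controls:
        entries.append((control["name"], control["value"]))
        dirname = control.get("dirname")
        if dirname:  # empty or missing dirname adds nothing
            entries.append((dirname, control["direction"]))
    return entries
```

Used with the example from the original proposal, urlencode(build_form_data(...)) for a single control named "mytest" with value "hello", direction "ltr" and dirname "mytest_dir" yields "mytest=hello&mytest_dir=ltr".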
(In reply to comment #52) > (In reply to comment #49) > > > > 1. I am not in love with the "dirname" name. How about "add-direction-as"? > I'm not in love with "dirname" either, but it has some distinct advantages: > it's short, its name is formed from the names of the two attributes to which it > is most closely related, and it is accurate (it gives the "name" of the "dir", > just like "name" gives the "name" of the field). I don't think either > "add-direction-as" nor "adddirection" are better; they're longer, don't fit the > style of HTML attribute names (insofar as there is a style), and are no more > intuitive. Hi. I am inferring that you are happy with "dir" here, although in an earlier posting you said: "people think "submitdir" means "the directory you submit to" (similar to "action"; someone even suggested renaming it "actiondir")" and I thus thought you opposed "dir". If we are going to use "dir", then to me "submitdir" or "adddir" seems better; well, "adddir" has 3 d's in a row, so it's not quite as palatable as "submitdir" . . . just my two cents. (I have no problem with "dir", but "dirname" sounds strange to me . . .) Best, C. E. Whitehead cewcathar@hotmail.com > > 2. Spec should say what happens if an actual input element with the name given > > by dirname exists in this form. Suggested behavior was to add it on without > > overriding. > As far as I can tell this is completely defined already. > > 3. Spec should say what happens when dirname is given with no value. Can't we > > have a reasonable default, like the name value suffixed with "_dir"? > Currently, giving no value is not valid. We could allow it and say that it > automatically generates a field name, but given how unintuitive this is > already, I'm not sure adding magic here is a good idea. > > 4. Spec should say that the value is either "ltr" or "rtl". Specifically, it is > > never "auto", but the estimated direction. 
> Everywhere where it lists what the value will be, it lists all the values and > doesn't list "auto". Where would you add text saying that it's never "auto"?
I think the problem with the "dir" in "submitdir" is that it's used as an object noun, and that when people try to think of what you might do with a "dir" they jump to "directory" as an expansion first. One other problem with "add dir" is the triple d; generally you want to avoid triple letters in identifiers as they lead to more typos. Anyway, we're just bikeshedding now.
EDITOR'S RESPONSE: Status: Partially Accepted Change Description: see diff given below Rationale: see comment 52
(Oops, I meant to say "no changes", not "see diff below". The diff in comment 48 is still the only one here.)