This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
See http://lists.w3.org/Archives/Public/public-script-coord/2013JulSep/0165.html for why [EnsureUTF16] is wrong and ByteString is okay.

We should have a consistent story for these conversions:

* code units <-> bytes
* code units <-> code points
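For illustration, the "code units <-> bytes" direction can be sketched roughly as follows. This is a minimal sketch, not spec text: the function names `codeUnitsToBytes` and `bytesToCodeUnits` are hypothetical, and the choice to throw a `TypeError` for any code unit above 0xFF is meant to mirror how Web IDL converts a value to ByteString.

```typescript
// Hypothetical helper: code units -> bytes.
// Succeeds only when every UTF-16 code unit fits in a single byte.
function codeUnitsToBytes(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xff) {
      // Assumption: reject rather than truncate or replace.
      throw new TypeError(
        `code unit 0x${unit.toString(16)} at index ${i} does not fit in a byte`
      );
    }
    bytes[i] = unit;
  }
  return bytes;
}

// Hypothetical helper: bytes -> code units.
// Each byte becomes the code unit with the same numeric value.
function bytesToCodeUnits(bytes: Uint8Array): string {
  let s = "";
  bytes.forEach((b) => {
    s += String.fromCharCode(b);
  });
  return s;
}
```

Under these assumptions the round trip is lossless for any string that converts at all, which is one reason a throwing conversion is easier to reason about than a silently repairing one.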
Per https://github.com/slightlyoff/ServiceWorker/issues/263#issuecomment-43338876 we should have ScalarValueString rather than the current modifier setup.
http://encoding.spec.whatwg.org/#type-scalarvaluestring
We should also allow ByteString and ScalarValueString to have default values.
ScalarValueString is Not Great. It uses a Unicode phrase out of context, so does not connote Unicode -- in particular does not suggest that lone surrogates are replaced by U+FFFD -- and also sounds grandiosely general ("scalar value" in what domain? String theory? :-P). Concrete beats abstract when something very concrete such as U+FFFD replacement is going on under the hood. Any of UnicodeString or UniString or UCString would be better. A bikeshedding symposium is in order. I don't have a particular favorite but can provide beverages if nearby. The big-picture point is: we can do better than ScalarValueString, and we should. The future is bigger than the past. Let's get this right soon, before ScalarValueString spreads widely and we can't change it. /be
1. ScalarValueString is not in any way exposed. We can always change it. It is specification language.

2. "scalar value" in the context of strings means a code point that is not a lone surrogate. It therefore very accurately describes the intent.

3. An attractive name such as UnicodeString might lead to adoption in places that do not require it. String/DOMString should be used normally.
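To make the "code units <-> code points" conversion under discussion concrete, here is a rough sketch of turning a sequence of UTF-16 code units into a string of scalar values by replacing each lone surrogate with U+FFFD. The function name `toScalarValueString` is hypothetical and the code is an illustration of the idea, not the specification's algorithm text.

```typescript
// Hypothetical helper: replace every lone surrogate with U+FFFD so the
// result contains only Unicode scalar values.
function toScalarValueString(s: string): string {
  let out = "";
  let i = 0;
  while (i < s.length) {
    const c = s.charCodeAt(i);
    if (c < 0xd800 || c > 0xdfff) {
      // Not a surrogate: copy the code unit through unchanged.
      out += s.charAt(i);
      i += 1;
    } else if (c >= 0xdc00) {
      // Trailing surrogate with no preceding lead surrogate.
      out += "\ufffd";
      i += 1;
    } else if (i === s.length - 1) {
      // Lead surrogate at the very end of the string.
      out += "\ufffd";
      i += 1;
    } else {
      const d = s.charCodeAt(i + 1);
      if (d >= 0xdc00 && d <= 0xdfff) {
        // Well-formed surrogate pair: keep both code units.
        out += s.charAt(i) + s.charAt(i + 1);
        i += 2;
      } else {
        // Lead surrogate not followed by a trailing surrogate.
        out += "\ufffd";
        i += 1;
      }
    }
  }
  return out;
}
```

Note that the conversion is not reversible: distinct inputs such as "\ud800" and "\udc00" both map to "\ufffd", so the original code units cannot be recovered afterwards.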
(In reply to Anne from comment #5)
> 1. ScalarValueString is not in any way exposed. We can always change it. It
> is specification language.

I know. It's still a bug to fix rather than spread all over spec-land.

> 2. "scalar value" in the context of strings

In the context of *Unicode strings*.

> means a code point that is not a
> lone surrogate. It therefore very accurately describes the intent.

Asserting accuracy does not demonstrate it. Spec authors don't all grok or use Unicode-relative jargon, nor should they have to. Plus, the name's too long. Oh, and it smells too :-P.

> 3. An attractive name such as UnicodeString might lead to adoption in places
> that do not require it. String/DOMString should be used normally.

UnicodeString is less attractive, but if you are worried about an attractive nuisance, then your assertion #2 fails. Unicode experts know more than enough to use the right type. Who is the audience here, what's the threat model?

/be
> 3. An attractive name such as UnicodeString might lead to adoption in places that do not require it. String/DOMString should be used normally.

I just want to second this. As an API designer, if presented with the choice between `UnicodeString` and `DOMString`, I'd probably choose UnicodeString. ("Of course I want to support Unicode! I bet whatever a DOMString is, it's to support some weird legacy DOM APIs.")

I don't see why ScalarValueString is bad, really, despite the stated reasons. But if this must change for whatever reason, it needs to sound "weird." Maybe PostprocessedUnicodeString or ScalarizedUnicodeString or something.
(In reply to Domenic Denicola from comment #7)

DOMSVString

It seems to have everything we are looking for: unattractive, short, specific.
> Who is the audience here, what's the threat model?

The threat model is people who have no clue what they're doing (most web spec authors at one point or another, and quite rationally so) going "Oooh, Unicode is the shiny, I should use that" and not realizing that in this case "Unicode" actually means "dataloss" in many cases.
(In reply to Allen Wirfs-Brock from comment #8)
> DOMSVString
>
> It seems to have everything we are looking for: unattractive, short,
> specific.

That works for me. Is there any ETA on when this will land in IDL? If it's going to take long I could rename this in various places before I go away for most of July.
(In reply to Anne from comment #10)
> That works for me. Is there any ETA on when this will land in IDL? If it's
> going to take long I could rename this in various places before I go away
> most of July.

Oops, it's October already. :-(

I agree that "Unicode" might well be an attractive nuisance, particularly if "String" is also in the name. I've gone with "USVString". You can now also use string literals as default values for optional arguments and dictionary members of any of the string types.

https://github.com/heycam/webidl/commit/672e73d476d8c4341339cc22592bb5ff01c69823
http://heycam.github.io/webidl/#idl-USVString
http://heycam.github.io/webidl/#es-USVString
http://heycam.github.io/webidl/#string-literal
http://heycam.github.io/webidl/#dfn-overload-resolution-algorithm
http://heycam.github.io/webidl/#es-union

Let me know if this is sufficient. Thanks!

This means we can remove [EnsureUTF16], yes?
(In reply to Cameron McCormack from comment #11)
> This means we can remove [EnsureUTF16], yes?

Please, will review in detail later.
Actually, just did it quickly, text seems fine too, thanks!
Great! Removed [EnsureUTF16]: https://github.com/heycam/webidl/commit/e2cf800ef401578db8d1445d815dc13bc8899554
Removed ScalarValueString from Encoding: https://github.com/whatwg/encoding/commit/193297f512d34bb952c4473f95324939dd2cc0d6
*** Bug 20159 has been marked as a duplicate of this bug. ***