This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 24581 - Fix ByteString type & [EnsureUTF16] flag story
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: WebIDL
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Cameron McCormack
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Duplicates: 20159
Depends on:
Blocks: 23317 23346 25540
Reported: 2014-02-07 16:58 UTC by Anne
Modified: 2014-11-17 18:37 UTC (History)
CC List: 7 users

Description Anne 2014-02-07 16:58:53 UTC
See http://lists.w3.org/Archives/Public/public-script-coord/2013JulSep/0165.html for why [EnsureUTF16] is wrong and ByteString is okay. We should have a consistent story for these conversions:

* code units <-> bytes
* code units <-> code points
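
For illustration, the first of these conversions (code units <-> bytes) can be sketched roughly as below. This is a minimal sketch of a ByteString-style mapping, assuming each code unit must fit in one byte and that a larger value is rejected with a TypeError. The function names `codeUnitsToBytes` and `bytesToCodeUnits` are hypothetical and are not taken from any specification.

```typescript
// Hypothetical helpers sketching a ByteString-style conversion.
// Assumption: every code unit maps to exactly one byte with the same value.

function codeUnitsToBytes(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xff) {
      // A code unit above 255 has no single-byte representation here.
      throw new TypeError(
        `code unit 0x${unit.toString(16)} at index ${i} does not fit in a byte`
      );
    }
    bytes[i] = unit;
  }
  return bytes;
}

function bytesToCodeUnits(bytes: Uint8Array): string {
  let s = "";
  // Each byte becomes one code unit with the same numeric value.
  bytes.forEach((byte) => {
    s += String.fromCharCode(byte);
  });
  return s;
}
```

Under these assumptions the mapping is lossless in both directions for any string that is accepted, which is one way to read "ByteString is okay".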
Comment 1 Anne 2014-05-16 15:19:17 UTC
Per https://github.com/slightlyoff/ServiceWorker/issues/263#issuecomment-43338876 we should have ScalarValueString rather than the current modifier setup.
Comment 3 Anne 2014-06-13 12:11:54 UTC
We should also allow ByteString and ScalarValueString to have default values.
Comment 4 Brendan Eich 2014-06-23 16:50:11 UTC
ScalarValueString is Not Great. It uses a Unicode phrase out of context, so does not connote Unicode -- in particular does not suggest that lone surrogates are replaced by U+FFFD -- and also sounds grandiosely general ("scalar value" in what domain? String theory? :-P).

Concrete beats abstract when something very concrete such as U+FFFD replacement is going on under the hood. Any of UnicodeString or UniString or UCString would be better.

A bikeshedding symposium is in order. I don't have a particular favorite but can provide beverages if nearby.

The big-picture point is: we can do better than ScalarValueString, and we should. The future is bigger than the past. Let's get this right soon, before ScalarValueString spreads widely and we can't change it.

/be
Comment 5 Anne 2014-06-24 08:17:02 UTC
1. ScalarValueString is not in any way exposed. We can always change it. It is specification language.

2. "scalar value" in the context of strings means a code point that is not a lone surrogate. It therefore very accurately describes the intent.

3. An attractive name such as UnicodeString might lead to adoption in places that do not require it. String/DOMString should be used normally.
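
To make point 2 concrete, here is a rough sketch of turning a sequence of UTF-16 code units into Unicode scalar values, with each lone surrogate replaced by U+FFFD as discussed in this thread. The function name `toScalarValues` is hypothetical, and the code is an illustration of the idea rather than the text of the Web IDL algorithm.

```typescript
// Hypothetical helper: decode UTF-16 code units into scalar values,
// replacing every unpaired surrogate with U+FFFD.

const REPLACEMENT_CHARACTER = 0xfffd;

function toScalarValues(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // Lead surrogate: only valid when followed by a trail surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out.push(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00));
        i++; // The trail surrogate has been consumed.
      } else {
        out.push(REPLACEMENT_CHARACTER);
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Trail surrogate with no preceding lead surrogate.
      out.push(REPLACEMENT_CHARACTER);
    } else {
      // Any other code unit is already a scalar value.
      out.push(unit);
    }
  }
  return out;
}
```

Every number in the result is a code point outside the surrogate range, i.e. a scalar value in the sense used above.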
Comment 6 Brendan Eich 2014-06-24 12:12:51 UTC
(In reply to Anne from comment #5)
> 1. ScalarValueString is not in any way exposed. We can always change it. It
> is specification language.

I know. It's still a bug to fix rather than spread all over spec-land.

> 2. "scalar value" in the context of strings

In the context of *Unicode strings*.

> means a code point that is not a
> lone surrogate. It therefore very accurately describes the intent.

Asserting accuracy does not demonstrate it.

Spec authors don't all grok or use Unicode-relative jargon, nor should they have to. Plus, the name's too long. Oh, and it smells too :-P.

> 3. An attractive name such as UnicodeString might lead to adoption in places
> that do not require it. String/DOMString should be used normally.

UnicodeString is less attractive, but if you are worried about an attractive nuisance, then your assertion #2 fails. Unicode experts know more than enough to use the right type.

Who is the audience here, what's the threat model?

/be
Comment 7 Domenic Denicola 2014-06-24 13:21:02 UTC
> 3. An attractive name such as UnicodeString might lead to adoption in places that do not require it. String/DOMString should be used normally.

I just want to second this. As an API designer, if presented with the choice between `UnicodeString` and `DOMString`, I'd probably choose UnicodeString. ("Of course I want to support Unicode! I bet whatever a DOMString is, it's to support some weird legacy DOM APIs.")

I don't see why ScalarValueString is bad, really, despite the stated reasons. But if this must change for whatever reason, it needs to sound "weird." Maybe PostprocessedUnicodeString or ScalarizedUnicodeString or something.
Comment 8 Allen Wirfs-Brock 2014-06-24 14:45:51 UTC
(In reply to Domenic Denicola from comment #7)

DOMSVString

It seems to have everything we are looking for: unattractive, short, specific.
Comment 9 Boris Zbarsky 2014-06-24 15:31:59 UTC
> Who is the audience here, what's the threat model?

The threat model is people who have no clue what they're doing (most web spec authors at one point or another, and quite rationally so) going "Oooh, Unicode is the shiny, I should use that" and not realizing that in this case "Unicode" actually means "dataloss" in many cases.
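
A small sketch of the kind of data loss meant here, assuming the conversion replaces each lone surrogate with U+FFFD: two different input strings can collapse into the same output, so the original value cannot be recovered afterwards. The helper `replaceLoneSurrogates` and the sample strings are hypothetical.

```typescript
// Hypothetical helper: return a copy of the string in which every
// unpaired surrogate code unit is replaced by U+FFFD.

function replaceLoneSurrogates(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const isLead = unit >= 0xd800 && unit <= 0xdbff;
    const isTrail = unit >= 0xdc00 && unit <= 0xdfff;
    if (isLead && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: keep both code units unchanged.
        out += s.charAt(i) + s.charAt(i + 1);
        i++;
        continue;
      }
    }
    out += (isLead || isTrail) ? "\ufffd" : s.charAt(i);
  }
  return out;
}

// Two distinct strings, differing only in which lone surrogate they hold.
const first: string = "key-\ud800";
const second: string = "key-\udc00";

// After replacement they are indistinguishable.
const collided =
  first !== second &&
  replaceLoneSurrogates(first) === replaceLoneSurrogates(second);
```

An API that only ever sees the converted value would treat `first` and `second` as the same string, which is harmless for some uses and a silent bug for others.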
Comment 10 Anne 2014-06-26 09:17:52 UTC
(In reply to Allen Wirfs-Brock from comment #8)
> DOMSVString
> 
> It seems to have everything we are looking for: unattractive, short,
> specific.

That works for me. Is there any ETA on when this will land in IDL? If it's going to take long I could rename this in various places before I go away most of July.
Comment 11 Cameron McCormack 2014-10-03 08:32:51 UTC
(In reply to Anne from comment #10)
> That works for me. Is there any ETA on when this will land in IDL? If it's
> going to take long I could rename this in various places before I go away
> most of July.

Oops, it's October already. :-(

I agree that "Unicode" might well be an attractive nuisance, particularly if "String" is also in the name.

I've gone with "USVString". You can now also use string literals as default values for optional arguments and dictionary members of any of the string types.

https://github.com/heycam/webidl/commit/672e73d476d8c4341339cc22592bb5ff01c69823

http://heycam.github.io/webidl/#idl-USVString
http://heycam.github.io/webidl/#es-USVString
http://heycam.github.io/webidl/#string-literal
http://heycam.github.io/webidl/#dfn-overload-resolution-algorithm
http://heycam.github.io/webidl/#es-union

Let me know if this is sufficient.  Thanks!

This means we can remove [EnsureUTF16], yes?
Comment 12 Anne 2014-10-03 08:47:31 UTC
(In reply to Cameron McCormack from comment #11)
> This means we can remove [EnsureUTF16], yes?

Please, will review in detail later.
Comment 13 Anne 2014-10-03 08:50:13 UTC
Actually, just did it quickly, text seems fine too, thanks!
Comment 14 Cameron McCormack 2014-10-03 08:56:26 UTC
Great!  Removed [EnsureUTF16]: https://github.com/heycam/webidl/commit/e2cf800ef401578db8d1445d815dc13bc8899554
Comment 15 Anne 2014-10-03 08:57:08 UTC
Removed ScalarValueString from Encoding: https://github.com/whatwg/encoding/commit/193297f512d34bb952c4473f95324939dd2cc0d6
Comment 16 Anne 2014-11-17 18:37:34 UTC
*** Bug 20159 has been marked as a duplicate of this bug. ***