This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 27436 - Document.charset
Summary: Document.charset
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: DOM
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Anne
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-25 23:27 UTC by Philip Jägenstedt
Modified: 2015-09-22 07:56 UTC
CC List: 8 users

Description Philip Jägenstedt 2014-11-25 23:27:16 UTC
This is supported by IE, Blink and WebKit, but not Gecko.

Usage in Chrome is around 4%:
https://www.chromestatus.com/metrics/feature/timeline/popularity/127

It's not readonly like characterSet, but we can probably remove the setter:
https://www.chromestatus.com/metrics/feature/timeline/popularity/427

So, make charset an alias of characterSet? It's very unlikely that it can be removed in Blink, since at this level of usage it's bound to show up on code paths that Gecko doesn't take for some reason or another.
Comment 1 Arkadiusz Michalski (Spirit) 2014-11-26 00:05:40 UTC
I would like to add one more thing, since you have already opened this bug.

Should Document.characterSet return the encoding's name in lowercase (as listed in the table in the Encoding Standard) or in uppercase? I ask because I noticed different behavior across browsers.
https://encoding.spec.whatwg.org/#names-and-labels

Some results returned by various commands:

Document.characterSet
Firefox UTF-8
Chrome UTF-8
IE utf-8

Document.inputEncoding (DOM Level 3)
Firefox UTF-8
Chrome UTF-8
IE UTF-8

Document.charset (not standard)
Chrome UTF-8
IE utf-8

Document.characterSet (not standard)
Chrome ISO-8859-2
IE windows-1250

TextEncoder.encoding and TextDecoder.encoding
Firefox utf-8
Chrome utf-8
Comment 2 Arkadiusz Michalski (Spirit) 2014-11-26 00:16:55 UTC
> Document.characterSet (not standard)
> Chrome ISO-8859-2
> IE windows-1250
> 

This should read Document.defaultCharset (not Document.characterSet).
Comment 3 Anne 2014-11-26 08:13:52 UTC
Adding some Mozillians who might have opinions on adding an alias.

They should return the names in lowercase per the Encoding Standard. If different casing needs to be considered (note that browsers do not consistently use uppercase or lowercase today) we'd need to address that through a "display name" field in the Encoding Standard or some such.
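As a rough illustration of the lowercase names discussed here, TextDecoder reports the Encoding Standard name for whatever label it is given, which matches the TextEncoder/TextDecoder results in comment 1. This is only a sketch: the helper name canonicalEncodingName is hypothetical, and it assumes an environment that exposes a global TextDecoder.

```javascript
// Hypothetical helper (not part of any spec): returns the name that
// TextDecoder reports for a given encoding label.
// Assumes a global TextDecoder is available.
function canonicalEncodingName(label) {
  // The constructor throws a RangeError for labels it does not recognise.
  return new TextDecoder(label).encoding;
}

console.log(canonicalEncodingName("UTF-8")); // "utf-8"
console.log(canonicalEncodingName("utf8"));  // "utf-8"
```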
Comment 4 Philip Jägenstedt 2014-11-26 13:34:51 UTC
I agree that we should try to return a lowercase string, but that's orthogonal to this bug. charset is already an alias of characterSet in Blink, and any changes would apply to both.
Comment 5 Anne 2014-11-26 13:38:00 UTC
Well, not completely, right? Or do both have a setter in Blink? Should we wait with adding an alias until the setter has been removed?
Comment 6 Henri Sivonen 2014-11-26 13:47:10 UTC
What does the setter do?

Is it known that if the property sniffs as existing, sites won't try to use the setter (i.e. having it as getter-only would be safe)?

(In reply to Anne from comment #3)
> If
> different casing needs to be considered (note that browsers do not
> consistently use uppercase or lowercase today)

Didn't WebKit make a specific effort to be consistent with Gecko's (rather arbitrary) casing? Have you researched why the WebKit developers made the effort to be case-consistent with Gecko?
Comment 7 Ehsan Akhgari [:ehsan] 2014-11-26 13:49:37 UTC
About adding a getter alias, I'm not sure what that will buy us for Gecko, since it is clearly not required for web compat for content that we're handling (at least I have never seen anyone ask for it, or any major website being broken in Gecko because we don't support it.)

About adding a setter, I'm not sure if I understand what the semantics would be.  In fact, I can't think of a use case for dynamically changing the charset of a document.
Comment 8 Anne 2014-11-26 14:23:41 UTC
(In reply to Henri Sivonen from comment #6)
> Didn't WebKit make a specific effort to be consistent with Gecko's (rather
> arbitrary) casing? Have you researched why the WebKit developers made the
> effort to be case-consistent with Gecko?

WebKit did? I'm not aware of that. I remember that what I found was inconsistent across user agents. From https://bugs.webkit.org/buglist.cgi?query_format=specific&order=relevance+desc&bug_status=__all__&product=&content=characterset I cannot find anything that supports what you suggest.
Comment 9 Philip Jägenstedt 2014-11-26 14:40:08 UTC
(In reply to Anne from comment #5)
> Well not completely right, or do both have a setter in Blink?

Only charset has a setter.

(In reply to Henri Sivonen from comment #6)
> What does the setter do?

It's propagated to a TextResourceDecoder where it looks like it will prevent further checks for <meta charset>, but I've been unable to produce a simple test case where it has any observable effect. I'm betting on removal, in which case it doesn't matter.

> Is it known that if the property sniffs as existing, sites won't try to use
> the setter (i.e. having it as getter-only would be safe)?

All I know is that the usage of the setter is in the range where it's plausible that removal would work, currently ~0.01% of page views. In my experience, only actually attempting removal will tell you if it's safe or not.
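To make the getter-only question above more concrete, here is a minimal sketch of what script would observe if charset were exposed with a getter and no setter. It uses a plain object as a stand-in for a Document; makeFakeDocument, assignStrict and assignSloppy are hypothetical names, and a real Document is a host object whose properties are defined by the browser, not like this.

```javascript
// Hypothetical stand-in for a Document with a read-only characterSet.
function makeFakeDocument(encodingName) {
  const doc = {};
  Object.defineProperty(doc, "characterSet", {
    get() { return encodingName; },
    enumerable: true,
    configurable: true,
  });
  // charset as a getter-only alias: no setter is defined.
  Object.defineProperty(doc, "charset", {
    get() { return this.characterSet; },
    enumerable: true,
    configurable: true,
  });
  return doc;
}

// Strict-mode assignment to an accessor without a setter throws a TypeError.
function assignStrict(doc, value) {
  "use strict";
  try {
    doc.charset = value;
    return "assigned";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}

// Functions built with the Function constructor are non-strict unless their
// body says otherwise, so this assignment is silently ignored.
const assignSloppy = new Function("doc", "value", "doc.charset = value;");

const fake = makeFakeDocument("UTF-8");
console.log(assignStrict(fake, "windows-1250")); // "TypeError"
assignSloppy(fake, "windows-1250");
console.log(fake.charset); // "UTF-8"
```

The non-strict case is the one that matters for compatibility: a page that still assigns to the property would not throw, it would just stop having any effect.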
Comment 10 Henri Sivonen 2014-11-26 14:45:24 UTC
(In reply to Anne from comment #8)
> (In reply to Henri Sivonen from comment #6)
> > Didn't WebKit make a specific effort to be consistent with Gecko's (rather
> > arbitrary) casing? Have you researched why the WebKit developers made the
> > effort to be case-consistent with Gecko?
> 
> WebKit did? I'm not aware of that. I remember that what I found was
> inconsistent across user agents.

Maybe they didn't. Still, the case is remarkably consistent across WebKit and Gecko. A quick look suggests that WebKit follows IANA casing and Gecko follows IANA casing except for gbk and gb18030 (which are upper case in IANA & WebKit). So maybe WebKit didn't copy Gecko but both WebKit and Gecko used IANA casing, except Gecko somehow failed to do that for gbk and gb18030.
Comment 11 Philip Jägenstedt 2014-11-26 14:49:37 UTC
As for incentives, the status quo for many years has been that Gecko has no incentive to add Document.charset, and IE/WebKit/Blink have no incentive to remove it. The result is a small but ever-present opportunity for writing non-portable code...

In this case, the quickest path to interop appears to be for Blink to remove the setter and for the spec and Gecko to add the getter. Other ideas welcome :)
Comment 12 Masatoshi Kimura 2014-11-26 15:10:35 UTC
document.charset was once spec'ed then removed. Why is it going to be added once again? Because WebKit refused to remove it? Because everyone except Gecko has the support? (It is basically what I said in Gecko bug 647621 comment #0.)
Comment 13 Philip Jägenstedt 2014-11-26 15:32:05 UTC
Masatoshi, do you have another proposal for how to reach agreement between the spec and browsers?
Comment 14 Ehsan Akhgari [:ehsan] 2014-11-26 15:59:19 UTC
I'm not necessarily opposed to Gecko implementing the getter, but I would like to know what we will gain from that (in addition to comment 11, of course.)  Specifically, do we have any data on how this property is used on the 4% of pages viewed in Blink based browsers?  If we have a way to obtain more info on the actual usage of this property on the Web, that may help guide us to decide whether it makes more sense for Gecko to implement or for Blink/IE to drop.
Comment 15 Philip Jägenstedt 2014-11-26 19:10:47 UTC
The 4% is any access to Document.charset, notably including code like (document.charset || document.characterSet) that would work without it, which is likely a large majority of cases.
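The fallback pattern mentioned above can be sketched as a small helper. The name documentEncoding is hypothetical, and `doc` is assumed to be any object shaped like a Document; unlike the snippet in the comment, this sketch tries the standard characterSet first.

```javascript
// Hypothetical helper: reads the document encoding from whichever property
// exists, so it works whether or not the engine exposes charset.
function documentEncoding(doc) {
  return doc.characterSet || doc.charset || null;
}

// Plain objects stand in for Documents from different engines.
console.log(documentEncoding({ characterSet: "UTF-8" })); // "UTF-8"
console.log(documentEncoding({ charset: "utf-8" }));      // "utf-8"
console.log(documentEncoding({}));                        // null
```

Code written this way counts towards the usage metric even though it would keep working if charset were absent.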

Answering questions like these using Blink's UseCounter system is difficult; one would have to collect a representative sample of pages that access document.charset and analyze them manually.

If someone has access to a large corpus of Web content, a grep for pages that say "document.charset" without "document.characterSet" in the vicinity might be illuminating.
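One possible shape for such a check, as a sketch only: the function name usesCharsetWithoutFallback and the 200-character radius are assumptions made for illustration, and a plain text search like this cannot tell real usage apart from comments or strings.

```javascript
// Hypothetical heuristic: returns true when `source` contains at least one
// mention of document.charset with no document.characterSet within `radius`
// characters on either side of it.
function usesCharsetWithoutFallback(source, radius = 200) {
  const charsetRe = /\bdocument\.charset\b/g;
  let match;
  while ((match = charsetRe.exec(source)) !== null) {
    const start = Math.max(0, match.index - radius);
    const end = match.index + match[0].length + radius;
    const nearby = source.slice(start, end);
    if (!nearby.includes("document.characterSet")) {
      return true;
    }
  }
  return false;
}

console.log(usesCharsetWithoutFallback(
  "var enc = document.charset || document.characterSet;")); // false
console.log(usesCharsetWithoutFallback("alert(document.charset);")); // true
```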
Comment 16 Anne 2014-11-27 09:38:03 UTC
I added compatibility names in https://github.com/whatwg/dom/commit/03e170351f095e4fe749e0259a3aafc0cbb49c91

I want to wait with adding .charset until at least the setter has disappeared. Removing that seems like a win for everyone. Then we can evaluate again.
Comment 17 Philip Jägenstedt 2014-11-27 09:52:36 UTC
OK, I'll try to get rid of the Document.charset setter and then report back here.
Comment 18 Philip Jägenstedt 2015-05-20 14:33:49 UTC
I've now removed the setter from Blink, let's hope it sticks:
https://code.google.com/p/chromium/issues/detail?id=438392#c4
Comment 19 Philip Jägenstedt 2015-09-21 09:27:25 UTC
The removal of the setter appears to have worked out. It was gone in M45, which reached Chrome stable on September 1. Now that Document.charset is an alias of Document.characterSet, can we spec it?