This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
https://dom.spec.whatwg.org/#dom-document-inputencoding Usage is currently around 0.4%: https://www.chromestatus.com/metrics/feature/timeline/popularity/114 AFAICT no browser has removed it yet. In Blink/WebKit it's just an alias of characterSet, while in Gecko it returns null for in-memory documents and is otherwise an alias of characterSet. Making it an alias seems simplest.
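For illustration, the "plain alias" behavior proposed here could be sketched as follows. SketchDocument is a hypothetical stand-in class invented for this example, not any browser's implementation; the only point is that inputEncoding forwards to characterSet and never returns null on its own.

```typescript
// Hypothetical stand-in for a document object; not a real browser class.
class SketchDocument {
  private readonly encodingName: string;

  constructor(encodingName: string) {
    this.encodingName = encodingName;
  }

  // The name the document reports for its encoding.
  get characterSet(): string {
    return this.encodingName;
  }

  // Plain alias: always returns exactly what characterSet returns.
  get inputEncoding(): string {
    return this.characterSet;
  }
}

const sketchDoc = new SketchDocument("UTF-8");
const aliasMatches: boolean = sketchDoc.inputEncoding === sketchDoc.characterSet;
```

Under this sketch there is no separate in-memory case: whatever characterSet reports, inputEncoding reports too.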
Making it an alias is also an explicit violation of the previous spec for it, right?
Yeah, see http://www.w3.org/TR/DOM-Level-3-Core/core.html#Document3-inputEncoding
Yes, but do we care, given that a plain alias seems Web compatible? In IE11, document.implementation.createHTMLDocument('').inputEncoding is "UTF-8" while .characterSet is "utf-8". In Chrome both are null. In Firefox inputEncoding is null while characterSet is "UTF-8". In other words, the in-memory case doesn't have great interop right now. (For documents served over HTTP it's all "UTF-8" except characterSet in IE11 which is "utf-8".)
I don't care. I'm happy for them all to be "utf-8" (assuming no other encoding was used).
You mean "UTF-8", since that's the one thing people more or less agree on? I can probably deal with the alias thing, but just wanted to point out that this is an explicit behavior change and an explicit spec change. I do fully expect to get some compat fallout from it, but not much.
https://github.com/whatwg/dom/commit/03e170351f095e4fe749e0259a3aafc0cbb49c91
Why not just uppercase the return value? Are there any common cases not listed to worry about?
E.g. windows-1252 is not uppercased.
Ugh, that's unfortunate. It seems like Chromium already returns "UTF-8" but "windows-1252", but I'm sure there are discrepancies. Are there other Web-facing APIs that are supposed to return lowercase encoding names? Do they actually in shipping implementations?
TextDecoder does, yes. If we are to expose these elsewhere I would hope we align with that. Having to guess the case of the encoding name is no fun.
In Blink, the TextDecoder.encoding getter lowercases the returned string, so somewhere internally the canonical names already differ by case. I guess it doesn't matter how the specs phrase this as long as the observable behavior is the same.
In Gecko the canonical encoding name for UTF-8 is "UTF-8". The canonical encoding name for windows-1252 is "windows-1252". It sounds like Blink does the same. What do other UAs do? Seems to me like ideally the canonical names in the encoding standard would match UA behavior to the extent it's interoperable. inputEncoding should just return the canonical name, imo. We shouldn't be adding stupid complexity and special casing here if we can avoid it. Fwiw, what TextEncoder/TextDecoder do in Gecko is to just always lowercase our internal canonical name before returning. :(
> Having to guess the case of the encoding name is no fun.
I agree, but neither is breaking compat. :(
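The split described in this comment, one internal canonical name that is returned as-is by the document getters but lowercased by TextDecoder, might look roughly like the sketch below. The table holds only the two names quoted in the thread, and canonicalName, documentStyleName, and textDecoderStyleName are hypothetical helpers made up for this example, not Gecko or Blink internals.

```typescript
// Internal canonical names as quoted in this thread (keyed by lowercase label).
const CANONICAL_NAMES: Record<string, string> = {
  "utf-8": "UTF-8",
  "windows-1252": "windows-1252",
};

// Hypothetical lookup from a label to the internal canonical name.
function canonicalName(label: string): string | undefined {
  return CANONICAL_NAMES[label.trim().toLowerCase()];
}

// characterSet/inputEncoding style: canonical name with its case untouched.
function documentStyleName(label: string): string | undefined {
  return canonicalName(label);
}

// TextDecoder.encoding style: the same name, lowercased before returning.
function textDecoderStyleName(label: string): string | undefined {
  return canonicalName(label)?.toLowerCase();
}
```

With these assumptions, the two getters disagree for "UTF-8" but agree for "windows-1252", which is the inconsistency being discussed.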
(In reply to Boris Zbarsky from comment #12)
> Seems to me like ideally the canonical names in the encoding standard would
> match UA behavior to the extent it's interoperable.
It's not interoperable for gbk/gb18030 (I aligned with Blink, which has uppercase). Not sure what IE does.
> inputEncoding should just return the canonical name, imo. We shouldn't be
> adding stupid complexity and special casing here if we can avoid it.
I think we shouldn't add silly casing to the Encoding Standard, as it might leak elsewhere. That's why I chose this setup.
I think having different parts of the platform have different "canonical" case for encodings is just bizarre beyond belief, personally.
Given that we've already shipped Document.characterSet as "UTF-8" and TextDecoder.encoding as "utf-8", is there a way out of this bizarre situation?
1. Let Document.characterSet and aliases return lowercase, like IE.
2. Make TextDecoder.encoding match characterSet's variable case.
Option 1 seems slightly better long-term, but also far more likely to break stuff.
(In reply to Philip Jägenstedt from comment #15)
> 1. Let Document.characterSet and aliases return lowercase, like IE.
But IE always returns uppercase for inputEncoding (for all names, like UTF-8, BIG5, GB18030, WINDOWS-1250). So an alias of characterSet will never be exactly right (if we take letter case into account). On the other hand, the encoding names returned by browsers are really inconsistent. I don't think anyone really writes code that uses these values in conditions without first converting them to uppercase or lowercase. Would changing to always return lowercase letters, in all cases, really break compatibility?
Some interesting results:
<meta charset="big5">
  Firefox: document.characterSet: Big5; document.charset: undefined; document.inputEncoding: Big5
  Chrome: document.characterSet: Big5; document.charset: Big5; document.inputEncoding: Big5
  IE: document.characterSet: big5; document.charset: big5; document.inputEncoding: BIG5
<meta charset="utf-8">
  Firefox: document.characterSet: UTF-8; document.charset: undefined; document.inputEncoding: UTF-8
  Chrome: document.characterSet: UTF-8; document.charset: UTF-8; document.inputEncoding: UTF-8
  IE: document.characterSet: utf-8; document.charset: utf-8; document.inputEncoding: UTF-8
<meta charset="gbk">
  Firefox: document.characterSet: gbk; document.charset: undefined; document.inputEncoding: gbk
  Chrome: document.characterSet: GBK; document.charset: GBK; document.inputEncoding: GBK
  IE: document.characterSet: gb2312; document.charset: gb2312; document.inputEncoding: GB2312
<meta charset="gb18030">
  Firefox: document.characterSet: gb18030; document.charset: undefined; document.inputEncoding: gb18030
  Chrome: document.characterSet: gb18030; document.charset: gb18030; document.inputEncoding: gb18030
  IE: document.characterSet: GB18030; document.charset: GB18030; document.inputEncoding: GB18030
If the various APIs can return encoding names in different letter case, then I think the minimum is to add that information to the Encoding spec (somewhere near the table which lists those names).
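The defensive pattern this comment assumes page authors already use, normalizing case before comparing, could be sketched like this. asciiLower and sameEncodingName are hypothetical helper names introduced for this example; the sample values are the ones reported above for <meta charset="big5">.

```typescript
// Lowercase only ASCII letters, so locale-specific case rules never apply.
function asciiLower(name: string): string {
  return name.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Compare two encoding names while ignoring ASCII letter case.
function sameEncodingName(a: string, b: string): boolean {
  return asciiLower(a) === asciiLower(b);
}

// Values reported in this comment for <meta charset="big5">.
const big5Reports: string[] = ["Big5", "big5", "BIG5"];
const allAgree: boolean = big5Reports.every((n) => sameEncodingName(n, "big5"));
```

Note that this only papers over case differences. It does not reconcile results where browsers report different names altogether, such as "GBK" versus "gb2312" in the gbk test above.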