This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 26655 - Support mistakenly `utf-`-prefixed encodings seen in the wild
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
Reported: 2014-08-25 10:38 UTC by Mathias Bynens
Modified: 2014-08-26 07:26 UTC

Description Mathias Bynens 2014-08-25 10:38:46 UTC
https://github.com/ForbesLindesay/legacy-encoding/issues/1#issuecomment-53221336 links to Web content that uses weird encodings like `utf-8859-1`. At first glance, it looks like that should be `iso-8859-1`, which is a label for `windows-1252`. Maybe such names should be added as labels?
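
To make the label/encoding distinction concrete, here is a minimal sketch of a label lookup in the style of the Encoding Standard: trim ASCII whitespace, compare ASCII case-insensitively, and fail for anything not in the table. The `LABELS` table is a tiny hand-picked excerpt and `get_encoding` is a hypothetical helper name; neither is the spec's full table or any browser's actual code.

```python
import string

# Assumption: a small illustrative excerpt, not the full Encoding Standard table.
LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
}

ASCII_WHITESPACE = "\t\n\x0c\r "
# Lowercase only A-Z so that non-ASCII characters are never folded.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def get_encoding(label):
    """Return the encoding name for a label, or None if the label is unknown."""
    key = label.strip(ASCII_WHITESPACE).translate(ASCII_LOWER)
    return LABELS.get(key)


print(get_encoding(" ISO-8859-1 "))  # a known label for windows-1252
print(get_encoding("utf-8859-1"))    # not in the table, so the lookup fails
```

Under this scheme `utf-8859-1` only resolves if someone explicitly adds it to the table, which is what this bug asks for.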
Comment 1 Anne 2014-08-25 10:43:17 UTC
See bug 16773. Without conclusive data it would be dangerous to just add labels. E.g. from experience we know that euc_jp cannot be treated as euc-jp (which is why we do not follow UTS22). I'm inclined to mark this WONTFIX.
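
As a rough illustration of the `euc_jp` point, the sketch below contrasts exact label matching with a simplified loose-matching rule in the spirit of UTS22 (drop everything except ASCII letters and digits, then lowercase). The real UTS22 rule has more detail than this, and `KNOWN_LABELS`, `exact_match`, `loose_key`, and `loose_match` are hypothetical names used only for this example.

```python
import re

# Assumption: a one-entry excerpt of a label table, for illustration only.
KNOWN_LABELS = {"euc-jp": "EUC-JP"}


def exact_match(label):
    """Exact lookup: ASCII case-insensitive, with no punctuation folding."""
    return KNOWN_LABELS.get(label.strip().lower())


def loose_key(label):
    """Simplified loose key: keep only ASCII letters and digits, lowercased."""
    return re.sub(r"[^A-Za-z0-9]", "", label).lower()


def loose_match(label):
    """Loose lookup: labels are compared by their loose keys."""
    wanted = loose_key(label)
    for known, encoding in KNOWN_LABELS.items():
        if loose_key(known) == wanted:
            return encoding
    return None


print(exact_match("euc_jp"))  # None: the underscore variant is not a label
print(loose_match("euc_jp"))  # EUC-JP: loose matching folds "_" and "-" together
```

The two lookups disagree on `euc_jp`, so a blanket loosening rule changes how existing content is decoded; that is the kind of risk that calls for data before adding labels.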
Comment 2 Anne 2014-08-25 10:44:08 UTC
Also, that GitHub repository seems to confuse encodings with labels of encodings.
Comment 3 Addison Phillips 2014-08-25 16:00:11 UTC
The list in the GitHub repository appears to be a list of encoding labels found by crawling sources such as the W3C list archives. The existence of bad encoding labels in that list does not imply that user agents (or anyone else) interpret them properly or that they *should* interpret them. So I concur that this should be WONTFIX. The best way to get broken implementations fixed is for users to call up and complain that it doesn't work.
Comment 4 John Cowan 2014-08-25 16:58:41 UTC
Addison writes: "The best way to get broken implementations fixed is for users to call up and complain that it doesn't work."

By "implementations" do you mean "web pages"? Because, if so, that approach doesn't work. If you mean "browsers", I doubt that browsers will add one-off support for random labels like these.
Comment 5 Addison Phillips 2014-08-25 17:04:33 UTC
For web pages the best fix is to change the page to use the proper label. Making the broken label "work" is a bad idea.

In this case, my mind (first cup of coffee) fixated on the email aspect of the scraped data. If your mailer generates a bad encoding label, then you probably should call up and complain.
Comment 6 John Cowan 2014-08-25 17:24:45 UTC
I agree about mailers.

Reporting broken web pages is itself a broken process: it horribly fails to scale, and frequently there is no one to complain to. That's why we have to tolerate broken markup and have even needed to make a broken-markup standard. Broken encoding labels are just another part of that.

It's all very well to say "Move everything to UTF-8", but if it were that easy, XHTML would have been a smashing success.
Comment 7 Anne 2014-08-26 07:26:40 UTC
Changing encodings is much easier than completely revamping development practices, but this bug is about adding labels and per comment 1 that does not seem like a good idea without conclusive data.