This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 21617 - Do not limit e-mail address labels to 255 characters
Summary: Do not limit e-mail address labels to 255 characters
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 19117
Reported: 2013-04-08 11:52 UTC by Mounir Lamouri
Modified: 2013-06-16 10:21 UTC
CC List: 5 users

See Also:


Description Mounir Lamouri 2013-04-08 11:52:38 UTC
With the changes made in bug 19117, it was decided to limit each label (i.e. sub-domain) of an e-mail address to 255 characters [1]. This is based on RFC 1123 [2], which says that software MUST handle host names of up to 63 characters and SHOULD handle host names of up to 255 characters. I believe it is better to follow the MUST rule here; otherwise, the 255-character limit is not going to help in any way.

Also, the specification could use the Punycode decoding algorithm [3] to express that requirement. The specification currently says that the UA *may* Punycode-decode the value, but this behaviour is widely expected, and if the UA does it properly, a label can't be longer than 63 characters (as far as I understand it, 63 being the limit imposed by Punycode decoding).

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#valid-e-mail-address
[2] https://tools.ietf.org/html/rfc1123#section-2
[3] https://tools.ietf.org/html/rfc3492
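To make the Punycode point concrete, here is a small sketch using Python's built-in `idna` codec. That codec implements the older IDNA 2003 ToASCII conversion (nameprep plus Punycode), which is an assumption of convenience here and not necessarily the algorithm a UA or the URL Standard would apply. The helper name `to_ascii_domain` and the example domains are hypothetical; the sketch shows that conversion refuses labels longer than 63 characters.

```python
from typing import Optional


def to_ascii_domain(domain: str) -> Optional[str]:
    """Return the ASCII (Punycode) form of a domain, or None if conversion fails.

    Hypothetical helper built on Python's "idna" codec, which implements
    IDNA 2003 (nameprep + Punycode); browsers may follow different rules.
    """
    try:
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        # The codec raises UnicodeError (or a subclass) for empty labels
        # and for labels longer than 63 characters after conversion.
        return None


print(to_ascii_domain("bücher.example"))       # xn--bcher-kva.example
print(to_ascii_domain("a" * 64 + ".example"))  # None: label too long
```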
Comment 1 Anne 2013-04-08 13:26:16 UTC
1) You cannot just invoke Punycode without applying the other rules that apply to domain names. So you need to invoke some algorithm we'll define in the URL Standard once http://annevankesteren.nl/2012/11/idna-hell is sorted.

2) The 255-character limit will be enforced by that algorithm and applies to the domain name as a whole, not to its individual labels; those have a 63-character limit.

3) For the non-domain name part of the email address some other conversion algorithm might need to apply. Not entirely sure what they ended up with.
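The two limits in point 2 can be sketched as a pair of length checks. This is a hypothetical helper, assuming the domain is already in ASCII form and counting the separating dots toward the 255-character total; it checks lengths only, not which characters a label may contain.

```python
MAX_LABEL_LENGTH = 63    # per-label limit
MAX_DOMAIN_LENGTH = 255  # limit on the whole domain name, dots included


def domain_lengths_ok(domain: str) -> bool:
    """Hypothetical check of both length limits on an ASCII domain name."""
    if len(domain) > MAX_DOMAIN_LENGTH:
        return False
    # Every label must be non-empty and at most 63 characters long.
    return all(1 <= len(label) <= MAX_LABEL_LENGTH for label in domain.split("."))


print(domain_lengths_ok("www.w3.org"))              # True
print(domain_lengths_ok("a" * 64 + ".example"))     # False: one label too long
print(domain_lengths_ok(".".join(["a" * 63] * 4)))  # True: 4 * 63 + 3 dots = 255
print(domain_lengths_ok(".".join(["a" * 63] * 5)))  # False: 319 characters
```

Under a 255-character limit per label, a single label could use up the entire budget for the name, which is why the per-label limit has to be the smaller one.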
Comment 2 Ian 'Hixie' Hickson 2013-04-22 22:37:49 UTC
Anne: Do you have an ETA for the URL spec's host parsing stuff? (I can wait if it's soonish.)
Comment 3 Anne 2013-04-23 16:27:18 UTC
It depends on what happens with IDNA. I could define the abstract bits around it, I suppose. My idea was for it to return an IPv6 address, a list of labels, or failure. However, maybe you want a higher level of abstraction that parses and serializes or something?
Comment 4 Ian 'Hixie' Hickson 2013-04-24 00:00:10 UTC
I dunno, for this bug I don't really know what I need exactly, if anything.
Comment 5 Marcus Bointon 2013-05-16 11:57:53 UTC
I just commented in bug 19117 (which already says what to do with this issue), but to be clear, the ABNF comment should be changed from:

label         = let-dig [ [ ldh-str ] let-dig ]  ; limited to a length of 255 characters by RFC 1123 section 2.1

to

label         = let-dig [ [ ldh-str ] let-dig ]  ; limited to a length of 63 characters by RFC 1123 section 2.1

And the sample regex should be changed from:

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,253}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,253}[a-zA-Z0-9])?)*$/

to

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

As Anne says, I don't think that Punycode has any bearing on this, since it's just a layer on top of RFC 1123, not a replacement. We could define a separate pattern to match IDNA addresses too, but it would be fairly different from this one and would have to wait for the confusion to settle down.

Side note: I commented in the other ticket about a problem with this regex. It was down to a bug in PHP's PCRE implementation: the / within the initial character class should not need escaping, but it does in PHP.
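One way to check the corrected pattern from comment 5 is to run it against labels of exactly 63 and 64 characters. The sketch below ports the regex into Python's `re` module unchanged apart from dropping the JavaScript `/.../` delimiters (Python patterns have no delimiter, so `/` needs no escaping). The helper name and test addresses are made up. Note that the pattern limits each label but does not enforce the 255-character limit on the domain as a whole.

```python
import re

# Corrected pattern from comment 5, split across lines for readability.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@"
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def is_valid_email(address: str) -> bool:
    # fullmatch rejects a trailing newline, which "$" alone would allow.
    return EMAIL_RE.fullmatch(address) is not None


print(is_valid_email("user@" + "a" * 63 + ".example"))  # True: 63-character label
print(is_valid_email("user@" + "a" * 64 + ".example"))  # False: 64-character label
print(is_valid_email("user@-bad.example"))              # False: leading hyphen
```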
Comment 6 Ian 'Hixie' Hickson 2013-06-10 21:11:29 UTC
Ok given that the URL spec isn't ready yet, I'm ignoring the stuff about punycoding for now.

I don't understand why we should not follow the "SHOULD handle host names of up to 255 characters" conformance requirement from RFC 1123. Can you elaborate?
Comment 7 Marcus Bointon 2013-06-10 22:18:55 UTC
It's very simple! As Anne said, the host name is limited to 255 chars overall, but the labels that make it up are individually limited to 63 chars. This site's host name, for example, has three labels: 'www', 'w3' and 'org'; none of them may exceed 63 chars, but the whole lot together, including the separating dots (which is why it's 63 and not 64 chars), is limited to 255.

The problem is that the length limit mentioned in the ABNF comment for the label was the limit for the host name, not for the label.
Comment 8 Ian 'Hixie' Hickson 2013-06-14 18:49:54 UTC
Oh, I see my confusion. RFC 1123 uses "host name" to mean the fully-qualified host name, and I thought it just meant the local name. My bad.
Comment 9 contributor 2013-06-14 18:49:59 UTC
Checked in as WHATWG revision r7978.
Check-in comment: Limit labels in e-mail addresses to 63 characters, not 255.
http://html5.org/tools/web-apps-tracker?from=7977&to=7978