This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11124 - consider reducing verbosity when talking about code points
Summary: consider reducing verbosity when talking about code points
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: PC Windows NT
Importance: P2 trivial
Target Milestone: ---
Assignee: Maciej Stachowiak
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords: WGDecision
Depends on:
Blocks:
 
Reported: 2010-10-22 15:53 UTC by Julian Reschke
Modified: 2012-08-27 19:36 UTC
CC List: 8 users

See Also:


Attachments

Description Julian Reschke 2010-10-22 15:53:15 UTC
Example:

"a valid non-negative integer, followed by a U+003B SEMICOLON character (;), followed by one or more space characters, followed by a substring that is an ASCII case-insensitive match for the string "URL", followed by a U+003D EQUALS SIGN character (=), followed by a valid URL."

It's unclear to me why we can't just call the characters by name, or mention them verbatim. The current style of prose makes the spec hard to read.

Sets and sequences of code points that appear frequently could be assigned names upfront, see, for instance, <http://greenbytes.de/tech/webdav/rfc5234.html#rfc.section.B.1>.
Comment 1 Julian Reschke 2010-10-22 15:54:06 UTC
See also conversation around <http://krijnhoetmer.nl/irc-logs/whatwg/20101018#l-184>.
Comment 2 Anne 2010-10-22 16:16:14 UTC
For CSSOM whenever it is not a simple string of more than one character I use "<code>CHARACTER</code>" (U+XXXX). E.g. "<code>(</code>" (U+0028). For certain characters I use something slightly different, e.g. space (U+0020).

I guess at some point we should set up some kind of style guide.
Comment 3 Henri Sivonen 2010-10-22 18:03:59 UTC
(In reply to comment #0)
> Sets and sequences of code points that appear frequently could be assigned
> names upfront, see, for instance,
> <http://greenbytes.de/tech/webdav/rfc5234.html#rfc.section.B.1>.

I disapprove of the RFC style of giving ad hoc names to Unicode characters. I object to using that style in HTML5.

I want to see the Unicode code point in the U+hhhh notation and the literal character inline in the spec prose without indirection. I don't care about the UPPERCASE UNICODE NAME, but what's in the spec now works for me.
Comment 4 Aryeh Gregor 2010-10-22 18:25:45 UTC
I think the UPPERCASE NAME is excessive.  Just give the character itself and the code point, like

"""
a valid non-negative integer, followed by ";" (U+003B), followed by one or more space characters, followed by a substring that is an ASCII case-insensitive match for the string "URL", followed by "=" (U+003D), followed by a valid URL.
"""

or maybe

"""
a valid non-negative integer, followed by U+003B (;), followed by one or more space characters, followed by a substring that is an ASCII case-insensitive match for the string "URL", followed by U+003D (=), followed by a valid URL.
"""

or whatever.  It's more concise and easier to read, but no less precise.  You should only need to give a name when the character is whitespace or a combining diacritic or something, and in that case you should just use a simple description like "space", "tab", "CR", "LF", not the full Unicode name.
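
A minimal sketch of that compact notation, assuming Python. The function name `describe` and the `SIMPLE_NAMES` table are hypothetical and only illustrate the style suggested here; neither comes from the spec.

```python
# Hypothetical short names for characters that cannot be shown verbatim.
SIMPLE_NAMES = {" ": "space", "\t": "tab", "\r": "CR", "\n": "LF"}


def describe(ch: str) -> str:
    """Render one character as '"c" (U+XXXX)', or 'name (U+XXXX)' for whitespace."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code_point = f"U+{ord(ch):04X}"
    if ch in SIMPLE_NAMES:
        # Whitespace gets a simple description instead of the literal.
        return f"{SIMPLE_NAMES[ch]} ({code_point})"
    return f'"{ch}" ({code_point})'


print(describe(";"))  # the literal character followed by its code point
print(describe(" "))  # a simple description followed by its code point
```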

(In reply to comment #3)
> I disapprove of the RFC style of giving ad hoc names to Unicode characters. I
> object to using that style in HTML5.

It's already used in some places, e.g., "space characters".
Comment 5 Julian Reschke 2010-10-22 19:09:50 UTC
(In reply to comment #3)
> (In reply to comment #0)
> > Sets and sequences of code points that appear frequently could be assigned
> > names upfront, see, for instance,
> > <http://greenbytes.de/tech/webdav/rfc5234.html#rfc.section.B.1>.
> 
> I disapprove of the RFC style of giving ad hoc names to Unicode characters. I
> object to using that style in HTML5.

This is not "the" RFC style. There is no single RFC style. It's just an example.

Also, I should have mentioned that I'm mainly interested in *ASCII* characters; I have no problem with the spec keeping the level of verbosity for characters outside the ASCII range.

Finally, calling Carriage Return and Line Feed "CR" and "LF" may be "ad hoc", but it's certainly something readers understand.

> I want to see the Unicode code point in the U+hhhh notation and the literal
> character inline in the spec prose without indirection. I don't care about the
> UPPERCASE UNICODE NAME, but what's in the spec now works for me.

I think getting rid of the UPPERCASE UNICODE NAME (move it into title attribute, or make it a link?) would be a vast improvement.

That being said, I'd like to understand why you think it's a bad idea to define a few sets once and for all, such as DIGIT, ALPHA or HEXDIGIT. Unless I'm mistaken, the spec already does this for other things, such as whitespace characters.
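
For illustration only, named sets of ASCII characters that are defined once and then reused might look like the sketch below. The names DIGIT, ALPHA and HEXDIGIT come from the comment above; the helper `consists_of` is hypothetical, and letting HEXDIGIT cover both letter cases is an assumption.

```python
import string

# Each set is defined once and referred to by name afterwards.
DIGIT = frozenset(string.digits)            # "0" to "9"
ALPHA = frozenset(string.ascii_letters)     # "A" to "Z" and "a" to "z"
HEXDIGIT = frozenset(string.hexdigits)      # digits plus "a"-"f" and "A"-"F" (assumed)


def consists_of(text: str, charset: frozenset) -> bool:
    """Return True if text is non-empty and every character is in charset."""
    return bool(text) and all(ch in charset for ch in text)


print(consists_of("003B", HEXDIGIT))
print(consists_of("URL", DIGIT))
```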
Comment 6 Ian 'Hixie' Hickson 2010-12-27 23:37:12 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: I like it the way it is.
Comment 7 Sam Ruby 2011-01-05 18:33:38 UTC
http://www.w3.org/html/wg/tracker/issues/150
Comment 9 Michael[tm] Smith 2011-08-04 05:05:27 UTC
mass-moved component to LC1
Comment 10 Michael[tm] Smith 2011-11-20 18:01:04 UTC
(In reply to comment #8)
> Decision:
> http://lists.w3.org/Archives/Public/public-html/2011Jul/0213.html
> 
> Change Proposal:
> http://lists.w3.org/Archives/Public/public-html/2011Feb/0120.html

Something went wrong here.

This bug rightly should never have been escalated to an issue to begin with.

Do we really want to micro-manage editorial choices about which there are a variety of not-particularly-strong opinions and that therefore amount purely to judgement calls that properly are best left up to editorial discretion -- and that have no effect either way on actual implementation conformance criteria?

Making this change would require a significant amount of manual work on the part of an editor. And to what end? How does this rank as a priority against other, actually important things that need to be done?
Comment 11 Ms2ger 2011-11-20 20:21:56 UTC
(In reply to comment #10)
> (In reply to comment #8)
> > Decision:
> > http://lists.w3.org/Archives/Public/public-html/2011Jul/0213.html
> > 
> > Change Proposal:
> > http://lists.w3.org/Archives/Public/public-html/2011Feb/0120.html
> 
> Something went wrong here.
> 
> This bug rightly should never have been escalated to an issue to begin with.

Indeed.

> Do we really want to micro-manage editorial choices about which
> there are a variety of not-particularly-strong opinions and that therefore
> amount purely to judgement calls that properly are best left up to editorial
> discretion -- and that have no effect either way on actual implementation
> conformance criteria?

Yes, this is exactly what the WG Chairs would like to do, AFAICT from their decisions. Unfortunately, you're paid to deal with their stupidity.
Comment 12 Michael[tm] Smith 2011-11-20 23:58:00 UTC
(In reply to comment #11)
> Yes, this is exactly what the WG Chairs would like to do, AFAICT from their
> decisions. Unfortunately, you're paid to deal with their stupidity.

No.

First off, I think we can all agree that we need to remain civil and respectful in discussions here in bugzilla, just as we do on our mailing lists. Doing otherwise is just an unproductive use of everybody's time. So let's please not do that.

We're all of us here working together to solve problems. So let's try to do that and not resort to incivilities and name calling. Please.

So, that said, I regret posting my most recent comment. That comment doesn't reflect what I really think. The only excuse I can make for posting it is that I was at the time (very late on Sunday evening...) in the midst of going through dozens of unresolved LC bugs to figure out what actions needed to be taken to move them further toward resolution. I should have waited before I posted that comment. But I didn't, so here we are now. Going forward, I'll make an effort to think things through more carefully before posting comments.

So let's please leave it at that and get back to working together on getting stuff done in the best ways we can.
Comment 13 Ian 'Hixie' Hickson 2012-02-14 23:33:13 UTC
BTW if anyone can write a self-contained perl or python script that can be run against the file here:

   http://dev.w3.org/html5/spec/Overview.html

...that applies this decision, that would make my life a lot easier. (If you are interested in doing that let me know and I can help you — there might be things I can do to make it simpler, e.g. applying it at a different point in the pipeline.)
Comment 14 Michael[tm] Smith 2012-02-15 05:21:10 UTC
(In reply to comment #13)
> BTW if anyone can write a self-contained perl or python script that can be run
> against the file here:
> 
>    http://dev.w3.org/html5/spec/Overview.html
> 
> ...that applies this decision, that would make my life a lot easier. (If you
> are interested in doing that let me know and I can help you — there might be
> things I can do to make it simpler, e.g. applying it at a different point in
> the pipeline.)

Julian?
Comment 15 Julian Reschke 2012-02-15 07:42:05 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > BTW if anyone can write a self-contained perl or python script that can be run
> > against the file here:
> > 
> >    http://dev.w3.org/html5/spec/Overview.html
> > 
> > ...that applies this decision, that would make my life a lot easier. (If you
> > are interested in doing that let me know and I can help you — there might be
> > things I can do to make it simpler, e.g. applying it at a different point in
> > the pipeline.)
> 
> Julian?

If somebody can supply a sample that can be extended, I can give it a try (my choice would be XSLT2, but it would affect source formatting....)
Comment 16 Michael[tm] Smith 2012-02-15 12:38:59 UTC
(In reply to comment #13)
> BTW if anyone can write a self-contained perl or python script that can be run
> against the file here:
> 
>    http://dev.w3.org/html5/spec/Overview.html

perl -0777 -pe 's/(U\+[A-F0-9]{4})\s[A-Z\s-]+(\scharacter(s)?)?\s\((.{1,3})\)/"$4" \($1\)$2/g' Overview.html
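
Since the request was for Perl or Python, here is a rough Python equivalent of the substitution above, as a sketch only. The function name `shorten_code_points` is hypothetical, and this is not the fuller script linked later in this bug. Like the one-liner, it keeps a trailing " character" after the rewritten code point.

```python
import re

# Same pattern as the one-liner: code point, UPPERCASE UNICODE NAME,
# optional " character(s)", then the literal character in parentheses.
PATTERN = re.compile(
    r'(U\+[A-F0-9]{4})\s[A-Z\s-]+(\scharacter(s)?)?\s\((.{1,3})\)'
)


def shorten_code_points(text: str) -> str:
    """Rewrite 'U+003B SEMICOLON character (;)' as '";" (U+003B) character'."""
    def repl(match):
        suffix = match.group(2) or ""  # " character" / " characters", if present
        return f'"{match.group(4)}" ({match.group(1)}){suffix}'
    return PATTERN.sub(repl, text)


sample = "followed by a U+003B SEMICOLON character (;), followed by a valid URL."
print(shorten_code_points(sample))
```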
Comment 17 Michael[tm] Smith 2012-02-20 16:36:30 UTC
(In reply to comment #13)
> BTW if anyone can write a self-contained perl or python script that can be run
> against the file here:
> 
>    http://dev.w3.org/html5/spec/Overview.html
> 
> ...that applies this decision, that would make my life a lot easier. (If you
> are interested in doing that let me know and I can help you — there might be
> things I can do to make it simpler, e.g. applying it at a different point in
> the pipeline.)

I have what I think is a complete script that attempts to implement both parts of the change proposal and that seems to work as expected:

http://people.w3.org/mike/fixes/bs.pl
Comment 18 Julian Reschke 2012-02-20 22:04:28 UTC
(In reply to comment #17)
> (In reply to comment #13)
> > BTW if anyone can write a self-contained perl or python script that can be run
> > against the file here:
> > 
> >    http://dev.w3.org/html5/spec/Overview.html
> > 
> > ...that applies this decision, that would make my life a lot easier. (If you
> > are interested in doing that let me know and I can help you — there might be
> > things I can do to make it simpler, e.g. applying it at a different point in
> > the pipeline.)
> 
> I have what I think is a complete script that attempts to implement both parts
> of the change proposal and that seems to work as expected:
> 
> http://people.w3.org/mike/fixes/bs.pl

Mike, thanks A LOT for working on this; I think it's a great starting point.

Maybe, to move forward, we can split this into two subtasks: defining the character classes and using them, and then reducing the remaining verbosity? I *believe* the first one should be less controversial.

It also seems that applying the patch may require some grammar post-tuning; I'm happy to help with that once we agree on how to proceed.
Comment 19 Michael[tm] Smith 2012-02-21 08:59:25 UTC
I found some problems in the output from the script and have updated the script to fix them.

http://people.w3.org/mike/fixes/bs.pl
Comment 20 Simon Pieters 2012-03-15 10:14:27 UTC
This seems to have some problems in http://dev.w3.org/html5/spec/the-script-element.html#restrictions-for-contents-of-script-elements
Comment 21 Maciej Stachowiak 2012-03-15 16:14:37 UTC
Would it be appropriate to mark this resolved and then file bugs on specific cases where the test is failing?
Comment 22 Sam Ruby 2012-08-15 15:19:37 UTC
(In reply to comment #19)
> I found some problems in the output from script and have updated the script to
> fix them.
> 
> http://people.w3.org/mike/fixes/bs.pl

Ported to Python:

https://github.com/w3c/html/commit/1961b9f61501f0fc3801e0ac20a38ab43b9ef0fe
Comment 23 Ms2ger 2012-08-15 17:13:26 UTC
Filter on [Idon'tcareaboutHTMLWGbugspam].
Comment 24 Sam Ruby 2012-08-16 13:08:58 UTC
Output from spec splitter after running this script:

warning: can't find target for #ascii-digits
warning: can't find target for #conforming-documents
warning: can't find target for #lowercase-ascii-letters
warning: can't find target for #uppercase-ascii-letters
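
Warnings of this kind can be reproduced with a small check for fragment links that have no matching id. The sketch below is not the spec splitter itself: `missing_targets` and the sample markup are hypothetical, and it only inspects `id` and `href` attributes.

```python
from html.parser import HTMLParser


class _AnchorCollector(HTMLParser):
    """Collect every id attribute and every same-document fragment link."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.fragments = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "href" and value.startswith("#") and len(value) > 1:
                self.fragments.add(value[1:])


def missing_targets(html: str) -> list:
    """Return the sorted fragment names that are linked to but never defined."""
    collector = _AnchorCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.fragments - collector.ids)


sample = (
    '<dfn id="space-character">space character</dfn>'
    '<a href="#space-character">space characters</a>'
    '<a href="#ascii-digits">ASCII digits</a>'
)
for name in missing_targets(sample):
    print(f"warning: can't find target for #{name}")
```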
Comment 25 Sam Ruby 2012-08-26 15:17:14 UTC
Unless we hear otherwise, we are going to assume that this change is complete, and that further requests will come in the form of new bugs:

http://lists.w3.org/Archives/Public/public-html/2012Aug/0398.html
http://lists.w3.org/Archives/Public/public-html/2012Aug/0404.html
Comment 26 Edward O'Connor 2012-08-27 19:36:15 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: https://github.com/w3c/html/commit/1961b9f61501f0fc3801e0ac20a38ab43b9ef0fe and https://github.com/w3c/html/commit/c68e86b4736f4e7a11c4209a06976ec0618529bd
Rationale: Resolving per Sam's comment.