W3C

Internationalization Core Working Group Teleconference

22 Aug 2012

Attendees

Present
Addison_Phillips, matial, Richard, fsasaki, Norbert, koji, andrewc
Regrets
Chair
Addison Phillips
Scribe
Addison Phillips

Action Items

Filed: https://www.w3.org/Bugs/Public/show_bug.cgi?id=18646

close ACTION-142

<trackbot> ACTION-142 File a ticket on HTML5 asking for accommodation of IANA time zone IDs closed

Prepare some examples of ITS2NIF showing NFC issue for consideration next week

felix: probably take a little while

Info Share

richard: updated templates/boilerplates for articles
... now show live results from test framework database
... still doing some tweaks for maintainability
... if you look at the test page

<r12a> http://www.w3.org/International/tests/

richard: the list is a pain to maintain
... so removing links to list pages
... because results page does that work

<r12a> http://www.w3.org/International/tests/html-css/character-encoding/results-basics

richard: above is link to results page

Charter

richard: end date for charter renewal is close
... not a lot of response
... 9 currently
... would like more
... anyone on the call with an AC Rep, please get them to respond

TURTLE (yes, again, but we finish this time)

http://lists.w3.org/Archives/Public/www-international/2012JulSep/0072.html

<r12a> http://lists.w3.org/Archives/Public/www-international/2012JulSep/0076.html

http://www.w3.org/2012/08/15-i18n-minutes.html#item06

norbert: not actually another type to use for 6a. don't complain about it

addison: okay

>> 7. Section 2.6. These escapes are malformed or use a questionable syntax:
>>
>> The characters -, \uB7, \u300 to \u36F and \u203F to 2040 are permitted anywhere except the first character.
>
> + Use the U+XXXX or U+XXXXXX notation to refer to code points in the specification (rather than other escaped forms).

(also name the characters?)

addison: appears to be a "poor man's" attempt to prevent including combining marks
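
To illustrate the notation requested in comment 7, here is a minimal Python sketch. The helper names `u_plus` and `describe` are hypothetical and come from neither the Turtle draft nor the review. The sketch prints a code point in U+XXXX form with its Unicode character name, which also covers the "name the characters" suggestion.

```python
import unicodedata

def u_plus(code_point: int) -> str:
    """Format a code point as U+XXXX (at least four uppercase hex digits)."""
    return f"U+{code_point:04X}"

def describe(code_point: int) -> str:
    """Return the U+ notation followed by the Unicode character name."""
    return f"{u_plus(code_point)} {unicodedata.name(chr(code_point))}"

# Some of the code points quoted from Section 2.6, written unambiguously.
for cp in (0x00B7, 0x0300, 0x203F, 0x2040):
    print(describe(cp))
```

Written this way, the malformed `\uB7` and `\u300` become U+00B7 and U+0300, with no doubt about how many hex digits are intended.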

>> 8. Section 3. The reference to #xA; is making some assumptions about line terminators too :-)
>
> + Don't make assumptions about line terminator characters.
>
>> An example of two identical triples containing literal objects containing newlines, written in plain and long literal forms. Assumes that line feeds in this document are #xA. (example3.ttl):

norbert: should define line terminators

>> 9. Section 5.1. The following might need attention:
>>
>> The media type of Turtle is text/turtle. The content encoding of Turtle content is always UTF-8. Charset parameters on the mime type are required until such time as the text/ media type tree permits UTF-8 to be sent without a charset parameter. See section B Internet Media Type, File Extension and Macintosh File Type for the media type registration form.

okay as is

>> 10. Section 6. Refers to TURTLE documents as being encoded as UTF-8. In practice, UTF-8 is a serialization. The actual document should just be "a sequence of Unicode characters". This probably needs to distinguish more clearly between processing and storage/transmission. For the former it's just a sequence of Unicode characters, for the latter it's UTF-8.

>> 11. Section 6.2. Says in part this: "continue to the end of line (marked by characters U+000D or U+000A)" which again makes assumptions about line terminators. Should there be a rule for line termination? Such as this? http://ecma-international.org/ecma-262/5.1/#sec-7.3
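
For illustration, the kind of explicit line-termination rule comment 11 points to could be sketched as follows. The four code points are the ones listed in ECMA-262 5.1 section 7.3. Treating CR LF as a single terminator and the function name `split_lines` are assumptions of this sketch, not something the Turtle draft specifies.

```python
import re

# Line terminators listed in ECMA-262 5.1, section 7.3:
# U+000A (LF), U+000D (CR), U+2028 (LINE SEPARATOR), U+2029 (PARAGRAPH SEPARATOR).
# Assumption of this sketch: a CR immediately followed by LF counts as one terminator.
LINE_TERMINATOR = re.compile("\r\n|[\n\r\u2028\u2029]")

def split_lines(text: str) -> list[str]:
    """Split text on any of the line terminators defined above."""
    return LINE_TERMINATOR.split(text)

print(split_lines("first\r\nsecond\u2028third\nfourth"))
```

A rule like this removes the need to guess which terminator a document uses.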

>> 12. Section 6.4. The \u (lowercase u) syntax allows:
>>
>> A Unicode codepoint in the range U+0 to U+FFFF inclusive corresponding to the value encoded by the four hexadecimal digits interpreted from most significant to least significant digit.
>>
>> This is probably wrong, given that the surrogate code points fall into this range. No mention is made of surrogate pair handling. And since there is a second form that can handle the complete Unicode charac
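
The surrogate concern in comment 12 can be made concrete with a small sketch. It assumes a parser chooses to reject surrogate code points (U+D800 to U+DFFF) in the four-digit form. The draft as quoted does not say what should happen, so both that behaviour and the function name `decode_u4` are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def decode_u4(hex_digits: str) -> str:
    """Decode the four hex digits of a \\uXXXX escape into one character.

    Assumption of this sketch: surrogate code points are rejected, because
    they are not Unicode scalar values and cannot stand for a character alone.
    """
    if len(hex_digits) != 4 or not set(hex_digits) <= HEX_DIGITS:
        raise ValueError(f"expected exactly four hex digits, got {hex_digits!r}")
    code_point = int(hex_digits, 16)
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"U+{code_point:04X} is a surrogate code point")
    return chr(code_point)

print(decode_u4("00E9"))  # LATIN SMALL LETTER E WITH ACUTE
```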

>> 13. Section 6.4 contains this Note:
>>
>> --
>> %-encoded sequences are in the character range for IRIs and are explicitly allowed in local names. These appear as a '%' followed by two hex characters and represent that same sequence of three characters. These sequences are not decoded during processing. A term written as <http://a.example/%66oo-bar> in Turtle designates the IRI http://a.example/%66oo-bar and not IRI http://a.example/foo-bar. A term writte

richard: why do you think they do that?

addison: let's ask
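
As an illustration of the behaviour described in the Note quoted in comment 13, the sketch below compares the IRI as written with its percent-decoded form. Python's `urllib.parse.unquote` is used only to show what decoding would produce. It is not part of Turtle processing, which per the Note does not decode.

```python
from urllib.parse import unquote

written = "http://a.example/%66oo-bar"

# What a percent-decoding step would produce (the Note says Turtle does NOT do this).
decoded = unquote(written)

print(written)             # the IRI the term designates, kept as written
print(decoded)             # a different string after decoding
print(written == decoded)  # compared character by character, they differ
```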

>> 14. Section 6.5 (Grammar) defines LANGTAG far more permissively than BCP 47 does--even in its obsolete forms, to wit:
>>
>> [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>
>> It would be better to define this at least in terms of BCP 47's "obs-language-tag" production:
>>
>> obs-language-tag = primary-subtag *( "-" subtag )
>> primary-subtag = 1*8ALPHA
>> subtag = 1*8(ALPHA / DIGIT)
>>
>> Since the RDF spec refers

norbert: RDF already references LANGTAG in Section 2.1 of BCP 47
... so why not use it?

addison: okay to refer to full definition
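
To show how much more permissive the draft's LANGTAG production is, the two grammars quoted in comment 14 can be written as regular expressions and compared. This is a sketch: the regex translations and the helper names are assumptions, and neither pattern checks tags against the full BCP 47 grammar or the subtag registry.

```python
import re

# Turtle draft: LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
TURTLE_LANGTAG = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")

# BCP 47 obs-language-tag: primary-subtag *( "-" subtag ), each subtag 1 to 8 characters.
OBS_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def accepted_by_turtle(tag: str) -> bool:
    """True if '@' + tag matches the draft's LANGTAG production."""
    return TURTLE_LANGTAG.fullmatch("@" + tag) is not None

def accepted_by_obs(tag: str) -> bool:
    """True if tag matches the obs-language-tag production."""
    return OBS_LANGUAGE_TAG.fullmatch(tag) is not None

for tag in ("en-US", "abcdefghijk", "en-abcdefghij"):
    print(tag, accepted_by_turtle(tag), accepted_by_obs(tag))
```

Subtags longer than eight characters pass the draft production but fail even the obsolete BCP 47 one, which is the gap the comment describes.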

>> 15. Same section. PN_CHARS_BASE erases various Unicode ranges without explanation. This appears to be an attempt to eliminate combining marks and the surrogates. This probably isn't how to do this?

http://www.w3.org/TR/2012/WD-turtle-20120710/#grammar-production-PN_CHARS_BASE

richard: do we have a list of this kind?
... we used to refer to XML?

http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax

norbert: what are they really trying to accomplish here? just identifiers? or something else?

addison: not sure if UTR31 is too restrictive??
... should we say "charmod-norm wrong"?

norbert: start with question: what are you trying to do here?

>> 16. Appendix B contains this note:
>>
>> Encoding considerations:
>> The syntax of Turtle is expressed over code points in Unicode [UNICODE]. The encoding is always UTF-8 [UTF-8].
>> Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-Fa-f]

>> 17. Appendix B contains security considerations, that reference UTR#36 (good). Should there also be reference to UTS#39??

http://www.unicode.org/reports/tr39/

>> 18. In "References", the Unicode reference is to Unicode 4.0, which is well out of date. The current version is 6.1, for example.

> W3C I18N Techniques: Developing specifications
> Referencing the Unicode Standard
> http://www.w3.org/International/techniques/developing-specs#unicoderef

norbert: but a specific version may be needed for implementation consistency, e.g. for identifier syntax

addison: but not restrict code point usage to a specific version

richard: note that above link doesn't go directly to charmod
... but to our techniques page

<scribe> ACTION: addison: move TURTLE comments to tracker and send the comments to the WG using the usual process [recorded in http://www.w3.org/2012/08/22-i18n-minutes.html#action01]

<trackbot> Created ACTION-146 - Move TURTLE comments to tracker and send the comments to the WG using the usual process [on Addison Phillips - due 2012-08-29].

richard: made a few changes in the process document, so take a look at it again

AOB?

<scribe> ACTION: addison: send TPAC registration reminder [recorded in http://www.w3.org/2012/08/22-i18n-minutes.html#action02]

<trackbot> Created ACTION-147 - Send TPAC registration reminder [on Addison Phillips - due 2012-08-29].

<matial> Bye

Summary of Action Items

[NEW] ACTION: addison: move TURTLE comments to tracker and send the comments to the WG using the usual process [recorded in http://www.w3.org/2012/08/22-i18n-minutes.html#action01]
[NEW] ACTION: addison: send TPAC registration reminder [recorded in http://www.w3.org/2012/08/22-i18n-minutes.html#action02]
 
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2012/09/20 15:11:43 $