Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document outlines the way in which the HTML Working Group addressed the comments submitted during the XHTML-Print Last Call Working Draft review period.
During the Last Call Working Draft review period for XHTML-Print a number of comments were received from both inside and outside of the W3C. This document summarizes those comments and describes the ways in which the comments were addressed by the HTML Working Group.
Note that the majority of this document is automatically generated from the Working Group's database of comments. As such, it may contain typographical or stylistic errors. If so, these are contained in the original submissions, and the HTML Working Group elected to not change these submissions.
This document is a product of the W3C's HTML Working Group. This document may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use this document as reference material or to cite it as other than "work in progress". This document is work in progress and does not imply endorsement by the W3C membership.
This document has been produced as part of the W3C HTML Activity. The goals of the HTML Working Group (members only) are discussed in the HTML Working Group charter (members only).
Please send detailed comments on this document to www-html-editor@w3.org. We cannot guarantee a personal response, but we will try when it is appropriate. Public discussion on HTML features takes place on the mailing list www-html@w3.org.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
PROBLEM ID: 6492
STATE: Closed
RESOLUTION: Modify and Accept
USER POSITION: Agree
NOTES:
Fixed incorrect syntax of example
ORIGINAL MESSAGE:
From: Jun Fujisawa <fujisawa.jun@canon.co.jp>
To: www-html-editor@w3.org
Cc: www-html@w3.org, xp@pwg.org,
Jon Ferraiolo <jon.ferraiolo@adobe.com>
Subject: Incorrect example in Appendix B.3 of XHTML Print
Date: Fri, 25 Jul 2003 12:48:47 +0900
Message-Id: <p05111011bb4654080f6f@[172.23.45.13]>
X-Archived-At: http://www.w3.org/mid/p05111011bb4654080f6f@%5B172.23.45.13%5D
Hello HTML editors,
Here is a comment to the last call draft for XHTML Print.
At 6:28 PM +0200 03.7.24, Steven Pemberton wrote:
>XHTMLT-Print
>http://www.w3.org/MarkUp/Group/2003/WD-xhtml-print-20030723
Jon Ferraiolo of SVG WG found out that the example in Appendix
B.3 looks strange since the two instances of 'object' element have
the sample value for 'id' attribute in a single XML document.
<object declare="declare"
height="20 mm" width="20 mm"
type="image/jpeg"
id="image_1" >
</object>
. . . .
<object id="image_1"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
</object>
I believe the example is not correct. Also, I think the choice of this
particular example is not appropriate because we don't need to use
the case for 'object' element with 'declare' attributes in order to
show how we can include inline image data in XHTML-Print by using
data URI scheme.
I would like to suggest to replace this example by simpler ones such
as the following:
<object height="20 mm" width="20 mm" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
Example Image
</object>
or
<img height="20 mm" width="20 mm" alt="Example Image"
src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
--
Jun Fujisawa
<mailto:fujisawa.jun@canon.co.jp>
FOLLOWUP 1:
From: Masayasu Ishikawa <mimasa@w3.org>
So, we are receiving Last Call comments even before publication. Great.
Jim, do you think this is an easy-to-fix thing that we should just do it now (i.e. fix it and publish the Last Call WD, which should happen today), or leave it for now and fix later?
--
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
mimasa@w3.mag.keio.ac.jp wrote:
> From: Jun Fujisawa <fujisawa.jun@canon.co.jp>
> To: www-html-editor@w3.org
> Cc: www-html@w3.org, xp@pwg.org,
> Jon Ferraiolo <jon.ferraiolo@adobe.com>
> Subject: Incorrect example in Appendix B.3 of XHTML Print
> Date: Fri, 25 Jul 2003 12:48:47 +0900
> Message-Id: <p05111011bb4654080f6f@[172.23.45.13]>
> X-Archived-At: http://www.w3.org/mid/p05111011bb4654080f6f@%5B172.23.45.13%5D
>
> Hello HTML editors,
>
> Here is a comment to the last call draft for XHTML Print.
>
> At 6:28 PM +0200 03.7.24, Steven Pemberton wrote:
> >XHTMLT-Print
> >http://www.w3.org/MarkUp/Group/2003/WD-xhtml-print-20030723
>
> Jon Ferraiolo of SVG WG found out that the example in Appendix
> B.3 looks strange since the two instances of 'object' element have
> the sample value for 'id' attribute in a single XML document.
>
> <object declare="declare"
> height="20 mm" width="20 mm"
> type="image/jpeg"
> id="image_1" >
> </object>
>
> . . . .
>
> <object id="image_1"
> data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
> </object>
>
> I believe the example is not correct. Also, I think the choice of this
> particular example is not appropriate because we don't need to use
> the case for 'object' element with 'declare' attributes in order to
> show how we can include inline image data in XHTML-Print by using
> data URI scheme.
>
> I would like to suggest to replace this example by simpler ones such
> as the following:
>
> <object height="20 mm" width="20 mm" type="image/jpeg"
> data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
> Example Image
> </object>
>
> or
>
> <img height="20 mm" width="20 mm" alt="Example Image"
> src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
>
> --
> Jun Fujisawa
> <mailto:fujisawa.jun@canon.co.jp>
FOLLOWUP 2:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
> From: don@lexmark.com [mailto:don@lexmark.com]
> Sent: Friday, July 25, 2003 5:15 AM
> To: Jun Fujisawa
> Cc: xp@pwg.org; jim.bigelow@hp.com
> Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print
>
> Jun:
>
> The intent of this example was to show how an image can be
> declared inline with the other XHTML while the actual data
> for the image may come later. Neither of your two
> alternatives separate the declaration of the image from the
> actual data of the image. If the example provided is
> incorrect, can you provide an example that achieves this separation?
>
> **********************************************
> Don Wright don@lexmark.com
>
> Chair, IEEE SA Standards Board
> Member, IEEE-ISTO Board of Directors
> f.wright@ieee.org / f.wright@computer.org
>
> Director, Alliances & Standards
> Lexmark International
> 740 New Circle Rd
> Lexington, Ky 40550
> 859-825-4808 (phone) 603-963-8352 (fax)
> **********************************************
FOLLOWUP 3:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
> From: Jun Fujisawa [mailto:fujisawa.jun@canon.co.jp]
Sent: Monday, July 28, 2003 3:44 AM
To: don@lexmark.com
Cc: xp@pwg.org; jim.bigelow@hp.com
Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print
Hello Don,
At 8:15 AM -0400 03.7.25, don@lexmark.com wrote:
>The intent of this example was to show how an image can be declared
>inline with the other XHTML while the actual data for the image may
>come later.
I don't understand the intent. If you want to get actual image
data later (not at the declaration), you can just use 'img'
or 'object' element without 'declare' attribute.
>If the example provided is incorrect, can
>you provide an example that achieves this separation?
The following example shows one type of separation, but I
don't think that meets your need.
<object id="image_1" declare="declare" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
</object>
. . . .
<object height="20 mm" width="20 mm"
data="#image_1" >
</object>
--
Jun Fujisawa
<mailto:fujisawa.jun@canon.co.jp>
FOLLOWUP 4:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Friday, August 01, 2003 8:07 AM
To: Jun Fujisawa
Cc: don@lexmark.com; jim.bigelow@hp.com; owner-xp@pwg.org; xp@pwg.org
Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print
I see two issues here, perhaps separable.
1. Use of inline data.
This can be accomplished by adding support for the data scheme.
Examples (from Fujisawa-san):
<object height="20 mm" width="20 mm" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
Example Image
</object>
or
<img height="20 mm" width="20 mm" alt="Example Image"
src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
2. Separation of the data from the reference
This is where the declare attribute comes in. I went back and read
http://www.w3.org/TR/html4/struct/objects.html#h-13.3.4
It seems to me that the declare facility would let a client supply the
content for the object before its reference, not after. If the requirement
is that the client can send the image data at the end, I'm not sure that
HTML supports that.
If there is a requirement that the client can send the data first, then
refer to it, then an example (again, thanks Fujisawa) is:
<object id="image_1" declare="declare" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
</object>
. . . .
<object height="20 mm" width="20 mm"
data="#image_1" >
</object>
I think the first requirement is good to have, but we can probably drop the
second, especially since the ordering is probably not what we want.
------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534
FOLLOWUP 5:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: BIGELOW,JIM (HP-Boise,ex1)
Sent: Friday, August 01, 2003 8:38 AM
To: 'ElliottBradshaw@oaktech.com'; Jun Fujisawa
Cc: don@lexmark.com; BIGELOW,JIM (HP-Boise,ex1); owner-xp@pwg.org; xp@pwg.org
Subject: RE: XP> Incorrect example in Appendix B.3 of XHTML Print
Elliott wrote:
> I see two issues here, perhaps separable.
> 1. Use of inline data.
>
> This can be accomplished by adding support for the data scheme. ...
>
> 2. Separation of the data from the reference
>
> ...
>
> I think the first requirement is good to have, but we can
> probably drop the second, especially since the ordering is
> probably not what we want.
>
I'm not perfectly clear on what you think the requirements should be. The current spec says that printer may support in-line data via the object/img elements, but is not required to. Are you calling for a change to this statement?
Arguments against requiring support for in-line image data have been that:
1. it requires too much buffering
2. the image data could overflow the memory used to store element attributes. Alternately, to avoid the possibility of exceeding the memory set aside for storing element attributes while processing a job, a printer must either reserve large amounts of memory to avoid problems in this one, almost unique case, or implement a complex, dynamic memory allocation scheme.
In any event supporting in-line data via the object and image attributes means that the entire image is funneled through the document parser, whereas, alternate means of handling image data are possible if the image is referenced via the cid or http schemes.
There is another method for managing image data buffering, Section B.2.1 In-line images of the W3C spec provides some informative suggestions about ways to stage the delivery of image data using the (required) multiplexed document format. This method seeks to reduce the memory needed to store images while processing the document, by providing enough of the image header to determine the image's size, synchronized with the image's reference. The remainder or bulk of the image is delivered later in the document, hopefully, when the printer is ready to commit the image to the page.
Jim
--
FOLLOWUP 6:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Friday, August 01, 2003 9:46 AM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: don@lexmark.com; Jun Fujisawa; BIGELOW,JIM (HP-Boise,ex1);
owner-xp@pwg.org; xp@pwg.org
Subject: RE: XP> Incorrect example in Appendix B.3 of XHTML Print
Sorry, I didn't mean to change the actual requirements. Section B.3 should
stay informative and just be a discussion of different things a printer may
choose to implement.
However, there is at least one case of a conditional requirement elsewhere
in the document (the Object Module) that refers to this section.
But, it is confusing what problem this section is trying to solve (in an
optional way). And, it looks like the example for use of the declare
attribute is just plain wrong.
I propose that we re-write this section to eliminate all discussion of the
declare attribute, and simply show how to use the data URL scheme to handle
inline data.
For example:
<proposal>
This section is informative.
An alternative method to include inline image data in XHTML-Print is via the
"data" URL scheme (see RFC2397). Because this method normally encodes the
binary image data using base64 encoding, a significant increase in the size
of the data transmitted will be experienced. This SHOULD be avoided over low
speed connections. Printers supporting inline data MAY support base64
encoding using the img or object element.
<object height="20 mm" width="20 mm" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
Example Image
</object>
or
<img height="20 mm" width="20 mm" alt="Example Image"
src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
This method MAY be useful for very simple clients that cannot afford a
server for image downloading or for some reason cannot utilize the
Application/Multiplexed MIME type; however, it is not RECOMMENDED for
general use especially if the size of the printer's buffer is unknown.
</proposal>
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Fujisawa-san,
Thank you for your comment. It is recorded as issue 6492 [1] in the HTML Working
Group's issue tracking system. The working group has elected to accept this
defect and modify XHTML-Print spec by accepting Elliott Bradshaw's proposal to
change Appendix B.3 to read as shown below. If this is not acceptable, please
respond to this message with your comments.
Jim Bigelow
--
This section is informative.
An alternative method to include inline image data in XHTML-Print is via the
"data" URL scheme (see RFC2397). Because this method normally encodes the
binary image data using base64 encoding, a significant increase in the size
of the data transmitted will be experienced. This SHOULD be avoided over low
speed connections. Printers supporting inline data MAY support base64
encoding using the img or object element.
<object height="20 mm" width="20 mm" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
Example Image
</object>
or
<img height="20 mm" width="20 mm" alt="Example Image"
src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
This method MAY be useful for very simple clients that cannot afford a
server for image downloading or for some reason cannot utilize the
Application/Multiplexed MIME type; however, it is not RECOMMENDED for
general use especially if the size of the printer's buffer is unknown.
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6492;user=guest
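The size increase that the revised Appendix B.3 text warns about can be estimated with a short sketch. This is an illustration only, not part of the specification: the helper names make_data_uri and base64_overhead are hypothetical, and the payload is a stand-in byte string rather than a real JPEG. Base64 maps every 3 input octets to 4 output characters, so the encoded data is roughly one third larger than the binary image.

```python
import base64


def make_data_uri(payload: bytes, media_type: str = "image/jpeg") -> str:
    """Build an RFC 2397 "data" URI carrying base64-encoded content."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def base64_overhead(payload: bytes) -> float:
    """Return the encoded size divided by the raw size."""
    return len(base64.b64encode(payload)) / len(payload)


# Hypothetical stand-in for image bytes; this is not a real JPEG.
sample = bytes(range(256)) * 12  # 3072 octets

uri = make_data_uri(sample)
print("raw octets:    ", len(sample))
print("URI characters:", len(uri))
print("size ratio:    ", round(base64_overhead(sample), 3))
```

The resulting string is what would appear in the data attribute of object or the src attribute of img in the examples above. The whole value passes through the document parser as a single attribute, which is the buffering concern raised earlier in this thread.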
PROBLEM ID: 6782
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Apply changes as noted -- Jim
ORIGINAL MESSAGE:
From: Susan Lesch [mailto:lesch@w3.org]
These are minor editorial comments for your XHTML-Print Last Call Working Draft [1]. Kudos to the editor and your group(s). It looks great.
s/family of XHTML Languages/family of XHTML languages/
s/members/Members/
s/whitespace/white space/
s/Style Sheet/style sheet/
s/guillemots/guillemets/
s/ththe/the/
s/, support/. Support/
[extracted from 6899]
[1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
Best wishes for your project,
--
Susan Lesch http://www.w3.org/People/Lesch/
mailto:lesch@w3.org tel:+1.858.483.4819
World Wide Web Consortium (W3C) http://www.w3.org/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6782 [1] in the HTML Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6782;user=guest
PROBLEM ID: 6869
STATE: Closed
RESOLUTION: Modify and Accept
USER POSITION: Agree
NOTES:
Equivalent of 6777
ORIGINAL MESSAGE:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To: www-html-editor@w3.org
Cc: xp@pwg.org
Subject: XHTML-Print: change of url from xhtml-print.org to w3c.org breaks current implementations.
Date: Thu, 4 Sep 2003 11:02:17 -0700
Message-ID: <020A3CF87FB5AC47AA67966B33845755050DB585@xboi22.boise.itc.hp.com>
X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B33845755050DB585@xboi22.boise.itc.hp.com
The W3C Last Call Working Draft of XHTML-Print [1] changes the URL in the DOCTYPE from "http://www.xhtml-print.org/xhtml-print/xhtml-print10.dtd" to "http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd". This breaks compatibility with existing implementations.
Can this situation be handled by redirecting the xhtml-print.org url to the w3.org url? If so, how is this done?
[1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
Jim Bigelow
Hewlett-Packard
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Jonny Axelsson wrote:
Just for my curiosity: How does that break backwards compatibility? The old DTD will presumably remain at the www.xhtml-print.org location for at least as long as is needed (for the current implementations), while new or updated XHTML-Print implementations will use the new location. Or?
--
Jonny Axelsson,
Web Standards,
Opera Software
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Elliott Bradshaw wrote:
Don is going to remind us (as well he should) that the URL is not used for a live retrieval from that server. So a redirect doesn't work.
So I think this is, technically, an incompatible change. But I think it's one we could live with.
------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Group (formerly Oak Technology Imaging Group)
781 638-7534
REPLY 3:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Jim Bigelow wrote:
Jonny,
Thanks for the question. If a document with the w3c DTD is sent to a printer that shipped with firmware written using the spec saying that conforming XHTML-Print documents must have a DTD containing a URL to the xhtml-print.org DTD, then it is possible that the document wouldn't print correctly, even though the printer is not validating. In the extreme case, it is possible that the document wouldn't print at all, since Section 2.3.1, item 1 says, "A printer MAY ignore or otherwise reject a non-conforming XHTML-Print document."
I think we're all better off avoiding things that could make the user unhappy! :-)
Jim
REPLY 4:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6869 [1] in the HTML Working Group's issue tracking system.
The working group, following the reasoning of issue 6780 [2], decided that both the DTD in Appendix C of the spec [3] and the DTD in Appendix C of XHTML-Print [4] must be accepted. However, the DTD in Appendix C of XHTML-Print [4] is deprecated in favor of the DTD in Appendix C of the spec [3]. Future releases of this specification may remove the required support for the DTD in Appendix C of XHTML-Print [4].
If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6869;user=guest
[2] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6780;user=guest
[3] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
[4] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
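As an illustration of this resolution, a non-validating printer could treat the system identifier purely as a string to compare, which is consistent with the observation in this thread that the URL is not used for live retrieval. The sketch below is an assumption about how such a check might look, not text from the specification. The two system identifiers are the ones quoted in the original message; the public identifier is a placeholder, and classify_doctype and sample_document are hypothetical names.

```python
import re
from typing import Optional

# System identifiers quoted in the original message of this issue.
W3C_DTD = "http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd"
PWG_DTD = "http://www.xhtml-print.org/xhtml-print/xhtml-print10.dtd"

# Both must be accepted; the PWG one is deprecated per the resolution.
ACCEPTED = {W3C_DTD: "current", PWG_DTD: "deprecated"}

# Placeholder only; the real public identifier is not given in this document.
PLACEHOLDER_PUBLIC_ID = "-//EXAMPLE//DTD placeholder//EN"

DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*>'
)


def classify_doctype(document: str) -> Optional[str]:
    """Return "current", "deprecated", or None for an unknown or missing DOCTYPE."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None
    return ACCEPTED.get(match.group(2))


def sample_document(system_id: str) -> str:
    """Build a tiny test document with the given system identifier."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE html PUBLIC "{PLACEHOLDER_PUBLIC_ID}" "{system_id}">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>t</title></head><body/></html>"
    )


for dtd in (W3C_DTD, PWG_DTD, "http://example.org/other.dtd"):
    print(dtd, "->", classify_doctype(sample_document(dtd)))
```

Because the comparison is on the literal string, a server-side redirect between the two locations would not help firmware that only recognizes the older identifier.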
PROBLEM ID: 6772
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Defined ignore as display:none
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: Scripts and Events
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi
1.3.1 Script and Events
Since the specification requires the documents to conform to restrictions that are not applicable to all XHTML documents, it is unlikely that casually authored XHTML documents would happen to be conforming XHTML-Print documents. Therefore, it is reasonable to expect some preprocessing to take place in the application before sending a document to the printer. That application could be required to discard script elements without burdening the printer with that task.
Such modification would change the document tree, though, and could change the matching of CSS selectors. If it is important to take into account the special case that someone could use a CSS selector such as "script + p" to style a paragraph, it would be necessary to elaborate on what "discarding" an element on the printer means (that is, is it discarded from the document tree or merely defaulted to display: none;).
[extracted from issue 6548]
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment. It is recorded as issue 6772 [1] in the HTML Working Group's issue tracking system.
The working group has elected to accept your comment by clarifying that discarding an element should be equivalent to setting its display property to "none".
If this resolution of your comment is not acceptable, please respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6772;user=guest
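The difference between the two readings of "discard" can be shown with a small sketch. It is illustrative only: adjacent_matches is a hypothetical helper that mimics the CSS adjacent-sibling combinator over a parsed tree and is not a CSS engine. With the script element kept in the tree (the display: none reading adopted by the working group), "script + p" still matches the first paragraph; once the element is removed from the tree, nothing matches.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

DOC = f"""<body xmlns="{XHTML_NS}">
  <script type="text/javascript">/* ignored by the printer */</script>
  <p>Targeted by "script + p"</p>
  <p>Not targeted</p>
</body>"""


def qname(local: str) -> str:
    """Expand a local name into ElementTree's {namespace}local form."""
    return f"{{{XHTML_NS}}}{local}"


def adjacent_matches(parent, first: str, second: str):
    """Return children named `second` whose previous sibling element is `first`."""
    children = list(parent)
    return [
        nxt
        for prev, nxt in zip(children, children[1:])
        if prev.tag == qname(first) and nxt.tag == qname(second)
    ]


# Reading 1: script stays in the tree and is simply not rendered.
kept_tree = ET.fromstring(DOC)
kept = adjacent_matches(kept_tree, "script", "p")

# Reading 2: script is removed from the tree before selector matching.
pruned_tree = ET.fromstring(DOC)
for script in pruned_tree.findall(qname("script")):
    pruned_tree.remove(script)
pruned = adjacent_matches(pruned_tree, "script", "p")

print("matches with script kept:   ", len(kept))
print("matches with script removed:", len(pruned))
```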
PROBLEM ID: 6773
STATE: Closed
RESOLUTION: Reject
USER POSITION: No Response
NOTES:
DOCTYPE does not add an extra burden on printers
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: Document Conformance
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi
2.1 Document Conformance
Considering that printers are allowed to ignore non-conforming documents, requiring a particular doctype declaration and DTD validity looks like a significant burden for applications producing XHTML-Print documents. In particular, DTD validity requires namespaces to be represented in a particular way even though other representations would be semantically equivalent. This means applications producing XHTML-Print documents cannot use any off-the-shelf XML serializer but need a serializer specifically tailored to meet the requirements of XHTML-Print.
Wouldn't it be enough to allow DTDless documents as long as the element structure meets the requirements expressed in the DTD (even though this kind of conformance can't be checked with a [DTD-]validating XML processor)?
[extracted from issue 6548]
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6773 [1] in the HTML Working Group's issue tracking system.
The working group does not agree that the inclusion of the required doctype element in XHTML-Print documents would be a burden either to an application that produced XHTML-Print documents or a printer that processed them. Therefore, no change is planned to the specification regarding your issue.
If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6773;user=guest
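The namespace point in the original message can be illustrated as follows. This sketch only demonstrates the commenter's observation and does not change the resolution. A DTD declares element names literally (for example "html"), so a prefixed serialization such as "h:html" is not valid against it, even though a namespace-aware parser sees the same expanded names. The document fragments and the helper expanded_names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialization using the default namespace.
default_form = f'<html xmlns="{XHTML_NS}"><body/></html>'

# Semantically equivalent serialization using a prefix.
prefixed_form = f'<h:html xmlns:h="{XHTML_NS}"><h:body/></h:html>'


def expanded_names(xml_text: str):
    """List the namespace-expanded element names in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


print(expanded_names(default_form))
print(expanded_names(prefixed_form))
print(expanded_names(default_form) == expanded_names(prefixed_form))
```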
PROBLEM ID: 6774
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Lexmark dissenting, all others accept
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: allow UTF-16 not just UTF-8
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi
It is said that if a "charset" parameter is present for the application/xhtml+xml MIME type, the only valid value is "utf-8". It would make sense to allow "utf-16" as well. All XML processors are required to support UTF-16 in addition to UTF-8, so allowing UTF-16 for XHTML-Print doesn't cause any additional burden to implementations. Also, the payload of Application/Vnd.pwg-multiplexed chunks is defined as octets, so UTF-16 strings can be delivered as Application/Vnd.pwg-multiplexed chunks without any further encoding.
[extracted from issue 6548]
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
FOLLOWUP 1:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: don@lexmark.com [mailto:don@lexmark.com]
Sent: Tuesday, September 02, 2003 6:06 PM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: xp@pwg.org
Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8 to include UTF-16
Jim:
I would disagree. I don't believe that all XHTML-Print enabled printers will necessarily bite the bullet and include a complete XML parser that requires support for UTF-16. I don't believe we should force that to occur.
Perhaps you should remind the group that XHTML-Print is target for LOW-END printers with this embedded. No 3 gigahertz Pentium 4's with 512 MB of memory!!!
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
FOLLOWUP 2:
From: jim.bigelow@hp.com
I tend to agree with Henri.
-- Jim Bigelow
FOLLOWUP 3:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
> From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
> Sent: Wednesday, September 03, 2003 7:07 AM
> To: don@lexmark.com
> Cc: BIGELOW,JIM (HP-Boise,ex1); owner-xp@pwg.org; xp@pwg.org
> Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8
> to include UTF-16
>
Or to put it another way, XHTML-Print describes a single way of doing
something, whereas HTML and its derivatives frequently support multiple
ways of getting the same effect.
In the past, we have resisted features that appear easy, unless they
actually extend the capabilities of what can be done.
Since I think a UTF-8 oriented client can get the same work done as a UTF-16
client, we should not mandate the extension.
IMHO.
E.
FOLLOWUP 4:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
> From: Michael Sweet [mailto:mike@easysw.com]
> Sent: Wednesday, September 03, 2003 7:26 AM
> To: don@lexmark.com
> Cc: BIGELOW,JIM (HP-Boise,ex1); xp@pwg.org
> Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8
> to include UTF-16
I'm not so worried about memory usage; converting UTF-16 to UTF-8 on the input side is not expensive in terms of memory or processor. However, reliably detecting UTF-16 and managing the endianness of the words is a pain in the ass in the real world.
Assuming that all UTF-16 files start with FFFE or FEFF, the XML parser can handle the UTF-16 encoding without difficulty, however certain large convicted software monopolies regularly omit this important information making autodetection unreliable.
Given the limited scope of XHTML-Print and the desire for maximum interoperability, I would recommend that we stick with UTF-8 as the only requirement so that applications that send XHTML-Print data have to use UTF-8 and manage whatever perversion of UTF-16 they use internally themselves...
--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
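The detection problem described in this message can be sketched as a byte order mark check. This is a simplified illustration under stated assumptions, not a full XML encoding detector: sniff_encoding is a hypothetical helper that looks only at the leading octets, and it deliberately ignores UTF-32, whose little-endian mark also begins with FF FE. When the mark is omitted, the function has nothing to go on, which is the unreliability the message refers to.

```python
import codecs
from typing import Optional


def sniff_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding from a leading byte order mark, if there is one."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    return None  # no mark: the caller must guess or assume a default


markup = "<p>Example</p>"

with_mark = codecs.BOM_UTF16_LE + markup.encode("utf-16-le")
without_mark = markup.encode("utf-16-le")

print(sniff_encoding(with_mark))
print(sniff_encoding(without_mark))
```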
FOLLOWUP 5:
From: don@lexmark.com
I maintain my disagreement with this decision for all the reasons previously mentioned including:
1) There are no characters which can be represented in UTF16 that cannot be represented in UTF8
2) Reliable detection of UTF16 has not been proven
3) High "zoot" clients can much more easily convert any UTF16 to UTF8
4) Many of the target printers will have no need to deal with generic XML and hence no reason to support UTF16
Jim Bigelow <voyager-issues@mn.aptest.com> on 09/26/2003 03:48:41 PM
To: hsivonen@iki.fi
cc: don@lexmark.com, elliott.bradshaw@zoran.com
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6774 [1] in the HTML Working Group's issue tracking system.
The working group agrees that since XHTML-Print is a member of the family of XHTML 1.0 languages, document encodings cannot be restricted to UTF-8 but must also include UTF-16. The specification will be modified to remove the sentence, 'The only valid value for the "charset" parameter is "utf-8".'
If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest
FOLLOWUP 6:
From: don@lexmark.com
Works for me.
**********************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances & Standards
Lexmark International
740 New Circle Rd
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
**********************************************
Jim Bigelow <voyager-issues@mn.aptest.com> on 09/29/2003 05:39:11 PM
To: don@lexmark.com
cc:
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don,
What do you think of the following compromise?
1. say nothing about whether a printer supports UTF-8 or UTF-16
2. require that conforming XHTML-Print documents be encoded in UTF-8 by requiring that conforming clients (Section 2.2) create documents that are encoded in UTF-8.
This means adding the following to item 1 of Section 2.2:
1. Clients SHALL produce a well-formed XHTML-Print document as defined in XHTML 1.0 [XHTML1] and in Document Conformance. The document SHALL be encoded using UTF-8 [RFC2279].
Jim Bigelow
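Under the proposed compromise, a client that holds text as UTF-16 internally would transcode before sending. The sketch below is a minimal illustration of that step and of the earlier point that UTF-8 can represent every character UTF-16 can. The helper to_utf8 is hypothetical, and the sample deliberately has no XML declaration: a real client would also have to make sure any encoding declaration in the document agrees with the bytes it sends.

```python
def to_utf8(document: bytes, source_encoding: str) -> bytes:
    """Transcode an encoded document to UTF-8 on the client side."""
    return document.decode(source_encoding).encode("utf-8")


# Sample text with Latin-1, CJK, and a character outside the
# Basic Multilingual Plane (a surrogate pair in UTF-16).
text = "<p>caf\u00e9 \u65e5\u672c \U0001F5A8</p>"

as_utf16 = text.encode("utf-16")       # includes a byte order mark
as_utf8 = to_utf8(as_utf16, "utf-16")  # the mark is consumed on decode

print(len(as_utf16), "octets as UTF-16")
print(len(as_utf8), "octets as UTF-8")
print(as_utf8.decode("utf-8") == text)
```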
FOLLOWUP 7:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To the HTML WG:
Hello,
Please help me understand this facet of XHTML-Print as a member of the Family of Languages defined by the Modularization of XHTML 1.0 -- must an application that processes XHTML-Print documents be a conforming XML processor?
I'm sure that it must be able to process XHTML-Print documents as described by the XHTML-Print specification, but are there other constraints? For example, an xml processor is supposed to be able to process documents in UTF-8 and UTF-16. Why does an XHTML-Print processor have to support UTF-16? What would be the reasons for not restricting the encoding to UTF-8?
The potential benefit of only requiring support for UTF-8, rather than both UTF-8 and UTF-16, is that more low-cost (in terms of memory and processing power) printers could process utf-8 encoded XHTML-Print documents. Requiring support for both UTF-8 and UTF-16 increases the memory and processing requirements and thereby reduces the number of devices that could process XHTML-Print documents. One of the goals of XHTML-Print is to provide a document format for printing from and to low-cost devices, so keeping requirements to a minimum increases the possibilities that low-cost printers will implement support for it. Several representatives of printer manufacturers have expressed the opinion that support for UTF-8 and not for UTF-16 is preferred.
Can you help me understand the technical reasons why UTF-16 support should be required, so we can judge the trade-offs in implementation costs versus capabilities?
Jim
FOLLOWUP 8:
From: elliott.bradshaw@zoran.com
Jim,
Um, seems to me like a game of semantics. Whether we make a statement about the language or a statement about how the client generates it, seems like it's the same thing.
I think the conflict here is:
1. PWG wanted a simple way to send print jobs. No need for multiple ways to accomplish the same thing.
2. But there seem to be W3C rules about how one derives languages from XHTML.
I do think that #2 is contrary to the purpose of the original project. Just as we are able to say that XHTML-Print does not mandate certain properties which are too hard for a printer (e.g. the caveats on the position property) we ought to be able to exclude something that is not appropriate to the problem at hand. The only justification for this extension is "W3C says so."
In principle we shouldn't do it. But, as a compromise I could live with it if I had to.
--
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534
FOLLOWUP 9:
From: Mail Delivery Subsystem <MAILER-DAEMON@hades.mn.aptest.com>
The original message was received at Wed, 1 Oct 2003 12:43:53 -0500
from IDENT:i5LhU/0sXY+dwkWULvPvTYjef6dRQYOI@localhost [127.0.0.1]
----- The following addresses had permanent fatal errors -----
<don@lexmark>
(reason: 550 Host unknown)
From: Jim Bigelow <voyager-issues@mn.aptest.com>
To: don@lexmark, elliott.bradshaw@zoran.com
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don and Elliott,
The HTML working group discussed my question of why an XHTML-Print processor
must be a conforming XML processor (in particular, why it must support both
UTF-8 and UTF-16 encodings) on October 1, 2003.
The answer is that an XHTML-Print processor must be a conforming XML processor
and support both UTF-8 and UTF-16 encodings to preserve compatibility between
xml-based applications.
If XHTML-Print processors only supported UTF-8 then an xml-based application
could not be reliably depended upon to emit an XHTML-Print document that the
XHTML-Print application could process. For example, an xml-based XForms
application's output of an XHTML-Print document cannot be restricted by the
XHTML-Print specification to UTF-8 since the application may not be able to
control the encoding.
Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give
heuristics for determining a document's encoding when the charset parameter of
the MIME type [4] is absent.
An example UTF-16 decoder is available at [5]; other encodings are at [6].
Jim Bigelow
[1] http://www.w3.org/TR/REC-xml#charencoding
[2] http://www.w3.org/TR/REC-xml#sec-guessing
[3] http://www.w3.org/TR/REC-xml
[4] http://www.ietf.org/rfc/rfc3023.txt
[5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html
[6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html
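The encoding detection referred to in the message above can be sketched roughly as follows. This is a minimal Python illustration that assumes only the UTF-8 and UTF-16 cases discussed in this thread matter; the function name sniff_xml_encoding is hypothetical, and the other encodings covered by Appendix F of the XML specification (UCS-4, EBCDIC, and so on) are deliberately left out.

```python
import codecs


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding of an XML entity from its first few bytes.

    Hypothetical helper: handles only UTF-8 and UTF-16, with or without
    a Byte Order Mark. Other encodings are reported as "unknown".
    """
    # UCS-4 byte order marks overlap the UTF-16 ones, so rule them out first.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "unknown"
    # Byte Order Mark present.
    if head.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8"
    if head.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    # No BOM: look at how the leading "<?" of an XML declaration is laid out.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Without a BOM or other evidence, XML treats the entity as UTF-8
    # (an encoding declaration, if present, would still need to be read).
    return "utf-8"
```

A consumer would call this on the first four or so bytes of the print job before handing the stream to the parser, so the check itself needs no buffering beyond those bytes.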
FOLLOWUP 10:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Here is Don Wright's objection to UTF-16 support.
Jim
http://oz.boi.hp.com/~jhb/
-----Original Message-----
From: don@lexmark.com [mailto:don@lexmark.com]
Sent: Wednesday, October 08, 2003 9:42 AM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: elliott.bradshaw@zoran.com; www-html@w3.org
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Jim:
So let me understand this....
Because people have poorly designed and written XML applications running on 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden $49 printers with code to be able to detect and interpret both.
I maintain my objection and my no vote.
**********************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances & Standards
Lexmark International
740 New Circle Rd
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
**********************************************
"BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> on 10/08/2003 10:24:45 AM
To: don@lexmark.com
cc: elliott.bradshaw@zoran.com, www-html@w3.org
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
From http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest - reply #3
Date: Wed Oct 1 12:43:54 2003
Don and Elliott,
The HTML working group discussed my question of why an XHTML-Print processor must be a conforming XML processor (in particular, why it must support both UTF-8 and UTF-16 encodings) on October 1, 2003.
The answer is that an XHTML-Print processor must be a conforming XML processor and support both UTF-8 and UTF-16 encodings to preserve compatibility between xml-based applications.
If XHTML-Print processors only supported UTF-8 then an xml-based application could not be reliably depended upon to emit an XHTML-Print document that the XHTML-Print application could process. For example, an xml-based XForms application's output of an XHTML-Print document cannot be restricted by the XHTML-Print specification to UTF-8 since the application may not be able to control the encoding.
Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give heuristics for determining a document's encoding when the charset parameter of the MIME type [4] is absent.
An example UTF-16 decoder is available at [5]; other encodings are at [6].
Jim Bigelow
[1] http://www.w3.org/TR/REC-xml#charencoding
[2] http://www.w3.org/TR/REC-xml#sec-guessing
[3] http://www.w3.org/TR/REC-xml
[4] http://www.ietf.org/rfc/rfc3023.txt
[5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html
[6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html
Jim
http://oz.boi.hp.com/~jhb/
FOLLOWUP 11:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
-----Original Message-----
From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
Sent: Thursday, October 09, 2003 2:14 PM
To: don@lexmark.com
Cc: BIGELOW,JIM (HP-Boise,ex1)
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don,
As you know I have been skeptical of feature creep all along. But I think
this one may be different...here's why.
When we originally conceived XHTML-Print the idea was that the client code
would be essentially a hand-coded print driver. But this W3C discussion
brings up the idea that people could use XML application development tools
as well. This could be in our interest if it gives people an easy way to
write XHTML-Print aware applications. (And it seems to be pretty
fundamental to the way they defined XML.)
It seems that such tools don't like to be constrained to only one of UTF-8
vs. UTF-16...it would be "unnatural" to limit a developer in this way. It
sort of reminds me of 10baseT vs. 100baseT, in which it seems odd to support
one but not the other.
How much complexity would this add to the $49 printer? Once we know whether
or not we are in UTF-16, it would add very little (if nothing else do a
brute force conversion from UTF-16 to UTF-8). Detection of UTF-16 is also
straightforward, as described in 4.3.3 of http://www.w3.org/TR/REC-xml,
which says the special Byte Order Mark is required at the beginning of
UTF-16. (It also says very clearly that UTF-16 support is required.)
So I think the cost is low, the benefit of XML-based application tools might
be significant, and technical alignment with XML makes it worth doing.
E.
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534
don@lexmark.com on 10/08/2003 12:41 PM
To: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
cc: elliott.bradshaw@zoran.com, www-html@w3.org
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Jim:
So let me understand this....
Because people have poorly designed and written XML applications running on
3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
$49 printers with code to be able to detect and interpret both.
I maintain my objection and my no vote.
**********************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances & Standards
Lexmark International
740 New Circle Rd
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
**********************************************
Jim
http://oz.boi.hp.com/~jhb/
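The "brute force conversion from UTF-16 to UTF-8" mentioned in the message above can be sketched as a streaming transcoder. This is only an illustration, written in Python rather than printer firmware; the function name transcode_utf16_to_utf8 and the chunk_size parameter are hypothetical. It relies on the standard library's incremental "utf-16" decoder, which consumes the Byte Order Mark and carries partial code units across chunk boundaries.

```python
import codecs
import io


def transcode_utf16_to_utf8(src, dst, chunk_size=256):
    """Copy a UTF-16 byte stream to dst as UTF-8, one small chunk at a time.

    src and dst are binary file-like objects. Only chunk_size bytes of
    input are held at once, plus whatever partial code unit the decoder
    is carrying over to the next chunk.
    """
    decoder = codecs.getincrementaldecoder("utf-16")()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush; this raises UnicodeDecodeError if the input ended mid-character.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


if __name__ == "__main__":
    job = "<p>caf\u00e9 \u65e5\u672c\u8a9e</p>"
    out = io.BytesIO()
    transcode_utf16_to_utf8(io.BytesIO(job.encode("utf-16")), out, chunk_size=3)
    print(out.getvalue())
```

The point of the chunked loop is that the working buffer stays a fixed size regardless of document length, which is the property that matters for the low-memory devices under discussion.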
FOLLOWUP 12:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Mike,
I've neglected to update you on the discussions about UTF-8/UTF-16 support for XHTML-Print. Please let us know your thoughts on the matter.
You can see these discussions using the following link to the W3C's HTML Working Group issue database:
http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest
In summary:
HTML WG: must support UTF-8 & UTF-16 for interoperability with all other xml and xml-derived applications and processors.
Lexmark: UTF-16 support is too expensive to support in a low-cost printer, and too hard to reliably detect, ...
Oak/Zoran: UTF-16 wouldn't be too expensive to implement and enables a new class of XHTML-Print producing devices
HP: UTF-16 allows for more compact representation of Asian character documents and would not be too much to implement.
Jim Bigelow, Editor: XHTML-Print & CSS Print Profile
W3C HTML and CSS Working Groups
http://www.w3.org/TR/xhtml-print/ http://www.w3.org/TR/css-print/
Hewlett-Packard
208-396-2068
jim.bigelow@hp.com
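The size trade-off behind the positions summarized above can be checked with a few lines of Python. This is a rough sketch, not a measurement of real print jobs: the helper name encoded_sizes and the sample strings are made up for illustration. ASCII markup and English text take one byte per character in UTF-8 and two in UTF-16, while CJK characters in the Basic Multilingual Plane take three bytes in UTF-8 and two in UTF-16.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text.

    The UTF-16 figure includes the 2-byte Byte Order Mark that an XML
    document encoded in UTF-16 is required to start with.
    """
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le")) + 2
    return utf8, utf16


if __name__ == "__main__":
    # Hypothetical samples: markup, English, Latin-1 and Japanese text.
    samples = {
        "markup": '<p class="note"></p>',
        "english": "Hello, printer",
        "latin-1": "caf\u00e9 cr\u00e8me",
        "japanese": "\u65e5\u672c\u8a9e\u306e\u6587\u66f8",
    }
    for name, text in samples.items():
        utf8, utf16 = encoded_sizes(text)
        print(f"{name}: utf-8={utf8} bytes, utf-16={utf16} bytes")
```

Because a real XHTML-Print document mixes ASCII markup with its text content, which encoding is smaller overall depends on the ratio of markup to non-ASCII text.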
FOLLOWUP 13:
From: don@lexmark.com
Steven, et al:
The real problem is that the entire XML architecture was designed assuming
high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We
have already seen push back in other standards groups that consumer
electronic devices and other smaller, lighter devices cannot afford all the
luxuries demanded by an obese XML architecture. Unless the XML community
accepts subsetting, we can't expect the broadest support for XML to happen
at the low end until the price/performance ratios experience another order
or two of magnitude improvement. As recently reported in several of the trade
magazines focused on IT professionals, the deployment of XML and Web
Services is having significant negative impacts on the IT infrastructure
especially in the area of bandwidth utilization. This is just another
symptom of the same problem.
I know I will lose this argument in the W3C but the realities of the
XHTML-Print implementations will blow off UTF-16 as more fat with no
benefit and simply not support it, "interoperable" or not.
Sorry I'm not pure but practical.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM
To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
<w3c-html-wg@w3.org>, <don@lexmark.com>
cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
> From: don@lexmark.com [mailto:don@lexmark.com]
> So let me understand this....
>
> Because people have poorly designed and written XML applications running on
> 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
> control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
> $49 printers with code to be able to detect and interpret both.
No Don. It is about interoperability and conforming to standards. XML allows
documents to be encoded in either UTF8 or UTF 16: consumers must accept
both, producers may produce either. An XHTML-Print printer will be just a
consumer of an XML byte-stream at some IP address; we don't want to burden
every program in the world that can produce XML with a switch that says
"this output is going to a poor lowly XHTML Print processor that can't deal
with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy
one to implement, and can only cost a few dozen bytes at best.
If we changed this, XHTML Print would have to go back to last call, and you
can bet your boots that the XML community would rise up against us, as it
has in the past, and I can tell you we don't want to go there, and we would
have a hundred people registering objections.
Conforming to XML requirements comes with the territory of being XHTML. The
XML community will not take lightly to us messing with their standards.
Best wishes,
Steven Pemberton
FOLLOWUP 14:
From: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
> From: don@lexmark.com [mailto:don@lexmark.com]
> So let me understand this....
>
> Because people have poorly designed and written XML applications running on
> 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
> control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
> $49 printers with code to be able to detect and interpret both.
No Don. It is about interoperability and conforming to standards. XML allows documents to be encoded in either UTF8 or UTF 16: consumers must accept both, producers may produce either. An XHTML-Print printer will be just a consumer of an XML byte-stream at some IP address; we don't want to burden every program in the world that can produce XML with a switch that says "this output is going to a poor lowly XHTML Print processor that can't deal with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy one to implement, and can only cost a few dozen bytes at best.
If we changed this, XHTML Print would have to go back to last call, and you can bet your boots that the XML community would rise up against us, as it has in the past, and I can tell you we don't want to go there, and we would have a hundred people registering objections.
Conforming to XML requirements comes with the territory of being XHTML. The XML community will not take lightly to us messing with their standards.
Best wishes,
Steven Pemberton
FOLLOWUP 15:
From: Michael Sweet <mike@easysw.com>
BIGELOW,JIM (HP-Boise,ex1) wrote:
> Mike,
>
> I've neglected to update you on the discussions about UTF-8/UTF-16
> support for XHTML-Print. Please let us know your thoughts on the
> matter.
My concerns have always been about the detection between UTF-8 and UTF-16. After looking through the archive and the current XML spec, it does look like the BOM is required at the beginning of any UTF-16 XML document, so any autodetection problems can safely be blamed on Microsoft or whatever vendor is producing a non-conforming document.
I do like the idea of recommending (a SHOULD, not a MUST) that the XHTML-Print client use the UTF-8 encoding, and add a note that the typical XHTML-Print device has limited CPU/memory available and the use of UTF-8 will potentially provide faster printing, etc.
--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
FOLLOWUP 16:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
Sent: Thursday, October 09, 2003 2:14 PM
To: don@lexmark.com
Cc: BIGELOW,JIM (HP-Boise,ex1)
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don,
As you know I have been skeptical of feature creep all along. But I think
this one may be different...here's why.
When we originally conceived XHTML-Print the idea was that the client code
would be essentially a hand-coded print driver. But this W3C discussion
brings up the idea that people could use XML application development tools
as well. This could be in our interest if it gives people an easy way to
write XHTML-Print aware applications. (And it seems to be pretty
fundamental to the way they defined XML.)
It seems that such tools don't like to be constrained to only one of UTF-8
vs. UTF-16...it would be "unnatural" to limit a developer in this way. It
sort of reminds me of 10baseT vs. 100baseT, in which it seems odd to support
one but not the other.
How much complexity would this add to the $49 printer? Once we know whether
or not we are in UTF-16, it would add very little (if nothing else do a
brute force conversion from UTF-16 to UTF-8). Detection of UTF-16 is also
straightforward, as described in 4.3.3 of http://www.w3.org/TR/REC-xml,
which says the special Byte Order Mark is required at the beginning of
UTF-16. (It also says very clearly that UTF-16 support is required.)
So I think the cost is low, the benefit of XML-based application tools might
be significant, and technical alignment with XML makes it worth doing.
E.
----------------------------------------------------------------------------
----
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534
FOLLOWUP 17:
From: "Steven Pemberton" <steven.pemberton@cwi.nl>
But support for UTF 16 adds a few dozen bytes of code, and no extra memory requirements. It is simpler than UTF 8! What's the problem?
Steven
----- Original Message -----
From: <don@lexmark.com>
To: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; <w3c-html-wg@w3.org>; <don@lexmark.com>; <voyager-issues@mn.aptest.com>; <elliott.bradshaw@zoran.com>; <www-html@w3.org>
Sent: Thursday, October 16, 2003 12:20 AM
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
> Steven, et al:
>
> The real problem is that the entire XML architecture was designed assuming
> high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We
> have already seen push back in other standards groups that consumer
> electronic devices and other smaller, lighter devices cannot afford all the
> luxuries demanded by an obese XML architecture. Unless the XML community
> accepts subsetting, we can't expect the broadest support for XML to happen
> at the low end until the price/performance ratios experience another order
> or two of magnitude improvement. As recently reported in several of the trade
> magazines focused on IT professionals, the deployment of XML and Web
> Services is having significant negative impacts on the IT infrastructure
> especially in the area of bandwidth utilization. This is just another
> symptom of the same problem.
>
> I know I will lose this argument in the W3C but the realities of the
> XHTML-Print implementations will blow off UTF-16 as more fat with no
> benefit and simply not support it, "interoperable" or not.
>
> Sorry I'm not pure but practical.
>
> *******************************************
> Don Wright don@lexmark.com
>
> Chair, IEEE SA Standards Board
> Member, IEEE-ISTO Board of Directors
> f.wright@ieee.org / f.wright@computer.org
>
> Director, Alliances and Standards
> Lexmark International
> 740 New Circle Rd C14/082-3
> Lexington, Ky 40550
> 859-825-4808 (phone) 603-963-8352 (fax)
> *******************************************
>
> "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM
>
> To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
> <w3c-html-wg@w3.org>, <don@lexmark.com>
> cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
> <www-html@w3.org>
> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
> > From: don@lexmark.com [mailto:don@lexmark.com]
>
> > So let me understand this....
> >
> > Because people have poorly designed and written XML applications running on
> > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
> > control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
> > $49 printers with code to be able to detect and interpret both.
>
> No Don. It is about interoperability and conforming to standards. XML allows
> documents to be encoded in either UTF8 or UTF 16: consumers must accept
> both, producers may produce either. An XHTML-Print printer will be just a
> consumer of an XML byte-stream at some IP address; we don't want to burden
> every program in the world that can produce XML with a switch that says
> "this output is going to a poor lowly XHTML Print processor that can't deal
> with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy
> one to implement, and can only cost a few dozen bytes at best.
>
> If we changed this, XHTML Print would have to go back to last call, and you
> can bet your boots that the XML community would rise up against us, as it
> has in the past, and I can tell you we don't want to go there, and we would
> have a hundred people registering objections.
>
> Conforming to XML requirements comes with the territory of being XHTML. The
> XML community will not take lightly to us messing with their standards.
>
> Best wishes,
>
> Steven Pemberton
FOLLOWUP 18:
From: don@lexmark.com
One more thing, just one more thing. Every option or alternative adds one
more thing.
I think I'll pass on that one more thin mint.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
To: <don@lexmark.com>
cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
<w3c-html-wg@w3.org>, <don@lexmark.com>,
<voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
But support for UTF 16 adds a few dozen bytes of code, and no extra memory
requirements. It is simpler than UTF 8! What's the problem?
Steven
FOLLOWUP 19:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Don,
Here is a new section in the Design Rationale portion of the spec:
<h3 id="s.1.3.7">1.3.7 Character Model</h3>
<p> The W3C architectural specification <cite>Character Model for the World Wide Web 1.0</cite> [<a href="#ref_charmod">CHARMOD</a>] gives the <em title="RECOMMENDED in RFC 2119 context" class="RFC2119">RECOMMENDED</em> representation of characters in XHTML-Print. Authors of XHTML-Print producing applications <em title="SHOULD in RFC 2119 context" class="RFC2119">SHOULD</em> be aware that low cost printers might be limited in both processing power and memory and, therefore, that fully-normalized ([<a href="#ref_charmod">CHARMOD</a>], <a href="http://www.w3.org/TR/charmod/#sec-FullyNormalized">4.2.3</a>) utf-8 encoded documents could print more quickly than documents in other forms and encodings. </p>
I hope that this section will help discourage UTF-16.
Jim
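What the proposed section asks of a producing application can be approximated as follows. This is a sketch under a stated assumption: it applies Unicode Normalization Form C and encodes as UTF-8, which is only part of what CHARMOD means by "fully-normalized" (that definition has further conditions not checked here). The function names is_nfc and to_nfc_utf8 are hypothetical.

```python
import unicodedata


def is_nfc(text):
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def to_nfc_utf8(text):
    """Normalize text to NFC and encode it as UTF-8 bytes.

    Assumption: NFC stands in for CHARMOD's stricter notion of
    "fully-normalized" text; this sketch does not check the rest.
    """
    return unicodedata.normalize("NFC", text).encode("utf-8")


if __name__ == "__main__":
    decomposed = "cafe\u0301"  # "e" followed by a combining acute accent
    print(is_nfc(decomposed))
    print(to_nfc_utf8(decomposed))
```

Normalizing on the client side means the printer does not have to handle combining sequences that have a precomposed equivalent.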
FOLLOWUP 20:
From: Henri Sivonen <hsivonen@iki.fi>
On Thursday, Oct 16, 2003, at 01:20 Europe/Helsinki, don@lexmark.com wrote:
> The real problem is that the entire XML architecture was designed
> assuming high end boxes like the 3 GHz Pentium with 512 megabytes of
> memory.
Lesser devices can host expat. However, if a device can't host expat,
perhaps it would be better to use something other than XML to communicate
with the device.
> We have already seen push back in other standards groups that consumer
> electronic devices and other smaller, lighter devices cannot afford all
> the luxuries demanded by an obese XML architecture. Unless the XML
> community accepts subsetting, we can't expect the broadest support for
> XML to happen at the low end until the price/performance ratios
> experience another order or two of magnitude improvement.
If you subset XML, is support for the subset support for XML? What's the
point of building a language on application-specific almost-XML? A
language built on such almost-XML breaks expectations (either in software
or in the minds of people who need to deal with the language). If you
can't use tools that are based on the assumption that the data they
process is *exactly* XML and the programmers' knowledge about XML isn't
guaranteed to apply, wouldn't it be less confusing to invent another
grammar entirely and not call it XML?
A well-defined extended subset of XML (for example: UTF-8 only,
normalization form C only, no doctype, no PIs, no CDATA sections, no
epilog, all HTML character entities predefined, namespace processing
mandatory) would be more useful than having specs layered on top of XML
1.0 trying to readjust what XML 1.0 is.
XHTML-Print printers get data over HTTP which is over TCP. It would be
ludicrous to tweak the TCP header format in the XHTML-Print spec.
> I know I will lose this argument in the W3C but the realities of the
> XHTML-Print implementations will blow off UTF-16 as more fat with no
> benefit and simply not support it, "interoperable" or not.
Converting UTF-16 to UTF-8 really isn't a big deal. It's basically a
matter of shifting bits.
Considering eliminating fat, I'd much rather eliminate character
entities[1] and references to the external DTD subset[2]. Character
entities are a burden in any case. They require either processing the
external DTD subset (bad for execution speed and memory requirements) or
implementing an extra feature which doesn't belong in an XML processor
(bad for conformance and yet redundant since there are conforming ways of
representing characters).
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6776;user=guest
[2] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6773;user=guest
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
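The remark that converting UTF-16 to UTF-8 is "basically a matter of shifting bits" can be illustrated with a small sketch. It is not taken from any XHTML-Print implementation; the function name `utf16_units_to_utf8` is hypothetical, and the input is assumed to be a sequence of 16-bit code units that have already been read in the right byte order.

```python
def utf16_units_to_utf8(units):
    """Convert an iterable of 16-bit UTF-16 code units to UTF-8 bytes."""
    out = bytearray()
    high = None  # pending high surrogate, if any
    for u in units:
        if high is not None:
            # A high surrogate must be followed by a low surrogate.
            if not 0xDC00 <= u <= 0xDFFF:
                raise ValueError("high surrogate not followed by low surrogate")
            cp = 0x10000 + ((high - 0xD800) << 10) + (u - 0xDC00)
            high = None
        elif 0xD800 <= u <= 0xDBFF:
            high = u
            continue
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unexpected low surrogate")
        else:
            cp = u
        # Emit 1 to 4 UTF-8 bytes using only shifts and masks.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    if high is not None:
        raise ValueError("truncated surrogate pair")
    return bytes(out)
```

The only state carried between code units is one pending surrogate, so the working memory does not grow with the size of the document.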
FOLLOWUP 21:
From: don@lexmark.com
Steven:
I think your answer proves my point that the XML community did not and
does not consider the limitations of low cost, constrained embedded
environments when developing XML.
You make the assertion that no extra memory is required yet the reality is
quite the opposite.
Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is
that:
1) Every XHTML tag will require twice as many bytes when represented in
UTF-16 versus UTF-8
2) Every English XHTML-Print print job will be twice as big encoded with
UTF-16 versus UTF-8
3) Every "Latin 1" print job will be larger, approaching 2X in size.
When you double the data's size, buffers have to double to be able to hold
and manipulate an equivalent amount of print stream content. There are real
cost and performance penalties to be paid to deal with UTF-16 encoding,
especially when dealing with western character sets. When a device is
designed to deal with the far east "characters" there are other penalties
to be paid in things like the size of the font load that mitigate the
UTF-16 versus UTF-8 encoding issue.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
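The three size claims in the message above can be checked with a short sketch. The sample strings and the helper name `encoded_sizes` are illustrative assumptions, not data from the thread, and the byte order mark is left out of the UTF-16 count.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text, ignoring any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples: one for each of the three numbered claims.
samples = {
    "markup": '<p class="note">',
    "English": "The quick brown fox jumps over the lazy dog.",
    "Latin 1": "Crème brûlée à la française",
}

for label, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes, "
          f"ratio {utf16 / utf8:.2f}")
```

For pure ASCII input the ratio is exactly 2. For Latin 1 text it falls between 1 and 2, because each accented letter takes two bytes in either encoding.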
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
To: <don@lexmark.com>
cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
<w3c-html-wg@w3.org>, <don@lexmark.com>,
<voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
But support for UTF 16 adds a few dozen bytes of code, and no extra memory
requirements. It is simpler than UTF 8! What's the problem?
Steven
----- Original Message -----
From: <don@lexmark.com>
To: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
<w3c-html-wg@w3.org>;
<don@lexmark.com>; <voyager-issues@mn.aptest.com>;
<elliott.bradshaw@zoran.com>; <www-html@w3.org>
Sent: Thursday, October 16, 2003 12:20 AM
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
> Steven, et al:
>
> The real problem is that the entire XML architecture was designed
> assuming high end boxes like the 3 GHz Pentium with 512 megabytes of
> memory. We have already seen push back in other standards groups that
> consumer electronic devices and other smaller, lighter devices cannot
> afford all the luxuries demanded by an obese XML architecture. Unless the
> XML community accepts subsetting, we can't expect the broadest support
> for XML to happen at the low end until the price/performance ratios
> experience another order or two of magnitude improvement. As recently
> reported in several of the trade magazines focused on IT professionals,
> the deployment of XML and Web Services is having significant negative
> impacts on the IT infrastructure, especially in the area of bandwidth
> utilization. This is just another symptom of the same problem.
>
> I know I will lose this argument in the W3C but the realities of the
> XHTML-Print implementations will blow off UTF-16 as more fat with no
> benefit and simply not support it, "interoperable" or not.
>
> Sorry I'm not pure but practical.
>
> *******************************************
> Don Wright don@lexmark.com
>
> Chair, IEEE SA Standards Board
> Member, IEEE-ISTO Board of Directors
> f.wright@ieee.org / f.wright@computer.org
>
> Director, Alliances and Standards
> Lexmark International
> 740 New Circle Rd C14/082-3
> Lexington, Ky 40550
> 859-825-4808 (phone) 603-963-8352 (fax)
> *******************************************
>
>
>
>
> "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM
>
> To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
> <w3c-html-wg@w3.org>, <don@lexmark.com>
> cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
> <www-html@w3.org>
> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> > From: don@lexmark.com [mailto:don@lexmark.com]
>
> > So let me understand this....
> >
> > Because people have poorly designed and written XML applications
> > running on 3 GHz Pentium 4s with 512 megabytes of real memory that do
> > not allow the control over whether UTF-8 or UTF-16 are emitted, we are
> > expecting to burden $49 printers with code to be able to detect and
> > interpret both.
>
> No Don. It is about interoperability and conforming to standards. XML
> allows documents to be encoded in either UTF8 or UTF 16: consumers must
> accept both, producers may produce either. An XHTML-Print printer will be
> just a consumer of an XML byte-stream at some IP address; we don't want to
> burden every program in the world that can produce XML with a switch that
> says "this output is going to a poor lowly XHTML Print processor that
> can't deal with UTF-16, so please produce UTF-8", especially since UTF 16
> is the easy one to implement, and can only cost a few dozen bytes at best.
>
> If we changed this, XHTML Print would have to go back to last call, and
> you can bet your boots that the XML community would rise up against us,
> as it has in the past, and I can tell you we don't want to go there, and
> we would have a hundred people registering objections.
>
> Conforming to XML requirements comes with the territory of being XHTML.
> The XML community will not take lightly to us messing with their
> standards.
>
> Best wishes,
>
> Steven Pemberton
FOLLOWUP 22:
From: "Steven Pemberton" <steven.pemberton@cwi.nl>
Don,
I've been wondering for a long time if that was the misunderstanding, but I
was assured it wasn't.
UTF 16 and UTF 8 are *external* representations. The internal amount of
storage needed for them is identical, and completely up to you how you
store.
The only extra memory needed is the couple of dozen extra bytes of code to
convert UTF 16 into whatever internal representation you use.
Best wishes,
Steven
----- Original Message -----
From: <don@lexmark.com>
To: "Steven Pemberton" <steven.pemberton@cwi.nl>
Cc: <don@lexmark.com>; "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
<w3c-html-wg@w3.org>; <voyager-issues@mn.aptest.com>;
<elliott.bradshaw@zoran.com>; <www-html@w3.org>
Sent: Thursday, October 16, 2003 2:51 PM
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
> Steven:
>
> I think your answer proves my point that the XML community did not and
> does not consider the limitations of low cost, constrained embedded
> environments when developing XML.
>
> You make the assertion that no extra memory is required yet the reality
> is quite the opposite.
>
> Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is
> that:
>
> 1) Every XHTML tag will require twice as many bytes when represented in
> UTF-16 versus UTF-8
> 2) Every English XHTML-Print print job will be twice as big encoded with
> UTF-16 versus UTF-8
> 3) Every "Latin 1" print job will be larger, approaching 2X in size.
>
> When you double the data's size, buffers have to double to be able to
> hold and manipulate an equivalent amount of print stream content. There
> are real cost and performance penalties to be paid to deal with UTF-16
> encoding, especially when dealing with western character sets. When a
> device is designed to deal with the far east "characters" there are other
> penalties to be paid in things like the size of the font load that
> mitigate the UTF-16 versus UTF-8 encoding issue.
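The point that UTF-8 and UTF-16 are external representations only can be shown with a minimal sketch. The sample string and the helper name `code_points` are assumptions made for illustration; a list of integers stands in for whatever internal form a device might choose.

```python
def code_points(data, encoding):
    """Decode external bytes and return the sequence of code points."""
    return [ord(ch) for ch in data.decode(encoding)]


# Hypothetical document fragment with ASCII, Latin 1 and Japanese characters.
text = "<p>r\u00e9sum\u00e9 \u3041</p>"
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16")  # Python prepends a byte order mark here

# The byte counts on the wire differ, the decoded content does not.
print(len(as_utf8), len(as_utf16))
print(code_points(as_utf8, "utf-8") == code_points(as_utf16, "utf-16"))
```

Once decoded, the two inputs are indistinguishable, so the choice of input encoding does not dictate how much memory the stored text needs.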
FOLLOWUP 23:
From: Rowland Shaw <Rowland.Shaw@crystaldecisions.com>
...and for every Asian language, each character can take up to three bytes
(in UTF-8 vs. two in UTF-16)
Take a completely random Japanese character (Hiragana Letter Small A),
U+3041: in UTF-8 it is 0xE3 0x81 0x81 -- this assumes that you are willing
to deal with characters as a MBCS, and that you aren't going to convert to
UCS2 internally.
English has the biggest saving by saving as UTF-8 (so let it), but for most
other languages, there is no benefit or, worse, a 50% growth in size (vs.
UTF-16).
If UTF-16 is disallowed, it's no longer an XML application (which may be a
road to go down) by definition on the minimum bar set for XML (back in the
days of 486s and 8 MB machines). Thinking about it, my printer nowadays at
home has more RAM in it than my PC when XML was being created...
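The byte values quoted for U+3041 can be reproduced with a short sketch. The helper name `utf8_growth` is hypothetical; it reports the UTF-8 size relative to the UTF-16 size, with no byte order mark counted.

```python
import unicodedata


def utf8_growth(text):
    """Return the UTF-8 byte count divided by the UTF-16 byte count."""
    return len(text.encode("utf-8")) / len(text.encode("utf-16-be"))


ch = "\u3041"
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark

print(unicodedata.name(ch))
print("UTF-8: ", " ".join(f"0x{b:02X}" for b in utf8))
print("UTF-16:", " ".join(f"0x{b:02X}" for b in utf16))
print("UTF-8 size relative to UTF-16:", utf8_growth(ch))
```

A ratio of 1.5 corresponds to the 50% growth mentioned above, while an ASCII letter gives 0.5.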
FOLLOWUP 24:
From: elliott.bradshaw@zoran.com
Don,
I agree with the argument that a front end can convert from UTF-16 to UTF-8
or whatever internal form is used, and have essentially no impact on memory
needs.
"A couple of dozen bytes" might be a little optimistic for this logic :^),
but it's pretty straightforward:
- look at the first 16 bits to detect a UTF-16 byte order mark
- for each double byte, emit the UTF-8 (or other) equivalent
Of course a printer could choose to store Asian data differently than
Latin, and save some space compared to native UTF-8. This decision is
orthogonal to the form of the input. But this logic may not be worth it
and is not needed for compliance.
Frugally,
Elliott
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group)
781 638-7534
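The two steps listed above (check the first 16 bits for a byte order mark, then convert as the data arrives) might look roughly like the following front end. This is a sketch under stated assumptions, not a conforming XML encoding detector: the name `transcode_to_utf8` is hypothetical, input without a UTF-16 mark is assumed to be UTF-8 already, and Python's standard incremental decoder stands in for hand-written conversion logic.

```python
import codecs


def transcode_to_utf8(chunks):
    """Yield UTF-8 bytes for an iterable of raw byte chunks."""
    chunks = iter(chunks)
    head = b""
    # Gather at least two bytes so the byte order mark can be inspected.
    for chunk in chunks:
        head += chunk
        if len(head) >= 2:
            break
    if head[:2] == codecs.BOM_UTF16_BE:
        decoder = codecs.getincrementaldecoder("utf-16-be")()
    elif head[:2] == codecs.BOM_UTF16_LE:
        decoder = codecs.getincrementaldecoder("utf-16-le")()
    else:
        # Assumption: no UTF-16 mark means the stream is already UTF-8.
        if head:
            yield head
        yield from chunks
        return
    # Convert chunk by chunk; the decoder keeps any incomplete code unit
    # or surrogate pair until the next chunk arrives.
    pending = head[2:]
    while pending is not None:
        text = decoder.decode(pending)
        if text:
            yield text.encode("utf-8")
        pending = next(chunks, None)
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-8")
```

Only one chunk and a few bytes of decoder state are held at a time, which is the sense in which the conversion has essentially no impact on memory needs.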
FOLLOWUP 25:
From: don@lexmark.com
Steven:
Of course I knew this was just the external representation.
I'm trying to reduce conversions and reduce the sizes of buffers, etc.
necessary to do this work. I have no doubt it can be done, I'm just trying
to do things with smaller, less powerful processors and with less available
memory than what programmers normally expect to be available in today's
environment.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/16/2003 09:10:59 AM
To: <don@lexmark.com>
cc: <don@lexmark.com>, "BIGELOW,JIM \(HP-Boise,ex1\)"
<jim.bigelow@hp.com>, <w3c-html-wg@w3.org>,
<voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don,
I've been wondering for a long time if that was the misunderstanding, but I
was assured it wasn't.
UTF 16 and UTF 8 are *external* representations. The internal amount of
storage needed for them is identical, and completely up to you how you
store.
The only extra memory needed is the couple of dozen extra bytes of code to
convert UTF 16 into whatever internal representation you use.
Best wishes,
Steven
----- Original Message -----
From: <don@lexmark.com>
To: "Steven Pemberton" <steven.pemberton@cwi.nl>
Cc: <don@lexmark.com>; "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
<w3c-html-wg@w3.org>; <voyager-issues@mn.aptest.com>;
<elliott.bradshaw@zoran.com>; <www-html@w3.org>
Sent: Thursday, October 16, 2003 2:51 PM
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> Steven:
>
> I think your answer proves my point that the XML community did not and
> does not consider the limitations of low cost, constrained embedded
> environments when developing XML.
>
> You make the assertion that no extra memory is required yet the reality
> is quite the opposite.
>
> Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is
> that:
>
> 1) Every XHTML tag will require twice as many bytes when represented in
> UTF-16 versus UTF-8
> 2) Every English XHTML-Print print job will be twice as big encoded with
> UTF-16 versus UTF-8
> 3) Every "Latin 1" print job will be larger, approaching 2X in size.
>
> When you double the data's size, buffers have to double to be able to
> hold and manipulate an equivalent amount of print stream content. There
> are real cost and performance penalties to be paid to deal with UTF-16
> encoding, especially when dealing with western character sets. When a
> device is designed to deal with the far east "characters" there are other
> penalties to be paid in things like the size of the font load that
> mitigate the UTF-16 versus UTF-8 encoding issue.
>
> *******************************************
> Don Wright don@lexmark.com
>
> Chair, IEEE SA Standards Board
> Member, IEEE-ISTO Board of Directors
> f.wright@ieee.org / f.wright@computer.org
>
> Director, Alliances and Standards
> Lexmark International
> 740 New Circle Rd C14/082-3
> Lexington, Ky 40550
> 859-825-4808 (phone) 603-963-8352 (fax)
> *******************************************
>
>
>
>
>
> "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
>
> To: <don@lexmark.com>
> cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
> <w3c-html-wg@w3.org>, <don@lexmark.com>,
> <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
> <www-html@w3.org>
> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> But support for UTF 16 adds a few dozen bytes of code, and no extra
> memory requirements. It is simpler than UTF 8! What's the problem?
>
> Steven
>
> ----- Original Message -----
> From: <don@lexmark.com>
> To: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
> Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
> <w3c-html-wg@w3.org>;
> <don@lexmark.com>; <voyager-issues@mn.aptest.com>;
> <elliott.bradshaw@zoran.com>; <www-html@w3.org>
> Sent: Thursday, October 16, 2003 12:20 AM
> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> >
> > Steven, et al:
> >
> > The real problem is that the entire XML architecture was designed assuming
> > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We
> > have already seen push back in other standards groups that consumer
> > electronic devices and other smaller, lighter devices cannot afford all the
> > luxuries demanded by an obese XML architecture. Unless the XML community
> > accepts subsetting, we can't expect the broadest support for XML to happen
> > at the low end until the price/performance ratios experience another order
> > or two of magnitude improvement. As recently reported in several of the trade
> > magazines focused on IT professionals, the deployment of XML and Web
> > Services is having significant negative impacts on the IT infrastructure,
> > especially in the area of bandwidth utilization. This is just another
> > symptom of the same problem.
> >
> > I know I will lose this argument in the W3C but the realities of the
> > XHTML-Print implementations will blow off UTF-16 as more fat with no
> > benefit and simply not support it, "interoperable" or not.
> >
> > Sorry I'm not pure but practical.
> >
> > *******************************************
> > Don Wright don@lexmark.com
> >
> > Chair, IEEE SA Standards Board
> > Member, IEEE-ISTO Board of Directors
> > f.wright@ieee.org / f.wright@computer.org
> >
> > Director, Alliances and Standards
> > Lexmark International
> > 740 New Circle Rd C14/082-3
> > Lexington, Ky 40550
> > 859-825-4808 (phone) 603-963-8352 (fax)
> > *******************************************
> >
> >
> >
> >
> > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM
> >
> > To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
> > <w3c-html-wg@w3.org>, <don@lexmark.com>
> > cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
> > <www-html@w3.org>
> > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
> >
> >
> > > From: don@lexmark.com [mailto:don@lexmark.com]
> >
> > > So let me understand this....
> > >
> > > Because people have poorly designed and written XML applications running on
> > > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
> > > control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
> > > $49 printers with code to be able to detect and interpret both.
> >
> > No Don. It is about interoperability and conforming to standards. XML allows
> > documents to be encoded in either UTF-8 or UTF-16: consumers must accept
> > both, producers may produce either. An XHTML-Print printer will be just a
> > consumer of an XML byte-stream at some IP address; we don't want to burden
> > every program in the world that can produce XML with a switch that says
> > "this output is going to a poor lowly XHTML-Print processor that can't deal
> > with UTF-16, so please produce UTF-8", especially since UTF-16 is the easy
> > one to implement, and can only cost a few dozen bytes at most.
> >
> > If we changed this, XHTML-Print would have to go back to last call, and you
> > can bet your boots that the XML community would rise up against us, as it
> > has in the past, and I can tell you we don't want to go there, and we would
> > have a hundred people registering objections.
> >
> > Conforming to XML requirements comes with the territory of being XHTML. The
> > XML community will not take kindly to us messing with their standards.
> >
> > Best wishes,
> >
> > Steven Pemberton
FOLLOWUP 26:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Don and Steven,
I want to expand on what you have said:
Don wrote:
> > 1) Every XHTML tag will require twice as many bytes when
> > represented in UTF-16 versus UTF-8
> > 2) Every English XHTML-Print print job will be twice as
> > big encoded with UTF-16 versus UTF-8
> > 3) Every "Latin 1" print job will be larger approaching
> > 2X in size.
> >
> > When you double the data's size, buffers have to double to
> > be able to hold and manipulate an equivalent amount of print
> > stream content.
This statement is only true for some print streams. See the discussion below
in "The problem space".
Steven wrote:
> UTF 16 and UTF 8 are *external* representations. The internal
> amount of storage needed for them is identical, and
> completely up to you how you store.
If a printer uses 16 bits internally to represent a character, then there
shouldn't be a difference in buffering requirements between utf-8 and utf-16
encoded files (see below for a more complete discussion). However, if a
printer uses 8 bits per character, then it has restricted itself to only
handle a subset of possible documents, those with ASCII characters. This is
a product-specific decision akin to that of whether to make a device print
in color or black & white or support landscape as well as portrait printing.
Therefore, I suggest that the spec say that a printer should support utf-16,
just as it now says it should support CSS, landscape printing, and color --
within the limits of the device. If a user buys a low-cost device that can
only print ASCII characters in portrait orientation, without color, style
sheets, or images, hopefully the price was in line with the printer's
abilities and other, more expensive, more capable devices are available as
needed.
Jim
The problem space
----------------------
There is a document composition continuum from documents with only text,
through mixed text and images, to documents that contain only images. At
the text-only end of the continuum, the effect on the document size of
encoding in UTF-16 versus UTF-8 is a doubling of document size. At the
image-only end of the continuum, the effect on the document size of encoding
in UTF-16 versus UTF-8 is overshadowed by the image data.
The table below illustrates three points on the document composition
continuum:
1. Text-only: a document that prints as one page of ASCII text (Times, 10pt,
8in by 11in paper) [1]. Size, in bytes, is 6,282.
2. Text & Image: a one page document with one 3in x 5in image (166.7K bytes)
and the remainder text [2]. Size, in bytes, of document and image is
171,531.
3. Image-only: a one page document with eight 2in x 3.25in images (703.2K
bytes) and no text. [3] Size, in bytes, of document and eight images is
705,108.
Size (bytes)   utf-8   %doc   utf-16   %doc
Text-only      6,282   100    12,566   100
Text+Image     4,776   3.2     9,554   5.4   (9,554 / (9,554 + 166,675) * 100)
Image-only     1,916   0.27    3,834   0.54
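The effect on the whole print job, rather than on the text portion alone, can be estimated from the figures above. The following is only a rough sketch: it assumes that the totals quoted in the list (6,282, 171,531 and 705,108 bytes) are the job sizes with UTF-8 text, and that the image bytes do not change with the text encoding. The function name job_growth is made up for this illustration.

```python
def job_growth(text_utf8: int, text_utf16: int, total_utf8: int) -> float:
    """Ratio of the job size with UTF-16 text to the job size with UTF-8 text."""
    # Assumption: everything that is not text is image data, and image data
    # is the same size whichever encoding is used for the text.
    image_bytes = total_utf8 - text_utf8
    return (text_utf16 + image_bytes) / total_utf8


# (text bytes in utf-8, text bytes in utf-16, total job bytes with utf-8 text)
SAMPLES = {
    "Text-only": (6_282, 12_566, 6_282),
    "Text+Image": (4_776, 9_554, 171_531),
    "Image-only": (1_916, 3_834, 705_108),
}

for name, (u8, u16, total) in SAMPLES.items():
    print(f"{name}: job grows by a factor of {job_growth(u8, u16, total):.4f}")
```

Under these assumptions the text-only job roughly doubles, while the jobs that carry images grow by only a few percent or less.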
There is another point of variability: the characters in the text portions
of the document. This is another continuum from ASCII only at one end to
Japanese, Chinese, Korean, and Hindi at the other.
"Table 1: UTF types" of [4] gives the following average bytes per code point
                                  utf-8   utf-16
English                           1       2
Latin-1                           1.1     2
Greek, Russian, Arabic, Hebrew    1.7     2
Japanese, Chinese, Korean, Hindi  3       2
As the language/script of the text portion of the document changes from
English-only toward other scripts and languages, the size difference between
utf-8 and utf-16 decreases.
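The trend in the table can be checked on a few short strings. This is a small sketch, not a measurement of real print jobs: the sample words are arbitrary choices made for this illustration, and UTF-16 is encoded without a byte order mark so that only the characters themselves are counted.

```python
def bytes_per_code_point(text: str, encoding: str) -> float:
    """Average number of encoded bytes per Unicode code point in text."""
    return len(text.encode(encoding)) / len(text)


# Arbitrary sample strings, one per script group (illustrative only).
SAMPLES = {
    "English": "print job",
    "Greek": "\u03ba\u03bf\u03c3\u03bc\u03bf\u03c2",
    "Japanese": "\u5370\u5237\u30b8\u30e7\u30d6",
    "Hindi": "\u092e\u0941\u0926\u094d\u0930\u0923",
}

for name, text in SAMPLES.items():
    u8 = bytes_per_code_point(text, "utf-8")
    u16 = bytes_per_code_point(text, "utf-16-le")  # "-le" variant emits no BOM
    print(f"{name}: utf-8 {u8:.1f}, utf-16 {u16:.1f} bytes per code point")
```

Pure Greek letters take 2 bytes each in UTF-8; the 1.7 average in the table is presumably lower because running text also contains spaces and ASCII punctuation, which take 1 byte each.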
End-to-end solution
-------------------
If you look at the end-to-end solution, from the sending application to the
printer, the stages can be thought of as:
1. Sending Device: the data as represented in the sending device (a cell
phone for example)
2. Transmission: the data combined with markup and style information as an
XHTML-Print data stream and then encoded in either UTF-8 or UTF-16
3. Receiving Device: the printer -- breaking this into two parts gives:
3.a The XHTML-Print data stream as received
3.b The data without markup and style information and before printing. How
the data is stored is implementation dependent, and how much memory is used
depends on how a character is represented (8 or 16 bits) and how much
of the document is buffered. Each printer makes these choices;
8 bits/char restricts the documents processed to Latin-1 characters.
Stage      Size   utf-8   utf-16
1. app     n      -       -
2. xmit    n      n-3n*   2n
3a. Pr     n      n-3n    2n
3b. Pr**   n      n-2n    n-2n
* n-3n shows the variable sizing depending on characters being encoded:
English only (n), CJK (3n)
** at Stage 3b, representing a character with 8bits restricts the characters
that can be represented to ASCII or Latin 1, 16 bits can represent all
characters.
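Stage 3b can be illustrated with a short sketch. It assumes a hypothetical printer that stores every character as one 16-bit unit after decoding; the function name internal_buffer and the sample content are invented for this illustration and are not taken from any implementation.

```python
def internal_buffer(stream: bytes, wire_encoding: str) -> bytes:
    """Decode a received stream and store it at 16 bits per character."""
    text = stream.decode(wire_encoding)
    # Hypothetical internal store: fixed-width 16-bit units. This covers
    # characters in the Basic Multilingual Plane, as in the sample below.
    return text.encode("utf-16-le")


# Sample content mixing ASCII, Latin-1 and CJK characters (9 characters).
content = "R\u00e9sum\u00e9 \u5370\u5237"

as_utf8 = content.encode("utf-8")    # stage 2/3a, UTF-8 on the wire
as_utf16 = content.encode("utf-16")  # stage 2/3a, UTF-16 with byte order mark

print("wire bytes, utf-8: ", len(as_utf8))
print("wire bytes, utf-16:", len(as_utf16))
print("internal bytes, from utf-8: ", len(internal_buffer(as_utf8, "utf-8")))
print("internal bytes, from utf-16:", len(internal_buffer(as_utf16, "utf-16")))
```

The wire sizes differ, but once decoded the two streams occupy the same internal storage, which is the sense in which UTF-8 and UTF-16 are external representations only.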
Internal representation
-----------------------
If a printer uses 16 bits internally to represent a character, then there
shouldn't be a difference in buffering requirements between utf-8 and utf-16
encoded files. However, if a printer uses 8 bits, then it has restricted
itself to only handle a subset of documents. This is a product-specific
decision akin to that of supporting color or not. Therefore, I suggest that
the spec say that a printer should sup