Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document outlines the way in which the HTML Working Group addressed the comments submitted during the XHTML-Print Last Call Working Draft review period.
During the Last Call Working Draft review period for XHTML-Print a number of comments were received from both inside and outside of the W3C. This document summarizes those comments and describes the ways in which the comments were addressed by the HTML Working Group.
Note that the majority of this document is automatically generated from the Working Group's database of comments. As such, it may contain typographical or stylistic errors. If so, these are contained in the original submissions, and the HTML Working Group elected not to change these submissions.
This document is a product of the W3C's HTML Working Group. This document may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use this document as reference material or to cite it as other than "work in progress". This document is work in progress and does not imply endorsement by the W3C membership.
This document has been produced as part of the W3C HTML Activity. The goals of the HTML Working Group (members only) are discussed in the HTML Working Group charter (members only).
Please send detailed comments on this document to www-html-editor@w3.org. We cannot guarantee a personal response, but we will try when it is appropriate. Public discussion on HTML features takes place on the mailing list www-html@w3.org.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
PROBLEM ID: 6492
STATE: Closed
RESOLUTION: Modify and Accept
USER POSITION: Agree
NOTES:
Fixed incorrect syntax of example
ORIGINAL MESSAGE:
From: Jun Fujisawa <fujisawa.jun@canon.co.jp>
To: www-html-editor@w3.org
Cc: www-html@w3.org, xp@pwg.org, Jon Ferraiolo <jon.ferraiolo@adobe.com>
Subject: Incorrect example in Appendix B.3 of XHTML Print
Date: Fri, 25 Jul 2003 12:48:47 +0900
Message-Id: <p05111011bb4654080f6f@[172.23.45.13]>
X-Archived-At: http://www.w3.org/mid/p05111011bb4654080f6f@%5B172.23.45.13%5D

Hello HTML editors,

Here is a comment to the last call draft for XHTML Print.

At 6:28 PM +0200 03.7.24, Steven Pemberton wrote:
>XHTMLT-Print
>http://www.w3.org/MarkUp/Group/2003/WD-xhtml-print-20030723

Jon Ferraiolo of SVG WG found out that the example in Appendix B.3 looks strange since the two instances of 'object' element have the sample value for 'id' attribute in a single XML document.

    <object declare="declare"
        height="20 mm" width="20 mm"
        type="image/jpeg"
        id="image_1" >
    </object>

    . . . .

    <object id="image_1"
        data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
    </object>

I believe the example is not correct. Also, I think the choice of this particular example is not appropriate because we don't need to use the case for 'object' element with 'declare' attributes in order to show how we can include inline image data in XHTML-Print by using data URI scheme.

I would like to suggest to replace this example by simpler ones such as the following:

    <object height="20 mm" width="20 mm" type="image/jpeg"
        data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
      Example Image
    </object>

or

    <img height="20 mm" width="20 mm" alt="Example Image"
        src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />

-- Jun Fujisawa <mailto:fujisawa.jun@canon.co.jp>
FOLLOWUP 1:
From: Masayasu Ishikawa <mimasa@w3.org>

So, we are receiving Last Call comments even before publication. Great. Jim, do you think this is an easy-to-fix thing that we should just do it now (i.e. fix it and publish the Last Call WD, which should happen today), or leave it for now and fix later?

-- Masayasu Ishikawa / mimasa@w3.org W3C - World Wide Web Consortium
FOLLOWUP 2:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> > From: don@lexmark.com [mailto:don@lexmark.com] > Sent: Friday, July 25, 2003 5:15 AM > To: Jun Fujisawa > Cc: xp@pwg.org; jim.bigelow@hp.com > Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print > > > > Jun: > > The intent of this example was to show how an image can be > declared inline with the other XHTML while the actual data > for the image may come later. Neither of your two > alternatives separate the delaration of the image from the > actual data of the image. If the example provided is > incorrect, can you provide an example that achieves this separation? > > ********************************************** > Don Wright don@lexmark.com > > Chair, IEEE SA Standards Board > Member, IEEE-ISTO Board of Directors > f.wright@ieee.org / f.wright@computer.org > > Director, Alliances & Standards > Lexmark International > 740 New Circle Rd > Lexington, Ky 40550 > 859-825-4808 (phone) 603-963-8352 (fax) > ********************************************** >
FOLLOWUP 3:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> > From: Jun Fujisawa [mailto:fujisawa.jun@canon.co.jp] Sent: Monday, July 28, 2003 3:44 AM To: don@lexmark.com Cc: xp@pwg.org; jim.bigelow@hp.com Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print Hello Don, At 8:15 AM -0400 03.7.25, don@lexmark.com wrote: >The intent of this example was to show how an image can be declared >inline with the other XHTML while the actual data for the image may >come later. I don't understand the intent. I you want to get actual image data later (not at the declaration), you can just use 'img' or 'object' element without 'declare' attribute. >If the example provided is incorrect, can >you provide an example that achieves this separation? The following example shows one type of separation, but I don't think that meets your need. <object id="image_1" declare="declare" type="image/jpeg" data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . "> </object> . . . . <object height="20 mm" width="20 mm" data="#image_1" > </object> -- Jun Fujisawa <mailto:fujisawa.jun@canon.co.jp>
FOLLOWUP 4:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Friday, August 01, 2003 8:07 AM To: Jun Fujisawa Cc: don@lexmark.com; jim.bigelow@hp.com; owner-xp@pwg.org; xp@pwg.org Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print I see two issues here, perhaps separable. 1. Use of inline data. This can be accomplished by adding support for the data scheme. Examples (from Fujisawa-san): <object height="20 mm" width="20 mm" type="image/jpeg" data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . "> Example Image </object> or <img height="20 mm" width="20 mm" alt="Example Image" src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " /> 2. Separation of the data from the reference This is where the declare attribute comes in. I went back and read http://www.w3.org/TR/html4/struct/objects.html#h-13.3.4 It seems to me that the declare facility would let a client supply the content for the object before its reference, not after. If the requirement is that the client can send the image data at the end, I'm not sure that HTML supports that. If there is a requirement that the client can send the data first, then refer to it, then an example (again, thanks Fujisawa) is: <object id="image_1" declare="declare" type="image/jpeg" data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . "> </object> . . . . <object height="20 mm" width="20 mm" data="#image_1" > </object> I think the first requirement is good to have, but we can probably drop the second, especially since the ordering is probably not what we want. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534
FOLLOWUP 5:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>

From: BIGELOW,JIM (HP-Boise,ex1)
Sent: Friday, August 01, 2003 8:38 AM
To: 'ElliottBradshaw@oaktech.com'; Jun Fujisawa
Cc: don@lexmark.com; BIGELOW,JIM (HP-Boise,ex1); owner-xp@pwg.org; xp@pwg.org
Subject: RE: XP> Incorrect example in Appendix B.3 of XHTML Print

Elliott wrote:
> I see two issues here, perhaps separable.
> 1. Use of inline data.
>
> This can be accomplished by adding support for the data scheme. ...
>
> 2. Separation of the data from the reference
>
> ...
>
> I think the first requirement is good to have, but we can
> probably drop the second, especially since the ordering is
> probably not what we want.
>

I'm not perfectly clear on what you think the requirements should be. The current spec says that printer may support in-line data via the object/img elements, but is not required to. Are you calling for a change to this statement?

Arguments against requiring support for in-line image data have been that:
1. it requires too much buffering
2. the image data could overflow the memory used to store element attributes.

Alternately, to avoid the possibility of exceeding the memory set aside for storing element attributes while processing a job, a printer must either reserve large amounts of memory to avoid problems in this one, almost unique case, or implement a complex, dynamic memory allocation scheme. In any event supporting in-line data via the object and image attributes means that the entire image is funneled through the document parser, whereas, alternate means of handling image data are possible if the image is referenced via the cid or http schemes.

There is another method for managing image data buffering, Section B.2.1 In-line images of the W3C spec provides some informative suggestions about ways to stage the delivery of image data using the (required) multiplexed document format. This method seeks to reduce the memory needed to store images while processing the document, by providing enough of the image header to determine the image's size, synchronized with the image's reference. The remainder or bulk of the image is delivered later in the document, hopefully, when the printer is ready to commit the image to the page.

Jim --
FOLLOWUP 6:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>

From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Friday, August 01, 2003 9:46 AM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: don@lexmark.com; Jun Fujisawa; BIGELOW,JIM (HP-Boise,ex1); owner-xp@pwg.org; xp@pwg.org
Subject: RE: XP> Incorrect example in Appendix B.3 of XHTML Print

Sorry, I didn't mean to change the actual requirements. Section B.3 should stay informative and just be a discussion of different things a printer may choose to implement. However, there is at least one case of a conditional requirement elsewhere in the document (the Object Module) that refers to this section.

But, it is confusing what problem this section is trying to solve (in an optional way). And, it looks like the example for use of the declare attribute is just plain wrong.

I propose that we re-write this section to eliminate all discussion of the declare attribute, and simply show how to use the data URL scheme to handle inline data. For example:

<proposal>
This section is informative.

An alternative method to include inline image data in XHTML-Print is via the "data" URL scheme (see RFC2397). Because this method normally encodes the binary image data using base64 encoding, a significant increase in the size of the data transmitted will be experienced. This SHOULD be avoided over low speed connections. Printers supporting inline data MAY support base64 encoding using the img or object element.

    <object height="20 mm" width="20 mm" type="image/jpeg"
        data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
      Example Image
    </object>

or

    <img height="20 mm" width="20 mm" alt="Example Image"
        src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />

This method MAY be useful for very simple clients that cannot afford a server for image downloading or for some reason cannot utilize the Application/Multiplexed MIME type; however, it is not RECOMMENDED for general use especially if the size of the printer's buffer is unknown.
</proposal>
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com>

Fujisawa-san,

Thank you for your comment. It is recorded as issue 6492 [1] in the HTML Working Group's issue tracking system. The working group has elected to accept this defect and modify the XHTML-Print spec by accepting Elliott Bradshaw's proposal to change Appendix B.3 to read as shown below. If this is not acceptable, please respond to this message with your comments.

Jim Bigelow

--

This section is informative.

An alternative method to include inline image data in XHTML-Print is via the "data" URL scheme (see RFC2397). Because this method normally encodes the binary image data using base64 encoding, a significant increase in the size of the data transmitted will be experienced. This SHOULD be avoided over low speed connections. Printers supporting inline data MAY support base64 encoding using the img or object element.

    <object height="20 mm" width="20 mm" type="image/jpeg"
        data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
      Example Image
    </object>

or

    <img height="20 mm" width="20 mm" alt="Example Image"
        src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />

This method MAY be useful for very simple clients that cannot afford a server for image downloading or for some reason cannot utilize the Application/Multiplexed MIME type; however, it is not RECOMMENDED for general use especially if the size of the printer's buffer is unknown.

[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6492;user=guest
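The size increase mentioned in the revised Appendix B.3 text can be illustrated with a small Python sketch. It builds a "data" URL (RFC 2397) from raw bytes and compares the encoded length with the original. The helper name make_data_url and the placeholder payload are assumptions made for this illustration; they do not come from the specification or from the messages above.

```python
import base64


def make_data_url(payload: bytes, media_type: str = "image/jpeg") -> str:
    """Return an RFC 2397 "data" URL carrying payload as base64 text."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# Placeholder bytes standing in for real JPEG data (assumption for the demo).
sample = bytes(range(256)) * 12  # 3072 bytes

url = make_data_url(sample)
prefix = "data:image/jpeg;base64,"
encoded_len = len(url) - len(prefix)

# base64 maps every 3 input bytes to 4 output characters,
# so the payload grows by roughly one third.
overhead = encoded_len / len(sample)
```

The ratio of about 4/3 is the growth a client would have to accept when choosing inline data over the multiplexed format, which is consistent with the advice to avoid it on low-speed connections.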
PROBLEM ID: 6782
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Apply changes as noted -- Jim
ORIGINAL MESSAGE:
From: Susan Lesch [mailto:lesch@w3.org] These are minor editorial comments for your XHTML-Print Last Call Working Draft [1]. Kudos to the editor and your group(s). It looks great. s/family of XHTML Languages/family of XHTML languages/ s/members/Members/ s/whitespace/white space/ s/Style Sheet/style sheet/ s/guillemots/guillemets/ s/ththe/the/ s/, support/. Support/ [extracted from 6899] [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/ Best wishes for your project, -- Susan Lesch http://www.w3.org/People/Lesch/ mailto:lesch@w3.org tel:+1.858.483.4819 World Wide Web Consortium (W3C) http://www.w3.org/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6782 [1] in the HTML Working Group's issue tracking system. The working group has elected to implement your suggestions. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6782;user=guest
PROBLEM ID: 6869
STATE: Closed
RESOLUTION: Modify and Accept
USER POSITION: Agree
NOTES:
Equivalent of 6777
ORIGINAL MESSAGE:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To: www-html-editor@w3.org
Cc: xp@pwg.org
Subject: XHTML-Print: change of url from xhtml-print.org to w3c.org breaks current implementations.
Date: Thu, 4 Sep 2003 11:02:17 -0700
Message-ID: <020A3CF87FB5AC47AA67966B33845755050DB585@xboi22.boise.itc.hp.com>
X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B33845755050DB585@xboi22.boise.itc.hp.com

The W3C Last Call Working Draft of XHTML-Print [1] changes the URL in the DOCTYPE from "http://www.xhtml-print.org/xhtml-print/xhtml-print10.dtd" to "http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd". This breaks compatibility with existing implementations.

Can this situation be handled by redirecting the xhtml-print.org url to the w3.org url? If so, how is this done?

[1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/

Jim Bigelow Hewlett-Packard
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Jonny Axelsson wrote: Just for my curiosity: How does that break backwards compatibility? The old DTD will presumably remain at the www.xhtml-print.org location for at least as long as is needed (for the current implementations), while new or updated XHTML-Print implementations will use the new location. Or? -- Jonny Axelsson, Web Standards, Opera Software
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Elliott Bradshaw wrote: Don is going to remind us (as well he should) that the URL is not used for a live retrieval from that server. So a redirect doesn't work. So I think this is, technically, an incompatible change. But I think it's one we could live with. -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Group (formerly Oak Technology Imaging Group) 781 638-7534
REPLY 3:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Jim Bigelow wrote: Jonny, Thanks for the question. If a document with the w3c DTD is sent to a printer that shipped with firmware written using the spec saying that conforming XHTML-Print documents must have a DTD containing a URL to the xhtml-print.org DTD, then it is possible that the document wouldn't print correctly, even though the printer is not validating. In the extreme case, it is possible that the document wouldn't print at all, since Section 2.3.1, item 1 says, "A printer MAY ignore or otherwise reject a non-conforming XHTML-Print document." I think we're all better off avoiding things that could make the user unhappy! :-) Jim
REPLY 4:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6869 [1] in the HTML Working Group's issue tracking system. The working group, following the reasoning of issue 6780 [2], decided that both the DTD in Appendix C of the spec [3] and the DTD in Appendix C of XHTML-Print [4] must be accepted. However, the DTD in Appendix C of XHTML-Print [4] is deprecated in favor of the DTD in Appendix C of the spec [3]. Future releases of this specification may remove the required support for the DTD in Appendix C of XHTML-Print [4]. If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6869;user=guest [2] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6780;user=guest [3] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/ [4] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
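One way a non-validating printer might apply this resolution is sketched below in Python. Because the DTD URL is not retrieved from a server (as noted in the thread), the check is a plain string comparison of the DOCTYPE system identifier against the two URLs quoted in the original message. The function name classify_doctype, the return labels, and the regular expression are assumptions made for this illustration, not text from the specification.

```python
import re

# System identifiers quoted in the original message for this issue.
W3C_DTD = "http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd"
PWG_DTD = "http://www.xhtml-print.org/xhtml-print/xhtml-print10.dtd"  # deprecated

# Capture the last quoted string of the DOCTYPE declaration (the system identifier).
_DOCTYPE = re.compile(r'<!DOCTYPE\s+[^>]*?"([^"]*)"\s*>')


def classify_doctype(document: str) -> str:
    """Label a document by its DOCTYPE system identifier (hypothetical labels)."""
    match = _DOCTYPE.search(document)
    if match is None:
        return "rejected"
    system_id = match.group(1)
    if system_id == W3C_DTD:
        return "accepted"
    if system_id == PWG_DTD:
        return "accepted-deprecated"
    return "rejected"
```

No network access is involved; the identifier is treated purely as a name, so a redirect between the two hosts would not change the outcome.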
PROBLEM ID: 6772
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Defined ignore as display:none
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: Scripts and Events
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi

1.3.1 Script and Events

Since the specification requires the documents to conform to restrictions that are not applicable to all XHTML documents, it is unlikely that casually authored XHTML documents would happen to be conforming XHTML-Print documents. Therefore, it is reasonable to expect some preprocessing to take place in the application before sending a document to the printer. That application could be required to discard script elements without burdening the printer with that task.

Such modification would change the document tree, though, and could change the matching of CSS selectors. If it is important to take into account the special case that someone could use a CSS selector such as "script + p" to style a paragraph, it would be necessary to elaborate on what "discarding" an element on the printer means (that is, is it discarded from the document tree or merely defaulted to display: none;).

[extracted from issue 6548]

-- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment. It is recorded as issue 6772 [1] in the HTML Working Group's issue tracking system. The working group has elected to accept your comment by clarifying that discarding an element should be the equivalent to setting its display property to "none". If this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6772;user=guest
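The difference between the two readings of "discard" raised in this issue can be shown with a short Python sketch using the standard xml.etree.ElementTree module. The sample markup and the helper has_script_plus_p are hypothetical; the helper only mimics the adjacent-sibling selector "script + p" for direct children of one parent and is not a CSS engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a paragraph that immediately follows a script element.
DOC = "<body><script>x()</script><p>styled?</p></body>"


def has_script_plus_p(parent) -> bool:
    """True if some <p> child immediately follows a <script> sibling."""
    children = list(parent)
    return any(a.tag == "script" and b.tag == "p"
               for a, b in zip(children, children[1:]))


# Reading 1 (the adopted resolution): the script behaves as display: none.
# It is not rendered, but it stays in the tree used for selector matching.
kept = ET.fromstring(DOC)
not_rendered = list(kept.iter("script"))

# Reading 2: the script is removed from the tree before selectors are matched.
pruned = ET.fromstring(DOC)
for script in pruned.findall("script"):
    pruned.remove(script)

matches_when_hidden = has_script_plus_p(kept)
matches_when_removed = has_script_plus_p(pruned)
```

Under the first reading the paragraph still matches "script + p"; under the second it no longer does, which is the ambiguity the commenter asked the specification to settle.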
PROBLEM ID: 6773
STATE: Closed
RESOLUTION: Reject
USER POSITION: No Response
NOTES:
DOCTYPE does not add an extra burden on printers
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: Document Conformance
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi

2.1 Document Conformance

Considering that printers are allowed to ignore non-conforming documents, requiring a particular doctype declaration and DTD validity looks like a significant burden for applications producing XHTML-Print documents. In particular, DTD validity requires namespaces to be represented in a particular way even though other representations would be semantically equivalent. This means applications producing XHTML-Print documents cannot use any off-the-shelf XML serializer but need a serializer specifically tailored to meet the requirements of XML-Print.

Wouldn't it be enough allow DTDless documents as long as the element structure meets the requirements expressed in the DTD (even though this kind of conformance can't be checked with a [DTD-]validating XML processor)?

[extracted from issue 6548]

-- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6773 [1] in the HTML Working Group's issue tracking system. The working group does not agree that the inclusion of the required doctype element in XHTML-Print documents would be a burden either to an application that produced XHTML-Print documents or a printer that processed them. Therefore, no change is planned to the specification regarding your issue. If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6773;user=guest
PROBLEM ID: 6774
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Lexmark dissenting, all others accept
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: allow UTF-16 not just UTF-8
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi

It is said that if a "charset" parameter is present for the application/xhtml+xml MIME type, the only valid value is "utf-8". It would make sense to allow "utf-16" as well. All XML processors are required to support UTF-16 in addition to UTF-8, so allowing UTF-16 for XHTML-Print doesn't cause any additional burden to implementations. Also, the payload of Application/Vnd.pwg-multiplexed chunks is defined as octets, so UTF-16 strings can be delivered as Application/Vnd.pwg-multiplexed chunks without any further encoding.

[extracted from issue 6548]

-- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
FOLLOWUP 1:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> From: don@lexmark.com [mailto:don@lexmark.com] Sent: Tuesday, September 02, 2003 6:06 PM To: BIGELOW,JIM (HP-Boise,ex1) Cc: xp@pwg.org Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8 to include UTF-16 Jim: I would disagree. I don't believe that all XHTML-Print enabled printers will necessarily bite the bullet and include a complete XML parser that requires support for UTF-16. I don't believe we should force that to occur. Perhaps you should remind the group that XHTML-Print is target for LOW-END printers with this embedded. No 3 gigahertz Pentium 4's with 512 MB of memory!!! ******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) *******************************************
FOLLOWUP 2:
From: jim.bigelow@hp.com I tend to agree with Henri. -- Jim Bigelow
FOLLOWUP 3:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> > From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com] > Sent: Wednesday, September 03, 2003 7:07 AM > To: don@lexmark.com > Cc: BIGELOW,JIM (HP-Boise,ex1); owner-xp@pwg.org; xp@pwg.org > Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8 > to include UTF-16 > Or to put it another way, XHTML-Print describes a single way of doing something. Wherease HTML and its derivatives frequently support multiple ways of getting the same effect. In the past, we have have resisted features that appear easy, unless they actually extend the capabilities of what can be done. Since I think a UTF-8 oriented client can get the same work done as a UTF-16 client, we should not mandate the extension. IMHO. E.
FOLLOWUP 4:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> > From: Michael Sweet [mailto:mike@easysw.com] > Sent: Wednesday, September 03, 2003 7:26 AM > To: don@lexmark.com > Cc: BIGELOW,JIM (HP-Boise,ex1); xp@pwg.org > Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8 > to include UTF-16 I'm not so worried about memory usage; converting UTF-16 to UTF-8 on the input side is not expensive in terms of memory or processor. However, reliably detecting UTF-16 and managing the endianess of the words is a pain in the ass in the real world. Assuming that all UTF-16 files start with FFFE or FEFF, the XML parser can handle the UTF-16 encoding without difficulty, however certain large convicted software monopolies regularly omit this important information making autodetection unreliable. Given the limited scope of XHTML-Print and the desire for maximum interoperability, I would recommend that we stick with UTF-8 as the only requirement so that applications that send XHTML-Print data have to use UTF-8 and manage whatever perversion of UTF-16 they use internally themselves... -- ______________________________________________________________________ Michael Sweet, Easy Software Products mike at easysw dot com Printing Software for UNIX http://www.easysw.com
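The byte order mark check mentioned in this message (FFFE or FEFF at the start of the data) can be sketched in a few lines of Python. The function name sniff_bom and its return labels are assumptions for illustration. The sketch deliberately ignores UTF-32, whose little-endian mark begins with the same two bytes as the UTF-16 one, and it returns None when no mark is present, which is exactly the case described as unreliable to autodetect.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess the encoding from a leading byte order mark, or None if absent."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    return None


# UTF-16 data with and without a byte order mark.
with_bom = codecs.BOM_UTF16_BE + "<p/>".encode("utf-16-be")
without_bom = "<p/>".encode("utf-16-le")

detected = sniff_bom(with_bom)
undetected = sniff_bom(without_bom)
```

When the mark is missing, a receiver has to fall back on heuristics or on out-of-band information such as the "charset" parameter, which is the interoperability concern behind the recommendation to require UTF-8 only.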
FOLLOWUP 5:
From: don@lexmark.com I maintain my disagreement with this decision for all the reasons previously mentioned including: 1) There are no characters which can be represented in UTF16 that connot be represented in UTF8 2) Reliable detection of UTF16 has not been proven 3) High "zoot" clients can much more easily convert any UTF16 to UTF8 4) Many of the target printers will have no need to deal with generic XML and hence no reason to support UTF16 Jim Bigelow <voyager-issues@mn.aptest.com> on 09/26/2003 03:48:41 PM To: hsivonen@iki.fi cc: don@lexmark.com, elliott.bradshaw@zoran.com Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6774 [1] in the HTML Working Group's issue tracking system. The working group agrees that since XHTML-Print is a member of the family of XHTML 1.0 languages documents encodings cannot be restricted to UTF-8 but must also include UTF-16. The specification will be modified to remove the sentence, 'The only valid value for the "charset" parameter is "utf-8".' If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest
FOLLOWUP 6:
From: don@lexmark.com Works for me. ********************************************** Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances & Standards Lexmark International 740 New Circle Rd Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) ********************************************** Jim Bigelow <voyager-issues@mn.aptest.com> on 09/29/2003 05:39:11 PM To: don@lexmark.com cc: Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Don, What do you think of the following compromise? 1. say nothing about whether a printer supports UTF-8 or UTF-16 2. require that conforming XHTML-Print documents be encoded in UTF-8 by requiring that conforming clients (Section 2.2) creating documents that are encoded in UF-8. This means adding the following to item 1 of Section 2.2: 1. Clients SHALL produce a well-formed XHTML-Print document as defined in XHTML 1.0 [XHTML1] and in Document Conformance. The document SHALL be encoded using UTF-8 [RFC2279]. Jim Bigelow
FOLLOWUP 7:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> To the HTML WG: Hello, Please help me understand this facet of XHTML-Print as a member of the Family of Languages defined by the Modularization of XHTML 1.0 -- must an application that processes XHTML-Print documents be a conforming XML processor? I'm sure that it must be able to process XHTML-Print documents as described by the XHTML-Print specification, but are there other constraints? For example, an XML processor is supposed to be able to process documents in UTF-8 and UTF-16. Why does an XHTML-Print processor have to support UTF-16? What would be the reasons for not restricting the encoding to UTF-8? The potential benefit of only requiring support for UTF-8, rather than both UTF-8 and UTF-16, is that more low-cost (in terms of memory and processing power) printers could process UTF-8 encoded XHTML-Print documents. Requiring support for both UTF-8 and UTF-16 increases the memory and processing requirements and thereby reduces the number of devices that could process XHTML-Print documents. One of the goals of XHTML-Print is to provide a document format for printing from and to low-cost devices, so keeping requirements to a minimum increases the possibilities that low-cost printers will implement support for it. Several representatives of printer manufacturers have expressed the opinion that support for UTF-8 and not for UTF-16 is preferred. Can you help me understand the technical reasons why UTF-16 support should be required, so we can judge the trade-offs in implementation costs versus capabilities? Jim
FOLLOWUP 8:
From: elliott.bradshaw@zoran.com Jim, Um, seems to me like a game of semantics. Whether we make a statement about the language or a statement about how the client generates it, seems like it's the same thing. I think the conflict here is: 1. PWG wanted a simple way to send print jobs. No need for multiple ways to accomplish the same thing. 2. But there seem to be W3C rules about how one derives languages from XHTML. I do think that #2 is contrary to the purpose of the original project. Just as we are able to say that XHTML-Print does not mandate certain properties which are too hard for a printer (e.g. the caveats on the position property) we ought to be able to exclude something that is not appropriate to the problem at hand. The only justification for this extension is "W3C says so." In principle we shouldn't do it. But, as a compromise I could live with it if I had to. -- Elliott Bradshaw Director, Software Engineering Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534 0
FOLLOWUP 9:
From: Mail Delivery Subsystem <MAILER-DAEMON@hades.mn.aptest.com> The original message was received at Wed, 1 Oct 2003 12:43:53 -0500 from localhost [127.0.0.1] ----- The following addresses had permanent fatal errors ----- <don@lexmark> (reason: 550 Host unknown) ----- Transcript of session follows ----- 550 5.1.2 <don@lexmark>... Host unknown (Name server: lexmark: host not found) ----- Original message ----- Date: Wed, 1 Oct 2003 12:43:53 -0500 From: Jim Bigelow <voyager-issues@mn.aptest.com> To: don@lexmark, elliott.bradshaw@zoran.com Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Don and Elliott, The HTML working group discussed my question of why an XHTML-Print processor must be a conforming XML processor (in particular, why it must support both UTF-8 and UTF-16 encodings) on October 1, 2003. The answer is that XHTML-Print must be a conforming XML processor and support both UTF-8 and UTF-16 encodings to preserve compatibility between xml-based applications.
If XHTML-Print processors only supported UTF-8 then an xml-based application could not be reliably depended upon to emit an XHTML-Print document that the XHTML-Print application could process. For example, an xml-based XForms application's output of an XHTML-Print document cannot be restricted by the XHTML-Print specification to UTF-8 since the application may not be able to control the encoding. Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give heuristics for determining a document's encoding when the charset parameter of the MIME type [4] is absent. An example UTF-16 decoder is available at [5]; other encodings are at [6]. Jim Bigelow [1] http://www.w3.org/TR/REC-xml#charencoding [2] http://www.w3.org/TR/REC-xml#sec-guessing [3] http://www.w3.org/TR/REC-xml [4] http://www.ietf.org/rfc/rfc3023.txt [5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html [6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html
FOLLOWUP 10:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> Here is Don Wright's objection to UTF-16 support. Jim http://oz.boi.hp.com/~jhb/ -----Original Message----- From: don@lexmark.com [mailto:don@lexmark.com] Sent: Wednesday, October 08, 2003 9:42 AM To: BIGELOW,JIM (HP-Boise,ex1) Cc: elliott.bradshaw@zoran.com; www-html@w3.org Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Jim: So let me understand this.... Because people have poorly designed and written XML applications running on 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden $49 printers with code to be able to detect and interpret both. I maintain my objection and my no vote. ********************************************** Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances & Standards Lexmark International 740 New Circle Rd Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) **********************************************
FOLLOWUP 11:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> -----Original Message----- From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com] Sent: Thursday, October 09, 2003 2:14 PM To: don@lexmark.com Cc: BIGELOW,JIM (HP-Boise,ex1) Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Don, As you know I have been skeptical of feature creep all along. But I think this one may be different...here's why. When we originally conceived XHTML-Print the idea was that the client code would be essentially a hand-coded print driver. But this W3C discussion brings up the idea that people could use XML application development tools as well. This could be in our interest if it gives people an easy way to write XHTML-Print aware applications. (And it seems to be pretty fundamental to the way they defined XML.) It seems that such tools don't like to be constrained to only one of UTF-8 vs. UTF-16...it would be "unnatural" to limit a developer in this way. It sort of reminds me of 10baseT vs. 100baseT, in which it seems odd to support one but not the other. How much complexity would this add to the $49 printer? Once we know whether or not we are in UTF-16, it would add very little (if nothing else do a brute force conversion from UTF-16 to UTF-8). Detection of UTF-16 is also straightforward, as described in 4.3.3 of http://www.w3.org/TR/REC-xml, which says the special Byte Order Mark is required at the beginning of UTF-16. (It also says very clearly that UTF-16 support is required.) So I think the cost is low, the benefit of XML-based application tools might be significant, and technical alignment with XML makes it worth doing. E. ---------------------------------------------------------------------------- ---- Elliott Bradshaw Director, Software Engineering Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534
FOLLOWUP 12:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> Mike, I've neglected to update you on the discussions about UTF-8/UTF-16 support for XHTML-Print. Please let us know your thoughts on the matter. You can see these discussions using the following link to the W3C's HTML Working Group issue database: http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest In summary: HTML WG: must support UTF-8 & UTF-16 for interoperability with all other xml and xml-derived applications and processors. Lexmark: UTF-16 support is too expensive to support in a low-cost printer, and too hard to reliably detect, ... Oak/Zoran: UTF-16 wouldn't be too expensive to implement and enables a new class of XHTML-Print producing devices HP: UTF-16 allows for more compact representation of Asian character documents and would not be too much to implement. Jim Bigelow, Editor: XHTML-Print & CSS Print Profile W3C HTML and CSS Working Groups http://www.w3.org/TR/xhtml-print/ http://www.w3.org/TR/css-print/ Hewlett-Packard 208-396-2068 jim.bigelow@hp.com
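The size trade-off named in the summary above (Lexmark's cost concern for Western text, HP's point about Asian text) can be checked with a few lines of Python. This is an illustrative sketch only and is not part of the original correspondence; the function name `encoded_sizes` and the sample strings are assumptions chosen for the example.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Python's "utf-16" codec prepends a 2-byte Byte Order Mark,
        # so this figure includes the BOM.
        "utf-16": len(text.encode("utf-16")),
    }


# Hypothetical sample content: ASCII markup, bare CJK text, CJK text in markup.
samples = {
    "ascii markup": "<p>Hello</p>",
    "cjk text": "\u65e5\u672c\u8a9e",
    "cjk in markup": "<p>\u65e5\u672c\u8a9e</p>",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

ASCII content roughly doubles in UTF-16, while characters in the CJK range take three bytes in UTF-8 and two in UTF-16. Because XHTML tag names are ASCII, a markup-heavy document can still come out larger in UTF-16 even when its text is CJK.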
FOLLOWUP 13:
From: don@lexmark.com Steven, et al: The real problem is that the entire XML architecture was designed assuming high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We have already seen push back in other standards groups that consumer electronic devices and other smaller, lighter devices cannot afford all the luxuries demanded by an obese XML architecture. Unless the XML community accepts subsetting, we can't expect the broadest support for XML to happen at the low end until the price/performance ratios experience another order or two of magnitude improvement. As recently reported in several of the trade magazines focused on IT professionals, the deployment of XML and Web Services is having significant negative impacts on the IT infrastructure especially in the area of bandwidth utilization. This is just another symptom of the same problem. I know I will lose this argument in the W3C but the realities of the XHTML-Print implementations will blow off UTF-16 as more fat with no benefit and simply not support it, "interoperable" or not. Sorry I'm not pure but practical. ******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) *******************************************
FOLLOWUP 14:
From: "Steven Pemberton" <Steven.Pemberton@cwi.nl> > From: don@lexmark.com [mailto:don@lexmark.com] > So let me understand this.... > > Because people have poorly designed and written XML applications running on > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the > control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden > $49 printers with code to be able to detect and interpret both. No Don. It is about interoperability and conforming to standards. XML allows documents to be encoded in either UTF8 or UTF 16: consumers must accept both, producers may produce either. An XHTML-Print printer will be just a consumer of an XML byte-stream at some IP address; we don't want to burden every program in the world that can produce XML with a switch that says "this output is going to a poor lowly XHTML Print processor that can't deal with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy one to implement, and can only cost a few dozen bytes at best. If we changed this, XHTML Print would have to go back to last call, and you can bet your boots that the XML community would rise up against us, as it has in the past, and I can tell you we don't want to go there, and we would have a hundred people registering objections. Conforming to XML requirements comes with the territory of being XHTML. The XML community will not take lightly to us messing with their standards. Best wishes, Steven Pemberton
FOLLOWUP 15:
From: Michael Sweet <mike@easysw.com> BIGELOW,JIM (HP-Boise,ex1) wrote: > Mike, > > I've neglected to update you on the discussions about UTF-8/UTF-16 > support for XHTML-Print. Please let us know you thoughts on the > matter. My concerns have always been concerning the detection between UTF-8 and UTF-16. After looking through the archive and the current XML spec, it does look like the BOM is required at the beginning of any UTF-16 XML document, so any autodetection problems can safely be blamed on Microsoft or whatever vendor is producing a non-conforming document. I do like the idea of recommending (a SHOULD, not a MUST) that the XHTML-Print client use the UTF-8 encoding, and add a note that the typical XHTML-Print device has limited CPU/memory available and the use of UTF-8 will potentially provide faster printing, etc. -- ______________________________________________________________________ Michael Sweet, Easy Software Products mike at easysw dot com Printing Software for UNIX http://www.easysw.com
FOLLOWUP 16:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com] Sent: Thursday, October 09, 2003 2:14 PM To: don@lexmark.com Cc: BIGELOW,JIM (HP-Boise,ex1) Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Don, As you know I have been skeptical of feature creep all along. But I think this one may be different...here's why. When we originally conceived XHTML-Print the idea was that the client code would be essentially a hand-coded print driver. But this W3C discussion brings up the idea that people could use XML application development tools as well. This could be in our interest if it gives people an easy way to write XHTML-Print aware applications. (And it seems to be pretty fundamental to the way they defined XML.) It seems that such tools don't like to be constrained to only one of UTF-8 vs. UTF-16...it would be "unnatural" to limit a developer in this way. It sort of reminds me of 10baseT vs. 100baseT, in which it seems odd to support one but not the other. How much complexity would this add to the $49 printer? Once we know whether or not we are in UTF-16, it would add very little (if nothing else do a brute force conversion from UTF-16 to UTF-8). Detection of UTF-16 is also straightforward, as described in 4.3.3 of http://www.w3.org/TR/REC-xml, which says the special Byte Order Mark is required at the beginning of UTF-16. (It also says very clearly that UTF-16 support is required.) So I think the cost is low, the benefit of XML-based application tools might be significant, and technical alignment with XML makes it worth doing. E. ---------------------------------------------------------------------------- ---- Elliott Bradshaw Director, Software Engineering Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534
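As an illustration of the detection step described in the message above, the following Python sketch checks the first bytes of a document for a Byte Order Mark. It is not part of the original correspondence and is a minimal sketch under stated assumptions: the function name `detect_encoding` is hypothetical, only UTF-8 and UTF-16 are distinguished, and the fuller heuristics of Appendix F of the XML specification (for example, sniffing the `<?xml` declaration or other encodings) are deliberately left out.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Classify a document as UTF-16 (either byte order) or UTF-8.

    Assumption: a UTF-16 document starts with a Byte Order Mark, as the
    message above reads section 4.3.3 of the XML specification.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    # No UTF-16 BOM: treat the document as UTF-8. A leading UTF-8 BOM
    # (EF BB BF) is optional and also lands in this branch.
    return "utf-8"


doc = '<?xml version="1.0"?><html/>'
print(detect_encoding(doc.encode("utf-8")))
print(detect_encoding(codecs.BOM_UTF16_LE + doc.encode("utf-16-le")))
```

The check reads at most two bytes, which is consistent with the claim in the thread that detection adds very little code once a BOM can be relied upon.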
FOLLOWUP 17:
From: "Steven Pemberton" <steven.pemberton@cwi.nl> But support for UTF 16 adds a few dozen bytes of code, and no extra memory requirements. It is simpler than UTF 8! What's the problem? Steven
FOLLOWUP 18:
From: don@lexmark.com One more thing, just one more thing. Every option or alternative adds one more thing. I think I'll pass on that one more thin mint. ******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) *******************************************
FOLLOWUP 19:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> Don, Here is a new section in the Design Rationale portion of the spec: <h3 id="s.1.3.7">1.3.7 Character Model</h3> <p> The W3C architectural specification <cite>Character Model for the World Wide Web 1.0</cite> [<a href="#ref_charmod">CHARMOD</a>] gives the <em title="RECOMMENDED in RFC 2119 context" class="RFC2119">RECOMMENDED</em> representation of characters in XHTML-Print. Authors of XHTML-Print producing applications <em title="SHOULD in RFC 2119 context" class="RFC2119">SHOULD</em> be aware that low cost printers might be limited in both processing power and memory and therefore, that fully-normalized ([<a href="#ref_charmod">CHARMOD</a>], <a href="http://www.w3.org/TR/charmod/#sec-FullyNormalized">4.2.3</a>) utf-8 encoded documents could print more quickly than documents in other forms and encodings. </p> I hope that this section will help discourage UTF-16. Jim
FOLLOWUP 20:
From: Henri Sivonen <hsivonen@iki.fi> On Thursday, Oct 16, 2003, at 01:20 Europe/Helsinki, don@lexmark.com wrote: > The real problem is that the entire XML architecture was designed > assuming > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. Lesser devices can host expat. However, if a device can't host expat, perhaps it would be better to use something other than XML to communicate with the device. > We have already seen push back in other standards groups that consumer > electronic devices and other smaller, lighter devices cannot afford > all the > luxuries demand by an obese XML architecture. Unless the XML community > accepts subsetting, we can't expect the broadest support for XML to > happen > at the low end until the price/performance ratios experience another > order > or two magnitude improvement. If you subset XML, is support for the subset support for XML? What's the point of building a language on application-specific almost-XML? A language built on such almost-XML breaks expectations (either in software or in the minds of people who need to deal with the language). If you can't use tools that are based on the assumption that the data they process is *exactly* XML and the programmers' knowledge about XML isn't guaranteed to apply, wouldn't it be less confusing to invent another grammar entirely and not call it XML? A well-defined extended subset of XML (for example: UTF-8 only, normalization form C only, no doctype, no PIs, no CDATA sections, no epilog, all HTML character entities predefined, namespace processing mandatory) would be more useful than having specs layered on top of XML 1.0 trying to readjust what XML 1.0 is. XHTML-Print printers get data over HTTP which is over TCP. It would be ludicrous to tweak the TCP header format in the XHTML-Print spec.
> I know I will lose this argument in the W3C but the realities of the > XHTML-Print implementations will blow off UTF-16 as more fat with no > benefit and simply not support it, "interoperable" or not. Converting UTF-16 to UTF-8 really isn't a big deal. It's basically a matter of shifting bits. Considering eliminating fat, I'd much rather eliminate character entities[1] and references to the external DTD subset[2]. Character entities are a burden in any case. They require either processing the external DTD subset (bad for execution speed and memory requirements) or implementing an extra feature which doesn't belong in an XML processor (bad for conformance and yet redundant since there are conforming ways of representing characters). [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6776;user=guest [2] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6773;user=guest -- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
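The remark in this follow-up that converting to UTF-8 is "basically a matter of shifting bits" can be illustrated with a minimal sketch. It is not code from the thread; the function name `encode_utf8` is hypothetical, and Python is used only for readability (a printer firmware would more plausibly do the same thing in C).

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 using only shifts and masks."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

The sketch needs no lookup tables and no state beyond the code point being encoded, which is the property the argument relies on.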
FOLLOWUP 21:
From: don@lexmark.com Steven: I think your answer proves my point that the XML community did not and does not consider the limitations of low cost, constrained embedded environments when developing XML. You make the assertion that no extra memory is required yet the reality is quite the opposite. Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is that: 1) Every XHTML tag will require twice as many bytes when represented in UTF-16 versus UTF-8 2) Every English XHTML-Print print job will be twice as big encoded with UTF-16 versus UTF-8 3) Every "Latin 1" print job will be larger approaching 2X in size. When you double the data's size, buffers have to double to be able to hold and manipulate an equivalent amount of print stream content. There are real cost and performance costs to be paid to deal with UTF-16 encoding especially when dealing with western character sets. When a device is designed to deal with the far east "characters" there are other penalties to be paid in things like the size of the font load that mitigate the UTF-16 versus UTF-8 encoding issue. ******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) ******************************************* "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM To: <don@lexmark.com> cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, <w3c-html-wg@w3.org>, <don@lexmark.com>, <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, <www-html@w3.org> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) But support for UTF 16 adds a few dozen bytes of code, and no extra memory requirements. It is simpler than UTF 8! What's the problem? 
Steven ----- Original Message ----- From: <don@lexmark.com> To: "Steven Pemberton" <Steven.Pemberton@cwi.nl> Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; <w3c-html-wg@w3.org>; <don@lexmark.com>; <voyager-issues@mn.aptest.com>; <elliott.bradshaw@zoran.com>; <www-html@w3.org> Sent: Thursday, October 16, 2003 12:20 AM Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > Steven, et al: > > The real problem is that the entire XML architecture was designed assuming > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We > have already seen push back in other standards groups that consumer > electronic devices and other smaller, lighter devices cannot afford all the > luxuries demand by an obese XML architecture. Unless the XML community > accepts subsetting, we can't expect the broadest support for XML to happen > at the low end until the price/performance ratios experience another order > or two magnitude improvement. As recently reported in several of the trade > magazines focused on IT professionals, the deployment of XML and Web > Services are have significant negative impacts on the IT infrastructure > especially in the area of bandwidth utilization. This is just another > symptom of the same problem. > > I know I will lose this argument in the W3C but the realities of the > XHTML-Print implementations will blow off UTF-16 as more fat with no > benefit and simply not support it, "interoperable" or not. > > Sorry I'm not pure but practical. 
> > ******************************************* > Don Wright don@lexmark.com > > Chair, IEEE SA Standards Board > Member, IEEE-ISTO Board of Directors > f.wright@ieee.org / f.wright@computer.org > > Director, Alliances and Standards > Lexmark International > 740 New Circle Rd C14/082-3 > Lexington, Ky 40550 > 859-825-4808 (phone) 603-963-8352 (fax) > ******************************************* > > > > > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM > > To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, > <w3c-html-wg@w3.org>, <don@lexmark.com> > cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, > <www-html@w3.org> > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > > From: don@lexmark.com [mailto:don@lexmark.com] > > > So let me understand this.... > > > > Because people have poorly designed and written XML applications running > on > > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the > > control over whether UTF-8 or UTF-16 are emitted, we are expecting to > burden > > $49 printers with code to be able to detect and interpret both. > > No Don. It is about interoperability and conforming to standards. XML > allows > documents to be encoded in either UTF8 or UTF 16: consumers must accept > both, producers may produce either. An XHTML-Print printer will be just a > consumer of an XML byte-stream at some IP address; we don't want to burden > every program in the world that can produce XML with a switch that says > "this output is going to a poor lowly XHTML Print processor that can't deal > with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy > one to implement, and can only cost a few dozen bytes at best. 
> > If we changed this, XHTML Print would have to go back to last call, and you > can bet your boots that the XML community would rise up against us, as it > has in the past, and I can tell you we don't want to go there, and we would > have a hundred people registering objections. > > Conforming to XML requirements comes with the territory of being XHTML. The > XML community will not take lightly to us messing with their standards. > > Best wishes, > > Steven Pemberton > > > > > > >
FOLLOWUP 22:
From: "Steven Pemberton" <steven.pemberton@cwi.nl> Don, I've been wondering for a long time if that was the misunderstanding, but I was assured it wasn't. UTF 16 and UTF 8 are *external* representations. The internal amount of storage needed for them is identical, and completely up to you how you store. The only extra memory needed is the couple of dozen extra bytes of code to convert UTF 16 into whatever internal representation you use. Best wishes, Steven ----- Original Message ----- From: <don@lexmark.com> To: "Steven Pemberton" <steven.pemberton@cwi.nl> Cc: <don@lexmark.com>; "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; <w3c-html-wg@w3.org>; <voyager-issues@mn.aptest.com>; <elliott.bradshaw@zoran.com>; <www-html@w3.org> Sent: Thursday, October 16, 2003 2:51 PM Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > Steven: > > I think your answer proves my point that the XML commmunity did not and > does not consider the limitations of low cost, constrained embedded > environments when developing XML. > > You make the assertion that no extra memory is required yet the reality is > quite the opposite. > > Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is > that: > > 1) Every XHTML tag will require twice as many bytes when represented in > UTF-16 versus UTF-8 > 2) Every English XHTML-Print print job will be twice as big encoded with > UTF-16 versus UTF-8 > 3) Every "Latin 1" print job will be larger approaching 2X in size. > > When you double the data's size, buffers have to double to be able to hold > and manipulate an equivalent amount of print stream content. There is real > cost and performance costs to be paid to deal with UTF-16 encoding > especially when dealing with western character sets. When a device is > designed to deal with the far east "characters" there are other penalties > to be paid in things like the size of the font load that mitigate the > UTF-16 versus UTF-8 encoding issue. 
> > ******************************************* > Don Wright don@lexmark.com > > Chair, IEEE SA Standards Board > Member, IEEE-ISTO Board of Directors > f.wright@ieee.org / f.wright@computer.org > > Director, Alliances and Standards > Lexmark International > 740 New Circle Rd C14/082-3 > Lexington, Ky 40550 > 859-825-4808 (phone) 603-963-8352 (fax) > ******************************************* > > > > > > "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM > > To: <don@lexmark.com> > cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, > <w3c-html-wg@w3.org>, <don@lexmark.com>, > <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, > <www-html@w3.org> > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > But support for UTF 16 adds a few dozen bytes of code, and no extra memory > requirements. It is simpler than UTF 8! What's the problem? > > Steven > > ----- Original Message ----- > From: <don@lexmark.com> > To: "Steven Pemberton" <Steven.Pemberton@cwi.nl> > Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; > <w3c-html-wg@w3.org>; > <don@lexmark.com>; <voyager-issues@mn.aptest.com>; > <elliott.bradshaw@zoran.com>; <www-html@w3.org> > Sent: Thursday, October 16, 2003 12:20 AM > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > > > > Steven, et al: > > > > The real problem is that the entire XML architecture was designed > assuming > > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We > > have already seen push back in other standards groups that consumer > > electronic devices and other smaller, lighter devices cannot afford all > the > > luxuries demand by an obese XML architecture. Unless the XML community > > accepts subsetting, we can't expect the broadest support for XML to > happen > > at the low end until the price/performance ratios experience another > order > > or two magnitude improvement. 
As recently reported in several of the > trade > > magazines focused on IT professionals, the deployment of XML and Web > > Services are have significant negative impacts on the IT infrastructure > > especially in the area of bandwidth utilization. This is just another > > symptom of the same problem. > > > > I know I will lose this argument in the W3C but the realities of the > > XHTML-Print implementations will blow off UTF-16 as more fat with no > > benefit and simply not support it, "interoperable" or not. > > > > Sorry I'm not pure but practical. > > > > ******************************************* > > Don Wright don@lexmark.com > > > > Chair, IEEE SA Standards Board > > Member, IEEE-ISTO Board of Directors > > f.wright@ieee.org / f.wright@computer.org > > > > Director, Alliances and Standards > > Lexmark International > > 740 New Circle Rd C14/082-3 > > Lexington, Ky 40550 > > 859-825-4808 (phone) 603-963-8352 (fax) > > ******************************************* > > > > > > > > > > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM > > > > To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, > > <w3c-html-wg@w3.org>, <don@lexmark.com> > > cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, > > <www-html@w3.org> > > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > > > > > From: don@lexmark.com [mailto:don@lexmark.com] > > > > > So let me understand this.... > > > > > > Because people have poorly designed and written XML applications > running > > on > > > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow > the > > > control over whether UTF-8 or UTF-16 are emitted, we are expecting to > > burden > > > $49 printers with code to be able to detect and interpret both. > > > > No Don. It is about interoperability and conforming to standards. XML > > allows > > documents to be encoded in either UTF8 or UTF 16: consumers must accept > > both, producers may produce either. 
An XHTML-Print printer will be just a > > consumer of an XML byte-stream at some IP address; we don't want to > burden > > every program in the world that can produce XML with a switch that says > > "this output is going to a poor lowly XHTML Print processor that can't > deal > > with UTF-16, so please produce UTF-8", especially since UTF 16 is the > easy > > one to implement, and can only cost a few dozen bytes at best. > > > > If we changed this, XHTML Print would have to go back to last call, and > you > > can bet your boots that the XML community would rise up against us, as it > > has in the past, and I can tell you we don't want to go there, and we > would > > have a hundred people registering objections. > > > > Conforming to XML requirements comes with the territory of being XHTML. > The > > XML community will not take lightly to us messing with their standards. > > > > Best wishes, > > > > Steven Pemberton > > > > > > > > > > > > > > > > > > > > >
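The distinction drawn at the top of this follow-up, between the external encoding and the internal representation, can be made concrete with a small sketch. The sample text and the helper name `code_points` are illustrative assumptions, not taken from the thread. Once either byte stream is decoded, the sequence of code points is the same; only the number of bytes on the wire differs.

```python
# Hypothetical sample job fragment; any text would do.
text = "<p>Prix: 49 \u20ac</p>"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")  # Python prepends a byte order mark here


def code_points(data: bytes, encoding: str) -> list[int]:
    """Decode an external byte stream into a list of Unicode code points."""
    return [ord(ch) for ch in data.decode(encoding)]


internal_from_utf8 = code_points(utf8_bytes, "utf-8")
internal_from_utf16 = code_points(utf16_bytes, "utf-16")

# Same internal content, different external sizes.
same_internal_form = internal_from_utf8 == internal_from_utf16
wire_sizes = (len(utf8_bytes), len(utf16_bytes))
```

The sketch says nothing about how large a receive buffer a device needs while the bytes arrive, which is the separate concern raised elsewhere in the thread.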
FOLLOWUP 23:
From: Rowland Shaw <Rowland.Shaw@crystaldecisions.com> ...and for every Asian language, each character can take up to three bytes (in UTF-8 vs. two in UTF-16) Taking a complete random Japanese character (Hiragana Letter Small A) U+3041, in UTF-8 as 0xE3 0x81 0x81 -- this assumes that you are willing to deal with characters as a MBCS, and that you aren't going to convert to UCS2 internally. English has the biggest saving by saving as UTF-8 (so let it), but for most other languages, there is no benefit or worse, a 50% growth in sizes (vs. UTF-16). If UTF-16 is disallowed, it's no longer an XML application (which may be a road to go down) by definition on the minimum bar set for XML (back in the days of 486's and 8Mb machines). Thinking about it, my printer nowadays at home has more RAM in it than my PC when XML was being created... -----Original Message----- From: don@lexmark.com [mailto:don@lexmark.com] Sent: 16 October 2003 14:00 To: Steven Pemberton Cc: don@lexmark.com; BIGELOW,JIM (HP-Boise,ex1); w3c-html-wg@w3.org; voyager-issues@mn.aptest.com; elliott.bradshaw@zoran.com; www-html@w3.org Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Steven: I think your answer proves my point that the XML commmunity did not and does not consider the limitations of low cost, constrained embedded environments when developing XML. You make the assertion that no extra memory is required yet the reality is quite the opposite. Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is that: 1) Every XHTML tag will require twice as many bytes when represented in UTF-16 versus UTF-8 2) Every English XHTML-Print print job will be twice as big encoded with UTF-16 versus UTF-8 3) Every "Latin 1" print job will be larger approaching 2X in size. When you double the data's size, buffers have to double to be able to hold and manipulate an equivalent amount of print stream content. 
There is real cost and performance costs to be paid to deal with UTF-16 encoding especially when dealing with western character sets. When a device is designed to deal with the far east "characters" there are other penalties to be paid in things like the size of the font load that mitigate the UTF-16 versus UTF-8 encoding issue. ******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) ******************************************* "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM To: <don@lexmark.com> cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, <w3c-html-wg@w3.org>, <don@lexmark.com>, <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, <www-html@w3.org> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) But support for UTF 16 adds a few dozen bytes of code, and no extra memory requirements. It is simpler than UTF 8! What's the problem? Steven ----- Original Message ----- From: <don@lexmark.com> To: "Steven Pemberton" <Steven.Pemberton@cwi.nl> Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; <w3c-html-wg@w3.org>; <don@lexmark.com>; <voyager-issues@mn.aptest.com>; <elliott.bradshaw@zoran.com>; <www-html@w3.org> Sent: Thursday, October 16, 2003 12:20 AM Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > Steven, et al: > > The real problem is that the entire XML architecture was designed assuming > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We > have already seen push back in other standards groups that consumer > electronic devices and other smaller, lighter devices cannot afford all the > luxuries demand by an obese XML architecture. 
Unless the XML community > accepts subsetting, we can't expect the broadest support for XML to happen > at the low end until the price/performance ratios experience another order > or two magnitude improvement. As recently reported in several of the trade > magazines focused on IT professionals, the deployment of XML and Web > Services are have significant negative impacts on the IT infrastructure > especially in the area of bandwidth utilization. This is just another > symptom of the same problem. > > I know I will lose this argument in the W3C but the realities of the > XHTML-Print implementations will blow off UTF-16 as more fat with no > benefit and simply not support it, "interoperable" or not. > > Sorry I'm not pure but practical. > > ******************************************* > Don Wright don@lexmark.com > > Chair, IEEE SA Standards Board > Member, IEEE-ISTO Board of Directors > f.wright@ieee.org / f.wright@computer.org > > Director, Alliances and Standards > Lexmark International > 740 New Circle Rd C14/082-3 > Lexington, Ky 40550 > 859-825-4808 (phone) 603-963-8352 (fax) > ******************************************* > > > > > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM > > To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, > <w3c-html-wg@w3.org>, <don@lexmark.com> > cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, > <www-html@w3.org> > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > > From: don@lexmark.com [mailto:don@lexmark.com] > > > So let me understand this.... > > > > Because people have poorly designed and written XML applications running > on > > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the > > control over whether UTF-8 or UTF-16 are emitted, we are expecting to > burden > > $49 printers with code to be able to detect and interpret both. > > No Don. It is about interoperability and conforming to standards. 
XML > allows > documents to be encoded in either UTF8 or UTF 16: consumers must accept > both, producers may produce either. An XHTML-Print printer will be just a > consumer of an XML byte-stream at some IP address; we don't want to burden > every program in the world that can produce XML with a switch that says > "this output is going to a poor lowly XHTML Print processor that can't deal > with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy > one to implement, and can only cost a few dozen bytes at best. > > If we changed this, XHTML Print would have to go back to last call, and you > can bet your boots that the XML community would rise up against us, as it > has in the past, and I can tell you we don't want to go there, and we would > have a hundred people registering objections. > > Conforming to XML requirements comes with the territory of being XHTML. The > XML community will not take lightly to us messing with their standards. > > Best wishes, > > Steven Pemberton > > > > > > >
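The size figures exchanged in this follow-up (markup and English roughly doubling in UTF-16, Latin-1 text approaching 2X, and U+3041 taking three bytes in UTF-8 against two in UTF-16) can be checked with a short sketch. The sample strings and the helper name `encoded_sizes` are assumptions chosen for illustration; `utf-16-be` is used so that no byte order mark is counted.

```python
# Hypothetical samples: markup, English text, Latin-1 text, one Hiragana letter.
samples = {
    "markup": '<p class="note">',
    "english": "Hello, printer",
    "latin1": "Cr\u00e8me br\u00fbl\u00e9e",
    "japanese": "\u3041",  # HIRAGANA LETTER SMALL A
}


def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count without a BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


for name, text in samples.items():
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

For pure ASCII the UTF-16 form is exactly twice as large, while for the Hiragana letter the UTF-8 form is 50% larger, so both sides of the exchange are describing real effects on different inputs.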
FOLLOWUP 24:
From: elliott.bradshaw@zoran.com Don, I agree with the argument that a front end can convert from UTF-16 to UTF-8 or whatever internal form is used, and have essentially no impact on memory needs. "A couple of dozen bytes" might be a little optimistic for this logic :^) , but it's pretty straightforward: -look at first 16 bits to detect a UTF-16 mark -for each double byte emit the UTF-8 (or other) equivalent Of course a printer could choose to store Asian data differently than Latin, and save some space compared to native UTF-8. This decision is orthogonal to the form of the input. But this logic may not be worth it and is not needed for compliance. Frugally, Elliott -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534 Rowland Shaw <Rowland.Shaw@crystaldeci To: "'don@lexmark.com'" <don@lexmark.com>, Steven sions.com> Pemberton <steven.pemberton@cwi.nl> cc: "BIGELOW,JIM (HP-Boise,ex1)" 10/16/2003 09:16 AM <jim.bigelow@hp.com>, w3c-html-wg@w3.org, voyager-issues@mn.aptest.com, elliott.bradshaw@zoran.com, www-html@w3.org Subject: RE: allow UTF-16 not just UTF-8 (PR#6774) ...and for every Asian language, each character can take up to three bytes (in UTF-8 vs. two in UTF-16) Taking a complete random Japanese character (Hiragana Letter Small A) U+3041, in UTF-8 as 0xE3 0x81 0x81 -- this assumes that you are willing to deal with characters as a MBCS, and that you aren't going to convert to UCS2 internally. English has the biggest saving by saving as UTF-8 (so let it), but for most other languages, there is no benefit or worse, a 50% growth in sizes (vs. UTF-16). If UTF-16 is disallowed, it's no longer an XML application (which may be a road to go down) by definition on the minimum bar set for XML (back in the days of 486's and 8Mb machines). 
Thinking about it, my printer nowadays at home has more RAM in it than my PC when XML was being created... -----Original Message----- From: don@lexmark.com [mailto:don@lexmark.com] Sent: 16 October 2003 14:00 To: Steven Pemberton Cc: don@lexmark.com; BIGELOW,JIM (HP-Boise,ex1); w3c-html-wg@w3.org; voyager-issues@mn.aptest.com; elliott.bradshaw@zoran.com; www-html@w3.org Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Steven: I think your answer proves my point that the XML commmunity did not and does not consider the limitations of low cost, constrained embedded environments when developing XML. You make the assertion that no extra memory is required yet the reality is quite the opposite. Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is that: 1) Every XHTML tag will require twice as many bytes when represented in UTF-16 versus UTF-8 2) Every English XHTML-Print print job will be twice as big encoded with UTF-16 versus UTF-8 3) Every "Latin 1" print job will be larger approaching 2X in size. When you double the data's size, buffers have to double to be able to hold and manipulate an equivalent amount of print stream content. There is real cost and performance costs to be paid to deal with UTF-16 encoding especially when dealing with western character sets. When a device is designed to deal with the far east "characters" there are other penalties to be paid in things like the size of the font load that mitigate the UTF-16 versus UTF-8 encoding issue. 
******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) ******************************************* "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM To: <don@lexmark.com> cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, <w3c-html-wg@w3.org>, <don@lexmark.com>, <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, <www-html@w3.org> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) But support for UTF 16 adds a few dozen bytes of code, and no extra memory requirements. It is simpler than UTF 8! What's the problem? Steven ----- Original Message ----- From: <don@lexmark.com> To: "Steven Pemberton" <Steven.Pemberton@cwi.nl> Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; <w3c-html-wg@w3.org>; <don@lexmark.com>; <voyager-issues@mn.aptest.com>; <elliott.bradshaw@zoran.com>; <www-html@w3.org> Sent: Thursday, October 16, 2003 12:20 AM Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > Steven, et al: > > The real problem is that the entire XML architecture was designed assuming > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We > have already seen push back in other standards groups that consumer > electronic devices and other smaller, lighter devices cannot afford all the > luxuries demand by an obese XML architecture. Unless the XML community > accepts subsetting, we can't expect the broadest support for XML to happen > at the low end until the price/performance ratios experience another order > or two magnitude improvement. 
As recently reported in several of the trade > magazines focused on IT professionals, the deployment of XML and Web > Services are have significant negative impacts on the IT infrastructure > especially in the area of bandwidth utilization. This is just another > symptom of the same problem. > > I know I will lose this argument in the W3C but the realities of the > XHTML-Print implementations will blow off UTF-16 as more fat with no > benefit and simply not support it, "interoperable" or not. > > Sorry I'm not pure but practical. > > ******************************************* > Don Wright don@lexmark.com > > Chair, IEEE SA Standards Board > Member, IEEE-ISTO Board of Directors > f.wright@ieee.org / f.wright@computer.org > > Director, Alliances and Standards > Lexmark International > 740 New Circle Rd C14/082-3 > Lexington, Ky 40550 > 859-825-4808 (phone) 603-963-8352 (fax) > ******************************************* > > > > > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM > > To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, > <w3c-html-wg@w3.org>, <don@lexmark.com> > cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, > <www-html@w3.org> > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > > From: don@lexmark.com [mailto:don@lexmark.com] > > > So let me understand this.... > > > > Because people have poorly designed and written XML applications running > on > > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the > > control over whether UTF-8 or UTF-16 are emitted, we are expecting to > burden > > $49 printers with code to be able to detect and interpret both. > > No Don. It is about interoperability and conforming to standards. XML > allows > documents to be encoded in either UTF8 or UTF 16: consumers must accept > both, producers may produce either. 
An XHTML-Print printer will be just a > consumer of an XML byte-stream at some IP address; we don't want to burden > every program in the world that can produce XML with a switch that says > "this output is going to a poor lowly XHTML Print processor that can't deal > with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy > one to implement, and can only cost a few dozen bytes at best. > > If we changed this, XHTML Print would have to go back to last call, and you > can bet your boots that the XML community would rise up against us, as it > has in the past, and I can tell you we don't want to go there, and we would > have a hundred people registering objections. > > Conforming to XML requirements comes with the territory of being XHTML. The > XML community will not take lightly to us messing with their standards. > > Best wishes, > > Steven Pemberton > > > > > > >
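The two-step front end outlined at the start of this follow-up (look at the first 16 bits for a UTF-16 mark, then emit the UTF-8 equivalent of each double byte) can be sketched as below. This is a minimal illustration, not code from the thread. The function name `utf16_to_utf8` is hypothetical, the big-endian fallback when no mark is present is an assumption of this sketch, and the surrogate-pair handling is an addition that the message does not mention. The UTF-8 emission itself is delegated to Python's built-in encoder to keep the sketch short.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode a UTF-16 byte stream to UTF-8, one 16-bit unit at a time."""
    if len(data) % 2:
        raise ValueError("UTF-16 input must have an even number of bytes")

    # Step 1: look at the first 16 bits to detect a byte order mark.
    if data[:2] == b"\xfe\xff":
        byteorder, start = "big", 2
    elif data[:2] == b"\xff\xfe":
        byteorder, start = "little", 2
    else:
        byteorder, start = "big", 0  # assumption: no mark means big-endian

    out = bytearray()
    pending_high = None  # high surrogate waiting for its low half

    # Step 2: for each double byte, emit the UTF-8 equivalent.
    for i in range(start, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], byteorder)
        if pending_high is not None:
            if not 0xDC00 <= unit <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            code_point = (0x10000
                          + ((pending_high - 0xD800) << 10)
                          + (unit - 0xDC00))
            pending_high = None
        elif 0xD800 <= unit <= 0xDBFF:
            pending_high = unit
            continue
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            code_point = unit
        out += chr(code_point).encode("utf-8")

    if pending_high is not None:
        raise ValueError("truncated surrogate pair")
    return bytes(out)
```

The only state carried between units is the byte order and at most one pending surrogate, so the conversion can run on a stream without buffering the whole job.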
FOLLOWUP 25:
From: don@lexmark.com Steven: Of course I knew this was just the external representation. I'm trying to reduce conversions and reduce the sizes of buffers, etc. necessary to do this work. I have no doubt it can be done, I'm just trying to do things with smaller less powerful processors and with less available memory than what programmers normally expect to be available in today's environment. ******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) ******************************************* "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/16/2003 09:10:59 AM To: <don@lexmark.com> cc: <don@lexmark.com>, "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, <w3c-html-wg@w3.org>, <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, <www-html@w3.org> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) Don, I've been wondering for a long time if that was the misunderstanding, but I was assured it wasn't. UTF 16 and UTF 8 are *external* representations. The internal amount of storage needed for them is identical, and completely up to you how you store. The only extra memory needed is the couple of dozen extra bytes of code to convert UTF 16 into whatever internal representation you use. 
Best wishes, Steven ----- Original Message ----- From: <don@lexmark.com> To: "Steven Pemberton" <steven.pemberton@cwi.nl> Cc: <don@lexmark.com>; "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; <w3c-html-wg@w3.org>; <voyager-issues@mn.aptest.com>; <elliott.bradshaw@zoran.com>; <www-html@w3.org> Sent: Thursday, October 16, 2003 2:51 PM Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > Steven: > > I think your answer proves my point that the XML commmunity did not and > does not consider the limitations of low cost, constrained embedded > environments when developing XML. > > You make the assertion that no extra memory is required yet the reality is > quite the opposite. > > Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is > that: > > 1) Every XHTML tag will require twice as many bytes when represented in > UTF-16 versus UTF-8 > 2) Every English XHTML-Print print job will be twice as big encoded with > UTF-16 versus UTF-8 > 3) Every "Latin 1" print job will be larger approaching 2X in size. > > When you double the data's size, buffers have to double to be able to hold > and manipulate an equivalent amount of print stream content. There is real > cost and performance costs to be paid to deal with UTF-16 encoding > especially when dealing with western character sets. When a device is > designed to deal with the far east "characters" there are other penalties > to be paid in things like the size of the font load that mitigate the > UTF-16 versus UTF-8 encoding issue. 
> > ******************************************* > Don Wright don@lexmark.com > > Chair, IEEE SA Standards Board > Member, IEEE-ISTO Board of Directors > f.wright@ieee.org / f.wright@computer.org > > Director, Alliances and Standards > Lexmark International > 740 New Circle Rd C14/082-3 > Lexington, Ky 40550 > 859-825-4808 (phone) 603-963-8352 (fax) > ******************************************* > > > > > > "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM > > To: <don@lexmark.com> > cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, > <w3c-html-wg@w3.org>, <don@lexmark.com>, > <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, > <www-html@w3.org> > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > But support for UTF 16 adds a few dozen bytes of code, and no extra memory > requirements. It is simpler than UTF 8! What's the problem? > > Steven > > ----- Original Message ----- > From: <don@lexmark.com> > To: "Steven Pemberton" <Steven.Pemberton@cwi.nl> > Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>; > <w3c-html-wg@w3.org>; > <don@lexmark.com>; <voyager-issues@mn.aptest.com>; > <elliott.bradshaw@zoran.com>; <www-html@w3.org> > Sent: Thursday, October 16, 2003 12:20 AM > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > > > > Steven, et al: > > > > The real problem is that the entire XML architecture was designed > assuming > > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We > > have already seen push back in other standards groups that consumer > > electronic devices and other smaller, lighter devices cannot afford all > the > > luxuries demand by an obese XML architecture. Unless the XML community > > accepts subsetting, we can't expect the broadest support for XML to > happen > > at the low end until the price/performance ratios experience another > order > > or two magnitude improvement. 
As recently reported in several of the > trade > > magazines focused on IT professionals, the deployment of XML and Web > > Services are have significant negative impacts on the IT infrastructure > > especially in the area of bandwidth utilization. This is just another > > symptom of the same problem. > > > > I know I will lose this argument in the W3C but the realities of the > > XHTML-Print implementations will blow off UTF-16 as more fat with no > > benefit and simply not support it, "interoperable" or not. > > > > Sorry I'm not pure but practical. > > > > ******************************************* > > Don Wright don@lexmark.com > > > > Chair, IEEE SA Standards Board > > Member, IEEE-ISTO Board of Directors > > f.wright@ieee.org / f.wright@computer.org > > > > Director, Alliances and Standards > > Lexmark International > > 740 New Circle Rd C14/082-3 > > Lexington, Ky 40550 > > 859-825-4808 (phone) 603-963-8352 (fax) > > ******************************************* > > > > > > > > > > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM > > > > To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>, > > <w3c-html-wg@w3.org>, <don@lexmark.com> > > cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>, > > <www-html@w3.org> > > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774) > > > > > > > From: don@lexmark.com [mailto:don@lexmark.com] > > > > > So let me understand this.... > > > > > > Because people have poorly designed and written XML applications > running > > on > > > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow > the > > > control over whether UTF-8 or UTF-16 are emitted, we are expecting to > > burden > > > $49 printers with code to be able to detect and interpret both. > > > > No Don. It is about interoperability and conforming to standards. XML > > allows > > documents to be encoded in either UTF8 or UTF 16: consumers must accept > > both, producers may produce either. 
An XHTML-Print printer will be just a > > consumer of an XML byte-stream at some IP address; we don't want to > burden > > every program in the world that can produce XML with a switch that says > > "this output is going to a poor lowly XHTML Print processor that can't > deal > > with UTF-16, so please produce UTF-8", especially since UTF 16 is the > easy > > one to implement, and can only cost a few dozen bytes at best. > > > > If we changed this, XHTML Print would have to go back to last call, and > you > > can bet your boots that the XML community would rise up against us, as it > > has in the past, and I can tell you we don't want to go there, and we > would > > have a hundred people registering objections. > > > > Conforming to XML requirements comes with the territory of being XHTML. > The > > XML community will not take lightly to us messing with their standards. > > > > Best wishes, > > > > Steven Pemberton > > > > > > > > > > > > > > > > > > > > >
FOLLOWUP 26:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>

Don and Steven,

I want to expand on what you have said:

Don wrote:
> > 1) Every XHTML tag will require twice as many bytes when
> > represented in UTF-16 versus UTF-8
> > 2) Every English XHTML-Print print job will be twice as
> > big encoded with UTF-16 versus UTF-8
> > 3) Every "Latin 1" print job will be larger approaching
> > 2X in size.
> >
> > When you double the data's size, buffers have to double to
> > be able to hold and manipulate an equivalent amount of print
> > stream content.

This statement is only true for some print streams. See the discussion below in "The problem space".

Steven wrote:
> UTF 16 and UTF 8 are *external* representations. The internal
> amount of storage needed for them is identical, and
> completely up to you how you store.

If a printer uses 16 bits internally to represent a character, then there shouldn't be a difference in buffering requirements between utf-8 and utf-16 encoded files (see below for a more complete discussion). However, if a printer uses 8 bits per character, then it has restricted itself to only handle a subset of possible documents, those with ASCII characters. This is a product-specific decision akin to that of whether to make a device print in color or black & white or support landscape as well as portrait printing. Therefore, I suggest that the spec say that a printer should support utf-16, just as it now says it should support CSS, landscape printing, and color -- within the limits of the device. If a user buys a low-cost device that can only print ASCII characters in portrait orientation, without color, style sheets, or images, hopefully the price was in line with the printer's abilities and other, more expensive, more capable devices are available as needed.

Jim

The problem space
----------------------
There is a document composition continuum from documents with only text, through mixed text and images, to documents that contain only images. At the text-only end of the continuum, the effect on the document size of UTF-16 vs. UTF-8 is a doubling of document size. At the image-only end of the continuum, the effects on the document size of encoding in UTF-16 versus UTF-8 are overshadowed by the image data.

The table below illustrates three points on the document composition continuum:

1. Text-only: a document that prints as one page of ASCII text (times, 10pt, 8in by 11in paper) [1]. Size, in bytes, is 6,282.

2. Text & Image: a one page document with one 3in x 5in image (166.7K bytes) and the remainder text [2]. Size, in bytes, of document and image is 171,531.

3. Image-only: a one page document with eight 2in x 3.25in images (703.2K bytes) and no text [3]. Size, in bytes, of document and eight images is 705,108.

Size (bytes)   utf-8   %doc   utf-16   %doc
Text-only      6,282   100    12,566   100
Text+Image     4,776   3.2    9,554    5.4   (9,554 /(9,954+166,675)* 100)
Image-only     1,916   .27    3,834    .54

There is another point of variability: the characters in the text portions of the document. This is another continuum from ASCII only at one end to Japanese, Chinese, Korean, and Hindi at the other.

"Table 1: UTF types" of [4] gives the following average bytes per code point:

                                   utf-8   utf-16
English                            1       2
Latin-1                            1.1     2
Greek, Russian, Arabic, Hebrew     1.7     2
Japanese, Chinese, Korean, Hindi   3       2

As the language/script of the text portion of the document changes from English-only toward other scripts and languages, the size difference between utf-8 and utf-16 decreases.

End-to-end solution
-------------------
If you look at the end-to-end solution, from the sending application to the printer, the stages can be thought of as:

1. Sending Device: the data as represented in the sending device (a cell phone for example)
2. Transmission: the data combined with markup and style information as an XHTML-Print data stream and then encoded in either UTF-8 or UTF-16
3. Receiving Device: the printer -- breaking this into two parts gives:
   3.a The XHTML-Print data stream as received
   3.b The data without markup and style information and before printing. How the data is stored is implementation dependent and how much memory is used depends on how a character is represented -- 8 or 16 bits, and how much of the document is buffered. Each printer makes these choices; 8 bits/char restricts the documents processed to Latin1 characters.

Stage      Size   utf-8    utf-16
1. app     n      -        -
2. xmit    n      n-3n*    2n
3a. Pr     n      n-3n     2n
3b. Pr**   n      n-2n     n-2n

* n-3n shows the variable sizing depending on characters being encoded: English only (n), CJK (3n)
** at Stage 3b, representing a character with 8 bits restricts the characters that can be represented to ASCII or Latin 1; 16 bits can represent all characters.

Internal representation

If a printer uses 16 bits internally to represent a character, then there shouldn't be a difference in buffering requirements between utf-8 and utf-16 encoded files. However, if a printer uses 8 bits, then it has restricted itself to only handle a subset of documents. This is a product-specific decision akin to that of supporting color or not. Therefore, I suggest that the spec say that a printer should support utf-16 just as it now says it should support CSS, landscape printing, and color -- within the limits of the device. If a user buys a low-cost device that can only print ASCII characters in portrait orientation, without color, images or style, hopefully the price is in line with the printer's abilities and other, more expensive, more capable devices are available as needed.

[1] http://www.pwg.org/xhtml-print/W3C-Version/georgeb.html
[2] http://www.pwg.org/xhtml-print/W3C-Version/text+image.html
[3] http://www.pwg.org/xhtml-print/W3C-Version/image-only.html
[4] http://www-106.ibm.com/developerworks/library/utfencodingforms/
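The bytes-per-code-point figures quoted above can be checked with a short script. This is only an illustrative sketch: the sample strings and the helper name encoded_sizes are assumptions made for the example, not part of the original message, and short samples will not reproduce the averages reported in [4] exactly.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, not counting a byte-order mark."""
    # "utf-16-be" is used so that no BOM is prepended to the output.
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# Hypothetical sample strings, chosen only to illustrate the trend.
samples = {
    "English": "Print this page",
    "Greek": "Εκτύπωση",
    "Japanese": "印刷する",
}

for name, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    count = len(text)  # number of code points (all samples are in the BMP)
    print(f"{name}: utf-8 {utf8_bytes / count:.1f}, "
          f"utf-16 {utf16_bytes / count:.1f} bytes per code point")
```

For ASCII-only text the UTF-16 form is twice the size of the UTF-8 form, while for CJK text the UTF-8 form is the larger one, which is the trend the table describes.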
FOLLOWUP 27:
From: Michael Sweet <mike@easysw.com> BIGELOW,JIM (HP-Boise,ex1) wrote: > ... > If a printer uses 16 bits internally to represent a character, then there > shouldn't be a difference in buffering requirements between utf-8 and utf-16 > encoded files (see below for a more complete discussion). However, if a > printer uses 8 bits per character, then it has restricted itself to only > handle a subset of possible documents, those with ASCII characters. This is > ... I suggest there is another alternative - the implementation can simply convert UTF-16 to UTF-8 as the document is being read, so contrary to the previous comments there is no additional buffer memory overhead, merely a small amount of code to convert from UTF-16 to UTF-8. Whether the implementation chooses to limit support to "latin" text or not is another issue, but either way the *internal* representation can be controlled by the vendor separate from the external UTF-8/UTF-16/whatever representation. -- ______________________________________________________________________ Michael Sweet, Easy Software Products mike at easysw dot com Printing Software for UNIX http://www.easysw.com
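The conversion-on-read approach suggested above might look roughly like the following sketch. It assumes the input arrives as arbitrary-sized byte chunks and begins with a byte-order mark; the function name transcode_utf16_to_utf8 is hypothetical, and a real embedded implementation would more likely be written in C against the device's own I/O layer.

```python
import codecs
from typing import Iterable, Iterator


def transcode_utf16_to_utf8(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Convert a UTF-16 byte stream to UTF-8 one chunk at a time."""
    # The incremental decoder keeps at most a few pending bytes between calls,
    # so chunk boundaries may fall in the middle of a code unit or surrogate pair.
    decoder = codecs.getincrementaldecoder("utf-16")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-8")
    # Flush; this raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-8")
```

Only the current chunk and a few bytes of decoder state are held at any time, so the working memory does not depend on the size of the document.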
FOLLOWUP 28:
From: "Steven Pemberton" <steven.pemberton@cwi.nl> UTF 8 and UTF 16 are just definitions of how you send a Unicode character stream in an interoperable way over the wire. The character set is the same, the characters are the same, it is just the encoding that is different. It is orthogonal to questions of how characters are stored internally. You can do what you like internally, it is completely up to you. It has no effect on the memory requirements of the receiving device, because you have to convert to your internal form anyway. Steven
FOLLOWUP 29:
From: don@lexmark.com

Steven:

Your perception of how this works in an embedded device, especially in a printer that will use this in Bluetooth, UPNP and other environments, is clearly tainted by your experience of this with the Web and PCs.

0) Of course UTF-8 versus UTF-16 is orthogonal to the internal representation of the "printer", but not until it is in the "printer" and off the "network".

1) As defined to be used by Bluetooth and in other environments, the data is PUSHed to the device rather than being pulled. You have less control over the amount of data being sent.

2) The network buffers are in the same constrained memory space as the processor for XHTML-Print. Chunks from the network have to be buffered by the network process until they can be dealt with by the TCP process, which buffers them until they can be dealt with by the XHTML-Print process. All this is done in that same limited, constrained memory space. If I'm going to maintain the performance levels customers expect, I need to be able to buffer, in multiple buffers, equivalent amounts of CONTENT, which in English encoded as UTF-16 is TWICE as many bytes as in UTF-8. It is unreasonable to expect the network or TCP process within the device to convert UTF-16 to the internal format; that happens when it actually hits the "printer." So while it might not take any more memory in the "printer" because the content is converted to an internal format, before it reaches the "printer" but while it is in the embedded physical device called a printer, it does.

Do you get it yet?

In the PC world, the user agent doesn't have to worry about all the underlying details necessary to have the content delivered from the network. We don't have that luxury in the embedded space. All that work is done by the same processor and with the same limited memory. How else do you think we can sell printers for $29??
******************************************* Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances and Standards Lexmark International 740 New Circle Rd C14/082-3 Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) *******************************************
FOLLOWUP 30:
From: Michael Sweet <mike@easysw.com> don@lexmark.com wrote: > ... > 1) As defined to be used by Bluetooth and in other environments, the > data is PUSHed to the device rather than being pulled. You have less > control over the amount of data being sent. > ... The "push" model is also used for USB, parallel, and serial printing, and the current print devices seem to have no problem with flow control over these or network interfaces. It might mean that customers will see slower printing with UTF-16 data, but between the spec and any documentation you provide to developers and customers, it shouldn't surprise anyone... -- ______________________________________________________________________ Michael Sweet, Easy Software Products mike@easysw.com Printing Software for UNIX http://www.easysw.com
FOLLOWUP 31:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> Message from Don Wright of Lexmark: Jim: I noticed after my last message: http://lists.w3.org/Archives/Member/w3c-html-wg/2003OctDec/0086.html Pemberton and others in the group ceased the e-mail thread. Did I convince them or have they given up on me? ********************************************** Don Wright don@lexmark.com Chair, IEEE SA Standards Board Member, IEEE-ISTO Board of Directors f.wright@ieee.org / f.wright@computer.org Director, Alliances & Standards Lexmark International 740 New Circle Rd Lexington, Ky 40550 859-825-4808 (phone) 603-963-8352 (fax) **********************************************
FOLLOWUP 32:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> My reply to Don's emailed question: >Pemberton and others in the group ceased the e-mail thread. Did >I convince them or have they given up on me? Don, I think that the case for and against UTF-16 support in XHTML-Print has been made. We discussed UTF-8/UTF-16 and the XHTML-Print spec in the 10/22/03 HTML WG phone conference. The group has officially voted to ask the Director to make XHTML-Print W3C Working Draft 20 October 2003 [1] a Candidate Recommendation, noting your dissenting opinion on required UTF-16 support. Steven Pemberton feels that the Director will agree to make the specification a Candidate Recommendation. You may register a formal objection [2] concerning UTF-16 support in XHTML-Print, if you feel that your comments on this issue haven't sufficiently represented your position. Please continue to CC: voyager-issues@mn.aptest.com on any further discussions, since this provides an archive. The Disposition of Comments for XHTML-Print is at [3]. Jim [1] http://www.w3.org/MarkUp/Group/2003/WD-xhtml-print-20031020/ [2] http://www.w3.org/2003/06/Process-20030618/policies.html#WGArchiveMinorityViews [3] http://www.w3.org/MarkUp/Group/2003/xhtml-print-cr-doc-20031017.html
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6774 [1] in the HTML Working Group's issue tracking system. The working group agrees that, since XHTML-Print is a member of the family of XHTML 1.0 languages, document encodings cannot be restricted to UTF-8 but must also include UTF-16. The specification will be modified to remove the sentence, 'The only valid value for the "charset" parameter is "utf-8".' If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Don, What do you think of the following compromise? 1. say nothing about whether a printer supports UTF-8 or UTF-16 2. require that conforming XHTML-Print documents be encoded in UTF-8 by requiring that conforming clients (Section 2.2) create documents that are encoded in UTF-8. This means adding the following to item 1 of Section 2.2: 1. Clients SHALL produce a well-formed XHTML-Print document as defined in XHTML 1.0 [XHTML1] and in Document Conformance. The document SHALL be encoded using UTF-8 [RFC2279]. Jim Bigelow
REPLY 3:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Don and Elliott, The HTML working group discussed my question of why an XHTML-Print processor must be a conforming XML processor (in particular, why it must support both UTF-8 and UTF-16 encodings) on October 1, 2003. The answer is that an XHTML-Print processor must be a conforming XML processor and support both UTF-8 and UTF-16 encodings to preserve compatibility between XML-based applications. If XHTML-Print processors only supported UTF-8, then an XML-based application could not be reliably depended upon to emit an XHTML-Print document that the XHTML-Print application could process. For example, an XML-based XForms application's output of an XHTML-Print document cannot be restricted by the XHTML-Print specification to UTF-8, since the application may not be able to control the encoding. Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give heuristics for determining a document's encoding when the charset parameter of the MIME type [4] is absent. An example UTF-16 decoder is available at [5]; decoders for other encodings are at [6]. Jim Bigelow [1] http://www.w3.org/TR/REC-xml#charencoding [2] http://www.w3.org/TR/REC-xml#sec-guessing [3] http://www.w3.org/TR/REC-xml [4] http://www.ietf.org/rfc/rfc3023.txt [5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html [6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html
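As a rough illustration of the Appendix F heuristics mentioned in the reply above, the following sketch guesses between UTF-8 and UTF-16 from the first bytes of a document. It is a simplification under stated assumptions: it ignores the UCS-4 and EBCDIC cases that Appendix F also covers, it does not read the encoding declaration, and the function name sniff_xml_encoding is hypothetical.

```python
def sniff_xml_encoding(prefix: bytes) -> str:
    """Guess the encoding of an XML document from its first few bytes."""
    # Byte-order marks come first.
    if prefix.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if prefix.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if prefix.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # No BOM: look for the characters "<?" encoded as 16-bit code units.
    if prefix.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if prefix.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Anything else is treated as UTF-8, the XML default when no other
    # encoding information is available.
    return "utf-8"
```

A processor would typically run a check like this on the first four bytes only when the transport supplies no charset parameter, then hand the rest of the stream to the matching decoder.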
PROBLEM ID: 6775
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Agreed changed wording to say resources
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi> To: www-html-editor@w3.org Subject: why does object type override content type/HTTP level? Date: Sun, 3 Aug 2003 22:01:47 +0300 Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi> X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi 3.10 Object Module "A printer MUST treat the object as a jpeg image when the value of the object element's type attribute is 'text/jpeg'." Why is the type attribute allowed to override the content type information delivered on the Application/Vnd.pwg-multiplexed or HTTP level? Previously the type attribute has been considered advisory so that user agents may omit requesting object they know they can't handle. (I assume "text/jpeg" is a mistake and means "image/jpeg"). [extracted from issue 6548] -- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6775 [1] in the HTML Working Group's issue tracking system. The working group agrees with your comments by modifying the text of section 3.10 to read, "A printer must support resources of type 'image/jpeg'." If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6775;user=guest
PROBLEM ID: 6870
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
changed to "all"
ORIGINAL MESSAGE:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> To: w3c-html-editor@w3.org Cc: xp@pwg.org Subject: XHTML-Print: treating a missing media attribute as media="screen" when printing not user's intent Date: Thu, 4 Sep 2003 14:10:55 -0400 Message-ID: <020A3CF87FB5AC47AA67966B33845755050DB594@xboi22.boise.itc.hp.com> Sections 3.13 and 3.15 of the W3C Last Call Working Draft of XHTML-Print [1] state, "The absence of the media attribute MUST be treat[ed] as if the media attribute had the value 'screen.'" At the risk of being accused of mind reading, I think that most document authors do not write style sheets for printing but would like the styles to be applied when printing as well as browsing. Therefore changing the value "screen" in the statement shown above to the value "all" would give more consistent results when browsing and printing. [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/ Jim Bigelow Hewlett-Packard Co.
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Jonny wrote: I am starting to believe that this error isn't a bug (yes, the default value *is* "all"), but a virus the way it keeps replicating. Anyone willing to guess which spec it will infect next? -- Jonny Axelsson, Web Standards, Opera Software
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Don Wright wrote: Jim: "All" is consistent with XHTML2. See http://www.w3.org/MarkUp/Group/2003/WD-xhtml2-20030810/abstraction.html#dt_MediaDesc ********************************************** Don Wright don@lexmark.com
REPLY 3:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6870 [1] in the HTML Working Group's issue tracking system. The working group has elected to implement your suggestions. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6870;user=guest
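Editorial illustration (not part of the original thread): the effect of the accepted change can be sketched as below. This is a minimal sketch assuming a comma-separated media list; the function name applies_when_printing is hypothetical, and only the "absent means all" rule comes from the issue itself.

```python
def applies_when_printing(media_attr):
    """Return True if a style sheet with this media attribute applies to print.

    media_attr is the attribute value, or None when the attribute is absent.
    """
    # Per the resolution of this issue, an absent attribute behaves like "all".
    media = "all" if media_attr is None else media_attr
    # Assumption: the value is a comma-separated list of media types.
    types = {item.strip().lower() for item in media.split(",")}
    return "all" in types or "print" in types
```

Under the earlier draft wording (absent means "screen"), the first case would have returned False, which is the behavior the comment objected to.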
PROBLEM ID: 6776
STATE: Closed
RESOLUTION: Reject
USER POSITION: Agree
NOTES:
No response to reply 2; assuming agreement.
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi> To: www-html-editor@w3.org Subject: support for character entities too expensive for low-cost printers Date: Sun, 3 Aug 2003 22:01:47 +0300 Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi> X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi 3.17 Character Entities The specification mentions that character entities are defined but doesn't say whether printers should support them. I think requiring XHTML-Print implementations to support character entities would be a very bad idea. Support for character entities is the only feature of XHTML-Print that requires the printer to process external entities. The burden of implementing a DTD catalog and parsing the huge (relative to the size of the usual XHTML documents) DTD files is significant compared to using a non-validating XML processor and not processing external entities at all. Since XHTML-Print is intended to be used with low-cost printers and the overwhelmingly most likely use case is that the documents are generated by software as opposed to being written by hand by humans, I suggest explicitly stating that printers should not be expected to support character entities (or any other features of XML that depend on the external entities to be processed, such as attribute defaulting). [extracted from issue 6548] -- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
FOLLOWUP 1:
From: Henri Sivonen <hsivonen@iki.fi> On Saturday, Sep 27, 2003, at 00:26 Europe/Helsinki, Jim Bigelow wrote: > The working group does not agree with you concerning > requiring support a set of predefined character entities. > The group feels that the set of required character > entities has a small memory foot print when implemented as > a data set. Furthermore, such a data set does not require > that a printer read the DTD. Therefore, no change to the > specification is planned in this regard. The problem is that implementing such data set without reading the DTD would mean that the parser would not be a XML processor as defined in the XML spec. Using a modified parser would break one of XML's benefits: the ability to use a ready-made off-the-shelf parser whose functionality is well defined. Also, having such almost-XML processors around could cause interoperability problems, since different parsers would have different idea of what the pre-defined entities were and, therefore, what entity references rendered a document not well-formed. -- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6776 [1] in the HTML Working Group's issue tracking system. The working group does not agree with you concerning requiring support for a set of predefined character entities. The group feels that the set of required character entities has a small memory footprint when implemented as a data set. Furthermore, such a data set does not require that a printer read the DTD. Therefore, no change to the specification is planned in this regard. If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6776;user=guest
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Henri Sivonen wrote: > The problem is that implementing such data set without reading the DTD > would mean that the parser would not be a XML processor as defined in > the XML spec. Using a modified parser would break one of XML's > benefits: the ability to use a ready-made off-the-shelf parser whose > functionality is well defined. An XHTML-Print processor is only required to deal with XHTML-Print documents. > Also, having such almost-XML processors > around could cause interoperability problems, since different parsers > would have different idea of what the pre-defined entities were and, > therefore, what entity references rendered a document not well-formed. > The set of pre-defined entities that an XHTML-Print processor must support is well defined. These entities are specified in the XHTML-Print specification in [1]. No other entities are part of XHTML-Print and users do not have a means to create new entities. Therefore, a conforming printer need only implement a means to recognize the set of pre-defined entities and replace them with the required Unicode code points. It is then up to the implementation of a conforming printer how best to process the pre-defined set of entities. Some implementations have done this via a data table that is compiled into the code, thereby relieving the printer of the need to redundantly access the same information from the DTD for each XHTML-Print document. However, the specification does not constrain how a conforming printer should provide support for the set of pre-defined entities. Jim Bigelow Editor [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/#s_charentities
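Editorial illustration (not part of the original thread): the "data table compiled into the code" approach described in the reply above can be sketched as follows. This is a minimal sketch, not a conforming XML processor; it assumes Python's built-in HTML 4 entity table (html.entities.name2codepoint) as a stand-in for the entity set listed in the XHTML-Print specification, and the function name expand_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

# Matches a named entity reference such as &eacute; (numeric references
# like &#233; are deliberately left alone in this sketch).
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text):
    """Replace known named entity references with their Unicode characters.

    The lookup table is fixed at build time, so no DTD is read.
    Unknown names are reported as errors rather than passed through.
    """
    def replace(match):
        name = match.group(1)
        if name not in name2codepoint:
            raise ValueError("undefined entity: &%s;" % name)
        return chr(name2codepoint[name])

    return _ENTITY_REF.sub(replace, text)
```

The interoperability concern raised in the follow-up is visible here too: two implementations agree on well-formedness only if their tables contain exactly the same names.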
PROBLEM ID: 6777
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
correct spec as indicated in issue
ORIGINAL MESSAGE:
From: Henri Sivonen <hsivonen@iki.fi> To: www-html-editor@w3.org Subject: MIME type Application/Multiplexed not correct Date: Sun, 3 Aug 2003 22:01:47 +0300 Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi> X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi B.2 MIME type Application/Multiplexed The heading and the following reference to RFC3391 should say Application/Vnd.pwg-multiplexed instead of Application/Multiplexed. [extracted from issue 6548] -- Henri Sivonen hsivonen@iki.fi http://www.iki.fi/hsivonen/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6777 [1] in the HTML Working Group's issue tracking system. The working group made the change you suggested. Jim Bigelow Editor [1]http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6777;user=guest
PROBLEM ID: 6871
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
defined image header
ORIGINAL MESSAGE:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> To: www-html-editor@w3.org Cc: xp@pwg.org Subject: XHTML-Print: Appendix B.2.1 uses "image header" without defining it. Date: Thu, 4 Sep 2003 14:20:46 -0400 Message-ID: <020A3CF87FB5AC47AA67966B33845755050DB5AA@xboi22.boise.itc.hp.com> X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B33845755050DB5AA@xboi22.boise.itc.hp.com Appendix B.2.1 of the W3C Last Call Working Draft of XHTML-Print [1] uses the term "image's header" without defining it. We at Hewlett-Packard suggest that the term be defined as everything from the beginning of the image up to and including the "start of scan marker." Jim Bigelow Hewlett-Packard Co. [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Don Wright wrote: Makes sense to me. ********************************************** Don Wright don@lexmark.com
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6871 [1] in the HTML Working Group's issue tracking system. The working group has elected to implement your suggestions. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6871;user=guest
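Editorial illustration (not part of the original thread): the proposed definition of "image header" can be made concrete for a JPEG stream with a sketch like the one below. It assumes the standard JPEG marker layout (SOI is FF D8, SOS is FF DA, and every segment before SOS carries a two-byte big-endian length), and it reads "up to and including the start of scan marker" as ending right after the two marker bytes; that reading and the function name jpeg_header are assumptions.

```python
import struct

SOI = b"\xff\xd8"   # start of image marker
SOS = 0xDA          # second byte of the start of scan marker (FF DA)


def jpeg_header(data: bytes) -> bytes:
    """Return the bytes from the start of the image through the SOS marker."""
    if not data.startswith(SOI):
        raise ValueError("not a JPEG stream: missing SOI marker")
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        if data[pos + 1] == SOS:
            # Include the two marker bytes themselves in the header.
            return data[:pos + 2]
        if pos + 4 > len(data):
            break
        # The segment length counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("invalid segment length at offset %d" % pos)
        pos += 2 + length
    raise ValueError("no start of scan marker found")
```

Fill bytes and markers without a length field are not handled; a production parser would need to cover them.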
PROBLEM ID: 6778
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
same issue as 6772
ORIGINAL MESSAGE:
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Thursday, July 31, 2003 1:29 PM To: BIGELOW,JIM (HP-Boise,ex1) Cc: xp@pwg.org Subject: Required support for script, noscript, and hidden 2. Required support for script, noscript, and hidden. I don't mind this change, exactly. But (at the risk of re-opening a long debate) if the assumption is that an XHTML-Print client is generating data specifically in this language, then it should never generate these cases. So mandating support seems redundant. On the other hand, if the intent is to gracefully degrade when receiving data from other sources, then there are other issues (e.g. frames) that also come up. [extracted from issue 6536] ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> > From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] > Sent: Thursday, July 31, 2003 1:29 PM > To: BIGELOW,JIM (HP-Boise,ex1) > Cc: xp@pwg.org > Subject: Required support for script, noscript, and hidden > > 2. Required support for script, noscript, and hidden. I don't mind this > change, exactly. But (at the risk of re-opening a long debate) if the > assumption is that an XHTML-Print client is generating data specifically in > this language, then it should never generate these cases. So mandating > support seems redundant. On the other hand, if the intent is to gracefully > degrade when receiving data from other sources, then there are other issues > (e.g. frames) that also come up. > Adding support for <noscript> allows a document author to use a single document and have the script execute when browsing and the content of the noscript element be displayed when printing. The PWG version of XHTML-Print specifically said that the content of the script element should not be printed (Section 1.3.1); however, it doesn't indicate how a printer was to recognize the script element and treat it differently than all other unknown elements. This change indicates how the printer should recognize a script, that the content should be discarded, and that the alternate content in the noscript be printed. So, I think this change cleans up the intent already expressed in previous versions and does not open the larger issue of graceful degradation in the face of non-XHTML-Print documents. Jim.
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6778 [1] in the HTML Working Group's issue tracking system. The working group does not agree that support for the script element implies support for document types other than XHTML-Print. Therefore, no changes to the specification are planned regarding this issue. If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6778;user=guest
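Editorial illustration (not part of the original thread): the script/noscript behavior described in reply 1 (discard the content of script, print the content of noscript) can be sketched as below. This is a minimal sketch that only collects text; it assumes a well-formed document, and the function name printable_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def printable_text(xhtml):
    """Collect the text a printer would render, skipping script content."""
    root = ET.fromstring(xhtml)
    parts = []

    def walk(element):
        # Strip any "{namespace}" prefix that ElementTree adds to tag names.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag != "script":
            # noscript is treated like any other container: its content prints.
            if element.text:
                parts.append(element.text)
            for child in element:
                walk(child)
        # Text following the element belongs to the parent and is kept.
        if element.tail:
            parts.append(element.tail)

    walk(root)
    return "".join(parts)
```

A browser that executes the script would do the opposite and suppress the noscript content, which is what lets one document serve both uses.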
PROBLEM ID: 6779
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
resolve as comment (albeit a nice one)
ORIGINAL MESSAGE:
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Thursday, July 31, 2003 1:29 PM To: BIGELOW,JIM (HP-Boise,ex1) Cc: xp@pwg.org Subject: treatment of attributes 3. The new treatment for attributes is nice. [extracted from issue 6536] ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534
PROBLEM ID: 6780
STATE: Closed
RESOLUTION: Modify and Accept
USER POSITION: Agree
NOTES:
Printers must support W3C and PWG MIME Type and DTD. PWG versions deprecated.
ORIGINAL MESSAGE:
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Thursday, July 31, 2003 1:29 PM To: BIGELOW,JIM (HP-Boise,ex1) Cc: xp@pwg.org Subject: change of MIME type to application/xhtml+xml not compatible with UPnP 4. Section 2.1, last paragraph. Changing the MIME type makes sense. But I assume that "application/xhtml+xml" could refer to other kinds of data besides XHTML-Print. In other words, the receiving side can't tell that this data is XHTML-Print. Unless he looks at the DOCTYPE...right? I'm wondering if this change will be a problem for protocols such as UPnP that use the MIME type to distinguish "document format" (in the Semantic Model sense) when advertising capabilities. For example, http://www.upnp.org/download/Service_print_v1_020808.pdf says "All UPnP printers MUST support at least the 'application/vnd.pwg-xhtml-print' document format[XHTML-PRINT] ..." This would have to change to something new, in a way that specifically refers to XHTML-Print. [extracted from issue 6536] ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534
FOLLOWUP 1:
From: elliott.bradshaw@zoran.com I am not sure that this resolution solves the problem. Protocols such as UPnP and Bluetooth need a unique MIME type to describe support for documents formatted as XHTML-Print. I agree that the current type application/vnd.pwg-xhtml-print+xml should be migrated to something more official, which would require such protocols to make revisions that move away from the deprecated name. But they still need a unique way to identify XHTML-Print. Perhaps those groups have come up with another way to solve this, but to me a unique MIME type would be the right way to go. Can the W3C register a new MIME type for this purpose? Best regards, Elliott -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Group (formerly Oak Technology Imaging Group) 781 638-7534 Jim Bigelow <voyager-issues@mn.aptest.com> 09/26/2003 06:24 PM To: ElliottBradshaw@oaktech.com cc: Subject: Re: change of MIME type to application/xhtml+xml not compatible with UPnP (PR#6780) Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6780 [1] in the HTML Working Group's issue tracking system. The working group decided that the MIME type "application/vnd.pwg-xhtml-print+xml" must be recognized as referring to a conforming XHTML-Print document, along with the MIME Type "application/xhtml+xml". However, the "application/vnd.pwg-xhtml-print+xml" MIME type is deprecated in favor of the MIME Type "application/xhtml+xml". Future releases of this specification may remove the required support for the MIME type "application/vnd.pwg-xhtml-print+xml". If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6780;user=guest
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> > From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] > Sent: Thursday, July 31, 2003 1:29 PM > To: BIGELOW,JIM (HP-Boise,ex1) > Cc: xp@pwg.org > Subject: change of MIME type to application/xhtml+xml not compatible with UPnP > > 4. Section 2.1, last paragraph. Changing the MIME type makes sense. But I > assume that "application/xhtml+xml" could refer to other kinds of data > besides XHTML-Print. In other words, the receiving side can't tell that > this data is XHTML-Print. Unless he looks at the DOCTYPE...right? > > I'm wondering if this change will be a problem for protocols such as UPnP > that use the MIME type to distinguish "document format" (in the Semantic > Model sense) when advertising capabilities. For example, > http://www.upnp.org/download/Service_print_v1_020808.pdf says > > "All UPnP printers MUST support at least the > 'application/vnd.pwg-xhtml-print' document format[XHTML-PRINT] ..." > > This would have to change to something new, in a way that specifically > refers to XHTML-Print. > Your point also holds for Bluetooth Basic Print Profile (v .95) (http://www.bluetooth.com/pdf/Basic_Printing_Profile_0_95a.pdf). I think that XHTML-Print must continue to support the MIME type of 'application/vnd.pwg-xhtml-print' and support for "application/xhtml+xml" should be optional. I'll argue for this during the working group review. -- Jim
REPLY 2:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6780 [1] in the HTML Working Group's issue tracking system. The working group decided that the MIME type "application/vnd.pwg-xhtml-print+xml" must be recognized as referring to a conforming XHTML-Print document, along with the MIME Type "application/xhtml+xml". However, the "application/vnd.pwg-xhtml-print+xml" MIME type is deprecated in favor of the MIME Type "application/xhtml+xml". Future releases of this specification may remove the required support for the MIME type "application/vnd.pwg-xhtml-print+xml". If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6780;user=guest
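Editorial illustration (not part of the original thread): the resolution above (both MIME types recognized, the PWG one deprecated) might look like this in a receiving implementation. This is a minimal sketch; the function name classify_mime_type and the three result labels are hypothetical, and only the two MIME type strings come from the resolution.

```python
W3C_TYPE = "application/xhtml+xml"
PWG_TYPE = "application/vnd.pwg-xhtml-print+xml"  # deprecated per issue 6780


def classify_mime_type(content_type):
    """Classify a Content-Type value for an incoming print document."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = content_type.split(";", 1)[0].strip().lower()
    if base == W3C_TYPE:
        return "preferred"
    if base == PWG_TYPE:
        return "deprecated"
    return "unsupported"
```

As the follow-up points out, the preferred type alone does not identify a document as XHTML-Print, so a receiver would still need to inspect the DOCTYPE.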
PROBLEM ID: 6815
STATE: Approved
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
duplicate of 6774
ORIGINAL MESSAGE:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> To: www-html-editor@w3.org Cc: xp@pwg.org Subject: Relaxing XHTML-Print's restriction to UTF-8 to include UTF-16 Date: Tue, 2 Sep 2003 20:42:14 -0400 Message-ID: <020A3CF87FB5AC47AA67966B3384575504D1D0AD@xboi22.boise.itc.hp.com> X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B3384575504D1D0AD@xboi22.boise.itc.hp.com > From: Henri Sivonen [mailto:hsivonen@iki.fi] ... > It is said that if a "charset" parameter is present for the > application/xhtml+xml MIME type, the only valid value is "utf-8". It > would make sense to allow "utf-16" as well. All XML processors are > required to support UTF-16 in addition to UTF-8, so allowing > UTF-16 for XHTML-Print doesn't cause any additional burden > to implementations. Also, the payload of > Application/Vnd.pwg-multiplexed chunks is defined > as octets, so UTF-16 strings can be delivered as > Application/Vnd.pwg-multiplexed chunks without any further encoding. > I tend to agree with Henri when he says that supporting UTF-16 would not be much more expensive than UTF-8. Does anyone on this list or the PWG's XHTML-Print list disagree? Jim
PROBLEM ID: 6781
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
change spec to use wording in followup 1
ORIGINAL MESSAGE:
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Thursday, July 31, 2003 1:29 PM To: BIGELOW,JIM (HP-Boise,ex1) Cc: xp@pwg.org Subject: Change to wording of Section 2.3.1, "Images" section, fourth bullet confusing 5. Section 2.3.1, "Images" section, fourth bullet. It used to say "Image data within the object element need not be supported." and now it says "A printer MAY choose to omit images referenced by a URI [RFC2396] containing a scheme name other than cid [RFC2392] and http [RFC2616] ." I'm confused. [extracted from issue 6536] ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534
FOLLOWUP 1:
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> To: www-html-editor@w3.org Cc: xp@pwg.org Subject: RE: XP> FW: Last call announcement for XHTML Print Date: Thu, 31 Jul 2003 13:53:00 -0700 Message-ID: <020A3CF87FB5AC47AA67966B3384575503C7DBE0@xboi22.boise.itc.hp.com> X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B3384575503C7DBE0@xboi22.boise.itc.hp.com Elliott, You wrote: > > I reviewed the public version and here are a few comments. > ... > > > 5. Section 2.3.1, "Images" section, fourth bullet. It used > to say "Image data within the object element need not be > supported." and now it says "A printer MAY choose to omit > images referenced by a URI [RFC2396] containing a scheme name > other than cid [RFC2392] and http [RFC2616] ." I'm confused. > The rewording is an attempt to say, in the positive, what URI types must be supported and by implication that support for the data URI is not required. Perhaps it should actually say that in the positive :-). For example, A printer must support images referenced by a URI [RFC2396] containing a scheme name cid [RFC2392] and http [RFC2616], support for other scheme names is optional. However, support for a URI containing the data scheme name [REF NEEDED] is not required unless the printer chooses to implement the method for supporting in-line data given in Appendix B.3. Jim
FOLLOWUP 2:
From: ElliottBradshaw@oaktech.com To: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> Cc: owner-xp@pwg.org, www-html-editor@w3.org, xp@pwg.org Subject: RE: XP> FW: Last call announcement for XHTML Print Date: Fri, 1 Aug 2003 09:28:23 -0400 Message-ID: <OF13B3AA0D.ACFCD949-ON85256D75.0049B11B-85256D75.004A382E@ne.oaktech.com> X-Archived-At: http://www.w3.org/mid/OF13B3AA0D.ACFCD949-ON85256D75.0049B11B-85256D75.004A382E@ne.oaktech.com Jim, I see. Actually the current draft now makes sense to me, but your revision is better. E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6781 [1] in the HTML Working Group's issue tracking system. The working group decided to change the wording of section 2.3.1 to, "A printer must support images referenced by a URI [RFC2396] containing a scheme name cid [RFC2392] and http [RFC2616], support for other scheme names is optional." If you feel that this resolution of your comment is not acceptable, please respond to this message with your comments. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6781;user=guest
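Editorial illustration (not part of the original thread): the adopted wording for Section 2.3.1 (cid and http required, other schemes optional) can be sketched as a simple scheme check. This is a minimal sketch; the function name image_scheme_support and the result labels are hypothetical, and URIs without a scheme are simply reported as relative references.

```python
from urllib.parse import urlsplit

# Schemes a printer must support for image references, per the resolution.
REQUIRED_SCHEMES = {"cid", "http"}


def image_scheme_support(uri):
    """Report whether support for this image URI's scheme is required."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme in REQUIRED_SCHEMES:
        return "required"
    if scheme:
        # Any other scheme, including data, is optional for the printer.
        return "optional"
    return "relative"
```

A printer that implements the in-line data method of Appendix B.3 would additionally treat the data scheme as supported.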
PROBLEM ID: 6783
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Remove RFC 2119 keyword annotations from informative section -- Jim
ORIGINAL MESSAGE:
From: Susan Lesch [mailto:lesch@w3.org] These are minor editorial comments for your XHTML-Print Last Call Working Draft [1]. Kudos to the editor and your group(s). It looks great. In 4.3 I am not sure the RFC 2119 key word MUST makes sense in an informative section (it might). [extracted from 6899] [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/ Best wishes for your project, -- Susan Lesch http://www.w3.org/People/Lesch/ mailto:lesch@w3.org tel:+1.858.483.4819 World Wide Web Consortium (W3C) http://www.w3.org/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6783 [1] in the HTML Working Group's issue tracking system. The working group has elected to implement your suggestions. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6783;user=guest
PROBLEM ID: 6784
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Change height and width -- Jim
ORIGINAL MESSAGE:
From: Susan Lesch [mailto:lesch@w3.org] These are minor editorial comments for your XHTML-Print Last Call Working Draft [1]. Kudos to the editor and your group(s). It looks great. Diagram 1 is squished to height="303" width="450". The image is really height="404" width="600". [extracted from 6899] [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/ Best wishes for your project, -- Susan Lesch http://www.w3.org/People/Lesch/ mailto:lesch@w3.org tel:+1.858.483.4819 World Wide Web Consortium (W3C) http://www.w3.org/
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6784 [1] in the HTML Working Group's issue tracking system. The working group has elected to implement your suggestions. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6784;user=guest
PROBLEM ID: 6785
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Make changes as noted -- Jim
ORIGINAL MESSAGE:
From: Susan Lesch [mailto:lesch@w3.org] These are minor editorial comments for your XHTML-Print Last Call Working Draft [1]. Kudos to the editor and your group(s). It looks great. It would help to have these spelled out in their first occurrence: EXIF (Exchangeable Image File Format) JFIF (JPEG File Interchange Format) TIFF (Tag Image File Format) IFD (image file directory) [extracted from 6899] [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/ Best wishes for your project, -- Susan Lesch http://www.w3.org/People/Lesch/ mailto:lesch@w3.org tel:+1.858.483.4819 World Wide Web Consortium (W3C) http://www.w3.org/
FOLLOWUP 1:
From: Mail Delivery Subsystem <MAILER-DAEMON@hades.mn.aptest.com> The original message was received at Mon, 29 Sep 2003 13:12:29 -0500 from localhost [127.0.0.1]. The following addresses had permanent fatal errors: <[mailto:lesch@w3.org]> (reason: 550 Host unknown). Transcript of session: 550 5.1.2 <[mailto:lesch@w3.org]>... Host unknown (Name server: w3.org]: host not found). Undelivered message: From: Jim Bigelow <voyager-issues@mn.aptest.com> To: lesch@w3.org] Subject: Re: Spell out abbreviations at first occurance (PR#6785) Date: Mon, 29 Sep 2003 13:12:29 -0500. Its body is reproduced in REPLY 1.
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6785 [1] in the HTML Working Group's issue tracking system. The working group has elected to implement your suggestions. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6815;user=guest
PROBLEM ID: 6786
STATE: Closed
RESOLUTION: Accept
USER POSITION: Agree
NOTES:
Make changes as noted. -- Jim
ORIGINAL MESSAGE:
From: Susan Lesch [mailto:lesch@w3.org] These are minor editorial comments for your XHTML-Print Last Call Working Draft [1]. Kudos to the editor and your group(s). It looks great. It may make sense to mark up elements and attributes globally <code>thus</code>, as they are in 1.3.1 and some other places (that eliminates the need for quotes in the 4.1 heading). [extracted from 6899] [1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/ Best wishes for your project, -- Susan Lesch http://www.w3.org/People/Lesch/ mailto:lesch@w3.org tel:+1.858.483.4819 World Wide Web Consortium (W3C) http://www.w3.org/
FOLLOWUP 1:
From: Mail Delivery Subsystem <MAILER-DAEMON@hades.mn.aptest.com> The original message was received at Mon, 29 Sep 2003 15:27:09 -0500 from localhost [127.0.0.1]. The following addresses had permanent fatal errors: <[mailto:lesch@w3.org]> (reason: 550 Host unknown). Transcript of session: 550 5.1.2 <[mailto:lesch@w3.org]>... Host unknown (Name server: w3.org]: host not found). Undelivered message: From: Jim Bigelow <voyager-issues@mn.aptest.com> To: lesch@w3.org] Subject: Re: markup elements and attributes globally (PR#6786) Date: Mon, 29 Sep 2003 15:27:09 -0500. Its body is reproduced in REPLY 1.
REPLY 1:
From: Jim Bigelow <voyager-issues@mn.aptest.com> Thank you for your comment on the XHTML-Print Last Call Working Draft. It is recorded as issue 6786 [1] in the HTML Working Group's issue tracking system. The working group has elected to implement your suggestions. Jim Bigelow Editor [1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6786;user=guest