This document is one of several submitted jointly by Microsoft and Hewlett-Packard. It has been prepared expressly for the W3C Workshop on High Quality Printing from the Web on April 25, 1996. Microsoft and Hewlett-Packard are committed to Web standards and are interested in assisting the W3C in exploring alternatives and setting those standards.
Internet Printing is a broad subject that touches on all aspects of the printing experience for devices attached to the Internet or to an intranet. Falling under this large category are such diverse printing topics as Web pages, documents, Web-enabled applications, User Agent features, printer selection, control, and status, server-side printing, document interchange, and service bureau printing. Many print features are strictly the purview of the User Agent (UA). This document concentrates on Internet trends and on the other half of the quality printing equation: authoring considerations. It discusses features and methodologies to be made available to the publishers of material that may ultimately be printed from the Internet. A separate paper provides more details on the proposed HTML extensions.
The Internet is evolving very rapidly. In its infancy, information was most often exchanged at the file level or via email, and Internet Printing was a matter of printing files or messages that had been exchanged among colleagues. Often these people knew each other and there was little problem in agreeing upon file formats and applications. In any case, the exchange of information rarely involved more than a few hundred users.
In the recent past, Internet Printing might mean the occasional printing of Web page information that was primarily intended for display. After all, the Web as a playground and curiosity has little need for printing. But as the Internet matures, and especially as Web technology spreads within companies (commonly known as the intranet), there is a greater need for commerce-oriented and reliable business printing.
In the near future, a number of trends will have a major impact on Internet Printing. The growth in popularity of the World Wide Web has caused many producers of content to see it as the principal vehicle for conveying information to interested users. Traditionally, producers of content available as printed matter were known as publishers, while their electronic counterparts were known as broadcasters. The Internet is now blurring this distinction, in both the sources and the formats of content, for entertainment, education, literature, art, business, government, research, politics, and virtually every other area of human communication. Former publishers see this new medium as a means of publishing material that was previously available only in printed form, while former broadcasters are examining print options that were never available before.
Some people, even experts in their fields, will confidently state that printing is dead or dying and that the Internet will most likely hasten its demise. It's easy to see how this idea came about, since the early emphasis on the Web has been on a type of access referred to as "browsing" or "surfing". These terms are rapidly falling out of favor, since the former has been used to describe shoppers who look but don't buy, and the latter describes riding a superficial, fast-paced wave over a very deep ocean. Either implies that everything of value at a Web site can be instantly absorbed and requires little or no persistence. Visiting only a few sites will convince most people that this is not the case. The browsing and surfing metaphors are being discarded as people realize that the Internet is so vast that "surfing" can reach only a fraction of its sites in a lifetime, and even then, only a fraction of that fraction will be of any interest. The new paradigm is one of search engines, intelligent user agents that seek out content of interest, and super home pages that organize the millions of Web pages into a more usable form. Web publishers want repeat customers, and while the flashiness of a page may attract users initially, it is the depth and organization of content that will keep them returning. Such depth will increase the need for users to absorb information off-line, and printing will be the most common way to do so, especially when family members contend for use of the Internet connection. People trust and, to some degree, are enamored of print. Rationally or not, people still tend to commit their most critical documents to paper for safekeeping, and why not? No one has yet invented a cheaper, more reliable, more portable, more convenient, or more universally acceptable "reader" of information than the printed page.
One thing is clear from all of this: printing is very far from being "dead". Although it can be argued that the availability of information in display form will reduce the need to print that information, it is equally arguable that an abundance of content not previously available for printing, coupled with the rise in popularity of the Web, will help keep the demand for printing high. Regardless of what happens to the total number of pages printed, the number of people printing from the Internet is rising and will continue to rise. These users are accustomed to the convenience of printing information for a number of reasons, from sharing it with others to reading it on the bus or over coffee.
Unfortunately, in the rush to the Web, many software vendors, Web authors, and content providers have given little thought to the nuances of Internet Printing. We believe that this must be corrected immediately. The bar is very high for personal and business printing from the PC, and user expectations for Internet Printing are correspondingly high.
Printing from the Internet and traditional printing from the PC are different in several important ways. Consider the following:

|Traditional PC Printing|Web Printing|
|---|---|
|Original medium was paper. This is especially true for office applications such as word processing, spreadsheets, presentations, and financial reports, where the resulting output was always intended for paper.|Original medium was the screen.|
|The on-screen representation for these applications is frequently just a convenient way of viewing a printed page, and, of course, this is where the concept of WYSIWYG was coined. There would be no WYSIWYG without printed output.|HTML is content based, without regard to WYSIWYG. In fact, it is highly unlikely that the output will look the same on any two given screens, much less any two printers. Content is king; format is unimportant.|
|The user is often the originator of the document. S/he is concerned with the quality of the output.|The document author is practically never the user, and his or her goals may be at odds with those of the user. Many Web pages are poorly formatted even for the screen, indicating the author's lack of interest in presentation as opposed to content.|
|The user is generally familiar with the application that produces the output.|Most users of HTML pages don't understand HTML at all.|
|The styles and capabilities of the equipment used for printing and display are limited and probably understood by the user.|There is no such limit. The output must look good on hundreds of variations of hardware and software.|
|The user may iteratively massage the output until it prints acceptably.|The user may not have the time, equipment, software, knowledge, or inclination to massage anything. The publisher will usually get only one chance to do it right before the user gets frustrated.|
|The user may have recourse to the author or software publisher in the case of a problem.|With potentially millions of users of a page, the user has nobody to call if the printout looks bad.|
|The output generally contains text, graphics, and photographs.|The output may contain many non-printable objects, or objects whose printed representation is not well defined, such as sound and video clips, animation, etc.|
|The output achieved is generally the best that can be obtained on the given hardware.|Even if the author has considered printing, s/he must compromise the output since there are so many different types of printers available. Some are better at images than others, some are faster, some print in color, etc.|
|The source file is generally available in case a mistake is found in the output at a later time. The output can almost always be reproduced.|The information presented may be fleeting, and difficult or impossible to reproduce at a subsequent viewing.|
|Print speed is generally not a major problem. Jobs can print unattended.|Print speed and efficiency are much more critical since download time must be added. On the Internet, the download time will frequently exceed the print time.|
So why does Internet Printing present any more of a challenge than standard printing? The short answer to this question is that Internet Printing is special because Internet Publishing is special.
As the World Wide Web portion of the Internet evolves, Web pages themselves are evolving. In the recent past, a Web page might be considered captivating if it made good use of color graphics to spice up the text. Printing in this environment was not too demanding, since the majority of Web pages were mostly text-oriented and the issue was content. When such a page was printed, the result was usually satisfactory and nearly always at least readable. As technology and tools become available, we will see a migration from the mostly static pages of today to more active pages that are designed to attract and keep a user's interest. Web pages will make increasing use of multimedia, animation, video, and active objects that do not translate well (or at all) when rendered on paper.
We must consider, also, that there is a fundamental dichotomy of usage between hypertext and printed information. The things that hypertext can do best, such as linking to interesting or corroborative data and quickly jumping back and forth among complex anchors and bookmarks, are unavailable in printed media. In fact, the ideal hypertext page will leave much to be desired when printed. Attempts to simply print out all of the linked references will generally produce an unusable addendum to the main text. The level of control available to the author is very different in hypertext. In hypertext, a link can provide an interesting sidebar, but if it's too interesting, it may lure the reader away from the subject entirely. This may be desirable for educational texts, but is highly undesirable in commercial sites. Likewise, an external link may bring the reader into a confusing place in the author's train of thought. In short, hypertext can provide a very free and non-deterministic path through information whether the author likes it or not. In printed text, the author can pretty much control the user's passage through the text since papers, articles, and books are generally read from front to back, if not from beginning to end.
These two things, the printing of active pages and the fundamental difference of philosophy between hypertext and print, work together to break the WYSIWYG paradigm. While it can be argued that printing a static Web page is very similar to a screen dump with some re-formatting, the process of printing an active page (or a page rich in links) implies a conversion of the active content to static content, including the ability to do without hypertext links. Unless the author has given this concept special consideration, content, readability, meaning, and style can be lost in the translation.
Currently, when publishing data that needs to be both viewed and printed, the author is left with compromises that are sometimes unacceptable. These compromises produce considerable frustration and dissatisfaction with the printing capabilities of User Agents. It behooves the industry to anticipate this problem and provide adequate means for authors to avoid it.
There are other factors at work that will increase the importance of Internet Printing. First, technology is constantly improving. As just one example, the advent of digital cameras and inexpensive, high-quality color printing will change the way that photographs are captured, processed, and shared. Second, the Web and its applications are maturing. Many new applications are becoming available that either require printing or may be enhanced by it. Third, the popularity of the Web means that millions of users may be accessing a given page on a wide variety of incompatible equipment, from monitors to operating systems. Now printers must be added to that list, and authors must be educated in the nuances of printing from the Web.
The Internet is used for a variety of purposes. These include information and message exchange, entertainment, commerce, business communications, advertising, and publishing. The printing requirements for each of these vary widely, ranging from non-existent (printing from a game) to content only (e-mail without objects) to critical (publishing of catalogs or art, and engineering and medical graphics).
The Web has always been a viewing and browsing medium. Although it is rare to find a Web page that will never need to be printed, there are a few good examples. Error pages, transitional pages, video feeds, sound pages, and certain intellectual property may qualify. In most cases, the author need not worry about printing because the user does not expect the page to print successfully. In the special case of art, photographs, and other intellectual property that are considered acceptable for display and not for printing, the author may want to indicate such by disabling print capabilities in the UA for that page only.
This will cover the vast majority of the Web pages published. Pages are generally designed with one of the common media in mind, so that if compromises must be reached, they favor either the display or the printed page. Even in non-obvious cases, users may desire a permanent copy of the display content, e.g. grab a frame of a video feed for printing. In extreme cases, it may be necessary to have two completely separate pages that are linked for the purpose of printing. The more a page moves from static to active, the more likely it is that the printed output will be unsatisfactory. Maintaining two forms of content is a lot more trouble for the Web author, but it can be easily justified in certain high quality cases.
The Internet has been a place for the interchange of documents for a long time. The Web also has this need, and it will be particularly critical in intranet installations. This type of publishing is the antithesis of Display Only Web page design. These pages were always intended to be printed and look best on paper, and any screen viewing is expected to conform to the WYSIWYG paradigm. They generally must be displayed by the same application that originally produced them.
Application printing is the farthest removed from what is generally meant by the term publishing. It includes special output from an application that may have nothing to do with the current (or any) displayed information. This type of printing has not yet come into full swing on the Internet, but it is extremely common in intranet environments. It will become very important as commerce on the Web proliferates and applications, applets, or active objects are downloaded from a server to be executed on the client. Application printing often has nothing to do with HTML or Web pages, although links or controls that invoke the application will be found on Web pages. It can offer precise control over page layout and print finishing options, and thus will never be completely supplanted by HTML.
We believe that no single printing solution can meet the diverse needs of the Internet. In particular, HTML is not seen as the solution to all printing needs, regardless of the number of extensions devised. For HTML to do so, it would have to evolve into something that is, at least in many people's minds, a contradiction in terms: both a content and a layout language. Rather, a palette of solutions is preferred, allowing the document author to choose the best technology for the application.
HTML printing, commonly referred to as browser printing, is the process of printing HTML-formatted Web pages. The tools and methodologies will largely depend on the author's intent and on where the document fits in the Internet Printing spectrum defined above. Below we discuss the methodologies for HTML. A separate document outlines the proposed extensions to HTML or CSS.
Today, most Web pages are intended for screen viewing and generally authors have given very little thought to how these pages will appear when printed. In such cases, the UA will employ heuristics or user supplied options to do the best job possible when the page is printed. This type of page may or may not be satisfactory when printed, and printed output is definitely considered unimportant by the author.
These pages are designed to be displayed and printed with nearly equal quality. The content is maintained only once, but very different styles may be applied depending on the output medium. For example, a different font may appear on the printed page, margins may be different sizes, or colors may be used on the screen but a different style of emphasis may be used on a monochrome printer. We recommend the use of media specific Cascading Style Sheets, as described below. The printing of HTML pages will ultimately be limited by HTML itself. Since it is a content oriented language, exact control of a printer and the printed page is beyond its scope. However, HTML pages, augmented by CSS2, will solve the majority of Web printing needs.
In some cases, merging formats using a common source base will not be practical. The simple fact is that the more a page has been optimized for screen display or for printing, the less likely it is to be acceptable in the other medium. This reaches its zenith in pages that are extremely active and require more than formatting alone to convert them to a static page. Compromise on these pages is generally not possible, so a separate HTML document must be maintained that is designed especially for printing. Today, this is analogous to sites that offer a text-only version of a page, reachable through a special link that the user clicks. The problem is that this whole concept is generally too confusing to leave the choice of printed output to the user. Instead, the page would indicate a special link so that a UA could automatically retrieve the print page and route it to the printer when the user clicks the print button or selects Print from a UA menu.
This method is superior to merging when there is a substantial difference in the formatting style for different forms of output. When merged, the entire page must be downloaded, even though the user may never elect to print. When a separate link is used, that page is retrieved only if a user decides to print, and any special images or fonts are only loaded at that time.
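The benefit of a separate print link is that the print version is fetched lazily. The Python sketch below illustrates that idea under stated assumptions: the `Page.print_link` field, the `UserAgent` class, and the in-memory `docs` server are hypothetical stand-ins, not proposed syntax. The UA downloads the print-optimized page only when the user actually prints, and otherwise falls back to the displayed page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    uri: str
    body: str                         # content as retrieved for display
    print_link: Optional[str] = None  # hypothetical: URI of the print version


class UserAgent:
    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch               # retrieves a document by URI
        self.fetched: List[str] = []      # every URI actually downloaded

    def load(self, uri: str) -> str:
        self.fetched.append(uri)
        return self._fetch(uri)

    def print_page(self, page: Page) -> str:
        """Return the content to route to the printer."""
        if page.print_link is None:
            return page.body              # no alternate: print the displayed page
        # The print version is downloaded only now, when the user prints.
        return self.load(page.print_link)


# Usage with an in-memory stand-in for the server.
docs = {"/catalog": "screen version", "/catalog-print": "print version"}
ua = UserAgent(docs.__getitem__)
page = Page("/catalog", ua.load("/catalog"), print_link="/catalog-print")
# So far only "/catalog" has been downloaded; "/catalog-print" waits for a print request.
```

If the user never prints, the print page, and any images or fonts it references, are never transferred.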
This type of publishing is valuable for database retrieval, where the HTML is generated on the fly and formatted for the screen. Only if a user wants to print such a page is the printed version of the HTML generated. An added advantage is that if the page is being printed on an intranet printer connected to the network, rather than on a printer locally connected to a PC, the server may be intelligent enough to print it directly so that the page is never brought to the client. This reduces overall network traffic and printing time. The same benefit accrues to Prepared Documents as described below.
This is very similar to the special HTML page designed for printing, except that the format of the document is not restricted to HTML. It is in the category of HTML printing, because it is invoked from an HTML page and is an alternative method of printing the content of a Web page. This is useful in several cases:
Redirection to a special non-HTML version of the document generally requires the application that originally produced the document. This may be the full scale application such as Word or Acrobat, or helper or viewing derivatives that are distributed free of charge and have, except for display and printing, very limited functionality. Like redirection to a separate HTML page, it may be printed directly from the server in an intranet situation.
The author may wish to provide a print capability for an active page - perhaps a page displaying transient data like stock quotations, account balances, or the contents of a virtual whiteboard. This data may be retrieved, generated, or otherwise managed by an active control or object embedded in the HTML page. In such cases, simply freezing and doing a "screen dump" of the displayed information at the browser level is generally unsatisfactory. For example, a scrolling ticker or marquee may need to be frozen and centered. A captured video frame needs to be complete and not caught in the middle of a new scan. A colorful control may look bad on a monochrome printer. These are examples of situations where the printed version of the control may have different characteristics than the displayed version.
Microsoft will define a standard object interface that will allow control designers to interact to a greater extent with the container to control printing. Among other things, controls written to this specification, regardless of their source or language, will be able to:
In addition, a printing style may be associated with objects that don't present this interface. It might include options such as not printing the object, printing only its outline, printing it as it appears on screen, replacing it with a URI, or printing its outline with the URI.
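The printing styles listed above amount to a small enumeration that the container consults for objects lacking the print interface. The following is a minimal sketch, assuming hypothetical names (`PrintStyle`, `render_for_print`) and using strings as stand-ins for what actually reaches the page.

```python
from enum import Enum


class PrintStyle(Enum):
    # The options listed in the text; the names are hypothetical.
    NONE = "none"                      # do not print the object
    OUTLINE = "outline"                # print only its outline
    AS_SCREEN = "as-screen"            # print it as it appears on screen
    URI = "uri"                        # replace it with its URI
    OUTLINE_WITH_URI = "outline-uri"   # print its outline labeled with its URI


def render_for_print(uri: str, screen_image: str, style: PrintStyle) -> str:
    """Return a stand-in for what the container puts on paper for the object."""
    if style is PrintStyle.NONE:
        return ""
    if style is PrintStyle.OUTLINE:
        return "[ ]"
    if style is PrintStyle.AS_SCREEN:
        return screen_image
    if style is PrintStyle.URI:
        return uri
    if style is PrintStyle.OUTLINE_WITH_URI:
        return f"[ {uri} ]"
    raise ValueError(f"unknown print style: {style}")
```

For a video clip, for instance, `PrintStyle.URI` lets the reader of the printout find the clip later, where a frozen frame might be meaningless.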
The second major type of printing is output from an application, and we propose this method when precise control of the output format, printer, or media selection is required. This category overlaps with some of the HTML categories already mentioned, since the application may be invoked by HTML. They include:
In consideration of the trends, uses, and methodologies described above, Microsoft and Hewlett-Packard believe that a number of technology areas need clarification or standardization. In this section, as in the previous sections, we are primarily concerned with intent and direction rather than the specifics of syntax.
Microsoft and Hewlett-Packard are convinced that certain extensions to HTML are required in order to achieve quality printing for the Internet. In general, we would like to see as many formatting issues as possible handled by Cascading Style Sheets, and the number of HTML tags added held to an absolute minimum. A number of the suggested extensions are covered in the separate document. Microsoft and Hewlett-Packard are committed to Web standards and are not interested in promoting proprietary HTML tags.
Apply a style to HTML elements based on the intended medium. For example, a page that is to be printed could be formatted with a different font or margins than a page that is to be displayed on a graphics screen. This could apply to all formatting styles, although some would not make much sense or would not differ across media. Using this technique, the same body could be used for either screen or paper, while the styles differ. See the discussion above on breaking the WYSIWYG paradigm.
The different media styles must coexist in the same style sheet, so a method for their selection must be introduced. For backwards compatibility, the default medium must be the screen and it is implied for all elements that are not otherwise specified. The page may use both styles simultaneously as would be the case where a page is being displayed and the user then elects to print. The existing screen format should not be altered, and a separate thread or process should reformat the page based on the paper medium.
The concept of a conditional test for medium is the minimal route. For example, if paper use style B, else use style A. This method would avoid a conflict since a given element could only have a single style in effect at any given time.
A different method would apply an explicit paper attribute to certain styles, while the screen attribute would be implied for all other styles. Many elements may have the same style for either medium, so the distinction need only be made when there is a conflict where the same element is given two (or more) styles. This implies a hierarchy of media terms, with the highest priority going to the current medium when it is specified, and all non-specified elements also applying to the current medium.
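The hierarchy just described can be resolved in two passes: unqualified rules first, then rules qualified with the current medium, which win any conflict. This Python sketch assumes a hypothetical rule representation of `(selector, medium, properties)` tuples; it is not proposed style sheet syntax. Building a fresh result for each medium also matches the requirement that reformatting for paper must not alter the existing screen format.

```python
from typing import Dict, List, Optional, Tuple

# A rule is (selector, medium, properties). medium=None means "not qualified":
# screen is implied, and the rule also applies to any other current medium.
Rule = Tuple[str, Optional[str], Dict[str, str]]


def resolve(rules: List[Rule], selector: str, medium: str = "screen") -> Dict[str, str]:
    """Effective style of `selector` for `medium`, built as a new dict so the
    screen formatting is never altered when a paper version is computed."""
    style: Dict[str, str] = {}
    for sel, med, props in rules:      # unqualified rules first
        if sel == selector and med is None:
            style.update(props)
    for sel, med, props in rules:      # then the current medium wins conflicts
        if sel == selector and med == medium:
            style.update(props)
    return style


rules: List[Rule] = [
    ("body", None, {"font-family": "sans-serif", "margin": "8px"}),
    ("body", "paper", {"font-family": "serif", "margin": "1in"}),
    ("h1", None, {"color": "blue"}),
]
```

Here `h1` has no paper-specific rule, so its single unqualified style applies on both screen and paper; only `body` needs the distinction.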
The richest method of selection would involve a more complex conditional mechanism. Environment conditions could be compared to standard values in a series of relational tests with the result being either true or false. This condition could be tested in an if - then - else construct and the proper styles assigned accordingly. To improve performance and readability, the result could be assigned to a temporary variable that could be used in subsequent tests. This mechanism may have other uses in style sheets or HTML.
Examples of environment variables and their values:
Device: graphics screen | graphics printer | character screen | character printer
Color: monochrome | gray | low color | medium color | high color
Resolution: high | medium | low
Paper Type: standard | glossy | transparency
Paper Size: standard | letter | A4 | legal
Audio: none | low | medium | high
Video: none | low | medium | high
One style that could be specified is "no output" for the specified medium. For example, the control portion of a page containing back and forward buttons could be skipped if the page is being printed.
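The richer conditional mechanism, together with the "no output" style, can be sketched as ordinary if-then-else logic over the environment variables listed above. This is an illustration only, assuming a hypothetical `style_for` function, environment keys named after that list, and an `output` property standing in for "no output"; the results of the relational tests are held in temporary variables, as the text suggests.

```python
from typing import Dict

# Hypothetical environment using the variables and values listed above.
Env = Dict[str, str]


def style_for(element: str, env: Env) -> Dict[str, str]:
    # Relational tests yield True/False; keep the results in temporary
    # variables so that later tests can reuse them.
    on_paper = env["Device"] in ("graphics printer", "character printer")
    has_color = env["Color"] not in ("monochrome", "gray")

    if element == "nav-buttons":
        # "No output": back/forward controls are skipped on paper.
        return {"output": "none"} if on_paper else {"output": "normal"}
    if element == "em":
        # Color emphasis on color devices, a different emphasis otherwise.
        return {"color": "red"} if has_color else {"font-weight": "bold"}
    return {}
```

Evaluating each test once and reusing the result is what keeps such a scheme readable as the number of elements grows.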
The ultimate conditional style specification would involve the selection of an entirely different style based on the environment: for example, one for screen display and one for printing. This could be accomplished with the Print Only and Display Only tags, with the proper style sheet specified for each area. The disadvantage of this approach is that it doesn't scale well to other, as yet undefined, media.
The ability to provide color consistency across a variety of monitors and printers is a requirement for many types of commerce printing. Microsoft and Hewlett-Packard are proposing a method of assigning a standard or custom color space to individual elements through style sheets, so that colors are displayed consistently on all Web screens, printers, or other devices that have been calibrated to this standard.
A standard color space would be defined for the Web so that most images or other elements would have it available and the tag would only imply that this item should be rendered using this standard space. Custom spaces would be loaded from a URL so that a mixture of color spaces could appear on one page. For performance purposes, the standard color space would be permanently installed on a user system and the custom spaces would be cached.
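The lookup policy described here, with the standard space always installed and custom spaces fetched by URL and then cached, might look like the following Python sketch. The `ColorSpaceRegistry` class, the `"web-standard"` identifier, and the injected `fetch` function are hypothetical; profiles are treated as opaque bytes because the proposal leaves their format to the separate document.

```python
from typing import Callable, Dict

STANDARD_SPACE = "web-standard"  # hypothetical identifier for the Web standard space


class ColorSpaceRegistry:
    def __init__(self, standard_profile: bytes, fetch: Callable[[str], bytes]) -> None:
        self._standard = standard_profile    # permanently installed on the user system
        self._fetch = fetch                  # downloads a custom space from its URL
        self._cache: Dict[str, bytes] = {}   # custom spaces already downloaded

    def profile_for(self, space: str) -> bytes:
        """Return the color profile named by an element's style."""
        if space == STANDARD_SPACE:
            return self._standard            # no download needed
        if space not in self._cache:         # custom space: fetch once, then reuse
            self._cache[space] = self._fetch(space)
        return self._cache[space]


# Usage with a fake downloader that records each request.
downloads = []


def fake_fetch(url: str) -> bytes:
    downloads.append(url)
    return b"profile from " + url.encode()


registry = ColorSpaceRegistry(b"standard profile", fake_fetch)
```

Several elements on one page can name the same custom URL; only the first lookup costs a download.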
The details of this proposal can be found in a separate document.
When an author designs a page for Web publication using HTML, s/he is unable to predict a number of things about the output device. For display, it is unknown what the limits of the user's device are and how the user has sized the display screen. Much consideration has been given to the concept of automatically sizing a window for best viewing of the page. There are good arguments for and against, but the nays have it for the time being, citing the fact that multiple windows may be open on a user's screen and that the user must maintain control of the browser window to manage the desktop effectively. Perhaps the compromise will be a suggested window size that the user may select by clicking a toolbar button.
The situation for print is similar, but not nearly as unpredictable. Almost everyone in the world will try to print the page on one of four paper sizes. In most cases, it even reduces further to either Letter or A4. HTML purists will insist that the output should flow depending on the paper size, and they would be right. However, a simple standard will allow a Web author to design the printing style of a page so that it looks the same on 99% of all printers in use today.
We propose that a new paper size be promoted as Web Standard Paper. The maximum top, bottom, right, and left printing margins will be chosen to produce an assured printing area that will work on 99% of page printers, whether they are printing on Letter size or A4 size paper.
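The assured printing area is the intersection of the two sheets, less the margins. As a worked sketch: Letter is 8.5 x 11 in (215.9 x 279.4 mm) and A4 is 210 x 297 mm, so the common sheet is 210 x 279.4 mm. The 10 mm margins used below are purely illustrative assumptions; the proposal leaves the actual margin values to be chosen.

```python
from typing import Tuple

LETTER_MM = (215.9, 279.4)   # 8.5 x 11 in
A4_MM = (210.0, 297.0)


def assured_area(left: float, right: float, top: float, bottom: float) -> Tuple[float, float]:
    """Width and height (mm) of the printing area that fits on both Letter and A4,
    given the maximum margins (mm) that the target printers need."""
    width = min(LETTER_MM[0], A4_MM[0]) - left - right
    height = min(LETTER_MM[1], A4_MM[1]) - top - bottom
    return width, height


# With illustrative 10 mm margins on every side:
w, h = assured_area(10, 10, 10, 10)
print(f"{w:.1f} x {h:.1f} mm")  # 190.0 x 259.4 mm
```

Width is limited by A4 and height by Letter, which is why neither paper size alone defines the Web Standard area.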
The UA will interpret this Web Standard size as being appropriate to print on either Letter or A4 paper. A local configuration option will determine what size of paper to use should both be loaded into the target printer. Likewise, whether the image should be centered or printed left justified on Letter paper can be selected by the user.
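One possible UA behavior for this placement step is sketched below in Python. `place_page` and its parameters are hypothetical names for the local configuration options mentioned above. Because the Web Standard area is as wide as A4, only Letter leaves spare width (215.9 - 210 = 5.9 mm), which is split evenly when centering is selected; vertical placement is not addressed by the proposal and is left out here.

```python
from typing import Set, Tuple

PAPER_MM = {"Letter": (215.9, 279.4), "A4": (210.0, 297.0)}
WEB_STANDARD_WIDTH_MM = 210.0   # narrower of the two sheets


def place_page(loaded: Set[str], preferred: str, center_on_letter: bool) -> Tuple[str, float]:
    """Pick the sheet for a Web Standard page and its left offset in mm."""
    candidates = [p for p in ("Letter", "A4") if p in loaded]
    if not candidates:
        raise ValueError("neither Letter nor A4 paper is loaded")
    # The local preference matters only when both sizes are loaded.
    paper = preferred if preferred in candidates else candidates[0]
    spare = PAPER_MM[paper][0] - WEB_STANDARD_WIDTH_MM   # about 5.9 mm on Letter, 0 on A4
    offset = spare / 2 if center_on_letter else 0.0
    return paper, offset
```

On A4 the spare width is zero, so the centering option has no effect there, as the text implies.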
In addition, we propose that other standard paper sizes may be indicated as "suggested" by the author for printing styles. Only one suggestion may be indicated per style or page.
One of the methods of Internet Printing involves third-party or service bureau printing. Many users may not have access to the printer best suited to the job; for example, they have a monochrome laser printer but want to print in color, or they would like to print a large quantity of a document. In this situation, a user may elect to send a file to a copy or printing center for reproduction. This will be particularly popular with students, small businesses, and business travelers. Interestingly, these requirements will be almost identical to those of intranet printing situations where a centralized print or copy room is available.
It is highly unlikely that a service bureau will allow even a well-known user to attach directly to its printers. Most likely, these print requests and documents will be placed into some sort of centralized queue. In such cases, a number of properties must accompany the document for successful printing. Some are the same as, or extensions of, the printer properties that the author found best for printing, and some of these may be embedded in the document depending on the format. In some cases, the embedded information will need to be overridden because it doesn't make sense for the target printer, e.g. selecting the paper from a certain tray.
Microsoft will propose a standard set of document properties designed to streamline the process of third-party printing. These will accompany a document file in either a standard message or a unique file format. Some properties will be optional and some required. They will include rendering, font, finishing, delivery, and accounting properties, as well as originating application, target printer family, etc. Any documents or document processing facilities that conform to this standard will be able to produce consistent quality output. Some of these properties may conform to the ISO 10175 Document Printing Application (DPA) specification, but that specification will not go far enough without augmentation.
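A sketch of how such a property set might be represented, and how it could override settings embedded in the document, follows. Every field name here is hypothetical: the text names only the property categories, not a syntax. Required properties are modeled as dataclass fields without defaults, and `effective_settings` drops device-specific settings, such as an input tray, that the bureau's printer cannot honor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DocumentProperties:
    # Field names are hypothetical; the text names only the property categories.
    document: str                          # required: URI or file name of the document
    originating_application: str           # required
    target_printer_family: str             # required
    copies: int = 1                        # optional rendering property
    fonts: List[str] = field(default_factory=list)
    finishing: List[str] = field(default_factory=list)  # e.g. ["staple", "bind"]
    delivery: Optional[str] = None         # e.g. "pick up at counter"
    account: Optional[str] = None          # accounting / billing reference

    def __post_init__(self) -> None:
        if self.copies < 1:
            raise ValueError("copies must be at least 1")


def effective_settings(embedded: Dict[str, object], overrides: Dict[str, object],
                       unsupported: Set[str]) -> Dict[str, object]:
    """Merge settings embedded in the document with the accompanying properties
    (which take precedence), dropping ones that make no sense on the target
    printer, such as selecting a particular input tray."""
    merged = {**embedded, **overrides}
    return {k: v for k, v in merged.items() if k not in unsupported}
```

Omitting a required property makes construction fail immediately, which is where a bureau would want to reject an incomplete job.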
Printing is one area in which people can most readily see an immediate benefit from content negotiation. The benefit is most often found in image resolution, because screen resolution is usually much lower than the resolution of today's printers.
A simple method of obtaining a different content could be to use the proposed conditional tags PrintOnly or DisplayOnly to specify the URI of an image of the best resolution for print and display. But this is very crude and doesn't answer some more complex issues, such as obtaining the best image for my 600 dpi color laser as compared to my 300 dpi monochrome ink jet.
Such a situation calls for a more complex mechanism, where a reference to a particular URI might cause the server and the UA to negotiate for the best format or image available. Since this will of necessity require that the UA know and understand the environment available, one of two methods is possible: either the UA is configured for a certain behavior, or a CSS is used to indicate the negotiation behavior. The latter seems preferable since it allows for more flexibility for pages to behave differently depending on their embedded or referenced style.
In either case, a few thorny issues will arise in situations where the UA and the server both have options. For example, if the UA has more than one printer available and the server has more than one image available, which will take precedence? How will the user know that a high-quality color image is available for printing if s/he currently has the monochrome printer selected? Might the user indicate a preferred printer, or allow the UA to select the printer based on the image formats available? In any case, the ability to negotiate needs to be backward compatible with UAs and servers that don't support the capability.
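To make the negotiation concrete, here is one possible selection rule once a printer has been chosen. This sidesteps, rather than answers, the precedence questions raised above. The `Variant` and `Printer` types and the ranking are assumptions for illustration: prefer variants matching the printer's color capability, then the lowest resolution that still meets the printer's (to limit download time), otherwise the sharpest available. An empty variant list falls back to the ordinary image, which keeps the scheme backward compatible with servers that do not negotiate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    uri: str
    dpi: int
    color: bool


@dataclass(frozen=True)
class Printer:
    dpi: int
    color: bool


def choose_variant(variants: List[Variant], printer: Printer, default_uri: str) -> str:
    """Pick the image to fetch for the selected printer."""
    if not variants:
        return default_uri  # server does not negotiate: behave as today
    # Prefer variants whose color capability matches the printer.
    matching = [v for v in variants if v.color == printer.color] or variants
    # Of those, the lowest resolution that still meets the printer's keeps
    # download time down; otherwise take the sharpest one available.
    enough = [v for v in matching if v.dpi >= printer.dpi]
    if enough:
        return min(enough, key=lambda v: v.dpi).uri
    return max(matching, key=lambda v: v.dpi).uri


variants = [
    Variant("photo-screen.gif", 72, True),
    Variant("photo-300-gray.gif", 300, False),
    Variant("photo-600-color.gif", 600, True),
]
```

With these variants, a 600 dpi color laser receives the 600 dpi color image, while a 300 dpi monochrome ink jet receives the 300 dpi gray one.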
In many cases, a site consists of more than a single page that the user may want printed. This is true regardless of the method that the author has chosen for providing print quality, i.e. merged styles, separate HTML, or separate formatted documents. The user would like to have a simple method of printing an entire site or a logical subset of that site. Much of this feature can be accomplished by the User Agent and is outside the scope of this document.
But consider that in complex sites, there may be many pages that are provided for navigational purposes with little content of interest for printing. Which pages are of interest is very difficult for the user or the UA to determine. Even if the user has been to every page and verified that they want it printed, and that is a very big if, simply going back through and selecting the proper pages and hitting the print button in the proper order is quite onerous. A site navigational map may not be able to indicate the order in which pages should be printed and prevent the user from getting duplicate information or missing critical pieces. This problem is compounded by the other ideas for alternate or negotiated print suggested earlier in this document.
The site author is familiar with all of these things: interest, order, completeness, alternate options, etc. and needs a method to communicate these to the user or the User Agent. We propose that this be through a special HTML tag that can reference a Print Collection. Probably the most effective method for the UA to deal with this is on the toolbar. A button adjacent to the standard page print button could be available or "grayed out" depending on the presence of this special information. When more than one collection is available for the site, the user will be given a choice. This will make it quite easy for a user to print the entire site, or perhaps one of several available catalogs.
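The UA side of a Print Collection could be as simple as the Python sketch below. The `PrintCollection` structure and both functions are hypothetical, since the proposal does not yet define the tag's syntax. The collection button is enabled only when the site declares a collection, the user must choose when there are several, and pages print in the author's order with duplicates removed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrintCollection:
    name: str         # e.g. "Entire site" or "Catalog"
    pages: List[str]  # page URIs in the order the author wants them printed


def collection_button_enabled(collections: List[PrintCollection]) -> bool:
    # The toolbar button is grayed out unless the site declares a collection.
    return bool(collections)


def pages_to_print(collections: List[PrintCollection],
                   choice: Optional[str] = None) -> List[str]:
    """Ordered, duplicate-free list of pages for the chosen collection."""
    if not collections:
        return []
    if choice is None:
        if len(collections) > 1:
            raise ValueError("several collections available; ask the user to choose")
        chosen = collections[0]
    else:
        chosen = {c.name: c for c in collections}[choice]  # KeyError if unknown
    # dict.fromkeys keeps the first occurrence of each URI, in order.
    return list(dict.fromkeys(chosen.pages))


site = [
    PrintCollection("Entire site", ["/intro", "/chapter1", "/chapter2", "/intro"]),
    PrintCollection("Catalog", ["/catalog/a", "/catalog/b"]),
]
```

Because the author supplies the order, navigation-only pages can simply be left out of the list.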