Workshop Report

The W3C Workshop in Paris was the third in a series of industry consultation events held by W3C. The goal of this Workshop was first to identify difficulties faced by existing professional publishing organizations in using tools based on the Open Web platform, including the production of printed books, and second to find ways to work on eliminating or ameliorating those difficulties.

The Workshop focus was on professional publishing, whether academic, non-profit or commercial, and in particular on mediated publishing rather than self-publishing.

We received approximately fifty statements of interest and position papers, and approximately sixty people attended the two-day Workshop. Rather than have as many position papers presented as possible, we tried to favor open discussion.

Detailed minutes of the Workshop are linked from the references page, along with slides and position papers.

[picture: people talk during a coffee break; in the background through the window you can see France.]

Workflows

Workflows appeared to fall into one of several categories, driven by a need to produce ebooks, print, and, in some cases, Web sites:

  1. XML First, in which the authors supply XML;

  2. XML Early, in which the authors and copy editors work with Microsoft Word and its revision tracking before conversion to XML;

  3. XML Late, in which paginated PDF (or InDesign documents) are sent out of house and converted to XML;

  4. XML Never, or the “traditional” workflow in which Quark or InDesign is used to produce PDF for print, and electronic books are not created, or are static PDF copies of the printed books;

  5. Variations, such as starting with Markdown and converting that to XML in-house.

Some publishers are using XHTML 1.1, XHTML 5, or their own profiles of XHTML, rather than either a non-HTML XML-based language or HTML. This allows restrictions on what is accepted from authors and enables the use of XML tools such as XSLT engines and XML databases, while the documents can still be rendered directly in a Web browser.
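As an illustration of the kind of restriction an XHTML profile makes possible, the following is a minimal sketch and not any publisher's actual tooling. It assumes a hypothetical house profile (the ALLOWED set) and uses only the Python standard library XML parser to report elements that fall outside that profile. The sample document and the function name are likewise invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical house profile: the only XHTML elements accepted from authors.
ALLOWED = {"html", "head", "title", "body", "h1", "p", "em", "strong"}


def disallowed_elements(xhtml_text):
    """Return the sorted local names of elements outside the profile."""
    # fromstring raises ParseError if the input is not well-formed XML.
    root = ET.fromstring(xhtml_text)
    found = set()
    for element in root.iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        namespace, _, local = element.tag.rpartition("}")
        if namespace.lstrip("{") != XHTML_NS or local not in ALLOWED:
            found.add(local)
    return sorted(found)


# Invented sample manuscript; the table is deliberately outside the profile.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body>
    <h1>Chapter 1</h1>
    <p>Some <em>emphasised</em> text.</p>
    <table><tr><td>not in the profile</td></tr></table>
  </body>
</html>"""

print(disallowed_elements(SAMPLE))
```

Because the manuscript is well-formed XML, the same file could be passed to an XSLT engine or stored in an XML database without conversion, and still opened in a browser. A production workflow would more likely express the profile as a schema than as a hand-written set of element names.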

Barriers

The participants identified areas where there are difficulties in using the Open Web Platform to do commercial, mediated publishing.

Some particular areas were discussed as needing attention within the purview of the World Wide Web Consortium, possibly in conjunction with other organizations. These areas can be seen in more detail in the minutes, but included:

  1. Authoring and Editing on the Web: Although it is possible for an author to write a novel in a Web browser text box, the available revision and change tracking mechanisms do not compete with Microsoft Word for copy editing and proofreading.

  2. Formatting for paged media: even where XHTML was used, limitations in the design and implementation of Cascading Style Sheets (CSS) caused difficulties both for print and for ebooks. Additional difficulties in the area of control over line-breaking and hyphenation were noted, as well as control over printing, for example conveying author and user intent for print production.

  3. Page proofs: even if third-party tools, such as Antenna House Formatter and Prince, can produce print-quality PDF, there is still a need to view and annotate page proofs.

  4. A lack of knowledge and understanding about the Open Web Platform within the publishing industry was seen as a problem, especially as the new technology becomes increasingly central to publishing.

Resolution

The purpose of the Workshop was not to answer all questions but rather to explore ways forward towards eliminating the barriers so that the Open Web Platform would become well-suited to the needs of publishers and the wider publishing communities.

A number of actions were identified:

  1. The Digital Publishing Interest Group within the Digital Publishing Activity at W3C has established a (Member-only) task force with the goal of describing requirements for pagination using CSS and providing use cases for paged media; Dave Cramer of Hachette will lead that task force.

    At the same time, a paged media layout task force has already been proposed within the CSS Working Group; that task force would focus on technological solutions to the requirements.

  2. An Education and Outreach Group may be started within the W3C Digital Publishing Activity. This group may have the role of “match-making” between publishers and technologists, providing publisher-specific training and resources, and helping to show how the various components fit together at a high level. It may also be the place to continue discussion about the extent to which the barriers and difficulties are due to limitations in the Open Web Platform itself or come from other sources.

  3. A one-day meeting between the CSS Working Group and the Digital Publishing Interest Group was suggested. Although the W3C Technical Plenary and Advisory Committee Meeting (TPAC) is intended for liaison between groups, the CSS chairs suggested that this event might be better held at a CSS Working Group meeting. Such a meeting may come out of the work of the two task forces on page layout.

  4. The W3C should also reach out to archivists, librarians and other curatorial experts, for example for change tracking and for character set encodings, and should also investigate further the relationship between W3C and academic publishing research such as DocEng.

  5. Liaisons should be established where needed with other standards bodies and industry organizations, and contact maintained in order to avoid duplication of effort or needless competition.

Conclusion

The Open Web Platform is being used today by forward-looking publishers, as well as by newcomers from outside the traditional world of publishing. There is much work to be done to facilitate the production of useful, functional books. There is also much exploration to be done around the boundaries of what constitutes a book, where a book ends and a Web site begins, and even the very role of the publisher. From a technological viewpoint, the similarities and differences between electronic books and Web sites, and between print and online, are forcing a reevaluation of metadata, of distribution models, and of how publishers interact with their audience and markets.

The Workshop in Paris was a beginning; future work within W3C and elsewhere will continue as the Web evolves. If the World Wide Web Consortium is the organization responsible for the Web, leading the Web to its full potential, then it is the responsibility of the W3C and all involved in the Web to make sure that no-one is left behind, that the new world still has room for people and for ideas, and a responsibility for us all to work together to keep technology rooted in culture, in society, in learning and in education.

We invite all organizations involved in publishing to work with us at W3C in a vendor-neutral environment to develop the technologies for the future.

[Montage image: the Bastille monument, Paris; several photographs of the people at the Workshop, uncaptioned]