W3C

"Publishing and the Open Web Platform" workshop day 1

16 Sep 2013

Attendees

Chair
Liam Quin, Peter Linss
Scribes
fsasaki, Bert, Karen, Em, Jirka


introduction

<fsasaki> liam: thank you everyone for coming, Peter Linss and I will co-chair today

<fsasaki> liam going through logistics

<Fil> hi

<glazou> ScribeNick glazou

<fsasaki> scribe: fsasaki

<glazou> fsasaki: ok

liam: anybody who has not been to a w3c meeting before?

(many hands going up)

liam: great. one aspect of w3c workshops: they are industry consultation events
... we get together to hammer out a solution to a problem - we (W3C) are listening to you
... we have arranged some speakers, but please: interrupt us. Our goal is to figure out the answer to a question:
... what do you need to do commercial publishing using the open web platform?
... the only way we can fix the web is if you tell us what's broken
... it would be a great outcome to hear "what you can already do today is good"
... another great outcome would be: hear what should be done for the future
... any questions so far?

hash tag is #owpworkflow

the future of publishing (presentation from Liam Quin)

liam: citing "The printed word is not dead - it only looks that way because it doesn't move" (2001)
... Frank Romano, at Seybold 2001
... the open web platform
... see also http://www.webplatform.org/ site for more info. A collection of technologies and standards from W3C, IETF, ECMA, Unicode and others
... the name comes from Jeff Jaffe (W3C CEO)
... includes HTML, CSS, SVG, MathML, JavaScript, ...
... computer software is moving to the Web
... generic desktop computers are losing ground. the owp can do graphics, type etc.
... wordpress is the new pagemaker
... so what is publishing? has anyone not read a physical book in the last year?

(2 hands going up)

liam: by "publishing" at this workshop I mean professional publishing
... including words that are digital or not
... "self publishing" is not included in this - of course this is important too, but not in scope here
... one aspect for digital publishing is: how to convey that a document can be trusted?
... major issue is the incompatibility of reader devices across platforms

(many hands go up showing that this is really a problem)

liam: use of draft or unstable features is another problem for implementations
... same for formatting limitations

"CSS does not - eBook readers don't" - comment from the audience

<Dave_Cramer> Some of the "new features" in CSS that don't work reliably across all ebook reading systems include margin-top :)

liam listing gaps - in addition to the above: metadata fragmentation, prepress and finishing not handled

liam showing a formatting example

liam: things hard to do today on the web: footnotes
... hyphenation, multiple streams of footnotes, crop / bleed / spreads / binding / finishing
... respond to needs of various distribution and marketing channels
... topic of metadata
... libraries are asked: what books do I have?
... but now also: who wrote the webpage?
... we don't have marc records on the web
... we have proprietary metadata that publishers have to use
... but different book publishing channels use different metadata
... so metadata is an important topic, but we are likely to dive into that at a separate future workshop
... so how can we address these issues?

emily: went to a conference for independent publishers

<glazou> fsasaki: ask Liam?-)

emily: publishers are worried about independent publishers
... anybody can do it

<astearns> emily form corvus?

indeed: )

liam: publishers are in the business of curation

<astearns> just what I heard - probably slightly wrong on name and company

liam: values that a publisher brings is: brand, ...

<newt> 'emily from corbas'

liam: publishing is about curating, finding authors, quality, ...

adam: I see publishing as a business model
... we are trying to work out how to produce knowledge and cultural items
... even when we talk about authors this is very "loaded"

<Dave_Cramer> Even some NY Times Bestsellers come from publishers other than the big 5.

adam: this is not the only way to produce quality
... every time a word like publishing, author, book comes up there is a special aura about that
... for me these are legacy terms we need to challenge a bit

liam: there are two conversations: what is publishing vs. what is a book
... what is publishing will come up a lot
... don't think it is productive for publishers to think that they are in the printing business
... now about w3c in general
... a technical standards organization responsible for the world wide web
... we are trying to fix problems in the web
... we are a member funded consortium
... companies pay to join and then can participate in working groups
... w3c is about community and conversations
... a group of related technical work items and interconnected working groups
... like XML, TV on the Web, publishing, ...
... some areas are cross area relevant, like internationalization or accessibility
... an example of internationalization needs: adding markup to strings, e.g. for ruby annotation
... or accessibility needs of people with various disabilities
... publishing activity lead is Ivan Herman
... we have liaisons with the IDPF, Markus Gylling
... others are coming up

<Fil> (a slide in small red type, about accessibility (just sayin'))

liam: membership includes publishers, tool vendors, users etc.
... involvement of users for creating specs is important
... we are doing a series of workshops on publishing
... this one is focusing on workflow issues, from authors to producing physical or eBooks
... the idea that you can use the same file to produce an eBook on the web or print
... we are to identify the needs and barriers for the workflow topic
... aim of the workshop is not to solve the issues, but rather to understand how to solve things and who the right people are to solve them
... the w3c process says: a spec will not become a final standard ("Recommendation") unless all comments are addressed
... our process ensures consensus and produces open standards
... = a royalty free platform
... reaching also out to specific communities
... what can you do? join w3c, join the digital publishing interest group - if you don't join w3c then you can make comments from the outside
... something that many advances in publishing have in common:
... most technical innovations in publishing come from a meeting
... between someone who has a need (a publisher) and a technologist

liam telling the story of gutenberg inventing book printing - showing the value of collaboration between publishing needs and technologists

liam: technologists have to listen to publishers - and publishers should say "it would be cool if ..."
... domain experts need to communicate

speaker from University of Reading

aaa: would be good to have concrete ideas about what is not written in standards, but is best practice
... people that have a position in publishing houses
... there is a lot of work to do to translate things from the print publishing world into other areas

<glazou> speaker was Gerry Leonidas, University of Reading

liam: in w3c we have a document about Japanese layout requirements
... there are no requirements documents yet for other languages

[info about that: such documents are currently being created]

liam: an issue was that for some time that knowledge was not known

<inserted> Audrain (Hachette Livre): 99% of content is made offline.

liam: workflow in publishing in an ideal world: there is one file, converted to PDF, sent to the printing company, and then to eBook people to make an eBook

<glazou> speaker was Luc Audrain, Hachette Livre

liam: you don't want editorial changes to happen in either of the publications
... people are using XML to create eBooks
... 40% of publishers are using XML at some point in their workflow
... if you take your Word file and convert it to XML
... you have done proofreading etc.
... that is one workflow
... "XML late" means: you have done pagination etc.
... and then convert to XML. That is not so good but it saves money in the short term

liam: XML late people would like to do XML early

bill: about "unspoken rules" and cultural knowledge
... when we look at chapter of a book, titles, different type sizes
... we know by convention of typography what things are
... behind these is markup
... we think that the presentation is the structure, but it is not
... there is nothing "given" in this
... another example: in the mid 90s there was a workshop about web based publishing
... the web was brand new
... one student had a publication with certain elements in blue and underscored
... the convention that these are links was not stable yet
... so anybody that had already used the web was not used to that
... now reading the New York Times digital today: links are in grey, important terms are in blue (but without underlining)
... these examples show that these conventions evolve over time

presentation from Adam Witwer, O'Reilly Media

<glazou> so why isn't O'Reilly a member of the CSS WG?

adam: my main area is tools and software dev side
... a story in four chapters: 1 escape from framemaker 2 down the cascade 3 dawn of HTMLBook 4 atlas never shrugs
... approach that I describe is just one

adam: framemaker is a tool for creating content like indesign
... docbook xml is a standard for technical documentation
... today it is version 5.1 - we are using 5.0 internally
... this was in 2006, before anything on the digital book side was happening
... in 2007 ePub became an IDPF standard - so above was a pre eBook standard
... so why did we move from framemaker to docbook?
... we had safari books online - 25% of our authors said I will write my book in docbook
... so we took docbook into framemaker and tried to work with it - that was insane
... also books needed heavy unicode support - framemaker at that time was not good at that, XML is
... so our workflow then was DocBook XML > XSL-FO > PDF. The step from XML to FO used XSLT
... XSL-FO is like HTML and CSS put together. very difficult to read - more for machines to process
... the step FO > PDF used antenna house (just one choice possible here)
... what did we learn? First, book is not equal to PDF
... PDF was just one representation of the XML
... this led to the single source publishing model: from one set of markup we produce many outputs. This was around 2007
... from XML content to Safari books online, PDF, ePub, ...
... single source publishing was very successful
... cost saving was great
... we were able to pull this off since we had a standard and technical authors already using that
... so we had "XML early, XML first"
... we had authors writing in word etc. - we converted that to XML docbook
... now chapter 2 - a switch from XSL-FO to CSS for page layout
... reason was various events: ePub 3 came up, O'Reilly loses lead XSL-FO developer.
... docbook had xslt stylesheets to create ePub output
... and antenna house 6 supported CSS
... also, "by accident" we got HTML5 (as ePub3 format)
... all above "events" happened in a few months
... so we started doing workflows like: XML > XSLT > HTML5+CSS > PDF (via Antenna House)
... this was hugely successful in the production group
... it lifted the veil for many people to understand the production process
... our first XSL-FO workflow was about "doing things as good as it gets"
... I did not like the idea that PDF is not important and will go away
... we rather took that moment to make things better
... if you compare our PDF produced today to 3 years ago, the difference is amazing, e.g. looking at font usage etc.
... we use CSS modules like paged media, generated content, text, fonts
... paged media is relying on the box model
... there is the content area and at the edges other regions
... you can select a right page, bottom right (which is nested), then a generated piece of content
... including page number, font settings etc.
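
The selection Adam describes (a right page, its bottom-right region, then a generated piece of content with the page number and font settings) can be sketched as follows. This is only a minimal illustration written in Python so that it can be run; it prints a small stylesheet. The `@page :right` selector, the `@bottom-right` margin box and `counter(page)` are taken from the CSS paged media and generated content drafts, which were not stable at the time of the workshop. The function name and the font values are assumptions, not O'Reilly's actual template.

```python
def right_page_footer_css(font_family="serif", font_size="9pt"):
    """Return CSS that puts the page number in the bottom-right margin box
    of every right-hand page. The font values are illustrative assumptions."""
    return (
        "@page :right {\n"
        "  @bottom-right {\n"
        "    content: counter(page);\n"
        f"    font-family: {font_family};\n"
        f"    font-size: {font_size};\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Print the stylesheet so it can be handed to a paged-media formatter.
    print(right_page_footer_css())
```

Whether a given formatter honours such a rule depends on the tool; the minutes mention Antenna House and Prince as implementations.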

glazou: a problem with this presentation and the modules you are mentioning
... the modules are not even stable working drafts
... o'reilly is not a member of the CSS working group and you are the heaviest user of the modules
... without your input & help, we will not make it
... these things are used all over the place
... the things are not "ready to use". There are many problems about the features that you described
... we all know these things exist - but please know that they are unstable
... they are implemented by some vendors - antenna house and prince
... but from a CSS perspective they don't exist

ivan: saying the same in a more positive manner: please come to the WG to do the work!

adam: I searched for vendor extensions and there are not many in our style sheets

adam showing examples of vendor extensions with -ah prefix from typography and images

adam: image placement is a mess in CSS spec but also in the tool(s) - something which we struggle with
... pages are reflowable - on the web the approach is different than in traditional publishing

<liam> [Liam's presentation, for the record, http://www.w3.org/2013/Talks/09-quin-publishing-workshop/ ]

adam: benefits of CSS over xsl-fo:
... "democratization" of style sheet dev
... removes "programmer" from between designer and page
... development was faster for CSS
... benefits of CSS over traditional page layout:
... same content easily can be presented in different ways - like you see it all the time on the web
... o'reilly "animal" book template: 3251 lines of CSS
... for tables, figures, sidebars, ...
... really complex content that we lay out with CSS
... in general we do a template based approach

phil: we are a general publisher
... we have very different business models in publishing
... our authors, editors, ... are very creative
... they don't want to have standards - e.g. what you just showed (the template) is not something that they want to use

adam: absolutely - our approach is only one approach
... you have to consider what the best tool is for your business / authors etc.
... limitations of the CSS workflow:
... there is a dependency on commercial PDF processors for professional quality books
... complex layouts and two-page spreads can be difficult
... we did not design our own engine because it is some very serious engineering
... example of what is currently discussed in w3c
... you want to say: I have a note on the left page like ... , and on the right page like ...
... you cannot do this today
... moving ahead: publishers need to use CSS and provide feedback
... there needs to be support for newer modules: exclusions, regions, grid layout
... template approach is great, but we need to move on and push things to the limits

ivan: in the workflow that you had before - how do you handle aspects like review
... many publishers still use word since this gives them that reviewing functionality

adam: some use word, some use other options I don't have control of
... some use PDF annotations, or versioning control via git
... showing again the docbook based model with various output
... we realized that we were producing 4 different versions of HTML - we did not plan that, it happened organically
... the question came up: why do we need docbook?
... we started thinking about using HTML natively
... big benefits: simplifies the document transformation layer
... aligns our toolset with other things on the web
... lessons learned of docbook:
... most authors don't want to work with XML
... docbook had a valuable community
... a single source content model is valuable for regenerating digital books & easy to adapt to new digital book formats
... so single source publishing model is very important - and we came up with HTMLBook
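
The single source publishing model (one set of markup, many outputs) can be illustrated with a small sketch. It assumes a heavily simplified, DocBook-like source with only `chapter`, `title` and `para` elements, and renders it twice, once as an HTML fragment and once as plain text. The sample content and function names are hypothetical; the workflow described in the talk used XSLT and produced PDF, ePub and Safari Books Online output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified source; real DocBook is far richer.
SOURCE = """<chapter>
  <title>Escape from FrameMaker</title>
  <para>One set of markup, many outputs.</para>
  <para>PDF is just one representation.</para>
</chapter>"""


def to_html(chapter):
    """Render the chapter as an HTML fragment."""
    parts = ["<section>", f"<h1>{escape(chapter.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text or '')}</p>" for p in chapter.findall("para")]
    parts.append("</section>")
    return "\n".join(parts)


def to_text(chapter):
    """Render the same chapter as plain text with an underlined title."""
    title = chapter.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines += [p.text or "" for p in chapter.findall("para")]
    return "\n".join(lines)


if __name__ == "__main__":
    chapter = ET.fromstring(SOURCE)  # parse once, render many times
    print(to_html(chapter))
    print()
    print(to_text(chapter))
```

An editorial change is made once in the source and shows up in every output the next time the renderers run.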

adam reading from readme at https://github.com/oreillymedia/HTMLBook

adam: it is not a standard. It has an XML Schema with it. It is a way of semantically describing publishing in HTML
... do publishers need a schema?
... you may want to write a specific HTML model and handle HTML & CSS for every book
... we don't do that and work with just one model & schema
... docbook is giving us a rich way to describe everything, from foot notes to UI items
... we needed a way to do it: should we just use class or data-* attributes?
... data-* is a wildcard - you can do what you want with them
... problem with class is that authors may want to use it for their own purposes
... some ongoing work - wish people would use it and give us feedback
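
The trade-off between `class` and `data-*` can be made concrete with a short sketch. It assumes the structural role of an element is carried in a `data-type` attribute while `class` stays free for the author's own use. The attribute name, its values and the sample markup are illustrative assumptions; the HTMLBook repository is the place to check the actual vocabulary.

```python
from html.parser import HTMLParser

# Illustrative markup: structural role in data-type (assumed name),
# class left free for the author's own styling hooks.
SAMPLE = """
<section data-type="chapter" class="my-fancy-opener">
  <h1>Down the Cascade</h1>
  <p>Body text<span data-type="footnote">A note.</span></p>
  <div data-type="sidebar" class="tip"><p>Aside.</p></div>
</section>
"""


class RoleCollector(HTMLParser):
    """Collect (tag, data-type, class) for every element carrying a data-type."""

    def __init__(self):
        super().__init__()
        self.roles = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "data-type" in attr_map:
            self.roles.append((tag, attr_map["data-type"], attr_map.get("class")))


def collect_roles(markup):
    """Return the structural roles found in the markup, in document order."""
    parser = RoleCollector()
    parser.feed(markup)
    parser.close()
    return parser.roles


if __name__ == "__main__":
    for tag, role, css_class in collect_roles(SAMPLE):
        print(tag, role, css_class)
```

A toolchain can key its processing on the structural attribute alone, so an author-chosen class name never collides with it.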

jirka: in HTMLBook are you using this directly for ePub?
... or are you modifying markup?

adam: we use HTML directly. When we create ePub we add some metadata, that is it

markus: scripting transformation - how will that be set up?

adam: some XSLT, ruby, python ...

markus: there are many different flavours that many groups are doing now
... e.g. ePub 3.1 we discussed that too - the HTML WG said that data-* is not for cross platform stuff

adam: now on "Atlas" platform
... for authoring on top of HTML that I have described

adam showing authoring interface

adam: there are many HTML editors out now
... they are getting really good

<astearns> Atlas currently only available in private beta?

adam: we modified an editor to use our schema
... an author would never see the schema stuff unless she wants to
... about using git with atlas:
... author clones down book project to local writing environment
... author writes in HTML, markdown, or AsciiDoc and pushes files back to Atlas
... Atlas transforms files to HTMLBook and builds book formats

<Fil> atlas http://chimera.labs.oreilly.com/books/1230000000065

Adam showing a visualization

<Fil> HTMLBook spec https://github.com/oreillymedia/HTMLBook/

adam: github also helps with change handling
... but this depends also on the authors themselves, i.e. whether they use it or not
... one author wrote some javascript and css for an interactive widget
... we let the author embed that in an iframe and that appears in an online version
... but it does not appear in the print version

<Fil> atlas (better url given by @figoblog) http://atlas.labs.oreilly.com/

adam: once you are writing in HTML you are opening up other opportunities

bill: concept of authors writing in HTML
... millions do because they blog
... I am less interested in the concept of wordpress as pagemaker, than wordpress as word

adam: I don't see a problem, that is fine

bbb: as a community we failed by not spreading version control more for people
... outside software developers only the wiki community is doing version control
... and in a limited way
... we should produce people friendly diffs and images
... the movement e.g. of images and review of changes is usually a nightmare
... if we would train people with version control it would be a nightmare

liam: there is a change tracking markup community group at w3c

http://www.w3.org/community/change/

liam: dealing with such topics

Panel: Using the Web as it Stands

Using HTML 5 as it Stands

<bert_> Scribe: Bert

<bert_> [panelists introduce themselves]

<bert_> Robin: HTML in books not a good idea.

<bert_> ... Leads to debugging, problems, brittle. HTML DOM and XML DOM differences.

<bert_> Scribe: Karen

Robin: We can actually use both together
... Idea is to agree as community to use HTML in published, final formats
... and integrate fully into Web
... Move on to next presenters unless there are questions

Next Speaker: Adam Hyde

Adam: I'm Adam Hyde, you can find me at adamhyde.net
... I have been involved in book publishing
... using community book publishing
... Book Sprint methodology to print books in 3-5 days

<liam> [for FLOSS, free/libre/open source software documentation]

Adam: in ePub, Mobi...
... rely on fast technologies
... the browser is solution
... should be the production and design environment
... and also the renderer to create PDF using HTML
... The books for methodology will be online shortly
... Books for methodology relies on output quickly for use
... have been using open source technologies for the rendering
... rely on open source
... looked at @ and pisa (sp?)
... used PDF for a while
... then used CSS
... render PDF to render content in the browser
... with tables of contents
... relied on page generated content model
... relied on JS, found @
... create content in HTML
... push button and get content formatted
... right click PDF and get 1:1
... printing press in terms of creating print is click JS
... open source
... have a lot of advantages
... get ePub as a gif
... anything you can see in browser you can see in PDF
... can make interactive presentations
... and correlate to online
... using browser itself
... use to solve book problems

[slide with list]

Adam: Even JS takes algorithms
... it's all available right now with JS
... have been doing a year
... Last point is I don't know where publishers are right now
... If Gutenberg had put everything online
... eventually publishers would invest in this
... publishers should be contributing to browser development and contributing
... a little about me
... thank you

Next Speaker: Tony Graham, Mentea

Tony: Liam said the title of this panel is deliberately open-ended
... I put as a target format
... a couple days ago Wikipedia had banner about fifth most popular site
... ASCII doc with markdown, etc.
... part of this is HTML5 converted to HTML5
... I was at Balisage presentation
... where Sanders @ talked about HTMLBook from O'Reilly
... he stressed the XSLT side, not the Ruby side of it
... What I tend to see..
... archive format...
... XSLT could be used to produce static HTML5 pages
... there is also option to do XSLT2.0 in the browser
... just loads
... and do things on the web page
... @ has a demo showing how to play chess
... Example showing what it can do entirely in XSLT
... respond to user events
... Other things I see for HTML5: a validator
... things to do with actual data
... perhaps O'Reilly is not doing journals, but can validate parts of the content; and ISO standards for scientific journal
... journal publishers are worried about such things
... Of course it can be styled
... I transformed it
... Elizabethan Ruby
... it illustrates part of problem we have with HTML
... it lost some of it; had to modify the transform
... modify the structure
... to do more with what I needed: HTML5 was not enough to do the styling

Next Speaker: Gerry Leonidas

Gerry: I come from a university with a long tradition in typography
... most of technologists and people from publishing organizations
... have nothing to do with Gutenberg
... typography is mostly 19th century
... and extends into 20th
... book and journal typography has not changed that much
... have to look to graphic design
... I don't see many people here whose job it is to translate this into language that technologists and publishers understand
... a problem as you open up tools
... don't ask people what they want idea
... luxury of working at university
... we said let's not do what O'Reilly does
... let's find the most difficult problems to fix
... Left is classical Greek lexicon
... written in XML and we styled output for precise typography
... has loaded meanings
... On the other hand, we have a Henry V edition of Shakespeare
... different levels of annotation and mark-up
... you can rely on audience to parse this
... and understand which is text and which is annotation
... what is missing is the model for typesetting
... we have tried to do this online
... with Typecast, a Monotype company now
... You can do it
... problem is that this is not responsive
... difficult for text books
... we tried to figure out these types of problems
... sequencing and @...hierarchies
... break it out to these things and a number of levels
... the sticking point
... we can develop a model that both can understand is the authoring environment
... Word was not invented to produce complex documents
... and yet it was produced
... equivalent is WordPress
... linear structure; not good to do Henry V
... Markdown...good for simple hierarchies
... simple tools
... We have done a generalized model
... that gives one of nine levels of priority in each element and a sequence in a chain
... focusing on a paragraph; in P level
... working with Typecast
... as simple proofing environment; looks extremely simple
... I'll stop there

Next Speaker: Philippe Riviere

Philippe: I am a journalist and a technologist
... I was doing web site and journalism
... I don't like to call it a content management system
... but I wrote this with friends and we use it internally
... we take exports from Quark
... and port them into an SQL database to create scripts
... from this to publish ebooks
... take the HTML pages and prepare a book with them
... First we tried to make a mobile app
... then we realized that our work was not advancing software
... had bugs on every platform
... more about content
... so we went to ebooks
... the challenge was to respect the news hierarchy
... journal is not just a collection of articles
... there is an information hierarchy
... front news, news sections
... all of this is disappearing on web site
... I wanted to record the structure in a mobile app and ebooks
... and still go with HTML, CSS
... one page is table of contents; series of links
... two chapters
... chapter can be an article or a service page
... We list the chapters and a script goes to fetch each chapter
... Inside each chapter is CMS
... just knows the author of the pages
... @ was poorly documented
... we lost a lot of time figuring out these things; special files
... This is our result
... system on iPad
... we have nice typography which is the biggest challenge with ebooks
... we tried to do our best here
... We just recently produced an ebook
... in our archives
... go from May 1954
... we can now publish ebooks
... very simple thing
... thank you
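
The approach Philippe describes, where one page is a table of contents made of links and a script goes to fetch each chapter, can be sketched as below. This is a minimal illustration under stated assumptions: the TOC markup, the URLs and the `build_book` helper are hypothetical, and the `fetch` callable stands in for whatever retrieves a chapter from the CMS.

```python
from html.parser import HTMLParser

# Hypothetical table-of-contents page; the URLs are placeholders.
TOC_PAGE = """
<nav>
  <a href="chapters/front-page.html">Front page</a>
  <a href="chapters/world.html">World</a>
  <a href="chapters/services.html">Service page</a>
</nav>
"""


class TocParser(HTMLParser):
    """Collect the href of every link, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def build_book(toc_html, fetch):
    """Fetch each chapter listed in the TOC, preserving the TOC order.

    `fetch` is any callable mapping a URL to chapter HTML, for example an
    HTTP client in production or a dict lookup in a test.
    """
    parser = TocParser()
    parser.feed(toc_html)
    parser.close()
    return [(href, fetch(href)) for href in parser.hrefs]


if __name__ == "__main__":
    fake_site = {
        "chapters/front-page.html": "<h1>Front page</h1>",
        "chapters/world.html": "<h1>World</h1>",
        "chapters/services.html": "<h1>Service page</h1>",
    }
    for href, body in build_book(TOC_PAGE, fake_site.__getitem__):
        print(href, "->", body)
```

Because the chapter order comes from the TOC page itself, the information hierarchy of the issue (front news first, then sections) is carried into the ebook.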

Robin: Thank you for all those presentations
... are there any questions in the room?

Emily Gibson: Do publishers know how and what technologies
... I teach a strategic course in London
... web technologies are basis for book publishing technologies
... they don't know what HTML is, what a mark-up language is
... these are people involved in strategy at publishing houses

Nic: that is what I have found
... same situation
... what I find frustrating is that you see expensive work-arounds in tool sets
... if they would put a toe in, would be great
... it's not so tricky
... I am hoping we can convince some publishers

Emily: have to start from first principles every time; has been ten years

Robin: a decade not so much in technology

@: Practical thing

Bill Kasdorf, Apex: I am sensitive to everyone in organization to 'get this'
... start talking first about vocabulary
... what do you call them, what are the pieces
... then people who know mark-up can translate it
... but if you start with mark-up you get deer in headlights reaction
... I worked with a large client with diverse publications
... first reaction was there is no way we can get XML
... we have these and these and other things
... Tell me the parts
... we took A, then took B, then C, completely different
... semantically they are the same things
... they cannot see that semantically these are the same
... then they understand separation of presentation and content

Daniel: problems of HTML based tool chain
... is that there is no WYSIWYG for the masses
... write docs, submit to publishers
... format data as if in Word
... but not care about the technologies inside
... a writer should not have to care about HTML, epub
... we don't have a tool yet for the masses

Adam Hyde: I would disagree with that
... If you look at GoogleDocs
... example of that
... I am not advocating that
... GoogleDocs is a solution within W3C

Adam: look at demos, they are amazing
... one of things liberating it
... has opened up opportunities

Daniel: Maybe we are living in too geeky an environment
... ask them about GoogleDocs

Leonidas: I think there are tools
... I showed one
... people are thinking about things
... take a step back
... content production used to be a specialist
... and no relation for how things exists
... very short period
... to produce content at proofing stage to mimic how things look
... we have not decided minimum level of what people need to know
... like what is bare minimum of what people need to know to drive a car
... A lot of people doing this are more my age
... with other references
... new generation needs to be trained to think about what needs to be visible

UNKNOWN_SPEAKER: I think there is a big problem with GoogleDocs because it answers the problem of the previous technology
... why would GoogleDocs have a page break?
... why should it print?

Kathy Fletcher: Two comments; translating structures into things that mean something is important

<glazou> who's speaking ?

Kathy Fletcher: have to get back to their vocabulary
... we worked with K12 teachers in South Africa
... used an editor that was close to WYSIWYG
... they could cut and paste the note, a whole exercise
... translation from Word or @ was affordable

Pierre Thierry: Concept of WYSIWYG
... they see something that looks like what they want at the end; but it masks the semantics
... should have 'what you see is what you mean'
... need something that shows the semantics
... may look bad but is easy to use and understand

Bill: one addition
... in production workflows
... people do have that semantic rendering
... use false colored rendering to see what is not in print; need to move that upstream

Alan Stearns, Adobe: I edit the CSS Regions spec
... glad to hear bookJS has worked for you
... having to go through JS library development is annoying
... as you use library you will find out what you will need...from the technologies in the browser

Adam: I would like to respond
... CSS regions was amazing presentation
... change on fly
... it's an awesome implementation
... and we have learned a lot about book production by going down dead ends
... talk to people about the possibilities
... people outside publishing are gaining
... thanks for CSS Regions

@: I confirm that our authors know

<kathi-fletcher> what does gcpm stand for?

scribe: tools do not transcript the structure

<glazou> Generated Content for Paged Media

scribe: H1, end of the paragraph itself
... but introduce a high level hierarchy
... and have it at beginning of the next H1
... if we had a tool to capture this structure would be useful

Robin: sort of contentEditable but for structure, not mark-up

Dave Cramer: I don't think we have a well established vocabulary for these elements

<liam> [wanting something like content-editable but for structure as well as for content]

Dave Cramer: in a novel there may be a blank line
... a few firms
... use differently across boundaries and publishing houses
... have not been made universal enough

Todd Carpenter, NISO: Bill, a question for you
... How many times have we tried to develop that semantic language

Bill: 3752
... that does not mean it's futile
... different interest groups have more in common
... you kind of need to start there
... I work mostly with book and journal people
... anyone in the magazine world knows what a deck is
... is also useful for books

Robin: information about lunch?

Liam: Lunch is "thataway"
... Come back in one hour and 23 minutes or sooner
... Take your laptops, mobile phones
... there is a sign up sheet for dinner venues

Robin: thanks to the panelists

Panel: Web technologies, authoring and Workflow

<Em> Scribing on iPad

<Em> Yes

<fsasaki> scribe: Em

Afternoon session first up is ...

Jirka Kosek speaking about ITS 2.0

ITS 2.0 - metadata annotations

Jirka: 30-40-50 languages automated metadata language translation

ITS 2.0 makes the translation task easier and more effective

Originally developed for XML, but the ITS 2.0 specification can also be used for HTML5

ITS namespace attributes mark, e.g., "foie gras" in the middle of English text as something that should not be translated

As HTML5 does not have support for namespaces, a different approach needs to be taken
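
A minimal sketch of the "foie gras" case in HTML5, assuming the standard HTML5 translate attribute is used instead of a namespaced attribute. The function name translatable_text and the sample sentence are illustrative only and were not shown in the session; a real translation tool would do considerably more.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _TranslatableText(HTMLParser):
    """Collect text runs that are not inside an element marked translate="no"."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one flag per open element: is its content translatable?
        self.chunks = []  # translatable text runs, in document order

    def _current(self):
        return self.stack[-1] if self.stack else True

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        value = dict(attrs).get("translate")
        if value == "no":
            flag = False
        elif value in ("yes", ""):
            flag = True
        else:
            flag = self._current()  # no usable value: inherit from the parent
        self.stack.append(flag)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self._current() and data.strip():
            self.chunks.append(data.strip())


def translatable_text(html):
    """Hypothetical helper: return the text a translator should see."""
    parser = _TranslatableText()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = '<p>We ordered <span translate="no">foie gras</span> as a starter.</p>'
print(translatable_text(sample))  # ['We ordered', 'as a starter.']
```

The flag is inherited from the parent element, so a nested translate="yes" can switch translation back on inside a protected region.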

Problems to be addressed

1) ITS is not just inline markup, and CSS selectors cannot replace XPath, e.g. because only with XPath can you address attributes

<fsasaki> [ FYI, the w3c validator has been updated to allow its-* validation, see a validation example here http://validator.w3.org/check?uri=http://www.w3.org/TR/its20/examples/html5/EX-term-html5-local-1.html ]

2) HTML5 extensibility for additional metadata

3) HTML5 cannot embed XML as additional metadata

One proposed solution is to use JSON

<fsasaki> [see a "data island" example here http://www.w3.org/TR/its20/#EX-locQualityIssue-html5-local-2 ]

Workflow problems:

1) export/import to CMS - no existing standards to describe a complex export/import scenario including translation workflow.

Question: I do not understand point 1. Answer: CSS selectors cannot address attributes, only elements

You can select elements with some attributes attached to them but you cannot select attributes themselves

<liam> [ you can select an element having a title attribute, but you can only select elements in CSS, not attributes ]

You cannot select the attribute node, you can only select the elements or parts of the elements

<fsasaki> [xpath example assuming html default namespace: //img/@alt . Scenario is to attach information e.g. that "alt" needs some special handling during translation]
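
The difference can be sketched with Python's standard library, under the assumption of a small XHTML sample invented here. ElementTree's XPath subset has the same limitation as CSS selectors: it can return only elements, so the attribute targeted by //img/@alt has to be emulated as an (element, attribute name, value) triple. The helper name alt_attribute_targets is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>Logo: <img src="logo.png" alt="Company logo"/></p>'
    '<p><img src="spacer.gif"/></p>'
    '</body></html>'
)


def alt_attribute_targets(root):
    """Emulate //img/@alt: return (element, attribute name, value) triples."""
    ns = {"h": XHTML}
    # './/h:img[@alt]' selects img ELEMENTS that carry an alt attribute,
    # which is also all that the CSS selector img[alt] can express.
    images = root.findall(".//h:img[@alt]", ns)
    # The attribute itself then has to be addressed by name on each element.
    return [(img, "alt", img.get("alt")) for img in images]


for element, name, value in alt_attribute_targets(doc):
    print(name, "=", value)  # alt = Company logo
```

With a full XPath engine the expression //img/@alt would return the attribute nodes directly, which is what lets a rule attach translation information to the attribute rather than to the whole element.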

Q: what is the problem with using XPath and CSS...

Next presentation: Tomas, 4D Concept

Experts in documentation and engineering towards ideal online XML editor for print, digital and other streams

Newspapers, magazines, all kinds of publications

Industry as well as publishing

Different kinds of XML inputs are used including OpenOffice, Word, HTML, etc

It wasn't as comfortable as we had imagined so we created our own authoring tool for XML

Using XSLT for all kinds of output

First we tried xmax - very efficient, but compatibility issues with Internet Explorer and other tools

Then we tried CK but we had to rebuild the tool for every DTD every customer provided us

So we built a tool that is very good for simple content, XMS author

It is XML with pretty CSS on it so that authors with no interest in XML can use it

It is connected to a database where authors can pick up images or other bits of text

It is developed around oXygen with some widgets to align with oXygen components and a Word-like toolbar

XSLT, CSS2+, MathML, web services like SVG etc

[example of output printed file - a colour page with image and recipe]

Close - description of codex reader (EPUB3 reader for iOS and Android platforms, with fixed layout or reflowable layout)

Next - Nic Gibson who has written some HTML that is not portable so has to display from own laptop

Corbas works for publishers workflow plus XML hackery

One of the things we can say is that simple text is pretty much a solved problem

But there are some really big problems with complex text, e.g. legal, educational

<Dave_Cramer> There are no simple texts ;)

Too difficult for HTML markup but XML is required for certain more difficult texts

The difference is that XML is required for structured content with specialised tools

Unfortunately this is not used by authors

Authors write totally linear which makes XML authoring fail in almost every general case

We are requiring that authors use XML because we can process it

MS Word is a truly awful product but authors like it for the very reasons that we dislike it - they can simply make something bold

We are talking about a very small subset here but one that will grow - currently only pop

35% of students read digitally

The challenge is complex text

We need to make CSS work for these challenging texts

We need to allow authors to write linearly and then get the structure right (footnotes, sidebars, etc)

We need to think about the things that authors need - we need to let them use just enough structure when writing and not disrupt the flow

Make authors comfortable while writing

The next challenge is editing, where the concept of final proofs is very late in the process

As soon as the layout needs to be done by hand the XML system breaks

Comment: from publishers association, is this about print or digital layout too?

Answer: this is about both or it won't be cost effective for publishers

Right now converting to digital post print version isn't working very well

Problem: publishers are not involved, there are only two publishers involved right now
... there is only a small subset of people interested in this and they will not join the w3c

Publishers are outsourcing often to people who are in this room

Q: what is your strategy for making publishers understand?

A: we point out the risks of not understanding, e.g. your books being taken down from Amazon

Also that you can create this yourself and save money if you start in the right place

Q: in W3C the point is that up until a couple of months ago this was an internal W3C discussion

The publishers have joined recently and there are more coming on board

Comment: Hachette problem is training, education, etc and that is a process, there is a cost attached and it is a slow evolution

Karen Myers: W3C role is to do outreach - TV and media were at this point three years ago

They are looking for training programmes for publishing communities

fjh: Q is there a shift to have authors do more where publishers used to do layout etc

A: publishers used to outsource things they understood; now they outsource things because they do not understand them

Not necessarily towards authors but away from publishers

Comment about what authors are excited about, e.g. drag and drop bibliographies

Comment about usability and accessibility - needing to get it right

Comment: accessibility benefits everyone, e.g. separating content and structure helps the author understand the structure better too

Comment there are no outsourced companies suggesting HTML composition to our publishing group because the quality is not good enough and we have that in print

Comment that we have a lot of problems with semantic understanding and integration of RDF and OWL, different content on different platforms

A: Publishers struggle with XML; they will not have any knowledge of RDF and OWL

Q: if Word is such a low bar, why are we trying to create new skins for Word?

Word is not very extensible but authors still have Word at their disposal so we are dependent on Word as the original content editor

Word can be extended as an editing environment by exporting

Approach of "carrots not sticks" from authors all the way through to production staff at O'Reilly

E.g. authors can publish instantly, including errata, or can collaborate in real time with co-authors: sorry for losing your Word, but you do get these other things instead

Build a platform that will give authors a one to one representation of what they write to what they are selling

Authors become self correcting

Final point - we need to extend CSS for composition

We should be able to do the composition with declarative controls

We would like use cases and feedback to the steering group

<applause>

(Aside: I would not recommend scribing on an iPad - autocorrect is a nightmare!)

<Jirka> scribe: Jirka

Panel: Standards Bodies: Who does what? (moderator: Ivan Herman, W3C)

Ivan Herman: introduces panel

Christina Mussinelli:

standards are important also for book distribution

information and product flows

Standards in publishing are listed

Christina explains ISBN

Actionable ISBN

Bill Kasdorf from Apex

Bill speaks about aligning standards

EPUB3 is based on HTML5

But HTML5 is not final standard yet, but EPUB3 needs to b

EPUB3 uses approach where open things are left to HTML5

Aligning magazines with EPUB3

Mentions nextPub, PSV (PRISM Source Vocabulary) and OpenEFT (Enhanced for Tablet)

Overlapping Organizations

There are several different organizations working on issues related to EPUB, metadata, accessibility

A lot of metadata is defined in schema.org vocabularies

In future Pearson will use semantic HTML5 from authoring to production

Presentation is handled by XSLT + CSS

Output will be EPUB3

Bill Wagner, Printer Working Group: Introduced PWG

PWG makes standards in the printer area: e.g. IEEE 1284, IPP, XHTML/CSS-Print, ...

Areas that need improvement: page rendering and job ticketing

Proposal for job tickets in CSS and XSL-FO

<Luc> it's all about printing

<Luc> and printer also use new technologies

Drafts are available at http://www.pwg.org

Question from Mohamed Zergaoui, Innovimax:

scribe: What is supposed to be printed (PDF, HTML, ...)?

Bill: Input is out of scope, PWG solves print production - like binding, duplex, ...

Q from Daniel Glazman, Disruptive Innovations: CSS rules and overrides can always override print job ticket in a default ticket. So restrictions will not work.

Bill: In many scenarios users can't provide a user stylesheet (e.g. in a print kiosk).

Dave Cramer: We do single source HTML publishing to many output formats. For us it's natural to put such info into CSS.

Adam Hyde: Is there a plan to extend it to non-hard-copy formats?

Todd Carpenter, NISO

Members are publishers, SW industry and libraries

A lot of "intelligence" is lost from XML when transformed to HTML

Long term preservation of documents requires standards for long-term suitable formats

There are many different organizations that overlap but have slightly different needs

But W3C membership doesn't represent print publishing very well

References http://xkcd.com/927/

People involved in standards should talk to each other more

<glazou> +1 to what Ivan says

Ivan: W3C doesn't want to develop any new standards in a publishing area

But IDPF and others depend on W3C technologies and publishers are underrepresented in W3C

W3C wants to set up bridges, so requirements are reflected in Web standards

Q from Daniel Glazman: 95% of EPUB are from W3C, rendering is done in browsers, publishers should join W3C, otherwise they will not have influence on technologies they depend on

Markus Gylling, IDPF: IDPF hopes to use W3C power to overcome vendor lock-in in the area of readers.

Ted O'Connor: Longevity of the web will be longer than that of any other organization. So web formats are suitable for long-term archiving.

Todd: There is much more structural intelligence in the special formats than in web based distribution formats

Robin Berjon, W3C: I see two misconceptions. HTML is not only for rendering, it's also structured storage, which can be extended if there is something missing

HTML will last long as it has many implementations.

Mohamed: PDF formats for archiving are just 10 years old. Why should HTML based ones be developed faster? Archiving has many solutions but none is perfect. HTML can solve many of them.

Ivan: We should probably hold a similar event with the archiving industry.

Bill: Currently many users author in JATS, DocBook or TEI but in final HTML a lot of metadata is thrown away.

Elizabeth: We need markup for footnotes, would be nice to standardize how to mark them up in HTML.

<fantasai_> It's also better for CSS, I think, if the footnote and context are marked up together!

Christina: Many publishers are not experienced with digital workflow.

<fantasai_> e.g. <p>some sentence <aside>Footnote content</aside></p> would be great

<aside role="footnote">?

<fantasai_> or <iaside>, maybe, for "inline-aside".

<fantasai_> footnote vs. endnote vs. sidenote vs. popup is presentational

<fantasai_> Japanese even has end-of-paragraph note :)

bert_ you can present footnote as a pop-up as well

Liam wraps up

<fantasai_> Bert, I don't think so! I think a footnote is an aside, a parenthetical of sorts. It is inline in the document

<fantasai_> Bert, it's *presented* as if it were a link
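
A rough sketch of the point that footnote versus endnote is presentational, assuming the inline <aside> markup suggested above. The helper as_endnotes is hypothetical; it pulls each inline note out of the running text and leaves a numbered marker, so the same source could equally be rendered as footnotes, sidenotes or pop-ups. Python's html.parser is only a tokenizer and does not apply HTML5 tree-construction rules, so the nesting is taken exactly as written.

```python
from html.parser import HTMLParser


class _NoteSplitter(HTMLParser):
    """Separate inline <aside> notes from the running text."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside <aside>
        self.body = []   # running text with note markers
        self.notes = []  # note texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "aside":
            self.depth += 1
            if self.depth == 1:
                # Outermost aside: open a new note and leave a marker behind.
                self.notes.append("")
                self.body.append(f"[{len(self.notes)}]")

    def handle_endtag(self, tag):
        if tag == "aside" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.notes[-1] += data
        else:
            self.body.append(data)


def as_endnotes(html):
    """Hypothetical helper: return (text with markers, list of note texts)."""
    parser = _NoteSplitter()
    parser.feed(html)
    parser.close()
    return "".join(parser.body), parser.notes


body, notes = as_endnotes("<p>some sentence <aside>Footnote content</aside></p>")
print(body)   # some sentence [1]
print(notes)  # ['Footnote content']
```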

Tomorrow we will start at 9