See also: IRC log
<fsasaki> liam: thank you every one for coming, Peter Lins and I will co-chair today
<fsasaki> liam going through logistics
<Fil> hi
<glazou> ScribeNick glazou
<fsasaki> scribe: fsasaki
<glazou> fsasaki: ok
liam: anybody who has not been to a w3c meeting before?
(many hands going up)
liam: great. one aspect of w3c
workshops: they are industry consultation events
... we get together to hammer out a solution to a problem - we
(W3C) are listening to you
... we have arranged some speakers, but please: interrupt us.
Our goal is to figure out the answer to a question:
... what do you need to do commercial publishing using the open
web platform?
... the only way we can fix the web is if you tell us what's
broken
... would be a great outcome to hear "good what you can already
do today"
... another great outcome would be: hear what should be done
for the future
... any questions so far?
hash tag is #owpworkflow
liam: citing "The printed word is
not dead - it only looks that way because it doesn't move"
(2001)
... Frank Romano, at Seybold 2001
... the open web platform
... see also http://www.webplatform.org/
site for more info. A collection of technologies and standards
from W3C, IETF, ECMA, Unicode and others
... name is coming from Jeff Jaffe (W3C Ceo)
... includes HTML, CSS, SVG, MathML, JavaScript, ...
... computer software is moving to the Web
... generic desktop computers are losing ground. the owp can do
graphics, types etc.
... wordpress is the new pagemaker
... so what is publishing? has anyone not read a physical book
in the last year?
(2 hands going up)
liam: by "publishing" at this
workshop I mean professional publishing
... including words that are digital or not
... "self publishing" is not included in this - of course this
is important too, but not in scope here
... one aspect for digital publishing is: how to convey that a
document can be trusted?
... major issue is the incompatibility of reader devices across
platforms
(many hands go up showing that this is really a problem)
liam: use of draft or unstable
features are another problem for implementations
... same for formatting limitations
"CSS does not - eBook readers don't" - comment from the audience
<Dave_Cramer> Some of the "new features" in CSS that don't work reliably across all ebook reading systems include margin-top :)
liam listing gaps - in addition to the above metadata fragmentation, prepress and finishing not handled
liam showing a formatting example
liam: things hard to do today at
the web: footnotes
... hyphenation, multiple streams of footnotes, crop / bleed /
spreads / binding / finishing
... respond to needs of various distribution and marketing
channels
... topic of metadata
... libraries are asked: what books do I have?
... but now also: who wrote the webpage?
... we don't have marc records on the web
... we have proprietary metadata that publishers have to
use
... but different book publishing channels use different
metadata
... so metadata is an important topic, but we are likely to
dive into that at a separate future workshop
... so how can we address these issues?
emily: went to a conference for independent publishers
<glazou> fsasaki: ask Liam?-)
emily: publishers are worried
about independent publishers
... anybody can do it
<astearns> emily form corvus?
indeed: )
liam: publishers are in the business of curation
<astearns> just what I heard - probably slightly wrong on name and company
liam: values that a publisher brings is: brand, ...
<newt> 'emily from corbas'
liam: publishing is about curating, finding authors, quality, ...
adam: I see publishing as a
business model
... we are trying to work out how to produce knowledge and
cultural items
... even when we talk about authors this is very "loaded"
<Dave_Cramer> Even some NY Times Bestsellers come from publishers other than the big 5.
adam: this is not the only way to
produce quality
... every time a word like publishing, author, book comes up
there is a special aura about that
... for me these are legacy terms we need to challenge a
bit
liam: there are two conversations:
what is publishing vs. what is a book
... what is publishing will come up a lot
... don't think it is productive for publishers to think that
they are in the printing business
... now about w3c in general
... a technical standards organization responsible for the
world wide web
... we are trying to fix problems in the web
... we are a member funded consortium
... companies pay to join and then can participate in working
groups
... w3c is about community and conversations
... a group of related technical work items and interconnected
working groups
... like XML, TV on the Web, publishing, ...
... some areas are cross area relevant, like
internationalization or accessibility
... example of internationalization needs to add markup to
strings e.g. for ruby annotation
... or accessibility needs for people with various
disabilities
... publishing activity lead is Ivan Herman
... we have liaisons with the IDPF, Markus Gylling
... others are coming up
<Fil> (a slide in small red type, about accessibility (just sayin'))
liam: membership includes
publishers, tool vendors, users etc.
... involvement of users for creating specs is important
... we are doing a series of workshops on publishing
... this one is focusing on workflow issues, from authors to
producing physical or eBooks
... the idea that you can use the same file to produce an eBook
on the web or print
... we are to identify the needs and barriers for the workflow
topic
... aim of the workshop is not to solve the issues. but rather
to understand how to solve things and what are the right people
to solve them
... the w3c process says: a spec will not become a final
standard ("Recommendation") unless all comments are
addressed
... our process ensures consensus and produces open
standards
... = a royalty free platform
... reaching also out to specific communities
... what you can do? join w3c, join the digital publishing
interest group - if you don't join w3c then you can make
comments from the outside
... some things that many things in publishing have in
common
... most of technical innovations in publishing come from a
meeting:
... between someone who has a need (a publisher) and a
technologist
liam telling the story of gutenberg inventing book printing - showing the value of collaboration between publishing needs and technologists
liam: technologists have to
listen to publishers - and publishers should say "it would be
cool if ..."
... domain experts need to communicate
speaker from univ. of reading
aaa: would be good to have
concrete ideas about what is not written in standards, but what
is best practices
... people that have a position in publishing houses
... there is a lot of work to do to translate things from print
publishing world into other areas
<glazou> speaker was Gerry Leonidas, University of Reading
liam: in w3c we have a document
about japanese layout requirements
... there are no requirements documents yet for other
languages
[info about that: such documents are currently being created]
liam: an issue was that for some time that knowledge was not known
<inserted> Audrain (Hachette Livre): 99% of content is made offline.
liam: workflow in publishing in an ideal world: there is one file, converted to PDF, sent to the printing company, and then to eBook people to make an eBook
<glazou> speaker was Luc Audrain, Hachette Livre
liam: you don't want editorial
changes to happen in either of the publications
... people are using XML to create eBooks
... 40% of publishers are using XML in some point in their
workflow
... if you take your wordfile and convert it to XML
... you have done proofreading etc.
... that is one workflow
... "XML late" means: you have done pagination etc.
... and then convert to XML. That is not so good but it saves
money in the short term
liam: XML late people would like to do XML early
bill: about "unspoken rules" and
cultural knowledge
... when we look at chapter of a book, titles, different type
sizes
... we know by convention of typography what things are
... behind these are markup
... we think that the presentation is the structure, but it is
not
... there is nothing "given" in this
... another example: in the mid 90s there was a workshop about
web based publishing
... the web was brand new
... one student had a publication with certain elements in blue
and underscored
... the convention that these are links was not stable
yet
... so anybody that already had used the web were not used to
that
... now reading the new york times digital today: links are in
grey, important terms are in blue (but without
underlining)
... these examples show that these conventions evolve over
time
<glazou> so why isn't O'Reilly a member of the CSS WG?
adam: my main area is tools and
software dev side
... a story in four chapters: 1 escape from framemaker 2 down
the cascade 3 dawn of HTMLBook 4 atlas never shrugs
... approach that I describe is just one
adam: framemaker is a tool for
creating content like indesign
... docbook xml is a standard for technical documentation
... today it is version 5.1 - we are using 5.0 internally
... this was in 2006, before anything on the digital book side
was happening
... in 2007 ePub became an IDPF standard - so above was a pre
eBook standard
... so why did we move from framemaker to docbook?
... we had safari books online - 25% of our authors said I will
write my book in docbook
... so we took docbook into framemaker and tried to work with
it - that was insane
... also books needed heavy unicode support - framemaker at
that time was not good at that, XML is
... so our workflow then was DocBook XML > XSL-FO > PDF.
The step from XML to FO used XSLT
... XSL-FO is like HTML and CSS put together. very difficult to
read - more for machines to process
... the step FO > PDF used antenna house (just one choice
possible here)
... what did we learn? First, book is not equal to PDF
... PDF was just one representation of the XML
... this lead to the single source publishing model: from one
set of markup we produce many outputs. This was around
2007
... from XML content to Safari books online, PDF, ePub,
...
... single source publishing was very successful
... cost saving was great
... we were able to pull this off since we had a standard and
technical authors already using that
... so we had "XML early, XML first"
... we had authors writing in word etc. - we converted that to
XML docbook
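The single source publishing model described above (one set of markup, many outputs) can be sketched in a few lines. This is only an illustration under stated assumptions: the element names are a heavily simplified, hypothetical DocBook-like vocabulary, and the two renderers stand in for the XSLT transformations mentioned in the talk rather than reproducing them.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified DocBook-like source; not O'Reilly's actual markup.
SOURCE = """<chapter>
  <title>Escape from FrameMaker</title>
  <para>A book is not equal to a PDF.</para>
  <para>PDF is one representation of the XML.</para>
</chapter>"""


def to_html(chapter: ET.Element) -> str:
    """Render the chapter as a small HTML fragment."""
    parts = [f"<h1>{escape(chapter.findtext('title', default=''))}</h1>"]
    parts += [f"<p>{escape(p.text or '')}</p>" for p in chapter.findall("para")]
    return "\n".join(parts)


def to_plain_text(chapter: ET.Element) -> str:
    """Render the same chapter as plain text with an upper-case title."""
    lines = [chapter.findtext("title", default="").upper(), ""]
    lines += [p.text or "" for p in chapter.findall("para")]
    return "\n".join(lines)


# One parsed source tree feeds every output format.
chapter = ET.fromstring(SOURCE)
html_output = to_html(chapter)
text_output = to_plain_text(chapter)
print(html_output)
print(text_output)
```

Adding a further output format means adding another renderer over the same tree; the source markup itself is left untouched.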
... now chapter 2 - a switch from XSL-FO to CSS for page
layout
... reason was various events: ePub 3 came up, O'Reilly loses
lead XSL-FO developer.
... docbook had xslt stylesheets to create ePub output
... and antenna house 6 supported CSS
... also, "by accident" we got HTML5 (as ePub3 format)
... all above "events" happened in a few months
... so we started doing workflows like: XML > XSLT >
HTML5+CSS > PDF (via Antenna House)
... this was hugely successful in the production group
... it lifted the bail for many people to understand the
production process
... our first XSL-FO workflow was about "doing things as good as
it gets"
... I did not like the idea that PDF is not important and will
go away
... we rather took that moment to make things better
... if you compare our PDF produced today to 3 years ago, the
difference is amazing, e.g. looking at font usage etc.
... we use CSS modules like paged media, generated content,
text, fonts
... paged media is relying on the box model
... there is the content and in the edge other regions
... you can select a right page, bottom right (which is
nested), then a generated piece of content
... including page number, font settings etc.
glazou: a problem with this
presentation and the modules you are mentioning
... the modules are not even stable working drafts
... o'reilly is not member of the CSS working group and you are
the heaviest user of the modules
... without your input & help, we will not make it
... these things are used all over the place
... the things are not "ready to use". There are many problems
about the features that you described
... we all know these things exist - but please know that they
are unstable
... they are implemented by some vendors - antenna house and
prince
... but from a CSS perspective they don't exist
ivan: saying the same in a more positive manner: please come to the WG to do the work!
adam: I searched for vendor extensions and there are not many in our style sheets
adam showing examples of vendor extensions with -ah prefix from typography and images
adam: image placement is a mess
in CSS spec but also in the tool(s) - something which we
struggle with
... pages are reflowable - on the web the approach is different
than in traditional publishing
<liam> [Liam's presentation, for the record, http://www.w3.org/2013/Talks/09-quin-publishing-workshop/ ]
adam: benefits of CSS over
xsl-fo:
... "democratization" of style sheet dev
... removes "programmer" from between designer and page
... development was faster for CSS
... benefits of CSS over traditional page layout:
... same content easily can be presented in different ways -
like you see it all the time on the web
... o'reilly "animal" book template: 3251 lines of CSS
... for tables, figures, sidebars, ...
... really complex content that we lay out with CSS
... in general we do a template based approach
phil: we are a general
publisher
... we have very different business models in publishing
... our authors, editors, ... are very creative
... they don't want to have standards - e.g. what you just
showed (the template) is not something that they want to
use
adam: absolutely - our approach
is only one approach
... you have to consider what the best tool is for your
business / authors etc.
... limitations of the CSS workflow:
... there is a dependency on commercial PDF processors for
professional quality books
... complex layouts and two-page spreads can be difficult
... we did not design our own engine because it is some very
serious engineering
... example of what is currently discussed in w3c
... you want to say: I have a note on the left page like ... ,
and on the right page like ...
... you cannot do this today
... moving ahead: publishers need to use CSS and provide
feedback
... there needs to be support for newer modules: exclusion,
regions, grid layout
... template approach is great, but we need to move on and push
things to the limits
ivan: in the workflow that you
had before - how do you handle aspects like review
... many publishers still use word since this gives them that
reviewing functionality
adam: some use word, some use
other options I don't have control of
... some use PDF annotations, or versioning control via
git
... showing again the docbook based model with various
output
... we realized that we were producing 4 different versions of
HTML - we did not plan that, it happened organically
... the question came up: why do we need docbook?
... we started thinking about using HTML natively
... big benefits: simplifies the document transformation
layer
... aligns our toolset with other things on the web
... lessons learned of docbook:
... most authors don't want to work with XML
... docbook had a valuable community
... a single source content model is valuable for regenerating
digital books & easy to adapt to new digital book
formats
... so single source publishing model is very important - and
we came up with HTMLBook
adam reading from readme at https://github.com/oreillymedia/HTMLBook
adam: it is not a standard. It
has an XML Schema with it. It is a way of semantically
describing publishing in HTML
... do publishers need a schema?
... you may want to write a specific HTML model and handle HTML
& CSS for every book
... we don't do that and work with just one model &
schema
... docbook is giving us a rich way to describe everything,
from foot notes to UI items
... we needed a way to do it: should we just use class or
data-* attributes?
... data-* is a wildcard - you can do what you want with
them
... problem with class is that authors may want to use it for
their own purposes
... some ongoing work - wish people would use it and give us
feedback
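The class versus data-* question above can be made concrete with a small sketch. It assumes a `data-type` attribute as the semantic hook, leaving `class` free for the author; the sample markup and the attribute values are illustrative assumptions, not taken from the talk.

```python
from html.parser import HTMLParser

# Hypothetical markup: data-type carries the semantics, class stays with the author.
SAMPLE = """
<section data-type="chapter" class="authors-own-style">
  <h1>Down the cascade</h1>
  <p>Text with a note<span data-type="footnote">Footnote text.</span>.</p>
</section>
"""


class DataTypeCollector(HTMLParser):
    """Collect (tag, data-type, class) for every element that carries data-type."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "data-type" in attributes:
            self.found.append((tag, attributes["data-type"], attributes.get("class")))


collector = DataTypeCollector()
collector.feed(SAMPLE)
for tag, data_type, css_class in collector.found:
    print(tag, data_type, css_class)
```

Because the tooling only looks at `data-type`, whatever the author puts in `class` does not interfere with how the book structure is recognized.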
jirka: in HTMLBook are you using
this directly for ePub?
... or are you modifying markup?
adam: we use HTML directly. When we create ePub we add some metadata, that is it
markus: scripting transformation - how will that be set up?
adam: some XSLT, ruby, python ...
markus: there are many different
flavours that many groups are doing now
... e.g. ePub 3.1 we discussed that too - the HTML WG said that
data-* is not for cross platform stuff
adam: now on "Atlas"
platform
... for authoring on top of HTML that I have described
adam showing authoring interface
adam: there are many HTML editors
out now
... they are updating really good
<astearns> Atlas currently only available in private beta?
adam: we modified an editor to
use our schema
... an author would never see the schema stuff unless she wants
to
... about using git with atlas:
... author clones down book project to local writing
environment
... author writes in HTML, markdown, or AsciiDoc and pushes
files back to Atlas
... Atlas transforms files to HTMLBook and builds book
formats
<Fil> atlas http://chimera.labs.oreilly.com/books/1230000000065
Adam showing a visualization
<Fil> HTMLBook spec https://github.com/oreillymedia/HTMLBook/
adam: github also helps with
change handling
... but this depends also on the authors themselves, i.e.
whether they use it or not
... one author wrote some javascript and css and interactive
widget
... we let the author embed that in an iFrame and that
appears in an online version
... but it does not appear in the print version
<Fil> atlas (better url given by @figoblog) http://atlas.labs.oreilly.com/
adam: once you are writing in HTML you are opening up other opportunities
bill: concept of authors writing
in HTML
... millions do because they blog
... I am less interested in the concept of wordpress as
pagemaker, than wordpress as word
adam: I don't see a problem, that is fine
bbb: as a community we failed by
not spreading version control more for people
... outside software developers only the wiki community is
doing version control
... and in a limited way
... we should produce people friendly diffs and images
... the movement e.g. of images and review of changes is
usually a nightmare
... if we would train people with version control it would be a
nightmare
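As a rough illustration of what a "people friendly" diff might look like for prose, here is a minimal word-level diff built on Python's standard `difflib`. The `[-...-]` and `{+...+}` markers are an arbitrary convention chosen for this sketch, and the function name is hypothetical.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Return a word-level diff with [-deleted-] and {+inserted+} markers."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(old_words[i1:i2])
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


result = word_diff("The printed word is dead", "The printed word is not dead")
print(result)
```

Comparing words rather than lines keeps a one-word edit from showing up as a whole rewritten paragraph, which is closer to how editors review changes.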
liam: there is a change tracking markup community group at w3c
http://www.w3.org/community/change/
liam: dealing with such topics
<bert_> Scribe: Bert
<bert_> [panelists introduce themselves]
<bert_> Robin: HTML in books not a good idea.
<bert_> ... Leads to debugging, problems, brittle. HTML DOM and XML DOM differences.
<bert_> Scribe: Karen
Robin: We can actually use both
together
... Idea is to agree as community to use HTML in published,
final formats
... and integrate fully into Web
... Move on to next presenters unless there are questions
Next Speaker: Adam Hyde
Adam: I'm Adam Hyde, you can find
me at adamhyde.net
... I have been involved in book publishing
... using community book publishing
... Book Sprint methodology to print books in 3-5 days
<liam> [for FLOSS, free/libre/open source software documentation]
Adam: in ePub, Mobi...
... rely on fast technologies
... the browser is solution
... should be the production and design environment
... and also the renderer to create PDF using HTML
... The books for methodology will be online shortly
... Books for methodology relies on output quickly for
use
... have been using open source technologies for the
rendering
... rely on open source
... looked at @ and pisa (sp?)
... used PDf for a while
... then used CSS
... render PDF to render content in the browser
... with tables of contents
... relied on page generated content model
... relied on JS, found @
... create content in HTML
... push button and get content formatted
... right click PDF and get 1:1
... printing press in terms of creating print is click JS
... open source
... have a lot of advantages
... get ePub as a gif
... anything you can see in browser you can see in PDF
... can make interactive presentations
... and correlate to online
... using browser itself
... use to solve book problems
[slide with list]
Adam: Even JS takes
algorithms
... it's all available right now with JS
... have been doing a year
... Last point is I don't know where publishers are right
now
... If Gutenberg had put everything online
... eventually publishers would invest in this
... publishers should be contributing to browser development
and contributing
... a little about me
... thank you
Next Speaker: Tony Graham, Mentea
Tony: Liam said the title of this
panel is deliberately open-ended
... I put as a target format
... a couple days ago Wikipedia had banner about fifth most
popular site
... ASCII doc with markdown, etc.
... part of this is HTML5 converted to HTML5
... I was at Balisage presentation
... where Sanders @ talked about HTMLBook from O'Reilly
... he stressed the XSLT side, not the Ruby side of it
... What I tend to see..
... archive format...
... XSLT could be used to produce static HTML5 pages
... there is also option to do XSLT2.0 in the browser
... just loads
... and do things on the web page
... @ has a demo showing how to play chess
... Example showing what it can do entirely in XSLT
... respond to user events
... Other things I see things for HTML5 is validator
... things to do with actually data
... perhaps O'Reilly is not doing journals, but can validate
parts of the content; and ISO standards for scientific
journal
... journal publishers are worried about such things
... Of course it can be styled
... I transformed it
... Elizabethan Ruby
... it illustrates part of problem we have with HTML
... it lost some of it; had to modify the transform
... modify the structure
... to do more with what I needed: HTML5 was not enough to do
the styling
Next Speaker: Gerry Leonidas
Gerry: I come from a university
with a long tradition in typography
... most of technologists and people from publishing
organizations
... have nothing to do with Gutenberg
... typography is mostly 19th century
... and extends into 20th
... book and journal typography has not changed that much
... have to look to graphic design
... I don't see many people here whose job it is to translate
this into language that technologists and publishers
understand
... a problem as you open up tools
Gerry: don't ask people what
they want idea
... luxury of working at university
... we said let's not do what O'Reilly does
... let's find the most difficult problems to fix
... Left is classical Greek lexicon
... written in XML and we styled output for precise
typography
... has loaded meanings
... On the other hand, we have a Henry V edition of
Shakespeare
... different levels of annotation and mark-up
... you can rely on audience to parse this
... and understand which is text and which is annotation
... what is missing is the model for typesetting
... we have tried to do this online
... with Typecast, a Monotype company now
... You can do it
... problem is that this is not responsive
... difficult for text books
... we tried to figure out these types of problems
... sequencing and @...hierarchies
... break it out to these things and a number of levels
... the sticking point
... we can develop a model that both can understand is the
authoring environment
... Word was not invented to produce complex documents
... and yet it was produced
... equivalent is WordPress
... linear structure; not good to do Henry V
... Markdown...good for simple hierarchies
... simple tools
... We have done a generalized model
... that gives one of nine levels of priority in each element
and a sequence in a chain
... focusing on a paragraph; in P level
... working with Typecast
... as simple proofing environment; looks extremely
simple
... I'll stop there
Next Speaker: Philippe Riviere
Philippe: I am a journalist and a
technologist
... I was doing web site and journalism
... I don't like to call it a content management system
... but I wrote this with friends and we use it
internally
... we take exports from Quark
... and port them into SQL database to create scripts
... from this to publish ebooks
... take the HTML pages and prepare a book with them
... First we tried to make a mobile app
... then we realized that our work was not advancing
software
... had bugs on every platform
... more about content
... so we went to ebooks
... the challenge was to respect the news hierarchy
... journal is not just a collection of articles
... there is an information hierarchy
... front news, news sections
... all of this is disappearing on web site
... I wanted to record the structure in a mobile app and
ebooks
... and still go with HTML, CSS
... one page is table of contents; series of links
... two chapters
... chapter can be an article or a service page
... We list the chapters and a script goes to fetch each
chapter
... Inside each chapter is CMS
... just knows the author of the pages
... @ was poorly documented
... we lost a lot of time figuring out these things; special
files
... This is our result
... system on iPad
... we have nice typography which is the biggest challenge with
ebooks
... we tried to do our best here
... We just recently produced an ebook
... in our archives
... go from May 1954
... we can now publish ebooks
... very simple thing
... thank you
Robin: Thank you for all those
presentations
... are there any questions in the room?
Emily Gibson: Do publishers know how and what technologies
scribe: I teach a strategic
course in London
... web technologies are basis for book publishing
technologies
... they don't know what HTML is, what a mark-up language
is
... these are people involved in strategy at publishing
houses
Nic: that is what I have
found
... same situation
... what I find frustrating is that you see expensive
work-arounds in tool sets
... if they would put a toe in, would be great
... it's not so tricky
... I am hoping we can convince some publishers
Emily: have to start from first principles every time; has been ten years
Robin: a decade not so much in technology
@: Practical thing
Bill Kasdorf, Apex: I am sensitive to everyone in organization to 'get this'
scribe: start talking first about
vocabulary
... what do you call them, what are the pieces
... then people who know mark-up can translate it
... but if you start with mark-up you get deer in headlights
reaction
... I worked with a large client with diverse
publications
... first reaction was there is no way we can get XML
scribe: we have these and these
and other things
... Tell me the parts
... we took A, then took B, then C, completely different
... semantically they are the same things
... they cannot see that semantically these are the same
... then they understand separation of presentation and
content
Daniel: problems of HTML based
tool chain
... is there is no WYSIWYG for the masses
... write docs, submit to publishers
... format data as if like Word
... but not care about the technologies inside
... a writer should not have to care about HTML,
epub
... we don't have a tool yet for the masses
Adam Hyde: I would disagree with that
scribe: If you look at
GoogleDocs
... example of that
... I am not advocating that
... GoogleDocs is a solution within W3C
Adam: look at demos, they are
amazing
... one of things liberating it
... has opened up opportunities
Daniel: Maybe we are living in
too geeky an environment
... ask them about GoogleDocs
Leonidas: I think there are
tools
... I showed one
... people are thinking about things
... take a step back
... content production used to be a specialist
... and no relation for how things exists
... very short period
... to produce content at proofing stage to mimic how things
look
... we have not decided minimum level of what people need to
know
... like what is bare minimum of what people need to know to
drive a car
... A lot of people doing this are more my age
... with other references
<scribe> ...new generation need to train to think about what needs to be visible
UNKNOWN_SPEAKER: I think there is
a big problem with GoogleDocs because it answers the problem of
the previous technology
... why would GoogleDocs have a page break?
... why should it print?
Kathy Fletcher: Two comments; translating structures into things that mean something is important
<glazou> who's speaking ?
scribe: have to get back to their
vocabulary
... we worked with K12 teachers in South Africa
... used an editor that was close to WYSIWYG
... they could cut and paste the note, a whole exercise
... translation from Word or @ was affordable
Pierre Thierry: Concept of WYSIWYG
scribe: they see something that
looks like what they want at the end; but it masks the
semantics
... should have 'what you see is what you mean'
... need something that shows the semantics
... may look bad but is easy to use and understand
Bill: one addition
... in production workflows
... people do have that semantic rendering
... use false colored rendering to see what is not in print;
need to move that upstream
Alan Stearns, Adobe: I edit the CSS Regions spec
scribe: glad to hear bookJS has
worked for you
... having to go through JS library development is
annoying
... as you use library you will find out what you will
need...from the technologies in the browser
Adam: I would like to
respond
... CSS regions was amazing presentation
... change on fly
... it's an awesome implementation
... and we have learned a lot about book production by going
down dead ends
... talk to people about the possibilities
... people outside publishing are gaining
... thanks for CSS Regions
@: I confirm that our authors know
<kathi-fletcher> what does gcpm stand for?
scribe: tools do not transcript the structure
<glazou> Generated Content for Paged Media
scribe: H1, end of the paragraph
itself
... but introduce a high level hierarchy
... and have it at beginning of the next H1
... if we had a tool to capture this structure would be
useful
Robin: sort of contentEditable but for structure, not mark-up
Dave Cramer: I don't think we have a well established vocabulary for these elements
<liam> [wanting something like content-editable but for structure as well as for content]
scribe: in a novel there may be a
blank line
... a few firms
... use differently across boundaries and publishing
houses
... have not been made universal enough
Todd Carpenter, NISO: Bill, a question for you
scribe: How many times have we tried to develop that semantic language
Bill: 3752
... that does not mean it's futile
... different interest groups have more in common
... you kind of need to start there
... I work mostly with book and journal people
... anyone in magazine world knows what a deck is
... is also useful for books
Robin: information about lunch?
Liam: Lunch is "thataway"
... Come back in one hour and 23 minutes or sooner
... Take your laptops, mobile phones
... there is a sign up sheet for dinner venues
Robin: thanks to the panelists
<Em> Scribing on iPad
<Em> Yes
<fsasaki> scribe: Em
Afternoon session first up is ...
Jirka kosek speaking about ITS 2.0
ITS 2.0 - metadata annotations
Jirka: 30-40-50 languages automated metadata language translation
ITS2.0 makes translation task easier and more effective
Originally developed for XML but ITS 2.0 specification can also be used for html5
ITS namespace attributes eg foie gras in the middle of English text that should not be translated
As html5 does not have support for namespaces a different approach needs to be taken
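A minimal sketch of the "foie gras" case in HTML5, assuming the native `translate="no"` attribute is what marks the protected span. The parser below is an illustration only: it splits text into translatable and protected runs and does not handle void elements such as `<br>`.

```python
from html.parser import HTMLParser

# Hypothetical sample sentence; translate="no" protects the French term.
SAMPLE = '<p>We ordered <span translate="no">foie gras</span> as a starter.</p>'


class TranslatableText(HTMLParser):
    """Split text into translatable and protected runs based on translate="no"."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one translate flag per open element
        self.translatable = []
        self.protected = []

    def handle_starttag(self, tag, attrs):
        inherited = self.stack[-1] if self.stack else True
        value = dict(attrs).get("translate")
        if value == "no":
            flag = False
        elif value == "yes":
            flag = True
        else:
            flag = inherited     # the setting is inherited from the parent element
        self.stack.append(flag)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        allowed = self.stack[-1] if self.stack else True
        (self.translatable if allowed else self.protected).append(text)


parser = TranslatableText()
parser.feed(SAMPLE)
print(parser.translatable)
print(parser.protected)
```

A translation tool could send only the translatable runs to the translator and reinsert the protected runs unchanged.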
Problems to be addressed
1) ITS not just inline markup but css selectors cannot replace xpath e.g. because only with xpath you can address attributes
<fsasaki> [ FYI, the w3c validator has been updated to allow its-* validation, see an validation example here http://validator.w3.org/check?uri=http://www.w3.org/TR/its20/examples/html5/EX-term-html5-local-1.html ]
2) HTML5 extensibility for additional metadata
3) HTML5 cannot embed XML as additional metadata
One proposed solution is to use JSON
<fsasaki> [see a "data island" example here http://www.w3.org/TR/its20/#EX-locQualityIssue-html5-local-2 ]
Workflow problems:
1) export/import to CMS - no existing standards to describe a complex export/import scenario including translation workflow.
Q: I do not understand point 1
A: CSS selectors cannot address attributes, only elements
You can select elements with some attributes attached to them, but you cannot select the attributes themselves
<liam> [ you can select an element having a title attribute, but you can only select elements in CSS, not attributes ]
You cannot select the node attribute, you can only select the elements or parts of the elements
<fsasaki> [xpath example assuming html default namespace: //img/@alt . Scenario is to attach information e.g. that "alt" needs some special handling during translation]
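[A rough sketch of the element-versus-attribute distinction discussed above, assuming plain XML without the HTML default namespace. Python's standard ElementTree only supports an XPath subset that returns elements, much like CSS, so the helper `attribute_nodes` is a hypothetical stand-in for what `//img/@alt` would address.]

```python
import xml.etree.ElementTree as ET


def elements_with_attribute(root, tag, attr):
    """What a CSS selector such as img[alt] can match: elements only."""
    return root.findall(f".//{tag}[@{attr}]")


def attribute_nodes(root, tag, attr):
    """Emulate the XPath //tag/@attr by addressing the attributes themselves.

    ElementTree has no attribute nodes, so each one is represented here
    as an (element, attribute name, value) tuple.
    """
    return [(el, attr, el.get(attr))
            for el in elements_with_attribute(root, tag, attr)]


doc = ET.fromstring(
    '<body><p>Intro</p>'
    '<img src="a.png" alt="A red door"/><img src="b.png"/></body>'
)

for element, name, value in attribute_nodes(doc, "img", "alt"):
    # Metadata such as "needs special handling during translation" would be
    # attached to this (element, attribute) pair, not to the whole element.
    print(f"{element.tag}/@{name} = {value!r}")
```

[The point of the tuple is that the annotation target is the `alt` attribute rather than the `img` element that carries it.]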
q: what is the problem with using xpath and CSS...
Next presentation Tomas 4d concept
Experts in documentation and engineering towards ideal online XML editor for print, digital and other streams
Newspapers, magazines, all kinds of publications
Industry as well as publishing
Different kinds of XML inputs are used including OpenOffice, Word, HTML, etc
It wasn't as comfortable as we had imagined so we created our own authoring tool for XML
Using XSLT for all kinds of output
First we tried xmax - very efficient, but compatibility issues with Internet Explorer and other tools
Then we tried CK but we had to rebuild the tool for every DTD every customer provided us
So we built a tool that is very good for simple content, XMS author
It is XML with pretty CSS on it so that authors with no interest in XML can use it
It is connected to a database where authors can pick up images or other bits of text
It is developed around oXygen, with some widgets to align with oXygen components and a Word-like toolbar
XSLT, CSS2+, MathML, web services like SVG etc
[example of output printed file - a colour page with image and recipe]
Close - description of codex reader (EPUB3 reader for iOS and Android platforms, with fixed layout or reflowable layout)
Next - Nic Gibson who has written some HTML that is not portable so has to display from own laptop
Corbas works for publishers workflow plus XML hackery
One of the things we can say is that simple text is pretty much a solved problem
But there are some really big problems with complex text, e.g. legal, educational
<Dave_Cramer> There are no simple texts ;)
Too difficult for HTML markup but XML is required for certain more difficult texts
The difference is that XML is required for structured content with specialised tools
Unfortunately this is not used by authors
Authors write totally linearly, which makes XML authoring fail in almost every general case
We are requiring that authors use XML because we can process it
MS Word is a truly awful product but authors like it for the very reasons that we dislike it - they can simply make something bold
We are talking about a very small subset here but one that will grow - currently only pop
35% of students read digitally
The challenge is complex text
We need to make CSS work for these challenging texts
We need to allow authors to write linearly and then get the structure right (footnotes, sidebars, etc)
We need to think about the things that authors need - we need to let them use just enough structure when writing and not disrupt the flow
Make authors comfortable while writing
The next challenge is editing, where the concept of final proofs is very late in the process
As soon as the layout needs to be done by hand the XML system breaks
Comment: from publishers association, is this about print or digital layout too?
Answer: this is about both or it won't be cost effective for publishers
Right now converting to digital post print version isn't working very well
Problem: publishers are not involved, there are only two publishers involved right now
... there is only a small subset of people interested in this and they will not join the W3C
Publishers are outsourcing often to people who are in this room
Q: what is your strategy for making publishers understand?
A: we point out the risks of not understanding, e.g. your books being taken down from Amazon
Also that you can create this yourself and save money if you start in the right place
Q: in W3C the point is that up until a couple of months ago this was an internal W3C discussion
The publishers have joined recently and there are more coming on board
Comment: Hachette's problem is training, education, etc. That is a process; there is a cost attached and it is a slow evolution
Karen Meyers' W3C role is to do outreach - TV and media were at this point three years ago
They are looking for training programmes for publishing communities
fjh: Q: is there a shift to have authors do more, where publishers used to do layout etc?
A: publishers used to outsource things they understood; now they outsource things because they do not understand them
Not necessarily towards authors but away from publishers
Comment about what authors are excited about, e.g. drag and drop bibliographies
Comment about usability and accessibility - needing to get it right
Comment: accessibility benefits everyone, e.g. separating content and structure helps the author understand the structure better too
Comment: there are no outsourced companies suggesting HTML composition to our publishing group because the quality is not good enough and we have that in print
Comment that we have a lot of problems with semantic understanding and integration of RDF and OWL, different content on different platforms
A: Publishers struggle with XML; they will not have any knowledge of RDF and OWL
Q: if Word is such a low bar, why are we trying to create new skins for Word?
Word is not very extensible but authors still have Word at their disposal, so we are dependent on Word as the original content editor
Word can be extended as an editing environment by exporting
Approach of "carrots not sticks" from authors all the way through to production staff at O'Reilly
E.g. authors can publish instantly, including errata, or can collaborate in real time with co-authors: "sorry for losing your Word, but you do get these other things instead"
Build a platform that will give authors a one to one representation of what they write to what they are selling
Authors become self correcting
Final point - we need to extend CSS for composition
We should be able to do the composition with declarative controls
We would like use cases and feedback to the steering group
<applause>
(Aside: I would not recommend scribing on an iPad - autocorrect is a nightmare!)
<Jirka> scribe: Jirka
Ivan Herman: introduces panel
Christina Mussinelli:
standards are important also for book distribution
information and product flows
Standards in publishing are listed
Christina explains ISBN
Actionable ISBN
Bill Kasdorf from Apex
Bill speaks about aligning standards
EPUB3 is based on HTML5
But HTML5 is not final standard yet, but EPUB3 needs to b
EPUB3 uses approach where open things are left to HTML5
Aligning magazines with EPUB3
Mentions nextPub, PSV (PRISM Source Vocabulary) and OpenEFT (Enhanced for Tablet)
Overlapping Organizations
There are several different organizations working on issues related to EPUB, metadata, accessibility
A lot of metadata is defined in schema.org vocabularies
In future Pearson will use semantic HTML5 from authoring to production
Presentation is handled by XSLT + CSS
Output will be EPUB3
Bill Wagner, Printer Working Group: Introduced PWG
PWG makes standards in the printer area: e.g. IEEE 1284, IPP, XHTML/CSS-Print, ...
Areas that need improvement: page rendering and job ticketing
Proposal for job tickets in CSS and XSL-FO
<Luc> it's all about printing
<Luc> and printer also use new technologies
Drafts are available at http://www.pwg.org
Question from Mohamed Zergaoui, Innovimax:
scribe: What is supposed to be printed (PDF, HTML, ...)?
Bill: Input is out of scope, PWG solves print production - like binding, duplex, ...
Q from Daniel Glazman, Disruptive Innovations: CSS rules and overrides can always override print job ticket in a default ticket. So restrictions will not work.
Bill: In many scenarios users can't provide a user stylesheet (e.g. in a print kiosk).
Dave Cramer: We do single source HTML publishing to many output formats. For us it's natural to put such info into CSS.
Adam Hyde: Is there a plan to extend it to non-hard-copy formats?
Todd Carpenter, NISO
Members are publishers, SW industry and libraries
A lot of "intelligence" is lost from XML when transformed to HTML
Long term preservation of documents requires standards for long-term suitable formats
There are many different organizations that overlap but have slightly different needs
But W3C membership doesn't represent print publishing very well
References http://xkcd.com/927/
People involved in standards should talk more to each other
<glazou> +1 to what Ivan says
Ivan: W3C doesn't want to develop any new standards in a publishing area
But IDPF and others depend on W3C technologies and publishers are underrepresented in W3C
W3C wants to set up bridges, so requirements are reflected in Web standards
Q from Daniel Glazman: 95% of EPUB are from W3C, rendering is done in browsers, publishers should join W3C, otherwise they will not have influence on technologies they depend on
Markus Gylling, IDPF: IDPF hopes to use W3C power to overcome vendor lock-in in the area of readers.
Ted O'Connor: Longevity of the web will be longer than that of any other organization. So web formats are suitable for long-term archiving.
Todd: There is much more structural intelligence in the special formats than in web based distribution formats
Robin Berjon, W3C: I see two misconceptions. HTML is not only for rendering, it's also structured storage, which can be extended if there is something missing
HTML will last long as it has many implementations.
Mohamed: PDF formats for archiving are just 10 years old. Why should HTML based ones be developed faster? Archiving has many solutions but none is perfect. HTML can solve many of them.
Ivan: Probably we should make similar event with archiving industry.
Bill: Currently many users author in JATS, DocBook or TEI but in final HTML a lot of metadata is thrown away.
Elizabeth: We need markup for footnotes, would be nice to standardize how to mark them up in HTML.
<fantasai_> It's also better for CSS, I think, if the footnote and context are marked up together!
Christina: Many publishers are not experienced with digital workflow.
<fantasai_> e.g. <p>some sentence <aside>Footnote content</aside></p> would be great
<aside role="footnote">?
<fantasai_> or <iaside>, maybe, for "inline-aside".
<fantasai_> footnote vs. endnote vs. sidenote vs. popup is presentational
<fantasai_> Japanese even has end-of-paragraph note :)
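[A small sketch of the idea in the comments above: if the note is marked up inline with its context, the choice between footnote, endnote or popup becomes a presentation step. It assumes a well-formed paragraph with `<aside>` as a direct child, as in the example markup; the function name is hypothetical and no such markup has been standardized.]

```python
import xml.etree.ElementTree as ET


def asides_to_endnotes(paragraph_xml):
    """Return (paragraph text with [n] markers, list of note texts)."""
    paragraph = ET.fromstring(paragraph_xml)
    notes = []
    parts = [paragraph.text or ""]
    for child in paragraph:
        if child.tag == "aside":
            # Pull the note out of the flow and leave a numbered marker.
            notes.append("".join(child.itertext()).strip())
            parts.append(f"[{len(notes)}]")
        else:
            # Other inline children keep their text in place.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts), notes


text, notes = asides_to_endnotes(
    "<p>some sentence<aside>Footnote content</aside> and more.</p>"
)
print(text)
for number, note in enumerate(notes, start=1):
    print(f"{number}. {note}")
```

[The same extracted list could equally be rendered at the foot of a page, at the end of the paragraph, or in a popup.]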
bert_ you can present footnote as a pop-up as well
Liam wraps-up
<fantasai_> Bert, I don't think so! I think a footnote is an aside, a parenthetical of sorts. It is inline in the document
<fantasai_> Bert, it's *presented* as if it were a link
Tomorrow we will start at 9