1 The Future of HTML
The perspective of the scientific publisher
Sebastian Rahtz and Herbert van Zijl
Elsevier Science
Oxford / Amsterdam
May 1st 1998
2 Background
Elsevier Science is one of the main scientific publishers in the
world:
- primary research journals in the major fields
- over 120,000 papers a year
- long-term storage in SGML since the early 1990s for structure
- long-term storage in PDF for page typography and reprints
- a specialized DTD developed over the last 6-7 years, with specialized needs in e.g. math and bibliographies
3 Document storage in HTML?
HTML's markup model does not allow essential tasks like:
- imposition of editorial control
- generation of navigational aids, such as indices, directly from the document itself
- generation of rich cross-document (or even intra-document, such as bibliographical citation) links
- addressing or management of objects smaller or larger than a single document
- efficient re-use of document components
- search within semantically significant components of a document
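As an illustration of the last point, here is a minimal sketch of a search restricted to one semantic component, which is only possible when the markup names that component. The element names (`article`, `author`, `para` and so on) are invented for this example and are not taken from the Elsevier DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; the element names are illustrative only.
ARTICLE = """
<article>
  <title>Crystal growth in thin films</title>
  <author>A. Jones</author>
  <abstract>We study growth rates of thin films.</abstract>
  <body>
    <para>Jones and Smith earlier measured crystal defects.</para>
  </body>
</article>
"""

def search_in(root, tag, term):
    """Return the text of every <tag> element whose text contains term."""
    term = term.lower()
    hits = []
    for elem in root.iter(tag):
        text = "".join(elem.itertext())
        if term in text.lower():
            hits.append(text.strip())
    return hits

root = ET.fromstring(ARTICLE)
author_hits = search_in(root, "author", "jones")  # matches in author names only
para_hits = search_in(root, "para", "jones")      # matches in body text only
```

The same word is found in two different roles; in HTML, where both would typically be plain paragraphs or headings, the two cases could not be told apart without extra conventions.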
4 Elsevier's current usage of HTML
Electronic journals since 1995, generating HTML from SGML; since 1997,
Science Direct, an extremely large inter-linked database of articles.
- Pre-extraction of key fields (such as author names) for external indexing and searching
- Linking of articles to backup resources, such as abstract databases
- Use of PDF to provide better quality printout than that which can be derived from HTML
- Use of fixed size GIF images to display mathematics and special symbols
- Variable presentation granularity, e.g.
  - summary bibliographical details only
  - front matter only
  - front matter, plus figures and references
  - full article
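The four levels of granularity listed above can be thought of as different selections from one structured source. The following is a rough sketch of that idea only; the markup, the element names and the `VIEWS` table are assumptions made for illustration and do not describe the actual Elsevier production system.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element names are invented for this sketch.
ARTICLE = """<article>
  <front>
    <title>Crystal growth</title>
    <author>A. Jones</author>
    <abstract>We study films.</abstract>
  </front>
  <body>
    <para>Main text.</para>
    <figure id="f1">Growth curve</figure>
  </body>
  <back><ref id="b1">Smith 1996</ref></back>
</article>"""

# Each view names the elements whose content it includes.
VIEWS = {
    "summary": ["title", "author"],
    "front": ["front"],
    "front+figures+refs": ["front", "figure", "ref"],
    "full": ["article"],
}

def render_view(root, view):
    """Return (tag, text) pairs for the leaf elements selected by a view."""
    parts = []
    for tag in VIEWS[view]:
        for container in root.iter(tag):
            for elem in container.iter():
                # Keep only leaf elements that carry text.
                if len(elem) == 0 and elem.text and elem.text.strip():
                    parts.append((elem.tag, elem.text.strip()))
    return parts

root = ET.fromstring(ARTICLE)
summary = render_view(root, "summary")
full = render_view(root, "full")
```

Each variant is computed from the same file, so adding a new view means adding one entry to the table rather than storing another pre-rendered copy.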
5 Common problems
- No semantic information is left in the target HTML file about links
- Linking to external resources is static, in the simple HTML model, and cannot easily accommodate changes in the resources
- The target HTML does not allow flexible and dynamic printing, because of the lack of semantic information in the markup
- The fixed rendering of math and special symbols is expensive in development and production time, and is seriously inflexible.
6 What's wrong
Production processes are
- costly to set up
- not producing products that are flexible enough for the user
All the flexibility that we introduce is at the generation end of the
process. Parallel 'canned' variants, or on-the-fly reconversion, let
readers display a document as the full article, a summary of headings,
or just the front matter.
7 Does CSS help?
Cascading Style Sheets allow semantics to be derived in a roundabout
way from the presentation style markup.
<H2 class="sectionHead">Results</H2>
is more useful than just an unclassed heading, but this poor man's
architectural mapping is hardly flexible enough. Applications like
scientific publishing require features like re-ordering the components
of a document, and selecting subsets of it.
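To show what the class attribute does and does not provide, here is a small sketch that collects text by class name from a fragment of HTML, using Python's standard `html.parser`. The `ClassCollector` name and the class values other than `sectionHead` are assumptions for this example, and the fragment is deliberately simple (no void elements such as `<BR>`, which this naive stack would not handle).

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collect the text of elements, keyed by their class attribute."""

    def __init__(self):
        super().__init__()
        self.by_class = {}
        self._stack = []  # class (or None) of each currently open element

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        cls = self._stack[-1] if self._stack else None
        if cls and data.strip():
            self.by_class.setdefault(cls, []).append(data.strip())

PAGE = """
<H2 class="sectionHead">Results</H2>
<P class="para">Growth was linear.</P>
<H2 class="sectionHead">Methods</H2>
<P class="para">Films were annealed.</P>
"""

collector = ClassCollector()
collector.feed(PAGE)
collector.close()
headings = collector.by_class["sectionHead"]  # a "summary of headings" view
```

A list of headings can be recovered this way, but the flat markup says nothing about which paragraphs belong to which section, so moving a whole section or selecting it as a unit still requires guesswork.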
8 Future plans and requirements
Elsevier Science can develop its Web-based offerings in at least three
ways (although the priority of these is arguable):
- Switching flexibility in presentation to the client side of the process, by preserving semantic markup across the delivery, instead of pre-rendering it to HTML
- Increasing sophistication and richness in linking, either inside the document database, or to external resources
- More interactive documents, with embedded applications
9 Towards XML
Not surprisingly, these three directions coincide with XML:
- Client-level applications can render the information in different ways.
- Linking adds value to basic material. The extended link and pointer mechanism proposed for XML has many applications in scientific publication.
- Scientific publishing can make use of special-purpose markup languages, like those for mathematics and chemistry. With the semantics of math formulae, or molecular models, we can produce richer products.
10 Example
For example, a simple feature like 'back-referencing' from
bibliographies currently requires pre-processing, and devious add-in
scripts to hard-wire an interface; but if the functionality were a
standard feature of XLink-enabled XML browsers, we could reduce our
work to style sheets.
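The kind of pre-processing meant here can be sketched as follows: walk the citations in a document and build, for each bibliography entry, the list of sections that cite it. This is plain pre-processing, not XLink, and the element and attribute names (`sec`, `cite`, `ref`, `entry`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical article with citations pointing at bibliography entries.
ARTICLE = """<article>
  <sec id="s1"><para>As shown in <cite ref="b1"/> and <cite ref="b2"/>.</para></sec>
  <sec id="s2"><para>See again <cite ref="b1"/>.</para></sec>
  <bib>
    <entry id="b1">Smith 1996</entry>
    <entry id="b2">Lee 1997</entry>
    <entry id="b3">Kim 1995</entry>
  </bib>
</article>"""

def back_references(root):
    """Map each bibliography entry id to the ids of the sections citing it."""
    backrefs = {entry.get("id"): [] for entry in root.iter("entry")}
    for sec in root.iter("sec"):
        for cite in sec.iter("cite"):
            target = cite.get("ref")
            if target in backrefs:  # ignore citations with no matching entry
                backrefs[target].append(sec.get("id"))
    return backrefs

root = ET.fromstring(ARTICLE)
backrefs = back_references(root)
```

Today a table like this has to be computed in advance and written into the delivered HTML as explicit links; a browser that understood bidirectional links could derive it from the citations themselves.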
11 Do we need HTML?
If our future lies in providing flexibility on the client side of
delivery, we could become essentially independent of HTML. It then
becomes a browser decision whether to convert XML markup into HTML for
presentation, or render it directly.
12 Directions for HTML 1: PDF-like
HTML can evolve further in the direction of PDF. HTML can define the
low-level functionality of screen documents, freeing browser writers
to concentrate on the user interface instead of the complexities of
rendering XML and XSL/CSS directly. Or should PGML take over this
role?
13 Is HTML like PostScript?
PostScript was a huge breakthrough in providing developers with a
single interface language for many different rendering engines, and
enabled typesetting to be a consumer product.
HTML brought multi-media authoring to the average consumer, by
providing a single, accessible language.
PostScript gave birth to PDF, which added all the functionality needed
for screen, as well as paper, rendering, and HTML has slowly acquired
more and more presentation features, and a style sheet language.
14 Differences between PDF and HTML
- PDF files usually carry fonts with them, which makes them a bit larger than HTML files
- While PDF files contain line and page breaking information, the real difference is the word-level justification, hyphenation etc., and white-space placement.
- Neither format retains much worthwhile semantic structure: any indexing is either crude crunching of every word, or uses predefined catalogue structures
- HTML's style sheet mechanism is more efficient than the fixed layout of PDF, but only allows simplistic adjustment by the end-user.
15 Directions for HTML 2: back to basics
Split HTML into three parts:
- A markup language for simple documents
- A standardized set of modular XML applications for
  - hyper-linking
  - tables
  - forms
  - metadata
- Add frames to CSS?
16 Conclusions
The vital steps:
- Break out the simple semantic component of HTML to an XML application to enable XML adoption
- Complete the work on XLink
- Propose standard table XML DTD
- Propose standard form XML DTD