 \documentclass{llncs}
  \usepackage{graphicx}
  \usepackage{url}

  \DeclareGraphicsRule{.gif}{eps}{}{}
  
  \begin{document}
   \bibliographystyle{splncs}



\title{Transforming XHTML to LaTeX and BibTeX}
\author{Dan Connolly}
  \maketitle
  

    \begin{abstract}
  


\par We transform XHTML to LaTeX and BibTeX to allow technical articles
to be developed using familiar XHTML authoring tools and
techniques.
\par{\bf PRE-PUBLICATION DRAFT  1.20  of 2006-04-20T17:10:49Z. DO NOT CIRCULATE.}\end{abstract}
  



\section{Introduction}
  


\par Occasionally a web page turns the corner from a casually drafted
idea to an article worthy of publication. Computer science conferences
often require submissions using specific LaTeX styles; for example,
the \empty ISWC2004
submission instructions require that submitted papers be formatted
in the style of the Springer publications format for \empty Lecture
Notes in Computer Science (LNCS).
\empty XSLT is
a convenient notation to express a transformation from
XHTML to LaTeX.


\par Tools to transform from LaTeX to HTML are commonplace, but there
are far fewer to go the other way.  A little bit of searching yielded
some work\cite{Gur00} that was designed to undo a
transformation to XHTML. It used an odd XHTML namespace and exhibited
various other quirks specific to reversing that transformation, but it
provided quite a boost up the LaTeX learning curve\cite{Mann94}.


\par That code did not integrate with BibTeX. To take advantage of the
automatic bibliography formatting traditionally provided by LaTeX
styles, we studied the \empty BibTeX format\cite{Spen98} for a bit,
and {\tt \empty xh2bib.xsl} was born.


\par Together with traditional {\tt pdflatex} and {\tt bibtex}
tools\cite{tetex} and an XSLT processor such as
xsltproc\cite{XSLTPROC}, this transformation can
turn ordinary web pages with just a bit of special markup into
camera-ready PDF in specialized LaTeX styles.


\subsection{A Quick Example}
  


\par This article demonstrates the basic features. See:

\begin{itemize}
  \item {\tt \empty Overview.pdf}\item {\tt \empty Overview.tex}\item {\tt \empty Overview.bib}
    \end{itemize}
  


\par They are produced a la:

\begin{verbatim}
$ make Overview.pdf
xsltproc  --novalid --stringparam DocClass llncs \
  --stringparam Bib Overview --stringparam BibStyle splncs \
  --stringparam Status prepub  \
        -o Overview.tex xh2latex.xsl Overview.html
TEXINPUTS=.:../../../2004/LLCS: pdflatex  Overview.tex
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
...
Output written on Overview.pdf (3 pages, 62474 bytes).
Transcript written on Overview.log.
xsltproc  --novalid -o Overview.bib xh2bib.xsl Overview.html
BSTINPUTS=.:../../../2004/LLCS: bibtex  Overview
This is BibTeX, Version 0.99c (Web2C 7.4.5)
The top-level auxiliary file: Overview.aux
The style file: splncs.bst
Database file #1: Overview.bib
TEXINPUTS=.:../../../2004/LLCS: pdflatex  Overview
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
...
Output written on Overview.pdf (3 pages, 67583 bytes).
Transcript written on Overview.log.
TEXINPUTS=.:../../../2004/LLCS: pdflatex  Overview
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
...
Output written on Overview.pdf (3 pages, 67167 bytes).
Transcript written on Overview.log.
\end{verbatim}





\section{Features}
  


\par The transformation {\tt \empty xh2latex.xsl}
works in the obvious way for many idioms:

\begin{itemize}
  \item section headings: {\tt h2}, {\tt h3}, {\tt h4}
  \item paragraphs: {\tt p}
  \item itemized lists: {\tt ul}, {\tt dl}
  \item enumerated (numbered) lists: {\tt ol}
  \item tables: {\tt table border="1"}, {\tt tr}, {\tt td}
  \item verbatim: {\tt pre}
  \item phrase markup: {\tt em}, {\tt code}, {\tt tt},
    {\tt i}, {\tt b}
    \end{itemize}
  


\par Table support is limited to tables that have {\tt border="1"}
and the same number of cells in every row. For example:
\begin{center}\begin{tabular}{|c|c|c|}
\hline
{\bf Name}&{\bf Address}&{\bf Phone}\tabularnewline
\hline
John Doe&123 High St.&555-1212\tabularnewline
\hline
Jane Smith&456 Low St.&555-1234\tabularnewline
\hline
\end{tabular}
\end{center}
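
\par As a rough sketch, XHTML markup along the following lines should
satisfy both constraints and yield a table like the one above. The
bold header cells are written here with {\tt b} inside {\tt td},
which is an assumption; the markup actually used for this article's
table is not shown.

\begin{verbatim}
<table border="1">
  <tr>
    <td><b>Name</b></td>
    <td><b>Address</b></td>
    <td><b>Phone</b></td>
  </tr>
  <tr>
    <td>John Doe</td>
    <td>123 High St.</td>
    <td>555-1212</td>
  </tr>
  <tr>
    <td>Jane Smith</td>
    <td>456 Low St.</td>
    <td>555-1234</td>
  </tr>
</table>
\end{verbatim}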



\par Specialized markup is required for other idioms. An \empty article.css stylesheet provides
visual feedback for this special markup.


\par To use a LaTeX package, add a link to the head of your document, a la:
\begin{verbatim}
  <link rel="usepackage" title="url"
    href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty" />
\end{verbatim}


\par The package name is taken from the title attribute. The href attribute is not used in the LaTeX conversion.


\par We recommend the \empty url.sty
package, per \empty a TeX
FAQ. For example: \url{http://www.w3.org/People/Connolly/}.

\subsection{Front Matter}
  


\par The following patterns are used to extract the
title page material:

\begin{itemize}
  \item {\tt div/@class="maketitle"}
  \begin{itemize}
  \item title: {\tt h1}\item abstract: {\tt div/@class="abstract"}\item author: {\tt address/a[@rel="author"]}
    \end{itemize}
  
  \item keywords: {\tt div[@class="keywords"]}\item terms: {\tt div[@class="terms"]}
    \end{itemize}
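
\par A minimal sketch of front matter that matches these patterns
might look like the following. The nesting follows the list above;
the abstract text and the keyword values are placeholders, and the
exact element used inside the abstract is an assumption.

\begin{verbatim}
<div class="maketitle">
  <h1>Transforming XHTML to LaTeX and BibTeX</h1>
  <address>
    <a rel="author"
       href="http://www.w3.org/People/Connolly/"
       >Dan Connolly</a>
  </address>
  <div class="abstract">
    <p>Abstract text goes here.</p>
  </div>
</div>
<!-- placeholder keywords, for illustration only -->
<div class="keywords">XHTML, XSLT, LaTeX</div>
\end{verbatim}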
  


\par {\em Support for WWW2006 style authors, following
\empty ACM style,
is in progress.}



\subsection{Cross references and footnotes}
  


\par The {\tt a[@class="ref"]} pattern is transformed to the LaTeX
{\tt \textbackslash ref\{label\}} idiom, assuming the reference takes
the form {\tt href="\#label"}.


\par The footnote pattern is {\tt *[@class="footnote"]}.
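
\par For illustration, a paragraph using both patterns might be marked
up roughly as follows. The label {\tt fig1} is hypothetical, the
choice of {\tt span} for the footnote is an assumption (the pattern
accepts any element), and how the link text is treated is not
specified here.

\begin{verbatim}
<p>The layout is shown in figure
<a class="ref" href="#fig1">1</a>.<span
class="footnote">Footnote text goes here.</span></p>
\end{verbatim}

\par Under the rule above, the link would be expected to come out as
{\tt \textbackslash ref\{fig1\}}.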


\subsection{Figures}
  


\par The {\tt div[@class="figure"]} pattern is transformed to a
figure environment; any {\tt div/@id} is used as a figure
label. The file pattern is {\tt object/@data}.  {\em Figures are
currently assumed to be PDF; the {\tt object/@height} attribute is
copied over.} The caption pattern is {\tt p[@class="caption"]}.
{\em @@need to test this.}
Be sure to include the {\tt epsfig} package a la:

\begin{verbatim}
  <link rel="usepackage" title="epsfig" />
\end{verbatim}
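
\par Putting the figure patterns together, the markup might look
roughly like the sketch below. The id {\tt fig1} and the file name
{\tt diagram.pdf} are hypothetical. Because the height is copied
over unchanged, a TeX dimension such as {\tt 2in} is used, which is
a guess about what the LaTeX side expects.

\begin{verbatim}
<div class="figure" id="fig1">
  <object data="diagram.pdf" height="2in">
    diagram
  </object>
  <p class="caption">Caption text goes here.</p>
</div>
\end{verbatim}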


\subsection{Citations and Bibliography}
  


\par An {\tt a} element starting with an open square bracket
{\tt [} is interpreted as a citation reference. The {\tt href}
is assumed to be a local link a la {\tt \#tag}.
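
\par As a sketch, a citation reference in the body might be written
as follows; the surrounding sentence is only illustrative.

\begin{verbatim}
<p>We use the traditional tools
<a href="#tetex">[tetex]</a>.</p>
\end{verbatim}

\par The tag after the {\tt \#} is expected to match the tag of a
bibliography item, so that the link above corresponds to
{\tt \textbackslash cite\{tetex\}} in the LaTeX output.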


\par The pattern {\tt dl/@class="bib"} is used to find the
bibliography.
Each item is marked up a la:
\begin{verbatim}
<dt class="misc">[<a name="tetex">tetex</a>]</dt>
<dd>
<span class="author">Thomas Esser</span>
<cite><a
href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html"
>The TeX distribution for Unix/Linux</a></cite>
February <span class="year">2003</span>
</dd>
\end{verbatim}


\par or

\begin{verbatim}
<dt class="misc" id="tetex">[tetex]</dt>
...
\end{verbatim}


\par Note the placement of the bibtex item type {\tt misc} and the
tag {\tt tetex} and keep in mind that {\tt bibtex} ignores
works in the bibliography that are not cited from the body.


\par The {\tt \empty xh2bib.xsl} transformation
turns this markup into BibTeX format. {\tt xh2latex.xsl} transforms
the entire bibliography {\tt dl} to a {\tt \textbackslash bibliography\{...\}}
reference.


\par {\em Capitalization of titles seems to get mangled. I'm not sure if
that's a feature of certain bibliography styles or what.}



\subsection{Bugs/Caveats/Misfeatures}
  

\begin{itemize}
  \item Composed characters and such in the bibliography are
handled with a sort of kludge, e.g.
{\tt K<span title='\textbackslash "o'>ö</span>bler}
\item The {\tt samp} element is used to pass LaTeX
math markup through, e.g.
{\tt <samp>\textbackslash Delta</samp>}

    \end{itemize}
  




\section{Makefile support}
  


\par Formatting a LaTeX document is done in several passes.  One \empty typical manual shows:

\begin{verbatim}
ucsub>  latex MyDoc.tex
ucsub>  bibtex MyDoc
ucsub>  latex MyDoc.tex
ucsub>  latex MyDoc.tex
\end{verbatim}


\par The following excerpt from {\tt \empty html2latex.mak} shows
some rules to accomplish this using make:

\begin{verbatim}
.html.tex:
	$(XSLTPROC) --novalid $(HLPARAMS) \
		-o $@ xh2latex.xsl $< 

.html.bib:
	$(XSLTPROC) --novalid -o $@ xh2bib.xsl $<

.tex.aux:
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $<

.tex.bbl:
	BSTINPUTS=$(BSTINPUTS) $(BIBTEX) $*


.aux.pdf:
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
\end{verbatim}


\par Sources:
\begin{itemize}
  \item {\tt \empty xh2latex.xsl}\item {\tt \empty xh2bib.xsl}\item {\tt \empty article.css}
    \end{itemize}
  





\bibliography{Overview}


\end{document}
  