Posted by cayhorstmann
on August 18, 2007 at 2:30 PM PDT
I had to render a set of presentation slides in HTML Slidy format into images. This blog entry shows how to carry out this task with the excellent Flying Saucer XHTML renderer and concludes with some ramblings about infrastructure.
Say No To Powerpoint
I loathe authoring with Powerpoint (or its OpenOffice equivalent).
Putting together a presentation requires a horrid amount of mouse clicking
and fussing with fonts and formatting. Instead, I use the amazing
HTML Slidy (http://www.w3.org/Talks/Tools/Slidy/). You write your
slides in XHTML (I use XMLMind, http://www.xmlmind.com/xmleditor/, but
any web editor that produces XHTML will do). Then you add links to the
Slidy style sheet.
This isn't something that Aunt Tilly would do, but I already know XHTML
and a bit of CSS, so authoring becomes very, very fast. I dash off a few
h1, ul, and img elements, and I am done.
Your Mission, Should You Choose to Accept It
Next semester, I will use
Ubiquitous Presenter (http://up.ucsd.edu/about/WhatIsUP.html) for
“active learning” in the classroom. It is a nifty system where
both the professor and the students can annotate slides. However, the
slides must be in Powerpoint format (and, I am told, it is best if the
Powerpoint is not too old or too new or “too complex”). As an
alternative, the system will grudgingly accept a set of 576x432 pixel
image files. (It took me a long time to realize that this means 8x6 inches
at 72 dots per inch.)
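The 8x6 inch figure can be checked with a few lines of Java. This is only a sketch of the arithmetic; the class and method names are hypothetical and not part of any tool mentioned here.

```java
// Hypothetical helper: converts a physical size to pixels at a given resolution.
final class SlideSize {
    // pixels = inches * dots per inch, rounded to the nearest whole pixel
    static int pixels(double inches, int dotsPerInch) {
        return (int) Math.round(inches * dotsPerInch);
    }
}
```

`SlideSize.pixels(8, 72)` gives 576 and `SlideSize.pixels(6, 72)` gives 432, which matches the image size the system asks for.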
I forged ahead with this recipe:
- Use PrinceXML to convert XHTML to PDF
- Use Pdftk to burst the PDF into a sequence of single-page PDFs
- Use ImageMagick to convert the pages to PNG
It worked, but the image quality was rather poor.
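For reference, the three steps of that recipe might be scripted from Java roughly as follows. This is a sketch under assumptions: the file names are invented, the tools are assumed to be on the PATH as `prince`, `pdftk`, and `convert`, and the options actually used for the slides are not stated in this post.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper around the three command-line tools in the recipe.
final class SlidePipeline {
    // Step 1: PrinceXML turns the XHTML file into a single PDF.
    static List<String> xhtmlToPdf(String xhtml, String pdf) {
        return List.of("prince", xhtml, "-o", pdf);
    }

    // Step 2: pdftk bursts the PDF into one file per page.
    // pagePattern is a printf-style name such as "page_%02d.pdf".
    static List<String> burst(String pdf, String pagePattern) {
        return List.of("pdftk", pdf, "burst", "output", pagePattern);
    }

    // Step 3: ImageMagick rasterizes one single-page PDF to PNG.
    // -density sets the resolution (dots per inch) used to rasterize the PDF.
    static List<String> pdfPageToPng(String pagePdf, String png, int dpi) {
        return List.of("convert", "-density", Integer.toString(dpi), pagePdf, png);
    }

    // Runs one command and fails loudly on a non-zero exit code.
    static void run(List<String> command) throws IOException, InterruptedException {
        int exit = new ProcessBuilder(command).inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IOException("Command failed (" + exit + "): " + String.join(" ", command));
        }
    }
}
```

Each builder only assembles an argument list, so the commands can be inspected or logged before `run` hands them to the operating system.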
There had to be a better way. My mission was to find one.
I considered using JEditorPane, but I had not been happy with
it in the past. It has a lot of quirks and doesn't seem very actively
maintained. Then I happened to spot Flying Saucer.
Flying Saucer is “an
XML/XHTML/CSS 2.1 renderer in 100% Java”. It renders to Java2D
images, and, together with the
iText library (http://www.lowagie.com/iText/), to PDF. (It does
not try to process arbitrary HTML, so you won't want to build a web
browser with it.) Some people use it to produce PDF or images on the fly,
for example in web applications. Generate XML or XHTML, use CSS for
styling, and you can produce quite a variety of output with very little
I loaded a slide sample into the Flying Saucer viewer application.
Impressively enough, all of the CSS was correctly interpreted. Make that
“almost all of the CSS”. A bug with image scaling was a
showstopper, but I sent a message to the mailing list, and a
patch materialized within hours.
Flying Saucer comes with command-line utilities to generate PDF or
images, but there was nothing to produce an image per page. In a day, I
hacked together a simple program for
this purpose, by copying the pagination code from the PDF generator and
the image rendering code from the image generator. I don't understand all
the mumbo-jumbo with devices and contexts, but it works perfectly. Here is
a slide sample:
If you ever need to render images or PDF with lots of text and a fairly
rigid structure, give Flying Saucer a try.
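The per-page step can be approximated without touching the renderer's internals. The sketch below is not the program described above, which reused Flying Saucer's pagination code. It swaps in a simpler technique: take one tall image of the whole document, however it was rendered, and cut it into fixed-height pages. It assumes every slide is laid out exactly one page height tall (432 pixels here), and the class and method names are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper: cuts one tall rendering into fixed-height page images.
final class PageSlicer {
    static List<BufferedImage> slice(BufferedImage tall, int pageHeight) {
        if (pageHeight <= 0) {
            throw new IllegalArgumentException("pageHeight must be positive");
        }
        List<BufferedImage> pages = new ArrayList<>();
        for (int y = 0; y < tall.getHeight(); y += pageHeight) {
            // The last strip may be shorter than a full page.
            int stripHeight = Math.min(pageHeight, tall.getHeight() - y);
            // Copy the strip onto a fresh white canvas so that every page
            // has the full page height and does not share pixel data.
            BufferedImage page = new BufferedImage(
                    tall.getWidth(), pageHeight, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = page.createGraphics();
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, page.getWidth(), page.getHeight());
            g.drawImage(tall.getSubimage(0, y, tall.getWidth(), stripHeight), 0, 0, null);
            g.dispose();
            pages.add(page);
        }
        return pages;
    }

    // Writes the pages as prefix001.png, prefix002.png, ... into dir.
    static void writePngs(List<BufferedImage> pages, File dir, String prefix)
            throws IOException {
        for (int i = 0; i < pages.size(); i++) {
            File out = new File(dir, String.format("%s%03d.png", prefix, i + 1));
            ImageIO.write(pages.get(i), "png", out);
        }
    }
}
```

For a 576-pixel-wide rendering, `PageSlicer.slice(image, 432)` yields one 576x432 image per slide, and `writePngs` saves them in order. Slicing only works when the layout keeps each slide to exactly one page height; real pagination, as in the approach described above, does not depend on that.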
What does all this mean in the
bigger picture? It is an issue of infrastructure. The Ubiquitous Presenter tools are
tethered to Powerpoint. It is a fragile infrastructure. There is no
reliable mechanism to process Powerpoint other than Powerpoint, and even
that comes with the proviso that the files are neither too old nor too new.
In contrast, open standards and open source enabled me to leverage a
huge infrastructure. When I submitted a bug report about CSS, there was no
arguing about the expected behavior; the bug simply got fixed. (Try
that with Powerpoint!) There are lots of tools that produce or consume
XML+CSS. The Flying Saucer people integrate the iText stuff; I take the
Flying Saucer stuff and solve my problem in a day. It's all open source
and there is very little friction. It isn't something that I think about
very much, just like I don't usually spend a lot of time thinking about
the physical infrastructure that surrounds us. As long as it works, it is
For this reason, I am somewhat reluctant to embrace the language du
jour. If I switch to Ruby or Python, I have a different infrastructure.
How good is it? For example, how do I render XHTML into PDF? Is it like
moving from the United States to Mexico? Fun at first, but then the bad
plumbing gets to you? Maybe the JVM will become a part of the
infrastructure, and in ten years, I'll call Flying Saucer from JRuby or,
more likely, Scala.