A survey of free HTML rendering toolkits for Java.
Parse and render HTML documents in Java
HTML is everywhere; not just on the Web, but also as a styled-text and hyperlinking standard for help systems, online stores, email, and many other applications. And for these many needs, there are many Java-based HTML rendering toolkits. Part 1 of Joshua Marinacci's two-part series looks at the free offerings in the HTML rendering space.
The last ten years of software development have seen the rise of
the Internet and open standards, most prominently HTML. To most
non-technical people, web pages (just HTML over a TCP/IP connection)
are The Internet. And now HTML is so pervasive, its usefulness
outweighing its flaws, that we find it in many applications that aren't
strictly web browsers. Chat programs, help files, and even a certain online music store are all built on top of the flexibility and ubiquity of HTML.
There's just one problem with HTML, though. To display it, you need some
kind of web browser, usually a component that actually renders the HTML on
screen. For Java developers, a good HTML renderer can be hard to come
by. The built-in viewer, the Swing HTMLEditorKit, is quite lacking, and there
aren't many alternatives. However, the situation isn't as bleak as you
might think; there are other renderers out there, and we just have to look
harder. In this article, we will review 11 different HTML renderers,
comparing their features, compliance, and speed in search of the best one for any project. Part one will consider free (as in either "speech" or "beer") products, while part two will consider
licensed commercial offerings.
What Features Are We Looking For?
When deciding how to rate each renderer, we should consider why we need one. What do we need to do with it? HTML is essentially styled text and images, loaded over a network, with hyperlinks. Java, often called the networked programming language, works with all sorts of network components, including URLs, quite easily. So the key point we are lacking is styled text. HTML (and by HTML I mean HTML, XHTML, and CSS 1/2) has become the standard for styled text. And it's everywhere.
As processor speed and display quality have increased during the last
ten years, more and more applications have some form of styled text in them,
either for editing or display. A quick look through my Start menu turns up
the following: Outlook (HTML email and the "Outlook Today" screen), Media
Player (advertising and shortcuts), iTunes (the music store), File Explorer
(the stylized sidebar), Trillian chat (for message display), the Address
Book, Microsoft Office (Word, Excel, and PowerPoint), and the Palm Desktop. This
list doesn't even include the styled wizard text and help files for
virtually every application on my computer. These are all programs that
don't really have anything to do with HTML. If we count programs that in
some way edit or produce HTML, then I've got my editor, jEdit, Photoshop,
Flash, and iPhoto. The common thread between all of these is that they have
styled text that could be (and often is) HTML.
The other thing these programs have in common is that they don't view
normal pages on the Web. They each have specific functions, and the HTML
they use is tailored to that function. The browser in iTunes only has to
display the HTML coming out of Apple's music database, not the HTML of the average
broken web page out there. For that reason, the first criterion for our
roundup will be "an adherence to standards, as modern standards as possible." This
principally means full XHTML support with as much of CSS1 and CSS2 as
possible. We want to use fewer table hacks and more
divs with style. Being
able to show malformed HTML on the Web is nice, but not essential. The most
important thing is that we can get attractive display using standard
mechanisms. To test this, we will run the browsers against an XHTML and CSS2
site known to push the envelope while being compliant, the CSS Zen Garden, a showcase for the possibilities of CSS-only style. To measure compliance with older web sites
(which may be required for some applications), we will also run against the front pages of Amazon and Slashdot, since these are two heavily used web sites with a good mixture of text and graphics.
Each product we survey will have figures showing how these three sites
are rendered. The small figure shown in the page links to an image of
a full-size browser window, so you can get a complete picture of how
the browser handles layout, images, blocks of text, etc.
Next, we care about speed. A lot of our non-traditional uses for HTML
only require small portions of pages (such as a chat program's message display)
but speed still matters. It's especially important for larger text blocks
such as help files and book readers. To test speed we will use a copy of
Shakespeare's Hamlet (from ClassicReader.com)
formatted as one gigantic HTML file. The styling is simple, but it's a large file (over 10,000 lines) to parse into memory and scroll.
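To make the speed test a little more concrete, here is a minimal sketch of how parse time for a large file might be measured from Java. It is not the harness used for this article: it assumes the JRE's own Swing parser (ParserDelegator) is the thing being timed, and it generates a synthetic 10,000-paragraph document instead of loading Hamlet. The names ParseTimer, countParagraphs, and bigDocument are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ParseTimer {

    /** Parses the given HTML with the Swing parser and counts the <p> start tags it reports. */
    static int countParagraphs(String html) {
        final int[] count = {0};
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.P) {
                    count[0]++;
                }
            }
        };
        try {
            // The third argument tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    /** Builds a synthetic stand-in for a large, simply styled document. */
    static String bigDocument(int paragraphs) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < paragraphs; i++) {
            sb.append("<p>Line ").append(i).append(" of a long play.</p>\n");
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        String html = bigDocument(10000);
        long start = System.nanoTime();
        int paragraphs = countParagraphs(html);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(paragraphs + " paragraphs parsed in " + elapsedMs + " ms");
    }
}
```

A single run like this gives only a rough figure, since JIT warm-up can easily dominate; repeating the measurement a few times gives a steadier number. It also measures parsing only, not layout or scrolling.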
browsers give us direct programmatic control from Java. Plus, the back end
for the HTML is often our application itself, which reduces the need for
validation or content generation. Some of the browsers below do support
is how hackable they are. How much can we control or change from the Java
side? Can we capture click events or trigger pop-up menus? Can we extend the
rendering at all? This will all be under the heading "Hackability."
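As a taste of what capturing click events can look like, here is a minimal sketch using the JRE's built-in JEditorPane, which reports link clicks through a HyperlinkListener. The class and helper names (LinkCapture, createPane, simulateClick) are hypothetical, and the click is simulated by firing the event by hand so the example runs without opening a window. Other renderers expose their own event APIs, which may look nothing like this one.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JEditorPane;
import javax.swing.event.HyperlinkEvent;

class LinkCapture {

    /** Creates a read-only HTML pane that records the URL of every activated link. */
    static JEditorPane createPane(String html, List<URL> clicked) {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html");
        pane.setEditable(false); // hyperlink events are only generated when the pane is not editable
        pane.setText(html);
        pane.addHyperlinkListener(event -> {
            if (event.getEventType() == HyperlinkEvent.EventType.ACTIVATED) {
                clicked.add(event.getURL());
            }
        });
        return pane;
    }

    /** Fires an ACTIVATED event by hand, standing in for a real mouse click. */
    static void simulateClick(JEditorPane pane, String url) {
        try {
            URL target = URI.create(url).toURL();
            pane.fireHyperlinkUpdate(
                    new HyperlinkEvent(pane, HyperlinkEvent.EventType.ACTIVATED, target));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        List<URL> clicked = new ArrayList<>();
        JEditorPane pane = createPane(
                "<html><body><a href=\"http://example.com/\">a link</a></body></html>", clicked);
        simulateClick(pane, "http://example.com/");
        System.out.println("Captured clicks: " + clicked);
    }
}
```

In a real application the listener would decide what to do with the click: load the page, open a pop-up menu, or hand the URL to application code.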
The final condition for this article is that there must be a freely
downloadable demo. Some of the commercial products we'll see in part two
come with licensing fees, but they all have something you can download right now and
try out. I've also added the condition that there must have been some
update to the package, or at least the web page, in the last year. There are a lot of
dead projects with questionable status out there that we want to
The Types of Renderers
There are two types of HTML renderers: 100 percent Java and native
wrappers. The 100-percent-Java renderers are just what they sound like, HTML
renderers written completely in Java without calls to any native libraries.
They have the advantage of being portable to almost anywhere, depending
only on the standard JRE libraries (usually Swing). The second type are
actually wrappers to a native platform web browser like Internet Explorer
or Mozilla. They have the advantage of using a fast and reliable browser
that can handle virtually any HTML you throw at it. The downside is that you may
be tied to one platform and there is less opportunity for hacking the
display from Java. Plus, loading a full web browser may be overkill (and
slow) for something like a chat program.
The license is another distinguishing feature between these renderers.
Some are open source or at least available for no cost. Some are free for
non-commercial use, and some require licensing fees. Depending on your needs,
one type may be preferable to another, so be sure to read the actual
license before you decide.
On the rendering tests, we will use a recent build of Mozilla Firebird as
our control program. Figures 1, 2, and 3 show Mozilla viewing Amazon, Slashdot, and the CSS Zen Garden.
Figure 1. Amazon in Mozilla
Figure 2. Slashdot in Mozilla
Figure 3. CSS Zen Garden in Mozilla
Free HTML Renderers
The Swing HTMLEditorKit
Company: Sun Microsystems
License: Part of the standard JRE
Type: 100 percent Java
Our first renderer is the venerable Swing HTMLEditorKit. Though it has a bad rap, it is a lot better than it used to be. Recent revisions (I tested using the Java 1.4.2 JDK) have added preliminary XHTML and CSS support, though it
still fails on a lot of complicated web sites. Since it's just a subclass of
JEditorPane, it can integrate easily with any application, and the use of
Views and Documents from
javax.swing.text gives it a high hackability factor. Most
importantly, it's included with every Java Runtime, so you can depend on it
being there. Its one downside is that while you can view the source and
modify it for your own use, you can't recompile it and distribute it to others along with
your application. I'm not a big fan of the idea that we need to
open source Java, but I do think that there would be a lot to gain from
open sourcing the HTML component (or perhaps all of Swing).
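To illustrate the kind of access the javax.swing.text Document model gives you, here is a small sketch that parses a page into an HTMLDocument and walks it to collect every link target. The class name LinkLister and the method listLinks are invented for this example; only the Swing calls themselves come from the JRE.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class LinkLister {

    /** Parses the HTML into a Swing document and returns the href of each anchor, in order. */
    static List<String> listLinks(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }

        List<String> hrefs = new ArrayList<>();
        // HTMLDocument.Iterator visits each occurrence of the requested tag.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<html><body><p><a href=\"http://a.example/\">one</a> and "
                + "<a href=\"http://b.example/\">two</a></p></body></html>";
        System.out.println(listLinks(html));
    }
}
```

The same document object backs a JEditorPane on screen, so code like this can inspect or modify what the user is looking at. Customizing how elements are drawn goes through the View side of the API instead, which takes considerably more code.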
Here's how our three tested pages look with the HTMLEditorKit (Figures 4, 5, and 6).
Figure 4. Amazon in HTMLEditorKit
Figure 5. Slashdot in HTMLEditorKit
Figure 6. CSS Zen Garden in HTMLEditorKit
Not too bad. The HTMLEditorKit clearly has some issues with horizontal tables, but it's passable. There is almost no modern CSS support, but it shows the degraded version of the Zen Garden properly (the
@import hack notwithstanding). If you use it, be sure to call
setEditable(false) on your JEditorPane, or else all of the
script tags will be visible. Speedwise, the HTMLEditorKit pulled up Hamlet in about one second, no slower than Mozilla, so it's pretty speedy with large amounts of text, at least.
All in all, I would say that the HTMLEditorKit's presence in the JRE trumps its failings, and if you can work around its CSS bugs, then use it. It's probably best used in applications with simple styling, such as chat programs or help windows. I wouldn't use it for web previews or anything where you want lots of graphics or tricky alignment.
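For the chat-window case, a minimal sketch might look like the following. It assumes a read-only JEditorPane used purely as a message log, with each message appended as an HTML fragment through HTMLEditorKit.insertHTML. The ChatLog class and its methods are hypothetical names for this example, not part of Swing.

```java
import java.io.IOException;
import javax.swing.JEditorPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class ChatLog {

    private final JEditorPane pane = new JEditorPane();

    ChatLog() {
        pane.setContentType("text/html"); // installs an HTMLEditorKit and an HTMLDocument
        pane.setEditable(false);          // display only, as recommended above
    }

    /** Returns the component to place in a window, typically inside a JScrollPane. */
    JEditorPane getPane() {
        return pane;
    }

    /** Appends one HTML fragment (for example, a chat message) to the end of the log. */
    void append(String htmlFragment) {
        HTMLDocument doc = (HTMLDocument) pane.getDocument();
        HTMLEditorKit kit = (HTMLEditorKit) pane.getEditorKit();
        try {
            kit.insertHTML(doc, doc.getLength(), htmlFragment, 0, 0, null);
        } catch (BadLocationException | IOException e) {
            throw new IllegalStateException("Could not append HTML", e);
        }
    }

    /** Returns the log's text content with all markup stripped. */
    String plainText() {
        try {
            return pane.getDocument().getText(0, pane.getDocument().getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ChatLog log = new ChatLog();
        log.append("<p><b>alice:</b> hello world</p>");
        log.append("<p><b>bob:</b> hi</p>");
        System.out.println(log.plainText());
    }
}
```

In a real Swing application, the calls to append should be made on the event dispatch thread once the pane is showing.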
Modern Compliance: Virtually none
Legacy Web: Passable
Speed: Pretty good
JRex
License: Mozilla Public License
Type: Native Wrapper
JRex is a complete wrapper for Mozilla. It is still very much under
development, but shows real potential. I was not able to get it to work
with Mozilla Firebird, but it worked flawlessly with Mozilla 1.4. I'm
guessing that this is just a version issue and hopefully will be worked out
support is perfect. Plugins are also supported except, strangely, the Java
Plugin for Windows.
In terms of hackability, JRex stacks up pretty well. There are APIs to
receive events and direct DOM access is under development. Since this is
Mozilla, we also get support for XUL, which may be useful for some
developers. My only real complaints are the problems dealing with version
issues, and lack of a way to auto-detect an existing installation of Mozilla.
However, since you can simply include an entire copy of Mozilla (about 5MB of DLLs and binaries) with your application, this may not be as much of an issue.
For people who need to embed a true browser into an application, either
for general websurfing or proofing in a dev tool, I recommend JRex. And
since it's still a work in progress, if you are an open source developer
looking for a project to contribute to, this is one to consider. In particular,
one of the leaders mentioned wanting contributors with "knowledge of XPCOM, SWING/AWT, and JNI."
He also said that "knowing JUNIT would be an added advantage."
Figures 7, 8, and 9 show JRex's handling of our sample sites:
Figure 7. Amazon in JRex
Figure 8. Slashdot in JRex
Figure 9. CSS Zen Garden in JRex
Modern Compliance: Excellent
Legacy Web: Excellent
Hackability: Pretty good
Multivalent
Company: UC Berkeley's Digital Library Project
License: Open source (GPL)
Type: 100 percent Java
Multivalent is an interesting research web browser. Meant primarily
for browsing documentation, its HTML features are a bit behind. It rendered
Amazon pretty well, but showed only the unstyled version of the Zen Garden.
It loaded Hamlet reasonably fast, but nothing spectacular. Strangely, I
couldn't get it to load Slashdot. I kept getting GZip errors, but that
may stem from some strange headers on Slashdot's front page. Multivalent
supports complete visual and behavioral customization. Plus, since it's
open source, you can always start banging on the code. It does have some
interesting features, such as lenses for magnifying the screen, full text
searching, on-screen annotation, PDF support, on-the-fly decompression, and
a speed-reading mode.
See Figures 10, 11, and 12 for a look at Multivalent.
Figure 10. Amazon in Multivalent
Figure 11. Slashdot in Multivalent
Figure 12. CSS Zen Garden in Multivalent
Modern Compliance: Virtually none
Legacy Web: Poor
Jazilla
Company: Matt McBride
License: Open source (MPL)
Type: 100 percent Java
Jazilla is a resurrection of the Javagator, Netscape's Navigator-in-Java project
started before they open sourced Mozilla in 1998. Speed is poor, and the rendering for
general web sites is almost unusable. Since it's based on so much legacy
code, it will probably never support modern features such as CSS2. Still, it
can be useful for certain things, especially where a small-footprint
browser is required (such as a chat application).
Figures 13, 14, and 15 show Jazilla in action (or, perhaps, Jazilla inaction).
Figure 13. Amazon in Jazilla
Figure 14. Slashdot in Jazilla
Figure 15. CSS Zen Garden in Jazilla
Modern Compliance: Poor
Legacy Web: Poor
CalPane
Company: Andrew Moulden
License: Free for non-commercial and some commercial apps.
Type: 100 percent Java
render legacy HTML fairly well. As you can see in the screenshots below,
both Amazon and Slashdot render pretty well, though the lack of
anti-aliasing is especially apparent on Slashdot. It doesn't support CSS at
all, though it does degrade properly. This also highlights the principle of
using CSS properly so that sites are still usable without it.
As far as speed goes, it is pretty snappy on pages it supports. There is
support for event callbacks, and you can override certain features such as how
images are loaded, but there isn't too much hackability. In the long run,
the lack of CSS and XHTML means that more and more sites will fail in CalPane.
The greatest problem with CalPane is that its site doesn't appear to have been updated
since 2002. I bent my own rule and included it in this roundup
because the renderer is perfectly usable in its current state, as seen
in Figures 16, 17, and 18.
Figure 16. Amazon in CalPane
Figure 17. Slashdot in CalPane
Figure 18. CSS Zen Garden in CalPane
Modern Compliance: None
Legacy Web: Decent
Overall, our choices are a very mixed bag. JRex offers high compliance and
speed, but requires integration with native code. The 100-percent-Java renderers have
little support for modern standards, but some (CalPane and Multivalent) can at
least render some popular pages accurately.
In part two of this series, we'll take a look at what commercial HTML renderers can do, and we'll
collect some other renderers that didn't make the cut for this survey but might yet
find their place.