Posted by javakiddy
on July 19, 2006 at 2:12 PM PDT
I still have serious doubts about many of the arguments supporting the whole Java DB thing, but some of the feedback has made me think again.
Several people responded to my original rant, and I'm glad to say just about every one of them had an interesting take on the whole Java DB issue. Romain Guy and Joshua Marinacci even went as far as to blog on the topic. Some of the comments appended to all three entries are exceptionally noteworthy too. Spinning off from TOR (the original rant) are John Reynolds and Evan Summers, who threw up some well-made points I'll deal with below.
I wasn't planning to do a follow-up blog; however, some of the feedback has shifted my mind a little (just a little), and while I do still question many of the pro arguments, I have to admit I am more open to Java DB on the desktop than I was previously. I thought it was only fair to admit this publicly. After all, this is a good thing: throwing out strong questions about a topic invites strong arguments in return, and if you're very very lucky someone will say something which opens your mind to a different way of looking at an issue. But don't start cheering yet. I still have many doubts.
Indeed some pro Java DB arguments still look suspiciously like solutions in search of a problem. Persistence is suddenly very easy, and therefore trendy, and everyone is looking for an excuse to use it. Any excuse to use it! Just like a few years back when we'd all use XML at the drop of a hat (and look where that got us!) And because of its EE origins, the current model is tied firmly to an RDBMS, and people don't want to explore beyond that.
So I guess I've moved from being 'mildly skeptical' to 'mostly indifferent'. Perhaps someone might reply to this blog with a comment which will cause me to upgrade to 'modestly interested'. ("Are you sure you want to go to red alert, sir? It will mean changing the light bulb." : Kryten the android, Red Dwarf.)
Firstly it might help to give a little background of who I am and where I'm coming from. Professionally I have written full blown desktop applications in Swing (for 'closed' markets, such as education and local government) in which the database is merely a tool to store, manipulate and filter large bodies of data centrally in a collaborative working environment. These are clients of the thickest, traditional, kind (not thin CRUD wrappers) where the user interface and the database have very little one-to-one coupling, and most of the user's work revolves around local files. Therefore I see an RDBMS as a useful tool to be employed when advantageous. I do not necessarily see it as the primary starting point of all new projects (in other words I don't subscribe to what one commentator unkindly labelled the "'RDBMS is the root of all' mindset".)
(Btw: Yes, I also wrote that modestly well known API for accessing a certain popular Instant Messaging network. But I never released the IM client which sat atop that, so it doesn't count in this context!)
Secondly I should restate my main concern, loud and clear: it is not actually that Java DB is being put into the JDK, rather that the lopsided manner in which this is being done will cause problems in itself. Java's boast is, after all, that you write it once and deploy anywhere without hassle. Suddenly we have a component which is standard on the developer's JDK, but optional on the end user's JRE. The pressure begins to mount to have Java DB included in the JRE, and next thing we know there's two megabytes more of a reason for Joe Public to click "CANCEL" on the "Download Java plugin?" dialog.
But I'll return to that concern in the conclusion.
The case for the prosecution
- Data stored in a database is outside of the file centric world that desktop users are used to. You can't double click it to launch your application. You can't drag it to the desktop for quick access. You can't do familiar things like throw it into a trashcan to delete it. (Presumably each application has to re-implement its own bespoke database-driven trashcan?) And given that it may not be stored in well known locations like "My Documents", it's unlikely to get backed up on a regular basis. It also requires specific import/export operations before one can do something simple like attach it to an email (remember, a lot of collaborative working is done by mailing documents between workmates. As silly as that is, it is a practice that non-techie users seem to prefer, heaven knows why, and we shouldn't obstruct that process.)
- The transactional way of working, where changes are recorded incrementally and undo/redo isn't limited to the current session, is a thrilling idea. HOWEVER, we must remember that for the desktop, the 'Open/Save/Save As/Close' model (OSSAC) is king, one in which the end user controls when and where data is saved. It is something users are familiar with. OSSAC dictates that changes are only 'formally' committed when "Save" is used, and "Save As" can be used to branch a document/project at any point. If the user quits and doesn't save, changes should be lost. Of course, with a bit of extra work we can recreate this using the database, but just don't kid yourself that you have escaped boilerplate code forever: you've just replaced one set of frequently re-written routines (persistence) with another.
- Depending on circumstance, using relational databases for trivial data like history lists and bookmarks could be like driving a gas guzzling 4x4 on a trip a hundred yards down the road to buy a carton of milk. Some of this data (bookmarks, playlists) rarely strays beyond the few hundred entries range. Much of the rest (history, MP3 collections) won't stray much beyond the tens of thousands of entries range (unless you never prune your browser history.) And the data doesn't shrink or grow by huge ratios in a given session. This puts it still within the realm of the Java Collections API. Sure, for larger datasets searching/sorting will be noticeably slower than an SQL database, but the resource footprint could be a fraction of the size. Besides, just how often do users search and sort this data? Often enough to merit the extra overhead? Sure, I might use a search to create a playlist of all the MP3s of a given artist, but if I only create a new playlist once a day or once a week, am I going to be bothered if it takes a few seconds longer? Personally I'd rather have a media player with slightly slower search, but fast start up times and much lower resource footprint. (Remember, most current applications don't use full blown SQL databases for these purposes, and they seem to work perfectly well.)
- The user can't easily do complex searches on data which isn't stored in some sort of a relational database, true. But how many would understand how to do anything other than flat single-term, single-table searches? Or, put another way, how many OpenOffice.org users have you seen lately with an O'Reilly "Mastering Regular Expressions" book next to their keyboard? Fortunately the regexp search in OOo Writer has a trivial code footprint and uses no runtime resources until utilised. The same cannot be said for an embedded relational database.
- It occurs to me that the advocacy of SQL in some circumstances is more to do with the convenience of the programmer, than the benefit of the end user. Sure, JPA makes our life easy, but what about the poor user? (Remember them?) She has an IM client idle in her desktop tray (with an SQL database running simply for logging) and a media player running in the background (with an SQL database for playlists) and she's browsing the web (with an SQL database for bookmarks and history) to Google for something she read in an email (there's another) that her workmate sent her about a project she is editing in a word processor (oh, and another!) That's a lot of overhead for very very little functional gain over current SQL-free implementations. Java has a reputation as slow and bloated on the desktop - and here we are firing off RDBMS instances at every opportunity :)
- Yes we could use JPA/POJOs with soft references to create a 'super cache' very easily. I'm in dangerous territory here, as what I know about the internal mechanics of database software could be written on the back of a postage stamp, but... have we solved the memory problem here, or have we just shunted it sideways into a 'black hole' database component? Sure, I expect the database has some really whizzy optimised management algorithms, but a relational database in itself comes with an overhead. The idea is an interesting one, but should certainly be used with caution. Particularly if the overwhelming majority of the time you know your application will not be used with such volumes of data that it mandates the overhead.
- Configuration data? Let's get rid of all the properties files, XML files, and registry entries. Yaaaay! Let's hide it all in databases. But hold on... how often have you had to hand hack a config file to tweak something in the application? (Actually some applications, *cough* Real Player, can't even be made to work 'appropriately' without some registry hacking. Why, only this weekend I had to 'encourage' qttask.exe off of a friend's PC.) How often have you actually managed to recover a broken application by editing its configuration data directly? Hopefully not too often, but it's an invaluable skill when necessary. Currently I know in a worst case scenario I can 'fix' 99% of Windows applications using either a text editor, or Regedit — I don't need an application specific hacking tool. Oh yeah, in an ideal world we'd never ever have to hack things. Anybody know where there's a wormhole into an ideal world? On the surface using a database seems like such a good idea, but the words "be careful what you wish for..." are ringing in my ears. :)
- For all my bluster above about the desktop environment and user experience, it does have to be recognised that there are two new classes of desktop application which are drifting away from the OSSAC/file-centric model of the desktop. We have media handling tools (players, viewers, etc.) which maintain playlists and other collections of user specific data not formally saved by the user as a 'project'. We also have comms/internet based applications that maintain history lists and bookmarks, again not formally saved by the user. While a full blown SQL database might be overkill, perhaps they would benefit from a lightweight half-way-house solution, which extends some of the functionality of the Collections API to offer faster manipulations plus caching of large collections, but with only a modest extra resource hit?
- I suppose there's no reason why someone couldn't (if they haven't already) devise a POJO/JPA inspired file format, which allowed the incremental changes to an object to be saved and loaded to a regular IO stream, preferably in such a way as to support random access, so the entire history of each object doesn't have to be loaded at once. This would be a very lightweight substitution for what many want to use the database for, and would make it easier to write software to fit nicely with the file centric world of the desktop.
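As a rough illustration of the incremental-change file idea in the last bullet, here is a minimal Java sketch. Everything in it is an assumption made for the sake of the example: the class name ChangeLogSketch, the record layout (object id, field name, new value) and the use of DataOutputStream are hypothetical choices, not an existing format or anything JPA provides. It also reads the whole log sequentially, so it does not attempt the random access part of the idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical append-only change log: each record is (objectId, field, newValue). */
final class ChangeLogSketch {

    /** Appends one change record to the end of a regular output stream. */
    static void append(OutputStream out, String objectId, String field, String value)
            throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(objectId);
        data.writeUTF(field);
        data.writeUTF(value);
        data.flush();
    }

    /**
     * Replays every record in order and returns the latest value of each
     * "objectId.field" key. Later records overwrite earlier ones.
     */
    static Map<String, String> replay(InputStream in) throws IOException {
        Map<String, String> state = new LinkedHashMap<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            String objectId;
            try {
                objectId = data.readUTF();
            } catch (EOFException endOfLog) {
                return state; // clean end of stream: no more records
            }
            String field = data.readUTF();
            String value = data.readUTF();
            state.put(objectId + "." + field, value);
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a file in the user's documents folder.
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        append(log, "track-1", "title", "Untitled");
        append(log, "track-1", "title", "Red Alert");
        append(log, "track-1", "artist", "Kryten");

        Map<String, String> state = replay(new ByteArrayInputStream(log.toByteArray()));
        System.out.println(state);
    }
}
```

Because the log is just a stream of bytes, it could live in an ordinary file that the user can copy, email or throw in the trashcan. Stopping the replay early would give a crude form of undo, though a real format would also need versioning and some kind of index.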
The case for the defence
I'm going to drop the bullet pointed format for this.
I'm a big fan of the idea of replacing some AJAX type apps, specifically those that mimic the desktop, with rich UI desktop applications which have the ability to 'float'.
By 'float' I mean not be tied to or installed directly on any given user workstation, but following the user around from place to place just like web applications. I see Swing and WebStart as a step towards this goal, although this is just (to paraphrase the fortune cookie) a single step in a journey of a hundred miles.
In such an environment, the notion of a fixed filestore on a specific PC becomes anathema to the floating ideal. We could extend the current filesystem idea to make it global: store your files in a central filestore (or set thereof, perhaps rented from a filestore company or provided by your workplace) and hook up to that whenever and wherever you log in. But another possibility is that in such an ethereal environment, the user might feel happy with dispensing with the file centric 'motif' entirely and moving to something more like a database. So each application hooks into one or more databases where project data is stored, and collaborative work is done without the need to email files around.
Perhaps. Who can tell? I suppose the only way we will know is to try developing genuine 'floating' applications and seeing if OSSAC still makes sense. (Yup, having just invented the OSSAC acronym, I'm going to get my money's worth by using it as often as possible!)
Moving on: one thing databases do provide is integrity: for example they're a cheap way of making persistence atomic — an update either happened or it didn't. "Do, or do not. There is no try", to quote Frank Oz's right hand. This is a valid point (and I'm surprised only a couple of people mentioned it in the feedback) and one which I think probably genuinely does merit the overhead of a database. Yes — perhaps it is time we desktop coders stopped just crossing our fingers and hoping the data we wrote to disk actually arrived? Of course, the down side is we still have a user base more familiar with the notion of files/trashcans/My Documents...
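For comparison, the 'either happened or it didn't' property can be approximated for plain files too. The sketch below is not how Java DB does it, just a common file-based technique: write the new contents to a temporary file in the same directory, then rename it over the target in one step. The class name AtomicSaveSketch and the file names are hypothetical. Two caveats are assumptions worth stating: Files.move with ATOMIC_MOVE may throw AtomicMoveNotSupportedException on some file systems, and whether an existing target is replaced is implementation specific. The sketch also does not force the data to disk before the rename.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: save a document so readers see the old or the new file, never half of one. */
final class AtomicSaveSketch {

    static void save(Path target, byte[] contents) throws IOException {
        // The temp file lives next to the target so the final rename stays on one file system.
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "save-", ".tmp");
        try {
            Files.write(temp, contents);
            // The "commit": a single rename instead of overwriting the target in place.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up the temp file if anything failed.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-save-demo");
        Path doc = dir.resolve("project.txt");
        save(doc, "first draft".getBytes(StandardCharsets.UTF_8));
        save(doc, "second draft".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(doc), StandardCharsets.UTF_8));
    }
}
```

This covers a single file only. Once an update spans several files or records, the bookkeeping grows quickly, which is where the overhead of a real database starts to earn its keep.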
And finally: I've saved the best point till last...
A number of people, Evan Summers in his blog entry for example, made the comment that there is a popular strand of desktop software which involves Swing as a thin wrapper around database CRUD operations. While I've had no professional experience developing such software, I am aware of its existence. As I understand it, these are often (although not exclusively) bespoke developments for company intranets or specific business sectors, rather than general purpose products sold to a mass market.
What I was unaware of is the scale of this market. I knew there were a number of frameworks and other tools emerging which service this area — not having done much in this market I've only glanced over these, and only tried one. Perhaps I underestimated their importance? It sure looks like I may have done.
If, as some claim, there is a genuine and substantial demand for Java to be used for this purpose, then including an RDBMS in with the JDK would be a pragmatic thing to do. Why not? If Java is thriving in this market then I suppose I should back everything and anything which helps make developers' lives easier (and more predictable.) Perhaps I'm just miffed that Java is only finding success on the desktop as a thin layer over CRUD? Perhaps I wished too much to see world class editors, media players, graphics tools and all the other goodness we associate with desktop applications harnessing the power of Java? Maybe I should be more realistic and accept that if Java is ever to make it 'big time' on the desktop, then we need to shore up the areas in which it is seen as a contender?
Did you survive this far? Well thank you for reading.
I said at the start that I wanted to return to the point about JRE bloat in the conclusion. And so here it is: all of this kerfuffle really isn't about an RDBMS; it's about the size of the JRE download and how inflexible that download is. In my next blog (if my fingers ever recover from typing this one!) I want to talk about this issue, along with general issues of firing up Java apps on the desktop. Specifically I want to throw out a few ideas I've had about introducing a small platform neutral bootstrap which would fix many desktop Java problems. The bootstrap would allow libraries and even sections of the JRE to be loaded and cached on demand, meaning that in future we can all have our Java DB cake and eat it!
Thanks for reading. 'Till then...