Posted by rah003
on June 5, 2008 at 10:27 AM PDT
CMS Magnolia: meet the next version. A short look at some of the features coming in the next release.
I have never written about Magnolia on this blog before, so for those of you who have never heard of it, here is a brief summary. If you already know what Magnolia is about, feel free to skip to the next paragraph. Magnolia is a Content Management System (CMS). It is Open Source, and there is a Community Edition for those who need a CMS, are happy to work things out themselves and rely on the developer community for help. For corporate clients who require more reliable support, there is an Enterprise Edition.
Magnolia is written in Java and is based on JSR-170 (Java Content Repository) compatible stores. By default it ships with the JSR-170 reference implementation, Apache Jackrabbit, but it can be set up with any JSR-170 compatible repository. You can think of Magnolia as the glue between the web server and the back-end repository.
When multiple public websites are served from one authoring instance, you want to make sure that they stay in sync as new pieces of content are published. This is exactly what the new transactional activation module in the upcoming Magnolia 3.6 release is about. The module ensures a transaction-safe publishing process for a multi-site setup: Magnolia publishes updated content either to all of your public sites or to none of them.
Of course, one could object that if one of the public sites goes down, you still want to publish to the remaining ones so your readers get the news, but that is a different matter. If one of the servers goes down, you can still remove it from the list of subscribers and happily continue publishing. The point is that removing a public server from the list becomes a conscious decision by the admin within Magnolia, and adding it back goes hand in hand with making sure all the content is republished to it.
So how can activation of content be transactional? Or better still, what is a transaction in this context? The definition is simple: copy a piece of content (or a batch of them) from an authoring instance to all subscribed public instances. Magnolia tries to copy the content to each and every subscriber. If all content has been propagated successfully, Magnolia issues a "commit" command, during which all the temporary data is cleared. If some of the subscribed Magnolia instances don't respond in time or return errors, the authoring instance sends a "rollback" command asking all the other subscribers to restore their previous state.
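The commit-or-rollback decision described above can be sketched from the authoring instance's point of view roughly as follows. This is a minimal illustration, not Magnolia's actual code: the `Subscriber` interface, `ActivationCoordinator`, and the demo-only `FakeSubscriber` are hypothetical names, and the sketch assumes that `rollback()` is safe to call on a subscriber that never stored anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of one public instance as seen from the authoring instance.
interface Subscriber {
    // Transmit phase: returns true if the subscriber stored the content.
    boolean transmit(String path, String content);

    // Every subscriber succeeded: discard temporary data.
    void commit();

    // Something failed somewhere: restore the previous state
    // (assumed safe to call even if nothing was stored).
    void rollback();
}

final class ActivationCoordinator {
    private final List<Subscriber> subscribers;

    ActivationCoordinator(List<Subscriber> subscribers) {
        this.subscribers = new ArrayList<>(subscribers);
    }

    // Returns true if the content was committed on every subscriber,
    // false if the whole transaction was rolled back everywhere.
    boolean activate(String path, String content) {
        boolean allOk = true;
        // Try to copy the content to each and every subscriber.
        for (Subscriber s : subscribers) {
            try {
                if (!s.transmit(path, content)) {
                    allOk = false;
                }
            } catch (RuntimeException e) {
                allOk = false; // an error counts as a failed transmission
            }
        }
        // All or nothing: commit everywhere, or roll back everywhere.
        for (Subscriber s : subscribers) {
            if (allOk) {
                s.commit();
            } else {
                s.rollback();
            }
        }
        return allOk;
    }
}

// Demo-only subscriber that records what happened to it.
final class FakeSubscriber implements Subscriber {
    private final boolean failOnTransmit;
    String state = "idle";

    FakeSubscriber(boolean failOnTransmit) {
        this.failOnTransmit = failOnTransmit;
    }

    @Override
    public boolean transmit(String path, String content) {
        state = "received";
        return !failOnTransmit;
    }

    @Override
    public void commit() {
        state = "committed";
    }

    @Override
    public void rollback() {
        state = "rolledBack";
    }
}
```

With two healthy subscribers, `activate` returns `true` and both end up committed; if either one fails, both are asked to roll back.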
Each activation transaction has multiple phases. It starts with a transmission phase, during which content is transmitted from the Magnolia authoring instance to the public instances, and ends with a collection phase, during which feedback (or the lack of it) is collected from all public instances. On each public instance, these phases can look different depending on what exactly the transaction contains.
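One way to picture the two phases, including the case where a public instance simply never answers, is a parallel send followed by a collection step with a deadline. This is a hedged sketch using the standard `java.util.concurrent` API; `FeedbackCollector`, `transmitAndCollect`, and the idea of modelling each transfer as a `Callable<Boolean>` are assumptions for illustration, not Magnolia's transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FeedbackCollector {
    // Each Callable stands in for "send the content to one public instance
    // and return its answer". Returns true only if every instance answered
    // true before the shared deadline.
    static boolean transmitAndCollect(List<Callable<Boolean>> transmissions, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, transmissions.size()));
        try {
            // Transmission phase: start all transfers in parallel.
            List<Future<Boolean>> answers = new ArrayList<>();
            for (Callable<Boolean> t : transmissions) {
                answers.add(pool.submit(t));
            }
            // Collection phase: wait for feedback until one shared deadline.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            boolean allOk = true;
            for (Future<Boolean> answer : answers) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    if (!Boolean.TRUE.equals(answer.get(remaining, TimeUnit.NANOSECONDS))) {
                        allOk = false; // explicit negative answer
                    }
                } catch (ExecutionException e) {
                    allOk = false; // the instance returned an error
                } catch (TimeoutException e) {
                    allOk = false; // lack of feedback counts as failure
                }
            }
            return allOk;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // abandon any transfer that is still running
        }
    }
}
```

A `false` result is what would trigger the rollback command; using one shared deadline keeps the total collection time bounded by `timeoutMillis` regardless of how many instances are subscribed.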
Let's look at these different scenarios in more detail:
- activation of new content
  - transmit phase: receive the content and store it in the website workspace
  - commit action: nothing
  - rollback action: delete the content from the website workspace
- activation of a new version of previously activated content
  - transmit phase: receive the content, move the existing version to the temporary storage place and replace the current version in the website workspace with the received content
  - commit action: delete the previous version of the content from temporary storage
  - rollback action: delete the new version of the content from the website workspace and restore the previous version from temporary storage
- deactivation of content
  - transmit phase: move the content from the website workspace to temporary storage
  - commit action: delete the content from temporary storage
  - rollback action: restore the content from temporary storage back to the website workspace
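The three scenarios can be condensed into a small public-instance handler. In this minimal sketch, plain `HashMap`s stand in for the website workspace and the temporary storage (in Magnolia these are JCR workspaces), `PublicInstance` and its method names are hypothetical, and each path is assumed to be touched at most once per transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of one public instance. Plain maps stand in for the two workspaces;
// assumes each path is touched at most once per transaction.
final class PublicInstance {
    // Live content served to visitors (the "website" workspace).
    final Map<String, String> website = new HashMap<>();
    // Previous versions and deactivated content parked during a transaction.
    private final Map<String, String> tempStorage = new HashMap<>();
    // Paths that did not exist before this transaction.
    private final Set<String> created = new HashSet<>();

    // Transmit phase for activation (new content or a new version).
    void receiveActivation(String path, String content) {
        String previous = website.put(path, content);
        if (previous == null) {
            created.add(path);               // new content: nothing to park
        } else {
            tempStorage.put(path, previous); // new version: park the old one
        }
    }

    // Transmit phase for deactivation: move the content to temporary storage.
    void receiveDeactivation(String path) {
        String previous = website.remove(path);
        if (previous != null) {
            tempStorage.put(path, previous);
        }
    }

    // Commit: new content needs nothing; parked versions are deleted.
    void commit() {
        tempStorage.clear();
        created.clear();
    }

    // Rollback: delete new content, restore parked versions and deactivated content.
    void rollback() {
        for (String path : created) {
            website.remove(path);
        }
        website.putAll(tempStorage); // overwrites new versions with the old ones
        tempStorage.clear();
        created.clear();
    }
}
```

Because the parked copies live outside `website`, a rollback only has to undo the transaction's own changes, and a commit is just a cleanup of temporary data.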
Now let's look at the implementation details. Since Magnolia uses JCR-based back-end storage, the obvious choice would be to use the versioning capabilities of the repository, which actually work like a charm. You want to publish a new version of the content and preserve the existing one? All you have to do is create a new version of the content, and you can easily restore the previous one during a rollback. However, there is a catch: once content has been deleted, it cannot be restored. So while JCR versioning would work nicely for activation, it does not work for deactivation. Once the latest version of a piece of content has been deleted, it doesn't matter that previous versions of it still exist in the repository, since the JCR will not allow us to restore any of them.
Another option would be to move the content to some parking area of the site and move it back in case of a rollback. That would work, but... hey, this is your website we are talking about! You surely don't want a secret stash of content there. Not only does it look untidy, but such content is potentially exposed to the whole wide world: it can be linked to by other web content, or manipulated by mistake by an administrator who logs in right in the middle of a Magnolia multi-site transaction and has no idea where it came from. So what Magnolia does instead is make this temporary storage place yet another secure workspace. This way it can stash the old versions of content safely away without polluting a live Magnolia website, and it can easily restore them during a rollback if necessary.
The only open issue is that there could still be a brief period of inconsistency on a publicly visible Magnolia server, between the moment the transaction is initiated and the moment it is rolled back (in the failure scenario described above, where content could not be delivered to one of the other public servers). Fortunately, Magnolia solves this problem thanks to the cache mechanism included on each public server. Content is served to website visitors from the cache, and the cache is flushed only after the activation process has completely finished. So during activation, a public server keeps displaying the old cached content to the outside world until all activation work is done.
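The "serve from cache, flush only when activation is over" idea can be sketched like this. The class and method names are hypothetical and the model is single-threaded for clarity. Note that in this simplified model only pages that are already cached are shielded: a page requested for the very first time during a transaction would be rendered from the workspace.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a public instance whose visitors are served from a cache that
// is flushed only once an activation has finished (committed or rolled back).
// Single-threaded for clarity; a real server needs thread-safe structures.
final class CachedPublicSite {
    private final Map<String, String> workspace = new HashMap<>(); // live website workspace
    private final Map<String, String> cache = new HashMap<>();     // pages handed to visitors

    // Visitor request: cache hit, or render from the workspace and remember it.
    // If the workspace has no such page, nothing is cached and null is returned.
    String serve(String path) {
        return cache.computeIfAbsent(path, workspace::get);
    }

    // Used by the activation handler during the transmit phase or a rollback.
    void writeToWorkspace(String path, String content) {
        workspace.put(path, content);
    }

    // Called only after commit or rollback has completed.
    void activationFinished() {
        cache.clear();
    }
}
```

While a transaction is in flight, visitors keep getting the cached page even though the workspace already holds the new content; only `activationFinished()` makes the new version visible.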