Posted by bnewport
on February 22, 2008 at 7:21 AM PST
Differences between disk and memory based XTP and XDP.
Yesterday , I said the most important aspect of an XTP platform is the management. Automatic placing of data on a grid of computers, automatic scale out as new boxes are added and automatic replica count maintenance as boxes fail or are taken out of the grid. It's this aspect which makes extreme possible as it makes writing applications very easy as all the management is taken care of.
Whether to use a memory or disk based XTP platform is the next question. There are currently no disk based XTP platforms for transaction processing that I know of anyway. The memory space has ObjectGrid and gigoherence. The main difference between memory and disk based is a trade off between transaction latency and the total amount of data per node.
A memory based solution can use all the memory on a particular server but no more. It can be setup to push records to disk in an overflow fashion but usually that data isn't available to queries anymore once it's overflowed to disk. The main advantage of a memory solution is that the latency to access data is minimized as all data is already in the address space. If the application can be collocated in the same address space then maximum performance is attained. The only network hops with collocation should be the hop to replicate state and the hop to send the request to the address space.
A disk based solution would have the advantage of capacity. A 1Tb disk costs about 200 bucks now. A Tb of memory is a lot more. With an XTP platform, even machines with a single disk are useful as the XTP platform makes it highly available by creating replicas on other machines. So while a typical blade has about 4 or 8Gb of RAM, a disk based machine could have a Tb of disk. This is a 100x improvement in capacity. Now, the trade off here is latency. You won't get millisecond level transactions with a disk based XTP platform. You're probably talking 100ms or more. But, a 1000 boxes can store 500Tb of data redundantly (1 replica, 1Tb of disk per box). A memory based XTP platform would have 4Tb (8Gb per box, 1 replica).
The best examples of XDP systems (extreme data processing) would be googles big-table or Apache Hadoop. These allow very large data sets to be stored in a grid and use the map-reduce pattern to process it. Collocation isn't so critical it seems in an XDP system when compared with a memory based XTP system. This is mainly due to the disk latencies eliminating the advantage of collocation in address spaces.
Another difference between XDP and XTP is the way partitioning is done. Most memory based XTP systems using a hash based partitioning algorithm. XDP systems like bigtable use a key range based partitioning scheme. Key ranges offer a number of advantages over hash based ones. The first is that it's ordered. This is incredibly useful for processing data in order because data can be streamed in order from a server till that key range is exhausted and then we hop to the server with the next key range and so on. This wouldn't work with a hash based system because the keys are randomly distributed amongst all servers. Fetch data in order would be very expensive as you'd be pulling records from all servers and sorting.
The other main advantage of key range is scaling up. Hash based partition schemes using a fixed partition size and then place partitions on servers. There would be multiple partitions per server. A key range solution has no initial size. We'd start with a single partition with A-Z in it. As we add data then we will split it in to a A-F and G-Z maybe if the data was skewed that way. As more data is added then we split again and again and spread the partitions out on the boxes hosting the grid. This is how google big table works. This looks very attractive and is something I'm considering for a future rev of ObjectGrid, maybe V7.0. It also avoids the issue with hash based partitioning when you have a bad hashing algorithm and lots of records hash to the same partition. The partition starts to fill up and there isn't much you can do. The key range partition solution doesn't have this issue as we can split on any boundary.
So, to summarise, XDP systems have a lot to learn from for their XTP relations. Disk based XTP systems are certainly possible but don't seem to be on the market yet. Memory based XTP systems are here with ObjectGrid and the competitors.