Project Celeste

Posted by glennscott on May 24, 2010 at 8:42 PM PDT
Project URL

Go see Project Homepage

Summary

A peer-to-peer storage system.

Community: 

Published

Celeste is a highly available, ad hoc, distributed, peer-to-peer data store. The system implements semantics for data creation, deletion, and arbitrary reads and writes in a strict-consistency data model.

As in similar systems, Celeste divides data into fragments and replicates each fragment on multiple nodes in the system. Celeste dynamically caches and re-caches fragment replicas to ensure that a minimum number of them exist in the system to protect against loss and that enough fragments are available to recover the entirety of the data. Celeste clients can create, write, and subsequently read data even when some of the nodes, and consequently some of the fragments, are unavailable.
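The general idea of fragmenting and replicating data can be illustrated with a minimal sketch. This is not Celeste's API or its placement algorithm: the names `split`, `place`, `store`, and `read`, the fragment size, the replica count, and the round-robin placement are all assumptions made up for illustration, and nodes are modelled as plain in-memory dictionaries.

```python
FRAGMENT_SIZE = 4   # bytes per fragment (tiny, for illustration only)
REPLICAS = 3        # copies kept of each fragment (assumed value)


def split(data: bytes, size: int = FRAGMENT_SIZE) -> list:
    """Divide data into fixed-size fragments (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place(index: int, node_count: int, replicas: int = REPLICAS) -> list:
    """Pick `replicas` distinct nodes for a fragment (hypothetical round-robin)."""
    return [(index + k) % node_count for k in range(replicas)]


def store(data: bytes, nodes: list) -> int:
    """Write every fragment to its replica nodes; return the fragment count."""
    fragments = split(data)
    for index, fragment in enumerate(fragments):
        for n in place(index, len(nodes)):
            nodes[n][index] = fragment
    return len(fragments)


def read(fragment_count: int, nodes: list, down: set) -> bytes:
    """Reassemble the data from any reachable replica of each fragment."""
    out = []
    for index in range(fragment_count):
        for n in place(index, len(nodes)):
            if n not in down and index in nodes[n]:
                out.append(nodes[n][index])
                break
        else:
            # Every replica of this fragment sits on an unavailable node.
            raise IOError(f"fragment {index} has no reachable replica")
    return b"".join(out)


nodes = [dict() for _ in range(5)]          # five simulated nodes
count = store(b"hello, celeste", nodes)
recovered = read(count, nodes, down={0, 1})  # two nodes are unavailable
```

With three replicas per fragment, the read succeeds while any two nodes are down. A real system would also re-create lost replicas in the background, which this sketch leaves out.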

In contrast to many similar systems, Celeste clients can update any portion of the data and can also delete it. Celeste has a strict data coherency model that is maintained despite multiple simultaneous readers and writers and arbitrary node failures. Data is deleted completely, even when nodes fail or are temporarily absent. Celeste clients work with data using a full complement of operations: create, delete, read, and write, plus administrative controls such as access control. While Celeste provides a global name-space for all data, the name-space can be divided into sub-domains.

Experimental and example applications for Celeste range from small systems maintaining a highly available central repository of metadata for a large storage system to a general-purpose, geographically distributed "cloud" storage system comprising several hundred nodes.

In addition to simply storing and retrieving data, Celeste is also a development platform for writing applications that are distributed across the fragments of the stored data. The distributed components of the application run in parallel, producing results that are collated either as in other contemporary map/reduce systems or in new ways devised by the application programmer.
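As a rough illustration of the map/reduce style of collation mentioned above, the sketch below runs one function over each fragment in parallel and then combines the partial results. It uses only the Python standard library and does not reflect Celeste's actual programming interface; `run` and `count_lines` are hypothetical names, and the fragments are plain byte strings held in memory.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_lines(fragment: bytes) -> int:
    """Map step: count newline bytes within a single fragment."""
    return fragment.count(b"\n")


def run(fragments, map_fn, reduce_fn, initial):
    """Apply map_fn to every fragment in parallel, then fold the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_fn, fragments))
    return reduce(reduce_fn, partials, initial)


# Three fragments of one stored object (made-up data).
fragments = [b"a\nb", b"\nc\n", b"d"]
total = run(fragments, count_lines, lambda x, y: x + y, 0)
```

Counting a single byte value is used here because the answer does not depend on where fragment boundaries fall. Computations that span boundaries, such as word counts, would need extra care.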

Developer applications have access to stored data. They can extend the data access interface and derive new data to store. For example, the read interface can be extended to let an application read image or video files in different formats, with the translation performed as the data is read from the system.
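One way to picture a read interface that translates data on the way out is a store that looks up a registered converter at read time. The sketch below is an assumption-laden toy, not Celeste's extension mechanism: the `Store` class, its `register` method, and the `fmt` parameter are invented for this example, and Base64 encoding stands in for a real image or video format conversion.

```python
import base64


class Store:
    """Toy in-memory store whose read() can translate data on the way out."""

    def __init__(self):
        self._objects = {}
        # "raw" returns the stored bytes unchanged.
        self._translators = {"raw": lambda data: data}

    def register(self, fmt, translate):
        """Add a translator: a function from stored bytes to output bytes."""
        self._translators[fmt] = translate

    def write(self, name, data):
        self._objects[name] = data

    def read(self, name, fmt="raw"):
        """Return the object, translated into the requested format."""
        return self._translators[fmt](self._objects[name])


store = Store()
store.register("base64", base64.b64encode)   # stand-in for a format converter
store.write("greeting", b"hello")
encoded = store.read("greeting", fmt="base64")
```

The stored bytes are never modified. Only the view returned to the caller changes, which is the property the paragraph above describes.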

Celeste originated as a Sun Microsystems Laboratories research project investigating techniques for building large-scale distributed systems that work not only in secured data centers but also in open and potentially hostile environments such as the Internet.

While the system is demonstrable, Celeste is still very much a work in progress. Our goal is to expand Celeste into the areas of distributed processing and storage, building on the foundation developed so far. We invite other developers, users, and researchers to consider Celeste as a platform for their own applications, extensions, and further work.

License: 

GNU General Public License (GPL v. 2.0)