The W3C Social Web Incubator Group is organizing a free Bar Camp at the Sun campus in Santa Clara on November 2nd to foster a wide-ranging discussion of the issues involved in building the global Social Web.
Imagine a world where everybody could participate easily in a distributed yet secure Social Web. In such a world everyone would be able to control their own information, and every business would...
on Oct 26, 2009
Content at: http://blog.arungupta.me/2009/10/hudson-webinar-and-qa-1014-10am-pt/.
on Oct 13, 2009
Java Champion Alan Williamson posted "A Simple Java class for Amazon SimpleSQS": "With such a beautiful service such as the Amazon Simple Queue Service, it shouldn't be wrapped up with a lot of complicated layers of classes for utilizing. That is why I developed the simple POJO, single class method for utilising Amazon SQS from within Java..."
on Aug 31, 2009
Expanding on the fun from my previous blog entry:
I hereby publicly claim that there exists no Java distributed computing framework that is equally flexible, and as fast, as cajo.
I challenge all wizards: Dare thee make me eat mine own words?
Welcome everyone, even teams, pick thee thine favourite magick: EJB, Spring, Jini, CORBA, JXTA, Terracotta, GridGain, etc... or even craft thine own! Let's...
on Jul 26, 2009
MapReduce is a programming model for processing vast amounts of data. One reason it works so well is that it exploits a sweet spot of modern disk drive technology trends. In essence, MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this with accessing data from a relational database that operates at...
on Mar 18, 2008
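As a rough illustration of the sort-and-merge idea described in the MapReduce entry above, here is a minimal in-memory sketch in Java. The class name `SortMergeWordCount` and the word-count task are assumptions made for illustration only; a real framework would stream sorted runs to and from disk rather than sort a list in memory.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: word count expressed as map, sort, then a merging reduce.
class SortMergeWordCount {

    // Map phase: emit one (word, 1) pair per word in the input.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Reduce phase: one sequential pass over the sorted pairs,
    // merging each run of adjacent equal keys into a single total.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> sorted) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        String current = null;
        int sum = 0;
        for (Map.Entry<String, Integer> pair : sorted) {
            if (current != null && !current.equals(pair.getKey())) {
                totals.put(current, sum);
                sum = 0;
            }
            current = pair.getKey();
            sum += pair.getValue();
        }
        if (current != null) {
            totals.put(current, sum);
        }
        return totals;
    }

    // Full pipeline: map, sort by key, then merge.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = map(lines);
        pairs.sort(Map.Entry.comparingByKey());
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("the quick fox", "the lazy dog the end")));
    }
}
```

Because the reduce step only ever reads its input front to back, the same shape carries over to data that does not fit in memory, which is the access pattern the entry credits for MapReduce's efficiency.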
For the past twelve months, I have been involved with the Service Component Architecture (SCA) specifications and two of the open source SCA implementations. Now that SCA is gaining industry traction, I would like to use my weblog here to introduce the technology and demonstrate how SCA can be used for building standards-based, enterprise-class applications using service-oriented principles and...
on Jan 26, 2008
I've bumped into consistent hashing a couple of times lately. The paper that introduced the idea (Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web by David Karger et al.) appeared ten years ago, but recently the idea seems to have been quietly finding its way into more and more services, from Amazon's Dynamo to memcached (...
on Nov 27, 2007
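For readers new to the idea mentioned in the consistent hashing entry above, a minimal sketch in Java follows. The class name `ConsistentHashRing`, the FNV-1a hash function, and the use of several virtual points per node are assumptions chosen for brevity; they are not taken from the paper, from Dynamo, or from any memcached client.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hash ring backed by a sorted map.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicas; // virtual points placed on the ring per node

    ConsistentHashRing(int replicas) {
        this.replicas = replicas;
    }

    // 64-bit FNV-1a; picked only because it is short (an assumption).
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Place the node's virtual points on the ring.
    void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove only this node's points; all other points stay where they are.
    void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's hash to the first node point, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

The property worth noticing is that removing a node only reassigns the keys that node owned; keys owned by the remaining nodes keep their owner, which is what makes the scheme attractive for caches.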
I am very pleased to announce a most significant breakthrough from the cajo project in the ease with which distributed computing can be accomplished in Java, and in only 20 kilobytes. It works with all JREs, 1.3 and later. (And before you Rocket Scientists out there ask: yes, it's also 64-bit clean ;)
Just three methods (click the link for greater detail):
void export(Object object);
on Sep 3, 2007
I've raved about the MapReduce parallel programming model in the past, and Apache Hadoop (the framework for running MapReduce applications), and Amazon's compute and storage web services (EC2 and S3). Now I've written an article, Running Hadoop MapReduce on Amazon EC2 and Amazon S3, about using them all together to do some data crunching.
The nice thing is that you can fire up a fair-sized...
on Jul 20, 2007
SalutafugiJMS is a peer-to-peer implementation of the Java Message Service specification that uses ZeroConf DNS-SD discovery and TCP sockets to communicate in a distributed computing system. I built it after seeing Daniel Steinberg's JavaOne talk on ZeroConf. SalutafugiJMS uses SomnifugiJMS as a skeleton and Apple's Bonjour implementation of ZeroConf for muscle inside special SomnifugiJMS...
on Jun 24, 2007
There's a small presence at JavaOne... but an advanced research project at Sun, "Project Caroline", is gaining some *real* interest at the conference.
Project Caroline is a hosting platform designed to support SaaS providers in the development and delivery of dynamically scalable Internet-based services. The key idea around the platform is to present a pool of distributed compute, storage, and...
on May 8, 2007
OK, I think I've spent enough time on preliminaries, so this time I'm going to show you some UML diagrams and code. I also have to introduce you to Emmanuele Sordini, one of my best friends and co-author of the Mistral project. Emmanuele is an engineer like me (but he's more on the C++ side) and an amateur photographer like me (but he's more into astronomical photography), and some months ago he told me...
on Nov 21, 2006
In March I wrote of affordable web-scale computing:
I would love an API that exposes Google's MapReduce, a simple programming model for crunching on large datasets. You can write and run MapReduce programs today, using Hadoop, but it's only really useful if you have enough machines at your disposal. The pay-as-you-go model of S3 (and Sun Grid) would be very attractive to developers who want...
on Aug 24, 2006
In case you haven't heard of it, Amazon S3 is a web service for storing data.
The two great things about it are that it's simple (look at its nice REST API), and it's cheap (with a pay-as-you-go charging model).
This latter point explains the growing number of startups that are using it to launch new business ventures: no data silos to maintain, and pay by the gigabyte.
My favourite innovative...
on Aug 13, 2006
With the launch of Amazon S3 (Simple Storage Service) we are seeing a continuation of the trend for the big web companies to monetize their computing infrastructure by opening it up to developers.
It is probably only a matter of time before we see Google create something similar, which would essentially be a limited public interface onto the Google File System.
I would love an API that exposes...
on Mar 17, 2006
In a previous blog I wrote about Nutch's MapReduce implementation for distributed processing of massive data sets. This, and the closely related Nutch Distributed File System (renamed Hadoop Distributed File System), have now been moved into a standalone project called Hadoop.
According to Doug Cutting, who created Hadoop (as well as Lucene and Nutch), the name comes from:
The name my kid...
on Feb 8, 2006
Doug Cutting has done it again. The creator of Lucene and Nutch has implemented (with Mike Cafarella and others) a distributed platform for high-volume data processing called MapReduce.
MapReduce is the brainchild of Google and is very well documented by Jeffrey Dean and Sanjay Ghemawat in their paper MapReduce: Simplified Data Processing on Large Clusters. In essence, it allows...
on Sep 25, 2005
This InfoWorld article (http://www.infoworld.com/article/05/07/20/HNmeshnetworks_1.html?source=NLC-TB2005-07-20) reports that 15 competing proposals will be whittled down to create the new IEEE 802.11s specification.
What's so great about a mesh network topology? A mesh network is a network in which the routing of messages is performed as a decentralized, cooperative process involving many peer...
on Jul 21, 2005
One of the promises of J2EE or Web Services is to allow individual "components" to be discovered and reused to form new business functions. But in reality, things are usually a bit more complicated than this.
In addition to matching the functional requirements of the component, one would need to match the non-functional requirements before a component can be reused. For example:
on Aug 30, 2004
Some of you may know me as the host of the cajo project. In fact, the topic of my blog entry today is that, thanks to java.net, there are a lot more of you than I thought!
I was just informed about the logger project; it allows java.net project owners to view access statistics for their projects. What I found out astonished me. So much so, that I thought it important to share this discovery with...
on Aug 19, 2004