Posted by turbogeek
on August 12, 2003 at 11:16 PM PDT
Small World theory proven again. We are linked to almost everyone on earth by a chain of as few as 6 people. Why is that important to a Java developer?
It's a Small, Small World After All
On August 7, 2003, the BBC reported on an article published in Science about an experiment showing that, by simply forwarding email among acquaintances, it takes a chain of only five to seven emails to reach a specific email user unknown to the original emailer. Why is that important? It helps prove that you can write P2P applications that let you find people or information within at most six steps. This is known as Small World (first tested by psychologist Stanley Milgram) or by its more popular moniker, Six Degrees of Separation (a.k.a. the Kevin Bacon theory).
Six Degrees = A Lot of People
But there is a problem with Six Degrees. A lot of people were actually emailed in the course of the search. Think of a pyramid. You email ten friends, and they email ten friends, who email ten friends. With each person having ten friends, about 1,110 emails might be sent at three levels. At six degrees it is a maximum of 1,111,110 emails. If each person knew 100 people, at six degrees of separation it would be 1,010,101,010,100 people. More than enough to reach everyone on the planet. That's a lot of spam and a lot of network traffic. See the problem?
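The pyramid arithmetic can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes the worst case where nobody shares a friend, so every level multiplies by the full friend count.

```java
// Hypothetical helper for illustration; not part of JXTA or any library.
final class FanOut {
    /**
     * Worst-case number of emails if every person forwards to `friends`
     * people who have not been contacted before, for `depth` levels.
     */
    static long messages(long friends, int depth) {
        long total = 0;
        long level = 1;
        for (int i = 0; i < depth; i++) {
            level = Math.multiplyExact(level, friends); // emails sent at this level
            total = Math.addExact(total, level);        // running sum of all levels
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(messages(10, 3));  // 1110
        System.out.println(messages(10, 6));  // 1111110
        System.out.println(messages(100, 6)); // 1010101010100
    }
}
```

The exact-arithmetic calls throw instead of silently overflowing if you try much larger friend counts or depths.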
Searching for a Friendster
The six degrees problem depends on who or what you are looking for. It could be a specific person, a type of person, someone at a location, or a person who has something you want. The experiment would probably have failed if repeated too many times because the emails would be treated as an intrusion and thus spam.
www.friendster.com/ is a similar idea in connectivity (join and look for firstname.lastname@example.org). Friendster is of course a web site with a large database. They also have to pay for big servers and a lot of bandwidth. What if we duplicated a similar idea in JXTA? How could it be done?
Here is the concept. I have a JXTA pipe listening on an address that is the equivalent of my email address (one of the few things as unique as your social security number). I have a list of my friends and their email addresses. If you know me, then you know my email address and can access my friends in my address book. With access to the book, you can connect to a friend's address book to continue your search if I don't have the person you are looking for in mine.
To use the system, just connect to the pipe with my email address and ask for a contact with a person you are looking for. If I have it, I return it. If I do not have it, I can ask my friends via the same mechanism.
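Here is a rough sketch of that lookup in plain Java. It is an in-memory stand-in, not JXTA code: a real peer would receive the request over a pipe, while here a method call plays that role. The class name, the example addresses, and the "already asked" set (added so friends who list each other do not loop forever) are all my assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory peer; a method call stands in for a pipe message.
final class AddressBookPeer {
    final String email;                                                   // address others use to reach me
    private final Map<String, String> contacts = new HashMap<>();         // name -> email
    private final Map<String, AddressBookPeer> friends = new HashMap<>(); // email -> friend's peer

    AddressBookPeer(String email) { this.email = email; }

    void addFriend(String name, AddressBookPeer friend) {
        contacts.put(name, friend.email);
        friends.put(friend.email, friend);
    }

    /** If I have the contact, return it; otherwise ask my friends the same way. */
    Optional<String> find(String name) {
        return find(name, new HashSet<>());
    }

    private Optional<String> find(String name, Set<String> asked) {
        if (!asked.add(email)) return Optional.empty(); // this peer was already asked
        String hit = contacts.get(name);
        if (hit != null) return Optional.of(hit);
        for (AddressBookPeer friend : friends.values()) {
            Optional<String> answer = friend.find(name, asked);
            if (answer.isPresent()) return answer;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        AddressBookPeer me = new AddressBookPeer("me@example.org");
        AddressBookPeer ann = new AddressBookPeer("ann@example.org");
        AddressBookPeer bob = new AddressBookPeer("bob@example.org");
        me.addFriend("Ann", ann);
        ann.addFriend("Bob", bob);
        ann.addFriend("Me", me); // friends often list each other
        System.out.println(me.find("Bob").orElse("not found")); // bob@example.org
        System.out.println(me.find("Zed").orElse("not found")); // not found
    }
}
```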
Too Many Friends
There is a slight flaw in this straightforward design. If we have a lot of people in the network, the number of messages could be quite large. In fact, this is the same problem Gnutella has with its search mechanism. But the problem is not too hard to solve. Just put more information on each peer and set limits on how deep searches can go. Start by replicating your friends' address books.
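To see what a depth limit buys you, here is a small sketch that counts how many distinct peers a query touches when it may travel at most a given number of hops. The friend graph is a toy map of made-up names, and the breadth-first walk is my choice for illustration; it says nothing about how JXTA or Gnutella route queries internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: counts distinct peers reached within a hop limit.
final class HopLimitedSearch {
    static int peersContacted(Map<String, List<String>> friendsOf, String start, int maxHops) {
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        seen.add(start);
        frontier.add(start);
        int contacted = 0;
        // Each pass of the outer loop is one more hop away from the start.
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String peer : frontier) {
                for (String friend : friendsOf.getOrDefault(peer, List.of())) {
                    if (seen.add(friend)) { // count each peer only once
                        contacted++;
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return contacted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friendsOf = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("a", "d"),
            "c", List.of("d"),
            "d", List.of("e"));
        System.out.println(peersContacted(friendsOf, "a", 1)); // 2
        System.out.println(peersContacted(friendsOf, "a", 2)); // 3
        System.out.println(peersContacted(friendsOf, "a", 3)); // 4
    }
}
```

Note that this counts peers, not messages. Without the "seen" check, peers with shared friends would be asked more than once.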
By duplicating address books, we decrease the number of hops for each connection. Since there is a tendency for friends to flock together and thus have each other's email addresses, we also reduce duplicate searches. So, in just one hop we access both the close group of friends and those who are looser acquaintances (we know which is which by the frequency of duplicated entries among the friends).
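One way to read "frequency of duplicated entries" is to count, across the replicated books, how many of them contain each address. The sketch below does just that. The class name and sample addresses are invented, and treating a higher count as "closer" is my interpretation rather than a tested heuristic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: how often does each address show up across my friends' books?
final class ReplicatedBooks {
    /** Maps each address to the number of replicated books that contain it. */
    static Map<String, Integer> overlap(Map<String, List<String>> booksByFriend) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> book : booksByFriend.values()) {
            // A set ensures one book cannot count the same address twice.
            for (String address : new HashSet<>(book)) {
                counts.merge(address, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> books = Map.of(
            "ann@example.org", List.of("bob@example.org", "cat@example.org"),
            "bob@example.org", List.of("ann@example.org", "cat@example.org"),
            "cat@example.org", List.of("dan@example.org"));
        Map<String, Integer> counts = overlap(books);
        System.out.println(counts.get("cat@example.org")); // 2
        System.out.println(counts.get("dan@example.org")); // 1
    }
}
```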
Duplicating address books does add a little bit of storage. For example, 100 friends, each with 100 of their own friends, at, say, 256 bytes per entry comes to about 2.5 MB. This seems like a small number with today's disk drives (my little Apple laptop has a paltry 60 GB). Yes, it is a waste of space to a certain degree, but it is cheaper than a terabyte of disk space on a centralized web server. It also means you and your friends have a great backup mechanism for those all-important address books. The best benefit, though, is that you save a little less than ten thousand connections for each peer contacted.
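The storage estimate is easy to redo for your own numbers. A tiny sketch, with a made-up class name and the 256 bytes per entry taken as the same guess used above:

```java
// Hypothetical helper for the back-of-the-envelope storage estimate.
final class ReplicaSize {
    static long bytesNeeded(int friends, int entriesPerFriend, int bytesPerEntry) {
        // Widen to long before multiplying so large inputs do not overflow int.
        return (long) friends * entriesPerFriend * bytesPerEntry;
    }

    public static void main(String[] args) {
        long bytes = bytesNeeded(100, 100, 256);
        System.out.println(bytes); // 2560000
        System.out.printf("%.2f MB%n", bytes / 1_000_000.0);
    }
}
```

That is 10,000 replicated entries, which is where the "little less than ten thousand connections" saved comes from.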
Of course there would be more to do. Security is one of the more important pieces. In addition, if we are really going to go after true six degrees, that's still a lot of connections. Luckily, the six degrees network often finds the right guy within just two or three hops. No one said software was a simple profession.
The numbers can probably be made much better. With a few networking techniques and ways to qualify the route taken, the search can be optimized. Of course I'm just talking about people in this example. If we were looking for a specific type of person (like an accountant or politician), the search space would be much smaller. It's smaller because the odds are that there is a lawyer or accountant known by at least one of your friends.
It's a Big Gulp World
Small worlds are a little easier to see when looking for types of things like the accountant or politician. Small world is a slightly different idea than pure six degrees of separation. A better example might be Starbucks Coffee, McDonald's, 7-Eleven, and Wal-Mart. How close are you to one of these businesses? In effect, you are as close to one of these as are a great number of people throughout the United States and increasingly throughout the world. The network of the franchise has made the world much smaller. For example, you may need to drive only a few blocks to get a Coke at a 7-Eleven rather than always driving to Dallas, Texas, where the original store was located. Your world is effectively smaller.
What if we replace the idea of convenience store with a peer? The peer is often just a small computer with its limited resources. A 7-Eleven is small and has a limited selection and inventory. What types of things can we put in a peer that are similar to the convenience store?
We can create small worlds in two ways. The first is to locate information geographically. For example, a computer with Starbucks locations would only have addresses within a few miles of the PC (actually a large number in most neighborhoods). Another method is to repeat information. Repeating info is the same idea as the franchise or chain, where each location looks identical to the next except that it is conveniently located near you wherever you are. Convenience would be in relation to your geographic location via the network, LAN, ISP, etc.
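For the geographic approach, a peer might keep only the entries within some radius of itself. The sketch below filters a map of locations by great-circle distance using the haversine formula. The class name, the store names, the coordinates, and the radius are all invented for illustration, and the Earth is treated as a perfect sphere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep only the locations close to this peer.
final class NearbyFilter {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical approximation

    /** Great-circle distance in kilometers (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Names of stores (name -> {latitude, longitude}) within radiusKm of the peer. */
    static List<String> within(Map<String, double[]> stores,
                               double lat, double lon, double radiusKm) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> e : stores.entrySet()) {
            double[] p = e.getValue();
            if (distanceKm(lat, lon, p[0], p[1]) <= radiusKm) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative coordinates only, not real store locations.
        Map<String, double[]> stores = Map.of(
            "corner store", new double[] {32.80, -96.80},
            "distant store", new double[] {40.00, -75.00});
        System.out.println(within(stores, 32.78, -96.80, 10.0)); // [corner store]
    }
}
```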
Six Degrees and Small World networking have a lot of promise. But what can you really do with this type of software? What are other applications worth pursuing? Since P2P is here to stay, what can we use it for? I have a few ideas, but what are yours?