
IDFactory weird behavior

lepacheco
Offline
Joined: 2008-04-02
Points: 0

Hi! I'm using JXSE 2.5 and I found something strange while using the IDFactory. It seems that the methods that take a seed for generating IDs use only the first 16 bytes of the seed. For example:

import net.jxta.id.IDFactory;

class Test {
    public static void main(String[] args) {
        System.out.println(IDFactory.newPeerGroupID("1234567890123456something".getBytes()));
        System.out.println(IDFactory.newPeerGroupID("1234567890123456whatever".getBytes()));
        System.out.println(IDFactory.newPeerGroupID("123456789012345different".getBytes()));
    }
}

The IDs generated by the first two newPeerGroupID calls are exactly the same... while the third is different. Is this supposed to happen? In the API docs it says:

" seed - The seed information which will be used in creating the PeerGroupID. The seed information should be at least four bytes in length, though longer values are better. "

Is this a bug? If not, the documentation should state that only the first sixteen bytes are used.

bondolo
Offline
Joined: 2003-06-11
Points: 0

You are right, the documentation on the IDFactory is incomplete. However, fixing it is not as easy as adding the documentation for IDFactory class. The IDFactory creates IDs using the id provider appropriate for the peergroup (all ids which specify a peergroup will use the same id type as the peer group). You will notice that for a few id types you need to specify the id provider.

In the default case the id provider used is the 'uuid' type. The documentation for the uuid idtype does indicate that it uses only the first 16 bytes. However, since this documentation is on an implementation sub-class and not the factory it is understandable that it is not clear. The 16 bytes of seed are intended to be used with a cryptographic hash function. uuids are fixed size 128 bits and the seed size matches this.
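To illustrate why the first two seeds in the question collide, here is a minimal sketch that compares only the first 16 bytes of each seed. It does not call JXSE at all; it only models the truncation described above, and the class name PrefixDemo and the helper first16 are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixDemo {
    // Hypothetical helper: keeps only the first 16 bytes of a seed,
    // mimicking the truncation described for the 'uuid' id type.
    public static byte[] first16(String seed) {
        byte[] bytes = seed.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(bytes, 16);
    }

    public static void main(String[] args) {
        byte[] a = first16("1234567890123456something");
        byte[] b = first16("1234567890123456whatever");
        byte[] c = first16("123456789012345different");

        // a and b share the prefix "1234567890123456"
        System.out.println(Arrays.equals(a, b)); // prints "true"
        // c differs at the 16th byte ('d' instead of '6')
        System.out.println(Arrays.equals(a, c)); // prints "false"
    }
}
```

Seeds that agree on their first 16 bytes look identical to a provider that reads no further, which matches the behavior reported in the question.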

So, what to do for the case you show: calculate a SHA-1 or SHA-256 hash of the string and pass the digest result as the seed. There were a couple of different examples of this written in the past. Additionally, you need to be careful converting the string to bytes to ensure that the value is canonical. getBytes() should be replaced with getBytes("UTF-8"). getBytes() should never be used for anything that passes between computers, because it relies on the system default encoding, which varies widely between operating systems and countries. It may also be necessary to further normalize the input string by converting it to lowercase, removing whitespace, etc.
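A minimal sketch of that approach, under some assumptions: the class name SeedExample and the helper seedFor are hypothetical, the normalization (trim plus lowercase) is only one possible choice, and StandardCharsets.UTF_8 (Java 7 and later) stands in for getBytes("UTF-8"). The block runs without JXSE; the IDFactory call appears only in a comment.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

public class SeedExample {
    // Hypothetical helper: normalizes the name, encodes it as UTF-8,
    // and returns its SHA-256 digest (32 bytes) for use as a seed.
    public static byte[] seedFor(String name) {
        // Assumed normalization: trim surrounding whitespace, lowercase.
        String canonical = name.trim().toLowerCase(Locale.ROOT);
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return sha256.digest(canonical.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = seedFor("1234567890123456something");
        byte[] b = seedFor("1234567890123456whatever");

        // Compare the first 16 bytes of each digest; unlike the raw
        // strings, these are expected to differ.
        System.out.println(
            Arrays.equals(Arrays.copyOf(a, 16), Arrays.copyOf(b, 16)));

        // With JXSE on the classpath, the digest would be passed as the seed:
        // IDFactory.newPeerGroupID(a);
    }
}
```

Because the digest mixes the whole input into every output byte, the 16 bytes that the uuid provider reads depend on the entire string rather than just its prefix.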

HTH,

Mike