
What's new in Mustang?

29 replies [Last post]
linuxhippy
Offline
Joined: 2004-01-07

Hi there,

I would be interested to know which improvements have been made to C2 apart from escape analysis - there has been no documentation at all :-/
C1 has undergone some tuning, which is great (though maybe that effort should have been spent on tiered compilation), but no one mentions C2.

Last but not least, a small microbenchmark running the encryption/decryption code we use in our server product:

Sun 1.3.1_16 Client: 0.39 mb/s
Sun 6.0_b72 Client : 0.53 mb/s
JRockit 5.0 : 0.68 mb/s
Sun 1.3.1_16 Server: 0.80 mb/s
Sun 6.0_b72 Server : 0.92 mb/s
IBM-1.4.2 : 0.95 mb/s

Every test run first processed 50 MB of data for warm-up and then the 1 GB which was benchmarked.
So for this test the latest hotspot is still beaten by the 2 or 3 year old IBM JIT :p

lg Clemens

Message was edited by: linuxhippy

linuxhippy
Offline
Joined: 2004-01-07

Hi again,

> Thanks for looking through the work :D
> Yes in the original version the instance variables si
> and x1a2 are not initialised at all, so I've set them
> both to 0 at the start of a run ;)
... hey, I also know a little bit about Java ;)

> Would you mind posting the bugIDs once they are out
> so that I can "watch" them as well? ;)
Yes of course, but it may take a while. You know, "we have a three-week average response time"...

By the way, +5mb/s, quite impressive!
Yes, Hotspot Server does an excellent job; it just should not be slower than hotspot 1.3.1 ;)

lg Clemens

olsonje
Offline
Joined: 2005-08-10

1.6 b71 server hits 2.7 for me on a p4-2.8 :/

alexlamsl
Offline
Joined: 2004-09-02

> 1.6 b71 server hits 2.7 for me on a p4-2.8 :/

That's really peculiar, if you are using my modified version that is ;)

Right I've just downloaded Java SDK 1.3.1_17, so here is the full set of results on my computer:
[pre]
HotSpot Server:
1.3.1_17: 4.93 +/- 0.02 MB/s
1.5u6: 5.87 +/- 0.04 MB/s
1.6b70: 5.846 +/- 0.005 MB/s

HotSpot Client:
1.3.1_17: 2.311 +/- 0.011 MB/s
1.5u6: 2.3660 +/- 0.0012 MB/s
1.6b70: 2.688 +/- 0.005 MB/s
[/pre]

olsonje
Offline
Joined: 2005-08-10

Yours is roughly 2.42 MB/s with client, 3.59 MB/s with server.

alexlamsl
Offline
Joined: 2004-09-02

Somehow your client has similar performance to mine, but your server is somewhat slower... that's curious :-/

alexlamsl
Offline
Joined: 2004-09-02

Hi there,

Mind comparing the performance of 1.3.1 and Mustang with the following piece of code?

On my Mustang Client this runs at 2.757 ± 0.017 MB/s (compared with the original code, which runs at only ~2.1 MB/s - so it looks like either Mustang Server has performance comparable to that of Client, or my test machine is a bit more powerful ;) )

http://www.srcf.ucam.org/~sll51/PC1_Stream_tuned.java

Thanks,
Alex.

linuxhippy
Offline
Joined: 2004-01-07

Hi again, well here are the results:

IBM-1.4.2: 3.5318217136398955mb/s
Sun 1.6 server: 3.2746086842622306mb/s
Sun 1.3.1 server: 2.4560369387955596mb/s

lg Clemens

PS: Are you sure this code does the same thing? ;)

alexlamsl
Offline
Joined: 2004-09-02

that's what I'm aiming for :D

at least I'm doing the analysis step by step to minimise possible mistakes ;)

linuxhippy
Offline
Joined: 2004-01-07

You get all my respect. I just reviewed your optimized version and it must have taken ages to perform all those changes. Wow, what a difference, keeping the original source in mind.
Unfortunately the encrypted result is not the same, although the output is correct again. It seems you introduced some difference which corrects itself ;)

I filed two bugs with the original source:

1.) Performance regression between 1.3.1 and 1.6 (1.3.1 performs 10% better than 1.6, IBM outperforms both).
It should not be the case that an older version of hotspot outperforms a newer one. This has no priority for me, but I guess the hotspot team is happy to get reports about performance regressions.

2.) Escape analysis slows down arithmetic code.
This behaviour is only visible with the original (not tuned) source. EA should not have any negative impact on the generated code, but currently code generated with EA runs about 10% slower.

Thanks a lot for all the input and especially for the suggestion to use local variables! This change was a 5min fix and now I get 1mb/s more throughput making encryption a no-brainer in performance terms :)
Thanks!

lg Clemens

alexlamsl
Offline
Joined: 2004-09-02

> You get all my respect, I just reviewed your
> optimized version and it must have taken ages to
> perform all those changes. Wow what a difference
> keeping the original source in mind.
> Unfortunately the encrypted result is not the same,
> although the output is correct again. Seems you
> introduced some difference which corrects itself ;)

Thanks for looking through the work :D
Yes in the original version the instance variables si and x1a2 are not initialised at all, so I've set them both to 0 at the start of a run ;)

Would you mind posting the bugIDs once they are out so that I can "watch" them as well? ;)

alexlamsl
Offline
Joined: 2004-09-02

OK I've just discovered that JDK has Server HotSpot in it....

So with my P4 Prescott 3GHz, WinXP Pro and Mustang build 70, I've got the following result with the modified version as posted above:

Client: 2.792 ± 0.005 MB/s
Server: 5.78 ± 0.03 MB/s

So IMO the Server HotSpot is doing an excellent job ;)

jarouch
Offline
Joined: 2004-03-04

Do you have a build older than 59? There is said to be a big difference between 58 and 59 (http://weblogs.java.net/blog/opinali/archive/2005/11/mustangs_hotspo_1.html). I am just curious ;)

linuxhippy
Offline
Joined: 2004-01-07

Hi Alex and thanks a lot for your investigation!

> Right I think I know where the core problem is -
> variables declared as Instance Fields while they can
> be Local Variables!!!
Well, this takes me back to the times when we had to think about how the JIT would understand things and what we could do to make life easier for it.
This code is really simple and IBM's mixed-mode JVM optimizes it VERY well, whereas Hotspot completely fails (in my eyes) to produce something that performs well - so this is something which can/could/should be done by the JVM. It shouldn't even be that tricky to implement.
It even worked much better in hotspot 1.3.1.

Thanks a lot for figuring that out, I'll change the code to local variables since it's a free performance improvement :)
I just submitted this sample code since tmarble was interested in where the 30kb/s difference came from, and it turned out to be a bit larger on my P4.

> Even by fixing a few of them gives me (2.434 ± 0.007)
> MB/s!
So with all those source modifications we are only a bit away from what IBM generates without modifications ;)

> And that probably explains why Escape Analysis wasn't
> working well - what you have here is literally the
> worst case scenario for EA...
I thought EA is currently only used to perform lock removal (e.g. removing monitorenter/monitorexit when they have no effect) and some very simple optimization enhancements. In my understanding EA should have almost no effect on the code above, or the code should run slightly faster, but not ~10% slower...
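
To illustrate what "lock removal" refers to here, a minimal sketch of code that is a textbook candidate for it. The class `LockElisionCandidate` and the method `join` are made up for this example. `StringBuffer.append` is synchronized, but the buffer never leaves the method, so no other thread could ever contend for its monitor. Whether a particular Mustang build actually removes these locks is an assumption this sketch cannot demonstrate; it only shows the shape of the code.

```java
final class LockElisionCandidate {
    /**
     * Concatenates two strings through a StringBuffer that never escapes
     * this method. Every append() acquires the buffer's monitor, yet no
     * other thread can ever see the buffer, so the locking has no effect.
     */
    static String join(String a, String b) {
        StringBuffer sb = new StringBuffer(); // local, non-escaping object
        sb.append(a);                         // synchronized method
        sb.append(b);                         // synchronized method
        return sb.toString();                 // only the String escapes
    }

    public static void main(String[] args) {
        System.out.println(join("escape", "-analysis"));
    }
}
```

The arithmetic PC1 code in this thread contains no such synchronized, non-escaping objects, which is consistent with the expectation that EA should leave it more or less alone.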

Thanks for the code, I'll test it.
Just out of curiosity, won't removing the 2 if-statements break the code?

Thanks, lg Clemens

PS: I am again disappointed by the P4, especially by your Prescott. The Duron has 800 MHz/64 KB L2 whereas the Prescott has 3 GHz/1 MB L2 and operates just twice as fast :-/

Message was edited by: linuxhippy

alexlamsl
Offline
Joined: 2004-09-02

> Well, this takes me back to the times when we had to
> think about how the JIT would understand things and
> what we could do to make life easier for it.
> This code is really simple and IBM's mixed-mode JVM
> optimizes it VERY well, whereas Hotspot
> completely fails (in my eyes) to produce something
> that performs well - so this is something which
> can/could/should be done by the JVM.

I think this is just a case where the current HotSpot shifts things in the other direction, i.e. with the new code I expect HS to outperform IBM's JVM.

> It even worked much better in hotspot 1.3.1

And that is where I get the above idea from ;)

> Thanks a lot for figuring that out, I'll change the
> code to local variables since it's a free
> performance improvement :)

And this is, I believe, a better programming pattern as well; in fact, throughout I was not aiming to make the code faster - I was simply rewriting it the way a Java programmer would write it.

So in a way, going from 1.46 MB/s to 2.95 MB/s (oh yes :) ) just by writing it more "naturally" in Java shows that the HotSpot team's efforts are probably highly targeted, and probably in our (OK, at least my) favour as well.

> So with all those source-modifications we are only a
> bit away from what IBM does generate without
> modifications ;)

And bear in mind I'm only using HotSpot Client.

> I thought EA is currently only used to perform
> lock removal (e.g. removing monitorenter/monitorexit
> when they have no effect) and some very simple
> optimization enhancements. In my understanding EA
> should have almost no effect on the code above, or
> it should run slightly faster, but not ~10% slower...

Oh OK... as discussed earlier in some thread in the "Java SE", we weren't totally sure what was done in Mustang to utilise EA. Someone thought EA is currently used to put identified objects into the finaliser queue, hence reducing the GC's workload...

> Thanks for the code, I'll test it.
> Just out of curiosity, won't removing the 2
> if-statements break the code?

that's what I mean by a WTF as in Daily WTF (http://thedailywtf.com/)
[code]
int ax, bx;
/* ... */
if (ax != 0) {
    ax = ax * bx;
}
/* ... */
[/code]
(to make things sound worse, bx can actually be final since it only ever holds a constant...)
And before you (or the original author) want to explain yourself - yes in x86 Assembly I would use the quick JNZ to escape from the more expensive multiplication.
(second thought - wouldn't that make modern processor micro-optimisations like OOE and branch predictions, and even HT, less effective?)
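
A quick way to convince yourself that the guard is redundant for the result: when `ax` is 0 the product is 0 anyway, so both variants always agree. The class and method names below are hypothetical, and the sketch only checks the result, not which variant runs faster on a given CPU or JIT.

```java
final class RedundantGuard {
    /** The pattern from the original source: multiply only when ax is non-zero. */
    static int guarded(int ax, int bx) {
        if (ax != 0) {
            ax = ax * bx;
        }
        return ax;
    }

    /** The simplified form: always multiply; 0 * bx is 0, so nothing changes. */
    static int unguarded(int ax, int bx) {
        return ax * bx;
    }

    public static void main(String[] args) {
        final int bx = 0x4e35; // the constant used in the PC1 code
        boolean same = true;
        for (int ax = -100000; ax <= 100000; ax++) {
            if (guarded(ax, bx) != unguarded(ax, bx)) {
                same = false;
            }
        }
        System.out.println("guarded == unguarded for all tested ax: " + same);
    }
}
```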

> PS: I am again disappointed by the P4, especially
> by your Prescott. The Duron has 800 MHz/64 KB L2
> whereas the Prescott has 3 GHz/1 MB L2 and operates
> just twice as fast :-/

Really? :O
Would you mind posting the relevant numbers to disappoint me then? :-/

linuxhippy
Offline
Joined: 2004-01-07

Hi again,

I really appreciate your interest in this topic. Thanks a lot!
I just discovered when browsing your source how much energy you must have put into this. Great - thanks!

> I think it is just the case where the current HotSpot
> offset things in the other direction, i.e. with the
> new code I expect HS to outperform IBM's JVM.
Sorry ... I did not read your code carefully until now, and I discovered that you're right - brackets are missing.
This more or less means your optimized code is wrong :-/

> And this is, I believe, better programming patterns
> as well; in fact, what I was doing throughout was not
Yes of course, I just provided that sample to show the HS engineers where performance improvements are doable.
To be honest I never thought that anybody would ask because of a 2% difference in throughput ;)

> that's what I mean by a WTF as in Daily WTF
*ooops*, yes that's ... my fault, since the source looks quite different :-/

lg Clemens

olsonje
Offline
Joined: 2005-08-10

So if the code posted was missing stuff can we get the correct code up? I was trying to run this myself and all I got was errors! :P

In all honesty, I would like to see what the real code, without missing pieces, is and look at it myself as well. I don't know much about performance, but I love to learn. :)

linuxhippy
Offline
Joined: 2004-01-07

Hi again,

I uploaded the files to:
http://web460.server3.webplus24.de/PC1_Stream.java
http://web460.server3.webplus24.de/PC1_Stream_tuned.java

PC1_Stream.java is the original, PC1_Stream_tuned.java is a slightly tuned version which just uses local variables.
I did not change the encryption algorithm, so maybe there is still room left.

Please understand that I just provide this source to give an example where hotspot does not perform so well.

These are the new numbers for the tuned version:
Hotspot 1.3.1 Server: 3.08mb/s
IBM 1.4.2: 2.8mb/s
Hotspot 6.0/b72 Server: 2.12mb/s

So Hotspot 1.3.1 is still faster than 6.0b72, but now also faster than IBM. I'll never trust JITs again ;)

If I copy the instance variable "si" into a local one and write it back at the end of code(), I am able to achieve 2.65mb/s with C2/6.0. So most of the performance problems seem to come from hotspot's slow access to member variables.
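
The local-copy pattern described above can be sketched as follows. This is an illustration, not the PC1 code: the class `FieldVsLocal`, its field `state` and the mixing arithmetic are hypothetical; only the pattern (read the field once, work on a local, write it back at the end) comes from the post. How much it helps, if at all, depends on the JVM and the JIT in use.

```java
final class FieldVsLocal {
    private int state = 0; // hypothetical instance field, standing in for "si"

    /** Reads and writes the instance field on every loop iteration. */
    int mixViaField(int[] data) {
        int acc = 0;
        for (int i = 0; i < data.length; i++) {
            state = state * 0x4e35 + data[i];
            acc ^= state;
        }
        return acc;
    }

    /** Same arithmetic, but on a local copy that is written back once. */
    int mixViaLocal(int[] data) {
        int s = state;            // copy the field into a local
        int acc = 0;
        for (int i = 0; i < data.length; i++) {
            s = s * 0x4e35 + data[i];
            acc ^= s;
        }
        state = s;                // write the final value back
        return acc;
    }

    int state() {
        return state;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 127;
        }
        FieldVsLocal a = new FieldVsLocal();
        FieldVsLocal b = new FieldVsLocal();
        // Both variants must agree on the result and on the final field value.
        System.out.println(a.mixViaField(data) == b.mixViaLocal(data));
        System.out.println(a.state() == b.state());
    }
}
```

The rewrite is only safe when nothing else reads or writes the field while the loop runs, which is the case for a single-threaded cipher object like the one in this thread.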

lg Clemens

Message was edited by: linuxhippy

olsonje
Offline
Joined: 2005-08-10

Wow, I ran them and what a difference the tuned one makes, but still, ouch!

alexlamsl
Offline
Joined: 2004-09-02

On P4 Prescott 3GHz (w/HT) and WinXP Pro (SP2), Mustang build 70 (Client) produces the following results:

mean speed: 1.467618508 MB/s
sample s.d.: 0.014599411 MB/s (N.B. not population s.d.)
no. of samples: 10
hence estimated s.d. of mean speed: 0.014599411 / sqrt(10) = 0.004616739 MB/s

Thus I am claiming a measured performance of (1.468 ± 0.005) MB/s on my system.
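
For anyone who wants to reproduce this kind of error estimate, a minimal sketch of the calculation: mean, sample standard deviation (dividing by n - 1) and the standard deviation of the mean (s / sqrt(n)). The class name `SpeedStats` and the sample values in `main` are made up for illustration; they are not the measurements quoted above.

```java
final class SpeedStats {
    /** Arithmetic mean of the samples. */
    static double mean(double[] xs) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i];
        }
        return sum / xs.length;
    }

    /** Sample standard deviation: divides by n - 1, not by n. */
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double d = xs[i] - m;
            squares += d * d;
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    /** Estimated standard deviation of the mean: s / sqrt(n). */
    static double stdErrorOfMean(double[] xs) {
        return sampleStdDev(xs) / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        // Hypothetical throughput samples in MB/s from 10 benchmark runs.
        double[] runs = {1.45, 1.47, 1.46, 1.48, 1.49, 1.46, 1.47, 1.45, 1.48, 1.47};
        System.out.println("mean speed:   " + mean(runs) + " MB/s");
        System.out.println("sample s.d.:  " + sampleStdDev(runs) + " MB/s");
        System.out.println("s.d. of mean: " + stdErrorOfMean(runs) + " MB/s");
    }
}
```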

alexlamsl
Offline
Joined: 2004-09-02

The code should really be an entry on The Daily WTF....

I removed 2 if-clauses (and used nanoTime() instead), and the performance ramps up to (1.649 ± 0.006) MB/s already! ;)

Update: (1.7609 ± 0.0016) MB/s

alexlamsl
Offline
Joined: 2004-09-02

Right, I think I know where the core problem is - variables declared as instance fields when they could be local variables!!!

Even fixing a few of them gives me (2.434 ± 0.007) MB/s!

And that probably explains why Escape Analysis wasn't working well - what you have here is literally the worst case scenario for EA...

alexlamsl
Offline
Joined: 2004-09-02

linuxhippy, would you mind running the modified test below on your machine for another table of results, please? :)

[code]
import java.util.Random;

public class PC1_Stream {
private int si = 0, x1a2 = 0;
private final int x1a0[] = new int[8];
private final byte cle[] = new byte[16]; /* Hold key */
private boolean usedAsEncrypter = false;
private boolean alreadyUsed = false;

/**
* Creates a PC1_InputStream. Decodes an input stream of encoded data.
* @param in The input stream to decode.
* @param password 16 byte encryption key
*/

public PC1_Stream(byte[] password) {
System.arraycopy(password, 0, cle, 0, Math.min(cle.length, password.length));
}

private int assemble() {
x1a0[0] = ((cle[0] << 8) + cle[1]);
int inter = code(0);

x1a0[1] = (x1a0[0] ^ ((cle[2] << 8) + cle[3]));
inter ^= code(1);

x1a0[2] = (x1a0[1] ^ ((cle[4] << 8) + cle[5]));
inter ^= code(2);

x1a0[3] = (x1a0[2] ^ ((cle[6] << 8) + cle[7]));
inter ^= code(3);

x1a0[4] = (x1a0[3] ^ ((cle[8] << 8) + cle[9]));
inter ^= code(4);

x1a0[5] = (x1a0[4] ^ ((cle[10] << 8) + cle[11]));
inter ^= code(5);

x1a0[6] = (x1a0[5] ^ ((cle[12] << 8) + cle[13]));
inter ^= code(6);

x1a0[7] = (x1a0[6] ^ ((cle[14] << 8) + cle[15]));
return inter ^ code(7);
}

private int code(final int i) {
final int bx = 0x4e35, cx = 0x015a;
int ax, dx, tmp;

dx = (x1a2 + i);
ax = x1a0[i];

tmp = ax;
ax = si;
si = dx;
dx = tmp;

ax *= bx;
ax += cx * si;

tmp = ax;
ax = si;
si = tmp;

ax *= bx;
dx += cx;

ax++;

x1a2 = dx;
x1a0[i] = ax;

return ax ^ dx;
}

/**
* Returns a plain byte, which has been unencrypted from the underlying
* InputStream.
* @see java.io.FilterInputStream
*/

public byte[] decrypt(byte[] encryptedData) {
checkUsage(false);

byte c;
int compte, inter;
for (int i = 0; i < encryptedData.length; i++) {
inter = assemble();
encryptedData[i] ^= (inter >> 8) ^ inter;
c = encryptedData[i];
for (compte = 0; compte < 16; compte++)
cle[compte] ^= c;
}

return encryptedData;
}

public byte[] encrypt(byte[] data) {
checkUsage(true);

byte d;
int compte, inter;
for (int i = 0; i < data.length; i++) {
inter = assemble();
d = data[i];
for (compte = 0; compte < 16; compte++)
cle[compte] ^= d;
data[i] ^= (inter >> 8) ^ inter;
}

return data;
}

private void checkUsage(boolean isEncrypter) {
if (alreadyUsed) {
if (usedAsEncrypter != isEncrypter) {
throw new IllegalArgumentException("You may either use this class as encrypter or decrypter, not both!");
}
} else {
alreadyUsed = true;
usedAsEncrypter = isEncrypter;
}
}

public static void main(String[] args) {
final byte[] testData = new byte[1024 * 1024];
Random rnd = new Random();
rnd.nextBytes(testData);

PC1_Stream dec = new PC1_Stream(testData);
PC1_Stream enc = new PC1_Stream(testData);

/* Warmup loop */
for (int i = 0; i < 5; i++) {
enc.encrypt(testData);
dec.decrypt(testData);
}

long start, end;
final int encCount = 50;
int i;
double duration;
for (int m = 0; m < 10; m++) {
System.out.println("Starte Verschlüsselung");
start = System.nanoTime();
for (i = 0; i < encCount; i++) {
enc.encrypt(testData);
dec.decrypt(testData);
}
end = System.nanoTime();
duration = 1E-6d * (end - start);
System.out.println("Encryption took: " + duration + " ms (@ " + (1E3d * encCount / duration) + "MB/s)");
}
}
}
[/code]

linuxhippy
Offline
Joined: 2004-01-07

> Regarding the delta: if you can show statistically
> that the 30 KB/s difference is significant then
> it does matter.

To be honest, those 30kb do not really matter to me at all; I am happy with hotspot's current arithmetic performance. What is a bit sad is that I was not able to measure improvements in Mustang even for OO code.

I re-ran it several times on an AMD Duron 800 and IBM was always about 30kb/s faster than hotspot.

hotspot 1.6 server (EA enabled) best: 0.9220499013406606mb/s
hotspot 1.6 server best: 0.9363997303168776mb/s
IBM 1.4.2 best: 0.9632797749778446mb/s

Interestingly, hotspot with escape analysis was slower than without; is this noise??

lg Clemens

linuxhippy
Offline
Joined: 2004-01-07

*Uuuu* bad publicity... ;)

I really did not think that anybody would be interested in the posted numbers (they were just there to get some attention ;) )...

Please take the microbenchmark as seriously as microbenchmarks can be taken; the code was not written by me, just modified to use ints instead of chars for the whole calculation (ugly ported C code), which helped the MSJVM to produce the right result *rolling eyes*

When I ran Mustang on my Duron 800 (see results in forum) I saw the results I expected: Mustang was almost as fast as the IBM JIT.
However, on my P4 Northwood 2.6ghz the whole situation looks quite different ... I benchmarked several times since I could not believe the difference; these results show the best run:

IBM 1.4.2: 2.501 mb/s
Hotspot Server 1.3.1: 1.764 mb/s
Mustang b72 server: 1.513 mb/s
JRockit 5.0 server: 1.489 mb/s
Mustang b72 server with escape analysis: 1.350 mb/s
Hotspot Server 1.4.2: 1.300 mb/s

This shows the same tendencies as on my Duron, but much more drastically. IBM is way faster than hotspot, and hotspot with escape analysis enabled is even slower.
Interestingly, Hotspot 1.3.1 Server is faster than anything released later, so this is a performance regression introduced in 1.4 :-/

Furthermore, please don't comment on the code quality; it's a piece I found and it works, so I don't care about it.

/* PORTED TO JAVA BY ROBERT NEILD November 1999 */

/* PC1 Cipher Algorithm ( Pukall Cipher 1 ) */
/* By Alexander PUKALL 1991 */
/* free code no restriction to use */
/* please include the name of the Author in the final software */
/* the Key is 128 bits */

/* Only the K zone change in the two routines */
/* You can create a single routine with the two parts in it */

package com.agosys.unicom.encryption;

public class PC1_Stream
{
int ax, bx, cx, dx, si, tmp, x1a2, res, i, inter, cfc, cfd, compte;
int x1a0[] = new int[8];
byte cle[] = new byte[17]; // Hold key
private boolean usedAsEncrypter = false;
private boolean alreadyUsed = false;

/**
* Creates a PC1_InputStream. Decodes an input stream of encoded data.
* @param in The input stream to decode.
* @param password 16 byte encryption key
*/

public PC1_Stream(byte[] password)
{
System.arraycopy(password, 0, cle, 0, Math.min(16, password.length));
}

private void assemble()
{
x1a0[0] = ((cle[0] * 256) + cle[1]);

code();
inter = res;

x1a0[1] = (x1a0[0] ^ ((cle[2] * 256) + cle[3]));
code();
inter = (inter ^ res);

x1a0[2] = (x1a0[1] ^ ((cle[4] * 256) + cle[5]));
code();
inter = (inter ^ res);

x1a0[3] = (x1a0[2] ^ ((cle[6] * 256) + cle[7]));
code();
inter = (inter ^ res);

x1a0[4] = (x1a0[3] ^ ((cle[8] * 256) + cle[9]));
code();
inter = (inter ^ res);

x1a0[5] = (x1a0[4] ^ ((cle[10] * 256) + cle[11]));
code();
inter = (inter ^ res);

x1a0[6] = (x1a0[5] ^ ((cle[12] * 256) + cle[13]));
code();
inter = (inter ^ res);

x1a0[7] = (x1a0[6] ^ ((cle[14] * 256) + cle[15]));
code();
inter = (inter ^ res);

i = 0;
}

void code()
{
dx = (x1a2 + i);
ax = x1a0[i];

cx = 0x015a;
bx = 0x4e35;

tmp = ax;
ax = si;
si = tmp;

tmp = ax;
ax = dx;
dx = tmp;

if (ax != 0)
{
ax = (ax * bx);
}

tmp = ax;
ax = cx;
cx = tmp;

if (ax != 0)
{
ax = (ax * si);
cx = (ax + cx);
}

tmp = ax;
ax = si;
si = tmp;
ax = (ax * bx);
dx = (cx + dx);

ax = (ax + 1);

x1a2 = dx;
x1a0[i] = ax;

res = (ax ^ dx);
i = (i + 1);
}

/**
* Returns a plain byte, which has been unencrypted from the underlying
* InputStream.
* @see java.io.FilterInputStream
*/

public byte[] decrypt(byte[] encryptedData)
{
checkUsage(false);

for (int i = 0; i < encryptedData.length; i++)
{
int c = encryptedData[i];

assemble();
cfc = (inter >> 8);
cfd = (inter & 255);

c = c ^ (cfc ^ cfd);

for (compte = 0; compte <= 15; compte++)
{
/* we mix the plaintext byte with the key */
cle[compte] = (byte) (cle[compte] ^ c);
}

encryptedData[i] = (byte) c;
}

return encryptedData;
}

public byte[] encrypt(byte[] data)
{
checkUsage(true);

for (int i = 0; i < data.length; i++)
{
int c = data[i];

assemble();
cfc = (inter >> 8);
cfd = (inter & 255);

for (compte = 0; compte <= 15; compte++)
{
/* we mix the plaintext byte with the key */
cle[compte] = (byte) (cle[compte] ^ c);
}

c = c ^ (cfc ^ cfd);
data[i] = (byte) c;
}

return data;
}

private void checkUsage(boolean isEncrypter)
{
if (alreadyUsed)
{
if (usedAsEncrypter != isEncrypter)
{
throw new IllegalArgumentException("You may either use this class as encrypter or decrypter, not both!");
}
} else
{
alreadyUsed = true;
usedAsEncrypter = isEncrypter;
}
}

public static void main(String[] args)
{
byte[] testData = new byte[1024 * 1024];
for (int i = 0; i < testData.length; i++)
{
testData[i] = (byte) (i % 127);
}

PC1_Stream dec = new PC1_Stream(testData);
PC1_Stream enc = new PC1_Stream(testData);

for (int i = 0; i < 5; i++)
{
enc.encrypt(testData);
dec.decrypt(testData);
}

for (int m = 0; m < 10; m++)
{
int encCount = 50;
System.out.println("Starte Verschlüsselung");
long start = System.currentTimeMillis();
for (int i = 0; i < encCount; i++)
{
enc.encrypt(testData);
dec.decrypt(testData);
}
long end = System.currentTimeMillis();
long duration = end - start;
System.out.println("Encryption took: " + duration + " with " + ((double) encCount / ((double) duration / (double) 1000)) + "mb/s ");
}
}
}

Message was edited by: linuxhippy

alexlamsl
Offline
Joined: 2004-09-02

Firstly, the code tag is now working in this forum (as long as you use /**/ not // for comments, that is; which is what you did here, anyway)

Secondly, your code doesn't compile; looks like you have missed out a few indices for the int[]s ;)

linuxhippy
Offline
Joined: 2004-01-07

Hi again,

I just checked the slightly tuned version (only uses local variables, everything else is equal) with MSJVM build 3167 and the result made me feel a bit, well, sad ;)
MSJVM Build 3167 was the one delivered with Windows 98 SE, but I was running it on Windows XP / Celeron 600.

Java 1.5.0_u6 client: 0.378mb/s
MSJVM build 3167 : 0.484mb/s
Mustang b73 client : 0.551mb/s

Not really useful anymore but interesting to know.
So with Mustang, Sun finally managed to make the client JVM faster for arithmetic code than the 7 year old MSJVM ;)

lg Clemens

tmarble
Offline
Joined: 2003-08-22

Ig:

You seem to have a knack for identifying what's on my "to do" list! We are working on the Mustang release
docs now...

Here's a high-level overview (not limited to C2):
- Biased locking
- Faster thread synchronization
- Lock coarsening
- java.math.BigDecimal optimizations
- parallel old generation collector
- Large pages support (new to Linux, Windows)
- Optimized System.arraycopy()
- Interned string allocation optimization
- Trig functions intrinsification

Regarding the delta: if you can show statistically
that the 30 KB/s difference is significant then
it does matter. We use Student's t-test to help
determine whether two sample populations (before, after)
have a significant difference.
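
As a rough illustration of such a comparison, here is a minimal sketch that computes a t statistic and its degrees of freedom for two sets of benchmark runs. It is an assumption of this sketch, not something stated in the thread, that the unequal-variance (Welch) form of the test is used; the class name `WelchT` and the sample values are hypothetical. Turning t and the degrees of freedom into a p-value still needs a t-distribution table or a statistics library.

```java
final class WelchT {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i];
        }
        return sum / xs.length;
    }

    /** Sample variance (divides by n - 1). */
    static double variance(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double d = xs[i] - m;
            squares += d * d;
        }
        return squares / (xs.length - 1);
    }

    /** Welch's t statistic for the difference of the two sample means. */
    static double tStatistic(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    /** Welch-Satterthwaite approximation of the degrees of freedom. */
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical throughput samples in MB/s, not data from this thread.
        double[] before = {0.921, 0.925, 0.919, 0.923, 0.922};
        double[] after = {0.951, 0.948, 0.955, 0.950, 0.953};
        System.out.println("t  = " + tStatistic(before, after));
        System.out.println("df = " + degreesOfFreedom(before, after));
    }
}
```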

Would you be willing to share your microbenchmark
with the group (or at least with me ;-) )?

Regards,

--Tom

alexlamsl
Offline
Joined: 2004-09-02

How I like statistics-aware individuals :D

alexlamsl
Offline
Joined: 2004-09-02

How many times has the test suite been run? A 0.03 MB/s difference seems minute to me, but for a complete analysis a measure of the variance of the performance figures would be much appreciated.