Skip to main content

Inconsistent run-times of micro-benchmark.

10 replies [Last post]
linuxhippy
Offline
Joined: 2004-01-07
Points: 0

Hello,

I've written a small micro-benchmark to test the performance of the PC1-Cipher we use heavily in our applications.

The strange thing is that it seems like sometimes the benchmark runs a lot faster and sometimes slower.
The benchmark was executed using the server compiler and mustang-b104 on Linux-2.6.19, on a P4-Nothwood 2.6ghz.
I disabled power-management features and I don't think this is the problem here because if its fast it stays fast even if the CPU becoms hot.
Another funny thing is that when it runs slow almost everytime the second timing was the fastest.

Any ideas what could lead to this strange results?

<br />
1st Run<br />
Normal way took: 626<br />
Normal way took: 386<br />
Normal way took: 424<br />
Normal way took: 416</p>
<p>2nd Run<br />
Normal way took: 374<br />
Normal way took: 272<br />
Normal way took: 257<br />
Normal way took: 249</p>
<p>3rd Run<br />
Normal way took: 665<br />
Normal way took: 455<br />
Normal way took: 434<br />
Normal way took: 413</p>
<p>4th Run<br />
Normal way took: 418<br />
Normal way took: 274<br />
Normal way took: 265<br />
Normal way took: 249<br />
Normal way took: 259<br />
Normal way took: 260<br />
Normal way took: 258</p>
<p>5. Run:<br />
Normal way took: 634<br />
Normal way took: 391<br />
Normal way took: 432</p>
<p>6. Run:<br />
Normal way took: 628<br />
Normal way took: 472<br />
Normal way took: 479</p>
<p>7. Run:<br />
Normal way took: 677<br />
Normal way took: 449<br />
Normal way took: 415</p>
<p>8. Run:<br />
Normal way took: 616<br />
Normal way took: 473</p>
<p>9. Run:<br />
Normal way took: 692<br />
Normal way took: 460<br />
Normal way took: 439</p>
<p>10. Run<br />
Normal way took: 457<br />
Normal way took: 274<br />
Normal way took: 258</p>
<p>11. Run:<br />
Normal way took: 677<br />
Normal way took: 461<br />
Normal way took: 439</p>
<p>12. Run:<br />
Normal way took: 359<br />
Normal way took: 274<br />
Normal way took: 258<br />

This is the code which lead to these strange results:

<br />
package com.agosys.unicom.encryption;</p>
<p>import java.util.*;</p>
<p>public class PC1_Stream<br />
{<br />
	private int si, x1a2;<br />
	final private int x1a0[] = new int[8];<br />
	private boolean usedAsEncrypter = false;<br />
	private boolean alreadyUsed = false;</p>
<p>	private byte cle_orig[] = new byte[17];</p>
<p>	private byte cle0 = 0;<br />
	private byte cle1 = 0;<br />
	private byte cle2 = 0;<br />
	private byte cle3 = 0;<br />
	private byte cle4 = 0;<br />
	private byte cle5 = 0;<br />
	private byte cle6 = 0;<br />
	private byte cle7 = 0;<br />
	private byte cle8 = 0;<br />
	private byte cle9 = 0;<br />
	private byte cle10 = 0;<br />
	private byte cle11 = 0;<br />
	private byte cle12 = 0;<br />
	private byte cle13 = 0;<br />
	private byte cle14 = 0;<br />
	private byte cle15 = 0;</p>
<p>	/**<br />
	 * Creates a PC1_InputStream. Decodes an input stream of encoded data.<br />
	 * @param in  The input stream to decode.<br />
	 * @param password 16 byte encryption key<br />
	 */</p>
<p>	public PC1_Stream(byte[] password)<br />
	{<br />
		for(int i=0; i < Math.min(16, password.length); i++)<br />
		{<br />
			cle_orig[i] = password[i];<br />
		}</p>
<p>		setCleVariables(cle_orig);<br />
	}</p>
<p>	protected void setCleVariables(byte[] array)<br />
	{<br />
		cle0 = array[0];<br />
		cle1 = array[1];<br />
		cle2 = array[2];<br />
		cle3 = array[3];<br />
		cle4 = array[4];<br />
		cle5 = array[5];<br />
		cle6 = array[6];<br />
		cle7 = array[7];<br />
		cle8 = array[8];<br />
		cle9 = array[9];<br />
		cle10 = array[10];<br />
		cle11 = array[11];<br />
		cle12 = array[12];<br />
		cle13 = array[13];<br />
		cle14 = array[14];<br />
		cle15 = array[15];<br />
	}</p>
<p>	public final void reset()<br />
	{<br />
		si = 0;<br />
		x1a2 = 0;</p>
<p>		for (int i = 0; i < x1a0.length; i++)<br />
		{<br />
			x1a0[i] = 0;<br />
		}</p>
<p>		setCleVariables(cle_orig);<br />
	}</p>
<p>	private final int assemble()<br />
	{<br />
		x1a0[0] = ((cle0 * 256) + cle1);</p>
<p>		int inter = code(0);</p>
<p>		x1a0[1] = (x1a0[0] ^ ((cle2 * 256) + cle3));<br />
		inter = inter ^ code(1);</p>
<p>		x1a0[2] = (x1a0[1] ^ ((cle4 * 256) + cle5));<br />
		inter = inter ^ code(2);</p>
<p>		x1a0[3] = (x1a0[2] ^ ((cle6 * 256) + cle7));<br />
		inter = inter ^ code(3);</p>
<p>		x1a0[4] = (x1a0[3] ^ ((cle8 * 256) + cle9));<br />
		inter = inter ^ code(4);</p>
<p>		x1a0[5] = (x1a0[4] ^ ((cle10 * 256) + cle11));<br />
		inter = inter ^ code(5);</p>
<p>		x1a0[6] = (x1a0[5] ^ ((cle12 * 256) + cle13));<br />
		inter = inter ^ code(6);</p>
<p>		x1a0[7] = (x1a0[6] ^ ((cle14 * 256) + cle15));<br />
		inter = inter ^ code(7);</p>
<p>		return inter;<br />
	}</p>
<p>	private final int code(final int i)<br />
	{<br />
		int ax = x1a0[i];<br />
		int dx = (x1a2 + i);<br />
		int cx = 0x015a;<br />
		int si_local = si;<br />
		int tmp;</p>
<p>		tmp = ax;            //tmp == ax<br />
		ax = si_local;      // ax=si_local, tmp = ax<br />
		si_local = tmp;     //si_local = tmp</p>
<p>		tmp = ax;<br />
		ax = dx;<br />
		dx = tmp;</p>
<p>		if (ax != 0)<br />
		{<br />
			ax = (ax * 0x4e35);<br />
		}</p>
<p>		tmp = ax;<br />
		ax = cx;<br />
		cx = tmp;</p>
<p>		if (ax != 0)<br />
		{<br />
			ax = (ax * si_local);<br />
			cx = (ax + cx);<br />
		}</p>
<p>		tmp = ax;<br />
		ax = si_local;<br />
		si_local = tmp;</p>
<p>		ax *=  0x4e35;<br />
		dx = (cx + dx);<br />
		ax++;</p>
<p>		x1a2 = dx;<br />
		this.si = si_local;<br />
		x1a0[i] = ax;</p>
<p>		return (ax ^ dx);<br />
	}</p>
<p>	/**<br />
	 * Returns a plain byte, which has been unencrypted from the underlying<br />
	 * InputStream.<br />
	 * @see java.io.FilterInputStream<br />
	 */</p>
<p>	public final byte[] decrypt(byte[] encryptedData)<br />
	{<br />
		return decrypt(encryptedData, 0, encryptedData.length);<br />
	}</p>
<p>	public final byte[] decrypt(byte[] encryptedData, int offset, int lenght)<br />
	{<br />
		checkUsage(false);</p>
<p>		for (int i = offset; i < (lenght + offset); i++)<br />
		{<br />
			encryptedData[i] = decryptSingleByte(encryptedData[i]);<br />
		}</p>
<p>		return encryptedData;<br />
	}</p>
<p>	public final byte decryptSingleByte(int plainByte)<br />
	{<br />
		int inter = assemble();<br />
		int cfc = (inter >> 8);<br />
		int cfd = (inter & 255);</p>
<p>		plainByte = plainByte ^ (cfc ^ cfd);<br />
		xorCLEWithC(plainByte);</p>
<p>		return (byte) plainByte;<br />
	}</p>
<p>	private final void xorCLEWithC(int c)<br />
	{<br />
		cle0 ^= c;<br />
		cle1 ^= c;<br />
		cle2 ^= c;<br />
		cle3 ^= c;<br />
		cle4 ^= c;<br />
		cle5 ^= c;<br />
		cle6 ^= c;<br />
		cle7 ^= c;<br />
		cle8 ^= c;<br />
		cle9 ^= c;<br />
		cle10 ^= c;<br />
		cle11 ^= c;<br />
		cle12 ^= c;<br />
		cle13 ^= c;<br />
		cle14 ^= c;<br />
		cle15 ^= c;<br />
	}</p>
<p>	public final byte[] encrypt(byte[] encryptedData)<br />
	{<br />
		return encrypt(encryptedData, 0, encryptedData.length);<br />
	}</p>
<p>	public final byte[] encrypt(byte[] data, int offset, int lenght)<br />
	{<br />
		checkUsage(true);</p>
<p>		for (int i = offset; i < (lenght + offset); i++)<br />
		{<br />
			data[i] = encryptSingleByte(data[i]);<br />
		}</p>
<p>		return data;<br />
	}</p>
<p>	public final byte encryptSingleByte(int plainByte)<br />
	{<br />
		int inter = assemble();<br />
		int cfc = (inter >> 8);<br />
		int cfd = (inter & 255);</p>
<p>		xorCLEWithC(plainByte);<br />
		plainByte = plainByte ^ (cfc ^ cfd);</p>
<p>		return (byte) plainByte;<br />
	}</p>
<p>	private void checkUsage(boolean isEncrypter)<br />
	{<br />
		if (alreadyUsed)<br />
		{<br />
			if (usedAsEncrypter != isEncrypter)<br />
			{<br />
				throw new IllegalArgumentException("You may either use this class as encrypter or decrypter, not both!");<br />
			}<br />
		} else<br />
		{<br />
			alreadyUsed = true;<br />
			usedAsEncrypter = isEncrypter;<br />
		}<br />
	}</p>
<p>	public static void main(String[] args)<br />
	{<br />
		byte[] testData = new byte[1000000];<br />
		new Random().nextBytes(testData);<br />
		//byte[] origData = (byte[]) testData.clone();</p>
<p>		byte[] passwd = new byte[128];<br />
		new Random().nextBytes(passwd);</p>
<p>		for (int i = 0; i < 50; i++)<br />
		{<br />
			long start = System.currentTimeMillis();<br />
			PC1_Stream dec = new PC1_Stream(passwd);<br />
			PC1_Stream enc = new PC1_Stream(passwd);</p>
<p>			enc.encrypt(testData);<br />
			dec.decrypt(testData);</p>
<p>			long end = System.currentTimeMillis();<br />
			System.out.println("Normal way took: " + (end - start));<br />
		}</p>
<p>		System.out.println(testData[0]);<br />
	}</p>
<p>	private static void encryptSlow(byte[] data, byte[] toData, PC1_Stream inCipher)<br />
	{<br />
		byte[] tmp = new byte [1];</p>
<p>		for(int i=0; i < data.length; i++)<br />
		{<br />
			tmp[0] = data[i];<br />
			inCipher.decrypt(tmp);<br />
			toData[i] = tmp[0];<br />
		}<br />
	}<br />
}</p>
<p>

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
briand
Offline
Joined: 2005-07-11
Points: 0

I think you are misinterpreting the problem. It's obvious that the first number in a run is a result of mixed interpreted and compiled code (and running with -XX:+PrintCompilation makes that very clear). I think the issue here is more that two completely different runs can result in rather large differences in performance of the compiled methods.

We've seen similar issues to this with various scimark sub-benchmarks and they are typically cache related.

Brian

mcekovic
Offline
Joined: 2004-03-04
Points: 0

Yes, probably cache in conjuntion with hyper-threading on P4 (is you P4 HT enabled?).

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

my P4 is a Northwood (does not support HT).
However on my new Core2Duo-T7200 notebook the benchmark is 2 times faster and times are consistent. (By the way there's a 2x slowdown when using the client-compiler on C2D, whereas for the P4 it was about 30-50%),

lg Clemens

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

It would be just interesting to see if other users also see these numbers. It would be great if somebody could just grab the code, compile it and run it with mustang's server jvm.

Thank you in advance, lg Clemens

spdenne
Offline
Joined: 2006-02-15
Points: 0

Can you edit your post to not use i as a variable? It tends to turn things into italics when used as an array index.
I think I guessed where you wanted them, and consistently get this result:

Normal way took: 261
Normal way took: 210
Normal way took: 210
Normal way took: 211
Normal way took: 220
Normal way took: 210
Normal way took: 210

Edited post, as the code I copied and pasted didn't format nicely, and some end of line comments ended up swallowing the folling lines. Also added the word "consistently" to clarify.

spdenne
Offline
Joined: 2006-02-15
Points: 0

Windows XP, 1.8GHz Pentium M

java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

Thanks for testing.

Strange, I see these inconsistent times all the time - I don't have any power saving enabled nore other programs that could influence the runtime.

Maybe this is related to Linux or the P4 processor?
I have to say I even saw this on two different Linux kernels.

briand
Offline
Joined: 2005-07-11
Points: 0

I don't know for sure what is causing this variability from run to run, but I suspect that it's a cache related issue.

bethere
Offline
Joined: 2007-01-09
Points: 0

Actually this is a JIT/hotspot thing (I would not refer to it as an issue) which is in fact a sort of a cache.

Once you run it in Mixed mode (default) classes are compiled and loaded into memory, consecutive executions do not need to re-compile and only use the already compiled, linked and loaded parts of the code.

I ran in two different ways, please note the difference

Mixed Mode:
java -showversion -cp . PC1_Stream
java version "1.4.2.09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2.09-050713-03:34)
Java HotSpot(TM) Server VM (build 1.4.2 1.4.2.09-050713-05:46-PA_RISC2.0 PA2.0 (aCC_AP), [u]mixed mode[/u])

Normal way took: 1138
Normal way took: 623
Normal way took: 621
Normal way took: 622

Interpreter mode (an overloaded PA-RISC box):
java -showversion -Xint -cp . PC1_Stream
java version "1.4.2.09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2.09-050713-03:34)
Java HotSpot(TM) Server VM (build 1.4.2 1.4.2.09-050713-05:46-PA_RISC2.0 PA2.0 (aCC_AP), [u]interpreted mode[/u])

Normal way took: 27840
Normal way took: 27872
Normal way took: 27893

Hope that helps

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

not really, since this does not explain why some runs stay slower than others by a factor of 2.