Posted by kohsuke
on February 2, 2007 at 10:41 AM PST
In this post I'm going to talk about the details of the benchmark Bharath did for the JAX-WS RI 2.1 (kudos to him and the rest of the performance team). For more about the JAX-WS RI 2.1 release in general, please refer to Vivek's post.
The basic idea of the benchmark is to have a lot of clients send a lot of requests to the server concurrently. The server echoes the data back to the client, and we measure how many requests the server processes per second. The test is repeated with 15 different test payloads: the echo(Void|Integer|String|Date|Struct) tests use a small payload, where the data is just one int, string, etc. The echoSyntetic...K tests use a binary payload, where 1K, 4K, 8K, and 12K give the size of the binary data. Finally, the echoArray and echoOrder tests have a significantly larger payload.
The picture on the right shows the summary. "Number of requests processed per second" is normalized so that Axis2 is always 100%. Depending on the data point, the JAX-WS RI is 30% to 100% faster.
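To make the normalization concrete, here is a minimal sketch of how a throughput number can be expressed relative to a baseline of 100%. The class name, the method name, and the sample numbers are made up for illustration; they are not taken from the benchmark harness or from the measured results.

```java
// Hypothetical helper, not part of the benchmark harness.
final class NormalizedThroughput {

    /** Expresses candidateTps as a percentage of baselineTps (the baseline itself maps to 100). */
    static double percentOfBaseline(double candidateTps, double baselineTps) {
        if (baselineTps <= 0) {
            throw new IllegalArgumentException("baseline TPS must be positive");
        }
        return 100.0 * candidateTps / baselineTps;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only; these are NOT measurements from this post.
        double axis2Tps = 10_000.0;
        double jaxwsTps = 15_000.0;
        System.out.printf("Axis2: %.0f%%, JAX-WS RI: %.0f%%%n",
                percentOfBaseline(axis2Tps, axis2Tps),
                percentOfBaseline(jaxwsTps, axis2Tps));
    }
}
```

With this convention, "30% faster" shows up as 130% and "100% faster" as 200%.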
On smaller payloads, such as echoString, echoInteger, etc., you tend to see a larger difference. This is because (relatively speaking) the weight of the databinding is small, and so the test reveals the true difference in the web service layer proper. On larger payloads, more of the time tends to be spent on the databinding side, so the difference tends to become smaller.
The client machine is a SunFire x4600 with 15.5 GB of memory and 8 Opteron CPUs. It runs JDK 1.5.0_10-b03 on Solaris 10. We used this monster just to make sure that we have enough clients to keep the server busy all the time. We run a total of 32 threads on this machine, each of which uses JAX-WS to send SOAP requests to the server as fast as possible. We verified that the server CPU was fully saturated.
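The client side described above can be sketched roughly as follows. This is not the actual harness: the class name LoadGenerator, the run method, and the Runnable standing in for the SOAP call are all assumptions made for illustration, and the code uses a more recent Java than the JDK 1.5 used in the benchmark. In the real test each thread would invoke a JAX-WS client proxy instead of the placeholder.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a closed-loop load generator; not the benchmark's own code.
final class LoadGenerator {

    /**
     * Starts the given number of threads, each calling request.run() in a tight loop,
     * and returns how many calls completed before the duration elapsed.
     */
    static long run(int threads, long durationMillis, Runnable request) {
        AtomicLong completed = new AtomicLong();
        AtomicBoolean stop = new AtomicBoolean(false);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                // Send the next request as soon as the previous one returns.
                while (!stop.get()) {
                    request.run();
                    completed.incrementAndGet();
                }
            });
        }
        try {
            Thread.sleep(durationMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            stop.set(true);
            pool.shutdown();
        }
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Placeholder request: a real client would call the echo operation over SOAP here.
        long count = run(32, 1_000, () -> { });
        System.out.println("completed calls in 1 second: " + count);
    }
}
```

Because every thread waits for its response before sending the next request, the number of in-flight requests never exceeds the thread count, which is why it matters that the client machine has enough capacity to keep the server saturated.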
The server machine is another SunFire x4600. It has the same amount of memory, the same OS, and the same JDK, except that it has only 4 CPUs. We used Glassfish v2 milestone 4 as the container, with -server -Xms2g -Xmx2g as the JVM options. Glassfish is a JavaEE 5 container, which includes StAX. So this means we are using its StAX implementation, SJSXP.
We tried Axis2 1.1.1 with XMLBeans and JAX-WS RI 2.1 with JAXB. We tried to use Axis Data Binding (ADB) first, but we noticed that under a high load it fails with what seems like concurrency-related data corruption. So we decided to move on to XMLBeans, which is listed next to ADB in their quick start guide. We'll see if we can figure out what's going on with Axis+ADB in the future. It could be a Glassfish problem, who knows.
Each test was run for 2 minutes. The first minute is just warm-up time, and the measurement only considers the second minute. So one complete test run takes 2 minutes x 15 tests x 2 toolkits = 1 hour. Our harness runs this 4 times and throws away the first two runs as additional warm-up. The data shown below are the result of the last 2 runs (out of 4).
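The schedule arithmetic and the "only count the second minute" rule can be sketched like this. The class and method names are hypothetical, the timestamps are invented, and treating the window as half-open (start included, end excluded) is my assumption; the real harness may handle the boundaries differently.

```java
// Hypothetical sketch of the warm-up / measurement window; not the benchmark's own code.
final class MeasurementWindow {

    /**
     * Requests per second, counting only completions whose offset from the start of the
     * test falls inside [warmupMillis, warmupMillis + measureMillis).
     */
    static double tps(long[] completionOffsetsMillis, long warmupMillis, long measureMillis) {
        long end = warmupMillis + measureMillis;
        long counted = 0;
        for (long t : completionOffsetsMillis) {
            if (t >= warmupMillis && t < end) {
                counted++;
            }
        }
        return counted / (measureMillis / 1000.0);
    }

    /** Minutes needed for one pass over every test with every toolkit. */
    static int minutesPerPass(int minutesPerTest, int tests, int toolkits) {
        return minutesPerTest * tests * toolkits;
    }

    public static void main(String[] args) {
        int perPass = minutesPerPass(2, 15, 2); // 2 x 15 x 2 = 60 minutes
        System.out.println("one pass: " + perPass + " min, four passes: " + (4 * perPass) + " min");

        // Invented completion times: one during warm-up, three in the window, one after it.
        long[] offsets = {500, 61_000, 61_500, 119_999, 120_000};
        System.out.println("TPS in measured minute: " + tps(offsets, 60_000, 60_000));
    }
}
```

Four passes at one hour each means the whole data collection takes about four hours, of which only the second minute of each test in the last two passes feeds the reported numbers.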
We plan to make the benchmark code available on java.net, so stay tuned.
The raw numbers are as shown on the right:
'TPS' stands for 'transactions per second' and represents the number of requests that were processed per second. The 'stddev' is the standard deviation between different runs, so you can use that to see how many digits of TPS you can trust.
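For readers who want to compute the same two columns from their own runs, here is a small sketch. The class name and the sample values are invented, and I am assuming the sample standard deviation (dividing by n - 1); this post does not say whether the harness uses the sample or the population formula.

```java
// Hypothetical helper for summarizing per-run TPS values; not the benchmark's own code.
final class RunStats {

    /** Arithmetic mean of the per-run TPS values. */
    static double mean(double[] xs) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("need at least one run");
        }
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (n - 1 in the denominator); an assumption, see the text above. */
    static double sampleStddev(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Invented TPS values for two retained runs; NOT numbers from this benchmark.
        double[] runs = {10_000.0, 10_200.0};
        System.out.printf("TPS = %.1f, stddev = %.1f%n", mean(runs), sampleStddev(runs));
    }
}
```

If the stddev is in the hundreds, for example, the last two digits of a five-digit TPS value are noise.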
We've been working on this for a long time now, and coincidentally another group posted another web service stack benchmark just a few days ago. While we only had very limited time to look at it, we noticed that their benchmark, despite being run on a 4-way Xeon system, records roughly 3,000 reqs/sec (for example, on the echoVoid test). Our benchmark recorded more than 10,000 reqs/sec, even for Axis2, on a 4-way Opteron system. While one cannot really compare reqs/sec on different systems in a meaningful way, we nevertheless wonder if their Xeon system could have done much better than 3,000 reqs/sec.
It's also clear we've got more work to do here. We need to get to the bottom of the ADB issue for one thing. We also want to test the scalability of these stacks.
In the end, however, what really matters to you is your own application with your own data. So we want you to compare the toolkits yourself, with your own use cases, and let us know what your findings are.