connection reset / timeout under load
We have some web services deployed to GlassFish 3.1. Under load simulation, clients start receiving "connection reset" and "connection timed out" errors after what we think is a relatively small number of active clients. There are no errors in server.log; the errors appear only on the client side, so I'm a little lost as to how to troubleshoot the problem.
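The server logs are silent, so one way to get more signal is to have the load-test client record which kind of failure each request hits. The sketch below is a hypothetical helper (the class and method names are made up, not taken from any particular load tool). It sets explicit connect and read timeouts on `HttpURLConnection` and sorts the resulting exception into a bucket. As a rough rule, failures while connecting (timeouts or refusals) point at the accept side, such as the listen backlog or firewall/security-group rules. Read timeouts and resets on an established connection point instead at requests waiting too long or at connections dropped in flight. The message checks are heuristics, because JDK exception messages vary by version and platform.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.util.Locale;

/** Hypothetical load-test helper: issue a request and bucket the client-side failure. */
public class ClientErrorClassifier {

    /** Map an exception to a coarse category; message matching is a heuristic. */
    public static String classify(Throwable t) {
        String msg = String.valueOf(t.getMessage()).toLowerCase(Locale.ROOT);
        if (t instanceof SocketTimeoutException) {
            // HttpURLConnection reports "connect timed out" vs "Read timed out".
            return msg.contains("connect") ? "CONNECT_TIMEOUT" : "READ_TIMEOUT";
        }
        if (t instanceof ConnectException) {
            // OS-level SYN timeout vs an explicit refusal (RST on connect).
            return msg.contains("timed out") ? "CONNECT_TIMEOUT" : "CONNECT_REFUSED";
        }
        if (t instanceof SocketException && msg.contains("connection reset")) {
            return "RESET";
        }
        return "OTHER";
    }

    /** One GET with explicit timeouts; returns "OK" or the failure category. */
    public static String probe(String url, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes(); // drain the body so the whole response is read
            }
            return "OK";
        } catch (IOException e) {
            return classify(e);
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; pass the real service URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/";
        System.out.println(probe(url, 5_000, 30_000));
    }
}
```

Tallying these categories per run (and per second of the run) shows whether the failures start at connect time or after the request has been accepted.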
The other confusing thing is that when I run the load test against a local GF instance on the same machine as the load simulator, everything works fine. The errors appear only when the requests go to a server on a different network. I can, however, reproduce the problem locally by throttling the bandwidth on the local machine.
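One possible reading of the bandwidth result uses Little's law. This reading assumes that each request keeps a server thread busy until its response has been written to the client, and whether GlassFish's connector actually behaves this way for these requests is an assumption to verify. Under that assumption, a pool of N threads, each held for W seconds per request, can complete at most N / W requests per second. Slower links raise W, so the same pool tops out sooner. The figures in `main` (10 ms of server work, a 50 KB response) are hypothetical.

```java
/** Little's law sketch: max throughput = threads / time each request holds a thread. */
public class PoolCapacity {

    /** Seconds needed to push a response of the given size over a link of the given speed. */
    public static double transferSeconds(long responseBytes, double linkBitsPerSecond) {
        return responseBytes * 8.0 / linkBitsPerSecond;
    }

    /** Upper bound on requests/second if each request holds a thread for holdSeconds. */
    public static double maxThroughput(int threads, double holdSeconds) {
        return threads / holdSeconds;
    }

    public static void main(String[] args) {
        int threads = 32;            // the max-threads value from the question
        double serverWork = 0.010;   // hypothetical: 10 ms of CPU/DB work per request
        long responseBytes = 50_000; // hypothetical: 50 KB response body
        for (double bps : new double[] {100e6, 1e6}) { // fast LAN vs throttled link
            double hold = serverWork + transferSeconds(responseBytes, bps);
            System.out.printf("link %.0f bit/s: hold %.3f s -> at most %.0f req/s%n",
                    bps, hold, maxThroughput(threads, hold));
        }
    }
}
```

With those assumed numbers, the bound falls from roughly 2,286 requests per second on a 100 Mbit/s link to roughly 78 on a 1 Mbit/s link, purely because of transfer time.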
Some other possibly relevant information:
- GF is running on an Amazon EC2 instance.
- We've done little performance tuning: we've increased the heap and PermGen memory settings and changed the max threads to 32 (4-processor system).
- The web services are servlets and use the Servlet 3.0 async API.
- The web services use JPA to read and modify data in an in-memory Apache Derby database.
- The clients perform various CRUD operations, as well as long polling (the web service uses Servlet 3.0 async to handle this).
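With the Servlet 3.0 async API, the point of `startAsync()` for long polling is that the request thread goes back to the pool while the poll waits. If some path still blocks the thread instead (for example a filter, a synchronous wait, or a slow JPA call), a 32-thread pool is used up by the 33rd concurrent poll. The sketch below models that with a plain `java.util.concurrent.ThreadPoolExecutor` that has no queue, so overflow is rejected immediately. It only illustrates the idea and is not GlassFish's actual HTTP thread pool, which may queue the extra requests instead. In that case, clients would see waits that end in timeouts rather than immediate errors. The class name is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Models a fixed request-thread pool where each long-poll BLOCKS its thread (no startAsync()). */
public class PoolSaturation {

    /** Returns how many of `clients` blocking long-polls a pool of `threads` turns away. */
    public static int rejectedLongPolls(int threads, int clients) {
        // No queue: once every thread is busy, new work is rejected immediately.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch event = new CountDownLatch(1); // the "new data" each poll waits for
        int rejected = 0;
        try {
            for (int i = 0; i < clients; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            event.await(); // blocking long-poll: holds the thread until the event
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // no idle thread and no queue slot for this client
                }
            }
        } finally {
            event.countDown(); // release the parked polls so the pool can shut down
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(rejectedLongPolls(32, 40) + " of 40 long-polls rejected by a 32-thread pool");
    }
}
```

Running `main` prints `8 of 40 long-polls rejected by a 32-thread pool`. The first 32 tasks each get their own thread and park on the latch, which leaves no idle thread for the rest. With async processing done correctly, the parked polls would not occupy pool threads at all.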
As I mentioned, since there are no errors on the server itself, I'm unclear how to troubleshoot this problem. Any pointers on how to proceed would be useful. Also, if there are any particular performance tuning parameters we should be looking at, please do suggest them.