
EA-based lock removal in Mustang b63!

10 replies
linuxhippy (Joined: 2004-01-07)

Hi there,

I just found that Mustang b63 now supports lock removal thanks to escape analysis (EA). Wow, a feature I have been waiting for for a long time.
Great to have it now; thanks a lot.

However there's one point which is not quite clear to me:
> As part of this change,
> the bytecode escape estimator was added. Without it,
> almost no opportunities for
> this optimization are detected.
What is a bytecode escape estimator? Is it some kind of optimistic compilation which could, under some circumstances, force HotSpot to recompile, or is it something cheaper?

lg Clemens


mayhem (Joined: 2003-06-11)

Of course, it's understandable that you don't have enough time/resources to implement stack allocation for the Mustang release. But still, IMHO it's probably the most important optimization missing in HotSpot to make Java performance competitive with C/C++ in certain areas (for example, scientific and game programming). Another potential benefit of stack allocation is improved performance of iterators and "closure expressions", which could also aid the implementation of other, more functionally oriented programming languages on the JVM.
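To make the iterator point concrete, here is a minimal sketch (the SumExample class is my own illustration, not from any API): the Iterator allocated in the loop never escapes the method, so with stack allocation or scalar replacement the JVM could avoid the heap allocation entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SumExample {
    // Each call allocates an Iterator that never escapes this method.
    // With stack allocation / scalar replacement, the JVM could place it
    // on the stack (or eliminate it) instead of burdening the GC.
    static int sum(List<Integer> values) {
        int total = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            total += it.next();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        for (int i = 1; i <= 10; i++) {
            values.add(i);
        }
        System.out.println(sum(values)); // prints 55
    }
}
```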

mthornton (Joined: 2003-06-10)

> I've tuned very large server applications
> and it's rare that GC time exceeds 5% of
> overall run time. If you go back over history

I have an application where GC exceeding 70% of the run time is common. If I don't set -Xms to an appropriate value before running the task, it is very slow indeed. On the other hand, too high a minimum heap size also causes problems. In the example I ran earlier today, the task creates a graph containing over 3 million links and almost 3 million nodes, plus assorted hash tables of similar size. Heap usage is around 400MB. The time taken to set up this structure is strongly dependent on the minimum heap size. It then takes a mere 20 seconds to solve the problem of interest.

Then for the next task I have to specify a smaller heap (250MB max) or there won't be room (lack of address space) for data allocated by JNI code.

mayhem (Joined: 2003-06-11)

Steve,

Can you tell us something about the plans for implementing stack allocation of non-escaping objects? Is this something that is planned for the Mustang release?

steved (Joined: 2005-11-30)

No, stack allocation of non-escaping objects is planned for the release after Mustang. Depending on the performance benefit it gives and the size and complexity of the changes required, it could eventually be backported to a future Mustang update release, but there are no current plans to do so.

Steve

mthornton (Joined: 2003-06-10)

How do you fairly measure the performance benefits when people are currently writing code that is distorted by the absence of stack allocation? There are methods in the Java API which wouldn't exist if Java had stack allocation --- those where Rectangle objects are passed as arguments so that one doesn't have to be allocated to return the result.

Benchmarks are hard to interpret and current real code often reflects the absence of this capability.
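The "return value parameter" idiom described above looks something like this. A simplified sketch --- Point and Widget here are stand-ins of mine, not the actual AWT classes:

```java
// A simplified sketch of the "return value parameter" idiom: the second
// overload exists only to let callers avoid an allocation per call.
class Point {
    int x, y;
}

class Widget {
    private final int x, y;

    Widget(int x, int y) { this.x = x; this.y = y; }

    // Clean style: allocates a new Point on every call.
    Point getLocation() {
        Point p = new Point();
        p.x = x;
        p.y = y;
        return p;
    }

    // Allocation-avoiding style: the caller supplies a Point to fill in,
    // distorting the API to work around the cost of allocation.
    Point getLocation(Point rv) {
        rv.x = x;
        rv.y = y;
        return rv;
    }
}
```

With stack allocation of non-escaping objects, the first form would be just as cheap whenever the caller only reads the coordinates, and the second overload would lose its reason to exist.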

linuxhippy (Joined: 2004-01-07)

I also think it would really be worth putting this into Mustang --- it will be complex, so it won't ship with an update release.

I see many Java developers racking their brains to keep the heap free and avoid stressing the garbage collector, with methods recycling container classes and pools.
I know Sun has told us many times how super fast the best-ever HotSpot GC is, but it simply isn't true.
Allocation is still a performance problem, especially when it comes down to larger servers.

Please implement it...

tmarble (Joined: 2003-08-22)

mthornton & linuxhippy:

There are a couple of issues that you bring up.

1. Coding for Performance
While we advocate taking advantage of new APIs
for performance (e.g. NIO, the concurrency utilities),
we do not advocate writing specialized code to
try to "guess" what optimizations the JVM will make
and how. For starters, it's just too hard.
Secondly, if you did an optimization that worked
one time, it would probably break (i.e. not
provide the performance benefit) on any other release.
That's why we recommend sticking to clean, pure
object-oriented code for performance.
Why? Because that way the JVM can do its job
with the clearest expression of the algorithm --
and that will not change from release to release.
As an added benefit, the code will be easier
to maintain. So it's not fair to say that
developers write code which is distorted by
the lack of XYZ optimization.

2. Shipping a product that is a platform
Our primary goal is reliability. The entire
organization and infrastructure around producing
the JVM is large and very professional.
We are just about to ship the Mustang beta, and
so we are now in a phase of testing.
Will new features be added to Mustang before
the final release? It's quite possible, but
realize that at this point in the release
we are aggressively analyzing reliability risk,
the complexity of implementing changes, time for
testing, etc. It is just not simple to
say "go implement a new optimization,
test the performance impact, and add it to
the release cycle". It's not impossible, but
as we fully intend to hit our release target
dates, it becomes an issue of triage.

One thing that we have done better and better
with each release train is add performance
improvements in update releases.

3. The performance of allocation
Yes, this is a hot button. Nevertheless, our
recommendation is to try to avoid second-guessing GC
by creating your own object caches and pools
(except for extraordinarily large or complex
objects such as DB accessors).
I've tuned very large server applications
and it's rare that GC time exceeds 5% of
overall run time. If you go back over history,
you realize that these sorts of applications
used to be dominated by GC. Is it worth
tying yourself into a pretzel, tilting your
head and squinting while you write clever
allocation code to maybe save a percent or two?
Nope. Better to write pure OO code and let
GC do its job. Would we like to say that
Ergonomics can do everything? Sure, that'd be
great, but we have a lot more work to do.
So while we continue to expand Ergonomics,
you need to do command-line heap and GC tuning.
If you write clean code, try different collectors,
and tune them, you will get much better performance
in the metric that really counts:
time to solve your business problem!
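For contrast, the pool pattern being discouraged for small objects looks something like this (a sketch with invented names): with a generational collector, a plain `new` for a short-lived object is typically a cheap pointer bump, while the pool adds bookkeeping and keeps objects alive longer than necessary.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A small, short-lived value object -- the kind that a generational GC
// reclaims almost for free. (Vec3 and Vec3Pool are invented names.)
class Vec3 {
    double x, y, z;
}

// Hand-rolled pooling of such objects adds bookkeeping on every acquire
// and release, and reusing a released object is a whole new class of bugs.
class Vec3Pool {
    private final Deque<Vec3> free = new ArrayDeque<Vec3>();

    Vec3 acquire() {
        Vec3 v = free.poll();
        return (v != null) ? v : new Vec3();
    }

    void release(Vec3 v) {
        free.push(v); // caller must never touch v again after this
    }
}
```

The recommendation in the post is to skip the pool entirely for objects like this and let the collector do its job.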

Regards,

--Tom

mthornton (Joined: 2003-06-10)

> So it's not fair to say that
> developers write code which is distorted by
> the lack of XYZ optimization.

Then why did someone implement Component.getLocation(Point rv) and similar methods? I understand why it is preferable to avoid code like this, but nevertheless it has been written and continues to be written, especially with examples like this in the Java libraries. There are also numerous developers avoiding iterators because of the perceived performance cost of creating them. Most of them are probably mistaken to do this, but a small number may be justified.

Some of us also write code which pushes the achievable performance to the limit, and so we sometimes have to do things like this to get acceptable results on the current JVM. I accept that I will have to go back and revisit some of these 'optimisations' when JVM improvements render them redundant (or even counterproductive).

As to whether it should be in Mustang, I completely understand the problems. My wish is for it to be implemented as soon as possible, BUT no earlier.

The JScience API is an example of a third-party API which has been significantly affected by performance issues and where stack allocation of non-escaping objects would be very relevant.

http://www.jscience.org/


steved (Joined: 2005-11-30)

>
> What is a bytecode escape estimator? Is it some kind
> of optimistic compilation which could force hotspot
> under some circumstances to recompile or something
> cheaper?
>

The bytecode escape estimator (BCE) scans the bytecodes of called methods to determine whether the method might cause any arguments to escape. It produces a conservative estimate of which arguments escape.

Without the BCE, we must assume that any objects passed to a called method escape. In practice, this eliminated almost all opportunities for lock elimination.
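As a concrete illustration (my own sketch, not JDK code): the StringBuffer below never escapes the method, so once the BCE has scanned the called append methods and confirmed they do not let their receiver escape, the monitors taken by those synchronized methods can be removed.

```java
public class LockElision {
    static String greet(String name) {
        // sb is local and is only passed to methods that the BCE can
        // verify do not let it escape, so the synchronized append calls
        // can have their lock operations elided.
        StringBuffer sb = new StringBuffer();
        sb.append("Hello, ");
        sb.append(name);
        return sb.toString(); // only the resulting String escapes
    }

    public static void main(String[] args) {
        System.out.println(greet("Mustang")); // prints Hello, Mustang
    }
}
```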

Steve Dever

linuxhippy (Joined: 2004-01-07)

Thanks a lot for explaining it, so it's not as mystical as I first thought ... but certainly needed.

Thanks a lot for this enhancement; it will make Mustang a really unique release.
Maybe stack allocation could be included too, which, in conjunction with lock removal, would make everybody happy :)
However, I do not want to seem ungrateful: thanks a lot for all the improvements made up to Tiger ;)

Thanks, lg Clemens