
Time to start talking about some javac and VM changes?

tferguson

Sun has graciously opened the door to hacking on the entire Java stack to hordes of GPL-preferring developers. Is it time to start talking about some changes to javac and the VM that would let people trade away backward compatibility or an airtight security model in exchange for better performance (that is, things that Sun holds dear, but some of the rest of us are willing to bend on)? Specifically I'm thinking of two things, but I think there are probably more.

First, array index checking. With one simple statement, "All array accesses are checked at run time; an attempt to use an index that is less than zero or greater than or equal to the length of the array causes an ArrayIndexOutOfBoundsException to be thrown" (section 10.4 of the JLS), Sun singlehandedly fueled one of the biggest issues in the Java vs. C/C++ religious wars. Why not let the user decide? Why not allow some other "pretty good" options, like guard pages around arrays?
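To make the cost concrete, here is what every indexed access means per the JLS, spelled out by hand (a sketch of the semantics, not of what any particular VM actually emits):

static int checkedGet(int[] a, int i) {
    // What a[i] means according to JLS 10.4:
    if (i < 0 || i >= a.length)
        throw new ArrayIndexOutOfBoundsException(i);
    return a[i]; // the VM of course performs the same check implicitly here too
}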

Second, when I first read about generics, I was excited about not having to type all those annoying casts when dealing with the standard libraries. What wasn't immediately obvious is that the typing is really the ONLY thing you save, because the compiler inserts all the casts back in after the initial type-checking stage. There's no good reason not to let the user tell the compiler to generate type-specific versions of a genericized class and thereby avoid the casts entirely. To make this work you would need access to the source of the genericized classes at compile time, but failing that it could fall back on the old behavior.
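To make it concrete, here is the kind of thing I mean (a sketch of my understanding of erasure, not of javac internals):

static String first(java.util.List<String> names) {
    // After type-checking, javac erases List<String> to the raw List and
    // re-inserts the cast, so this effectively compiles to:
    //     return (String) names.get(0);
    return names.get(0);
}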

Since these changes break assumptions in the JLS and other well-established standards, I think they would never be accepted into the normal JSR process. That doesn't mean they're not needed.

What do you think?

-t

josgood

Nice manners. I admire how you kept the debate about the issues and didn't resort to name calling and personal attacks.

jdavi

> > Sun has graciously opened the door to hacking on the entire Java stack to hordes of GPL-preferring developers. Is it time to start talking about some
>
> yes, they've opened the floodgates for every idiot with a compiler who thinks he can "improve" Java by making it worse.
>

> You're the prototype of everything that caused me to oppose Sun releasing Java under a license that would allow people to hack their own versions and release it.

Just don't worry. It is good that people can play with the code.

It will NOT harm Java at all, because if you change it, it will just NOT BE JAVA ANYMORE; you can only use the name for code that passes the compatibility tests.

If some of the changes people come up with are worth the trouble, they will find their way into the official release.


carcassi

> so it seems bounds-check removal works pretty well today

Unfortunately, it's not that easy... for bounds-check-removal to work, the code in the inner loop must be inlined. So if you have:

for (int x = 0; x < n; x++) {
    myobject.mymethod(x);
}

the array accesses within mymethod will not be inlined if mymethod is a virtual function. Also, if the mymethod body is too big, inlining won't work. Plus, a bunch of other details I won't go into...

If you write OO/HPC code, you still need to be very, very careful about how you write your loops. I would be very curious to see what performance penalty I get from bounds checks...
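For contrast, this is the shape the optimizer likes (a sketch, assuming the usual direct-indexing loop):

static long sum(int[] data) {
    long sum = 0;
    // The index and the array length are both visible here, so the JIT
    // can prove 0 <= x < data.length and drop the per-access checks.
    for (int x = 0; x < data.length; x++)
        sum += data[x];
    return sum;
}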

Gabriele

linuxhippy

> the array accesses within mymethod will not be inlined if mymethod is a virtual function.
As far as I know HotSpot (both client and server) is able to inline virtual functions based on profiling, so I don't see a problem here with virtual functions. However, I may be wrong of course...

lg Clemens

briand

HotSpot will inline virtual functions under certain conditions. The optimal case is when there is only one concrete implementation of the target class loaded. In this case, it will make a direct call to the only possible method. A slightly less optimal case is when there are two concrete implementations of the target class loaded. In this case, HotSpot will use a conditional to select the correct method to call. In the general case, where there are N (>2) classes loaded, HotSpot will make the call through a vtable. HotSpot is also able to detect when new classes are loaded and will deoptimize and reoptimize any compiled methods where such assumptions are no longer valid.
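A sketch of the three cases (the class names are made up for illustration):

interface Shape { double area(); }

final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

// With Circle as the only loaded implementation, a call site like
// shape.area() compiles to a direct call, usually inlined.
// Load a second implementation and the site becomes a conditional
// that selects between the two bodies. Load a third and the call
// goes through the vtable. If a later class load invalidates the
// assumption, the compiled method is deoptimized and recompiled.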

This set of optimizations seems to reduce virtual call overhead for a large class of benchmarks and real-life applications.

Brian

linuxhippy

> HotSpot will inline virtual functions under certain conditions. The optimal case is when there is only one concrete implementation of the target class loaded. In this case, it will make a direct call to the only possible method.

Does HotSpot insert a conditional jump even in the case where only one implementation is loaded, to handle an "uncommon trap"?
So if there's only a single implementation, how much more overhead does an optimized "virtual" call have compared to a true final call?

Thanks, lg Clemens

jrose

> Does HotSpot insert a conditional jump even in the case where only one implementation is loaded, to handle an "uncommon trap"?
> So if there's only a single implementation, how much more overhead does an optimized "virtual" call have compared to a true final call?

When one implementation is loaded the compiler optimistically assumes that no other implementation will be loaded, and emits no check. The "nmethod" will be deoptimized later on if another implementation is loaded. There is no inner-loop cost for this; the cost occurs when the second implementation gets loaded. We call this the "dependencies" mechanism, and it complements the "uncommon trap" mechanism. Only the latter uses a run-time check.

Even if there is an explicit check (guarding an uncommon trap), it is profitable to inline the virtual method body, because it is highly likely that expressions in the inlined code can be optimized in the context of the caller. A parameter to the callee is often defined in the caller as a non-argument, which means there is a wealth of local type information in the caller which the inlined callee can access.

There is also some value in "exact typing", where the receiver is optimistically typed as a given class K exactly (and not of any subclass of K). The cost of an exact class check is loading the receiver's _klass field and comparing it with a constant. The benefit is that all call sites and type checks guarded by the exact class check can be totally devirtualized.

The value of inlining and subsequent folding is so great that the compiler will sometimes issue an optimistic class check to guard an inlined virtual method body, and then provide a backup execution path to all other methods (either via an uncommon trap or a plain vtable lookup). This decision is based on receiver type profiling.

A recent improvement to this is "bimorphic" inlining, where there are two principal types, and the compiler may choose to inline both implementations. (Again, this decision is based on profile feedback, not class hierarchy analysis. See doCall.cpp.)

Another recent improvement is cast profiling, which takes all those checkcasts inserted for generic code and turns them to a profit, by profiling cast-receiver types and exploiting them in the common case of a caller using a monomorphic container type (think List). A simple check, done as soon as an object is extracted from a container, enables devirtualization of all subsequent operations on that object. (Look at GraphKit::gen_subtype_check in the most recent sources and see uses of TypeProfileCasts.)

One area for future improvement (suggested by the contents of this thread) is better propagation of "a.length" values. If a class could be proven to make arrays of some stereotyped size (think 201_compress in JVM98 or a new int[256]), then that range information could be applied to the length field, and properly folded with index expressions which themselves are range-limited (think idx&0xFF, etc). The latter folding is already a routine optimization. The server compiler can perform an impressive amount of range inference; take a look at the code. (Start with classes TypeInt and AddINode.)
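For instance, the mask case is the following sketch (this much is already handled today):

static final int[] TABLE = new int[256];

static int lookup(int idx) {
    // idx & 0xFF is provably in [0, 255], which folds against
    // TABLE.length == 256, so the bounds check disappears.
    return TABLE[idx & 0xFF];
}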

And there's always more to do. The best way to improve the system (IMO) is to find a benchmark or app that matters, profile its hot loops, pore over their code, and decide how to make the code of a loop better. The answer is often an incremental improvement to compilation smarts. In this world of performance hacking, if you turn over a rock, you usually find something worth squashing. You just have to turn over rocks that really matter.

josgood

Yes, javac and the JVM could probably stand some improvement. That's always true.

However, I wholly oppose making array bounds check elimination (ABCE) a programmer or runtime option.

The correct solution is to rely on the JVM to do ABCE safely and securely.

Towards Array Bound Check Elimination in Java Virtual Machine Language
http://citeseer.ist.psu.edu/xi99towards.html

That's the "Java way".

To illustrate the difference...

.NET has stack-based pseudo-objects, so the programmer can choose stack or heap allocation, nominally for performance reasons.

The correct solution is for the JVM to do escape analysis and choose stack allocation when possible. Safely and securely. This solution is also much faster. The JVM is smarter than you. Dynamic optimization beats static (or programmer) optimization.
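For example (a sketch; the class is made up, and whether a given JVM release actually applies the optimization is another question):

final class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

static double length(double x, double y) {
    // p never escapes this method, so escape analysis may replace it
    // with two locals: no heap allocation, no GC pressure.
    Point p = new Point(x, y);
    return Math.sqrt(p.x * p.x + p.y * p.y);
}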

.NET has unprotected code, allowing access to native structures like pointers, nominally for performance reasons.

The correct solution is for the JVM to manage the heap and do garbage collection. Safely and securely. This solution is also much faster. The JVM is smarter than you. Dynamic optimization beats static (or programmer) optimization.

If you're concerned about performance, either programmer productivity or runtime speed, I encourage you to focus on the libraries and APIs. There's a whole lot of cruft and genuine stinkers left behind after years of experimentation and evolution.

mcekovic

+1 for escape analysis and stack allocation in the JVM
+1 for dynamic vs. static optimisation
-1 for .NET structs, they completely suck (they introduce unnecessary complexity to the language with questionable performance gains; try using generics with structs as type parameters and you will f..k up the performance completely)
+1 for optimizations in the APIs and class libraries (the JVM is already pretty optimized, though stack allocation would not harm)
-1 and no, the JVM is not smarter than me, but it surely can help ;)

jwenting

> Sun has graciously opened the door to hacking on the entire Java stack to hordes of GPL-preferring developers. Is it time to start talking about some

yes, they've opened the floodgates for every idiot with a compiler who thinks he can "improve" Java by making it worse.

> changes to javac and the VM that would let people trade away backward compatibility or an airtight security model in exchange for better performance (that is,

People like you, it seems.

> First, array index checking. With one simple statement, "All array accesses are checked at run time; an attempt to use an index that is less than zero or greater than or equal to the length of the array causes an ArrayIndexOutOfBoundsException to be thrown" (section 10.4 of the JLS), Sun singlehandedly fueled one of the biggest issues in the Java vs. C/C++ religious wars. Why not let the user decide? Why not allow some other "pretty good" options, like guard pages around arrays?
>
No, those are NOT "pretty good" options. If you like C/C++ so much that you want to change Java to be them, use them instead.
One of the biggest problems with C and C++ is the lack of bounds checking on arrays, which has probably caused the majority of security flaws in software released around the world for decades.

> Since these changes break assumptions in the JLS and other well-established standards, I think they would never be accepted into the normal JSR process. That doesn't mean they're not needed.
>
No, it means you'd need to break the JLS and therefore break Java itself.
It would no longer be Java, but C++.

> What do you think?
>
You're the prototype of everything that caused me to oppose Sun releasing Java under a license that would allow people to hack their own versions and release it.

pepe

+1 for the overall sentiment.
-1 for the harshness, but I understand...

tferguson

OK, so I agree the Linpack numbers do make it look like array bounds checking is less of an issue for most applications. What about casting? Surely the compile-time/memory/run-time tradeoff there should be controllable?

-t

linuxhippy

> What about casting?
According to some documents from Sun and javagaming.org, a cast is typically a single instruction, sometimes more in complex situations.
I agree that there could perhaps be better solutions for generics; on the other hand, it was a tradeoff that has proven to work quite well.

If you really plan to use generic code for computationally intensive work (e.g. encryption), you have to use objects, and then your design is broken performance-wise anyway. The indirection added just by using objects will weigh much more than the cast.
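For example, compare the two designs (a sketch):

// Primitive design: direct loads, no objects, no casts.
static long sumRaw(int[] data) {
    long sum = 0;
    for (int i = 0; i < data.length; i++)
        sum += data[i];
    return sum;
}

// Generic design: every element is an Integer object, so each access
// pays a virtual call, a checkcast and an unboxing indirection.
static long sumBoxed(java.util.List<Integer> data) {
    long sum = 0;
    for (int i = 0; i < data.size(); i++)
        sum += data.get(i);
    return sum;
}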

lg Clemens

tferguson

> If you really plan to use generic code for computationally intensive work (e.g. encryption),

Isn't all server code "computationally intensive", because you run it on the minimal hardware that you think you can get away with? If I can flip a compiler switch and all of a sudden support the same number of users on 20% less hardware (admittedly, probably with a little more memory), isn't that something to shoot for?

And it's specifically object-intensive code that I am talking about. I would actually go so far as to say most Java applications fall outside the range of the scientific computing, image processing, or cryptographic calculation that you refer to. The libraries used to write web servers, transaction processing systems, and messaging servers create objects. It's just what they do.

-t

linuxhippy

> Isn't all server code "computationally intensive", because you run it on the minimal hardware that you think you can get away with? If I can flip a compiler switch and all of a sudden support the same number of users on 20% less hardware (admittedly, probably with a little more memory), isn't that something to shoot for?

Well, if we're talking about "general-purpose" code, you won't ever even notice the performance penalty introduced by the casting. This sort of code has many other problems and overheads to fight with; this small change likely won't buy a single percent in most situations.
However, the code is public, so just go for it. It would be interesting to see how the changes you mentioned impact performance.

Good luck, Clemens

e_arizon_benito

I think Java security/stability must be the last thing to break. Step-by-step (vs. radical) changes could be a better option for me.

For example, the array index check speed problem could be approached with a syntax similar to:

int[byte] myArray = new int[byte];
int[int] myArray = new int[int];

meaning that the array would have a fixed size of 2^8 or 2^32 integers and could only be indexed with bytes or ints respectively, with no need for bounds checks.

Yes, that could mean a lot of RAM (16 GB for a simple int[int] array!!!), but RAM is cheap and will get cheaper, and it is of course much more energy-efficient and "trustable" than CPUs. And of course we would just need a new 'smallint' type, 16 bits long, to allow for more sensible 64K-entry 'int[smallint]' secure and fast arrays.

Just my two cents!

linuxhippy

I don't see the need for array-access improvements that touch syntax or bytecode.
As you can see here: http://www.shudo.net/jit/perf/Linpack-1000-P4.png

the Java server VM performs almost as fast as the fastest C compilers (~5% difference). Linpack is a benchmark which uses arrays a lot, so it seems bounds-check removal works pretty well today :-)

lg Clemens

e_arizon_benito

> I don't see the need for array-access improvements that touch syntax or bytecode.
> As you can see here: http://www.shudo.net/jit/perf/Linpack-1000-P4.png
>
> the Java server VM performs almost as fast as the fastest C compilers (~5% difference). Linpack is a benchmark which uses arrays a lot, so it seems bounds-check removal works pretty well today :-)
>
> lg Clemens

That's impossible from a mathematical point of view, since accessing an array with no checking involves a single operation (retrieving the element from memory), while doing it with checking involves several (retrieving the element, comparing against the lower limit, comparing against the upper limit, and branching on the results). If the benchmark proved the opposite, the benchmark is wrong, maybe because of cache side effects or pipelining of operations in the CPU. Pipelining is possible in certain circumstances, but only by using a "complex" CPU and more watts of power consumption, and it is obviously not applicable to mobile devices with "light" CPUs (ARM for example), so it is not a general (platform-independent) result.

Notice also that the "draft solution" I mention is backward compatible, since it adds to, but does not modify, what is already working. It just needs a little extra effort in the compiler to check that we are using a byte variable to index a "[byte]" array.

pepe

Impossible? Certainly not.
HotSpot can recognize conditions where bounds checking is not necessary. Having bounds checking does not imply that every access must be checked. For example, loop preconditions are tested, and if HotSpot can be certain that there will be no AIOOBE during a loop, it removes the bounds checks. Access then becomes a single operation.
It is impossible only if you think statically. Forget about removing checks at the programmer level, and if you feel you can do something, help HotSpot remove those checks in more complicated cases.
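For example (the common case, as I understand it):

static void fill(int[] a, int n) {
    // HotSpot can hoist a single range test on n against a.length
    // before the loop and run the body with no per-access checks.
    for (int i = 0; i < n; i++)
        a[i] = i;
}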

If there were things to do to arrays, I would certainly vote for allowing arrays to be indexed with a long instead of an int, so that the 2-billion-element limit can be broken.

e_arizon_benito

> Impossible? Certainly not.
> HotSpot can recognize conditions where bounds checking is not necessary. Having bounds checking does not imply that every access must be checked. For example, loop preconditions are tested, and if HotSpot can be certain that there will be no AIOOBE during a loop, it removes the bounds checks. Access then becomes a single operation.
I agree that many times HotSpot could be "smart enough" (I don't really know if it is, but it could be).
Other times it just can't. Take the following example (a real problem I had to code a few weeks ago):

I had to render a chart of the daily per-minute volume of simultaneous incoming calls. The data is read sequentially and provides just the time of day when the caller hangs up and the call duration. That means pseudo-code similar to:

int[] simultaneousCalls = new int[60 * 60 * 24 / secondsResolution];
while (thereIsMoreData) {
    // read hangupSecondsFromMidnight and callSeconds from the stream
    for (int offSet = 0; offSet < callSeconds; offSet++) {
        int index = (hangupSecondsFromMidnight - offSet) / secondsResolution;
        simultaneousCalls[index]++;
    }
}
...

The problem here is that the array index for each loop iteration is determined by the offSet, which can't be calculated until the data is read from the input stream. So the compiler can't help, and neither can HotSpot.

Still, since I knew in advance that the time resolution was 4 seconds (21,600 integers per day), I could be sure an int[smallint] would be enough to guarantee no exceptions. That would have meant just 43,936 extra entries (2^16 - 21,600) for the lifetime of the array (a few tens of seconds in this particular case).

The problem repeats in many other cases. Imagine data-mining software that creates thousands of different arrays at run time to process data and takes hours or days to run. Dropping the bounds checking would be a great help.

pepe

Sorry, but this is ridiculous. The cost of a virtual bounds check isn't even epsilon in your example (where is the array, by the way?).

HotSpot IS smart enough to remove bounds checking, just as it is smart enough to completely remove loops if it can know the result without running them.

Data-mining software or not, there are two cases in the use of an array:
- sequential access
- random access
In the first case, HotSpot does a nice job of checking bounds before the loop and removing the per-access checks when it can, which I think covers most loops.
In the other case, you NEED to make the test each time; otherwise your algorithm is just running wild.
Removing the checks is only asking for trouble in either case.

Moreover, working with smaller data types for index access would only slow things down. Working with data smaller than the CPU's native word size forces internal masking and shifting, which makes access much slower and would dwarf the cost of the test.

linuxhippy

Well, if you want to see it happen you can start coding; all the needed source is already there.

Removing array bounds checks would break the JVM's guarantee that nobody can write to arbitrary memory, which is one of the reasons software written in Java has so few (coding) security flaws. Furthermore, the HotSpot server compiler has been able to remove bounds checks, or move them out of loops, since 1.4.0.
However, I guess this change is more or less trivial ;)

Good luck, lg Clemens