
Add SIMD, or streaming extensions, at least to JVM

7 replies
doronrajwan
Offline
Joined: 2005-06-30
Points: 0

On (almost?) all modern processors there is some form of SIMD support. This includes Intel (SSE), AMD (3DNow!), PPC/Cell (AltiVec) and so on. Somehow all of these implementations support the same data format: 128-bit SIMD4, meaning operations on 4 operands at a time with a total width of 128 bits.
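
To make the "4 operands at a time" idea concrete, here is a minimal scalar sketch in plain Java. The class and method names are hypothetical and exist only for illustration. The loop is unrolled by four so that each outer step covers the work a single 128-bit SIMD add of four 32-bit floats would do; the code itself is ordinary Java, and whether a JIT ever maps it to SIMD instructions is not something it guarantees.

```java
// Hypothetical illustration only: plain Java arranged in groups of four lanes.
final class Simd4Sketch {
    // 4 lanes x 32-bit float = 128 bits, the width discussed above.
    static final int LANES = 4;

    private Simd4Sketch() {}

    // Element-wise out[i] = a[i] + b[i]; all three arrays must have the same length.
    static void add(float[] a, float[] b, float[] out) {
        if (a.length != b.length || a.length != out.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        int n = a.length;
        int i = 0;
        // Main loop: each iteration is the work of one 4-wide SIMD add.
        for (; i + LANES <= n; i += LANES) {
            out[i]     = a[i]     + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        // Tail loop: leftover elements when the length is not a multiple of four.
        for (; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }
}
```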

Optimizing scalar code for such machines is something that is better done off-line than by the JIT. These optimizations tend to be complex. For example, see: http://portal.acm.org/citation.cfm?id=996853

Having byte-codes that directly support SIMD would allow such optimizations to be implemented at compile time and give better performance, specifically for, but not limited to, streaming and media applications.

jarouch
Offline
Joined: 2004-03-04
Points: 0
doronrajwan
Offline
Joined: 2005-06-30
Points: 0

Adding libraries is not a replacement for good optimizations; at most, it can be an addition.

Doing such optimizations in HotSpot is the right way to go, of course. However, since the optimizations tend to be quite complex to evaluate (read the article), some of the analysis can be done off-line.

Do not think of a desktop running the "server" JIT as the only option. Most desktop Java clients today support only the “client” JIT, which lacks the ability to perform even far more trivial optimizations. I can think of future handheld devices that have the SIMD capacity to stream video and process it in real time, but lack the ability to perform such optimizations.

Using specific byte-codes is one alternative. The other would be to give some "hints" to HotSpot, which can reduce the amount of calculation needed. [b] We are doing that for the byte-code verifier, right? Why not do the same for code generation? [/b]

mthornton
Offline
Joined: 2003-06-10
Points: 0

A library provides a default implementation which can be partly or completely replaced by the JIT compiler. Any optimization that could be done via new byte codes could also be done if the operations were expressed as library method calls. There are already a number of places in Java where methods are replaced at run time by 'intrinsic' versions. Both Math and, I think, java.nio get this treatment.

So, in my opinion, the right approach is to define the common/wanted SIMD operations as a library and then get the JIT wizards to perform some magic on it.
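
As a rough sketch of what such a library's default implementation could look like: the class name VectorOps and the method multiplyAdd below are made up for illustration and are not part of any existing Java API. The body is the plain-Java fallback. The idea in this thread is that a JIT could recognize calls to a method like this and substitute a SIMD version, much as it already does for some Math methods.

```java
// Hypothetical library class: the portable default implementation.
// A JIT that knows this method could replace the loop with SIMD instructions;
// nothing in this code requires or guarantees that.
final class VectorOps {
    private VectorOps() {}

    // Computes out[i] = a[i] * b[i] + c[i] for every index.
    // All four arrays must have the same length.
    static void multiplyAdd(float[] a, float[] b, float[] c, float[] out) {
        int n = a.length;
        if (b.length != n || c.length != n || out.length != n) {
            throw new IllegalArgumentException("array lengths differ");
        }
        for (int i = 0; i < n; i++) {
            out[i] = a[i] * b[i] + c[i];
        }
    }
}
```

Callers would simply write VectorOps.multiplyAdd(a, b, c, out) and get the same results whether or not the call is intrinsified, which is the main attraction of the library approach over new byte codes.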

alanb
Offline
Joined: 2005-08-08
Points: 0

The transcript of this chat session might be of interest:
http://java.sun.com/developer/community/chat/JavaLive/2005/jl0315.html
Also this section of the 1.4.2 performance white paper might be interesting:
http://java.sun.com/j2se/1.4.2/1.4.2_whitepaper.html#7

alexlamsl
Offline
Joined: 2004-09-02
Points: 0

Why don't we have HotSpot do the hard work instead? Adding byte-codes doesn't sound too reassuring.

jarouch
Offline
Joined: 2004-03-04
Points: 0

I don't think that special bytecodes are a good way to add this functionality, but what about a relatively simple library, something like SIMDMath, working over proper data structures (float[]s/FloatBuffers/FloatVector/...) and allowing us to use more of the CPU power? CPUs are still evolving, and it is easier to add new methods to the library than to teach HotSpot new tricks.

sjasja
Offline
Joined: 2004-08-15
Points: 0

Hotspot optimization also has an edge over static optimization: Hotspot can detect at runtime which classes are subclassed and which methods are overridden. This allows Hotspot to inline "virtual" functions (an optimization that is between hard and impossible for static optimization). And inlining exposes opportunities for further optimization.

E.g. a simple getter function is optimized into a memory access instruction, which inside a loop can be optimized into a SIMD instruction. This can't be done if SIMD optimization is done at byte code level, since "javac" can't possibly know whether the getter is overridden by a subclass. And no need to do ugly counter-OO hacks like "final"ing your classes and methods in the hope that you'll get better optimization.
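
A small sketch of the kind of code this describes, with hypothetical class names. FloatColumn.get is an ordinary, non-final getter, so "javac" has to compile the call in the loop as a virtual call. A runtime compiler that sees no loaded subclass overriding get could inline it down to an array load, leaving a simple loop over floats that is a candidate for SIMD. Whether a particular JVM actually does this is an assumption, not something the code shows.

```java
// Hypothetical example: a non-final class with a trivial getter.
class FloatColumn {
    private final float[] data;

    FloatColumn(float[] data) {
        this.data = data.clone(); // defensive copy
    }

    int size() {
        return data.length;
    }

    // Simple getter; a subclass is free to override it.
    float get(int i) {
        return data[i];
    }
}

final class ScaleLoop {
    private ScaleLoop() {}

    // Returns a new array with every element of src multiplied by factor.
    // After inlining get(), the loop body is just load, multiply, store.
    static float[] scale(FloatColumn src, float factor) {
        float[] out = new float[src.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = src.get(i) * factor;
        }
        return out;
    }
}
```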