Vector API for Java!

9 replies [Last post]
dtrehas
Offline
Joined: 2004-03-15
Points: 0

It would be nice to see a vector API that utilizes the SIMD capabilities of modern processors.

I think this is significant for Java because many multimedia programs need to be optimized (and that is not really the case right now).

And hey, the VM is not a magician. Only a SIMD API would give it the right hints for optimization.

Maybe a carefully designed API would help Java in the STI Cell processor market.

dtrehas
Offline
Joined: 2004-03-15
Points: 0

Is there any news on this for JDK 7? ;)

dtrehas
Offline
Joined: 2004-03-15
Points: 0

Could it be useful to add some annotation-based hints that help the VM recognize bytecode patterns?

alexlamsl
Offline
Joined: 2004-09-02
Points: 0

I think getting Java programmers to worry about platform-specific details like these might not be a good idea.

There should be an interface higher up the hierarchy for these things to "get into Java", so to speak; for instance, to use SIMD technology when rendering graphics, one should use the corresponding Java API for graphics, which in turn will have native code for SIMD if it helps performance.

campbell
Offline
Joined: 2003-06-24
Points: 0

I have been doing some experimentation with rewriting our low-level rendering loops in pure Java code. There are many cases in our native code where we take advantage of SIMD instruction sets (VIS, MMX) to improve rendering performance, so if we moved those loops up to Java it would be nice if we had some way to express vector instructions in a cross-architecture manner. The various SIMD instruction sets are too different to provide lots of low-level ops at the Java level, but there are plenty of high-level operations (e.g. bulk add, multiply, linear interpolate) that might be expressible in such a way.
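To make the idea of high-level bulk operations concrete, here is a minimal sketch in plain Java. The class name BulkOps and its method signatures are hypothetical, not an existing or planned API; the loops are ordinary scalar code, written so that each element is independent and a JIT could in principle map them to vector instructions.

```java
// Hypothetical helper class: the name and signatures are illustrative only.
final class BulkOps {
    private BulkOps() {}

    /** Bulk add: dst[i] = a[i] + b[i] for every index. */
    static void add(float[] a, float[] b, float[] dst) {
        checkLengths(a, b, dst);
        for (int i = 0; i < dst.length; i++) {
            dst[i] = a[i] + b[i];
        }
    }

    /** Bulk multiply: dst[i] = a[i] * b[i] for every index. */
    static void multiply(float[] a, float[] b, float[] dst) {
        checkLengths(a, b, dst);
        for (int i = 0; i < dst.length; i++) {
            dst[i] = a[i] * b[i];
        }
    }

    /** Linear interpolation: dst[i] = a[i] + t * (b[i] - a[i]). */
    static void lerp(float[] a, float[] b, float t, float[] dst) {
        checkLengths(a, b, dst);
        for (int i = 0; i < dst.length; i++) {
            dst[i] = a[i] + t * (b[i] - a[i]);
        }
    }

    // All three arrays must have the same length so no iteration can go out of bounds.
    private static void checkLengths(float[] a, float[] b, float[] dst) {
        if (a.length != b.length || a.length != dst.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
    }
}
```

Each method takes whole arrays rather than single values, so the caller states the bulk intent once and the implementation is free to choose how to execute it.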

It's still very early but I'm hoping to work more with the HotSpot team in the Dolphin timeframe to see if these ideas are feasible. I would agree with alexlamsl that it's not clear yet whether they will be generally useful to Java developers. Perhaps we could try something private first, and if it works for Java2D and there is enough demand, maybe we could open it up as a public API. But again, first we need to see if these ideas will fly. (I'd love to get rid of a lot of our native rendering code and rely on HotSpot wherever possible, but it's a big project...)

Thanks,
Chris
Java 2D Team

jwenting
Offline
Joined: 2003-12-02
Points: 0

Java is NOT about hardware-specific stuff, which is what you're after.

If you want a high-level API for vector mathematics, you can write it now (and there almost certainly will be one or more available).
If you want it to have direct hardware access for performance, you're going to have to use JNI, and this should not change.

dog
Offline
Joined: 2003-08-22
Points: 0

> If you want it to have direct hardware access for
> performance you're going to have to use JNI and this
> should not change.

Why is this so?

Why not include a higher level abstraction to allow high performance code to take advantage of SIMD or other parallel processing features of the CPU when these are available?

Java JIT compilers should be able to take advantage of the full range of the instruction sets available to them to make performance improvements. As high level programmers we should have some way of hinting to the JIT that this code is a good candidate for such an optimization (if the optimization is available).

An example of slow code (on modern processors):

for (int i = 0; i < array.length; i++) {
    array[i]++;
}

A SIMD capable version would be:

array++; // increments all elements of the array in parallel

(Disclaimer: exact syntax TBD.. this is just so you get the idea)

Adding things to make more powerful implementations possible is a good thing.

Imagine:
- Exploiting parallelism to search items in arrays
- Exploiting parallelism to modify images
- Exploiting parallelism to sort items in arrays
- Exploiting parallelism when searching/converting/manipulating strings

These things can make significant differences in your apps because you use them all the time!!

prunge
Offline
Joined: 2004-05-06
Points: 0

> An example of slow code (in modern processors):
>
> for (int i = 0; i < array.length; i++) {
>     array[i]++;
> }
>
> A SIMD capable version would be:
>
> array++; // increments all elements of the array in parallel

Why add a syntax like this when we can have a method like Arrays.increment(array)? Couldn't it work the way System.arraycopy does now, going to native code that takes advantage of the processor to make an efficient copy?

It would be useful to have maybe an 'ArrayOps' class, or to add static methods to the java.util.Arrays class, to perform (potentially) parallel operations on arrays or parts of them.
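As a rough sketch of what such a class could look like: ArrayOps and increment are hypothetical names taken from the suggestion above, and nothing like them exists in java.util.Arrays. The body is a plain loop; the point is only that a library method gives the VM one well-known place to substitute a native or vectorized implementation, the way System.arraycopy does.

```java
// Hypothetical class: neither ArrayOps nor increment exists in the JDK.
final class ArrayOps {
    private ArrayOps() {}

    /** Increments every element of the array by one. */
    static void increment(int[] array) {
        increment(array, 0, array.length);
    }

    /** Increments the elements in [fromIndex, toIndex) by one. */
    static void increment(int[] array, int fromIndex, int toIndex) {
        if (fromIndex < 0 || toIndex > array.length || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException(
                    "range [" + fromIndex + ", " + toIndex + ") for length " + array.length);
        }
        // Every iteration is independent of the others, so an implementation
        // would be free to process several elements at once.
        for (int i = fromIndex; i < toIndex; i++) {
            array[i]++;
        }
    }
}
```

The ranged overload covers the "parts of them" case, following the fromIndex/toIndex convention that java.util.Arrays already uses for methods such as fill and sort.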

dog
Offline
Joined: 2003-08-22
Points: 0

I have no preference on syntax.

Only that it:
- be low-level enough to be optimized effectively and to serve as a replacement for normal arrays
- allow a set of basic operations on "arrays" of data

I suggested increment as an array-wise operation, but I'm sure the parallel processing literature would suggest more basic operations that are universally useful.

I also wonder whether map/reduce-style functionality over arrays would be useful for this sort of stuff.
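For illustration, map and reduce over a primitive array might be sketched as below. The class ArrayMapReduce is hypothetical, and the sketch assumes the functional interfaces in java.util.function (Java 8 or later) are available. The loops run sequentially; a parallel version would additionally need the reduce operator to be associative.

```java
// Hypothetical class, shown only to illustrate the shape of the idea.
final class ArrayMapReduce {
    private ArrayMapReduce() {}

    /** Returns a new array whose i-th element is f applied to src[i]. */
    static double[] map(double[] src, java.util.function.DoubleUnaryOperator f) {
        double[] out = new double[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = f.applyAsDouble(src[i]);
        }
        return out;
    }

    /** Folds the array into one value, starting from identity. */
    static double reduce(double[] src, double identity,
                         java.util.function.DoubleBinaryOperator op) {
        double acc = identity;
        for (double v : src) {
            acc = op.applyAsDouble(acc, v);
        }
        return acc;
    }
}
```

A sum of squares would then read as reduce(map(values, x -> x * x), 0.0, Double::sum), with no explicit index anywhere in the calling code.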

defrector
Offline
Joined: 2003-07-13
Points: 0

I think dog might be onto something.

A foreach loop might be necessary: in a counter-based for() loop each iteration depends on the iterator state left by the previous one, while foreach iterations are independent and 'truly parallel', give or take. The compiler/VM/whatever needs to be able to trust that every iteration performs the same operations and that none of them messes with anything the other parallel iterations are messing with, if I know my stuff right.

Which I don't. But if it does then...

Totally scribbling this on the fly, pseudocode-ish:

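double[] allValues = whatever;

// Pseudocode: assumes assignments to currentValue write back into the array
// (a plain Java foreach only modifies a local copy).
for (double currentValue : allValues)
{
    currentValue++;

    if (currentValue > 3) { currentValue = 3; }
    currentValue *= 5;
}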

I suppose that could delegate down to:
double[] allValues = whatever;
// One dimension is for each if() test, the other is for the vector;
// there is a single row here since we have one if().
boolean[][] branchVector = whatever;

SIMDAdd(allValues, 1);
branchVector[0] = SIMDGreaterThan(allValues, 3);
// Where the row in the branchVector is false, the input passes through unaffected; where it is true, the value is set to 3.
SIMDConditionalSetValue(allValues, 3, branchVector[0]);
SIMDMultiply(allValues, 5);

I know that there are dozens of little issues that pop up, such as accessing shared variables (probably should make the compiler regard these as nonparallel attempts) and whatnot. I am not a compiler writer, it's a shot in the dark but if someone finds use of this have fun!
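The delegation above can be written out as runnable Java, under the assumption that each SIMD-style step is emulated with an ordinary loop. MaskedOps and its method names are hypothetical stand-ins for the SIMDAdd / SIMDGreaterThan / SIMDConditionalSetValue / SIMDMultiply calls in the pseudocode; no real vector instructions are involved here.

```java
// Hypothetical scalar emulation of the mask-based steps sketched above.
final class MaskedOps {
    private MaskedOps() {}

    /** v[i] += s for every element. */
    static void add(double[] v, double s) {
        for (int i = 0; i < v.length; i++) v[i] += s;
    }

    /** Returns a mask that is true wherever v[i] > threshold. */
    static boolean[] greaterThan(double[] v, double threshold) {
        boolean[] mask = new boolean[v.length];
        for (int i = 0; i < v.length; i++) mask[i] = v[i] > threshold;
        return mask;
    }

    /** Sets v[i] = value where mask[i] is true; other elements pass through unchanged. */
    static void setWhere(double[] v, double value, boolean[] mask) {
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) v[i] = value;
        }
    }

    /** v[i] *= s for every element. */
    static void multiply(double[] v, double s) {
        for (int i = 0; i < v.length; i++) v[i] *= s;
    }

    /** The loop body from the post: increment, clamp to 3, multiply by 5. */
    static void process(double[] values) {
        add(values, 1);
        boolean[] mask = greaterThan(values, 3); // one mask per if() test
        setWhere(values, 3, mask);
        multiply(values, 5);
    }
}
```

The if() becomes data (the mask) rather than control flow, so every element goes through the same sequence of whole-array steps regardless of its value.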