
Native libgmp for faster number crunching

31 replies
atripp
Offline
Joined: 2003-07-15
Points: 0

Maybe someone should incorporate the native libgmp library into OpenJDK.

See the recent threads titled "Removed GMP math?" in the Kaffe and Classpath mailing lists for some details.

You might get the impression that libgmp would only help in extreme corner cases where you have really huge numbers, but I've found that Java is several times slower than C/C++ even on a reasonable benchmark that deals with not-so-huge numbers.

If you act quickly to get in a proposal to the OpenJDK Challenge, you could get paid to do the work.

i30817
Offline
Joined: 2006-05-02
Points: 0

Any news on this? I think it would be nice to have this on jdk7.

mthornton
Offline
Joined: 2003-06-10
Points: 0

One particular problem with BigDecimal is that it has already been optimised to give better results in benchmarks like specjbb (http://www.spec.org/jbb2005/). I would guess that such benchmarks are exclusively about what we would consider small numbers. I.e. BigDecimal is used because it is decimal as opposed to binary, not because the values are large. Avoiding performance hits on such tests is probably critical to acceptance of improved implementations.

i30817
Offline
Joined: 2006-05-02
Points: 0

If the test output is too large, can it be compressed (streamed) into a hash? Or must absolutely every value be compared?

eliasen
Offline
Joined: 2005-06-01
Points: 0

> If the test output is too large, can it be compressed
> (streamed) into a hash? Or absolutely every value
> must be compared?

The output could indeed be hashed. Of course, that gives only one bit of information--whether the output is exactly the same or not. If something goes wrong in the test, it's useless to try and figure out which operands are giving trouble.

A big issue, though, is that it takes about 30 hours to run my regression tests. (Some of that is disk output, of course.) I think that even with hashing the output, that's too much time for Sun's regression tests. I've asked a couple of times how long a regression test should run, but never received an answer.
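For what it's worth, folding the streamed output into a digest instead of writing it to disk is cheap with java.security.MessageDigest; a minimal sketch (class and method names are made up for illustration):

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedRegression {
    /** Fold every test result into one digest instead of writing it to disk. */
    public static byte[] digestOf(BigInteger[] results) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (BigInteger r : results) {
                // A real harness would also hash each length, so that
                // boundaries between results can't be confused.
                md.update(r.toByteArray());
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is required in every JRE
        }
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(7).pow(1000);
        BigInteger b = a.multiply(a);
        byte[] h1 = digestOf(new BigInteger[] {a, b});
        byte[] h2 = digestOf(new BigInteger[] {a, b});
        System.out.println(java.util.Arrays.equals(h1, h2)); // true: same stream, same hash
    }
}
```

As noted above, though, a single hash still tells you only pass/fail, not where the mismatch is.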

i30817
Offline
Joined: 2006-05-02
Points: 0

And what about the current tests on BigInteger? Can't you just run them to get a baseline comparison?

As for not knowing exactly where the problem is, couldn't you set up a system where things are checked by intervals?
Store n hashes of the input, where each hash h_i corresponds to x bits of the original input. Store those, and make a loop that hashes x bits at a time to compare with h_i; if there's a problem, execute the algorithm you already have, limited to the interval (i*x, (i+1)*x] of bits.

Yes, I know talk is cheap.
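The interval idea above could be sketched roughly like this (the chunk size, the digest choice, and all names are hypothetical):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChunkedCompare {
    static byte[] hash(byte[] data, int from, int to) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(data, from, to - from);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is required in every JRE
        }
    }

    // Hash fixed-size chunks of each output; the first mismatching chunk
    // bounds the failing interval, so only that slice needs re-running.
    static byte[][] chunkHashes(byte[] data, int chunkSize) {
        int n = (data.length + chunkSize - 1) / chunkSize;
        byte[][] hashes = new byte[n][];
        for (int i = 0; i < n; i++) {
            int from = i * chunkSize;
            hashes[i] = hash(data, from, Math.min(from + chunkSize, data.length));
        }
        return hashes;
    }

    /** Index of the first chunk whose hashes differ, or -1 if all match. */
    public static int firstMismatch(byte[] expected, byte[] actual, int chunkSize) {
        byte[][] e = chunkHashes(expected, chunkSize);
        byte[][] a = chunkHashes(actual, chunkSize);
        int common = Math.min(e.length, a.length);
        for (int i = 0; i < common; i++) {
            if (!Arrays.equals(e[i], a[i])) return i;
        }
        return e.length == a.length ? -1 : common;
    }

    public static void main(String[] args) {
        byte[] good = "aaaaaaaabbbbbbbbcccccccc".getBytes();
        byte[] bad  = "aaaaaaaabbbbbbXbcccccccc".getBytes();
        System.out.println(firstMismatch(good, bad, 8)); // chunk 1 is the culprit
    }
}
```

Only the per-chunk hashes of the expected output need to be stored, not the 232 GB of output itself.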

mthornton
Offline
Joined: 2003-06-10
Points: 0

> In my opinion, if I were going to work on a
> BigDecimal implementation, I would want to work
> on a *real* BigDecimal specification, though: one
> that extends functions like log(), sin(), exp(),
> etc., to arbitrary precision (with controllable
> rounding directions.) Just fixing the existing API
> isn't enough.

I think those methods might be best in a separate class. Perhaps BigMath, either as static methods or using a factory to obtain a suitable instance of BigMath.

mthornton
Offline
Joined: 2003-06-10
Points: 0

Argument reduction on large arguments has to use a lot of bits of PI, up to something like 1024 bits I believe. This is quite expensive and can easily swamp the JNI overhead. By contrast, BigInteger operations on small values are relatively fast.
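To illustrate the idea, reduction can be done with a wide decimal constant; this sketch assumes roughly 50 digits of 2*pi, which is more than enough for double-sized arguments (class and method names are made up for illustration):

```java
import java.math.BigDecimal;

public class ArgReduction {
    // About 50 digits of 2*pi; plenty for reducing any double-sized argument here.
    private static final BigDecimal TWO_PI =
        new BigDecimal("6.2831853071795864769252867665590057683943387987502");

    /** Reduce a nonnegative x into [0, 2*pi) using the wide constant. */
    public static double reduce(double x) {
        BigDecimal[] qr = new BigDecimal(x).divideAndRemainder(TWO_PI);
        return qr[1].doubleValue();
    }

    public static void main(String[] args) {
        double x = 1.0e15;
        // Reducing with the 16-digit 2*Math.PI instead would drift by roughly
        // 0.04 radians at this magnitude; the wide constant agrees with Math.sin.
        System.out.println(Math.sin(reduce(x)));
        System.out.println(Math.sin(x));
    }
}
```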

mthornton
Offline
Joined: 2003-06-10
Points: 0

How large are the numbers?
Remember that BigInteger (usually in the guise of BigDecimal) is often used for numbers that are little (if at all) bigger than would fit in a long. Most of these uses are probably for currency. These people may be rather upset if you hit their performance in order to get better results on tasks like calculating PI.

I understand that BigInteger just uses the classical O(n^2) algorithm for multiplication, so the values don't have to be very big before packages using advanced algorithms dominate. The trouble is the large number of use cases where n is almost always 1.
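For the currency case, the point is that BigDecimal is chosen for exact decimal semantics, not for magnitude; a minimal illustration:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class Money {
    public static void main(String[] args) {
        // Binary doubles cannot represent 0.10 exactly:
        System.out.println(0.1 + 0.2);                // 0.30000000000000004
        // BigDecimal keeps exact decimal values, even ones that fit in a long:
        BigDecimal a = new BigDecimal("0.10");
        BigDecimal b = new BigDecimal("0.20");
        System.out.println(a.add(b));                 // 0.30
        // Typical currency operation: add 7.5% tax, round to whole cents.
        BigDecimal price = new BigDecimal("19.99");
        BigDecimal taxed = price.multiply(new BigDecimal("1.075"))
                                .setScale(2, RoundingMode.HALF_UP);
        System.out.println(taxed);                    // 21.49
    }
}
```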

oneguyks
Offline
Joined: 2008-04-28
Points: 0

> These people may be rather upset if you hit their performance in order to get better results on tasks like calculating PI.

Have you done benchmarks for numbers just larger than long to see whether JNI with gmp is slower than BigInteger? What if it's faster?

mthornton
Offline
Joined: 2003-06-10
Points: 0

No, but I have already pointed out that BigInteger used to be implemented using a C library called via JNI. For small numbers gmp is unlikely to be much faster. In the years since that change was made, the performance of JNI has improved, so perhaps it should be retested, but I wouldn't be too hopeful.

oneguyks
Offline
Joined: 2008-04-28
Points: 0

Well, I don't know about BigInteger vs gmp + JNI for numbers just larger than long, but I do know that calling millions of sines and cosines through JNI is much faster than using Math.sin and Math.cos.

Here is an example:

partialsums.java that uses JNI

http://pastebin.com/f69322377

MathLib.h

http://pastebin.com/f6a866605

and MathLib.cc http://pastebin.com/f4b0543cb

the command I used to compile it with visualC++

cl /O2 -Ic:\Java\jdk1.6.0_06\include -Ic:\Java\jdk1.6.0_06\include\win32 -LD CMath.cc -FeCMath.dll

and on linux/ubuntu

g++ -O3 -march=native -o libCMath.so CMath.cc -shared -fpic -I/usr/lib64/jvm/java-6-sun/include
-I/usr/lib64/jvm/java-6-sun/include/linux

I put all files in the directory and ran it with:

java -Djava.library.path=. partialsums 2500000

JNI was several times faster than Math.sin and Math.cos

mthornton
Offline
Joined: 2003-06-10
Points: 0

That illustrates the well-known issue of sin/cos with arguments of magnitude greater than ~PI/4. Your C++ library methods (and the Intel sin/cos instructions) are not accurate enough to satisfy Java's specification. The Java implementation has to do a relatively expensive argument reduction. In your benchmark, all (or almost all) of the trig calls will require this extra reduction. Many real uses aren't impacted quite so severely.

oneguyks
Offline
Joined: 2008-04-28
Points: 0

Yes, I am aware of the expensive argument reduction for arguments larger than PI/4. However, JNI is fast enough that it beats Math.sin() and Math.cos() by a huge margin. If JNI were that slow, wouldn't argument reduction and the slowness of JNI cancel each other out?

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

> If JNI were that slow, wouldn't argument
> reduction and the slowness of JNI cancel each other out?

Well, this really depends a lot on _how_ slow JNI is.
I've played with JNI on my Core2Duo and its overhead is about 30-40 cycles, and about 100 cycles on a Pentium 4.
However, this is under optimal conditions (few primitive parameters, and no array or field access from the native side).

What I would call for is a WeakMath package. StrictMath is simply too strict for what game developers often need, and a WeakMath would be fully intrinsifiable across many platforms without having to worry about its correctness.
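As one hypothetical example of what such a WeakMath might contain, the classic parabola approximation of sine (maximum error around 0.056 on [-pi, pi]) is trivially cheap and branch-free:

```java
public class WeakMath {
    private static final double B = 4.0 / Math.PI;
    private static final double C = -4.0 / (Math.PI * Math.PI);

    /** Cheap sine approximation for x in [-pi, pi]; max error about 0.056. */
    public static double fastSin(double x) {
        return B * x + C * x * Math.abs(x);
    }

    public static void main(String[] args) {
        for (double x = -Math.PI; x <= Math.PI; x += 0.5) {
            System.out.printf("%6.2f  %8.4f  %8.4f%n", x, fastSin(x), Math.sin(x));
        }
    }
}
```

Accuracy like that is useless for numerics but often fine for games, which is exactly the trade-off a WeakMath would make explicit.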

lg Clemens

alexlamsl
Offline
Joined: 2004-09-02
Points: 0

>
> Typical open-source-fanboy fuzzy thinking.
>
> It's certainly possible to have a business need for a
> feature for which there are no specific developers
> who see a need (or are interested in working on it).
>

If there is a business case for it, then the businesses involved should pay for the cost of implementing the feature. Sun does not just go out and pay for your profits ;)

>
> If it produces faster code and makes Java usable for
> a larger set of applications, I don't see how that's
> not a "good thing". Impossible, maybe, but it's hard
> to argue that the goal isn't a good one.
>

IF it produces faster code. And even if that is the case, maintenance becomes a more complicated issue. Apart from performance, reliability is also a very important concern.

oneguyks
Offline
Joined: 2008-04-28
Points: 0

> One problem is that for moderate size numbers, JNI overhead would probably lose any performance advantage libgmp offered.

I am not sure if that is true. Have a look at this:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=pidigits&lang=all

Each program uses a spigot algorithm to calculate digits of Pi. The Java version that uses JNI to access libgmp is only 15% slower than the C++ implementation. The Java version that uses BigInteger is 600% slower.
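For reference, the BigInteger half of such a comparison can be written compactly with Gibbons' unbounded spigot; this is just a sketch of the technique, not the actual shootout entry:

```java
import java.math.BigInteger;

public class PiSpigot {
    /** First `count` decimal digits of pi via Gibbons' unbounded spigot. */
    public static String digits(int count) {
        BigInteger q = BigInteger.ONE, r = BigInteger.ZERO, t = BigInteger.ONE;
        BigInteger k = BigInteger.ONE, n = BigInteger.valueOf(3), l = BigInteger.valueOf(3);
        final BigInteger TEN = BigInteger.TEN, TWO = BigInteger.valueOf(2),
                         THREE = BigInteger.valueOf(3), FOUR = BigInteger.valueOf(4),
                         SEVEN = BigInteger.valueOf(7);
        StringBuilder sb = new StringBuilder();
        while (sb.length() < count) {
            if (FOUR.multiply(q).add(r).subtract(t).compareTo(n.multiply(t)) < 0) {
                sb.append(n);  // n is a single safe digit
                BigInteger nr = TEN.multiply(r.subtract(n.multiply(t)));
                n = TEN.multiply(THREE.multiply(q).add(r)).divide(t)
                       .subtract(TEN.multiply(n));
                q = TEN.multiply(q);
                r = nr;
            } else {
                BigInteger nr = TWO.multiply(q).add(r).multiply(l);
                BigInteger nn = q.multiply(SEVEN.multiply(k).add(TWO)).add(r.multiply(l))
                                 .divide(t.multiply(l));
                q = q.multiply(k);
                t = t.multiply(l);
                k = k.add(BigInteger.ONE);
                n = nn;
                r = nr;
                l = l.add(TWO);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(digits(10)); // 3141592653
    }
}
```

The inner loop is dominated by BigInteger.multiply() and divide() on ever-growing operands, which is why the multiplication algorithm matters so much here.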

Message was edited by: oneguyks

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

The big question is who would be willing to maintain the code resulting from such a change.
It would require multiple versions of the same functions, because for small BigIntegers the JNI overhead would still be significant, plus additional native libraries, and so on...

By the way, as far as I know somebody is working on improving BigInteger performance in OpenJDK :)

lg Clemens

mthornton
Offline
Joined: 2003-06-10
Points: 0

Someone is working on it:

http://www.nabble.com/BigInteger-performance-improvements-td15197989.html

oneguyks
Offline
Joined: 2008-04-28
Points: 0

> Someone is working on it:
>
> http://www.nabble.com/BigInteger-performance-improvements-td15197989.html

I emailed him (Alan Eliasen). He is only working on BigInteger. BigDecimal, according to him, is broken and is "beyond saving." He had more to say but I am not sure I can quote him without his permission.

eliasen
Offline
Joined: 2005-06-01
Points: 0

> >Someone is working on it:
>
> >http://www.nabble.com/BigInteger-performance-improvements-td15197989.html
>
> I emailed him (Alan Eliasen). He is only working on
> BigInteger. BigDecimal, according to him, is broken
> and is "beyond saving."

Oneguyks let me know about this thread, and here's my update. For the most detail on where we are, please read the thread in the URL posted above. It contains my current status. In short, I have submitted patches to Sun containing improved (all-Java) algorithms for multiplication in BigInteger, including Karatsuba multiplication, 3-way Toom-Cook multiplication, Karatsuba squaring, and Toom-Cook squaring. These greatly improve performance for large numbers, while leaving small number performance essentially unchanged, or faster in some cases.
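For readers unfamiliar with the technique, here is a toy Karatsuba built on top of the public BigInteger API; the threshold is illustrative and untuned, and a real patch would presumably work on the internal digit arrays rather than re-splitting objects like this:

```java
import java.math.BigInteger;

public class Karatsuba {
    // Below this size the classical multiply wins; value is illustrative, not tuned.
    private static final int THRESHOLD = 1024; // bits

    /** Karatsuba product of two nonnegative BigIntegers. */
    public static BigInteger multiply(BigInteger x, BigInteger y) {
        int n = Math.max(x.bitLength(), y.bitLength());
        if (n <= THRESHOLD) return x.multiply(y);

        int half = n / 2;
        BigInteger xh = x.shiftRight(half);             // x = xh*2^half + xl
        BigInteger xl = x.subtract(xh.shiftLeft(half));
        BigInteger yh = y.shiftRight(half);
        BigInteger yl = y.subtract(yh.shiftLeft(half));

        BigInteger p1 = multiply(xh, yh);
        BigInteger p2 = multiply(xl, yl);
        // (xh+xl)(yh+yl) - p1 - p2 == xh*yl + xl*yh:
        // three recursive multiplies instead of four.
        BigInteger p3 = multiply(xh.add(xl), yh.add(yl)).subtract(p1).subtract(p2);

        return p1.shiftLeft(2 * half).add(p3.shiftLeft(half)).add(p2);
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(7).pow(5000);
        BigInteger b = BigInteger.valueOf(11).pow(4000);
        System.out.println(multiply(a, b).equals(a.multiply(b))); // true
    }
}
```

Toom-Cook 3-way is the same idea with a 3-way split and more bookkeeping, trading yet more additions for fewer multiplications.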

I also have changes for much faster exponentiation, but since my contact at Sun hasn't commented on the multiplication changes yet, and requested smaller, self-contained patches, I haven't made an official patch for pow(), which could still use some tuning for mid-sized numbers.

However, you can test *all* of the fixes I've made. My implementation with all of my changes is available at http://futureboy.us/temp/BigInteger.java

Please, if you're interested in making BigInteger faster, please look at my changes! Put them into your JDK 1.7 package and run them! I've run *enormous* regression tests of my own on them (they take 30 hours to run), but it would benefit from others testing and helping Sun validate that the changes are correct. Somebody needs to update their regression tests. (I admit that's not my favorite thing to do, so I write my own huge regression tests, and work on the algorithms, and haven't dug into Sun's regression testing framework.)

I also have recursive divide-and-conquer algorithms for *much* faster toString() performance that I use in my own code (my programming language Frink, http://futureboy.us/frinkdocs/ ), but I haven't added them to the patch because they have potential memory usage and threading tradeoffs that can be handled in a lot of ways and could potentially be controversial. I wanted to contribute simple, clearly-superior algorithms for multiply() and pow() first so they'd be sure to get in, and because fixing these makes almost everything else faster.
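The divide-and-conquer toString() idea can be sketched with the public API: splitting on a power of ten turns one huge conversion into two half-sized ones. All names and the threshold below are illustrative, and a serious version would cache the powers of ten (one of the memory tradeoffs mentioned above):

```java
import java.math.BigInteger;

public class FastToString {
    private static final int THRESHOLD = 64; // digits per leaf; illustrative, not tuned

    /** Decimal string of a nonnegative value via recursive splitting. */
    public static String toDecimal(BigInteger x) {
        // Upper bound on the digit count (0.30103 >= log10(2)).
        int digits = (int) (x.bitLength() * 0.30103) + 1;
        String s = recurse(x, digits);
        int i = 0;
        while (i < s.length() - 1 && s.charAt(i) == '0') i++; // strip pad zeros
        return s.substring(i);
    }

    private static String recurse(BigInteger x, int digits) {
        if (digits <= THRESHOLD) return pad(x.toString(), digits);
        int half = digits / 2;
        // Split on a power of ten: one big division replaces many small ones.
        BigInteger[] qr = x.divideAndRemainder(BigInteger.TEN.pow(half));
        return recurse(qr[0], digits - half) + recurse(qr[1], half);
    }

    private static String pad(String s, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < len; i++) sb.append('0');
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        BigInteger x = BigInteger.valueOf(7).pow(3000);
        System.out.println(toDecimal(x).equals(x.toString())); // true
    }
}
```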

To clarify, I'm *not* planning on rewriting BigDecimal. It would require too much time and research (unless someone's willing to pay for it.) The algorithms and code structure of BigDecimal simply need to be re-done completely from scratch. There are a couple of options, including the BigDecimal drop-in replacement from IBM:

http://www2.hursley.ibm.com/decimalj/

But I don't recommend that because it's very slow for large numbers.

I think it might be best if Sun licensed the faster ArciMath BigDecimal implementation, available here:
http://users.belgacombusiness.net/arci/math/

For reasons why Sun's implementation is so bad, the ArciMath site notes things like: "Remember that e.g. a 2-digit number like 1.2E1454 is really a 2-digit number in be.arci.math and com.ibm.math, but internally is a 1456-digit number in java.math, because that package can handle no exponents; java.math.BigDecimal will transform all 1454 trailing zeroes into a long string of on-and off bits that have to be operated on."

Maybe if I could find a large team of people who wanted to work on BigDecimal, and who actually *did* any work on it. I've gotten messages from other people who want BigInteger and BigDecimal to run faster, but, to be quite frank, nobody seems interested enough or understands the trickier algorithms enough to *do* any work on it. Since I posted my patches to the list, I haven't even had anyone who has told me that they've tried my changes. I've seen other projects on SourceForge that *say* they're going to work on more algorithms, like arbitrary-precision roots, but nothing ever comes of it.

In my opinion, if I were going to work on a BigDecimal implementation, I would want to work on a *real* BigDecimal specification, though: one that extends functions like log(), sin(), exp(), etc., to arbitrary precision (with controllable rounding directions.) Just fixing the existing API isn't enough. The class should be extended enough to actually be *useful* to people working with large numbers.

But I want to get the changes into BigInteger first... hopefully Sun will look at them sometime, so we can proceed, and I'm not left hanging, trying to guess if my patches are going to be rejected for some unknown reason. Feedback would be nice.

It would also help me for people to write benchmarks for various operations, notably multiply(), pow(), and toString(), for various combinations of number sizes and bit patterns. I can help direct people to the places that I think can stand the most tuning if they're interested. I've already done a lot of the algorithm work, but tuning always helps, too.

In summary, I'm working at making BigInteger faster, and if anyone is interested in helping, you can. The most important things are:

* Creating smaller regression tests that Sun will accept, (mine produce 232 gigabytes of output, and you have to do this twice to compare!) and that are sure to test the larger numbers that use the Karatsuba and Toom-Cook algorithms.

* Creating performance tests so we can demonstrate clearly where we're faster and identify number sizes and bit patterns that might benefit from different algorithms and tuning of existing algorithms.

Please contact me if you're interested in helping! And please make sure Sun knows that this is important to you, and that the ball is in their court. I've done a lot of hard work, and would really like to see it get into JDK 1.7, or even earlier. My code's available for download. (See above.)

Alan Eliasen
eliasen@mindspring.com

mthornton
Offline
Joined: 2003-06-10
Points: 0

> For reasons why Sun's implementation is so bad,
> the ArciMath site notes things like: "Remember that
> e.g. a 2-digit number like 1.2E1454 is really a
> 2-digit number in be.arci.math and com.ibm.math, but
> internally is a 1456-digit number in java.math,
> because that package can handle no exponents;
> java.math.BigDecimal will transform all 1454
> trailing zeroes into a long string of on-and off
> bits that have to be operated on."

As of Java 5 this isn't true. There is a blog somewhere about the compatibility issues that arose from introducing positive exponents (negative scales). See the method BigDecimal.stripTrailingZeros
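A quick check on a Java 5+ BigDecimal confirms this; the exponent is kept in the scale rather than materialized as digits:

```java
import java.math.BigDecimal;

public class Scales {
    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("1.2E1454");
        // Stored as unscaled value 12 with scale -1453, not as 1456 digits.
        System.out.println(d.unscaledValue());  // 12
        System.out.println(d.scale());          // -1453
        // stripTrailingZeros normalizes representations like 1.200 to 1.2.
        System.out.println(new BigDecimal("1.200").stripTrailingZeros()); // 1.2
    }
}
```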

eliasen
Offline
Joined: 2005-06-01
Points: 0

> As of Java 5 this isn't true. There is a blog
> somewhere about the compatibility issues that arose
> from introducing positive exponents (negative
> scales). See the method BigDecimal.stripTrailingZeros

Thanks for the correction! I hadn't looked at later implementations of BigDecimal. I *think* I can break those old habits of not "tainting" myself by looking at Sun code when I want to contribute to gcj or Kaffe, but I'm not a lawyer, so I just avoid it.

In any case, since BigDecimal relies heavily on BigInteger for much of its calculations, the things I'm doing for multiply(), pow(), toString(), etc., should improve the performance of BigDecimal for numbers with a large number of digits. I'd be interested if anyone integrates my BigInteger fixes (linked above) into their builds and benchmarks BigDecimal with my changes.

mthornton
Offline
Joined: 2003-06-10
Points: 0

> Thanks for the correction! I hadn't looked at later
> implementations of BigDecimal. I *think* I can break
> those old habits of not "tainting" myself by looking
> at Sun code when I want to contribute to gcj or
> Kaffe, but I'm not a lawyer, so I just avoid it.
You are surely allowed to look at the JavaDoc, and the details are clear from the description of the stripTrailingZeros method.

mthornton
Offline
Joined: 2003-06-10
Points: 0

One problem is that for moderate size numbers, JNI overhead would probably lose any performance advantage libgmp offered. Originally Java did use a native code implementation for BigInteger, but this was replaced by the current *faster* pure Java implementation.

So to make use of libgmp viable, we need a faster alternative to JNI, or alternatively pure Java implementations of algorithms like those in libgmp that perform better on large numbers without losing out on the smaller end.

atripp
Offline
Joined: 2003-07-15
Points: 0

> Alternatively pure Java
> implementations of algorithms like those in libgmp
> that perform better on large numbers without losing
> out on the smaller end.

Yes....hmm...if only someone would make a really nice tool that translates C/C++ to Java. Nah..that'll never happen! Sorry, gotta get back to work on Jazillian now. :)

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

Andy ... why do you want Sun to pay for the things you would like to have for free?

atripp
Offline
Joined: 2003-07-15
Points: 0

> Andy ... why do you want Sun to pay for the things
> you would like to have for free?

Ummm...so that they're more likely to get done.
I guess I don't understand the question.

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

Well, the question was too ironic I guess ;)
Why do you try to influence the contest? If native libgmp math stuff is really that important, somebody will write a proposal for it; if nobody does, it doesn't seem to be that important.

I doubt that an additional native library just for maths is really a good thing; however, there is currently some work going on in the Java implementations of several BigInteger/BigDecimal algorithms ... so stay tuned :)

lg Clemens

atripp
Offline
Joined: 2003-07-15
Points: 0

> Well, the question was too ironic I guess ;)
> Why do you try to influence the contest? If native
> libgmp math stuff is really that important, somebody
> will write a proposal for it; if nobody does, it
> doesn't seem to be that important.

Typical open-source-fanboy fuzzy thinking.

It's certainly possible to have a business need for a feature for which there are no specific developers who see a need (or are interested in working on it).

>
> I doubt that an additional native library just for
> maths is really a good thing,

If it produces faster code and makes Java usable for a larger set of applications, I don't see how that's not a "good thing". Impossible, maybe, but it's hard to argue that the goal isn't a good one.

> however there is
> currently some work going on in the java
> implementations of several BigInteger/BigDecimal
> algorithms ... so stay tuned :)

Right, and I'm also hoping to "spread the word" about the Challenge, so that perhaps someone working on this stuff might get paid for their work.

>
> lg Clemens

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

> Typical open-source-fanboy fuzzy thinking.

That has nothing to do with open source at all; Sun would decide (and does in this case) exactly the same way:
- If Sun could make a big deal out of that feature, they would implement it.
- If there were consensus that this feature would benefit many use cases, it would also be implemented, because it would help spread the Java platform.

After all, this seems not to be the case - otherwise they would have done it.

If you need such improvements and you can't do them yourself, why don't you pay somebody to do them for you? That is what I meant by "why do you want Sun to pay ...".
It seems you don't understand open source - open source is good for features many people would like to see, not for fulfilling the wishes of some individuals.

After all Andy, why do you always come up with the "open source zealot" stuff if you're running out of arguments?

> If it produces faster code and makes Java usable for
> a larger set of applications, I don't see how that's
> not a "good thing". Impossible, maybe, but it's hard
> to argue that the goal isn't a good one.
- Dependency on external code. Believe it or not, every additional native library is a major burden for maintenance and porting to new platforms.
- Slower under some circumstances, due to JNI overhead and JNI problems (you can't rely on GetCritical calls in that case, so you need to copy the whole thing).
- Larger download size, since native code can't be compressed as well.

> Right, and I'm also hoping to "spread the word" about
> the Challenge, so that perhaps someone working on
> this stuff might get paid for their work.
Nobody handed in a proposal, and the proposal stage is over.

lg Clemens

atripp
Offline
Joined: 2003-07-15
Points: 0

That should say:

...even on reasonable benchmarks like this one: Telco Benchmark

If you act quickly, you can apply to the OpenJDK Challenge