
Software renderer becomes slow with Update10

28 replies
egonolsen
Joined: 2003-06-10

Hi.

I've tried the new early access version with my 3D engine, jPCT.
The results were quite disappointing: the software renderer loses much of its speed... it's only half as fast in some situations. This isn't acceptable IMHO... any ideas?

Edit: The linked distribution has an examples directory with two examples in it. Both start in software mode, in case somebody wants to try them.

trembovetski
Joined: 2003-12-31

> No, of course you can't. But did those applets really have speed issues?

They might not have.

The hope here is that now that there's a hw-accelerated pipeline,
there will be applets that use it to its full potential.

I do test "out in the wild" applets regularly, and I have yet to see
one with performance issues that make it unplayable.
Most perform the same but with lower CPU utilization
because of hw acceleration.

Regarding Swing performance: depending on the L&F and the desktop text antialiasing
setting, the improvement is 10%-60%. With the Nimbus L&F it's
more like 100%.

Certain rendering operations, however, improved
hundreds of times. You get hw-accelerated clipping, antialiasing,
transforms, alpha compositing, gradient fills and
BufferedImageOp filtering.
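For readers who want to see what those operation types look like in code, here is a minimal sketch that exercises each of them through the public Graphics2D API. The class name is hypothetical. It renders into a BufferedImage so it runs anywhere; whether a given call is actually hw accelerated depends on the destination (for example a VolatileImage or an on-screen Graphics) and on the active pipeline, so treat it as an illustration of the API, not as a statement of what the D3D pipeline accelerates.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.util.Arrays;

public class AcceleratedOpsDemo {

    /** Renders a small scene using each of the operation types listed above. */
    public static BufferedImage render(int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            // Gradient fill: blue on the left edge to red on the right edge.
            g.setPaint(new GradientPaint(0, 0, Color.BLUE, w, 0, Color.RED));
            g.fillRect(0, 0, w, h);
            // Antialiasing, alpha compositing, clipping and a transform.
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    RenderingHints.VALUE_ANTIALIAS_ON);
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.5f));
            g.setClip(0, 0, w / 2, h);                     // left half only
            g.rotate(Math.toRadians(15), w / 2.0, h / 2.0);
            g.setColor(Color.WHITE);
            g.fill(new Ellipse2D.Double(w / 4.0, h / 4.0, w / 2.0, h / 2.0));
        } finally {
            g.dispose();
        }
        // A BufferedImageOp: 3x3 box blur; edge pixels are copied unchanged.
        float[] box = new float[9];
        Arrays.fill(box, 1f / 9f);
        ConvolveOp blur = new ConvolveOp(new Kernel(3, 3, box), ConvolveOp.EDGE_NO_OP, null);
        return blur.filter(img, null);
    }

    public static void main(String[] args) {
        BufferedImage out = render(64, 64);
        System.out.println(out.getWidth() + "x" + out.getHeight());
    }
}
```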

Anyway, I thought I made it clear that we'll investigate
this scenario and see what could be done.

Dmitri

egonolsen

For the record: I've run this test on a Core 2 @ 2.4GHz with an ATI X1900XT on PCIe. Results are better on that machine. I'm getting 5600fps with the old-fashioned pipeline and 2500fps with the new one (and 2700 when using a BufferStrategy). I could live with that... if only all systems would perform like that... :-(

I would give Vista a try too, but I killed my main machine with a BIOS update recently... :-)


egonolsen

And about the performance: we are talking about 25 compared to 40 fps and 30 compared to 70 fps here (in the two cases that I have tested). That's a large drop. That can make the difference between playable and choppy.
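Converting those rates to per-frame times (plain arithmetic on the numbers above, using t = 1000 ms / fps) shows how much time each frame loses:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{\text{fps}}:\qquad
\frac{1000}{40} = 25\ \text{ms} \;\to\; \frac{1000}{25} = 40\ \text{ms},\qquad
\frac{1000}{70} \approx 14.3\ \text{ms} \;\to\; \frac{1000}{30} \approx 33.3\ \text{ms}
```

That is roughly 15 ms and 19 ms of extra time per frame in the two cases.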

trembovetski

If I increase the size of your test to 1280x1024, I get:
6uN default 121
6uN nod3d 148
6u4 default 118

As you can see, as the size of the window increases,
the difference is pretty much nil. You need to compare
6u4 default to 6uN default, since that's what most people
will see.

Dmitri

egonolsen
Offline
Joined: 2003-06-10
Points: 0

> As you can see, as the size of the window increases,
> the difference is pretty much nil.

Again, that highly depends on the hardware. On the above-mentioned Athlon (to which I don't have access ATM), that didn't happen. A larger window size (I think it was 800x600) gave 200fps (old) compared to 40 (new). u4 and uN were almost the same on that machine, with a slight advantage for uN (with the old pipe).

However, let's see what can be done. I'll do some more tests once I get my new board to replace the accidentally flashed one... :-)

trembovetski

> to replace the accidentally flashed one..:-)

Heheh. Been there.

Dmitri

egonolsen

Here's a little test case. With the old pipeline, it delivers around 3500 fps. With the new pipeline, it maxes out at 170...:-(
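[code]
import java.awt.*;
import java.awt.image.*;
import javax.swing.*;

public class PerfTest {

    public static void main(String[] args) {
        // Uncomment to fall back to the old (non-D3D) pipeline; this should run
        // before AWT/Java2D initializes.
        //System.setProperty("sun.java2d.d3d", "false");

        // 320x240 software frame buffer; 'pixels' is its live backing array.
        BufferedImage output = new BufferedImage(320, 240, BufferedImage.TYPE_INT_RGB);
        int[] pixels = ((DataBufferInt) output.getRaster().getDataBuffer()).getData();

        JFrame frame = new JFrame("Slow me down!");
        frame.setSize(320, 240);
        frame.setVisible(true);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        int cnt = 0;
        int fps = 0;
        long time = System.currentTimeMillis();
        while (true) {
            // Touch every pixel so the image changes on every frame.
            for (int i = 0; i < pixels.length; i++) {
                pixels[i] = cnt;
            }
            cnt++;
            fps++;
            // Blit the image straight to the frame's on-screen Graphics.
            Graphics g = frame.getGraphics();
            g.drawImage(output, 0, 0, null);
            g.dispose();
            Toolkit.getDefaultToolkit().sync();
            // Print and reset the frame counter once per second.
            if (System.currentTimeMillis() - time >= 1000) {
                time = System.currentTimeMillis();
                System.out.println(fps);
                fps = 0;
            }
        }
    }
}
[/code]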

trembovetski

Yeah, not good. I'll file a bug on this but I'm not sure how much we
could improve our loop for this case.

You can improve your performance somewhat by using BufferStrategy
(results from the modified test below):

#>java -Dsun.java2d.d3d=false PerfTest
using AWT
using BI->Screen
3321
3497
3538
#>java PerfTest
using AWT
using BI->Screen
812
908
997
936
#>java -Dusebs=true PerfTest
using AWT
using BS
1204
1334
1382
1381
#>java -Dusevi=true PerfTest
using AWT
using VI
765
866
892
903

The test:
[code]
import java.awt.Frame;
import java.awt.Graphics;
import java.awt.Toolkit;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.awt.image.BufferStrategy;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.awt.image.VolatileImage;
import javax.swing.JFrame;

public class PerfTest {

    // Select the blit path with -Dusevi=true, -Dusebs=true and/or -Dusejf=true.
    static boolean useVI = System.getProperty("usevi") != null;
    static boolean useBS = System.getProperty("usebs") != null;
    static boolean useJF = System.getProperty("usejf") != null;

    public static void main(String[] args) {

        BufferedImage output = new BufferedImage(320, 240, BufferedImage.TYPE_INT_RGB);
        int[] pixels = ((DataBufferInt) output.getRaster().getDataBuffer()).getData();
        Frame frame;
        if (useJF) {
            System.err.println("using Swing");
            frame = new JFrame("Swing JFrame: Slow me down!");
            ((JFrame) frame).setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        } else {
            System.err.println("using AWT");
            frame = new Frame("AWT Frame: Slow me down!");
            // A plain AWT Frame does not close itself; exit explicitly.
            frame.addWindowListener(new WindowAdapter() {
                @Override
                public void windowClosing(WindowEvent e) {
                    System.exit(0);
                }
            });
        }
        frame.pack();
        frame.setSize(320, 240);
        frame.setVisible(true);

        int cnt = 0;
        int fps = 0;
        long time = System.currentTimeMillis();
        VolatileImage vi = null;
        BufferStrategy bs = null;
        if (useVI) {
            System.err.println("using VI");
            vi = frame.createVolatileImage(320, 240);
            vi.validate(frame.getGraphicsConfiguration());
        } else if (useBS) {
            System.err.println("using BS");
            frame.createBufferStrategy(2);
            bs = frame.getBufferStrategy();
        } else {
            System.err.println("using BI->Screen");
        }
        while (true) {
            for (int i = 0; i < pixels.length; i++) {
                pixels[i] = cnt;
            }
            cnt++;
            fps++;
            if (useVI) {
                // BI -> VolatileImage -> screen (VI contents loss is not handled here).
                Graphics vg = vi.getGraphics();
                vg.drawImage(output, 0, 0, null);
                vg.dispose();
                Graphics g = frame.getGraphics();
                g.drawImage(vi, 0, 0, null);
                g.dispose();
            } else if (useBS) {
                // BI -> back buffer, then present the back buffer.
                Graphics g = bs.getDrawGraphics();
                g.drawImage(output, 0, 0, null);
                g.dispose();
                bs.show();
            } else {
                // BI -> screen directly.
                Graphics g = frame.getGraphics();
                g.drawImage(output, 0, 0, null);
                g.dispose();
                Toolkit.getDefaultToolkit().sync();
            }

            if (System.currentTimeMillis() - time >= 1000) {
                time = System.currentTimeMillis();
                System.out.println(fps);
                fps = 0;
            }
        }
    }
}

[/code]

trembovetski

Note that the numbers with the BufferStrategy aren't that different from the
default case (meaning, no BS) in 6u4:
#>java -version
java version "1.6.0_10-ea"
Java(TM) SE Runtime Environment (build 1.6.0_10-ea-b08)
Java HotSpot(TM) Client VM (build 1.6.0_10-ea-b08, mixed mode, sharing)
#>java PerfTest
using AWT
using BI->Screen
1367
1622
1641
1636
#> java -Dusebs=true PerfTest
using AWT
using BS
1786
2256
2259
2260

But comparable cases are still slower in 6u10.

So I guess a workaround would be to use BS for your windowed
rendering (it's recommended anyway).
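The recommended windowed-rendering pattern can be sketched as below. It follows the render/show loop documented in the BufferStrategy javadoc (contentsRestored()/contentsLost()); the class name, the flat-color test pattern and the offset by the frame insets are illustrative choices, not code from this thread. Whether it helps on a given machine is hardware dependent, as the follow-ups in this thread show.

```java
import java.awt.Frame;
import java.awt.Graphics;
import java.awt.Insets;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.awt.image.BufferStrategy;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.util.Arrays;

public class BsBlit {

    /** Writes one frame of a trivial test pattern (a flat color) into the pixel array. */
    static void fillFrame(int[] pixels, int frame) {
        Arrays.fill(pixels, frame & 0xFFFFFF);
    }

    /** Copies the software-rendered image to the back buffer and presents it. */
    static void present(BufferStrategy bs, BufferedImage img, int x, int y) {
        do {
            do {
                Graphics g = bs.getDrawGraphics();
                try {
                    g.drawImage(img, x, y, null);
                } finally {
                    g.dispose();
                }
            } while (bs.contentsRestored()); // redraw if the buffer was restored meanwhile
            bs.show();
        } while (bs.contentsLost());         // repeat the frame if the buffer was lost
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(320, 240, BufferedImage.TYPE_INT_RGB);
        int[] pixels = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();

        Frame frame = new Frame("BufferStrategy blit");
        frame.setIgnoreRepaint(true); // all painting is done by the loop below
        frame.addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosing(WindowEvent e) {
                System.exit(0);
            }
        });
        frame.setSize(320, 240);
        frame.setVisible(true);
        frame.createBufferStrategy(2);
        BufferStrategy bs = frame.getBufferStrategy();

        // The frame's back buffer includes the decorations, so skip the insets.
        Insets in = frame.getInsets();
        for (int n = 0; ; n++) {
            fillFrame(pixels, n);
            present(bs, img, in.left, in.top);
        }
    }
}
```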

Dmitri

trembovetski

I've filed:
6652116: D3D: SW->Acceleretaed surfce blits are slower with the new pipeline

It should appear on bugs.sun.com tomorrow.

Dmitri

egonolsen

Thank you for filing the bug report and for the extended test case. Unfortunately, using a BS doesn't make a single fps difference on this machine, neither with the test case nor with the jPCT examples. I really hope that this situation can be improved, because if not, you've actually killed per-pixel stuff to a certain degree with this pipeline.
What I don't get: why is it fine when using the OpenGL pipeline? The actual call looks quite similar, judging from the Java2D logging.

trembovetski

> What I don't get: why is it fine when using the OpenGL pipeline? The actual call looks quite similar, judging from the Java2D logging.

This is because OpenGL (not the pipeline, the OpenGL API)
has a (potentially) hw-accelerated path for getting pixels into an
OpenGL surface. You just say "yo, here's my pixels, put them
into that OpenGL surface over there" - and it happens.

Direct3D doesn't (at least, without additional libraries like D3DX), so
we have to do it ourselves (basically, through a memcpy). And since
this was not considered the most used case (because in most cases
BI could be cached in an accelerated surface so you'd only pay the
penalty once instead of per frame like in your case) we didn't spend
too much time optimizing this path in the D3D pipeline.
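To make the "do it ourselves" part concrete: in Direct3D 9, locking a surface (LockRect) hands back a pointer plus a row pitch, and that pitch can be wider than the image, so an upload from system memory is generally done scanline by scanline. The sketch below is a hypothetical Java illustration of that copy pattern only; the pipeline does this in native code, and its actual implementation is not shown in this thread. When the source image changes every frame, as in the per-pixel test case, this copy is paid every frame.

```java
import java.util.Arrays;

public final class PitchedCopy {

    /**
     * Copies a width x height block of packed pixels into a destination whose
     * rows start dstPitch ints apart (dstPitch >= width), one row at a time.
     */
    public static void copy(int[] src, int width, int height, int[] dst, int dstPitch) {
        if (dstPitch < width) {
            throw new IllegalArgumentException("pitch < width");
        }
        if (src.length < width * height || dst.length < dstPitch * (height - 1) + width) {
            throw new IllegalArgumentException("buffer too small");
        }
        for (int y = 0; y < height; y++) {
            // One memcpy-equivalent per scanline; padding at the end of each dst row is skipped.
            System.arraycopy(src, y * width, dst, y * dstPitch, width);
        }
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};    // 2x2 image
        int[] dst = new int[2 * 3];  // 2 rows with a pitch of 3 ints
        copy(src, 2, 2, dst, 3);
        System.out.println(Arrays.toString(dst)); // [1, 2, 0, 3, 4, 0]
    }
}
```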

Another wrinkle is on-screen rendering. In Direct3D9 you cannot
render directly to the screen, so we had to introduce this mechanism
where on-screen rendering (this is when you call Component.getGraphics())
is redirected to a d3d back-buffer surface which is then flipped
to present its contents (so the on-screen rendering is essentially
double-buffered).

Thanks,
Dmitri

egonolsen

> This is because OpenGL (not the pipeline, the OpenGL API) has (potentially) hw accelerated path for getting pixels into an OpenGL surface. You just say "yo, here's my pixels, put them into that OpenGL surface over there" - and it happens.
>
> Direct3D doesn't (at least, without additional libraries like D3DX), so we have to do it ourselves (basically, through a memcpy).

Now that you say it, I remember reading something about it. Not even DX10 has this option?

> And since this was not considered the most used case (because in most cases BI could be cached in an accelerated surface so you'd only pay the penalty once instead of per frame like in your case) we didn't spend too much time optimizing this path in the D3D pipeline.

I would say that is a very common use for applets. Most applets I know are not drawing GUIs with Swing or blitting static BIs... they just do pixel effects to add visual effects or 3D content to a website. People are using jPCT's software renderer more than the hardware renderer, because it causes much less trouble, especially in applets, and is usually fast enough (or should I say "was"?). Improving the applet plugin on one hand and cutting the performance of these applets in half on the other isn't a good idea, and will damage applets more than it serves them IMHO.
Is this pipeline actually going to be used in applets too?

trembovetski

> Now that you say it, I remember reading something about it. Not even DX10 has this option?

Like I said, even DirectX 9 does have similar functionality, but in a separate (~1.5M) library.

>Is this pipeline actually going to be used in applets too?

Yes, it's enabled by default.

Regarding performance difference: I'd say that for most applets which
only use software rendering this may not be such an issue since
the rendering area is typically rather small, and not all of them
run flat out (most games I see are puzzlers, and such, with not
that much of action).

We could probably improve perceived performance a bit for
BI -> Screen blit case (by adding an explicit flush after the blit, like
we do in VI->Screen case). It'll help visually.

Dmitri

egonolsen

First, I want to say thank you for your patience and time with this topic. I really appreciate it.

> Like I said, even DirectX 9 does have similar functionality, but in a separate (~1.5M) library.

Too bad...

> Regarding performance difference: I'd say that for most applets which only use software rendering this may not be such an issue since the rendering area is typically rather small, and not all of them run flat out (most games I see are puzzlers, and such, with not that much of action).

Then turn it off by default, at least for applets. Why would you want to speed up simple casual games that don't need it and slow down other applications that do? There are not many GUI-intensive applets that need this additional speed, nor are there many demanding games (like you said). But there are some 3D- or 2D-effect-based applications around, and this pipeline causes major trouble with them. The performance is simply bad. Microsoft's jview or even the old VM in Netscape 4 (1.16 or something) can render per-pixel effects faster than the new pipeline can. That is a very bad idea IMO and it will only damage applets even more (if this is still possible).


trembovetski

> That is a very bad idea IMO and it will only damage applets even more (if this is still possible).

What about applets that do use hw-acceleratable graphics? Thousands of
corporate Swing applets? All those photo viewers, full-screen games
(imo, full-screen games which use direct pixel manipulation are the
minority, most just use sprites), etc?

You just can't please everyone, unfortunately.

Dmitri

egonolsen

> What about applets that do use hw-acceleratable graphics? Thousands of corporate Swing applets? All those photo viewers, full-screen games (imo, full-screen games which use direct pixel manipulation are the minority, most just use sprites), etc?
>
> You just can't please everyone, unfortunately.

No, of course you can't. But did those applets really have speed issues? How much faster is Swing actually with the new pipeline? 10%? 100%? Do you have some numbers?
I guess I'll have to implement workarounds by disabling the pipeline when it goes official, and spread the word that one should do this to get fast per-pixel effects. I would love to see this situation improved by a magical fix to the pipeline, but I doubt that this will happen, given the missing capabilities in DX.
I can only repeat myself one last time, and I'm speaking as the per-pixel guy that I am: it's a bad idea to make this pipeline the default one. It hurts applets that need the speed and speeds up applets that don't need it (Swing is already fast enough). It hurts Java as a platform. Of course, you can't please everyone, but I can't remember a change that brings the performance of one operation down to 5% in extreme cases (3500fps compared to 170) to give some speedup to others.
However, thanks again for your time, and I hope that this will be fixed in some way or another.
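For scale, the extreme case quoted above works out as follows (arithmetic on the posted numbers only):

```latex
\frac{170\ \text{fps}}{3500\ \text{fps}} \approx 0.049 \approx 5\%,\qquad
\frac{1000\ \text{ms}}{3500} \approx 0.29\ \text{ms} \;\to\; \frac{1000\ \text{ms}}{170} \approx 5.9\ \text{ms per frame}
```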

kbr
Joined: 2003-06-16

Note that with the next-generation Java Plug-In you can work around this issue by adding the following parameter to your applet tag:

egonolsen

I know. That just doesn't cut it for older websites, where nobody is aware of this change and nobody will modify the tag.
Will the new plugin work in Firefox 2? Because the current one obviously doesn't.


kbr

> Will the new plugin work in Firefox 2? Because the current one obviously doesn't.

No. Many changes were needed on the Firefox side to decouple the Java support in the browser from the legacy OJI interface and to allow us to use the simpler and more modern NPRuntime interface for Java/JavaScript integration. It is infeasible to backport all of these changes to the Firefox 2 tree, so the new plug-in works in Firefox 3 only.

egonolsen

Some more info about what I'm doing:

The BufferedImage is of type BufferedImage.TYPE_INT_RGB (changing it to BufferedImage.TYPE_INT_ARGB or to a MemoryImageSource doesn't change anything) and is simply blitted directly into a Frame/JFrame using drawImage(...) on the component's Graphics context.
Using the OpenGL pipeline instead of the D3D one is fast too.
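A minimal sketch of the setup described here: a TYPE_INT_RGB BufferedImage whose backing int[] is written directly. The class and helper names are hypothetical; the sketch only checks that writes to the array land in the image, which is independent of which pipeline later blits it to the screen.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

public class DirectPixels {

    /** Returns the live backing array of a TYPE_INT_RGB or TYPE_INT_ARGB image. */
    static int[] pixelsOf(BufferedImage img) {
        return ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
    }

    public static void main(String[] args) {
        BufferedImage rgb = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        int[] px = pixelsOf(rgb);
        px[0] = 0x123456;      // row 0, column 0
        px[4 + 1] = 0xABCDEF;  // row 1, column 1 (row stride equals the width here)
        // getRGB reports TYPE_INT_RGB pixels as opaque ARGB values.
        System.out.printf("%08X %08X%n", rgb.getRGB(0, 0), rgb.getRGB(1, 1));
    }
}
```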

trembovetski

> simply blitted directly into a Frame/JFrame using drawImage(...) on the component's Graphics context.

Hmm. The code seems to indicate that you copy to a buffer strategy's backbuffer,
not to the screen (which is the preferred way).

If you do copy directly to the screen, could you try adding
Toolkit.getDefaultToolkit().sync() after your drawImage(bi, ...)?

Dmitri

egonolsen

There is no BufferStrategy involved in this test. It's used for fullscreen only, but the behaviour stays the same no matter whether I blit onto the Frame or use a BufferStrategy in fullscreen mode... it stays slow.
What I have in the simplest possible test case is a BufferedImage and a Frame/JFrame. When using the software renderer, all jPCT's display() method does is this call:

g.drawImage(bi, 0, 0, null);

Adding a Toolkit.getDefaultToolkit().sync(); after that call doesn't do any good.


egonolsen

> Does this mean that it uses software rendering?
> -> support for BufferedImage
> Version helper for 1.2+ initialized!
> -> using BufferedImage
> Software renderer (OpenGL mode) initialized
> Software renderer disposed
> Software renderer (OpenGL mode) initialized

Yes, that means software rendering.

trembovetski

Does this mean that it uses software rendering?
-> support for BufferedImage
Version helper for 1.2+ initialized!
-> using BufferedImage
Software renderer (OpenGL mode) initialized
Software renderer disposed
Software renderer (OpenGL mode) initialized

I get the same fps on 6u10 b10 as I do on 6u2. What kind of video board do you have?
Is it AGP or PCI Express?

But I agree, copying BufferedImage that is constantly modified
(especially non-opaque) to a buffer strategy
or VolatileImage may be slower with the new pipeline.

You can explicitly disable the d3d pipeline if you find it unacceptable
(-Dsun.java2d.d3d=false).
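If you want to disable the pipeline from inside an application rather than on the command line (the first test case above has a commented-out line doing exactly this), a minimal sketch is to set the property at the very top of main. The helper name is hypothetical, and it leaves a value given on the command line alone. This assumes the property is read when Java2D initializes, so setting it after any AWT/Java2D class has been used is unlikely to have an effect; an unsigned applet is generally not allowed to set system properties at all.

```java
public class D3dSwitch {

    static final String D3D_PROPERTY = "sun.java2d.d3d";

    /**
     * Turns the D3D pipeline off unless a value was already given,
     * e.g. with -Dsun.java2d.d3d=true on the command line.
     */
    static void disableD3dByDefault() {
        if (System.getProperty(D3D_PROPERTY) == null) {
            System.setProperty(D3D_PROPERTY, "false");
        }
    }

    public static void main(String[] args) {
        // Must be the first thing main does, before any AWT/Java2D class is used.
        disableD3dByDefault();
        System.out.println(D3D_PROPERTY + "=" + System.getProperty(D3D_PROPERTY));
        // ... create frames and images and start rendering only after this point.
    }
}
```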

Dmitri
Java2D Team

trembovetski

Could you please provide output with the J2D_TRACE_LEVEL=4 environment variable set.

Dmitri

egonolsen

Disabling the new pipeline brings performance up to a normal level again (even a bit faster than before). However, that's not a very elegant solution IMO, because many applets rely on direct manipulation of BufferedImages for special effects. Would that flag work in an applet tag too with the new plugin?

Anyway, here's the log you requested:

[I] OS Version = OS_WINXP Home
[I] CheckAdaptersInfo
[I] ------------------
[I] Adapter Ordinal : 0
[I] Adapter Handle : 0x10001
[I] Description : NVIDIA GeForce 7600 GT
[I] GDI Name, Driver : \\.\DISPLAY1, nv4_disp.dll
[I] Vendor Id : 0x10de
[I] Device Id : 0x02e0
[I] SubSys Id : 0x20a3107d
[I] Driver Version : 6.14.11.6909
[I] GUID : {D7B71E3E-41A0-11CF-3069-A80003C2CB35}
[I] D3DPPLM::CheckDeviceCaps: adapter 0: Passed
[I] ------------------
[I] D3DGD_getDeviceCapsNative
[I] D3DContext::InitContext device 0
[I] D3DContext::ConfigureContext device 0
[V] dwBehaviorFlags=D3DCREATE_FPU_PRESERVE|D3DCREATE_HARDWARE_VERTEXPROCESSING
[I] D3DContext::ConfigureContext: successfully created device: 0
[I] D3DContext::InitDevice: device 0
[I] D3DContext::InitDefice: successfully initialized device 0
[V] | CAPS_DEVICE_OK
[V] | CAPS_RT_PLAIN_ALPHA
[V] | CAPS_RT_TEXTURE_ALPHA
[V] | CAPS_RT_TEXTURE_OPAQUE
[V] | CAPS_LCD_SHADER | CAPS_BIOP_SHADER | CAPS_PS20
[V] | CAPS_PS30
[V] | CAPS_MULTITEXTURE
[V] | CAPS_TEXNONPOW2
[V] | CAPS_TEXNONSQUARE

As you can see, it's a 7600GT running under Windows XP Home via AGP. The system itself is an Athlon X2 4200.

rogyeu
Joined: 2006-07-30

Hi egonolsen,

Yes, the tag should work in the new Plug-In with something like:

[i]

[/i]

Regards,
Roger Y.