
Swing performance on Linux

sat1196
Offline
Joined: 2003-11-08
Points: 0

While profiling a video playback app, I found this at the top of the cpu usage list:

rank self accum count trace method
1 48.69% 48.69% 10576 300390 sun.awt.X11.XToolkit.waitForEvents

Obviously, on Linux X11 is the main bottleneck when it comes to animation/video performance. I thought of using MIT-SHM to improve performance; the problem is that I would need to use JNI and an AWT Canvas. Not only is C not my cup of tea (I prefer hot java), but it would also mean that I can't draw any Swing component on top of the canvas. That's a big problem for my application.
Also, the OGL pipeline doesn't improve things much: X11 is not being bypassed, and the OGL blit seems slower than the J2D blit.
I doubt SHM is implemented at the moment; however, I think it would be good to have it in 6.0 and to be able to turn it on with a flag.

trembovetski
Offline
Joined: 2003-12-31
Points: 0

> myBitmap = (BufferedImage)createImage( myWidth, myHeight );

This actually creates a Pixmap-based image (btw, you should not assume
that it returns a BufferedImage, it may not).

So your fillRects are X11 operations, which are very fast, which is why
you see such good performance.

I don't quite understand why you're getting such bad performance with
the BI->VI->BB case. 20s is way way too slow.

You're not displaying through ssh tunneling or something?

Could you try this: run your app with -Dsun.java2d.trace=log, and post
the output.

Could you also run any of the jdk swing demos (like SwingSet2) with
-Dsun.java2d.pmoffscreen=false and see how they perform?
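
For reference, both of those are plain -D system properties, so the launch line might look something like this (MyApp is just a placeholder for the class you normally run):

[code]
java -Dsun.java2d.trace=log -Dsun.java2d.pmoffscreen=false MyApp
[/code]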

Dmitri

bolofsson
Offline
Joined: 2008-02-25
Points: 0

In response to wmeissner:

> One reason might be that primitives like fillRect() and drawLine are server side operations. i.e. since
> you're not de-accelerating the BufferedImage by getting its raster, it becomes managed, so instead of
> drawing to a local bitmap then blitting the bitmap over to the server, they just send drawing commands to
> the server - only a handful of bytes for a fillRect or drawLine vs megabytes of pixels.

You're right, that is what happens (see trace logs below).

> If Graphics2D doesn't have the primitives you need, you could do a hybrid approach.

> Draw what you can with Graphics2D on a managed or volatile image, and do any ops you can't do with
> Graphics2D into a BufferedImage with a background color of 0, 0, 0, 0 (i.e. black with an alpha of zero),
> then using drawImage() to draw just that smaller image to the volatile image with alpha compositing.

I agree that this is a good workaround when the changing part of the graphics is significantly smaller than the full frame (as in my test program). The remaining problem is that it makes the drawing code more complex, and it may also slow it down when the app is running locally.

In response to Dmitri:

> So your fillRects are X11 operations, which are very fast, which is why
> you see such good performance.
> I don't quite understand why you're getting such bad performance with
> the BI->VI->BB case. 20s is way way too slow.

> You're not displaying through ssh tunneling or something?
Yes I am. This is the way I usually connect to different Linux machines. But I guess ssh is not the problem here, or would you use some other way for remote access? Using the remote shell (rsh) gives the same performance, and other options are not available on our systems, for security reasons I believe (e.g. telnet).

> Could you try this: run your app with -Dsun.java2d.trace=log , and post
> the output.

I ran the above test programs with -Dsun.java2d.trace=log; here is the output:

----- Setting individual pixels on BI -> VI -> BB -----
WINDOWS:

D3DFillRect
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
D3DFillRect
D3DFillRect
D3DFillRect
sun.java2d.d3d.D3DSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "D3D Surface")
sun.java2d.d3d.D3DRTTSurfaceToSurfaceBlit::Blit("D3D Surface (render-to-texture)
", AnyAlpha, "D3D Surface")
D3DFillRect
sun.java2d.d3d.D3DSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "D3D Surface")
sun.java2d.d3d.D3DRTTSurfaceToSurfaceBlit::Blit("D3D Surface (render-to-texture)
", AnyAlpha, "D3D Surface")
sun.java2d.d3d.D3DSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "D3D Surface")
sun.java2d.d3d.D3DRTTSurfaceToSurfaceBlit::Blit("D3D Surface (render-to-texture)
", AnyAlpha, "D3D Surface")
...

LINUX:

X11FillRect
X11FillRect
X11FillRect
X11FillRect
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
...

----- Using draw primitives on BI Graphics object -> BB -----
WINDOWS:

D3DFillRect
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
D3DFillRect
D3DFillRect
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
sun.java2d.d3d.D3DSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "D3D Surface")
D3DFillRect
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
sun.java2d.d3d.D3DSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "D3D Surface")
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
...
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
sun.java2d.d3d.D3DSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "D3D Surface")
...

LINUX:

X11FillRect
X11FillRect
X11FillRect
X11FillRect
X11FillRect
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
X11FillRect
X11FillRect
X11FillRect
...
X11FillRect
X11FillRect
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB Pixmap")
...

So it seems to be the operation
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
that is slow when running the app remotely, while X11FillRect is much faster because it's not sending every pixel over the network.

> Could you also run any of the jdk swing demos (like SwingSet2) with
> -Dsun.java2d.pmoffscreen=false and see how they perform?
> Dmitri

I also tried to run SwingSet2. As expected it is hideously slow on the slow network connection. Once everything is loaded (which took ages), the response time for simple controls such as buttons, color selectors etc is expectedly slow but manageable. On the fast network, Swing is performing OK, so that is not the problem.

I also found another forum thread about X11 server performance (http://forums.java.net/jive/thread.jspa?messageID=237183) that you also responded to.
I realise that the whole issue is too complex for me to be able to follow all the arguments, with managed/unmanaged graphics, offscreen pixmaps, server/client issues etc.

It would still be great if there was a way that allows me to use my own drawing routines (to an int[] buffer, or to a setPixel() method..) with a similar performance as the Graphics primitives such as fillRect. Something like a new method called fillRect(int[] pixelData).
Any further ideas? I'm happy to try out anything, even if I may not understand all the background issues.

The performance of my app is great, but only when run locally. On Linux, the typical use is to run the app remotely on a server which is connected through a fast network. And this is unacceptably slow with the current implementation. I am not sure if the transfer rate for the pixel data is the sole culprit here. Is there possibly a way to do all the pixel manipulation on the client side even when the program is running on a server?

Thank you very much for your help so far.
By the way, I'm sorry to clog up this thread with a somewhat different topic. Maybe I should have opened a new one...?

wmeissner
Offline
Joined: 2005-10-09
Points: 0

> > If Graphics2D doesn't have the primitives you need,
> you could do a hybrid approach.
>
> > Draw what you can with Graphics2D on a managed or
> volatile image, and do any ops you can't do with
> > Graphics2D into a BufferedImage with a background
> color of 0, 0, 0, 0 (i.e. black with an alpha of
> zero),

Actually, that might be bad - from that other thread, it seems the X11 backend doesn't do alpha compositing on the server, so it has to pull the image back from the server - which would be mega-slow.

> > then using drawImage() to draw just that smaller
> image to the volatile image with alpha compositing.
>
> I agree that this is a good workaround when the
> changing part of the graphics is significantly
> smaller than the full frame (as in my test program).
> The remaining problem is that it makes the drawing
> code more complex, and it may also slow it down when
> the app is running locally.

I've used this method before, when I needed to render some parts using direct pixel access for speed reasons, and others using Java2D operations, and performance is fairly good if you manage it correctly.

The way I did it, I had a BufferedImage used for direct pixel access, and a VolatileImage used for the screen buffer/accelerated ops.

For java2d ops, it mirrored the operations on each Image, to keep them in sync. This is the slow part of it, since the java2d ops on the BufferedImage won't be accelerated.

If all your direct pixel access completely fills the area it's writing to, then you can probably just use a BufferedImage as your screen buffer, and only do the java2d ops on it. I think it will act as a managed Image, which means all the java2d ops will be accelerated, and surface loss will be handled automagically for you. Dmitri will correct me on this, I'm sure :-)

For pixel access, it scribbled to the BufferedImage, then used the version of Graphics2D.drawImage() that allows you to draw only a part of an image, to draw that changed part to the screen buffer. In practice, keeping a full-sized BufferedImage around and using it for pixel access was a lot faster than allocating smaller ones on demand.

In paintComponent(), it just used a simple Graphics.drawImage() to draw the screen image to the swing BB. As the screen image is accelerated (i.e. in the X server), this will be done with minimal network traffic.

Even when running locally, I don't think you can get much faster than this on any pipeline that accelerates java2d ops, since if you do direct pixel access to a BufferedImage, it permanently decelerates it, so you still need to keep the pixel buffer separate from the accelerated screen buffer and draw the pixel buffer to the screen buffer.

bolofsson
Offline
Joined: 2008-02-25
Points: 0

Hi wmeissner,

I made some more progress last week, and I have come to a final workflow that is very similar to what you propose.
I post the code snippets below for those who might read this thread and are as clueless as I was just a week ago. Please correct me if you spot any errors.
(Remember I am trying to speed things up for the case when running a Java 2D program remotely on another X terminal.)

1) Network traffic in general when using Java 2D remotely
Dmitri wrote in his response:
'Whatever you do, the pixels need to get from the client to the server over the network.'

He is right. First, I tested my network speed by copying a file over the network using the Linux command 'scp'. Next, I measured the frame rate when posting an (unmanaged) image to the swing backbuffer and onto the remote X terminal using Graphics.drawImage. Assuming 4 bytes per pixel (32bit integer ARGB color), the frame rate was just 15% slower than copying the same amount of bytes over the network using 'scp'.
This fully explains why my original app was so slow over the network.
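
As a rough, purely hypothetical sanity check of that kind of measurement: an 800x600 frame at 4 bytes per pixel is about 1.9 MB, so a link that sustains, say, 10 MB/s tops out at roughly 5 full-frame repaints per second before any X protocol overhead.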

2) Reducing network traffic (a)
In my application, I can get away with using single-byte color coding (256 colors) instead of 4-byte ARGB colors.
[code]
BufferedImage myBitmap = new BufferedImage( myImageWidth, myImageHeight, BufferedImage.TYPE_BYTE_INDEXED ); // Unmanaged image
Image myImage = createImage( myImageWidth, myImageHeight ); // Managed image
[/code]
This simple trick already speeds up the app almost 4x, because there are four times fewer bytes to transfer over the network for a full repaint of the image.
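
If the default palette that TYPE_BYTE_INDEXED picks doesn't suit the data, the image can also be built with an explicit IndexColorModel (from java.awt.image); the palette arrays here are just placeholders:

[code]
// Hypothetical 256-entry palette; fill r/g/b with the colors the data actually uses
byte[] r = new byte[256], g = new byte[256], b = new byte[256];
// ... populate r, g, b ...
IndexColorModel palette = new IndexColorModel( 8, 256, r, g, b );
BufferedImage myBitmap = new BufferedImage(
        myImageWidth, myImageHeight, BufferedImage.TYPE_BYTE_INDEXED, palette );
[/code]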

3) Reducing network traffic (b)
As you suggest, I only directly access pixels on the local BufferedImage
[code]
byte[] byteBuffer = ((DataBufferByte)myBitmap.getRaster().getDataBuffer()).getData();
// byteBuffer[.] = ...;
[/code]
and only repaint the part of the image that changed using Graphics.drawImage()
[code]
Graphics g = myImage.getGraphics();
// The following line of code transfers all selected pixels over the network
g.drawImage( myBitmap, xmin, ymin, xmax, ymax, xmin, ymin, xmax, ymax, null );
[/code]

Working locally, I don't see any difference between redrawing the whole image, or only part of it. But working over a network this speeds up things significantly.

4) Using Graphics 'primitives' wherever possible
Simple stuff is drawn onto the accelerated (possibly volatile) image using the Graphics draw methods:
[code]
Graphics2D g2 = (Graphics2D)myImage.getGraphics();
g2.setXORMode( new Color( 0x00aaaaff ) );
g2.drawLine( 0, ypos, myImageWidth, ypos );
[/code]

5) Drawing to Swing back buffer
In paintComponent, I can now draw the accelerated image as many times as I want to without any further network traffic.
[code]
public void paintComponent( Graphics g ) {
Graphics2D g2 = (Graphics2D)g;
g2.drawImage( myVolatile, 0, 0, this );
}
[/code]

Thanks for your help. I hope this will be useful for others as well. As I mentioned, please correct me if you spot any errors.

wmeissner
Offline
Joined: 2005-10-09
Points: 0

> Well, I've made some progress. Using wmeissner's
> gstreamer design, I create a BufferedImage like
> this:
>
>     bufferedImage = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
>
> and adding some instrumentation to the code, I get
> this:
>
> [INFO] avg rendering time = 10.126278 ms
> [INFO] avg YUV->RGB conversion time = 11.319964 ms
>
> "rendering time" is basically for a drawImage (I'm
> not using VolatileImage right now). I'm also using
> setElem instead of pulling out a DataBufferInt,
> because I now want to try creating my BufferedImage
> like this:
>
>     bufferedImage = component.getGraphicsConfiguration().createCompatibleImage(w, h, Transparency.OPAQUE);
>
> My performance changes dramatically:
>
> [INFO] avg rendering time = 0.2855 ms
> [INFO] avg YUV->RGB conversion time = 2050.9004 ms

Deja-vu. You're following the same path I followed before I settled on the render-to-DataBufferInt then drawImage-to-Volatile way.

Like Dmitri recommends, try getting the pixel array from the TYPE_INT_RGB BufferedImage and do the YUV->RGB conversion directly into that array. When I tried that vs the setElem() way, there was a marked improvement in speed.
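
A minimal sketch of that approach, assuming the decoder hands over per-pixel Y/U/V byte planes (the BT.601 integer coefficients are the usual fixed-point approximation, not anything specific to gstreamer-java; clamp() stands for Math.max(0, Math.min(255, x))):

[code]
BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
int[] rgb = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();

for (int i = 0; i < w * h; i++) {
    int c = (y[i] & 0xFF) - 16, d = (u[i] & 0xFF) - 128, e = (v[i] & 0xFF) - 128;
    int r = clamp((298 * c + 409 * e + 128) >> 8);
    int g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8);
    int b = clamp((298 * c + 516 * d + 128) >> 8);
    rgb[i] = (r << 16) | (g << 8) | b;   // written straight into the image's backing array
}
[/code]

Pulling the DataBuffer out like this is exactly the "de-acceleration" mentioned elsewhere in the thread, which is why the image is then copied to a VolatileImage rather than drawn to the back buffer directly.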

trembovetski
Offline
Joined: 2003-12-31
Points: 0

> Also, is this optimized primarily for Windows? From what I've read about VolatileImage, it sounds like it doesn't do much on other architectures. Is that right?

Depending on the pipeline and your configuration, a VolatileImage may be
a (shared memory) Pixmap (unix), pbuffer/fbobject (opengl on unix/win),
or Direct3D surface (d3d pipeline, windows).

> but I'm still scratching my head over just when and how this code gets invoked.

It is called when we detect that the rate of pixels _read_ from the destination
is larger than a certain percentage of the total pixels in the surface.
Then the Pixmap is punted to a shared memory pixmap.

But you shouldn't concern yourself with the implementation too much.
Haven't you determined that even forcing Pixmaps to be shared memory
pixmaps from the get go doesn't help?

Currently the pixels need to go through these stages to get to the screen:
- decoded by video codec => put into a native array
- copied to java heap (like a data buffer of buffered image, or just some java array)
- if you're using toolkit image api, there may be other copies of these
pixels made
- if the format of the screen is not the same as the format of the
pixels, they need to be converted to the screen format
(this is done on the fly in most cases)
- they need to be copied to a Pixmap (a back-buffer) - this may be
slow, depending on whether it's a shared memory pixmap or
regular server pixmap.
- pixmap is copied to the screen - this is reversed now: if this is a
server pixmap (which is likely in the vram), it's fast, if it's a shared
memory pixmap - it's slower

What we really need is a media API which will do all this for you, of
course. =(

Dmitri

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

> Haven't you determined that even forcing Pixmaps to
> be shared memory
> pixmaps from the get go doesn't help?

Well, no, I don't think I've determined that yet. I don't know exactly what J2D_PIXMAPS does, but I didn't think it puts every single image into shared memory. And even if it did, I've still got the overhead of copying from an int[] into an Image.

The way I'd like it to work (on X/*nix) is to create an image (of some kind) in shared memory, snag a pointer to its data array as an int[] (or something that can be optimized to an int[] by hotspot), query the image to see what its native pixel format is, and have the codec write straight into it as it converts from YUV to RGB.

No, I take that back. The way I'd like it to work, well, I'd just like it to work :-)

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

> If you want a (admittedly somewhat convoluted)
> example of displaying video into a lightweight
> component at decent speed on a 1.4+ jvm, have a look
> at
> http://gstreamer-java.googlecode.com/svn/trunk/gstreamer-java/src/org/gstreamer/swing/GstVideoComponent.java

OK, I've had the chance to look over your code and I've got some questions...

First, why do you do this:

Graphics2D g2d = (Graphics2D) g.create();

and how can we know that the cast always works?

Also, is this optimized primarily for Windows? From what I've read about VolatileImage, it sounds like it doesn't do much on other architectures. Is that right?

I'm also hoping that somebody can explain just what is required to get a shared memory pixmap on an X/unix system. I've looked into the source code around X11SD_PuntPixmap, but I'm still scratching my head over just when and how this code gets invoked.

Thanks for all the help so far.

wmeissner
Offline
Joined: 2005-10-09
Points: 0

> First, why do you do this:
>
> Graphics2D g2d = (Graphics2D) g.create();
> and how can we know that the cast always works?

I need to create a new Graphics2D because I fiddle with it (i.e. by calling setRenderingHints()), and paintComponent() is supposed to leave the Graphics passed to it in the same state on return.

AFAIK, as of java 1.2, the Graphics instance passed to paintComponent() is in reality a Graphics2D instance, and therefore create() will also return a Graphics2D instance - so the cast is guaranteed to work.
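
The usual shape of that pattern, roughly (volatileImage is just a placeholder for whatever gets painted):

[code]
protected void paintComponent(Graphics g) {
    Graphics2D g2d = (Graphics2D) g.create();    // private copy, so hint changes don't leak out
    try {
        g2d.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                             RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g2d.drawImage(volatileImage, 0, 0, getWidth(), getHeight(), null);
    } finally {
        g2d.dispose();                           // release the copy
    }
}
[/code]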

>
> Also, is this optimized primarily for Windows? From
> what I've read about VolatileImage, it sounds like it
> doesn't do much on other architectures. Is that
> right?

I actually put the VolatileImage code in there for OpenGL on linux. There is a case where you do BILINEAR scaling with the OpenGL java2d pipeline where it goes incredibly slow (i.e. about 0.5fps), unless the image is copied to a VolatileImage first.

So to avoid having a special case in there for OpenGL - and to avoid hitting the same problem with any future pipeline (e.g. the Direct3D one), I use the VolatileImage on all platforms.

On Linux/X11, MacOS and Windows/GDI you can get away with this:
raw pixels -> BufferedImage -> paintComponent()

but it seems the way to guarantee consistent performance is:
raw pixels -> BufferedImage -> VolatileImage -> paintComponent()

Since a VolatileImage is supposed to be in accelerated memory, the copy from the VolatileImage to the swing backbuffer is done in video ram which is usually a lot faster than system ram, so the cost of the extra copy is negligible.
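
One detail worth adding to the BufferedImage -> VolatileImage step: a VolatileImage can lose its contents at any time, so the canonical copy loop validates the surface and retries if it was lost mid-copy. A sketch, assuming this runs inside the component and bi is the BufferedImage being uploaded:

[code]
VolatileImage vi = createVolatileImage(w, h);
do {
    int status = vi.validate(getGraphicsConfiguration());
    if (status == VolatileImage.IMAGE_INCOMPATIBLE) {
        vi = createVolatileImage(w, h);      // display config changed - recreate
    }
    Graphics2D g2 = vi.createGraphics();
    g2.setComposite(AlphaComposite.Src);     // straight copy, no blending
    g2.drawImage(bi, 0, 0, null);
    g2.dispose();
} while (vi.contentsLost());                 // redo if the surface was lost during the copy
[/code]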

>
> I'm also hoping that somebody can explain just what
> is required to get a shared memory pixmap on an
> X/unix system. I've looked into the source code
> around X11SD_PuntPixmap, but I'm still scratching my
> head over just when and how this code gets invoked.

I think you're worrying too much about the overhead of copying an array from heap memory to video memory. Even an old CPU should be able to do the 40MB of copying per second (for DVD resolution) without breaking a sweat.

By rendering directly into the DataBuffer of a BufferedImage, then splatting that to the swing backbuffer, you're going to get about as fast as you can without using a Canvas and a BufferStrategy.

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

> I think you're worrying too much about the overhead
> of copying an array from heap memory to video memory.
> Even on an old cpu, they should be able to do the
> 40M of copying per second (for DVD resolution)
> without raising a sweat.

OK, so then where is the bottleneck with just using Image and drawImage?

trembovetski
Offline
Joined: 2003-12-31
Points: 0

The "Toolkit images" mechanism may create various copies of the pixels
along the way (this is with image producer/consumer apis, and MemoryImageSource
stuff).

BufferedImages are more efficient in that regard as they only
have one single copy of the pixels in the heap.

But in principle there isn't that much of a difference between
using a regular image and copying it to the back-buffer and
doing a manual BI->VI->BB copy. The only advantage here
is that if your frame rate is lower than your screen update speed
you can copy the last frame which you have in the VI when you
need to update the screen.

Also, with accelerated pipelines (ogl/d3d) you can do stuff like
resizing this VI (since VI->BB is hw accelerated and resize
is free).

Dmitri

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

> Did you try the flags suggested above? (the
> J2D_PIXMAPS
> env. variable)?

I did try that. It seems like it might have changed the behavior just a little in terms of exactly when it started dropping frames, but it didn't produce any big improvement.

wmeissner
Offline
Joined: 2005-10-09
Points: 0

If you want a (admittedly somewhat convoluted) example of displaying video into a lightweight component at decent speed on a 1.4+ jvm, have a look at http://gstreamer-java.googlecode.com/svn/trunk/gstreamer-java/src/org/gs...

As Dmitri suggested, it paints the RGB data directly into a BufferedImage (so saving a copy), but it also then draws that BufferedImage into a VolatileImage, so scaling can be accelerated by e.g. the OpenGL pipeline. Without this step, rendering with the opengl pipeline enabled drops to around 0.5fps.

It also keeps a pool of BufferedImages for the rendering stage, so the garbage collector doesn't get hammered - which can also be a huge CPU hog when rendering video.

trembovetski
Offline
Joined: 2003-12-31
Points: 0

If the frames are of the same dimensions, couldn't you just reuse one BI?

Dmitri

wmeissner
Offline
Joined: 2005-10-09
Points: 0

Describing it as a pool of BI was kinda misleading - it's really just double buffering now - an older implementation used to use a pool.

You need 2 BufferedImages - one for the rendering thread to scribble into, one for swing to use to [re]paint.

wmeissner
Offline
Joined: 2005-10-09
Points: 0

Thinking about it again, using a single BI is doable - you just need to lock around all accesses to it by either the EDT or the render thread.
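
Something like this, for instance - just a sketch of the locking idea, assuming a JComponent subclass with a TYPE_INT_RGB frame field shared between the render thread and the EDT:

[code]
private final Object frameLock = new Object();
private BufferedImage frame;                     // TYPE_INT_RGB, same size as the video

// render thread: copy the decoded pixels in under the lock
void newFrame(int[] decoded) {
    synchronized (frameLock) {
        int[] dst = ((DataBufferInt) frame.getRaster().getDataBuffer()).getData();
        System.arraycopy(decoded, 0, dst, 0, dst.length);
    }
    repaint();                                   // schedule a repaint on the EDT
}

// EDT: paint under the same lock so a half-written frame is never shown
protected void paintComponent(Graphics g) {
    synchronized (frameLock) {
        g.drawImage(frame, 0, 0, null);
    }
}
[/code]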

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

I'm now trying to duplicate my earlier results using openjdk6. It seems that (as trembovetski reported for jdk7) the "compatible" buffers are not DataBufferNative on this version, either.

Classes like X11CachingSurfaceManager, RemoteOffScreenImage, and X11RemoteOffScreenImage seem to have disappeared. And particularly, in X11GraphicsConfig's createCompatibleImage, the following code (from jdk5) is gone:

[code]
if (X11SurfaceData.isAccelerationEnabled()) {
return new X11RemoteOffScreenImage(null, model, raster,
model.isAlphaPremultiplied());
}
[/code]

So, how can I get a DataBufferNative BufferedImage in openjdk6?

trembovetski
Offline
Joined: 2003-12-31
Points: 0

openjdk6 is based on jdk7, so the RemoteOffScreenImage is gone.

You can only get a pixmap-based image (thus having DBN as the data buffer -
to which you won't have access anyway) by creating a VolatileImage.

As mentioned before, this was done to reduce the confusion over different
image types' behavior.

Why would you want to have access to a DBN BI? The DBN is just a plug for
some operations that would otherwise have caused an exception. It's very slow:
it performs XGetImage/XPutImage per pixel.

Dmitri

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

> Why would you want to have access to DBN BI? The DBN
> is just a plug for
> some operations that would have otherwise caused an
> exception. It's very slow
> it performs XGetImage/XPutImage per pixel.

Well, it seemed like actually drawing a DBN BI to the screen was quite fast, averaging about 0.2 ms, although openjdk6 seems to be running at about 2-3 ms with its DataBufferInt implementation, so that's not so bad.

By the way, is there a way to see what assembly code hotspot generates for a particular java function?

bolofsson
Offline
Joined: 2008-02-25
Points: 0

Thanks. Here is the source code of my test program once again.

[code]
package test;

import java.awt.*;
import java.awt.event.*;
import java.awt.image.*;
import javax.swing.*;

// Use MemoryImageSource, set int[] buffer pixels, newPixels to BI, then drawImage to BB
// Very fast WINDOWS. Fast LINUX. Very slow remotely.
public class TestGraph22 extends JPanel {
public Point myMousePos;
public int myObjectSize;
public int myObjectStep;
public int myWidth;
public int myHeight;

private int[] myScreenBuffer;
MemoryImageSource myMISource;
private Image myBitmap;

public TestGraph22() {
myMousePos = new Point(-1,-1);
myObjectSize = 80;
myObjectStep = 10;
myHeight = 0;
myWidth = 0;
myScreenBuffer = null;
myBitmap = null;

setSize(myWidth,myHeight);
validate();
addMouseMotionListener( new MouseMotionAdapter() {
public void mouseMoved( MouseEvent e ) {
myMousePos = e.getPoint();
TestGraph22.this.repaint();
}
});
}
private void resetBitmap() {
myScreenBuffer = new int[myWidth * myHeight];
myMISource = new MemoryImageSource(myWidth,myHeight,myScreenBuffer,0,myWidth);
myMISource.setAnimated(true);
myMISource.setFullBufferUpdates(true);
myBitmap = createImage( myMISource );
}
public void paintComponent( Graphics g ) {
Rectangle rect = getVisibleRect();
if( myBitmap == null || myHeight != rect.height || myWidth != rect.width ) {
myHeight = rect.height;
myWidth = rect.width;
resetBitmap();
}
paintObject();
g.drawImage( myBitmap, 0, 0, this );
}
public void paintObject() {
int color = Color.lightGray.getRGB();
for( int iy = 0; iy < myHeight; iy++ ) {
int indexRow = iy*myWidth;
for( int ix = 0; ix < myWidth; ix++ ) {
myScreenBuffer[ix+indexRow] = color;
}
}

if( myMousePos.x >= 0 && myMousePos.y >= 0 ) {
int color1 = Color.black.getRGB();
int color2 = Color.white.getRGB();
int xpos1 = Math.max( myMousePos.x-myObjectSize/2, 0 );
int xpos2 = Math.min( myMousePos.x+myObjectSize/2, myWidth );
int ypos1 = Math.max( myMousePos.y-myObjectSize/2, 0 );
int ypos2 = Math.min( myMousePos.y+myObjectSize/2, myHeight );
for( int iy = ypos1; iy < ypos2; iy++ ) {
int indexRow = iy*myWidth;
int yZeroOne = (int)( iy / myObjectStep ) % 2;
for( int ix = xpos1; ix < xpos2; ix++ ) {
int xZeroOne = (int)( ix / myObjectStep ) % 2;
if( xZeroOne == yZeroOne )
myScreenBuffer[ix+indexRow] = color1;
else
myScreenBuffer[ix+indexRow] = color2;
}
}
}
myMISource.newPixels();
}
public static void main(String[] argv){
JFrame frame = new JFrame("Test graphics 22: screenB -> BI -> BB");
JPanel p = new TestGraph22();
frame.getContentPane().add(p);
frame.setDefaultCloseOperation( JFrame.EXIT_ON_CLOSE );
frame.pack();
frame.setSize(800,600) ;
frame.setVisible(true);
}
}

[/code]

trembovetski
Offline
Joined: 2003-12-31
Points: 0

Whatever you do, the pixels need to get from the client to the server over
the network.

If this needs to be done on every repaint, in remote X case you will see
a slowdown, there's just no way around that.

> 'Remote' is a relative term, as the server and client are linked through a Gigabit network.

Doesn't matter. Consider the difference between sysmem -> vram copy speed
(local X server case) vs sending a bunch of packets (and the X server does break
the image into multiple packets) over the network. The difference is many orders of
magnitude.

The optimal way of implementing what you're doing would be
to create a BufferedImage (say of INT_RGB type), get the data buffer, and
directly manipulate the pixels. Copy this image (with SRC alpha compositing
mode) into a VolatileImage, which is then copied to Swing's back-buffer.

Only update the VolatileImage when needed. You can throttle the update
rate to something manageable (do not update the volatile image on every
data update).
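
A simple way to do that throttling - just a sketch, with an arbitrary 30 updates/second cap and a hypothetical copyBufferedImageToVolatile() standing for the SRC-composited upload described above:

[code]
private long lastUpload;                                   // System.nanoTime() of the last BI -> VI copy
private static final long MIN_INTERVAL = 1000000000L / 30; // at most ~30 uploads per second

void dataUpdated() {
    long now = System.nanoTime();
    if (now - lastUpload >= MIN_INTERVAL) {
        lastUpload = now;
        copyBufferedImageToVolatile();                     // BI -> VI (the expensive part in the remote X case)
        repaint();                                         // VI -> back buffer happens in paintComponent
    }
    // otherwise skip; the VI still holds the last uploaded frame
}
[/code]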

Thanks,
Dmitri

trembovetski
Offline
Joined: 2003-12-31
Points: 0

And to clarify a bit: the reason for the use of VolatileImage is that on X11 it is implemented as an X11 Pixmap, which is a server resource. Meaning, it lives where the X server is.

So once the data is uploaded to the VI, it is on the X server, and can be very
quickly copied to the back-buffer (another Pixmap) or the screen.

And another clarification: the local X server case uses some other
optimizations like the shared memory extension, which allows passing a pointer to the pixels residing in shared memory between the X client and the X server instead of
sending a bunch of pixels through the pipes. This extension obviously doesn't
apply to the remote case, where the X client and server are on different systems and so
cannot have shared memory.

Dmitri

bolofsson
Offline
Joined: 2008-02-25
Points: 0

Hi Dmitri,

Thanks for your response. I think I can follow you, and I understand that working on a remote X server will slow down any graphical application.
Using a VolatileImage sounds convincing enough, but unfortunately I haven't been able to make it work (fast) in practice.

Following your advice, I guess my code would look like this (I only re-post the part that changed):
This is similar to what I called method 4 previously.

[code]
private void resetBitmap() {
myBitmap = new BufferedImage( myWidth, myHeight, BufferedImage.TYPE_INT_RGB );
myRaster = myBitmap.getRaster();
myVolatile = this.createVolatileImage( myWidth, myHeight );
}
public void paintComponent( Graphics g ) {
Rectangle rect = getVisibleRect();
if( myBitmap == null || myHeight != rect.height || myWidth != rect.width ) {
myHeight = rect.height;
myWidth = rect.width;
resetBitmap();
}
paintObject();
g.drawImage( myVolatile, 0, 0, this );
}
public void paintObject() {
int[] pixels = ( (DataBufferInt) myRaster.getDataBuffer()).getData();
// ...for loops, paint image into int[] pixel buffer
pixels[ix+indexRow] = color;
// ...
Graphics2D g2 = (Graphics2D)myVolatile.getGraphics();
g2.setComposite( AlphaComposite.Src );
g2.drawImage( myBitmap, 0, 0, this );
}
[/code]

The above works fine locally on both Windows and Linux.
Now, I tested this on a slow network, where the repaint of a single frame takes ~20 seconds!

Now consider the code below, where instead I directly draw into the Graphics object of a BufferedImage which I then draw into the back buffer (method 1). In this case, even on the slow network, the refresh rate for one frame becomes ~0.1s which is 200 times faster!

This is the kind of speed I am looking for, but how can I achieve that without having to use the (limited) drawing methods defined for a Graphics2D?
I'm sure there must be a way to push an int[] array into the screen buffer with the same speed as the Graphics2D interface manages with the methods fillRect() or drawLine(). Or am I wrong?

Bjorn

[code]
private void resetBitmap() {
myBitmap = (BufferedImage)createImage( myWidth, myHeight );
}
public void paintComponent( Graphics g ) {
Rectangle rect = getVisibleRect();
if( myBitmap == null || myHeight != rect.height || myWidth != rect.width ) {
myHeight = rect.height;
myWidth = rect.width;
resetBitmap();
}
paintObject();
g.drawImage( myBitmap, 0, 0, this );
}
public void paintObject() {
Graphics2D g2 = (Graphics2D)myBitmap.getGraphics();
g2.setColor( Color.lightGray );
g2.fillRect( 0, 0, myWidth, myHeight );

if( myMousePos.x >= 0 && myMousePos.y >= 0 ) {
Color color1 = Color.black;
Color color2 = Color.white;
int xpos1 = myMousePos.x-myObjectSize/2;
int xpos2 = myMousePos.x+myObjectSize/2;
int ypos1 = myMousePos.y-myObjectSize/2;
int ypos2 = myMousePos.y+myObjectSize/2;
for( int iy = ypos1; iy < ypos2; ) {
int ylen = myObjectStep - ( iy % myObjectStep );
ylen = Math.min( ylen, ypos2-iy );
int yZeroOne = (int)( iy / myObjectStep ) % 2;
for( int ix = xpos1; ix < xpos2; ) {
int xlen = myObjectStep - ( ix % myObjectStep );
xlen = Math.min( xlen, xpos2-ix );
int xZeroOne = (int)( ix / myObjectStep ) % 2;
if( xZeroOne == yZeroOne )
g2.setColor( color1 );
else
g2.setColor( color2 );
g2.fillRect( ix, iy, xlen, ylen );
ix += xlen;
}
iy += ylen;
}
}
}
[/code]

wmeissner
Offline
Joined: 2005-10-09
Points: 0

> draw into the Graphics object of a BufferedImage
> which I then draw into the back buffer (method 1). In
> this case, even on the slow network, the refresh rate
> for one frame becomes ~0.1s which is 200 times
> faster!

One reason might be that primitives like fillRect() and drawLine() are server-side operations. I.e., since you're not de-accelerating the BufferedImage by getting its raster, it becomes managed, so instead of drawing to a local bitmap and then blitting the bitmap over to the server, they just send drawing commands to the server - only a handful of bytes for a fillRect or drawLine vs megabytes of pixels.

If Graphics2D doesn't have the primitives you need, you could do a hybrid approach.

Draw what you can with Graphics2D on a managed or volatile image, do any ops you can't do with Graphics2D into a BufferedImage with a background color of 0, 0, 0, 0 (i.e. black with an alpha of zero), then use drawImage() to draw just that smaller image to the volatile image with alpha compositing.
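
A sketch of that hybrid idea (the overlay size and position are placeholders, and the overlay is assumed to be much smaller than the full frame):

[code]
// Small, fully transparent ARGB image for the ops Graphics2D can't express
BufferedImage overlay = new BufferedImage(ow, oh, BufferedImage.TYPE_INT_ARGB);
int[] px = ((DataBufferInt) overlay.getRaster().getDataBuffer()).getData();
java.util.Arrays.fill(px, 0);                          // 0x00000000 = alpha 0 everywhere
// ... write custom pixels (with non-zero alpha) into px ...

Graphics2D g2 = (Graphics2D) myImage.getGraphics();    // the managed or volatile image
g2.setColor(Color.lightGray);
g2.fillRect(0, 0, myWidth, myHeight);                  // cheap server-side primitive
g2.setComposite(AlphaComposite.SrcOver);               // only the overlay's opaque pixels land
g2.drawImage(overlay, ox, oy, null);
g2.dispose();
[/code]

(Note the caveat raised elsewhere in this thread: the X11 backend may not do the alpha compositing on the server side, so this is worth measuring in the remote case.)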

bolofsson
Offline
Joined: 2008-02-25
Points: 0

...for some reason, I am unable to post the program listing correctly. Some characters must upset the editor, so I had to take away some code.

public void paintObject() {
int color = Color.lightGray.getRGB();
for( int iy = 0; iy < myHeight; iy++ ) {
int indexRow = iy*myWidth;
for( int ix = 0; ix < myWidth; ix++ ) {
myScreenBuffer[ix+indexRow] = color;
}
}

if( myMousePos.x >= 0 && myMousePos.y >= 0 ) {
int color1 = Color.black.getRGB();
int color2 = Color.white.getRGB();
int xpos1 = Math.max( myMousePos.x-myObjectSize/2, 0 );
int xpos2 = Math.min( myMousePos.x+myObjectSize/2, myWidth );
int ypos1 = Math.max( myMousePos.y-myObjectSize/2, 0 );
int ypos2 = Math.min( myMousePos.y+myObjectSize/2, myHeight );
for( int iy = ypos1; iy < ypos2; iy++ ) {
int indexRow = iy*myWidth;
int yZeroOne = ....;
for( int ix = xpos1; ix < xpos2; ix++ ) {
...
...set myScreenBuffer
...
myMISource.newPixels();
}
public static void main(String[] argv){
JFrame frame = new JFrame("Test graphics 22");
JPanel p = new TestGraph22();
frame.getContentPane().add(p);
frame.setDefaultCloseOperation( JFrame.EXIT_ON_CLOSE );
frame.pack();
frame.setSize(800,600) ;
frame.setVisible(true);
}
}
//---------------------------------------------------------------

trembovetski
Offline
Joined: 2003-12-31
Points: 0

You might want to use [ code ] and [ / code ] tags for posting code.

[code]
public static void main(String argv[]) {
}
[/code]

bolofsson
Offline
Joined: 2008-02-25
Points: 0

I am also experiencing performance problems with Java 2D on Linux. However, it is only when running a Java app remotely that it slows down significantly.
'Remote' is a relative term, as the server and client are linked through a Gigabit network. There is hardly any performance drop for applications that are installed on the server but are started remotely from a client. That is also true for Java applications that use the standard Swing components. The problems start as soon as I try to draw something from an int[] buffer onto the screen.

I have read through many posts on the Internet on how to speed up Java 2D graphics, and tried all options that I could find.
From these, and from the performance tests I have run, I note there are four genuinely different techniques (with many more variations thereof):
1) Draw to Graphics object, using standard methods such as fillRect()...
2) Draw seemingly directly to Image/BufferedImage using setRGB()
3) Use MemoryImageSource, by first setting an int[] buffer, then transferring the values to a BufferedImage
4) Set pixels directly in underlying data buffer

Additionally, one has the option of drawing to an intermediate VolatileImage before drawing the image to the screen, but I haven't seen any difference in performance when using a VolatileImage, at least not when running the test program below.

Here are my findings regarding performance, using the test program below (shown is the test program for method 3):

Test 1) Draw to Graphics object, using standard methods such as fillRect()...
Here I tried two techniques:
a) draw directly to the back buffer Graphics object passed to paintComponent
b) draw to the Graphics object of a BufferedImage, then draw to the back buffer Graphics object using drawImage
Windows & Linux: Very very fast
Linux Remote: Fast

Test 2) Draw to Image/BufferedImage using setRGB()
This test is the same as test 1.b), only that I used setRGB() to set pixel colors in the BufferedImage instead of drawing onto the Graphics object
Windows & Linux: Slow
Linux Remote: Very very slow

Test 3) Use MemoryImageSource, by first setting an int[] buffer, then transferring the values to a BufferedImage
This is supposed to be the fastest and safest way of moving image data from an int buffer to the screen.
Windows & Linux: Very fast
Linux Remote: Very very slow

Test 4) Set pixels directly in underlying data buffer
This test uses lots of tricks. First, a DataBufferInt and a SampleModel are created, from which a Raster is created, which in turn is used to create a BufferedImage (sketched just below).
Windows & Linux: Very very fast
Linux Remote: Very very slow
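
A sketch of that construction, assuming 32-bit xRGB pixels (everything here is in java.awt.image):

[code]
int[] pixels = new int[w * h];                          // the app-owned pixel buffer
DataBufferInt db = new DataBufferInt(pixels, pixels.length);
SampleModel sm = new SinglePixelPackedSampleModel(
        DataBuffer.TYPE_INT, w, h, new int[] { 0xFF0000, 0x00FF00, 0x0000FF });
WritableRaster raster = Raster.createWritableRaster(sm, db, null);
ColorModel cm = new DirectColorModel(24, 0xFF0000, 0x00FF00, 0x0000FF);
BufferedImage image = new BufferedImage(cm, raster, false, null);
// writes into pixels[] are immediately visible to drawImage(image, ...)
[/code]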

Conclusions:
Method 2 (using setRGB) is too slow, even when running locally. Method 4 is the fastest of all when run locally, by a small margin above methods 1 and 3.
Only method 1 (drawing to Graphics objects) gives acceptable performance when running the application remotely. "Very very slow" means that it takes many seconds for the screen to refresh when the mouse has moved.
In my full application, I do not have the option to use the standard draw methods on a Graphics object. I have to draw the image myself to an int[] buffer first.

So my questions are:
Why do methods 3 and 4 above work well locally, but not remotely?
Why does method 1 work so well remotely? Can this be replicated by other means?
Is there another, better way of pushing an int[] array onto the screen?
Is there something else fundamentally wrong with my test program? Such as calling repaint()?

I would very much appreciate any help to resolve the problem.

//---------------------------------------------------------------
package test;

import java.awt.*;
import java.awt.event.*;
import java.awt.image.*;
import javax.swing.*;

// Test program for method 3
// Use MemoryImageSource, set int[] buffer pixels, newPixels to BI, then drawImage to BB
public class TestGraph22 extends JPanel {
public Point myMousePos;
public int myObjectSize;
public int myObjectStep;
public int myWidth;
public int myHeight;

private int[] myScreenBuffer;
private MemoryImageSource myMISource;
private Image myBitmap;

public TestGraph22() {
myMousePos = new Point(-1,-1);
myObjectSize = 80;
myObjectStep = 10;
myHeight = 0;
myWidth = 0;
myScreenBuffer = null;
myBitmap = null;

setSize(myWidth,myHeight);
validate();
addMouseMotionListener( new MouseMotionAdapter() {
public void mouseMoved( MouseEvent e ) {
myMousePos = e.getPoint();
TestGraph22.this.repaint();
}
});
}
private void resetBitmap() {
myScreenBuffer = new int[myWidth * myHeight];
myMISource = new MemoryImageSource(myWidth,myHeight,myScreenBuffer,0,myWidth);
myMISource.setAnimated(true);
myMISource.setFullBufferUpdates(true);
myBitmap = createImage( myMISource );
}
public void paintComponent( Graphics g ) {
Rectangle rect = getVisibleRect();
if( myBitmap == null || myHeight != rect.height || myWidth != rect.width ) {
myHeight = rect.height;
myWidth = rect.width;
resetBitmap();
}
paintObject();
g.drawImage( myBitmap, 0, 0, this );
}
public void paintObject() {
int color = Color.lightGray.getRGB();
for( int iy = 0; iy < myHeight; iy++ ) {
int indexRow = iy*myWidth;
for( int ix = 0; ix < myWidth; ix++ ) {
myScreenBuffer[ix+indexRow] = color;
}
}

if( myMousePos.x >= 0 && myMousePos.y >= 0 ) {
int color1 = Color.black.getRGB();
int color2 = Color.white.getRGB();
int xpos1 = Math.max( myMousePos.x-myObjectSize/2, 0 );
int xpos2 = Math.min( myMousePos.x+myObjectSize/2, myWidth );
int ypos1 = Math.max( myMousePos.y-myObjectSize/2, 0 );
int ypos2 = Math.min( myMousePos.y+myObjectSize/2, myHeight );
for( int iy = ypos1; iy < ypos2; iy++ ) {
int indexRow = iy*myWidth;
int yZeroOne = (int)( iy / myObjectStep ) % 2;
for( int ix = xpos1; ix < xpos2; ix++ ) {
int xZeroOne = (int)( ix / myObjectStep ) % 2;
if( xZeroOne == yZeroOne )
myScreenBuffer[ix+indexRow] = color1;
else
myScreenBuffer[ix+indexRow] = color2;
}
}
}
myMISource.newPixels();
}
public static void main(String[] argv){
JFrame frame = new JFrame("Test graphics 22: intBuffer -> BI -> BB");
JPanel p = new TestGraph22();
frame.getContentPane().add(p);
frame.setDefaultCloseOperation( JFrame.EXIT_ON_CLOSE );
frame.pack();
frame.setSize(800,600) ;
frame.setVisible(true);
}
}
//---------------------------------------------------------------

Oh by the way, I tested this with Java SE Runtime Environment build 1.6.0_03-b05 on Windows XP, and build version 1.6.0_04-b12 on SUSE Linux 10.0 (also tested this on several other Linux flavours, i.e. Fedora, Redhat 3 & 4).

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

Well, I've made some progress. Using wmeissner's gstreamer design, I create a BufferedImage like this:

    bufferedImage = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);

and adding some instrumentation to the code, I get this:

[INFO] avg rendering time = 10.126278 ms
[INFO] avg YUV->RGB conversion time = 11.319964 ms

"rendering time" is basically for a drawImage (I'm not using VolatileImage right now). I'm also using setElem instead of pulling out a DataBufferInt, because I now want to try creating my BufferedImage like this:

    bufferedImage = component.getGraphicsConfiguration().createCompatibleImage(w, h, Transparency.OPAQUE);

My performance changes dramatically:

[INFO] avg rendering time = 0.2855 ms
[INFO] avg YUV->RGB conversion time = 2050.9004 ms

Now, THIS is a little more like what I want to see - a sub-millisecond rendering time. The problem, of course, is that the final YUV->RGB conversion step now takes forever. My best guess, from looking at DataBufferNative.c, is that the main culprit is the locking operation on setElem(), which does a lock/unlock for every single value written to the DataBuffer. Any ideas of how to get around this?

Oh, and what happens if I now set J2D_PIXMAPS=shared?

# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x4243a70c, pid=12491, tid=294931
#
# Java VM: Java HotSpot(TM) Client VM (10.0-b19 mixed mode, sharing linux-x86)
# Problematic frame:
# C [libawt.so+0x7770c] Java_sun_awt_image_DataBufferNative_setElem+0xcc

The problem seems to be caused at the bottom of the frame, so by modifying the code not to process the bottom 8 lines, I get:

[INFO] avg rendering time = 0.38954544 ms
[INFO] avg YUV->RGB conversion time = 518.8233 ms

A significant improvement, but still nowhere near good enough.

trembovetski
Offline
Joined: 2003-12-31
Points: 0

> bufferedImage = component.getGraphicsConfiguration().createCompatibleImage(w, h, Transparency.OPAQUE);

That's because "compatible" images are kept in pixmaps, as you have probably guessed.
BTW, this changed in 1.7 (the dev builds) since it was confusing for people,
and inconsistent. Originally this was done so that applications which use compatible
images as back-buffers would benefit from accelerated X11 rendering.

>The problem, of course, is that the final YUV->RGB conversion step now takes forever. My
>best guess, from looking at DataBufferNative.c, is that the main culprit is the locking
>operation on setElem(), which does a lock/unlock for every single value written to the
>DataBuffer. Any ideas of how to get around this?

Ugh, no, unfortunately not. set/get elem calls lock/unlock per pixel, and currently
there's no way to work around it.

I doubt that you'd get any faster than your 10+10ms time for the BI case.

I still think that ->(convert)BI->VI->BackBuffer would be a better way to do this,
especially if you need to update the screen more often than your frame rate -
then some of the copies will be done from the cached VI.

>Oh, and what happens if I now set J2D_PIXMAPS=shared?
># An unexpected error has been detected by Java Runtime Environment:

Not good. Could you please either file a bug or send/post a test case?
(tdv at sun dot com)

Dmitri

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

> That's "compatible" images are kept in pixmaps, as
> you have probably guessed.
> BTW, this changed in 1.7 (the dev builds) since it
> was confusing for people,
> and inconsistent. Originally this was done so that
> applications which use compatible
> images as back-buffers would benefit from accelerated
> X11 rendering.

How has it changed?

I'm interested to know since just because this can't be done well right now, doesn't mean it can't be done well later. In particular, I hope that a major goal of JDK 7 is good video support, and I'd really like that to include good native codec support. So something like the per-pixel locking code isn't a problem; it can always become per-frame locking code.

> >Oh, and what happens if I now set
> J2D_PIXMAPS=shared?
> ># An unexpected error has been detected by Java
> Runtime Environment:
>
> Not good. Could you please either file a bug or
> send/post a test case?
> (tdv at sun dot com)

Right now, my only test case is an entire (patched) cortado applet. Give me a couple of days, I'll try to narrow it down to something manageable.

trembovetski
Offline
Joined: 2003-12-31
Points: 0

> How has it changed?

In 1.7 compatible images are "normal" managed BufferedImages, not something
special that's kept in a Pixmap. So when you render to it, the rendering
goes to the heap-based array, not pixmap.

Dmitri

brent_baccala
Offline
Joined: 2008-01-23
Points: 0

Though I'm not the original poster, I'm experiencing something very similar to this using a video playback app (fluendo.com's cortado app/applet, to be exact).

Commenting out the drawImage used to send the video to the screen results in 50% CPU usage and the codec keeping up with things. Put the drawImage back in, and now we're pegged at 100% with dropped frames.

So, I figure the drawImage must be the problem, and I'm guessing we're not using SHM. The way the code is set up, the codec writes RGB data into a static (i.e., not garbage collected) int array, then uses the ImageProducer/ImageConsumer interface to convert the whole thing to an Image (once per frame) using a single call to setPixels, then paints this to the screen with a drawImage. It is not using a BufferStrategy.

I'm wondering if I can get the codec to write directly into a shared memory pixmap.

What I really want (of course) is a design that will optimize performance on all supported platforms, delivered as an applet, so with minimal assumptions about the runtime. A video playback applet, with the codec delivered in the applet.

And to anticipate the question asked two years ago, here's what my java2d trace looks like:

X11FillRect
X11DrawRect
X11FillRect
X11DrawGlyphs
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB
Pixmap")
X11FillRect
X11DrawRect
X11FillRect
X11DrawGlyphs
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB
Pixmap")
X11FillRect
X11DrawRect
X11FillRect
X11DrawGlyphs
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB
Pixmap")
X11FillRect
X11DrawRect
X11FillRect
X11DrawGlyphs
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB
Pixmap")
X11FillRect
X11DrawRect
X11FillRect
X11DrawGlyphs
X11DrawGlyphs
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB
Pixmap")
X11FillRect
X11DrawRect
X11FillRect
X11DrawGlyphs
X11DrawGlyphs
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.x11.X11PMBlitLoops::Blit("Integer RGB Pixmap", SrcNoEa, "Integer RGB
Pixmap")
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)

etc.

trembovetski
Offline
Joined: 2003-12-31
Points: 0

Unfortunately I don't have a good answer for you.
The scenario you describe just doesn't work well with
the current pipeline.

You could try to optimize a bit by creating a BufferedImage
instead of the whole ImageConsumer/setPixels thing
(that'd cut down on intermediate pixel transfers), but
I think you're right and the main problem is in transferring
pixels to the backbuffer.

But still, try this: create a buffered image of the type
you like, get the data buffer, get the pixel array, and have
the video codec write to that array.

(or, the other way around - create a data buffer from
your array, then create a raster out of that data buffer,
and then a BI from that)

Did you try the flags suggested above? (the J2D_PIXMAPS
env. variable)?

Dmitri
Java2D Team

trembovetski
Offline
Joined: 2003-12-31
Points: 0

Also, I wanted to add that we have people working on providing better media
support in the jdk out of the box, hopefully soon (I know, it can't be soon enough).

Dmitri

trembovetski
Offline
Joined: 2003-12-31
Points: 0

Java2D's X11 pipeline does use the SHM extension for getting
pixels to the screen (or to a pixmap, if you're
copying to swing's back-buffer).

Note that since your image is very likely to be changing
on every frame (because of a loop which may look like this:
upload a new frame, copy it to the screen), we won't be
able to accelerate it (that is, cache it in a pixmap and then
just use XCopyArea to copy it to the destination).

If that's not the case, you might be doing something else to prevent us from caching that image, like grabbing
the DataBuffer of the image.

Could you please run your app with
-Dsun.java2d.trace=log
and post the output.

Also, a couple of options to try:
set env. variable
J2D_PIXMAPS=shared
prior to starting your app.

Another one to try is
-Dsun.java2d.pmoffscreen=false

Check out this page for more java2d flags:
http://java.sun.com/j2se/1.5.0/docs/guide/2d/flags.html

Thanks,
Dmitri
Java2D Team

bino_george
Offline
Joined: 2003-06-16
Points: 0

Hi,

> Obviously, on linux X11 is the main bottleneck when
> it comes to animation/video performance. I thought
> then of using MIT-SHM to improve performance, the
> problem: I need to use jni and an AWT Canvas. Not
> only is C not my cup of tea, I prefer hot java. But
> also, that would mean that I can't draw any swing
> component on top of the canvas. That's a big problem
> for my application.
> Also, the OGL pipeline doesn't improve much, X11 is
> not being bypassed, moreover OGL blit seems slower
> than j2d blit.
> I doubt shm is implemented at this moment, however I
> think it would be good to have it for 6.0 and to be
> able to turn it on with a flag.

Can you give me some more info about your video driver,
and keep in mind that blit performance will vary with the
driver. Also, what platform are you on (SuSE/Fedora, etc.)
and what version of X server?

What does your app do? Is it using drawImage most of the
time (since it is a video playback app)? Do you draw
directly to the screen or do you use a BufferStrategy?

Some more info about your app and setup will be helpful.

Thanks,
Bino.

sat1196
Offline
Joined: 2003-11-08
Points: 0

Sorry, I forgot to mention. I'm using Mandrake 10.0 with kernel 2.6.3 and XFree86 4.3, with an ATI (that's probably the problem) Radeon 9200 and the latest drivers. My app simply draws a TYPE_INT_RGB BufferedImage onto a JPanel subclass for each frame. I haven't looked much at the BufferStrategy API. And I haven't changed the default, so I guess my panel is double-buffered. Still, I think the problem is mostly due to X11, since I only call update() once per frame. Am I right? There's no documentation on XToolkit.waitForEvents(). Is there currently a way to work around the problem, or should I post an RFE?
Thanks for your help!

sat1196
Offline
Joined: 2003-11-08
Points: 0

Sorry, I skipped your post on waitForEvents(). I'll try again with hprof=cpu=times.

bino_george
Offline
Joined: 2003-06-16
Points: 0

Hi,

> While profiling a video playback app, I found this at
> the top of the cpu usage list:
>
> rank self accum count trace method
> 1 48.69% 48.69% 10576 300390 sun.awt.X11.XToolkit.waitForEvents

You will see this kind of profile in most GUI apps.
"waitForEvents" is the native method that calls
XNextEvent to get X11 events off the queue. So this
is normal, but keep in mind this is misleading.

Sampling profilers will often show this method
because all they do is repeatedly sample the call
stack and in the case of the Toolkit thread all
it does is sit in the method and wait for events,
when there is an event it will push it on to the
Java level event queue and go back to waiting.

Instrumenting profilers are more accurate, but they
have higher overhead.

I will ask someone on the 2D team to comment on MIT-SHM and OGL.

Thanks,
Bino.