
[JAI] Tips for optimising processing speed when memory is limited

Jason Grant

Hi Brian,

Thanks for your helpful responses on the JAI list over recent weeks.

Regarding your recent request about contributing my Canny edge finder
back to JAI, I've attached the relevant classes to this mail, and am
cc'ing the list in case it's useful to anyone else. The OpImages only
include processing loops for byte images, but I thought that I may as
well pass them on because the algorithm seems to be working nicely, and
the byteLoops may serve as a good sketch for others who may be better
placed to test int/float loops, tune the existing algorithms, advise on
better design, or weave them into JAI. I have only dealt with
RenderedOps so far, and the code reflects that too - there is no use of
Renderables anywhere.

I'm sending the code now, since I'm about to rework it significantly in
a way that is probably not useful to others, as indicated by my recent
post. I have a need to cancel JAI processing, and so will soon alter
the CannyEdgeFinder class to instead use queueTiles() to perform the
processing, since it seems to me that this is the only way to cancel JAI
jobs. I coded the Canny algorithm as a vanilla Java class rather than
as a composite OpImage, because I did the former originally (as
described on the list) and ultimately found that a composite OpImage
was going to compromise my ability to track progress for feedback in my
UI, cancel intermediate steps, and generally have access to the outputs
of intermediate steps. (I'm still not quite sure how you recommend
"compound" operations be produced.)

Note also that the tracking Op is an Untiled one. I don't like this,
because it implies forcing computation of the full source image, but
I'm not confident enough to implement a more sophisticated approach,
given my experience with JAI. Right now it suits my stage of product
development, which only deals with small preview images. I suspect
I'll need to revisit this when I start to process the full-size
original images, since I bet it will be slow once the tile cache starts
to thrash.

Another fudge is that the non-maximal-suppression Op puts an 'edge
direction' array in a property, and this is used by the subsequent
tracking op. This may have originally been OK if these subordinate Ops
were embedded within a higher-level Canny Op (i.e. unseen by clients),
but now that they're visible as stand-alone ops, I think this is a
little flaky. I wanted to keep the non-maximal-suppression and
tracking ops separate, as they take different parameters that my user is
able to tweak independently - thus I don't want to recompute an NMS
image if the user is only changing params for the tracking.
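To make the fudge concrete, the handoff looks roughly like this
(sketched from memory; "EdgeDirections" and the class names are just
what I use in my own code, the only JAI API involved being
getProperty()):

    import java.awt.image.RenderedImage;
    import javax.media.jai.PlanarImage;

    /**
     * Sketch of the property handoff.  My NMS OpImage overrides
     * getProperty(String) so that "EdgeDirections" returns the byte[]
     * built during computeRect(); the tracking op then pulls it off
     * its source image like this.
     */
    public final class EdgeDirectionLookup {
        public static byte[] fromNmsImage(RenderedImage nmsImage) {
            Object value = PlanarImage.wrapRenderedImage(nmsImage)
                                      .getProperty("EdgeDirections");
            return (value instanceof byte[]) ? (byte[]) value : null;
        }
    }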

Anyway, hope this helps, and I'm willing to rework this code if I get
pointers from the community.

Cheers,

Jason.

[canny.tgz]

Brian Burkhalter

Jason,

> Thanks for your helpful responses on the JAI list over recent weeks.

You're welcome.

> Regarding your recent request about contributing my Canny edge finder
> back to JAI, I've attached the relevant classes to this mail, and am
> cc'ing the list in case it's useful to anyone else.

As for contributions to JAI - and this message is for everyone on the list -
we cannot accept any code for (prospective) use in JAI unless we have a signed
Joint Copyright Assignment form on file. For more information on this please
see https://jai.dev.java.net/contribute.html (the JAI JCAs received thus far
are listed at https://jai.dev.java.net/received-jcas.html). Note that the
jai-imageio projects require a separate JCA although they are licensed under a
BSD license as opposed to the JDL/JRL used for JAI.

> The OpImages only
> include processing loops for byte images, but I thought that I may as
> well pass them on because the algorithm seems to be working nicely, and
> the byteLoops may serve as a good sketch for others who may be better
> placed to test int/float loops, tune the existing algorithms, advise on
> better design, or weave them into JAI.

In most cases that is where we have started as well, with the byte version.

> I have only dealt with
> RenderedOps so far, and the code reflects that too - there is no use of
> Renderables anywhere.

Renderable mode is not obligatory but is in many cases simple to support. See
for example the javax.media.jai.CRIFImpl class and the AddCRIF source code.
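A minimal factory supporting both modes typically looks something like the
following sketch (the "mycanny" operation name is a placeholder, and the
"invert" call is only a compilable stand-in for a real OpImage, not code
from JAI itself):

    import java.awt.RenderingHints;
    import java.awt.image.RenderedImage;
    import java.awt.image.renderable.ParameterBlock;
    import javax.media.jai.CRIFImpl;
    import javax.media.jai.JAI;

    /**
     * Sketch of a factory supporting both rendered and renderable modes.
     * CRIFImpl supplies the renderable-mode plumbing; the subclass only
     * needs to implement the rendered-mode create() method.
     */
    public class MyCannyCRIF extends CRIFImpl {

        public MyCannyCRIF() {
            super("mycanny");   // placeholder operation name
        }

        /** Rendered-mode factory method. */
        public RenderedImage create(ParameterBlock pb, RenderingHints hints) {
            RenderedImage source = pb.getRenderedSource(0);
            // A real factory would construct its own OpImage from the
            // sources and parameters here; "invert" is only a stand-in.
            return JAI.create("invert",
                    new ParameterBlock().addSource(source), hints);
        }
    }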

> I'm sending the code now, since I'm about to rework it significantly in
> a way that is probably not useful to others, as indicated by my recent
> post. I have a need to cancel JAI processing, and so will soon alter
> the CannyEdgeFinder class to instead use queueTiles() to perform the
> processing, since it seems to me that this is the only way to cancel JAI
> jobs. I coded the Canny algorithm as a vanilla Java class rather than
> a composite OpImage, because I did the former originally (as described
> in the list), but ultimately found that this was going to compromise my
> ability to track progress for feedback in my UI, cancel intermediate
> steps, and generally have access to the outputs of intermediate steps.
> (I'm still not quite sure how you guys recommend "compound" operations
> to be produced).

We have little experience with them ourselves, in fact.

> Note also that the tracking Op is an Untiled one. I don't like this,
> because of the implication about forcing computation of full source
> images, but I'm not confident enough to implement a more sophisticated
> approach, given my experience with JAI. Right now, it suits my
> particular product development status, which is only dealing with small
> preview images. I suspect I'll need to revisit this when I start to
> process the full-size original images, since I bet it's slow when the
> tile cache starts to thrash.

If thrashing occurs, that is almost certain to be true.

I think it was mentioned recently, but you might want to look into the TCTool
listed here

http://java.sun.com/products/java-media/jai/utilities/jaiutils.html

which can aid in TileCache tuning.

> Another fudge is that the non-maximal-suppression Op puts an 'edge
> direction' array in a property, and this is used by the subsequent
> tracking op. This may have originally been OK if these subordinate Ops
> were embedded within a higher-level Canny Op (i.e. unseen by clients),
> but now that they're visible as stand-alone ops, I think this is a
> little flakey. I wanted to keep the non-maximal-suppression and
> tracking ops separate, as they take different parameters that my user is
> able to tweak independently - thus I don't want to recompute an NMS
> image if the user is only changing params for the tracking.
>
> Anyway, hope this helps, and I'm willing to rework this code if I get
> pointers from the community.

Thanks!

Brian


Jason Grant

Given a ceiling on available memory, what sort of tricks can be used to
optimise the processing speed of a linear OpImage chain?

From what I've learnt so far, it seems to me that the big one to be
careful about is avoiding recomputation of tiles. I'm wondering if a
smart choice of tile size will help the default caching algorithm to
keep relevant tiles in memory. For example, in my case, I have a linear
chain of operations that entails {fileload(jpeg), colorconvert,
convolve, custom AreaOpImage, custom AreaOpImage, filestore(jpeg)}. I'm
hoping that when filestore does its work, it requests, say, tile(0,0),
and the computation of this tile propagates back through the chain
towards the fileload, with neighbouring tiles being used when border
expansion occurs. Hopefully, if I choose a tile size that allows all of
these five OpImage tile(0,0)+border 'layers' to remain in memory, then
I'll have a processing chain that runs entirely off memory. I'm kind of
speculating on how the 'macro' tile request algorithm works here:

Would a chain like this really start with tile(0,0) and attempt to back-
compute it through the chain like I've explained, before moving to, say,
tile(0,1)?

Are there any guesstimates that can be used to calculate this tile size
(e.g. tile_memory=(available_memory)/number_of_nodes)?

How does inclusion of 'untiled' OpImages complicate this? e.g. I guess
they'll mean that all previous tiles might get thrown out of cache.

Are the load/store operations able to work off single tiles like this
(I'm using jpegs for now), or would a load attempt to cache the whole
lot?

Also, if I don't require the chain to be recomputed at a later stage
(e.g. by changing a parameter), could another approach towards keeping
things in memory be to compute a single *entire* operation in the chain,
and then dispose() of the previous step before proceeding with the next,
and so on?

Thanks,

Jason.

Brian Burkhalter

On Fri, 3 Jun 2005, Jason Grant wrote:

> Given a ceiling on available memory, what sort of tricks can be used to
> optimise the processing speed of a linear OpImage chain?
>
> From what I've learnt so far, it seems to me that the big one to be
> careful about is avoiding recomputation of tiles. I'm wondering if a
> smart choice of tile size will help the default caching algorithm to
> keep relevant tiles in memory. For example, in my case, I have a linear
> chain of operations that entails {fileload(jpeg), colorconvert,
> convolve, custom AreaOpImage, custom AreaOpImage, filestore(jpeg)}. I'm
> hoping that when filestore does its work, it requests, say, tile(0,0),
> and the computation of this tile propagates back through the chain
> towards the fileload, with neighbouring tiles being used when border
> expansion occurs. Hopefully, if I choose a tile size that allows all of
> these five OpImage tile(0,0)+border 'layers' to remain in memory, then
> I'll have a processing chain that runs entirely off memory. I'm kind of
> speculating on how the 'macro' tile request algorithm works here:

There is another thing to consider here when, as in this case, computing a
given destination rectangle requires a larger source rectangle. If an
operation uses cobbling, as almost all in JAI do, the larger source rectangle
will be requested using PlanarImage.getData(). This will require allocating a
new Raster to hold the source data and then copying the data into it. This
Raster for the cobbled data will of course consume memory as well.
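For a 3x3 area operation, for example, computing one destination tile ends
up doing something along these lines internally (a simplified sketch, not
the actual OpImage source):

    import java.awt.Rectangle;
    import java.awt.image.Raster;
    import javax.media.jai.PlanarImage;

    /** Simplified illustration of cobbling for a 3x3 kernel (1-pixel pad). */
    public class CobblingSketch {
        public static Raster cobbledSource(PlanarImage source, Rectangle destRect) {
            // The destination rectangle grown by the operator's padding ...
            Rectangle srcRect = new Rectangle(destRect.x - 1, destRect.y - 1,
                                              destRect.width + 2,
                                              destRect.height + 2);
            srcRect = srcRect.intersection(source.getBounds());
            // ... is requested in one piece: getData() allocates a new Raster
            // and copies the pixels of every overlapping tile into it.
            return source.getData(srcRect);
        }
    }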

> Would a chain like this really start with tile(0,0) and attempt to back-
> compute it through the chain like I've explained, before moving to, say,
> tile(0,1)?

It all depends on what is requested. If you use a blocking data request call
like getTile(), then the complete chain will be executed in a blocking manner
and all computation back-propagated until the requested rectangle is computed.
If you use a non-blocking call like PlanarImage.queueTiles(), then you might
have overlapping computations.
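The blocking, tile-by-tile pull is simply this (sketch):

    import java.awt.image.Raster;
    import javax.media.jai.PlanarImage;

    /** Blocking pull: each getTile() call propagates back up the chain. */
    public class BlockingPull {
        public static void pullAllTiles(PlanarImage result) {
            for (int ty = result.getMinTileY(); ty <= result.getMaxTileY(); ty++) {
                for (int tx = result.getMinTileX(); tx <= result.getMaxTileX(); tx++) {
                    Raster tile = result.getTile(tx, ty);   // blocks until computed
                    // ... hand the tile to an encoder, display, etc. ...
                }
            }
        }
    }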

> Are there any guesstimates that can be used to calculate this tile size
> (e.g. tile_memory=(available_memory)/number_of_nodes)?

Nothing is published, but I think that could be derived if you control the
tile dimensions throughout and know the details of the padding required, for
example, by all the area and geometric operations in the chain.
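A crude illustration of the kind of derivation I mean (the formula is only a
placeholder heuristic, not anything documented):

    /** Very rough tile-size guesstimate: split the memory budget across nodes. */
    public class TileSizeGuess {
        public static int squareTileSide(long availableBytes, int numNodes,
                                         int bands, int bytesPerSample,
                                         int padding) {
            // Budget per node, assuming roughly one in-flight tile per node.
            long perNode = availableBytes / numNodes;
            // Pixels that fit in that budget for the given pixel size.
            long pixels = perNode / (bands * bytesPerSample);
            // Side of a square tile, less the padding consumed by area ops.
            int side = (int) Math.sqrt((double) pixels) - 2 * padding;
            return Math.max(side, 1);
        }

        public static void main(String[] args) {
            // e.g. 64 MB budget, 6 nodes, 3-band byte image, 1-pixel padding
            System.out.println(squareTileSide(64L * 1024 * 1024, 6, 3, 1, 1));
        }
    }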

> How does inclusion of 'untiled' OpImages complicate this? e.g. I guess
> they'll mean that all previous tiles might get thrown out of cache.

Computing an untiled OpImage will force computation of all tiles in all
sources of the OpImage.

> Are the load/store operations able to work off single tiles like this
> (I'm using jpegs for now), or would a load attempt to cache the whole
> lot?

For JPEG, the entire image will be read or written. If you are using JAI or
JAI Image I/O Tools, the TIFF reader can read individual tiles from its
source. If you are using JAI Image I/O Tools, the JPEG 2000 reader also
supports tiling. The PNM reader can also read random regions from source
images, as I think can the Raw reader, thereby minimizing memory use.

> Also, if I don't require the chain to be recomputed at a later stage
> (e.g. by changing a parameter), could another approach towards keeping
> things in memory be to compute a single *entire* operation in the chain,
> and then dispose() of the previous step before proceeding with the next,
> and so on?

Sure, you can do that. You can supply a separate TileCache to each operation.
A "cheap" operation might benefit, for example, from a zero-size cache so that
nothing is saved, whereas an expensive operation might have its own cache
which can hold the entire image. Interposing a TiledImage is also a means of
caching an entire image.
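For example (a sketch; the operations and cache sizes are arbitrary):

    import java.awt.RenderingHints;
    import java.awt.image.renderable.ParameterBlock;
    import javax.media.jai.JAI;
    import javax.media.jai.KernelJAI;
    import javax.media.jai.RenderedOp;

    /** Sketch: give each node its own TileCache through rendering hints. */
    public class PerOpCaches {
        public static RenderedOp buildChain(RenderedOp source) {
            // Cheap operation: zero-size cache, so none of its tiles are kept.
            RenderingHints cheapHints = new RenderingHints(
                    JAI.KEY_TILE_CACHE, JAI.createTileCache(0L));
            ParameterBlock invPb = new ParameterBlock();
            invPb.addSource(source);
            RenderedOp inverted = JAI.create("invert", invPb, cheapHints);

            // Expensive operation: private cache sized to hold the whole image.
            RenderingHints roomyHints = new RenderingHints(
                    JAI.KEY_TILE_CACHE, JAI.createTileCache(64L * 1024L * 1024L));
            float[] blur = new float[9];
            java.util.Arrays.fill(blur, 1.0F / 9.0F);
            ParameterBlock convPb = new ParameterBlock();
            convPb.addSource(inverted);
            convPb.add(new KernelJAI(3, 3, blur));
            return JAI.create("convolve", convPb, roomyHints);
        }
    }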

I would recommend also looking into data array recycling:

http://java.sun.com/products/java-media/jai/README-1_1_2.html#Core

This might help minimize, for example, the impact of the Rasters allocated
expressly for tile cobbling.
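Something along these lines, assuming JAI 1.1.2 or later (sketch):

    import javax.media.jai.JAI;
    import javax.media.jai.RecyclingTileFactory;

    /** Sketch: enable data array recycling globally (JAI 1.1.2+). */
    public class EnableRecycling {
        public static void enable() {
            RecyclingTileFactory recycler = new RecyclingTileFactory();
            // The same object serves both as the factory that creates tiles
            // and as the recycler that accepts their data arrays back.
            JAI.getDefaultInstance().setRenderingHint(JAI.KEY_TILE_FACTORY,
                                                      recycler);
            JAI.getDefaultInstance().setRenderingHint(JAI.KEY_TILE_RECYCLER,
                                                      recycler);
        }
    }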

Brian

----------------
Brian Burkhalter
Advanced Development, Graphics and Media
Software Chief Technology Office
Sun Microsystems, Inc.

