What's a trick way to clamp a byte to 255 - 0?

Anonymous

I'm doing the usual byte-shift stuff to make a pixel, as part of my bi-cubic interpolation. Right now I clamp each channel value to the range 0 to 255 to prevent technicolor speckling. You know, those annoying little dots of out-of-place color.

This is what I'm doing now.

r = (int)(B0*(yReds[0]&0xff) + B1*(yReds[1]&0xff) + B2*(yReds[2]&0xff) + B3*(yReds[3]&0xff));
g = (int)(B0*(yGrns[0]&0xff) + B1*(yGrns[1]&0xff) + B2*(yGrns[2]&0xff) + B3*(yGrns[3]&0xff));
b = (int)(B0*(yBlus[0]&0xff) + B1*(yBlus[1]&0xff) + B2*(yBlus[2]&0xff) + B3*(yBlus[3]&0xff));

if(r < 0) r = 0;
if(r > 255) r = 255;
if(g < 0) g = 0;
if(g > 255) g = 255;
if(b < 0) b = 0;
if(b > 255) b = 255;

Clumsy and awkward. Is there a better way to clamp the values?
Maybe with a mask or something?

===========================================================================
To unsubscribe, send email to listserv@java.sun.com and include in the body
of the message "signoff JAVA2D-INTEREST". For general help, send email to
listserv@java.sun.com and include in the body of the message "help".

Ken Warner

It was a variable locality problem. I was using class variables for all the intermediate channel values. Switching to local variables -- method variables -- I see the slight speed up I was expecting.

Example:

r3 = (int)((((p0&0x00FF0000) >> 16)*A0) + (((p1&0x00FF0000) >> 16)*A1) + (((p2&0x00FF0000) >> 16)*A2) + (((p3&0x00FF0000) >> 16)*A3));
g3 = (int)((((p0&0x0000FF00) >> 8)*A0) + (((p1&0x0000FF00) >> 8)*A1) + (((p2&0x0000FF00) >> 8)*A2) + (((p3&0x0000FF00) >> 8)*A3));
b3 = (int)((((p0&0x000000FF))*A0) + (((p1&0x000000FF))*A1) + (((p2&0x000000FF))*A2) + (((p3&0x000000FF))*A3));

r3 = ((((r3 << 23) >> 31) | r3) & ~(r3>>31))&0xFF;
g3 = ((((g3 << 23) >> 31) | g3) & ~(g3>>31))&0xFF;
b3 = ((((b3 << 23) >> 31) | b3) & ~(b3>>31))&0xFF;

If r3, g3, b3 are class variables, I see the slowdown. Changing to local variables -- stack variables -- I see a slight improvement.

So the triple shift made the interpolation slower (approx. 1350ms to 1440ms) using class variables.

Using local variables I see times like below. So a few milliseconds are saved. It's easy to argue that the time spent engineering is more than the time gained at runtime. I wonder if a place like Pixar would agree.

Time = 1302
Waiting...
Time = 1302
Waiting...
Time = 1322
Waiting...
Time = 1292
Waiting...
Time = 1312
Waiting...
Time = 1302
Waiting...
Time = 1352
Waiting...
Time = 1312
Waiting...
Time = 1292
Waiting...
Time = 1302
Waiting...
Time = 1302
Waiting...
Time = 1302
Waiting...
Time = 1422
Waiting...
Time = 1412
Waiting...
Time = 1342
Waiting...
Time = 1292
Waiting...
Time = 1302
Waiting...
Time = 1292
Waiting...

>>In my test suite, the time to interpolate a 944 x 644 image went from 1350ms using the logical test clamp
>>to 1440ms using the triple shift. I'm not sure why... I work on a real slow (800mhz) machine just
>>so I can see these kinds of things better.
>
>
> Could you post that test? Maybe someone can spot what makes the difference.
> [Message sent by forum member 'rah003' (rah003)]

Jim Graham

All of this discussion happened while I was away on vacation. The code
that came up was pretty much what I tend to use for clamping without
branches, but there are a couple of things to note for this particular
application:

It's only valid for values that overflow in the range MININT=>511 and it
only clamps to the output range 0=>255.

The reason for the first constraint is that c<<23 only sets the high
order bit if the overflow stays under 512. If you get up to 512 then
the 9th bit is clear again and the "clamp to 255" part fails. (Though,
it will also correctly clamp values between 768 and 1023 and every other
256-sized range of overflow values up to MAXINT - i.e. the ones that
manage to have the 9th bit set...)
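
The input-range limit is easy to see by running the formula on a few values either side of 512. The following is only a sketch: the class and method names are illustrative and not from this thread, and the comparison-based clamp is there purely as a reference.

```java
// Sketch: probes where the shift-based clamp discussed in this thread holds.
// ShiftClampRange and its method names are illustrative assumptions.
class ShiftClampRange {

    // The formula under discussion: saturate to 255 when bit 8 (value 256)
    // is set, force negatives to 0, then keep the low byte.
    static int shiftClamp(int c) {
        return ((((c << 23) >> 31) | c) & ~(c >> 31)) & 0xFF;
    }

    // Reference clamp using ordinary comparisons.
    static int branchClamp(int c) {
        return c < 0 ? 0 : (c > 255 ? 255 : c);
    }

    public static void main(String[] args) {
        int[] samples = { Integer.MIN_VALUE, -1, 0, 255, 256, 511, 512, 600, 768, 1023 };
        for (int c : samples) {
            System.out.println(c + ": shift = " + shiftClamp(c)
                    + ", branch = " + branchClamp(c));
        }
    }
}
```

With this formula 512 comes out as 0 and 600 as 88 instead of 255, while 768 and 1023 are clamped to 255 again because bit 8 is set in both.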

The reason for the second constraint is because of the assumptions in
how the math was constructed.

The first constraint may not be a problem if the Cubic Interpolation
values can never cause the value to sum up to 2x the maximum value for a
component which is probably true, but depends on the cubic formula used
to generate the interpolation coefficients (there are a few such
formulas in common use) and so needs to be proven for a given formula.
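
One way to get a feel for whether a given formula stays under 512 is to scan its weights numerically. The sketch below assumes the Catmull-Rom cubic, which is only one of the formulas in common use and not necessarily the one used in this thread; the class and method names are illustrative. A scan like this is a sanity check, not a proof.

```java
// Sketch: worst-case weighted sums for 8-bit samples under an ASSUMED
// Catmull-Rom kernel. Names are illustrative, not from the thread.
class CubicWeightBound {

    // Catmull-Rom weights for the four neighbours at fractional position t in [0, 1].
    static double[] weights(double t) {
        double t2 = t * t, t3 = t2 * t;
        return new double[] {
            (-t3 + 2 * t2 - t) / 2,
            (3 * t3 - 5 * t2 + 2) / 2,
            (-3 * t3 + 4 * t2 + t) / 2,
            (t3 - t2) / 2
        };
    }

    // Highest possible sum at t: samples under positive weights are 255, the rest 0.
    static double worstHigh(double t) {
        double sum = 0;
        for (double w : weights(t)) if (w > 0) sum += w * 255;
        return sum;
    }

    // Lowest possible sum at t: samples under negative weights are 255, the rest 0.
    static double worstLow(double t) {
        double sum = 0;
        for (double w : weights(t)) if (w < 0) sum += w * 255;
        return sum;
    }

    // Scan t = 0, 1/steps, ..., 1 and return the largest worst-case high value.
    static double maxWorstHigh(int steps) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i <= steps; i++) max = Math.max(max, worstHigh(i / (double) steps));
        return max;
    }

    // Same scan for the most negative worst-case low value.
    static double minWorstLow(int steps) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i <= steps; i++) min = Math.min(min, worstLow(i / (double) steps));
        return min;
    }

    public static void main(String[] args) {
        System.out.println("max = " + maxWorstHigh(1000) + ", min = " + minWorstLow(1000));
    }
}
```

For this particular kernel the scan peaks at 286.875 (at t = 0.5) with a low of about -31.9, so a single pass over 8-bit samples would stay inside the range the shift formula handles. Other kernels would need their own check.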

The second constraint doesn't seem like a problem if you are always
dealing with 8-bit values which is very typical for today's computer
graphics, except for one issue which has been ignored.

The issue is that image interpolation should really be done in a
premultiplied form - as should alpha compositing. The reason for this
is that if you blend the 4 components (Alpha, R, G, and B) separately in
a non-premultiplied form then a transparent pixel with a non-zero value
in, say, its red component will contribute some red tinting to the final
answer even though it was transparent and should not have contributed
any energy in the first place.

Java2D, for example, does linear and cubic interpolation internally in
the premultiplied form in the Graphics2D implementation for this reason.
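
To make the tinting concrete, here is a rough sketch that averages two pixels both ways. It is a simplification under stated assumptions: one color channel, a plain two-pixel weighted average instead of a cubic kernel, and names (PremultipliedBlend, blendStraight, blendPremultiplied) that are illustrative rather than anything from Java2D.

```java
// Sketch: straight vs. premultiplied blending of one color channel.
// All names here are illustrative assumptions.
class PremultipliedBlend {

    // Weighted average with weight w on the first value.
    static double mix(double x, double y, double w) {
        return w * x + (1 - w) * y;
    }

    // Blend alpha and red independently (non-premultiplied).
    // Returns {alpha, red}.
    static double[] blendStraight(double a0, double r0, double a1, double r1, double w) {
        return new double[] { mix(a0, a1, w), mix(r0, r1, w) };
    }

    // Scale red by alpha first, blend, then convert back to straight red
    // so the two results can be compared. Returns {alpha, red}.
    static double[] blendPremultiplied(double a0, double r0, double a1, double r1, double w) {
        double a = mix(a0, a1, w);
        double rPre = mix(r0 * a0 / 255.0, r1 * a1 / 255.0, w);
        double r = (a == 0) ? 0 : rPre * 255.0 / a;
        return new double[] { a, r };
    }

    public static void main(String[] args) {
        // Fully transparent pure red averaged with opaque black.
        double[] s = blendStraight(0, 255, 255, 0, 0.5);
        double[] p = blendPremultiplied(0, 255, 255, 0, 0.5);
        System.out.println("straight red = " + s[1] + ", premultiplied red = " + p[1]);
    }
}
```

Averaging a fully transparent pure-red pixel with an opaque black one gives a red value of 127.5 in the straight version and 0 in the premultiplied one, which is the tinting described above.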

Why is this an issue for the clamping equations? The reason is that if
your accumulation values are premultiplied then the alpha component
needs to be clamped to 0=>255, but the color components need to be
clamped to the range 0=>alpha. The different range on the clamping of
the color components means that the proposed equations won't work if you
perform your calculations in the premultiplied form.

What I've used instead is a sequence of operations that can clamp any
input range to an arbitrary 0=>N output range as follows:

c &= ~(c >> 31);
c -= N;
c &= (c >> 31);
c += N;

After the first line c is in the range 0=>MAXINT with all negative
values mapped to 0. After the second line c is set up so that all valid
values are in the range -N=>0 and all positive values are overflow
values. After the third line all values are in the range -N=>0 with the
positive values mapped to 0. After the fourth line all values are
finally in the range 0=>N...

(Also, I believe it uses the same number of ALU instructions as the
((((c << 23) >> 31) | c) & ~(c>>31))&0xFF formula...)
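
Wrapped up as a method, the four steps might look like the sketch below. The class and method names are illustrative, and it assumes N is non-negative.

```java
// Sketch: the four-step branchless clamp to 0..n described above.
// RangeClamp and clampTo are illustrative names; assumes n >= 0.
class RangeClamp {

    static int clampTo(int c, int n) {
        c &= ~(c >> 31); // negatives become 0; c is now in 0..MAXINT
        c -= n;          // valid values are now in -n..0, overflow is positive
        c &= (c >> 31);  // positive (overflow) values become 0
        c += n;          // shift back: result is in 0..n
        return c;
    }

    public static void main(String[] args) {
        int[] samples = { -100, 0, 100, 255, 300, 512, 100000 };
        for (int c : samples) {
            System.out.println(c + " -> " + clampTo(c, 255));
        }
    }
}
```

For premultiplied data this would be used as alpha = clampTo(alpha, 255) followed by red = clampTo(red, alpha), and likewise for green and blue.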

...jim


Ken Warner

Jim,

EXCELLENT! I need to do a couple of things then.

1) Check whether multiplying by the basis functions can produce values
large enough to cause the overflow problem you describe. I think it might.

2) Generate a test case for the code you provided and post results.

The premultiplication problem will not be an issue for my little interpolator since I do not composite. But it would be for someone trying to use my interpolator for another purpose than what it was designed for.

The reason for the existence of my interpolator is that the primary function of my cylindrical panorama viewer is to project (the best I can) a cylindrical image into a rectified viewport. Interpolation before or after the projection simply doesn't work so the interpolator is integrated into the projection code.

It would have been real nice to be able to hand off a chunk of the cylindrical image to a Java2D bi-cubic interpolator and get a nicely interpolated rectified viewport but that is impossible. The corners of the source image chunk are stretched in a non-linear fashion to fit the canvas. It's not a simple resize.

But this is really interesting information. And maybe someday someone will ask me about it and I can sound smart... :-)

Ken


Jim Graham

> The premultiplication problem will not be an issue for my little
> interpolator since I do not composite. But it would be for someone
> trying to use my interpolator for another purpose than what it was
> designed for.

You said that you don't have the premultiplication problem because
you do not composite, but I wanted to be clear about the terminology
here in case someone else comes across this thread.

The premultiplied form should be used if you have alpha in the images,
but your images don't have or need alpha since you are simply doing
panoramic stretching of opaque photographs, right? So, it's really the
lack of an alpha channel which means that the premultiplied issue isn't
applicable here, not the lack of "compositing"...

...jim


rah003

> In my test suite, the time to interpolate a 944 x 644 image went from 1350ms using the logical test clamp
> to 1440ms using the triple shift. I'm not sure why... I work on a real slow (800mhz) machine just
> so I can see these kinds of things better.

Could you post that test? Maybe someone can spot what makes the difference.

Ken Warner

Not really, it's the whole applet. I just stick in some timing code to watch sections of it. Just imagine those logical tests replaced by the triple shift. I'll post a snippet tomorrow.

See:

http://pancyl.com/


Jim

It could be due to branch prediction by the cpu.

Ken Warner wrote:
> Yeah, when I integrated it into my bi-cubic interpolator, it slowed it
> down compared to the logical check clamp. I'm not sure why yet. Maybe
> it was the way I integrated it. I don't know.
>
> In my test suite, the time to interpolate a 944 x 644 image went from
> 1350ms using the logical test clamp to 1440ms using the triple shift.
> I'm not sure why... I work on a real slow (800mhz) machine just so I
> can see these kinds of things better.


Peter B. West

Branch prediction would work just as well in the original tests Ken ran,
I think. There must be something else going on.

--
Peter B. West
Folio


Ken Warner

Hey Peter, it works! Nice job! I'll work it into my code and see
if I get any speed up. I only time to the milliseconds but a millisecond
here -- a millisecond there. Pretty soon it adds up to a decisecond...

int foo = 0;
int bar = 0;
for (int i = -10; i < 265; i++)
{
    //foo = (i >>> 31) | (i & 0xFF);
    //foo = (i & 0xff);
    foo = i;
    bar = (((((foo << 23) >> 31) | foo) & 0xFF) & ~(foo>>31))&0xFF;
    System.err.println(i + ": foo = " + foo + ", bar = " + bar );
}

Abbreviated output:
-5: foo = -5, bar = 0
-4: foo = -4, bar = 0
-3: foo = -3, bar = 0
-2: foo = -2, bar = 0
-1: foo = -1, bar = 0
0: foo = 0, bar = 0
1: foo = 1, bar = 1
2: foo = 2, bar = 2
3: foo = 3, bar = 3
.
.
.
252: foo = 252, bar = 252
253: foo = 253, bar = 253
254: foo = 254, bar = 254
255: foo = 255, bar = 255
256: foo = 256, bar = 255
257: foo = 257, bar = 255
258: foo = 258, bar = 255

Ken Warner wrote:
> Well, I did a test. The double shift does clamp the top end. But negative
> numbers are twisted to be 255. In bi-cubic interpolation (the reason
> for all
> this nonsense) negative byte (channel) values can occur. The goal is to
> clamp to the range [0-255] so -100 goes to 0 and 300 goes to 255.
>
> Here's the snippet:
>
> // sum = (s1[s1PixelOffset]&0xFF) + (s2[s2PixelOffset]&0xFF);
> // d[dPixelOffset] = (byte)((((sum<<23) >> 31) | sum) & 0xFF);
> int foo = 0;
> int bar = 0;
> for(int i = -10; i < 265; i++)
> {
> foo = i;
> bar = ((((foo << 23) >> 31) | foo) & 0xFF);
> System.err.println(i + ": foo = " + foo + ", bar = " + bar );
> }
>
> And here is the abbreviated output:
> -4: foo = -4, bar = 255
> -3: foo = -3, bar = 255
> -2: foo = -2, bar = 255
> -1: foo = -1, bar = 255
> 0: foo = 0, bar = 0
> 1: foo = 1, bar = 1
> 2: foo = 2, bar = 2
> 3: foo = 3, bar = 3
> 4: foo = 4, bar = 4
> .
> .
> .
> 252: foo = 252, bar = 252
> 253: foo = 253, bar = 253
> 254: foo = 254, bar = 254
> 255: foo = 255, bar = 255
> 256: foo = 256, bar = 255
> 257: foo = 257, bar = 255
> 258: foo = 258, bar = 255
> 259: foo = 259, bar = 255
>
>
> Finally, com.sun.media.jai.util.ImageUtil.java does this to clamp a byte
> value
> which is equivalent to what I was doing. I need to clamp both ends
> reliably.
> I wonder if it can be done with shifts and masks? Maybe I've done the best
> that can be done.
>
> public static final byte clampByte(int in) {
> return (in > 0xFF ? (byte)0xFF : (in >= 0 ? (byte)in : (byte)0));
> }
>
> Ken Warner wrote:
>
>> Never mind -- found it...
>>
>> http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#1...
>>
>>
>> At run time, shift operations are performed on the two's complement
>> integer representation of the value of the left operand.
>>
>> The value of n<<s is n left-shifted s bit positions; this is
>> equivalent (even if overflow occurs) to multiplication by two to the
>> power s.
>>
>> The value of n>>s is n right-shifted s bit positions with
>> sign-extension. The resulting value is ⌊n/2^s⌋. For nonnegative values
>> of n, this is equivalent to truncating integer division, as computed
>> by the integer division operator /, by two to the power s.
>>
>> Now if I could only understand this...
>>
>>
>>>> if(r < 0) r = 0;
>>>> if(r > 255) r = 255;
>>>> if(g < 0) g = 0;
>>>> if(g > 255) g = 255;
>>>> if(b < 0) b = 0;
>>>> if(b > 255) b = 255;
>>>>
>>>> Clumsy and awkward. Is there a better way to clamp
>>>> the values?
>>>> Maybe with a mask or something?
>>>
>>>
>>>
>>>
>>> If your "better" meant "faster", you might like to take a look at this:
>>> https://jai-core.dev.java.net/source/browse/jai-core/src/share/classes/c...
>>>
>>> under computeRectByte()
>>>
>>> //
>>> // The next two lines are a fast way to do
>>> // an add with saturation on U8 elements.
>>> // It eliminates the need to do clamping.
>>> //
>>> sum = (s1[s1PixelOffset]&0xFF) +
>>> (s2[s2PixelOffset]&0xFF);
>>> d[dPixelOffset] = (byte)((((sum<<23) >> 31) |
>>> sum) & 0xFF);
>>>
>>> HTH,
>>> -James
>>> [Message sent by forum member 'jxc' (jxc)]

Ken Warner

I did a test -- I did it on a real slow (800mhz) machine so the
differences are magnified but the shift clamp method works a little
faster than the logical test. Enough to include it in my code.
Results at the bottom. And surely enough to be used as the clamping
strategy in:

com.sun.media.jai.util.ImageUtil.java

int foo = 0;
int bar = 0;
long t0 = 0L;
long t1 = 0L;
for (int j = 0; j < 100; j++)
{
    // (1) shift-and-mask clamp
    t0 = System.currentTimeMillis();
    for (int i = -10000000; i < 10000000; i++)
    {
        //foo = (i >>> 31) | (i & 0xFF);
        //foo = (i & 0xff);
        foo = i;
        bar = (((((foo << 23) >> 31) | foo) & 0xFF) & ~(foo>>31))&0xFF;

        //System.err.println(i + ": foo = " + foo + ", bar = " + bar );
    }
    t1 = System.currentTimeMillis();
    System.err.println("(1)Time = " +(t1 - t0));

    // (2) logical-test clamp
    t0 = System.currentTimeMillis();
    for (int i = -10000000; i < 10000000; i++)
    {
        foo = i;
        if(foo > 255)foo = 255;
        else if(foo < 0)foo = 0;

        //System.err.println(i + ": foo = " + foo + ", bar = " + bar );
    }
    t1 = System.currentTimeMillis();
    System.err.println("(2)Time = " +(t1 - t0));
    System.err.println("----------------");
}
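
Two caveats may be worth keeping in mind with a loop like this: the clamped results are never used, so a JIT compiler may be free to drop some of the work, and most of the inputs lie above 511, outside the range where the shift formula clamps correctly. The variant below is only a sketch of one way around both; the class name, the nanoTime timing, and the -512..511 input cycle are assumptions and not part of the original test.

```java
// Sketch: timing both clamps while keeping their results live.
// ClampTiming and its structure are illustrative assumptions.
class ClampTiming {

    static int shiftClamp(int c) {
        return ((((c << 23) >> 31) | c) & ~(c >> 31)) & 0xFF;
    }

    static int branchClamp(int c) {
        if (c > 255) return 255;
        if (c < 0) return 0;
        return c;
    }

    // Inputs cycle through -512..511, where the shift formula is valid.
    // Summing the results keeps them observable.
    static long sumShift(int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) sum += shiftClamp((i & 1023) - 512);
        return sum;
    }

    static long sumBranch(int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) sum += branchClamp((i & 1023) - 512);
        return sum;
    }

    public static void main(String[] args) {
        int count = 20000000;
        // Repeated rounds let the early ones act as warm-up.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumShift(count);
            long t1 = System.nanoTime();
            long b = sumBranch(count);
            long t2 = System.nanoTime();
            System.err.println("(1)Time = " + (t1 - t0) / 1000000 + " ms, sum = " + a);
            System.err.println("(2)Time = " + (t2 - t1) / 1000000 + " ms, sum = " + b);
        }
    }
}
```

If the two sums ever differ, the inputs have strayed outside the range the shift formula supports.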

(1)Time = 200
(2)Time = 221
----------------
(1)Time = 220
(2)Time = 210
----------------
(1)Time = 231
(2)Time = 250
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 170
(2)Time = 220
----------------
(1)Time = 221
(2)Time = 270
----------------
(1)Time = 190
(2)Time = 221
----------------
(1)Time = 190
(2)Time = 220
----------------
(1)Time = 160
(2)Time = 221
----------------
(1)Time = 170
(2)Time = 210
----------------
(1)Time = 191
(2)Time = 220
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 181
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 221
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 180
(2)Time = 211
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 181
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 210
----------------
(1)Time = 181
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 170
(2)Time = 220
----------------
(1)Time = 180
(2)Time = 211
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 170
(2)Time = 220
----------------
(1)Time = 181
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 221
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 251
----------------
(1)Time = 210
(2)Time = 220
----------------
(1)Time = 181
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 200
(2)Time = 370
----------------
(1)Time = 181
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 170
(2)Time = 220
----------------
(1)Time = 181
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 181
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 291
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 221
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 181
(2)Time = 230
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 181
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 221
----------------
(1)Time = 180
(2)Time = 230
----------------
(1)Time = 190
(2)Time = 211
----------------
(1)Time = 190
(2)Time = 250
----------------
(1)Time = 191
(2)Time = 220
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 160
(2)Time = 230
----------------
(1)Time = 220
(2)Time = 241
----------------
(1)Time = 180
(2)Time = 250
----------------
(1)Time = 191
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 160
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 221
----------------
(1)Time = 180
(2)Time = 240
----------------
(1)Time = 181
(2)Time = 220
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 191
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 300
(2)Time = 361
----------------
(1)Time = 230
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 221
----------------
(1)Time = 220
(2)Time = 250
----------------
(1)Time = 211
(2)Time = 320
----------------
(1)Time = 180
(2)Time = 211
----------------
(1)Time = 180
(2)Time = 230
----------------
(1)Time = 181
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 181
(2)Time = 230
----------------
(1)Time = 180
(2)Time = 211
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 221
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 181
(2)Time = 210
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 171
(2)Time = 220
----------------
(1)Time = 180
(2)Time = 211
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 170
(2)Time = 271
----------------
(1)Time = 310
(2)Time = 371
----------------
(1)Time = 170
(2)Time = 240
----------------
(1)Time = 181
(2)Time = 290
----------------
(1)Time = 180
(2)Time = 211
----------------
(1)Time = 180
(2)Time = 220
----------------
(1)Time = 180
(2)Time = 231
----------------
(1)Time = 180
(2)Time = 210
----------------
(1)Time = 181
(2)Time = 330
----------------
(1)Time = 331
(2)Time = 230
----------------
(1)Time = 180
(2)Time = 271
----------------
(1)Time = 260
(2)Time = 270
----------------


Peter B. West

Ken,

Looking at this again, there appears to be one too many byte masks.
bar = ((((foo << 23) >> 31) | foo) & ~(foo>>31))&0xFF;
should do.

Peter
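
Since bitwise AND is associative, masking with 0xFF once at the end covers the inner mask as well. A quick way to confirm the two forms agree is a sketch like the following; the class and method names are illustrative and not from the thread.

```java
// Sketch: checks that dropping the inner 0xFF mask does not change the result.
// MaskEquivalence and its method names are illustrative assumptions.
class MaskEquivalence {

    // Form used in the test code, with the inner mask.
    static int withExtraMask(int c) {
        return (((((c << 23) >> 31) | c) & 0xFF) & ~(c >> 31)) & 0xFF;
    }

    // Simplified form with a single final mask.
    static int withoutExtraMask(int c) {
        return ((((c << 23) >> 31) | c) & ~(c >> 31)) & 0xFF;
    }

    // True if both forms agree for every c with from <= c < to.
    static boolean agreeOn(int from, int to) {
        for (int c = from; c < to; c++) {
            if (withExtraMask(c) != withoutExtraMask(c)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("agree: " + agreeOn(-10000000, 10000000));
    }
}
```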

> (2)Time = 220
> ----------------
> (1)Time = 180
> (2)Time = 220
> ----------------
> (1)Time = 191
> (2)Time = 210
> ----------------
> (1)Time = 180
> (2)Time = 221
> ----------------
> (1)Time = 300
> (2)Time = 361
> ----------------
> (1)Time = 230
> (2)Time = 220
> ----------------
> (1)Time = 170
> (2)Time = 221
> ----------------
> (1)Time = 220
> (2)Time = 250
> ----------------
> (1)Time = 211
> (2)Time = 320
> ----------------
> (1)Time = 180
> (2)Time = 211
> ----------------
> (1)Time = 180
> (2)Time = 230
> ----------------
> (1)Time = 181
> (2)Time = 210
> ----------------
> (1)Time = 180
> (2)Time = 220
> ----------------
> (1)Time = 181
> (2)Time = 230
> ----------------
> (1)Time = 180
> (2)Time = 211
> ----------------
> (1)Time = 180
> (2)Time = 210
> ----------------
> (1)Time = 180
> (2)Time = 221
> ----------------
> (1)Time = 180
> (2)Time = 210
> ----------------
> (1)Time = 181
> (2)Time = 210
> ----------------
> (1)Time = 180
> (2)Time = 220
> ----------------
> (1)Time = 171
> (2)Time = 220
> ----------------
> (1)Time = 180
> (2)Time = 211
> ----------------
> (1)Time = 180
> (2)Time = 220
> ----------------
> (1)Time = 170
> (2)Time = 271
> ----------------
> (1)Time = 310
> (2)Time = 371
> ----------------
> (1)Time = 170
> (2)Time = 240
> ----------------
> (1)Time = 181
> (2)Time = 290
> ----------------
> (1)Time = 180
> (2)Time = 211
> ----------------
> (1)Time = 180
> (2)Time = 220
> ----------------
> (1)Time = 180
> (2)Time = 231
> ----------------
> (1)Time = 180
> (2)Time = 210
> ----------------
> (1)Time = 181
> (2)Time = 330
> ----------------
> (1)Time = 331
> (2)Time = 230
> ----------------
> (1)Time = 180
> (2)Time = 271
> ----------------
> (1)Time = 260
> (2)Time = 270
> ----------------

--
Peter B. West
Folio


Ken Warner

Never mind -- found it...

http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#1...

At run time, shift operations are performed on the two's complement integer representation of the value of the left operand.

The value of n<<s is n left-shifted s bit positions; this is equivalent (even if overflow occurs) to multiplication by two to the power s.

The value of n>>s is n right-shifted s bit positions with sign-extension. The resulting value is ⌊n/2^s⌋. For nonnegative values of n, this is equivalent to truncating integer division, as computed by the integer division operator /, by two to the power s.

Now if I could only understand this...
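
A small sketch of what those shift rules mean in practice, for anyone else puzzling over them. The class and method names (ShiftDemo, signFill, bit8Fill) are made up for illustration; only the operators themselves come from the language spec quoted above.

```java
final class ShiftDemo {
    // Arithmetic right shift copies the sign bit into every vacated position,
    // so shifting by 31 leaves 0 for n >= 0 and -1 (all ones) for n < 0.
    static int signFill(int n) {
        return n >> 31;
    }

    // The left shift by 23 moves bit 8 into the sign position (higher bits fall off),
    // then the right shift smears it: -1 if bit 8 of n was set, otherwise 0.
    static int bit8Fill(int n) {
        return (n << 23) >> 31;
    }

    public static void main(String[] args) {
        System.out.println(3 << 4);        // 48, i.e. 3 * 2^4
        System.out.println(-20 >> 2);      // -5, floor(-20 / 4)
        System.out.println(-21 >> 2);      // -6, floor(-21 / 4); note -21 / 4 is -5
        System.out.println(-1 >>> 31);     // 1, the unsigned shift fills with zeros
        System.out.println(signFill(-7));  // -1
        System.out.println(signFill(42));  // 0
        System.out.println(bit8Fill(300)); // -1, since 300 has bit 8 (256) set
        System.out.println(bit8Fill(200)); // 0
    }
}
```

The 0 / -1 results are what make the trick work: they are all-zeros or all-ones masks that can be OR'd or AND'ed with the value.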

>>if(r < 0) r = 0;
>>if(r > 255) r = 255;
>>if(g < 0) g = 0;
>>if(g > 255) g = 255;
>>if(b < 0) b = 0;
>>if(b > 255) b = 255;
>>
>>Clumsy and awkward. Is there a better way to clamp
>>the values?
>>Maybe with a mask or something?
>
>
> If your "better" meant "faster", you might like to take a look at this:
> https://jai-core.dev.java.net/source/browse/jai-core/src/share/classes/c...
> under computeRectByte()
>
> //
> // The next two lines are a fast way to do
> // an add with saturation on U8 elements.
> // It eliminates the need to do clamping.
> //
> sum = (s1[s1PixelOffset]&0xFF) + (s2[s2PixelOffset]&0xFF);
> d[dPixelOffset] = (byte)((((sum<<23) >> 31) | sum) & 0xFF);
>
> HTH,
> -James
> [Message sent by forum member 'jxc' (jxc)]

Ken Warner

Well, I did a test. The double shift does clamp the top end. But negative
numbers are twisted to be 255. In bi-cubic interpolation (the reason for all
this nonsense) negative byte (channel) values can occur. The goal is to
clamp to the range [0-255] so -100 goes to 0 and 300 goes to 255.

Here's the snippet:

// sum = (s1[s1PixelOffset]&0xFF) + (s2[s2PixelOffset]&0xFF);
// d[dPixelOffset] = (byte)((((sum<<23) >> 31) | sum) & 0xFF);

int foo = 0;
int bar = 0;
for(int i = -10; i < 265; i++)
{
foo = i;
bar = ((((foo << 23) >> 31) | foo) & 0xFF);

System.err.println(i + ": foo = " + foo + ", bar = " + bar );

}

And here is the abbreviated output:
-4: foo = -4, bar = 255
-3: foo = -3, bar = 255
-2: foo = -2, bar = 255
-1: foo = -1, bar = 255
0: foo = 0, bar = 0
1: foo = 1, bar = 1
2: foo = 2, bar = 2
3: foo = 3, bar = 3
4: foo = 4, bar = 4
.
.
.
252: foo = 252, bar = 252
253: foo = 253, bar = 253
254: foo = 254, bar = 254
255: foo = 255, bar = 255
256: foo = 256, bar = 255
257: foo = 257, bar = 255
258: foo = 258, bar = 255
259: foo = 259, bar = 255

Finally, com.sun.media.jai.util.ImageUtil.java does this to clamp a byte value
which is equivalent to what I was doing. I need to clamp both ends reliably.
I wonder if it can be done with shifts and masks? Maybe I've done the best
that can be done.

public static final byte clampByte(int in) {
return (in > 0xFF ? (byte)0xFF : (in >= 0 ? (byte)in : (byte)0));
}


Peter B. West

Ken Warner wrote:
> Well, I did a test. The double shift does clamp the top end. But negative
> numbers are twisted to be 255. In bi-cubic interpolation (the reason
> for all
> this nonsense) negative byte (channel) values can occur. The goal is to
> clamp to the range [0-255] so -100 goes to 0 and 300 goes to 255.
>
> Here's the snippet:
>
> // sum = (s1[s1PixelOffset]&0xFF) + (s2[s2PixelOffset]&0xFF);
> // d[dPixelOffset] = (byte)((((sum<<23) >> 31) | sum) & 0xFF);
>
> int foo = 0;
> int bar = 0;
> for(int i = -10; i < 265; i++)
> {
> foo = i;
> bar = ((((foo << 23) >> 31) | foo) & 0xFF);
>
> System.err.println(i + ": foo = " + foo + ", bar = " + bar );
>
> }
>

int >> 31 will fill the int with the most significant bit, because of
sign extension. That will give you either 0 or -1.

The <<23 drops everything except the least significant 9 bits off the
high end. That leaves bit 8 ( the ninth bit ) in the most significant
bit position, aka the sign bit. In the original code, an addition is
performed on two byte values, masked to 8 bits. This effectively
converts the byte values to unsigned. The highest bit that can be set by
such an addition is bit 8. When this bit is set, the shifts will give
-1, which OR'd with the original value will give -1, thence 255 when masked with 0xFF.

How are you generating the values? It looks as though the values, before
calculation, are unsigned. If so, you might try this:

int foo = 0;
int bar = 0;
for(int i = -10; i < 265; i++)
{
foo = i;
bar = (((((foo << 23) >> 31) | foo) & ~(foo>>31))&0xFF);
// ^ this is a "not"
// Thunderbird is sucking mightily

System.err.println(i + ": foo = " + foo + ", bar = " + bar );
}

Basically, take the previous result, but force it to 0 when the sign is
negative (foo>>31 is then -1, so ~(foo>>31) is 0).
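
Here is a sketch that breaks the expression into named steps and checks it against the plain branching clamp. The names (ShiftClamp, clampShift, clampBranch) are hypothetical. One thing the check makes visible: the trick only looks at bit 8 for overflow, so it is only a correct clamp for inputs up to 511. Whether bi-cubic overshoot ever gets that large depends on the weights, which is an assumption to verify for your own kernel.

```java
final class ShiftClamp {
    // Two-sided clamp to [0, 255] using shifts and masks; valid for v <= 511.
    static int clampShift(int v) {
        int overflow = (v << 23) >> 31; // -1 when bit 8 is set (256..511), else 0
        int negative = v >> 31;         // -1 when v < 0, else 0
        // OR with -1 forces all ones (255 after masking); AND with ~(-1) == 0 forces 0.
        return ((overflow | v) & ~negative) & 0xFF;
    }

    // Reference clamp using comparisons.
    static int clampBranch(int v) {
        return v < 0 ? 0 : (v > 255 ? 255 : v);
    }

    public static void main(String[] args) {
        for (int v = -1000; v <= 511; v++) {
            if (clampShift(v) != clampBranch(v)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println(clampShift(-100)); // 0
        System.out.println(clampShift(300));  // 255
        System.out.println(clampShift(512));  // 0, not 255: bit 8 is clear in 512
    }
}
```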

The question remains, is it worthwhile?

Peter


--
Peter B. West
Folio


Ken Warner

Last week, I asked the question:

What is a trick way to clamp a byte to the range [0-255]? It is an operation needed all the time in image processing. The common strategy is a logical check of the form:

if(foo > 255)foo = 255;
else if(foo < 0)foo = 0;

This is basically the clamping strategy in:

com.sun.media.jai.util.ImageUtil.java

public static final byte clampByte(int in) {
return (in > 0xFF ? (byte)0xFF : (in >= 0 ? (byte)in : (byte)0));
}

James Cheng sent me this snippet from jai-core (under computeRectByte()):

sum = (s1[s1PixelOffset]&0xFF) + (s2[s2PixelOffset]&0xFF);
d[dPixelOffset] = (byte)((((sum<<23) >> 31) | sum) & 0xFF);

This clamps the top end to 255, but negative values come out as 255 rather than 0.

Peter West then sent me a mod to the above code --

bar = (((((foo << 23) >> 31) | foo) & 0xFF) & ~(foo>>31))&0xFF;

This clamps the integer to the desired range and, in my timings, is faster than the logical test.

Peter then improved on it with a variant of the shift method above that leaves out the intermediate and trailing masks and casts to byte instead.

bar = (byte)((((foo << 23) >> 31) | foo) & ~(foo>>31));

In my timings this tests a bit faster.
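
One caveat with the (byte) variant, shown as a sketch with made-up names (ByteCastClamp, clampToByte): Java's byte is signed, so a clamped 255 is stored as -1. That is fine when writing straight into a byte[] raster, but the value needs & 0xFF again before it is used as an int or packed into a pixel.

```java
final class ByteCastClamp {
    // Same shift clamp, but the trailing & 0xFF is replaced by a narrowing cast.
    static byte clampToByte(int v) {
        return (byte) ((((v << 23) >> 31) | v) & ~(v >> 31));
    }

    public static void main(String[] args) {
        byte b = clampToByte(300);
        System.out.println(b);        // -1: the bit pattern 0xFF read as a signed byte
        System.out.println(b & 0xFF); // 255: masking recovers the unsigned value

        // Packing into the red slot of an int pixel needs the mask before the shift.
        int red = (clampToByte(200) & 0xFF) << 16;
        System.out.println(Integer.toHexString(red)); // c80000
    }
}
```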

I've attached the test program used to time the various strategies and post the results below.

In summary, the triple shift with no intermediate mask is 30% to 40% faster than the logical test strategy. This can mean a lot when you consider that a bi-cubic interpolation can have 15 clamping ops per pixel -- that's 15,000,000 clamping ops on a 1K x 1K image.

Below are the test results:
(0)- bar = (((((foo << 23) >> 31) | foo) & 0xFF) & ~(foo>>31))&0xFF;
(1)- bar = ((((foo << 23) >> 31) | foo) & ~(foo>>31))&0xFF;
(2)- bar = (byte)((((foo << 23) >> 31) | foo) & ~(foo>>31));
(3)- if(foo > 255)foo = 255;
else if(foo < 0)foo = 0;

shiftTest()...
0:----------------
(0)Time = 160
(1)Time = 120
(2)Time = 130
(3)Time = 221
1:----------------
(0)Time = 150
(1)Time = 190
(2)Time = 151
(3)Time = 270
2:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 130
(3)Time = 210
3:----------------
(0)Time = 160
(1)Time = 120
(2)Time = 131
(3)Time = 220
4:----------------
(0)Time = 200
(1)Time = 160
(2)Time = 151
(3)Time = 250
5:----------------
(0)Time = 190
(1)Time = 171
(2)Time = 160
(3)Time = 250
6:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 170
(3)Time = 250
7:----------------
(0)Time = 191
(1)Time = 160
(2)Time = 160
(3)Time = 250
8:----------------
(0)Time = 191
(1)Time = 160
(2)Time = 160
(3)Time = 260
9:----------------
(0)Time = 161
(1)Time = 160
(2)Time = 160
(3)Time = 251
10:----------------
(0)Time = 180
(1)Time = 170
(2)Time = 160
(3)Time = 251
11:----------------
(0)Time = 200
(1)Time = 160
(2)Time = 150
(3)Time = 251
12:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 160
(3)Time = 231
13:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 161
(3)Time = 250
14:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 161
(3)Time = 250
15:----------------
(0)Time = 190
(1)Time = 171
(2)Time = 160
(3)Time = 250
16:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 170
(3)Time = 250
17:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 160
(3)Time = 250
18:----------------
(0)Time = 181
(1)Time = 170
(2)Time = 150
(3)Time = 250
19:----------------
(0)Time = 171
(1)Time = 170
(2)Time = 150
(3)Time = 250
20:----------------
(0)Time = 171
(1)Time = 160
(2)Time = 160
(3)Time = 251
21:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 160
(3)Time = 251
22:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 170
(3)Time = 231
23:----------------
(0)Time = 180
(1)Time = 160
(2)Time = 150
(3)Time = 231
24:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 161
(3)Time = 260
25:----------------
(0)Time = 180
(1)Time = 160
(2)Time = 151
(3)Time = 250
26:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 151
(3)Time = 250
27:----------------
(0)Time = 190
(1)Time = 161
(2)Time = 160
(3)Time = 250
28:----------------
(0)Time = 190
(1)Time = 151
(2)Time = 160
(3)Time = 260
29:----------------
(0)Time = 180
(1)Time = 160
(2)Time = 160
(3)Time = 250
30:----------------
(0)Time = 191
(1)Time = 160
(2)Time = 170
(3)Time = 260
31:----------------
(0)Time = 161
(1)Time = 170
(2)Time = 160
(3)Time = 251
32:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 160
(3)Time = 261
33:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 150
(3)Time = 231
34:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 150
(3)Time = 231
35:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 151
(3)Time = 250
36:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 151
(3)Time = 250
37:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 131
(3)Time = 250
38:----------------
(0)Time = 180
(1)Time = 171
(2)Time = 150
(3)Time = 250
39:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 150
(3)Time = 250
40:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 150
(3)Time = 260
41:----------------
(0)Time = 191
(1)Time = 160
(2)Time = 160
(3)Time = 250
42:----------------
(0)Time = 181
(1)Time = 160
(2)Time = 160
(3)Time = 250
43:----------------
(0)Time = 181
(1)Time = 160
(2)Time = 170
(3)Time = 251
44:----------------
(0)Time = 200
(1)Time = 160
(2)Time = 160
(3)Time = 251
45:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 170
(3)Time = 231
46:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 160
(3)Time = 241
47:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 171
(3)Time = 250
48:----------------
(0)Time = 180
(1)Time = 170
(2)Time = 141
(3)Time = 260
49:----------------
(0)Time = 180
(1)Time = 161
(2)Time = 160
(3)Time = 250
50:----------------
(0)Time = 190
(1)Time = 151
(2)Time = 150
(3)Time = 250
51:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 160
(3)Time = 260
52:----------------
(0)Time = 191
(1)Time = 170
(2)Time = 160
(3)Time = 260
53:----------------
(0)Time = 171
(1)Time = 170
(2)Time = 160
(3)Time = 250
54:----------------
(0)Time = 171
(1)Time = 160
(2)Time = 170
(3)Time = 251
55:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 160
(3)Time = 251
56:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 170
(3)Time = 241
57:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 161
(3)Time = 260
58:----------------
(0)Time = 190
(1)Time = 190
(2)Time = 161
(3)Time = 250
59:----------------
(0)Time = 190
(1)Time = 161
(2)Time = 160
(3)Time = 310
60:----------------
(0)Time = 191
(1)Time = 170
(2)Time = 150
(3)Time = 260
61:----------------
(0)Time = 181
(1)Time = 170
(2)Time = 190
(3)Time = 260
62:----------------
(0)Time = 171
(1)Time = 160
(2)Time = 170
(3)Time = 251
63:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 170
(3)Time = 241
64:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 160
(3)Time = 231
65:----------------
(0)Time = 200
(1)Time = 160
(2)Time = 161
(3)Time = 250
66:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 141
(3)Time = 260
67:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 141
(3)Time = 250
68:----------------
(0)Time = 190
(1)Time = 161
(2)Time = 170
(3)Time = 250
69:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 180
(3)Time = 250
70:----------------
(0)Time = 191
(1)Time = 160
(2)Time = 160
(3)Time = 250
71:----------------
(0)Time = 161
(1)Time = 170
(2)Time = 150
(3)Time = 250
72:----------------
(0)Time = 181
(1)Time = 170
(2)Time = 160
(3)Time = 251
73:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 180
(3)Time = 251
74:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 170
(3)Time = 231
75:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 150
(3)Time = 231
76:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 161
(3)Time = 260
77:----------------
(0)Time = 180
(1)Time = 160
(2)Time = 131
(3)Time = 250
78:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 141
(3)Time = 260
79:----------------
(0)Time = 180
(1)Time = 171
(2)Time = 150
(3)Time = 250
80:----------------
(0)Time = 190
(1)Time = 151
(2)Time = 160
(3)Time = 250
81:----------------
(0)Time = 190
(1)Time = 151
(2)Time = 150
(3)Time = 260
82:----------------
(0)Time = 181
(1)Time = 170
(2)Time = 170
(3)Time = 250
83:----------------
(0)Time = 171
(1)Time = 160
(2)Time = 170
(3)Time = 250
84:----------------
(0)Time = 171
(1)Time = 160
(2)Time = 150
(3)Time = 261
85:----------------
(0)Time = 180
(1)Time = 170
(2)Time = 160
(3)Time = 251
86:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 170
(3)Time = 231
87:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 171
(3)Time = 250
88:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 141
(3)Time = 250
89:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 141
(3)Time = 250
90:----------------
(0)Time = 190
(1)Time = 171
(2)Time = 160
(3)Time = 260
91:----------------
(0)Time = 190
(1)Time = 141
(2)Time = 170
(3)Time = 250
92:----------------
(0)Time = 191
(1)Time = 160
(2)Time = 150
(3)Time = 250
93:----------------
(0)Time = 181
(1)Time = 160
(2)Time = 160
(3)Time = 250
94:----------------
(0)Time = 171
(1)Time = 160
(2)Time = 170
(3)Time = 261
95:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 160
(3)Time = 251
96:----------------
(0)Time = 190
(1)Time = 170
(2)Time = 160
(3)Time = 231
97:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 160
(3)Time = 231
98:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 161
(3)Time = 260
99:----------------
(0)Time = 190
(1)Time = 160
(2)Time = 151
(3)Time = 250

[Miller.java]

kirillcool
Offline
Joined: 2004-11-17

So the clamping is 30 to 40% faster. Did you measure what impact it has on the overall image processing? How much time does clamping take in the entire computation of the pixel color (with that bicubic interpolation)?

Kirill

Ken Warner

Yeah, when I integrated it into my bi-cubic interpolator, it slowed it down compared to the logical check clamp. I'm not sure why yet. Maybe it was the way I integrated it. I don't know.

In my test suite, the time to interpolate a 944 x 644 image went from 1350 ms using the logical test clamp to 1440 ms using the triple shift. I'm not sure why... I work on a really slow (800 MHz) machine just so I can see these kinds of things better.
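
For anyone who wants to repeat the comparison, here is a minimal timing sketch. It is not the test program attached earlier in the thread. The class name, round counts and input range are all assumptions, and it does not explain the slowdown seen inside the interpolator. It only times both clamps over the same inputs after a warm-up pass and prints a checksum so the work cannot be optimized away.

```java
final class ClampTiming {
    static int clampBranch(int v) {
        return v < 0 ? 0 : (v > 255 ? 255 : v);
    }

    static int clampShift(int v) {
        return ((((v << 23) >> 31) | v) & ~(v >> 31)) & 0xFF;
    }

    // Sums clamped values over -256..511 so the result depends on every call.
    static long runBranch(int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int v = -256; v < 512; v++) sum += clampBranch(v);
        }
        return sum;
    }

    static long runShift(int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int v = -256; v < 512; v++) sum += clampShift(v);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up so both paths have a chance to be compiled before timing.
        runBranch(2_000);
        runShift(2_000);

        long t0 = System.nanoTime();
        long a = runBranch(20_000);
        long t1 = System.nanoTime();
        long b = runShift(20_000);
        long t2 = System.nanoTime();

        System.out.println("branch: " + (t1 - t0) / 1_000_000 + " ms, checksum " + a);
        System.out.println("shift:  " + (t2 - t1) / 1_000_000 + " ms, checksum " + b);
    }
}
```

The sequential input here is very regular, so numbers from a loop like this may not carry over to real pixel data inside a larger computation.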


kirillcool
Offline
Joined: 2004-11-17

> Yeah, when I integrated it into my bi-cubic
> interpolator, it slowed it down compared to the
> logical check clamp. I'm not sure why yet. Maybe it
> was the way I integrated it. I don't know.

Don't you just love micro benchmarks?

jxc
Offline
Joined: 2005-02-24

> if(r < 0) r = 0;
> if(r > 255) r = 255;
> if(g < 0) g = 0;
> if(g > 255) g = 255;
> if(b < 0) b = 0;
> if(b > 255) b = 255;
>
> Clumsy and awkward. Is there a better way to clamp
> the values?
> Maybe with a mask or something?

If your "better" meant "faster", you might like to take a look at this:
https://jai-core.dev.java.net/source/browse/jai-core/src/share/classes/c...
under computeRectByte()

//
// The next two lines are a fast way to do
// an add with saturation on U8 elements.
// It eliminates the need to do clamping.
//
sum = (s1[s1PixelOffset]&0xFF) + (s2[s2PixelOffset]&0xFF);
d[dPixelOffset] = (byte)((((sum<<23) >> 31) | sum) & 0xFF);

HTH,
-James
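
As a self-contained sketch of the same idea (the class and method names are made up, and this is not the JAI source): two unsigned 8-bit samples sum to at most 510, so an overflow can only show up in bit 8, and smearing that bit across the int saturates the result at 255.

```java
final class SaturatingAdd {
    // Adds two unsigned 8-bit samples held in Java's signed bytes, saturating at 255.
    static byte addSaturated(byte a, byte b) {
        int sum = (a & 0xFF) + (b & 0xFF);                  // 0..510
        return (byte) ((((sum << 23) >> 31) | sum) & 0xFF); // all ones if bit 8 is set
    }

    public static void main(String[] args) {
        byte x = (byte) 200, y = (byte) 100, z = (byte) 20;
        System.out.println(addSaturated(x, y) & 0xFF); // 255 (200 + 100 saturates)
        System.out.println(addSaturated(y, z) & 0xFF); // 120
    }
}
```

Because the inputs are masked to 0..255 first, the sum is never negative, which is why this form needs no handling for the low end.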

Ken Warner

That's obscure enough to be intriguing -- do you
have a link to the rules Java works within on shifts?
Because the (>> 31) looks a little magical.

I will also be searching the docs....


James Cheng

On 2007/08/14 04:40 PM, Ken Warner wrote:
> That's obscure enough to be intriguing -- do you
> have a link to the rules Java works within on shifts?
> Because the (>> 31) looks a little magical.

http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html

-James


kirillcool
Offline
Joined: 2004-11-17

In my code I usually use Math.min and Math.max. It does take more lines, but the intent is obvious and the code is readable.
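
A minimal sketch of that approach, with a hypothetical helper name (clamp) wrapped in a throwaway class:

```java
final class MinMaxClamp {
    // Clamps v to [0, 255]; correct for any int, with no assumptions about range.
    static int clamp(int v) {
        return Math.max(0, Math.min(255, v));
    }

    public static void main(String[] args) {
        System.out.println(clamp(-100)); // 0
        System.out.println(clamp(128));  // 128
        System.out.println(clamp(300));  // 255
    }
}
```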

You can play with OR'ing three AND'ed expressions. Each AND expression handles a specific color channel, using the shifted value as the base and shifted mask. Something like

r << 16 & 255 << 16 (for red)
g << 8 & 255 << 8 (for green)
b & 255 (for blue)

Then, the result would be

255 << 24 | (r << 16 & 255 << 16) | (g << 8 & 255 << 8) | (b & 255)

It is shorter but not necessarily easier to understand without comments.
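
A runnable sketch of the packing expression above, with a made-up class and method name (PackArgb, pack). Note what the masks do and do not do: they keep each channel inside its own 8 bits, but an out-of-range value wraps around rather than clamping, so the channels still need to be clamped first if they can leave [0, 255].

```java
final class PackArgb {
    // Packs three channels into an opaque ARGB int; each channel is masked to 8 bits.
    static int pack(int r, int g, int b) {
        return (255 << 24)
             | ((r << 16) & (255 << 16))
             | ((g << 8) & (255 << 8))
             | (b & 255);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(pack(0x12, 0x34, 0x56))); // ff123456
        // Out-of-range inputs wrap: 300 -> 0x2c, -1 -> 0xff, 256 -> 0x00.
        System.out.println(Integer.toHexString(pack(300, -1, 256)));     // ff2cff00
    }
}
```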

Ken Warner

...easier - schmeazier -- looks pretty good to me.

Thanks... I needed a kick in the head...

> In my code i usually use Math.min and Math.max. It does take more lines, but the intent is obvious and the code is readable.
>
> You can play with OR'ing three AND'ed expressions. Each AND expression handles a specific color channel, using the shifted value as the base and shifted mask. Something like
>
> r << 16 && 255 << 16 (for red)
> g << 8 && 255 << 8 (for green)
> b && 255 (for blue)
>
> Then, the result would be
>
> 255 << 24 | (r << 16 && 255 << 16) | (g << 8 && 255 << 8) | (b && 255)
>
> It is shorter but not necessarily easier to understand without comments.
> [Message sent by forum member 'kirillcool' (kirillcool)]

kirillcool
Offline
Joined: 2004-11-17

> ...easier - schmeazier -- looks pretty good to me.
>
> Thanks... I needed a kick in the head...

Just make sure you test it (I just wrote it in the Firefox text area, and when reviewing it I realized that it needed & instead of &&).

Kirill