Page 1 of 1

Cheaper Blackman computation for resize.c

Posted: 2010-09-04T17:26:04-07:00
by NicolasRobidoux
Suggested code only needs one trig call instead of two and two muladds and one * instead of two muladds and two *.

Code: Select all

static MagickRealType Blackman(const MagickRealType x,
  const ResizeFilter *magick_unused(resize_filter))
{
  /*
    Blackman: 2rd Order cosine windowing function:
    .42 + .5 cos(pi x) + .08 cos(2pi x).
    Refactored by Chantal Racette and Nicolas Robidoux so it needs
    only one trig call and five flops.
  */
  const double alpha = cos(MagickPI*(double) x);
  return(0.34+alpha*(0.5+alpha*0.16));
}

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-04T18:37:50-07:00
by magick
Look for your patch in ImageMagick 6.6.4-1 Beta available by sometime tomorrow. Thanks.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-05T21:56:13-07:00
by anthony
Nice optimization!

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-08T14:01:34-07:00
by NicolasRobidoux
Anthony:

There are a number of very small improvements that could be done to the code. Perfectionist improvements, one may call them. Maybe even a cosmetic waste of your time.

I'd like to know if you are interested in me going through the code for such.

I'm not talking about things which do make some sort of significant difference here!

Here is an example:

Code: Select all

static MagickRealType Hanning(const MagickRealType x,
  const ResizeFilter *magick_unused(resize_filter))
{
  /*
    A Cosine windowing function.
  */
  return(0.5+0.5*cos(MagickPI*(double) x));
}
When MagickRealType is long double, the cosine is computed by casting x to a double, and then multiplying it by MagickPI which, being defined by an untyped macro, is also a double. Of course, in the end, what's fed to cos has to be a double.

So, #defining a long version of pi and multiplying before casting may lead to a minuscule improvement in the precision of the final result. While we're at it, we may as well cast the cos up to MagickRealType so that the remaining arithmetic is done in long double precision for Q32 and up.

This gives:

Code: Select all

#define MagickPIL 3.14159265358979323846264338327950288420L

static MagickRealType Hanning(const MagickRealType x,
  const ResizeFilter *magick_unused(resize_filter))
{
  /*
    A cosine windowing function: .5 + .5 cos(pi x)
  */
  double pix = MagickPIL*x;
  MagickRealType cospix = cos(pix);
  return(0.5+0.5*cospix);
}
or

Code: Select all

#define MagickPIL 3.14159265358979323846264338327950288420L

static MagickRealType Hanning(const MagickRealType x,
  const ResizeFilter *magick_unused(resize_filter))
{
  /*
    A cosine windowing function: .5 + .5 cos(pi x)
  */
  return(0.5+0.5*(MagickRealType) cos((double) (MagickPIL*x)));
}
In other words:

Would you rather that I not make these kinds of changes. If you are OK with it, which of the two above forms do you prefer? (Note that I omitted my favorite keyword, "const"!)

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-08T15:22:53-07:00
by NicolasRobidoux
Well, I did it anyway.

You can find the obsessive-compulsive result (which should give slightly more precise results, esp. when MagickRealType = long double) here:

http://web.cs.laurentian.ca/nrobidoux/misc/IM/resize.c

A few flops disappeared here and there.

The one "nontrivial" thing I did was add a cast to double to a MagickRealType x fed to fabs in GetResizeFilterWeight, and eliminated the call to fabs altogether in SincPolynomial.

(I don't know what the C rules are for long doubles fed to double functions. I'm guessing this is harmless but elsewhere in the code the coder bothered casting MagickRealTypes fed to fabs so I figure it should be done everywhere.)

-----

I also added a

#define MagickPIL 3.14159265358979323846264338327950288420L

which you may prefer a different name for (the "L" is for "Long," I swear ;-) ) and may want to put somewhere else.

The rounded 39 decimals come from ANSI rules about correct rounding of floating point values typed in in decimal form.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-08T15:38:08-07:00
by magick
We added your code to the ImageMagick subversion repository. We'll leave it to Anthony to refine the code if need be.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-08T20:23:43-07:00
by anthony
Perhaps all the constants like MagickPI should have that 'L' for double float added to the end.
(set in "magick/image-private.h")

Though I prefer to use the single line version. No need to allocate the extra stack variable, though gcc would probably optimize it out in any case.

Hmmm we could use a constant for
sqrt((double)(8.0/MagickPI)

Why did you use
(1.0/MagickPI)*sin(pix)
rather than
sin(pix)/MagickPI

Unless of course you want the compiler to optimize it to a multiply. In which case why not just directly define a 1/MagickPI constant! Such a constant may be useful in a number of other places such as convolve 1D gaussian (blur) kernel generation.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-09T03:50:58-07:00
by NicolasRobidoux
anthony wrote:Perhaps all the constants like MagickPI should have that 'L' for double float added to the end.
(set in "magick/image-private.h")
Appending an "L" to all the pi-related #define constants will have the unfortunate side effect of making all computations in which the constant appears be done in long double precision. Not having anything (like what is currently done) has the possibly unfortunate side effect of making all computations in which the constant appears be done in double precision. However, it appears to me that IM "never" does anything "floating" at precision less than double, so in the latter case the unfortunate side effect is irrelevant. In the former, it may, at least in principle, mess up calls to double precision routines. Some languages will adjust the type of typed in constants (like 3.1415...) based on what they are used for. Not C: 3.1415...7 is a double, no matter what it's used for, if the compiler is ANSI compliant.
anthony wrote: Though I prefer to use the single line version. No need to allocate the extra stack variable, though gcc would probably optimize it out in any case.
Carefully put together multiple line versions are actually closer to SSA (Static Single Assignment) form (what gcc will turn any code into before producing assembly). Only way to know for sure which one is best is to look at assembly (no time for that now) and do very careful bencharks. I would be really surprised if gcc was dumb enough not to put something in register which can on the basis of one line vs multiple lines. Gcc will put something in register even if it is, in principle, a stack variable (within limits).

I've had good results with multi-line in the past.
anthony wrote: Hmmm we could use a constant for
sqrt((double)(8.0/MagickPI)
True. But this snippet (actually, the -alpha on the next line; I should have put the "-" with the sqrt because it makes this point clearer) will be computed at compile time (gcc will optimize away constant subexpressions, so you can pretty much do whatever in constant subexpressions and there will be no impact on run time performance). I reorganized the code so as to group constant things as much as possible to exploit this; this is why the new gaussian code uses one less flop AT RUN TIME. Given that this constant probably is only in the code once, there is no point creating a macro for it.
anthony wrote: Why did you use

Code: Select all

  (1.0/MagickPI)*sin(pix)
rather than

Code: Select all

  sin(pix)/MagickPI
Unless of course you want the compiler to optimize it to a multiply. In which case why not just directly define a 1/MagickPI constant! Such a constant may be useful in a number of other places such as convolve 1D gaussian (blur) kernel generation.
It was an "optimize to multiply" (blush).

A 1/pi constant, indeed, is likely to be used in many places. On the other hand, (1.0/MagicPI) is "constantified" at compile time, so there is no significant difference between the two. If the code was instead written

Code: Select all

sin(pix)/MagickPI
then the divide is done at run time. If a few extra bits of precision (without a divide at run time) is what you're after, you are actually better of doing this:

Code: Select all

  const MagickRealType oneoverpi = 1.0/MagickPIL;
  return (oneoverpi*sin(pix));
The final multiplication will then be done in MagickRealType precision. (Likewise, sin(pix)/MagickRealPIL would have the division done in MagickRealType precision, and in principle would be just a little more precise.) This being said, with ONE multiplication or division, I'm not sure there will be any difference in the result: there is no cancellation effect at play.

(I blushed because in this context this is a fairly pointless optimization.)

Now, you may say: Yeah, but now you are computing 1/pi in long double precision even if MagickRealType is double.

True, but the divide and cast to double are done at compile time. So no matter.

(The above is based on my experience and best understanding. Correct me if I'm wrong.)

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-09T16:20:58-07:00
by anthony
I bow to someone who has obviously more experience in these matters.

I have not delved into assembly since my university days. Most of my actual work is with shell and perl scripts. The only real C programming I get is in the adding of new code to ImageMagick.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-10T06:02:41-07:00
by NicolasRobidoux
anthony wrote:I have not delved into assembly since my university days. Most of my actual work is with shell and perl scripts. The only real C programming I get is in the adding of new code to ImageMagick.
Looking at what resample.c does, you don't seem to be doing too badly.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-10T06:57:06-07:00
by anthony
I didn't say I don't know how to program :D I have been doing it for ... hmmm 30 years (since highschool and 'basic'). I understand data structures too. 8)

I do very well, but only if I can get a uninterrupted block to time to true concentrate on it. I don't get a lot of that!

Take a look at the new 'Gaussian' filter change report.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-10T18:11:08-07:00
by NicolasRobidoux
anthony wrote:Take a look at the new 'Gaussian' filter change report.
I don't understand: I checked svn carefully, and I don't see any change.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-10T18:27:48-07:00
by NicolasRobidoux
Just a suggestion:

Maybe it's time to give me commit privileges.

Re: Cheaper Blackman computation for resize.c

Posted: 2010-09-11T06:17:01-07:00
by anthony
You will have to ask Cristy (Magick) for that. But you do have to thoroughly check code before commit or you generate problems for the later beta releases.