Cheaper sinc computation for resize.c

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-09T06:43:09-07:00

I've not triple-checked my recent changes at all quantum levels.

May take me a while.

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-09T07:15:35-07:00

Anthony:

Thanks for the updated benchmarks.

-----

If you want me to revert the Q32 version so that the "within 1 guarantee" holds, let me know.

Post by **anthony** » 2010-09-09T16:58:36-07:00

NicolasRobidoux wrote:(Note completely fit to IM, but just so you get the picture one "lanczos 3 with one trig call".)

Code: Select all

   /*
    Sinc(x)*Sinc(x/3) refactored by Nicolas Robidoux and Chantal
    Racette down to two conditionals, 1 trig call and 7 flops.
  */
  const double xx = x*x;
  if (xx == 0.0)
    return(1.0);
  if (xx <= 9.0)
    {
      const double s = sin((MagickPI/3.0)*x);
      const double ss = s*s;
      const double pp = MagickPI*MagickPI;
      return(ss*((9.0/pp)-ss*(12.0/pp))/xx);
    }
  return(0.0);

Can this be generalized?

Note that the filter function is called with a pointer to the _ResizeFilter (typically it isn't used, except for cubic functions, so it can use the pre-calculated cubic coefficients). In that stucture is both the requested 'support' of the filter, and the 'window_support' (which is usally the same value unless overridden by an expert option from some strange reason).

However there is currently no indication if the function call is for the base filter function, or as a windowing filter function. Though if the window() function point points to the Box() filter function, you can be sure no windowing function is being used to scale the base function.

HOWEVER, for a Lanczos() function, can be regarded as purely a base function.
That would be similar to Lagrange() which uses support values for to determine the 'order' of the Lagrange filter.

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-09T18:03:08-07:00

anthony wrote: Can this be generalized?

You mean: Can Chantal and I produce some magical formula which computes Sinc(x)Sinc(x/n) with one trig call for arbitrary positive integer n?

Answer: No.

We can only do things one n at a time, and for some n's, it's going to be really complicated.

-----

With separate formulas, 2,3,4 are no problem, for example.

Post by **anthony** » 2010-09-09T18:29:41-07:00

Expanded the output of the -set option:filter:verbose 1 expert option so that it understands the used of the -precision setting and to also expanded the details of exactly what functions was selected in the comments at the top verbose report.

For example you can now verify that the Lanczos filter is currently defined in terms
of a Sinc windowed Sinc support 3.

Code: Select all

  convert xc: -set option:filter:verbose 1 -filter Lanczos \
          -resize 2x2   null:  |  grep '^#'

#
# Resize Filter (for graphing)
#
# filter = Sinc
# window = Sinc
# support = 3
# win-support = 3
# blur = 1
# blurred_support = 3
# B,C = 0,0
#

The other 'non-commented' values are used allow checking and graphing of the filters output values, such as the many graphs of filters in IM Examples, Resize Filters
http://www.imagemagick.org/Usage/resize/#filter

Post by **anthony** » 2010-09-09T18:33:37-07:00

NicolasRobidoux wrote:
anthony wrote: Can this be generalized?
You mean: Can Chantal and I produce some magical formula which computes Sinc(x)Sinc(x/n) with one trig call for arbitrary positive integer n?

Answer: No.

We can only do things one n at a time, and for some n's, it's going to be really complicated.

I just wonder how complicated? The constants in the above are integers, and also divisible by 3 so I wondered similar results was present for other values of 'n'. At least an integer 'n', seeing as non-integer n in Lanczos while allowed, is rather useless.

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-09T19:13:11-07:00

Oops! The lanczos 3 snippet actually only needs 6 flops, B/C MagickPI*MagickPI is computed at compile time.

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-09T19:31:21-07:00

P.S. The following is FINE. (Thanks Chantal. Note to self: Do not triple check things when called to make dinner.)

How to compute all lanczos n with only ONE TRIG CALL

Summary: You can use a recursive formula for the sines of multiples of an angle (http://en.wikipedia.org/wiki/List_of_tr ... hev_method) to compute any lanczos n with only one trig call.

For pi lovers: The interesting thing (to a mathematician) is that the Chebyshev method does NOT allow computing all sines or cosines of multiples of an angle with only ONE trig call (unless one involves square roots and some mechanism for computing root signs, two trig calls are needed). There is something "just right" about products of sines a.k.a the expensive part of the Lanczos kernel formula.

Details: Nothing speaks like code:

Code: Select all

static MagickRealType Lanczos(const MagickRealType x,
  const ResizeFilter *resize_filter)
{
  /*
    Lanczos(n) filter function computed with only one call to a
    trigonometric function with a recursive formula, based on the
    Chebyshev method for the computation of sines and cosines of
    multiples of an angle, discovered by Nicolas Robidoux (pending the
    discovery of an earlier discoverer) with the assistance of Chantal
    Racette.
  */
  const double support = resize_filter->support;
  const MagickRealType x2 = x*x;
  if (x2 == 0)
    return(1.0);
  {
  const MagickRealType pi2 = MagickPIL*MagickPIL;
  const MagickRealType c = cos((double) ((MagickPIL/support)*x));
  const MagickRealType s2 = 1 - c * c;
  const MagickRealType s2c = s2 * c;
  const MagickRealType ss2 = s2c + s2c;
  /*
    Really, we want to use Lanczos 2 if the support is <= 2, but doing
    things as follows allows for inaccuracy in the support
    computation.
   */
  if (support<2.5)
    return((2/pi2)/x2*ss2);
  {
  const MagickRealType ss3 = s2*(3+-4*s2);
  if (support<3.5)
    return((3/pi2)/x2*ss3);
  {
  const MagickRealType t4 = c*ss3;
  const MagickRealType ss4 = t4-ss2+t4;
  if (support<4.5)
    return((4/pi2)/x2*ss4);
  {
  const MagickRealType t5 = c*ss4;
  const MagickRealType ss5 = t5-ss3+t5;
  if (support<5.5)
    return((5/pi2)/x2*ss5);
  {
  const MagickRealType t6 = c*ss5;
  const MagickRealType ss6 = t6-ss4+t6;
  if (support<6.5)
    return((6/pi2)/x2*ss6);
  {
  const MagickRealType t7 = c*ss6;
  const MagickRealType ss7 = t7-ss5+t7;
  if (support<7.5)
    return((7/pi2)/x2*ss7);
  {
  const MagickRealType t8 = c*ss7;
  const MagickRealType ss8 = t8-ss6+t8;
  if (support<8.5)
    return((8/pi2)/x2*ss8);
  {
  const MagickRealType t9 = c*ss8;
  const MagickRealType ss9 = t9-ss7+t9;
  if (support<9.5)
    return((9/pi2)/x2*ss9);
  if (support<10.5)
    { 
      const MagickRealType t10 = c*ss9;
      const MagickRealType ss10 = t10-ss8+t10;
      return((10/pi2)/x2*ss10);
    }
  return(
	 Sinc((double) x,resize_filter)
	 *
	 Sinc((double) (x/nearbyint(support)),resize_filter));
  }}}}}}}}
}

Warning:
I can't compile this because this needs to be fitted in the bigger code in ways that I don't have time to figure just now.

The main point, however, is that (assuming the chip's branch prediction figures out what needs to be computed and what does not):

lanczos2(x) can be computed with 1 trig & 1/ & 5* & 1- & 1+

computing lanczos3(x) adds 2* & 1+ to the cost of lanczos2(x) (if the chip's branch prediction is really good, it should only add 1*)

and, from then on, increasing n by 1 adds 1* & 1- & 1+.

(Note that these counts take into account the fact that some operations will be performed at compile time, not at runtime. Only the runtime flops are included.)

I stopped at n = 10, but clearly it is only for very large n that a second trig call beats the recursion (unless the chip's branch prediction sucks).

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-10T08:20:56-07:00

anthony wrote: ...
Update: On my work machine (3 years old) with a Dual Core processor, with OpenMP enabled, the above Lanczos tests, using a starting image 1/10 the length (yes it is that much of an older and slower machine) produced...
Code: Select all
time convert -size 10000000x1 xc:  \
       -filter Sinc -set option:filter:window Sinc \
       -resize 1000x1 null:
real 0m12.621s
user 0m22.365s
sys 0m0.253s
Code: Select all
time convert -size 10000000x1 xc:  \
      -filter SincPolynomial -set option:filter:window SincPolynomial \
      -resize 1000x1 null:
real 0m8.858s
user 0m16.634s
sys 0m0.235s

this shows a much greater difference, but only in this special case for which caching filter values does not help very much.
...

Anthony: This is just a conjecture, but I think that the more noticeable relative difference in execution time may be due to the fact that you used openmp. I am guessing that the pixel computation is more parallelized than the coefficient computation, so anything which shortens the pixel computation leads to a larger relative speedup of "Poor Man's Sinc" when compared to "math.c sin sinc."

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-10T08:33:08-07:00

Anthony Thyssen privately pointed out that I had changed the behavior of the gaussian filter.

I, indeed, had misread the code (I'm used to formulas having spacing to group things; what a lame excuse!).

In any case, here is code which computes the gaussian filter with three multiplies and a call to the exponential function. This code, in addition, makes it easy to change the value of the variance by changing a macro. (Me and my student Adam Turcotte used to customize resize.c for our purposes, so this is not unheard of.)

Code: Select all

static MagickRealType Gaussian(const MagickRealType x,
  const ResizeFilter *magick_unused(resize_filter))
{
  /*
    Normalized Gaussian with variance 1/2 (by default):
      1/sqrt(2 pi sigma^2) exp(-x^2/(2 sigma^2))
  */
  #define MagickGAUSSIANSIGMAL 0.5L
  /*
    Change the value of MagickGAUSSIANSIGMAL if you want to override
    the default.
  */
  const MagickRealType sigma2 = MagickGAUSSIANSIGMAL*MagickGAUSSIANSIGMAL;
  const MagickRealType alpha = -1.0/(2.0*sigma2);
  const MagickRealType normalizer = sqrt((double) (1.0/(2.0*MagickPIL*sigma2)));
  return(normalizer*exp((double) (alpha*x*x)));
}

As usual: http://web.cs.laurentian.ca/nrobidoux/misc/IM/resize.c

(Based on this morning's svn, but not manically checked. Anthony's nice automated check option will notice any bad stuff

anyway.)

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-10T17:26:29-07:00

I did the same tests Anthony did on my 3 year old laptop with Intel Core 2 and 2GB running 64 bit mint, namely

Code: Select all

time convert -size 10000000x1 xc:  \
      -filter SincPolynomial -set option:filter:window SincPolynomial \
      -resize 1000x1 null:

and a similar one with just Sinc, and I got best of 5 Real time 8.054s for Sinc, and 5.497s for SincPolynomial.

(Vanilla IM, so with Q16 and openmp enabled.)

Best of 5 runs for 2500x2500 to 1000x1000 gives Real time .858s for Sinc, .854 for SincPolynomial.

Conclusion: Clearly the coefficients are computed much faster, but this leads to very little difference for "mainstream" resizing.

Post by **anthony** » 2010-09-11T06:40:46-07:00

NicolasRobidoux wrote:Conclusion: Clearly the coefficients are computed much faster, but this leads to very little difference for "mainstream" resizing.

Yes as I thought at the begining. The Caching makes the calculations a once off for each row and column. For Distorts a cache of 1000 values is generated as a lookup table for a cylindrical filter that is used for a "elliptical weighted averge" technique.

As such for 'typical' resizing and distorts, the caching vastly reduces the cost of trig functions.
Only in the extreme case where 'caching' provides little benefit, does the polynomial form show a major improvement. However that does not mean we would not want to make use of the improvements. Especially when the error difference is so low. We may even make the polynomial function the default!

-----

Aside: The EWA technique by the way was also developed originally by the same author as the "zoom" program, Professor Paul Heckbert, that fathered basically every image resize function that is used on computers today.

See Area Resampling for Distorts...
http://www.imagemagick.org/Usage/distor ... a_resample
and Introduction to Resize Filters
http://www.imagemagick.org/Usage/resize/#fillter

At this time the EWA default to the use of a built-in 2D Gaussian filter (it is a cylindrical filter).
Now that I have a improved understanding, due to this discussion, I may be able to make more direct use of resize filters for cylindrical filters, like I originally planned.

However while lots of information is available for 2-pass orthogonal filters and the many alternatives, very little is available on 2D cylindrical variants. For example a 2D Lanczos equivalent (Bessel windowed Bessel???) could generate much better and less blurry image distortions (such as perspective correction, or Barrel distortion correction. Perhaps someone knows of something like a simpler Cubic function for cylindrical use (like Mitchell in resize).

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-11T06:58:15-07:00

P.S. My suggestion below is a BAD IDEA. Ignore it!

anthony wrote:However while lots of information is available for 2-pass orthogonal filters and the many alternatives, very little is available on 2D cylindrical variants. For example a 2D Lanczos equivalent (Bessel windowed Bessel???) could generate much better and less blurry image distortions (such as perspective correction, or Barrel distortion correction. Perhaps someone knows of something like a simpler Cubic function for cylindrical use (like Mitchell in resize).

Sounds like I have my next Masters student's topic.

I don't know enough to make a really educated suggestion, but nonetheless how about you simply use lanczos3(r) instead of a tensor product of lanczos3 in x and y (r is the distance to the center)? (lanczos2 would also be a good choice, I would guess.) To make a more educated choice, I'd need to do some careful 2D spectral analysis, and this is not likely to happen for quite a while (unless I can find someone to do it for me, like a Masters student

).

Of course, Chantal Racette and I can provide very cheap polynomial approximations of the lanczos2/3 kernels!

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-11T07:23:23-07:00

A "less bad" idea:

Try lanczos2(r^2) with support up to r^2=2.

(Again, my understanding of cylindrical is a bit primitive, which is why I'd need a new grad student to explain things to me, but if it's easy to try things...)

And, of course, Chantal and I can provide ridiculously cheap accurate polynomial approximations of lanczos2(x) (into which you'd plug r^2).

I'd also suggest to simply filter, as opposed to doing an (approximate) exact area computation (if I get the drift without reading more about this). In my experience, there is little benefit to using an exact area approach. And I've done it: http://registry.gimp.org/node/19582

NicolasRobidoux · Post by **NicolasRobidoux** » 2010-09-11T07:27:05-07:00

anthony wrote: As such for 'typical' resizing and distorts, the caching vastly reduces the cost of trig functions.
Only in the extreme case where 'caching' provides little benefit, does the polynomial form show a major improvement. However that does not mean we would not want to make use of the improvements. Especially when the error difference is so low. We may even make the polynomial function the default!

May I suggest calling the polynomial approximations "Sinc" and their more accurate versions "SincTrigonometric"? (!)

Legacy ImageMagick Discussions Archive

Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c

Re: Cheaper sinc computation for resize.c