Legacy ImageMagick Discussions Archive

Hi!
Can anyone suggest how to perform the following distortion with IM?
I have a spectrogram of a piece of music (produced by sndfile-spectrogram) that has a linear frequency axis, which means that adjacent semitones get closer and closer together as you go into the bass range, while the top half of the spectrogram is mostly empty. What I would like is a spectrogram with a log frequency axis so that semitones are the same distance apart in the bass, midrange and treble, to facilitate deriving and typesetting the score.
I've studied the "distort" manual http://www.imagemagick.org/Usage/distorts/ but can't see one that does what I need. The closest ones seem to be some kind of perspective mapping ("laying it flat on the ground") followed by cropping it and doing another distortion to restore the sides to vertical, or maybe applying some polynomial to approximate a logarithmic scale. Neither of these really applies a logarithmic scale to the vertical axis, just something similar.
Is there something I'm missing here, some general way of inputting a formula for the reverse mapping function, or shall I go hack the sources to add my own "logvscale" distort function?

Here's a sample image to explain the kind of source file I'm talking about

though the real thing will be using a much larger input file

Thanks for any suggestions

M

Please clarify. Your axes are x=time and y=decibels (attenuation/amplitude). Is your x axis equivalent to frequency? Are you asking to spread the x or y axis logarithmically?

Either way, the only way that I know about is the very slow -fx function or by using displacement maps. see
http://www.imagemagick.org/script/fx.php
http://www.imagemagick.org/Usage/transform/#fx
http://www.imagemagick.org/Usage/mappin ... rtion_maps
http://www.imagemagick.org/Usage/mappin ... ement_maps
http://www.imagemagick.org/Usage/mapping/#displace_2d

Many thanks for the reply and pointers into the docs.

The x axis of the spectrogram represents time progressing through the piece of music (2m56s or 176 seconds in the example image) and the y axis is frequency from 0 (at the bottom) to 22050Hz (at the top) while the brightness of each pixel in any vertical column of pixels represents the energy in the sound at that instant at each of the frequencies.

In the output image, there would be no distortion horizontally but the bottom row of pixels should represent the lowest interesting frequency, about 50Hz and the top row the highest, about 3150Hz in such a way that the same vertical distance represents the same musical interval, i.e. going up by one pixel row represents a change from N Hz to N*x Hz instead of from N to N+x as it does in the input image. Does that make any sense?
A low-res example would be, if the output were 7 pixels high, for them to represent 50, 100, 200, 400, 800, 1600 and 3200 Hz.
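
The 7-row example can be checked with a few lines of Python. This is only a sketch: the function name `row_frequencies` is made up for illustration, and it simply spaces the rows by a constant frequency ratio, which is what "the same vertical distance represents the same musical interval" amounts to.

```python
def row_frequencies(f_min, f_max, rows):
    """Return one frequency per output row, each a constant ratio above the last."""
    # The ratio between neighbouring rows is chosen so that the first row
    # is f_min and the last row is f_max.
    ratio = (f_max / f_min) ** (1.0 / (rows - 1))
    return [f_min * ratio ** k for k in range(rows)]


freqs = row_frequencies(50.0, 3200.0, 7)
print([round(f) for f in freqs])
```

With 50 Hz and 3200 Hz as the limits, the ratio works out to 2, i.e. one octave per pixel row.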

Yes, it looks like this can be done with the fx operator. Let me see if I can get my head round fx and associated concepts, round the exact math I need and I'll get back to you for improvements. No, I'm not expecting it to be fast...

Wow, IM is certainly a very powerful tool! Thanks again

You can probably speed it up once you figure out your formula by using fx to convert a one-column gradient into a displacement map. That can then be replicated to fill the same thing out for the full width of your image by using -scale. Then you can use the displacement map with -fx to more rapidly process your image. Get back to us if you have any questions once you know what formula you need to apply.
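
The idea behind that speed-up, sketched outside ImageMagick in plain Python: because the mapping depends only on the row, the expensive formula needs evaluating once per output row rather than once per pixel. The names `build_row_lookup` and `remap_rows` and the placeholder mapping are hypothetical, and this is not how ImageMagick stores a displacement map; it only illustrates the precompute-then-look-up principle.

```python
def build_row_lookup(out_height, row_map):
    """Evaluate the (possibly slow) row mapping once per output row."""
    return [row_map(j) for j in range(out_height)]


def remap_rows(image, lookup):
    """Build the output by copying, for each output row, the looked-up input row.

    `image` is a list of rows, each row a list of pixel values.
    """
    in_height = len(image)
    out = []
    for src in lookup:
        # Round to the nearest input row and clamp to the image bounds.
        src = min(max(int(round(src)), 0), in_height - 1)
        out.append(list(image[src]))
    return out


# Tiny demonstration with a placeholder mapping (not the spectrogram formula).
image = [[0, 0], [1, 1], [2, 2], [3, 3]]
lookup = build_row_lookup(3, lambda j: 3 * (j / 2) ** 2)
print(remap_rows(image, lookup))
```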

Sounds more like a specialised distortion. But would be difficult to get proper resampling unless it is actually made into a real distortion, rather than using FX or a Distortion map.

Thanks. I'm still deciding whether it's best to do this or modify the spectrogram program, which already has an unimplemented --log-freq option.
But I'll take your idea into account. I had wondered if such a thing were doable, having seen passing references to it in the docs

Cheers, I'll let you know how it turns out...

M

OK, the math is this:

the input is a borderless WxH spectrogram as per above, where the vertical axis represents from 0 to 22050Hz linearly.
the output should have a logarithmic vertical axis from 50Hz to 3150 (well, log_min and log_max, let's say, and min_freq_in and max_freq_in for those others)

The y coordinate of a pixel in the input image is called y_in and is from 0 to y_max, where 0 is at the bottom,
so the frequency it represents, freq_in = y_in * max_freq_in / y_max_in

in the output image, a pixel's y coordinate, y_log, should represent a frequency from 50 to 3150 on a logarithmic scale

when freq = min_freq_out, y_log = 0
when freq = max_freq_out, y_log = y_max_out
we know that log(1) = 0 and log(base) = 1
and it shouldn't matter what base logarithm you use (the curve is the same for all).

going from pixel 0 to pixel y_max_out, each step should add (log(freq/50)/log()/y_max_out

if freq < 50 or freq > 3150 do nothing
else
y_out = (log(freq/50)/log(base * freq/3150)) {gives 0..1} * y_max_out
// freq/50 gives you 1 at freq==50, so log of that is 0
// freq/3150 is 1.0 at 3150Hz, so multiplying that by base gives log(base) which is 1.
// I think this can be rearranged to
// y_out = (log(freq/50)/(base + log(freq/3150))) {gives 0..1} * y_max_out
// to save a multiplication
end

y_out = (log(freq/50)/log(base * freq/3150)) {gives 0..1} * y_max_out
where
freq = freq_in = y_in * max_freq_in / y_max_in

Er, so the reverse mapping function would be... er...

> going from pixel 0 to pixel y_max_out, each step should add (log(freq/50)/log()/y_max_out

This was garbage, please ignore

M

> y_out = (log(freq/50)/log(base * freq/3150)) {gives 0..1

this was also garbage. the top of the division is 0 at f=50 and the bottom is 1.0 at f=3150, but the result of the division is something else.

flail, flail

ok, (50 gives 1 and 3150 gives base) logs to 0..1

try
(freq - 50) * base / (3150 - 50)

when freq == 50, this gives 0
when freq == 3150, this gives base * (3150 - 50) / (3150 - 50)
(3150 - 50) / (3150 - 50) = 1 so that times base gives base.

Does that sound right?

i.e. to map from input pixel coords to output pixel coords it would be

freq = y_in * max_freq_in / y_max_in
if freq < 50 or freq > 3150 skip
y_out = log((freq - 50) * base / (3150 - 50)) * y_max_out

?

martinwguy wrote: i.e. to map from input pixel coords to output pixel coords it would be

Inlining freq and rearranging that...

y_out = log(((y_in * max_freq_in / y_max_in) - 50) * base / (3150 - 50)) * y_max_out
exp(y_out / y_max_out) = ((y_in * max_freq_in / y_max_in) - 50) * base / (3150 - 50)

so the reverse mapping is:

y_in = (exp(y_out / y_max_out) * (3150 - 50) / base + 50) * y_max_in / max_freq_in

substituting min_freq_out and max_freq_out for 50 and 3150

Oh. I just tried that:
$ cat > logaxis.c << \EOF
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main(void)
{
    int max_freq_in = 22050;
    int max_y_in = 8191;
    double min_freq_out = 50;
    double max_freq_out = 3150;
    int max_y_out = 1024;
    int y;

    printf(" Out In\n");
    /* For every 100th output row, print the input row chosen by the
       reverse mapping derived above (with base = e). */
    for (y = 0; y <= max_y_out; y += 100) {
        printf("%4d %4ld\n", y,
               lrint((exp((double)y / max_y_out) * (max_freq_out - min_freq_out)
                      / M_E + min_freq_out) * max_y_in / max_freq_in));
    }
    exit(0);
}
EOF
$ cc -o logaxis logaxis.c -lm
$ ./logaxis
Out In
0 442
100 486
200 534
300 586
400 645
500 709
600 780
700 858
800 944
900 1039
1000 1143

Let's see what those limiting input frequencies are. The input axis is 8192 high and represents 0-22050Hz

$ bc -l
bc 1.06.95
1143*22050/8192
3076.55639648437500000000
442*22050/8192
1189.70947265625000000000

So it's outputting logarithmically from 1190 to 3100. Hum. At least one is right...

Anyone spot the error?

Yes. log(freq-50) goes to log(0), not log(1).
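
To see where the 1190 Hz comes from: in the forward formula the argument of the log is 0 at 50 Hz, so the log never reaches 0 there. The bottom output row instead lands where the argument equals 1, i.e. at freq = 50 + (3150 - 50) / e. A short Python check of the reverse mapping from the C test above (a sketch; `buggy_freq` is just an illustrative name) confirms both ends:

```python
import math

F_MIN, F_MAX, MAX_FREQ_IN = 50.0, 3150.0, 22050.0
MAX_Y_IN, MAX_Y_OUT = 8191, 1024


def buggy_freq(y_out):
    """Frequency picked for an output row by the first (flawed) reverse mapping."""
    return math.exp(y_out / MAX_Y_OUT) * (F_MAX - F_MIN) / math.e + F_MIN


bottom = buggy_freq(0)
top = buggy_freq(MAX_Y_OUT)
print(round(bottom), round(top))             # frequencies in Hz
print(round(bottom * MAX_Y_IN / MAX_FREQ_IN))  # input row for output row 0
```

The top end is right because the factors of e cancel at y_out = max_y_out; the bottom end is off by (3150 - 50) / e.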

I'll spare you the workings, but this seems to be the correct reverse mapping:

y_in = (((exp(y_out / max_y_out) - 1) / (base - 1)) * (max_freq_out - min_freq_out) + min_freq_out) * max_y_in / max_freq_in

for which a test run from 0 to 1023 step 93, with max_y_in = 8191 and max_y_out = 1023 targeting 50..3150Hz gives:

Out In Freq
0 19 50
93 82 222
186 152 410
279 229 616
372 312 841
465 404 1088
558 505 1359
651 615 1655
744 735 1979
837 867 2335
930 1012 2724
1023 1170 3150

but a simpler way is to work backwards from the output y coordinate
freq = 50 * (3150/50)^(y_out / max_y_out)
y_in = max_y_in * (freq / max_freq_in)

giving

convert -size $(WIDTH)x$(LOG_HEIGHT) xc: pattern.png \
-virtual-pixel White \
-interpolate NearestNeighbor \
-fx "freq = $(MIN_FREQ_OUT) * pow($(MAX_FREQ_OUT) / $(MIN_FREQ_OUT), ($(MAX_Y_OUT) - j) / $(MAX_Y_OUT)); \
yy = $(MAX_Y_IN) - freq * $(MAX_Y_IN) / $(MAX_FREQ_IN); \
v.p{i,yy}" \
pattern-log.png
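
The arithmetic inside that -fx expression can be checked outside ImageMagick. The following Python sketch mirrors the two assignments, including the flip from "0 at the bottom" to image rows counted from the top; the function names and the default sizes (1024 output rows, 8192 input rows) are assumptions for illustration only.

```python
def freq_for_output_row(j, max_y_out=1023, f_min=50.0, f_max=3150.0):
    """Frequency shown on output image row j (j = 0 is the top row)."""
    return f_min * (f_max / f_min) ** ((max_y_out - j) / max_y_out)


def input_row_for_output_row(j, max_y_out=1023, max_y_in=8191,
                             max_freq_in=22050.0):
    """Fractional input image row (counted from the top) to sample for row j."""
    freq = freq_for_output_row(j, max_y_out)
    return max_y_in - freq * max_y_in / max_freq_in


top, bottom = freq_for_output_row(0), freq_for_output_row(1023)
print(round(top), round(bottom))
print(round(input_row_for_output_row(0), 2), round(input_row_for_output_row(1023), 2))

# Neighbouring rows differ by the same frequency ratio everywhere,
# which is the property wanted for reading off musical intervals.
step_high = freq_for_output_row(0) / freq_for_output_row(1)
step_low = freq_for_output_row(1022) / freq_for_output_row(1023)
print(abs(step_high - step_low) < 1e-9)
```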

a 1024x1024 target takes 97 seconds on this box, a dual 1.8GHz Pentium thing.
BTW Congratulations to ImageMagick for using both cores (197% cpu) without
having to specify any flags!
The real thing, 8192x1024, takes 13m45s, which is quite acceptable

See http://wiki.delia-derbyshire.net/wiki/T ... rn_Emerges for the audio

Suggestions to speed it up or to improve the smoothing of the interpolation,
other than -interpolate Mesh?

M
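
On the smoothing question, the basic idea of interpolating at a fractional row can be sketched in Python as below. This is not a description of what any ImageMagick -interpolate setting does internally; `sample_column` is a hypothetical helper that blends the two input rows on either side of the computed position.

```python
import math


def sample_column(column, y):
    """Linearly interpolate a list of pixel values at fractional row y."""
    y = min(max(y, 0.0), len(column) - 1.0)  # clamp to the valid row range
    lo = int(math.floor(y))
    hi = min(lo + 1, len(column) - 1)
    frac = y - lo
    return column[lo] * (1.0 - frac) + column[hi] * frac


column = [0.0, 10.0, 20.0, 40.0]
print(sample_column(column, 1.5))   # halfway between rows 1 and 2
print(sample_column(column, 2.25))  # a quarter of the way from row 2 to row 3
```

Where one output row covers several input rows (the treble end here), averaging over the covered rows would probably be more faithful than blending only two of them.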

martinwguy wrote: convert -size $(WIDTH)x$(LOG_HEIGHT) xc: pattern.png \
-virtual-pixel White \
-interpolate NearestNeighbor \
-fx "freq = $(MIN_FREQ_OUT) * pow($(MAX_FREQ_OUT) / $(MIN_FREQ_OUT), ($(MAX_Y_OUT) - j) / $(MAX_Y_OUT)); \
yy = $(MAX_Y_IN) - freq * $(MAX_Y_IN) / $(MAX_FREQ_IN); \
v.p{i,yy}" \
pattern-log.png

Sorry, that's a Makefile fragment. The whole Makefile is

WIDTH=8192
LIN_HEIGHT=8192
LOG_HEIGHT=1024

MAX_Y_IN=`expr $(LIN_HEIGHT) - 1`
MAX_Y_OUT=`expr $(LOG_HEIGHT) - 1`
MAX_FREQ_IN=22050
MIN_FREQ_OUT=50
MAX_FREQ_OUT=3200

pattern-log.png: pattern.png Makefile
        -rm -f $@
        time convert -size $(WIDTH)x$(LOG_HEIGHT) xc: $< \
                -virtual-pixel White \
                -interpolate Mesh \
                -fx "freq = $(MIN_FREQ_OUT) * pow($(MAX_FREQ_OUT) / $(MIN_FREQ_OUT), ($(MAX_Y_OUT) - j) / $(MAX_Y_OUT)); \
                     yy = $(MAX_Y_IN) - freq * $(MAX_Y_IN) / $(MAX_FREQ_IN); \
                     v.p{i,yy}" \
                $@

pattern.png: pattern.wav
        sndfile-spectrogram --dyn-range=90 --no-border $< \
                $(WIDTH) $(LIN_HEIGHT) $@

How to perform logarithmic distortion on one axis
