@fmw42 I just got around to doing some timings, with some pretty unexpected results.
Not a real benchmark because I didn't rerun many commands, so any difference below a factor of two is probably meaningless. Still, I got interesting results.
FYI files are now 00-raw-scans/NN/page-MMMM.tif, where NN ranges from 00 to 15 and MMMM ranges from 0000 to 0089. Files with the same page-MMMM.tif name are supposed to be composited in the end.
Here we go:
Code:
# Printing the pixel size of each file
time \
for f in $(find 00-raw-scans -type f -name "page-*.tif" | sort); do
echo $f: $(convert $f -format %P info:)
done
# Gives: real 0m43,240s, user 0m23,273s, sys 0m20,096s
An earlier run gave me something around 50 seconds.
Could be the filesystem cache warming up, or a YouTube video that stopped playing, or just chance.
A variation of 10% or so isn't unusual without a carefully controlled benchmark, so this isn't too surprising.
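To get a feel for that run-to-run variation, something like the following could be used. It is only a rough sketch: time_runs is a name I made up, and it assumes GNU date (for %N, nanoseconds), so it measures wall-clock time only and includes the small overhead of calling date twice.

```shell
# Sketch: run a command several times and print the wall-clock time of each run.
# time_runs is a hypothetical helper, not part of any tool.
# Assumes GNU date, which supports %N (nanoseconds).
time_runs() {
    local n=$1
    shift
    local i=1 start end
    while [ "$i" -le "$n" ]; do
        start=$(date +%s%N)
        # The command's own output is discarded; only the timing is reported.
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

For example, time_runs 5 convert -ping some-file.tif -format %P info: would print five timings for a single file (some-file.tif being a placeholder). The first run is often the slowest if the file is not yet in the filesystem cache.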
Code:
# Printing the pixel size of each file using -ping
time \
for f in $(find 00-raw-scans -type f -name "page-*.tif" | sort); do
echo $f: $(convert -ping $f -format %P info:)
done
# Gives: real 0m11,971s, user 0m6,313s, sys 0m5,839s
-ping is considerably faster, as expected (roughly 3.6 times).
I would have expected a better speed-up, though. It seems IM is doing a lot more than just reading the width and the height. Well, whatever IM thinks "reading image characteristics" means... okay, noted, I won't be able to improve anything there, moving on.
Code:
# Listing all scans for each page
time \
for f in $(find 00-raw-scans -type f -name "page-*.tif" -printf "%f\n" | sort | uniq); do
echo $f: $(find 00-raw-scans -type f -name $f | sort)
done
# Gives: real 0m0,509s, user 0m0,352s, sys 0m0,308s
# Again: real 0m0,718s, user 0m0,490s, sys 0m0,426s
Just a control measurement to get an idea of how much time the various find commands are actually taking up.
It's a bit sluggish for what it's doing, so I'll probably want to be smarter about getting the list of pages, but I guess that's not of much interest to the IM community.
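For the record, one way to be smarter might be to call find only once and group the paths by file name afterwards, instead of running one find per page. A minimal sketch, assuming GNU find (for -printf) and file names without whitespace; list_scans_per_page is a made-up name:

```shell
# Sketch: list all scans for each page with a single find call.
# list_scans_per_page is a hypothetical helper; assumes GNU find (-printf)
# and file names that contain no tabs, spaces, or newlines.
list_scans_per_page() {
    local root=${1:-00-raw-scans}
    # Emit "basename<TAB>path" per file, sort so that equal basenames are
    # adjacent, then join the paths of each basename onto one line.
    find "$root" -type f -name 'page-*.tif' -printf '%f\t%p\n' \
        | LC_ALL=C sort \
        | awk -F'\t' '
            $1 != page { if (page != "") print page ":" paths; page = $1; paths = "" }
            { paths = paths " " $2 }
            END { if (page != "") print page ":" paths }'
}
```

Calling list_scans_per_page 00-raw-scans should give the same "page-MMMM.tif: path path ..." lines as the loop above, with one find process instead of one per page.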
Code:
# Printing the bounding box size for each page
time \
for f in $(find 00-raw-scans -type f -name "page-*.tif" -printf "%f\n" | sort | uniq); do
echo $f: $(convert $(find 00-raw-scans -type f -name $f) -layers trim-bounds -delete 1--1 -format %P info:)
done
# Gives: real 0m39,348s, user 0m17,141s, sys 0m22,219s
# Again: real 0m39,298s, user 0m17,135s, sys 0m22,165s
I varied this a bit with bigger and smaller terminal windows, since scrolling used to be able to put a serious load on the CPU.
That doesn't seem to be the case anymore, either because I have enough CPU cores or because the pixel shoving is offloaded to the GPU.
Code:
# Printing the bounding box size for each page, let's try -ping this time
time \
for f in $(find 00-raw-scans -type f -name "page-*.tif" -printf "%f\n" | sort | uniq); do
echo $f: $(convert -ping $(find 00-raw-scans -type f -name $f) -layers trim-bounds -delete 1--1 -format %P info:)
done
# Gives: real 0m2,175s, user 0m1,230s, sys 0m0,980s
That's about 18 times faster (39 s down to 2 s), so it seems -layers trim-bounds doesn't need to read the pixels either.
Now that's a nice find.