Hi! I'm totally new to IM and truly amazed and overwhelmed by its richness.
I want to scan several textbooks for my kid & convert to Kindle PDF, since her backpack is way too heavy.
I need to:
1. crop (I think I'll do it with interactive software since the pages are not well aligned)
2. split the page X.jpg in two, producing two pages (e.g. X0.jpg and X1.jpg)
3. reduce to 4-bit gray (I'll try to scan better for the next book)
4. What's the best image format to embed in PDF? Maybe PNG?
Could you please give me some pointers? For 3 I guess I need to read http://www.imagemagick.org/Usage/quantize/
An example page is at http://personal.sirma.bg/vladimir/page000.jpg
How to split page in two and reduce color?
- fmw42
- Location: Sunnyvale, California, USA
Re: How to split page in two and reduce color?
valexiev wrote: Hi! I'm totally new to IM and truly amazed and overwhelmed by its richness.
I want to scan several textbooks for my kid & convert to Kindle PDF, since her backpack is way too heavy.
I need to:
1. crop (I think I'll do it with interactive software since the pages are not well aligned)
2. split the page X.jpg in two, producing two pages (e.g. X0.jpg and X1.jpg)
3. reduce to 4-bit gray (I'll try to scan better for the next book)
4. What's the best image format to embed in PDF? Maybe PNG?
Could you please give me some pointers? For 3 I guess I need to read http://www.imagemagick.org/Usage/quantize/
An example page is at http://personal.sirma.bg/vladimir/page000.jpg
Are the pages aligned well enough that you can find a common crop position, splitting every image into two non-overlapping parts at the same X position? If so, you can write a script that loops over every file: crop, trim the excess white, pad with a small amount of white if desired, reduce to 4-bit gray, and then convert to PDF, without having to save intermediate files in any specific format if you don't want to.
See
http://www.imagemagick.org/Usage/crop/#crop
http://www.imagemagick.org/Usage/crop/#trim
http://www.imagemagick.org/Usage/crop/#border
http://www.imagemagick.org/script/comma ... colorspace
http://www.imagemagick.org/script/comma ... php#colors
http://www.imagemagick.org/Usage/quantize/#colors
http://www.imagemagick.org/script/comma ... ns.php#lat
Try something like one of these two commands, which split the image into two equal halves. The first reduces it to 4 gray shades. The second binarizes to black and white but removes most of the background gray.
convert page000.jpg -crop 2x1@ +repage -colorspace gray +dither -colors 4 page000a_%d.png
or
convert page000.jpg -crop 2x1@ +repage -colorspace gray -negate -lat 15x15+10% -negate page000b_%d.png
Or, to PDF:
convert page000.jpg -crop 2x1@ +repage -colorspace gray +dither -colors 4 page000a.pdf
or
convert page000.jpg -crop 2x1@ +repage -colorspace gray -negate -lat 15x15+10% -negate page000b.pdf
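The per-file loop described above might look like the following minimal sketch. It assumes every spread can be split at its midpoint, as in the example commands. The function names `process_page` and `process_all`, the 10% fuzz for trimming and the 20-pixel white border are illustrative choices, not values from this thread. Setting `CONVERT=echo` prints each command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: split one scanned spread into two halves, trim the
# near-white margins, pad back a small white border and reduce to 4 gray
# shades. Writes <name>_0.png and <name>_1.png next to the input.
# CONVERT may be overridden (e.g. CONVERT=echo) for a dry run.
process_page() {
  in=$1
  base=${in%.*}
  ${CONVERT:-convert} "$in" -crop 2x1@ +repage \
    -fuzz 10% -trim +repage \
    -bordercolor white -border 20 \
    -colorspace gray +dither -colors 4 \
    "${base}_%d.png"
}

# Hypothetical driver: process every .jpg in the current directory, then
# collect all half-pages (in file-name order) into a single PDF.
process_all() {
  for f in *.jpg; do
    [ -e "$f" ] || continue   # no .jpg files: the glob stays literal
    process_page "$f"
  done
  ${CONVERT:-convert} *_[01].png book.pdf
}
```

The PNG intermediates are kept so that individual pages can be checked before building the PDF. For a dry run inside a folder of scans, `CONVERT=echo process_all` lists the commands without touching any files.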
- anthony
- Location: Brisbane, Australia
Re: How to split page in two and reduce color?
Also, as the pages are scanned, they will be raster images, even when later saved as PDF.
See A word about Vector Image formats
http://www.imagemagick.org/Usage/formats/#vector
As such, you may want to consider exactly what resolution and pixel size for the final PDF (containing raster images) is most appropriate for the Kindle.
We cannot help you with the Kindle itself, but if you do discover that information, adding it here, or a pointer to it, would be helpful for future readers.
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
Re: How to split page in two and reduce color?
Thanks for the quick reply!
anthony wrote: what resolution and pixel size is most appropriate for kindle
The Kindle DX is 825 x 1200 px, 150 dpi, 4-bit gray (E-Ink, 9.7" diagonal, 5.5 x 8 in, 140 x 203 mm, 0.6875 aspect).
It has a very decent PDF reader by Adobe that does a good job at scaling.
It's important to cut 2-up pages and crop them to the bare bones, to get maximum reading size.
I even crop out the page number, and use Acrobat's "Number pages" to put the same numbers as in the original.
The JPEG above clearly uses bad scanning choices, but my kid scanned 160 pages, so I don't want to throw them away.
I'm on Windows 7 64-bit and have CYGWIN_NT-6.1-WOW64 1.7.9(0.237/5/3) 2011-03-29.
That includes package ImageMagick-6.4.0.6 (there isn't a more recent one at cygwin.com).
The command eats up a lot of memory, then gives an error:
$ convert page000.jpg -crop 2x1@ +repage -colorspace gray +dither -colors 4 page000a_%d.png
convert: UnableToConcatenateString `Cannot allocate memory'.
Warning: recursive semaphore lock detected!
Which binary distribution should I upgrade to? I find these at ftp://gd.tuwien.ac.at/pub/graphics/Imag ... /binaries/
ImageMagick-6.7.2-7-Q16-windows-dll.exe 16.8 MB 9/17/11 4:45:00 PM
ImageMagick-6.7.2-7-Q16-windows-static.exe 34.6 MB 9/17/11 4:46:00 PM
ImageMagick-6.7.2-7-Q16-windows-x64-dll.exe 16.5 MB 9/17/11 4:47:00 PM
ImageMagick-6.7.2-7-Q16-windows-x64-static.exe 36.5 MB 9/17/11 4:48:00 PM
ImageMagick-6.7.2-7-Q8-windows-dll.exe 16.8 MB 9/17/11 4:49:00 PM
ImageMagick-6.7.2-7-Q8-windows-static.exe 34.5 MB 9/17/11 4:50:00 PM
ImageMagick-6.7.2-Q16-windows.zip 43.1 MB 9/17/11 7:13:00 PM
ImageMagick-i686-pc-cygwin.tar.gz 43.0 MB 9/10/11 4:29:00 PM
What's Q8 vs Q16?
I'll now try ImageMagick-6.7.2-7-Q16-windows-x64-dll.exe
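Using the Kindle DX figures above (825 x 1200 px at 150 dpi, i.e. 825/150 = 5.5 in by 1200/150 = 8 in), here is a hedged sketch of sizing pages for the screen. `fit_size` only illustrates the fit-inside arithmetic that `-resize 825x1200` performs, rounded to whole pixels. For example, a 1700 x 2200 half-page is relatively wider than the screen, so the width limits it and it becomes 825 x 1068. `kindle_page` is a hypothetical helper. The assumption that the PDF page size follows from the 150 dpi density tag is labelled in its comment.

```shell
#!/bin/sh
# Kindle DX screen as reported in this thread: 825 x 1200 px at 150 dpi.
KW=825
KH=1200

# fit_size W H: print the largest size with the same aspect ratio as a
# W x H page that still fits inside KW x KH (rounded to whole pixels).
fit_size() {
  w=$1 h=$2
  if [ $((w * KH)) -gt $((h * KW)) ]; then
    # page is relatively wider than the screen: the width is the limit
    echo "${KW}x$(( (h * KW + w / 2) / w ))"
  else
    # otherwise the height is the limit
    echo "$(( (w * KH + h / 2) / h ))x${KH}"
  fi
}

# Hypothetical helper: shrink a page to fit the screen ('>' means only
# ever shrink, never enlarge) and tag it as 150 dpi. Assumption: the PDF
# writer derives the page size from this density, so a full 825 x 1200
# page would come out at about 5.5 x 8 in.
kindle_page() {
  ${CONVERT:-convert} "$1" -resize "${KW}x${KH}>" \
    -units PixelsPerInch -density 150 "$2"
}
```

Because the Kindle's PDF reader already scales well, whether pre-shrinking helps or only discards detail is something to try on the device.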
Re: How to split page in two and reduce color?
ImageMagick-6.7.2-7-Q16-windows-x64-dll.exe worked like a charm.
The result is satisfactory using both methods, and I have to try it on Kindle to choose one.
Original jpeg: 1.5M: http://personal.sirma.bg/vladimir/page000.jpg
+dither (gray): 200k: http://personal.sirma.bg/vladimir/page000a_1.png
-negate (binarized): 60k: http://personal.sirma.bg/vladimir/page000b_1.png
A minor point: page000a_1.png is reported as 8 bpp, not 4bpp:
- imagine.exe: 8 BPP
- identify -verbose: Depth: 8-bit; Channel depth: gray: 8-bit; Colors: 4
I guess this has minimal effect on file size since the empty bits are compressed away?
I'll now read up on the documentation to figure out the commands.
I may also try expanding to 300dpi before binarizing, to see if Kindle's PDF reader can do something good with the extra pixels.
Thanks for your help!!
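On the 8 bpp point, note that "4-bit gray" means 16 shades, matching the Kindle's display, while `-colors 4` keeps only 4 shades. The following is a hedged sketch that keeps 16 gray levels at a 4-bit depth instead. It assumes that your ImageMagick build's PNG writer honours `-depth 4`. To verify, run `identify -format '%z\n'` on an output file, which should then print 4. If your build ignores it, look up the PNG coder's `png:bit-depth` define in the format documentation. `gray4_page` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: split a spread and keep 16 gray levels (4-bit)
# rather than 4 shades. -depth 4 rounds each gray value to one of 16
# levels on output and (assumption) makes the PNG writer store 4 bits per
# pixel; check with: identify -format '%z\n' name_0.png
gray4_page() {
  ${CONVERT:-convert} "$1" -crop 2x1@ +repage \
    -colorspace gray -depth 4 "${1%.*}_%d.png"
}
```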
- anthony
- Location: Brisbane, Australia
Re: How to split page in two and reduce color?
I would probably try not to threshold the image, but preserve the anti-aliased edges.
Also, the image is 8 bpp because it is grayscale, not binary; 1 bpp is binary.
That first page, however, has color in it, which makes it non-grayscale.
What I would do is first try to clean up the background. For example, see Composite Division:
http://www.imagemagick.org/Usage/compose/#divide
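Here is a minimal sketch of the division idea. It assumes that a heavy blur of the page is a fair estimate of the paper shade, so dividing the page by it pushes the background toward white while leaving the text dark. The blur radius `0x20` is a guess to tune per scan, and `flatten_background` is a hypothetical name. The naming of compose operators has changed between versions. `Divide_Src` is intended here as page divided by blurred copy. If the output comes out almost entirely white, the operands are reversed in your version, so try `Divide_Dst` instead.

```shell
#!/bin/sh
# Hypothetical helper: even out the paper shade of a scan by dividing the
# grayscale page by a heavily blurred copy of itself (an estimate of the
# background). CONVERT=echo gives a dry run.
flatten_background() {
  ${CONVERT:-convert} "$1" -colorspace gray \
    \( +clone -blur 0x20 \) \
    -compose Divide_Src -composite "$2"
}
```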
At this point I might also try to remove any extra scan noise using -morphology Smooth Square (or perhaps just Open or Close instead of Smooth). Yes, morphology was designed with binary images in mind, but it works well with greyscale images too.
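The noise clean-up step might look like the following sketch. `despeckle_page` is a hypothetical name. As I understand IM's morphology methods, Smooth is an Open followed by a Close. With dark text on white paper, Open removes small light specks and Close removes small dark ones. The `Square` kernel defaults to 3x3, which may erase very thin strokes on low-resolution scans.

```shell
#!/bin/sh
# Hypothetical helper: light noise clean-up on a grayscale page.
# Third argument picks the method: Smooth (default), Open or Close.
despeckle_page() {
  method=${3:-Smooth}
  ${CONVERT:-convert} "$1" -morphology "$method" Square "$2"
}
```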
Now, to separate the pages I would use a technique of vertical compression, that is, use -resize {width}x1\! where {width} is the current image width. The result is a single row of pixels, each roughly the average of one column, which should let you determine the gap between the two pages algorithmically so you can separate them.
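The following sketch shows one way to turn that single row into a split position. `find_gutter` reads one gray value per column and returns the centre of the widest run of bright columns (threshold 240 by default) in the middle third of the row. If there is no such run, it falls back to the middle column. This assumes the gap between pages scans as white. If your scans instead show a dark shadow at the spine, look for the darkest run. The wrapper `split_at_gutter` is hypothetical. It dumps the row as raw 8-bit gray bytes with `gray:-` and prints them as decimals with `od`.

```shell
#!/bin/sh
# find_gutter [threshold]: read one gray value (0-255) per image column on
# stdin (whitespace separated) and print the index of the column at the
# centre of the widest bright run (>= threshold) in the middle third.
find_gutter() {
  tr -s ' \t' '\n\n' | awk -v t="${1:-240}" '
    NF { v[n++] = $1 }
    END {
      lo = int(n / 3); hi = int(2 * n / 3)
      best = 0; mid = int(n / 2); run = 0
      for (i = lo; i < hi; i++) {
        if (v[i] >= t) {
          if (run == 0) start = i
          run++
          if (run > best) { best = run; mid = start + int((run - 1) / 2) }
        } else {
          run = 0
        }
      }
      print mid
    }'
}

# Hypothetical wrapper (needs ImageMagick): squash the page to one row,
# locate the gutter, then crop the halves on either side of it.
split_at_gutter() {
  img=$1 base=${1%.*}
  w=$(identify -format '%w' "$img")
  h=$(identify -format '%h' "$img")
  g=$(convert "$img" -colorspace gray -resize "${w}x1"\! -depth 8 gray:- |
      od -An -v -tu1 | find_gutter)
  convert "$img" -crop "${g}x${h}+0+0" +repage "${base}_0.png"
  convert "$img" -crop "$((w - g))x${h}+${g}+0" +repage "${base}_1.png"
}
```

Restricting the search to the middle third keeps the white outer margins from being mistaken for the gutter.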
After that it is just an optional -deskew and saving the page images as you like.
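Here is a sketch of that last step. The IM documentation suggests a `-deskew` threshold of 40% works for most images. The white `-background` fills the corners exposed by the rotation, and a second trim removes the extra margin. `straighten_page` is a hypothetical name, and the 10% fuzz is a guess.

```shell
#!/bin/sh
# Hypothetical helper: straighten one page, filling exposed corners with
# white, then re-trim the near-white margins left by the rotation.
straighten_page() {
  ${CONVERT:-convert} "$1" -background white -deskew 40% +repage \
    -fuzz 10% -trim +repage "$2"
}
```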
NOTE: all of the above has been built into specialised page-scanning software. ImageMagick provides low-level tools to do things yourself exactly as you like, but other software may be better suited to this specialised task. And yes, there are free versions too.
One free option I have found is Scan Tailor:
http://scantailor.sourceforge.net/
I have not tried it but it seems to be something like what you are after.
Also see the DIY Book scan Forum for Scan Tailor (or other book scanning software!)
http://www.diybookscanner.org/forum/viewforum.php?f=8
I have noticed that ImageMagick is mentioned regularly in those forums as a low-level image processor that a number of book scanners use for their tasks.
Re: How to split page in two and reduce color?
ScanTailor worked perfectly!