I have a book scan (a PDF, scanned at 300 dpi) whose edges I want to chop off. Normally I use:
convert -gravity West -chop 150x0 test.pdf test2.pdf
However, for this particular file, a nice-looking scan is turned into a result so poor that it is difficult to read and impossible to run OCR on.
I have checked the DPI of the original and the result file like this:
identify -format "%w x %h %x x %y" test.pdf
identify -format "%w x %h %x x %y" test2.pdf
and the original is: 1190 x 841 72 x 72
the result is: 1040 x 841 72 x 72
That is what I expected, I guess (although I don't understand the 72, as I scanned at 300). But the result still has very poor resolution; it looks even worse than the 200 dpi 'abcde' on this page:
https://www.library.cornell.edu/preserv ... on-04.html
Any help is very much appreciated! I am a beginner with IM so maybe this is a stupid question, but I really don't understand what is going on.
convert changes resolution unwanted
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: convert changes resolution unwanted
A number of points.
1. If you do the scanning, don't wrap the bitmaps inside a PDF.
2. When reading PDFs with IM, you almost certainly need a "-density" setting.
3. If you have bitmaps wrapped in a PDF, the easiest way to get them out is with pdfimages. This can give you the exact pixels that are in the embedded bitmap, without messing around with "-density".
4. As you are reading and writing PDFs, working directly with Ghostscript may be easier. But these are not vector images, so you won't lose quality by working with bitmaps.
Identify shows "72 dpi" for the PDF wrapper, not for any embedded bitmaps. And the size it shows is for the PDF page at the specified density, NOT the number of pixels in any embedded bitmaps.
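To make the 72 versus 300 point concrete, here is a rough sketch of the arithmetic, assuming the scan really is 300 dpi and that the 1190 x 841 reported by identify is the page size at 72 units per inch. The file names are the ones from the first post; the variable names are only for illustration. Note that "-density" has to come before the input file name to affect how the PDF is rasterized.

```shell
#!/bin/sh
# Page size as reported by identify at the default 72 dpi.
page_w_pt=1190
page_h_pt=841
scan_dpi=300   # assumption: the resolution the scanner was set to

# Pixel dimensions of the page when rasterized at scan_dpi (integer maths).
page_w_px=$(( page_w_pt * scan_dpi / 72 ))
page_h_px=$(( page_h_pt * scan_dpi / 72 ))

# The 150-pixel chop used at 72 dpi, scaled up to scan_dpi.
chop_px=$(( 150 * scan_dpi / 72 ))

echo "raster size: ${page_w_px} x ${page_h_px}, chop: ${chop_px}"

# -density is a read setting, so it goes before the input file.
# Only runs when the input file is actually present.
if [ -f test.pdf ]; then
  convert -density "$scan_dpi" test.pdf -gravity West -chop "${chop_px}x0" test2.pdf
fi
```

With these numbers the page rasterizes to 4958 x 3504 pixels and the chop becomes 625 pixels. This still re-renders the page through Ghostscript, so it may not reproduce the embedded bitmap pixel for pixel.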
snibgo's IM pages: im.snibgo.com
Re: convert changes resolution unwanted
Thanks snibgo.
Creating PDFs is the only way my current facilities allow me to scan. I have fiddled around with -density (72x72, 300x300), but to no avail. I installed and tried Ghostscript yesterday, but it is not obvious to me how to use the relevant -dUseCropBox option. I am on a Mac and documentation is limited.
For instance, this command basically goes through each page of my file and saves it as cropped.pdf:
gs -sDEVICE=pdfwrite -dUseCropBox -sOutputFile=cropped.pdf - < to_be_cropped.pdf
But it obviously doesn't crop anything, because I haven't specified what to crop.
For now, I am cropping in Preview and then running the file through Ghostscript, in the hope that this actually gets rid of the black photocopy margins (in Preview, crop just hides the margins). I would really like to be able to do it in IM or Ghostscript. It's really frustrating that my previous solution doesn't work for this file; it makes me apprehensive about finding another solution, because this will happen more often.
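For reference, one way a crop box can be specified in Ghostscript is with a pdfmark passed via -c. This is only a sketch: it assumes every page is 1190 x 841 points (the size identify reported for test.pdf, which may not hold for to_be_cropped.pdf) and that 150 points should come off the left edge. As far as I know, a CropBox only changes the visible area, much like Preview's crop, so the margin pixels would still be in the file.

```shell
#!/bin/sh
# Assumed page size in points and the amount to trim from the left edge.
page_w_pt=1190
page_h_pt=841
chop_pt=150

# CropBox is given as: left bottom right top (in points).
crop_box="${chop_pt} 0 ${page_w_pt} ${page_h_pt}"
echo "CropBox: [${crop_box}]"

# Apply the same CropBox to all pages while rewriting the PDF.
# Only runs when the input file is actually present.
if [ -f to_be_cropped.pdf ]; then
  gs -o cropped.pdf -sDEVICE=pdfwrite \
     -c "[/CropBox [${crop_box}] /PAGES pdfmark" \
     -f to_be_cropped.pdf
fi
```

If the pages differ in size, or the file already carries its own CropBox entries, the result may not be what is wanted; in that case cropping the extracted bitmaps is probably the more dependable route.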
Re: convert changes resolution unwanted
If you put an example PDF somewhere like dropbox.com and paste a link here, someone can take a look.
If they are scanned at one bitmap per page, the easiest and most effective processing may be to extract them with pdfimages, then manipulate them with IM. If you want to bundle them back into a PDF, you could use IM (driving Ghostscript) for that.
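The extract, manipulate and re-bundle route described above might look roughly like this. It is a sketch under several assumptions: one bitmap per page, bitmaps that really are 300 dpi, and a 625-pixel chop (150 pixels at 72 dpi scaled to 300). The file names and the "page" prefix are placeholders, and the chop width should be checked against the actual pixel size of the extracted images.

```shell
#!/bin/sh
in_pdf="test.pdf"        # placeholder input name
out_pdf="cropped.pdf"    # placeholder output name
scan_dpi=300             # assumption: resolution of the embedded bitmaps
chop_px=$(( 150 * scan_dpi / 72 ))   # 150 px at 72 dpi, scaled to scan_dpi

crop_scan() {
  # 1. Pull the embedded bitmaps out unchanged: page-000.png, page-001.png, ...
  pdfimages -png "$in_pdf" page
  # 2. Chop the left margin off every extracted bitmap, in place.
  mogrify -gravity West -chop "${chop_px}x0" page-*.png
  # 3. Bundle the bitmaps into a PDF, recording the scan resolution.
  convert page-*.png -units PixelsPerInch -density "$scan_dpi" "$out_pdf"
}

echo "chop width in pixels: ${chop_px}"

# Only runs when the input file is actually present.
if [ -f "$in_pdf" ]; then
  crop_scan
fi
```

Because the bitmaps never pass through a PDF rasterizer, the pixels that remain should be the ones the scanner produced. Setting "-density" after the images are read only records the resolution in the output; it does not resample anything.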
Re: convert changes resolution unwanted
Sorry for getting back to this after all this time. I get the same problem with all scans made by this scanner.
I've put an example here:
original:
https://www.dropbox.com/s/0k575roq7bp4i ... W.pdf?dl=0
and the output that I get with simply trying: convert Welsh_AiW.pdf test.pdf
https://www.dropbox.com/s/cwr3379uzk7wyhl/test.pdf?dl=0
If anybody can help with this issue, that would be wonderful.
(For now I have cropped the file in Preview, but that isn't a solution as the margins may pop up elsewhere.)