
reducing "size"/complexity of scanned books in PDF format?

Posted: 2016-02-23T04:30:07-07:00
by derek.eder
Problem: Old books in internet "libraries" such as Project Gutenberg are often scanned in color to create bitmap-based PDF files. These files are large and practically unreadable on devices like Android tablets, because each page takes an extremely long time to process and load.

Question: Is ImageMagick an appropriate tool to reduce the size and computational footprint of such a PDF document?

I imagine that even converting the colorspace to grayscale would be a good start. Or converting the bitmaps to vector graphics? ...
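As a rough check on the grayscale idea: the raw (uncompressed) size of a scanned page scales with the number of color channels, since 8-bit RGB stores 3 bytes per pixel and 8-bit grayscale stores 1. The US Letter page size and 300 dpi below are assumptions for illustration, not figures from this thread, and the size inside a real PDF also depends heavily on compression, so this only compares raw pixel data.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, channels, bits_per_channel=8):
    """Raw size in bytes of one scanned page before any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed example: a US Letter page (8.5 x 11 in) scanned at 300 dpi.
color = uncompressed_page_bytes(8.5, 11, 300, channels=3)  # 8-bit RGB
gray = uncompressed_page_bytes(8.5, 11, 300, channels=1)   # 8-bit grayscale
print(f"color: {color / 1e6:.1f} MB, gray: {gray / 1e6:.1f} MB")
# prints "color: 25.2 MB, gray: 8.4 MB"
```

So dropping color cuts the raw pixel data to one third before compression is even considered.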

Thank you.

Re: reducing "size"/complexity of scanned books in PDF format?

Posted: 2016-02-23T10:02:23-07:00
by snibgo
Personally, I dislike images stored as one image per page in PDF documents (if that is what these are). It adds an extra layer of complexity, with no benefit, and makes it harder to see what is really happening. So the first step is to extract them into image files, probably with "pdfimages".
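A minimal sketch of the extraction step, assuming poppler's `pdfimages` is installed and on the PATH. The function name and the `page` output prefix are hypothetical; `-png` tells `pdfimages` to write every image as PNG (files named like `page-000.png`), and `-all` is an alternative flag that keeps JPEGs and other formats in their native encoding. The command is returned as a list so it can be checked without running anything.

```python
import shutil
import subprocess
from pathlib import Path


def extract_page_images(pdf_path, out_dir, run=True):
    """Extract the embedded page images of a PDF with pdfimages (hypothetical helper)."""
    out_dir = Path(out_dir)
    # "-png" writes lossless PNG files named <prefix>-NNN.png.
    cmd = ["pdfimages", "-png", str(pdf_path), str(out_dir / "page")]
    if run:
        if shutil.which("pdfimages") is None:
            raise FileNotFoundError("pdfimages (poppler-utils) is not on PATH")
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `extract_page_images("book.pdf", "pages")` would leave one PNG per embedded image in `pages/`, which can then be inspected or re-encoded individually.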

Converting to grayscale may reduce size or processing time. Perhaps the images are high-quality and losslessly compressed. If so, converting them to lower-quality JPEG may dramatically improve performance with no noticeable loss of quality.
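A sketch of the grayscale/JPEG suggestion, assuming ImageMagick 6's `convert` command is available (ImageMagick 7 uses `magick` instead). The helper name and the default quality of 60 are assumptions to tune by eye on a few pages; `-colorspace Gray` drops color and `-quality` sets the JPEG compression level.

```python
import subprocess
from pathlib import Path


def to_gray_jpeg(src, quality=60, run=True):
    """Re-encode one extracted page as a grayscale JPEG (hypothetical helper)."""
    if not 1 <= quality <= 100:
        raise ValueError("JPEG quality must be between 1 and 100")
    dst = Path(src).with_suffix(".jpg")
    # -colorspace Gray removes color; -quality controls JPEG compression.
    cmd = ["convert", str(src), "-colorspace", "Gray",
           "-quality", str(quality), str(dst)]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```

Applied in a loop such as `for png in sorted(Path("pages").glob("page-*.png")): to_gray_jpeg(png)`, this produces one smaller JPEG per page; comparing file sizes and readability at a couple of quality settings is the practical way to pick the value.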