reducing "size"/complexity of scanned books in PDF format ?

derek.eder
Posts: 1
Joined: 2016-02-23T03:21:11-07:00


Post by derek.eder »

Problem: Old books in internet "libraries" such as Project Gutenberg are often scanned in color and distributed as bitmap-based PDF files. These files are large and practically unreadable on devices like Android tablets because each page takes an extremely long time to load and render.

Question: Is ImageMagick an appropriate tool for reducing the size and computational footprint of such a PDF document?

I imagine that even a grayscale conversion would be a good start. Perhaps bitmap-to-vector conversion as well?

Thank you.
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: reducing "size"/complexity of scanned books in PDF format ?

Post by snibgo »

Personally, I dislike images stored as one image per page in PDF documents (if that is what these are). It adds an extra layer of complexity, with no benefit, and makes it harder to see what is really happening. So the first step is to extract them into image files, probably with "pdfimages".
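
A minimal sketch of that extraction step, assuming the Poppler "pdfimages" tool is installed and on the PATH. The function names, the output directory, and the "page" prefix are hypothetical and only serve the illustration; "-png" asks pdfimages to write PNG files rather than its default PBM/PPM output.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, out_dir, prefix="page"):
    """Build the pdfimages command line (hypothetical helper)."""
    # pdfimages names its output <prefix>-000.png, <prefix>-001.png, ...
    return ["pdfimages", "-png", str(pdf_path), str(Path(out_dir) / prefix)]


def extract_images(pdf_path, out_dir, prefix="page"):
    """Run pdfimages and return the extracted files in numeric order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_command(pdf_path, out_dir, prefix), check=True)
    files = Path(out_dir).glob(f"{prefix}-*.png")
    # Sort on the numeric suffix so the order stays right past 999 images.
    return sorted(files, key=lambda p: int(p.stem.rsplit("-", 1)[1]))
```

Calling extract_images("book.pdf", "pages") would then leave one PNG per embedded image in the "pages" directory, provided the PDF really does hold one bitmap per page.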

Converting to grayscale may reduce size or processing time. Perhaps the images are high quality and losslessly compressed. If so, converting to lower-quality JPEG may dramatically improve performance, with no noticeable loss of quality.
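
As a rough sketch of the grayscale and JPEG conversion, assuming the page images already exist as PNG files in one directory and that ImageMagick's "convert" is on the PATH (ImageMagick 7 calls the same tool "magick"). The function names and the quality value of 60 are assumptions for illustration, not recommended settings; the right quality depends on the scans.

```python
import subprocess
from pathlib import Path


def grayscale_jpeg_command(src, dst, quality=60):
    """Build the ImageMagick command line (hypothetical helper)."""
    # -colorspace Gray drops the colour information;
    # -quality sets JPEG compression (1-100, lower means smaller files).
    return ["convert", str(src), "-colorspace", "Gray",
            "-quality", str(quality), str(dst)]


def convert_pages(image_dir, quality=60):
    """Write a grayscale JPEG next to every PNG in image_dir."""
    converted = []
    for src in sorted(Path(image_dir).glob("*.png")):
        dst = src.with_suffix(".jpg")
        subprocess.run(grayscale_jpeg_command(src, dst, quality), check=True)
        converted.append(dst)
    return converted
```

Comparing file sizes and legibility at a few quality values on a handful of pages is probably the quickest way to find an acceptable setting.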
snibgo's IM pages: im.snibgo.com