OCR layer in converted PDF?

jpaxton
Posts: 1
Joined: 2013-05-18T11:18:53-07:00

Post by jpaxton »

Hi All,

Is it possible to keep an OCR layer in a converted PDF?

More specifically, I have a hi-res PDF that contains an OCR layer. I would like to convert the PDF to a lower resolution, while keeping the OCR layer that was generated based on the hi-res version. I tried the following command:

$ convert -density 300x300 -quality 5 -compress jpeg file.pdf newFile.pdf

However, this seems to strip out the OCR layer. Is there any command line switch that will tell convert to keep the OCR? Or do I have to generate a new OCR layer after performing the conversion?

Thanks!
-Joe Paxton
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Location: Brisbane, Australia

Re: OCR layer in converted PDF?

Post by anthony »

ImageMagick uses Ghostscript to interpret a PDF (which is typically a vector format) and render it to a raster image.
As such, ImageMagick either never receives the 'OCR' information at all, or, if it does, that text has already been converted to raster along with the rest of the page!

Which of the two happens, I have no idea.

Basically you are looking at the wrong tool for PDFs. It is the right tool, though, if you want to modify the scanned raster image, either before or after OCR. It just does not understand PDF files as PDF.

Perhaps you could keep the two parts (the scanned raster image and the OCR text) separate and recombine them later.
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
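
Since the reply above notes that ImageMagick hands the PDF to Ghostscript and gets back only a raster, one thing that may be worth trying is calling Ghostscript directly with its pdfwrite device, which rewrites the PDF and downsamples the embedded images instead of rasterizing whole pages. The following is only a sketch under assumptions: downsample_pdf is a hypothetical helper name, the 150 dpi default is arbitrary, and whether the OCR text layer actually survives depends on how the original PDF was built, so check the output (for example by searching or selecting text in a viewer).

```shell
#!/bin/sh
# Sketch: downsample the images inside a PDF with Ghostscript's pdfwrite device.
# downsample_pdf is a hypothetical helper, not part of ImageMagick or Ghostscript.
# Usage: downsample_pdf INPUT.pdf OUTPUT.pdf [DPI]
# Set DRY_RUN=1 to print the gs command instead of running it.
downsample_pdf() {
    src=$1
    dst=$2
    dpi=${3:-150}   # assumed default target resolution

    # Build the Ghostscript command line as the positional parameters.
    set -- gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dQUIET \
        -dDownsampleColorImages=true -dColorImageResolution="$dpi" \
        -dDownsampleGrayImages=true -dGrayImageResolution="$dpi" \
        -dDownsampleMonoImages=true -dMonoImageResolution="$dpi" \
        -sOutputFile="$dst" "$src"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"    # show what would be run
    else
        "$@"         # run Ghostscript
    fi
}
```

For example, `DRY_RUN=1 downsample_pdf file.pdf newFile.pdf 100` prints the command without touching any files, and dropping DRY_RUN=1 runs it. Ghostscript only downsamples images whose resolution is sufficiently above the requested value, so small reductions may leave the file unchanged.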