
OCR layer in converted PDF?

Posted: 2013-05-18T11:26:46-07:00
by jpaxton
Hi All,

Is it possible to keep an OCR layer in a converted PDF?

More specifically, I have a hi-res PDF that contains an OCR layer. I would like to convert the PDF to a lower resolution, while keeping the OCR layer that was generated based on the hi-res version. I tried the following command:

$ convert -density 300x300 -quality 5 -compress jpeg file.pdf newFile.pdf

However, this seems to strip out the OCR layer. Is there any command line switch that will tell convert to keep the OCR? Or do I have to generate a new OCR layer after performing the conversion?

Thanks!
-Joe Paxton

Re: OCR layer in converted PDF?

Posted: 2013-05-20T00:00:14-07:00
by anthony
ImageMagick uses Ghostscript to interpret a PDF (which is typically a vector format) and render it as a raster image.
As such, it either never receives the 'OCR' information, or, if it does, that information has already been converted to raster!

Which of the two happens, I have no idea.
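
One way to see the effect is to test whether any text can still be extracted from a PDF before and after the conversion. This is only a minimal sketch: it assumes pdftotext from poppler-utils is installed (it is not part of ImageMagick), and the function name has_text_layer is made up for illustration.

```shell
#!/bin/sh
# has_text_layer FILE
# Returns 0 if FILE contains extractable text (for example an OCR layer),
# 1 if no text could be extracted, 2 if the check could not be run.
# Assumption: pdftotext (poppler-utils) is available on the PATH.
has_text_layer() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 2; }
    command -v pdftotext >/dev/null 2>&1 || { echo "pdftotext not found" >&2; return 2; }
    # Write the extracted text to stdout ("-") and succeed only if it
    # contains at least one letter or digit.
    pdftotext "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}
```

Calling has_text_layer on file.pdf and then on newFile.pdf should show whether the OCR text survived; for output written by convert, the expectation from the explanation above is that it did not.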

Basically, you are looking at the wrong tool for PDFs, though it is the right tool if you want to modify the scanned raster image, either before or after OCR. It just does not understand PDF files as PDF.

Perhaps you could keep the two parts separate and recombine them later.
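
A rough sketch of what "separate and recombine" could look like follows. It is not something ImageMagick does on its own: it assumes a reasonably recent Ghostscript (for the -dFILTERIMAGE option of the pdfwrite device) and qpdf (for --overlay), and the function name shrink_keep_ocr is hypothetical. Whether the text lines up with the re-rendered pages depends on both PDFs ending up with the same page sizes, so treat the result as something to verify rather than a guaranteed fix.

```shell
#!/bin/sh
# shrink_keep_ocr IN.pdf OUT.pdf
# Assumptions: gs, convert and qpdf are installed; IN.pdf is a scanned
# PDF whose OCR text is stored as (invisible) text on top of page images.
shrink_keep_ocr() {
    in=$1
    out=$2
    [ -f "$in" ] || { echo "no such file: $in" >&2; return 2; }
    for tool in gs convert qpdf; do
        command -v "$tool" >/dev/null 2>&1 || { echo "$tool not found" >&2; return 3; }
    done
    tmp=$(mktemp -d) || return 1

    # Part 1: copy the PDF without its images, leaving the text layer.
    gs -q -o "$tmp/text.pdf" -sDEVICE=pdfwrite -dFILTERIMAGE "$in" \
        || { rm -rf "$tmp"; return 1; }

    # Part 2: the lossy raster conversion from the original post.
    convert -density 300x300 -quality 5 -compress jpeg "$in" "$tmp/images.pdf" \
        || { rm -rf "$tmp"; return 1; }

    # Recombine: lay the text-only pages over the low-quality image pages.
    qpdf "$tmp/images.pdf" --overlay "$tmp/text.pdf" -- "$out"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

Usage would be along the lines of shrink_keep_ocr file.pdf newFile.pdf, after which it is worth confirming in a viewer that the text in newFile.pdf can still be selected and searched.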