Hi All,
Is it possible to keep an OCR layer in a converted PDF?
More specifically, I have a hi-res PDF that contains an OCR layer. I would like to convert the PDF to a lower resolution, while keeping the OCR layer that was generated based on the hi-res version. I tried the following command:
$ convert -density 300x300 -quality 5 -compress jpeg file.pdf newFile.pdf
However, this seems to strip out the OCR layer. Is there any command line switch that will tell convert to keep the OCR? Or do I have to generate a new OCR layer after performing the conversion?
Thanks!
-Joe Paxton
OCR layer in converted PDF?
- anthony
Re: OCR layer in converted PDF?
ImageMagick uses Ghostscript to interpret a PDF file (which is typically a vector format), so it only ever sees the rasterized pages.
As such, it either never receives the 'OCR' information at all, or that information was rendered into the raster along with everything else.
Which of the two happens, I have no idea.
Basically you are looking at the wrong tool for PDFs, though it is the right tool if you want to modify the scanned raster image, either before or after OCR. It just does not understand PDF files as PDF.
Perhaps you could keep the two parts (the scanned image and the OCR text) separate and recombine them later.
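A different route from splitting and recombining, offered only as a rough sketch: since Ghostscript is doing the PDF work anyway, you could call its pdfwrite device directly so the file is rewritten as a PDF rather than rasterized. The function name downsample_pdf and the 150 dpi default are made up for illustration, and whether an invisible OCR text layer survives will likely depend on how it was embedded and on your Ghostscript version, so check the result.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not an ImageMagick or
# Ghostscript command): downsample the images inside a PDF with
# Ghostscript's pdfwrite device instead of rasterizing whole pages.
downsample_pdf() {
    src=$1
    dst=$2
    dpi=${3:-150}   # assumed default target resolution

    # Refuse to run on a missing input rather than write an empty output.
    if [ ! -f "$src" ]; then
        echo "downsample_pdf: no such file: $src" >&2
        return 1
    fi

    # pdfwrite re-emits the page content as PDF; the image settings below
    # ask it to downsample embedded images to roughly $dpi.
    gs -q -dNOPAUSE -dBATCH -dSAFER \
       -sDEVICE=pdfwrite \
       -dDownsampleColorImages=true -dColorImageResolution="$dpi" \
       -dDownsampleGrayImages=true  -dGrayImageResolution="$dpi" \
       -dDownsampleMonoImages=true  -dMonoImageResolution="$dpi" \
       -sOutputFile="$dst" "$src"
}
```

Usage would be something like `downsample_pdf file.pdf newFile.pdf 150`. Afterwards, running `pdftotext newFile.pdf -` (from poppler or xpdf, if installed) should print the recognized text if the OCR layer made it through.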
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/