Hi,
I have to extract some pages from a PDF file to TIFF for the company I'm working for. The command
convert -compress zip -density 300x300 +adjoin file.pdf[1] output1.tif
results in a TIFF file with a transparent frame around the rest of the image. I think this is because of the OCR Acrobat did. The source was a scanned image that was imported into Acrobat, which ran OCR on it to make the PDF searchable. Acrobat also straightened the page.
How can I prevent that every time I convert a PDF to TIFF?
I'm on Ubuntu and I use ImageMagick 6.9.7.
Disable transparent background
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Disable transparent background
You have "a frame of transparent around the rest of the image". What do you want instead? You might "-trim" to remove it. Or flatten against a white background (or any colour you want): "-background white -layers flatten".
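As a rough sketch, the two suggestions could be tried like this. The function names are made up for illustration, the remaining options are copied from the first post, and which variant is appropriate depends on what the frame actually is.

```shell
# Option 1: cut the uniform border away.
# +repage resets the canvas offset that -trim leaves behind.
# $1 = input such as 'file.pdf[1]', $2 = output file
trim_page() {
  convert -density 300x300 "${1}" -trim +repage -compress zip "${2}"
}

# Option 2: keep the page size and composite the page over a white background.
flatten_page() {
  convert -density 300x300 "${1}" -background white -layers flatten -compress zip "${2}"
}
```

Usage would be, for example, `flatten_page 'file.pdf[1]' output1.tif`. The quotes keep the shell from interpreting the square brackets.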
snibgo's IM pages: im.snibgo.com
Re: Disable transparent background
When I open the PDF there is no such frame. The background is white like the rest of the page (except the black text). What I want is to output the page as I see it in the PDF.
Re: Disable transparent background
You might try one of the pdf defines, eg use-trimbox. See http://www.imagemagick.org/script/comma ... php#define
snibgo's IM pages: im.snibgo.com
Re: Disable transparent background
What works is pdf:use-cropbox=true, but I don't understand why. There isn't anything cropped.
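For reference, a minimal sketch of the full command with that define. The define is placed before the input file because it affects how the PDF is read. The function name and argument layout are assumptions for illustration, and the other options are taken from the first post.

```shell
# Render one PDF page to TIFF using the page's CropBox.
# $1 = PDF file, $2 = zero-based page index (ImageMagick's [n] syntax), $3 = output file
pdf_page_to_tiff_cropbox() {
  convert -define pdf:use-cropbox=true -density 300x300 \
    "${1}[${2}]" -compress zip "${3}"
}
```

Called as `pdf_page_to_tiff_cropbox file.pdf 1 output1.tif`. A likely explanation for why it helps, though not confirmed for this particular file: PDF viewers typically show only the CropBox region, while the default rendering covers a larger page area, and the part outside the CropBox has nothing painted on it.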
Re: Disable transparent background
Flokker wrote: There isn't anything cropped.
I suspect there is, and that if you read the PDF with a text editor you will see a "/CropBox" specification.
snibgo's IM pages: im.snibgo.com
Re: Disable transparent background
I cannot open the PDF with a text editor. It's a PDF, not a text file.
Is there no other way to simply extract the pdf "as they are"?
Re: Disable transparent background
In Windows, PDF files can be opened with Microsoft Wordpad to view the file as raw text. I expect Unix has similar tools.
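On Ubuntu, one way to do the same without an editor is to search the file for the box entries with grep. This is only a sketch: the function name is invented here, and it finds nothing if the page dictionaries are stored in compressed object streams, which some PDF writers do.

```shell
# List the page-box entries that appear as plain text in a PDF, with a count of each.
# -a treats the binary file as text, -o prints only the matching part,
# -E enables the extended regex used for the alternation.
show_pdf_boxes() {
  grep -a -o -E '/(Media|Crop|Trim|Bleed|Art)Box *\[[^]]*\]' "${1}" | sort | uniq -c
}
```

Running `show_pdf_boxes file.pdf` and seeing a `/CropBox` that is smaller than the `/MediaBox` would support the explanation above. If poppler-utils is installed, `pdfinfo -box file.pdf` reports the boxes as well.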
I don't know what "as they are" means. If the PDF has a cropbox, but also has content outside the cropbox, which version is the "real" one? The content might be registration marks that would be cut off a printed paper version. IM gives you the choice: use a cropbox (if the PDF has one) or don't.
And, of course, PDF files are vector. There is no definitive raster version.
If you need to convert PDF files, I suggest you read up on the Ghostscript documentation. You might decide to use Ghostscript directly.
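A direct Ghostscript call might look roughly like the sketch below. The function name is made up, the 300 dpi resolution is carried over from the first post, and `tiff24nc` (24-bit RGB, no alpha channel) is just one of several TIFF devices, chosen here as an assumption about what is wanted.

```shell
# Render a single PDF page to TIFF with Ghostscript, honouring the CropBox.
# $1 = PDF file, $2 = one-based page number, $3 = output TIFF
gs_page_to_tiff() {
  gs -q -dSAFER -dBATCH -dNOPAUSE \
    -sDEVICE=tiff24nc -r300 \
    -dUseCropBox \
    -dFirstPage="${2}" -dLastPage="${2}" \
    -sOutputFile="${3}" "${1}"
}
```

Note the page numbering: Ghostscript counts pages from 1, whereas ImageMagick's `file.pdf[1]` is zero-based, so `gs_page_to_tiff file.pdf 2 output1.tif` addresses the same page as the command in the first post.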
snibgo's IM pages: im.snibgo.com
Re: Disable transparent background
Works with -alpha remove
What I mean is that I want to extract every page as a single image, so that the image looks like the page when I open it with a PDF viewer, like when I take a screenshot of the page.
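The post does not show the full command that worked, so the following is a guess at what it may have looked like: the original command with `-alpha remove` added after the input, plus an explicit white background. The function name is hypothetical.

```shell
# Render one PDF page to TIFF and replace transparency with white.
# -background sets the colour that -alpha remove blends the image against.
# $1 = PDF file, $2 = zero-based page index, $3 = output file
pdf_page_to_tiff() {
  convert -density 300x300 "${1}[${2}]" \
    -background white -alpha remove \
    -compress zip "${3}"
}
```

For example, `pdf_page_to_tiff file.pdf 1 output1.tif`. Unlike `-trim`, this keeps the rendered page size and only fills the transparent area.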
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Disable transparent background
Flokker wrote: Is there no other way to simply extract the pdf "as they are"?
What are you asking? What do you mean by "as they are"?
Some PDF files are totally vector files. Some are raster files embedded in a vector PDF shell. ImageMagick is a raster-only processor. It uses Ghostscript to rasterize any PDF. Thus no vectors remain, only pixels.
You can extract every page of a PDF into individual images.
convert image.pdf +adjoin image.suffix
where suffix can be JPG or PNG, etc.
If you want raw editable text, then use some other tool.