
Crop image from pdf

Posted: 2016-10-10T13:07:23-07:00
by ggoutfitters
Hello!

I am trying to crop an image from a pdf and want to figure out the best way to do so.

PDF1: https://drive.google.com/open?id=0B1r4d ... U9OUG1XMGM
PDF2: https://drive.google.com/open?id=0B1r4d ... mRpTGhJbW8

How could I crop out the logo (with the best quality possible) and center it both vertically and horizontally?

Any help on this would be great!

Thanks!

Re: Crop image from pdf

Posted: 2016-10-10T14:18:46-07:00
by fmw42
Assuming all your PDF files have the same layout, try the following (Unix syntax). On Windows, replace the line-continuation \ with ^, and in a batch script you may have to double each % to %%.

Code:

convert -density 288 "EMBARTWORK-120530 BLACKSHORE LOGO.PDF" \
-crop 2446x1600+0+275 +repage -trim +repage -resize 25% "EMBARTWORK-120530 BLACKSHORE LOGO.PNG"

convert -density 288 "EMBARTWORK-124627 ENGAGE US 15.PDF" \
-crop 2446x1600+0+275 +repage -trim +repage -resize 25% "EMBARTWORK-124627 ENGAGE US 15.PNG"

Please always provide your IM version and platform when asking questions, since syntax may differ. Also provide your exact command line and, if possible, your images.

See the top-most post in this forum "IMPORTANT: Please Read This FIRST Before Posting" at viewtopic.php?f=1&t=9620

For novices, see

viewtopic.php?f=1&t=9620
http://www.imagemagick.org/script/comma ... essing.php
http://www.imagemagick.org/Usage/reference.html
http://www.imagemagick.org/Usage/


EDIT: I forgot to add the following and was reminded by snibgo's message below:

Since vector files have no intrinsic pixel size, you have to specify a density when rasterizing them. The higher the density, the larger the PNG will be, so use a -density above the default of 72 to get a larger file and more detail. If you want to keep about the same size as you see in the PDF, multiply the density by some factor (here 4, giving 288), then resize by the inverse (1/4 = 25%). The larger the density, the more detail but the longer the processing time, so try different values.
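
As a small illustration of the factor/inverse arithmetic described above, here is a Python sketch. The helper name supersample_settings is hypothetical, and the base density of 72 is taken from the explanation above.

```python
def supersample_settings(factor, base_density=72):
    """Return the (-density value, -resize percentage) pair for a given factor.

    Rendering at base_density * factor and then resizing by 1/factor keeps
    roughly the original page size while rasterizing with more detail.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    density = base_density * factor
    resize = f"{100 / factor:g}%"  # e.g. factor 4 -> "25%"
    return density, resize


print(supersample_settings(4))
```

With a factor of 4 this gives the 288 and 25% used in the commands above.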

EDIT2: Note that the crop dimensions used above are based upon using -density 288. If you change the density, then you will have to compute new crop dimensions.
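
To make the EDIT2 point concrete, here is a minimal Python sketch that rescales a crop geometry when the density changes. The helper name scale_geometry is hypothetical. It assumes a plain WxH+X+Y geometry with non-negative offsets and rounds to whole pixels, so the result may be off by a pixel and is worth checking visually.

```python
import re


def scale_geometry(geometry, old_density, new_density):
    """Rescale a 'WxH+X+Y' crop geometry from one -density to another."""
    match = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", geometry)
    if match is None:
        raise ValueError(f"unsupported geometry: {geometry!r}")
    factor = new_density / old_density
    # Every dimension and offset scales linearly with the density.
    w, h, x, y = (round(int(v) * factor) for v in match.groups())
    return f"{w}x{h}+{x}+{y}"


# The crop used above was measured at -density 288; doubling the density
# doubles every number in the geometry.
print(scale_geometry("2446x1600+0+275", 288, 576))
```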

Re: Crop image from pdf

Posted: 2016-10-10T14:36:38-07:00
by snibgo
As Fred says. However, "with the best quality possible" is fairly meaningless, as the image is vector with lots of detail. More detail is revealed with "-density 600".

Re: Crop image from pdf

Posted: 2016-10-10T19:43:01-07:00
by fmw42
See my edits in my earlier post above.

Re: Crop image from pdf

Posted: 2016-10-11T06:50:43-07:00
by ggoutfitters
Awesome, the solution above gives me the end result I was looking for!

Now is there a way to do the same thing using python instead of the command line?

I currently have imagemagick (Version: 8:6.7.7.10-6ubuntu3.1) installed.

Code:

Package: imagemagick
Status: install ok installed
Priority: optional
Section: graphics
Installed-Size: 448
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: foreign
Version: 8:6.7.7.10-6ubuntu3.1
Depends: libc6 (>= 2.2.5), libmagickcore5 (>= 8:6.7.7.10), libmagickwand5 (>= 8:6.7.7.10), hicolor-icon-theme
Recommends: libmagickcore5-extra, ghostscript, netpbm
Suggests: imagemagick-doc, autotrace, cups-bsd | lpr | lprng, curl, enscript, ffmpeg, gimp, gnuplot, grads, groff-base, hp2xx, html2ps, libwmf-bin, mplayer, povray, radiance, sane-utils, texlive-base-bin, transfig, xdg-utils, ufraw-batch
Description: image manipulation programs
 ImageMagick is a software suite to create, edit, and compose bitmap images.
 It can read, convert and write images in a variety of formats (over 100)
 including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript,
 SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale,
 shear and transform images, adjust image colors, apply various special
 effects, or draw text, lines, polygons, ellipses and Bézier curves.
 All manipulations can be achieved through shell commands as well as through
 an X11 graphical interface (display).
Homepage: http://www.imagemagick.org/
Original-Maintainer: ImageMagick Packaging Team <pkg-gmagick-im-team@lists.alioth.debian.org>

Re: Crop image from pdf

Posted: 2016-10-11T08:50:42-07:00
by fmw42
You can use PythonMagick, or run the IM command line from Python (for example with the standard subprocess module).
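
For the second option, a minimal sketch of calling the command line from Python might look like the following. The function names build_crop_command and crop_logo are hypothetical; the arguments simply mirror the convert command posted earlier in this thread. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.

```python
import subprocess


def build_crop_command(pdf_path, png_path, density=288,
                       crop="2446x1600+0+275", resize="25%"):
    """Return the argument list for the convert command shown earlier."""
    return [
        "convert",
        "-density", str(density),
        pdf_path,
        "-crop", crop, "+repage",
        "-trim", "+repage",
        "-resize", resize,
        png_path,
    ]


def crop_logo(pdf_path, png_path, **options):
    """Run convert; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_crop_command(pdf_path, png_path, **options),
                   check=True)


if __name__ == "__main__":
    # Only prints the command; call crop_logo(...) to actually run it.
    print(build_crop_command("EMBARTWORK-120530 BLACKSHORE LOGO.PDF",
                             "EMBARTWORK-120530 BLACKSHORE LOGO.PNG"))
```

Note that the default crop geometry is only valid for -density 288, as explained above.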

Re: Crop image from pdf

Posted: 2016-10-11T13:21:30-07:00
by ggoutfitters
Thanks, fmw42 and snibgo

I was able to crop the image properly in python! However, I ran into a problem as I tested more and more PDFs: not all of them have the same format. Some of them are 2 pages, and the logo area is not always the same size.

Here are 3 PDFs I was having trouble with:
https://drive.google.com/open?id=0B1r4d ... 0ZtSldDX2s
https://drive.google.com/open?id=0B1r4d ... VozUkR2ejg
https://drive.google.com/open?id=0B1r4d ... 1p3cjg1TGc

Is there a way to scan the PDF from top to bottom (looking for the 2 horizontal lines above and below the logo) and crop the area between them?

Thanks again for your help!

Re: Crop image from pdf

Posted: 2016-10-11T14:51:58-07:00
by fmw42
If we assume that your logo is on the first page and is contiguous, then you can use connected components to find the largest non-white area and extract the bounding box and use that to crop your image. If you are satisfied with the quality at the default 72 dpi, then in Unix syntax:

Code:

infile="boston.pdf"
bbox=`convert "$infile[0]" -background white -flatten -alpha off -negate -threshold 0 \
-define connected-components:verbose=true -connected-components 8 null: |\
sed -n '3p' | sed 's/^[ ]*//' | cut -d\  -f2`
echo "$bbox"
convert "$infile[0]" -crop "$bbox" +repage result.png

Sorry, I cannot translate the unix part of the code (after the null:) to Windows syntax. It is just getting the second line of the connected components list and extracting the bounding box. See http://magick.imagemagick.org/script/co ... onents.php
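
Since the text processing after null: is the platform-specific part, one option is to do it in Python instead. The sketch below is only that, a sketch: the function names are hypothetical, it assumes the verbose listing has a header line followed by lines of the form "id: WxH+X+Y centroid area color", and the SAMPLE text is made up for illustration rather than taken from the PDFs in this thread.

```python
import subprocess


def list_components(pdf_path):
    """Run the same convert pipeline as above and return its text output."""
    result = subprocess.run(
        ["convert", pdf_path + "[0]", "-background", "white", "-flatten",
         "-alpha", "off", "-negate", "-threshold", "0",
         "-define", "connected-components:verbose=true",
         "-connected-components", "8", "null:"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


def parse_components(listing):
    """Parse the verbose connected-components listing into dictionaries."""
    components = []
    for line in listing.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        components.append({
            "id": int(fields[0].rstrip(":")),
            "bbox": fields[1],
            "centroid": fields[2],
            "area": float(fields[3]),
            "color": " ".join(fields[4:]),
        })
    return components


# Illustrative listing only (assumed format, invented numbers).
SAMPLE = """Objects (id: bounding-box centroid area mean-color):
  0: 612x792+0+0 305.5,395.5 400000 srgb(0,0,0)
  1: 300x200+150+100 299.5,199.5 60000 srgb(255,255,255)
"""

# Equivalent of sed -n '3p' | cut -f2: bounding box of the second object.
print(parse_components(SAMPLE)[1]["bbox"])
```

With a real file, parse_components(list_components("boston.pdf"))[1]["bbox"] would play the role of $bbox in the shell version.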

Re: Crop image from pdf

Posted: 2016-10-12T06:38:41-07:00
by ggoutfitters
I am getting this error when I run the code you provided:

convert.im6: unrecognized option `-connected-components' @ error/convert.c/ConvertImageCommand/1107

Any ideas why?

Re: Crop image from pdf

Posted: 2016-10-12T09:16:02-07:00
by fmw42
Your 6.7.7-10 is an older version of IM that predates -connected-components, which was introduced in 6.8.9-10. See viewtopic.php?f=4&t=26493
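
If the script has to run on machines with different IM installs, a version check along these lines could catch the problem early. This is a sketch under assumptions: the function names are hypothetical, and it assumes the first line of convert -version contains "ImageMagick X.Y.Z-N". The sample strings in the usage lines are illustrative.

```python
import re

# Release cited above as the first one with -connected-components.
MIN_VERSION = (6, 8, 9, 10)


def parse_im_version(version_text):
    """Extract (major, minor, patch, build) from convert -version output."""
    match = re.search(r"ImageMagick (\d+)\.(\d+)\.(\d+)-(\d+)", version_text)
    if match is None:
        raise ValueError("could not find an ImageMagick version string")
    return tuple(int(part) for part in match.groups())


def supports_connected_components(version_text):
    """Compare the parsed version tuple against MIN_VERSION."""
    return parse_im_version(version_text) >= MIN_VERSION


print(supports_connected_components("Version: ImageMagick 6.7.7-10 Q16"))
print(supports_connected_components("Version: ImageMagick 7.0.3-4 Q16"))
```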

Re: Crop image from pdf

Posted: 2016-10-12T16:40:08-07:00
by ggoutfitters
Thanks, I upgraded IM to the latest version (7.0.3-4) and it fixed the issue.

The code you posted worked great for the boston.pdf but cropped the wrong area for the other two.

Re: Crop image from pdf

Posted: 2016-10-13T09:50:11-07:00
by fmw42
Sorry, I see that in those files there are black areas larger than the white one for the object. So we have to search the list to find the first (and therefore largest) white one. Try this:

Code:

infile="fifty.pdf"
convert "$infile[0]" -background white -flatten -alpha off -negate -threshold 0 \
-define connected-components:verbose=true -connected-components 8 null: |\
tail -n +2 | while read line; do
bbox=`echo "$line" | sed 's/^[ ]*//' | cut -d\  -f2`
color=`echo "$line" | sed 's/^[ ]*//' | cut -d\  -f5`
#echo "bbox=$bbox; color=$color;"
[ "$color" = "srgb(100%,100%,100%)" ] && break
done
echo "$bbox"
convert "$infile[0]" -crop "$bbox" +repage result.png

Re: Crop image from pdf

Posted: 2016-10-17T07:21:37-07:00
by ggoutfitters
Thanks, I got this error when trying to run the code:

convert: invalid argument for option '-crop': @ error/convert.c/ConvertImageCommand/1215.

I also noticed that (echo "$bbox") did not echo anything, which is probably the reason for this error.

Re: Crop image from pdf

Posted: 2016-10-17T09:34:03-07:00
by fmw42
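Try this. In the previous version the while loop was on the receiving end of a pipe, so (in bash by default) it ran in a subshell and $bbox was empty once the loop ended. Here the crop is done inside the loop instead.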

Code:

infile="fifty.pdf"
convert "$infile[0]" -background white -flatten -alpha off -negate -threshold 0 \
-define connected-components:verbose=true -connected-components 8 null: |\
tail -n +2 | while read line; do
bbox=`echo "$line" | sed 's/^[ ]*//' | cut -d\  -f2`
color=`echo "$line" | sed 's/^[ ]*//' | cut -d\  -f5`
#echo "bbox=$bbox; color=$color;"
if [ "$color" = "srgb(100%,100%,100%)" ]; then
convert "$infile[0]" -crop "$bbox" +repage result.png
break
fi
done

Re: Crop image from pdf

Posted: 2016-10-21T11:37:24-07:00
by ggoutfitters
Thanks for all your help fmw42! This seems to do the job.

The only thing that needed to be changed from the code above was line 8
from:

Code:

if [ "$color" = "srgb(100%,100%,100%)" ]; then

to:

Code:

if [ "$color" = "srgb(100.001%,100.001%,100.001%)" ] || [ "$color" = "srgb(100.002%,100.002%,100.002%)" ]; then

If there is a more efficient way of doing this, let me know.

Thanks again!!!
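
One possible alternative to listing every near-white string is to compare the channel values numerically with a small tolerance. The Python sketch below assumes colors are reported in percent notation, as in the comparisons above. The function names, the 0.01 tolerance, and the SAMPLE listing are all assumptions made for illustration.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"
PERCENT_SRGB = re.compile(rf"srgb\({NUMBER}%,{NUMBER}%,{NUMBER}%\)")


def is_near_white(color, tolerance=0.01):
    """True if all three channels are within `tolerance` points of 100%."""
    match = PERCENT_SRGB.fullmatch(color.strip())
    if match is None:
        return False  # not percent notation; treated as not white
    return all(abs(float(v) - 100.0) <= tolerance for v in match.groups())


def first_white_bbox(listing, tolerance=0.01):
    """Return the bounding box of the first near-white component, or None."""
    for line in listing.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 5 and is_near_white(fields[4], tolerance):
            return fields[1]
    return None


# Illustrative listing only (assumed format, invented numbers).
SAMPLE = """Objects (id: bounding-box centroid area mean-color):
  0: 612x792+0+0 305.5,395.5 400000 srgb(0%,0%,0%)
  3: 300x200+150+100 299.5,199.5 60000 srgb(100.001%,100.001%,100.001%)
"""

print(first_white_bbox(SAMPLE))
```

The returned string can then be passed to -crop in place of $bbox.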