Thanks again for the valuable information. I took a day to read, learn, and test what you all mentioned.
As it looks now, my main problem is that all my files are in PDF format, which is not easy for IM to process.
So my questions are:
1. Is it really impossible (or just hard) to achieve my goal with PDF files using IM?
2. Should I convert my PDFs to some other format before proceeding? If so, can IM's `convert` command do it?
## For 1, I did some testing
According to [this link](http://www.imagemagick.org/Usage/formats/#vector) you pointed to:
> Consequently if you are trying to convert a image from a vector format, to another vector format, IM will essentially rasterize this image at the currently defined resolution or density which will hopefully (but unlikely) be suitable for the output device you intend to use it on.
So I ran a test on [this file](http://pinggit.github.io/images/xixiboo ... tate90.pdf):
$ identify a2-IMG_1626-1024x768-rotate90.pdf
a2-IMG_1626-1024x768-rotate90.pdf PDF 768x1024 768x1024+0+0 16-bit Bilevel DirectClass 98.4KB 0.000u 0:00.000
I used this command:
$ convert a2-IMG_1626-1024x768-rotate90.pdf -density 93x88 a2-IMG_1626-1024x768-rotate90-density.pdf
and got [this new pdf](http://pinggit.github.io/images/xixiboo ... tate90.pdf):
$ identify a2-IMG_1626-1024x768-rotate90-density.pdf
a2-IMG_1626-1024x768-rotate90-density.pdf PDF 595x838 595x838+0+0 16-bit Bilevel DirectClass 62.9KB 0.010u 0:00.010
The `-density` value (93x88) was calculated using the approach I learned from fmw42: the pixel size divided by the A4 size in inches, i.e. 768/8.27 x 1024/11.69 (width x height).
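To double-check that arithmetic, here is a minimal sketch in plain shell and `awk` (no IM needed). The function name `density_for_page` is made up for this example, and it assumes A4 is 8.27 x 11.69 inches and that the image should fill the whole page with no margins.

```shell
# density_for_page WIDTH_PX HEIGHT_PX PAGE_W_IN PAGE_H_IN
# Prints the XxY density (pixels per inch) at which an image of the given
# pixel size exactly fills a page of the given size in inches.
# (Hypothetical helper, not part of ImageMagick.)
density_for_page() {
    awk -v w="$1" -v h="$2" -v pw="$3" -v ph="$4" \
        'BEGIN { printf "%.0fx%.0f\n", w / pw, h / ph }'
}

# 768x1024 pixel image on an A4 page (8.27 x 11.69 in)
density_for_page 768 1024 8.27 11.69   # prints 93x88
```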
I looked at both files and printed each of them on A4 paper. To me, the new file basically achieves what I hoped:
the page was shrunk to fit on A4 for printing, yet it still looks as clear as the original (I can't tell whether it is any clearer).
That is all good; the only downside is that the new file is much *bigger (about 7 times in bytes!)* than the original:
2330996 May 20 20:35 a2-IMG_1626-1024x768-rotate90-density.pdf <-- new file
335121 May 9 16:27 a2-IMG_1626-1024x768-rotate90.pdf
I'm guessing this is due to the way IM currently handles conversion between PDF (vector) images internally?
Another thing I noticed is that the DPI reported by the `identify` command always seems to be 72, even for the new file, which should have a DPI of 93x88. So it *IS* a bogus value, even though my PDF contains just one single picture.
But then, is the pixel dimension it displays (768x1024) also bogus?
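A rough check of these numbers, under two assumptions that I have not confirmed: that `identify` renders a PDF at IM's default 72 dpi (so the "pixel" size it prints is really the page size in points), and that the PDF written by `convert` gets a page of pixels / density inches. The helper name `page_size_pt` is hypothetical.

```shell
# page_size_pt WIDTH_PX HEIGHT_PX DENSITY_X DENSITY_Y
# Prints the page size in points (1/72 inch) of an image with the given
# pixel size placed at the given density.
# (Hypothetical helper, not part of ImageMagick.)
page_size_pt() {
    awk -v w="$1" -v h="$2" -v dx="$3" -v dy="$4" \
        'BEGIN { printf "%.0fx%.0f\n", w / dx * 72, h / dy * 72 }'
}

page_size_pt 768 1024 72 72   # original at 72 dpi: prints 768x1024
page_size_pt 768 1024 93 88   # new file at 93x88:  prints 595x838
```

Both results match the `identify` output above (768x1024 and 595x838), which would suggest that the 72 dpi figure is just the default rendering resolution and that the reported dimensions are page sizes in points, not the pixel size of the embedded picture.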
## For 2, I also did some tests
I tested two methods:
1. convert to *jpg* with IM's `convert` command
2. extract the images with the [pdfimages tool](http://poppler.freedesktop.org/)
### First test: a real picture
I first converted [the same pdf photo](http://pinggit.github.io/images/xixiboo ... tate90.pdf) as above:
$ pdfimages -j a2-IMG_1626-1024x768-rotate90.pdf test
$ convert a2-IMG_1626-1024x768-rotate90.pdf test2.jpg
$ identify test-000.jpg
test-000.jpg JPEG 768x1024 768x1024+0+0 8-bit DirectClass 327KB 0.000u 0:00.000
$ identify test2.jpg
test2.jpg JPEG 768x1024 768x1024+0+0 8-bit DirectClass 227KB 0.000u 0:00.000
$ identify -format "%x x %y" test-000.jpg
72 PixelsPerInch x 72 PixelsPerInch
$ identify -format "%x x %y" test2.jpg
72 PixelsPerInch x 72 PixelsPerInch
The generated files are here:
[via pdfimages](http://pinggit.github.io/images/xixiboo ... st-000.jpg)
[via convert](http://pinggit.github.io/images/xixibook-test/test2.jpg)
As far as I can tell:
* there is no difference in pixel dimensions or resolution
* both pictures look as clear as the original one
* the IM version is just about 1/3 smaller in file size than the pdfimages version
### Second test: drawings
I then did another test. This time the PDF is not a real picture but a page of drawings:
[original file](http://pinggit.github.io/images/xixiboo ... g_0113.pdf)
Again I converted it to jpg with both tools.
Interestingly, this time pdfimages generated a lot of files, while IM's `convert` generated only one.
$ identify c-pg_0113.pdf
**** Warning: considering '0000000000 XXXXX n' as a free entry.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> itext-paulo-155 (itextpdf.sf.net-lowagie.com) <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
c-pg_0113.pdf PDF 612x792 612x792+0+0 16-bit Bilevel DirectClass 61KB 0.000u 0:00.000
$ pdfimages -j c-pg_0113.pdf c-pg_0113-pdfimage
Error (425): Command token too long
$ convert c-pg_0113.pdf -density 74x68 c-pg_0113-density74x68.jpg
**** Warning: considering '0000000000 XXXXX n' as a free entry.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> itext-paulo-155 (itextpdf.sf.net-lowagie.com) <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
$ identify c-pg_0113-density74x68.jpg
c-pg_0113.jpg JPEG 612x792 612x792+0+0 8-bit DirectClass 104KB 0.000u 0:00.000
$ identify -format "%x x %y" c-pg_0113-density74x68.jpg
74 PixelsPerInch x 68 PixelsPerInch
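One side observation, assuming again that the 612x792 reported for the PDF is its page size in points (1/72 inch): converting to inches suggests this page is US Letter rather than A4. A minimal sketch, with a made-up helper name `points_to_inches`:

```shell
# points_to_inches WIDTH_PT HEIGHT_PT
# Converts a page size in points (1/72 inch) to inches.
# (Hypothetical helper, not part of ImageMagick.)
points_to_inches() {
    awk -v w="$1" -v h="$2" \
        'BEGIN { printf "%.2fx%.2f\n", w / 72, h / 72 }'
}

points_to_inches 612 792   # prints 8.50x11.00
```

8.5 x 11 in is US Letter; an A4 page would be about 595x842 points.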
generated files:
$ ls -l
total 12704
131928 May 21 17:32 c-pg_0113-pdfimage-000.jpg
6323873 May 21 17:20 c-pg_0113-pdfimage-000.ppm
1053989 May 21 17:32 c-pg_0113-pdfimage-001.pbm
1053989 May 21 17:32 c-pg_0113-pdfimage-002.pbm
1053989 May 21 17:32 c-pg_0113-pdfimage-003.pbm
1053989 May 21 17:32 c-pg_0113-pdfimage-004.pbm
37708 May 21 17:32 c-pg_0113-pdfimage-005.pbm
1053989 May 21 17:32 c-pg_0113-pdfimage-006.pbm
159420 May 21 17:32 c-pg_0113-pdfimage-007.pbm
1053989 May 21 17:32 c-pg_0113-pdfimage-008.pbm
103740 May 21 20:09 c-pg_0113-density74x68.jpg
They can be accessed here:
[original](http://pinggit.github.io/images/xixiboo ... g_0113.pdf)
[pdfimages converted]
http://pinggit.github.io/images/xixiboo ... ge-000.jpg
http://pinggit.github.io/images/xixiboo ... ge-001.jpg
http://pinggit.github.io/images/xixiboo ... ge-002.jpg
http://pinggit.github.io/images/xixiboo ... ge-003.jpg
http://pinggit.github.io/images/xixiboo ... ge-004.jpg
http://pinggit.github.io/images/xixiboo ... ge-005.jpg
http://pinggit.github.io/images/xixiboo ... ge-006.jpg
http://pinggit.github.io/images/xixiboo ... ge-007.jpg
http://pinggit.github.io/images/xixiboo ... ge-008.jpg
[IM's convert]
http://pinggit.github.io/images/xixiboo ... y74x68.jpg
The interesting thing is that if you look at the jpg file that `pdfimages`
extracted from the original pdf, some parts of the drawing are missing, and they
end up in the other `pbm`/`ppm` files.
My guess, based on my reading, is that when the original drawing was scanned, my
printer happened to recognize some parts of the drawing and turned them into
some kind of vector objects, which were stored in a different format as
separate objects within the same PDF page. Is that why `pdfimages` extracts them
into different files?