PDF redaction problem

luxgladius
Posts: 2
Joined: 2008-09-20T11:33:00-07:00

PDF redaction problem

Post by luxgladius »

Hi, first time poster here. My wife has been trying to set up a database of previous tests for her school, so we scanned a bunch of tests into PDF format. Only afterward did she find out that she has to strip them of identifying information. Not wanting to use a marker to black out the tests and rescan them, I created a script with ImageMagick to allow for electronic redaction. My problem is that the process inflates the file size by a factor of 3-4. I'm hoping that somebody might be able to give me some tips on how to reduce size while maintaining quality.

Here's the series of commands I've been using:
pdftk test.pdf burst
foreach(pg_xxxx.pdf)
gswin32c -dBATCH -dNOPAUSE -sDEVICE=png256 -r200x200 -sOutputFile=temp.png pg_xxxx.pdf (I found the r200x200 spec necessary to maintain quality)
(Launch viewer program to find redaction area coordinates. Currently MSPaint since it loads quickly and displays coordinates by default)
convert temp.png -fill black -draw "rectangle 100,100 200,200" temp.png -resize 850x1100 -units PixelsPerInch -density 100 -compress jpeg pg_xxxx.pdf
end foreach
pdftk (all page filenames) cat output redacted_filename.pdf

I chose to resize to 850x1100 because that was the native size within the PDF files and I thought that would yield approximately equal size. I'm still seeing a size increase of about 3x-4x though. Any suggestions? Thanks!
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: PDF redaction problem

Post by fmw42 »

I am not sure about all of what you are showing, but I would question your IM command line:

convert temp.png -fill black -draw "rectangle 100,100 200,200" temp.png -resize 850x1100 -units PixelsPerInch -density 100 -compress jpeg pg_xxxx.pdf

This does not make sense to me. You cannot have temp.png in two places in the command line as you are showing it. What are you trying to do? Do you want to draw a black rectangle at the input dimensions or at the -resized dimensions? I thought your input was PDF, but you seem to have it as the output. Please clarify.


However, if you are running IM Q16, then add -depth 8 to your command, as IM will make a 16-bit result and your input may have been 8-bit from your scanner. Going from 16 to 8 bits should cut the size roughly in half. Also, you may want to set the JPEG quality. IM has a default of -quality 85, which may be higher than you want or need. So add -quality ## to your command line as well.
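
For illustration, here is a minimal Python sketch of how the corrected convert call might be assembled with both options added. The helper name build_redact_command and the quality value of 60 are assumptions made for this example, not values from the thread; the remaining options are copied from the command above. The sketch only builds and prints the argument list; it does not run ImageMagick.

```python
import shlex


def build_redact_command(src_png, dst_pdf, rect, quality=60):
    """Return the argument list for one ImageMagick `convert` call.

    rect is (x0, y0, x1, y1) in pixels of the rendered PNG.
    quality=60 is a placeholder value, not a recommendation.
    """
    x0, y0, x1, y1 = rect
    return [
        "convert", src_png,
        "-fill", "black",
        "-draw", f"rectangle {x0},{y0} {x1},{y1}",
        "-resize", "850x1100",
        "-units", "PixelsPerInch",
        "-density", "100",
        "-depth", "8",              # request 8-bit output, as suggested above
        "-quality", str(quality),   # explicit JPEG quality instead of the default
        "-compress", "jpeg",
        dst_pdf,
    ]


if __name__ == "__main__":
    cmd = build_redact_command("temp.png", "pg_0001.pdf", (100, 100, 200, 200))
    # Print a shell-quoted version for inspection before running it.
    print(shlex.join(cmd))
```

The list could be passed to subprocess.run once the printed command looks right. Trying a few quality values on one page and comparing file sizes is probably the quickest way to find an acceptable setting.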
luxgladius
Posts: 2
Joined: 2008-09-20T11:33:00-07:00

Re: PDF redaction problem

Post by luxgladius »

Right, sorry about that. I was copying from my script file and had to edit it to show what was happening, and it looks like I messed it up a bit. The actual command has just the first instance of temp.png, e.g. 'convert temp.png -fill black -draw "rectangle 100,100 200,200" -resize 850x1100 -units PixelsPerInch -density 100 -compress jpeg pg_xxxx.pdf'

I'm drawing the rectangle in the old coordinates since that's what the image is displayed in, and then resizing. The original input is a PDF, but I have to convert it to another format so I can view it and figure out the coordinates where I need to draw the rectangle(s). I do that with the first Ghostscript command. After that, it is output back to a PDF so that the pages can be concatenated to produce an output file that is identical to the original file, but with the redaction rectangles in place.
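
To make the sizes concrete, here is a small Python sketch of the arithmetic. It assumes the pages really are 8.5 x 11 inches, which is what 850x1100 pixels at a density of 100 implies; the helper names page_inches and scale_rect are made up for this example.

```python
def page_inches(width_px, height_px, dpi):
    """Physical page size in inches for a pixel size at a given density."""
    return width_px / dpi, height_px / dpi


def scale_rect(rect, src_dpi, dst_dpi):
    """Map rectangle coordinates from one rendering density to another."""
    factor = dst_dpi / src_dpi
    return tuple(round(v * factor) for v in rect)


RENDER_DPI = 200   # the -r200x200 Ghostscript rendering
OUTPUT_DPI = 100   # the -density 100 output

width_in, height_in = page_inches(850, 1100, OUTPUT_DPI)
print(width_in, height_in)                            # 8.5 11.0

# Pixel size of the intermediate PNG under the same page-size assumption.
print(width_in * RENDER_DPI, height_in * RENDER_DPI)  # 1700.0 2200.0

# Where a rectangle picked in the 200 dpi PNG ends up after the resize.
print(scale_rect((100, 100, 200, 200), RENDER_DPI, OUTPUT_DPI))  # (50, 50, 100, 100)
```

Because the rectangle is drawn before -resize, the coordinates read off the 200 dpi image can be used as they are; the scaled values only show where the rectangle lands in the final page.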

Thanks for your suggestions, I will give those a try!