PDF redaction problem
Posted: 2008-09-20T12:22:26-07:00
Hi, first time poster here. My wife has been trying to set up a database of previous tests for her school, so we scanned a bunch of tests into PDF format. Only afterward did she find out that she has to strip them of identifying information. Not wanting to black out the tests with a marker and rescan them, I wrote a script with ImageMagick to allow for electronic redaction. My problem is that the process inflates the file size by a factor of 3-4. I'm hoping that somebody can give me some tips on how to reduce the size while maintaining quality.
Here's the series of commands I've been using:
pdftk test.pdf burst
foreach(pg_xxxx.pdf)
gswin32c -dBATCH -dNOPAUSE -sDEVICE=png256 -r200x200 -sOutputFile=temp.png pg_xxxx.pdf (I found the r200x200 spec necessary to maintain quality)
(Launch viewer program to find redaction area coordinates. Currently MSPaint since it loads quickly and displays coordinates by default)
convert temp.png -fill black -draw "rectangle 100,100 200,200" -resize 850x1100 -units PixelsPerInch -density 100 -compress jpeg pg_xxxx.pdf
end foreach
pdftk (all page filenames) cat output redacted_filename.pdf
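For anyone who wants to try this, the loop above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names (render_cmd, redact_cmd, redact_pdf) and the rects_by_page mapping are made up for illustration, pdftk is assumed to write its burst pages as pg_0001.pdf, pg_0002.pdf and so on, and ImageMagick's convert is assumed to come before the Windows system convert.exe on the PATH. The rectangle coordinates still have to be looked up by hand in a viewer.

```python
import glob
import subprocess

GS = "gswin32c"  # Ghostscript console binary named in the post


def render_cmd(page_pdf, png="temp.png", dpi=200):
    """Build the Ghostscript command that rasterizes one page to a 256-color PNG."""
    return [GS, "-dBATCH", "-dNOPAUSE", "-sDEVICE=png256",
            f"-r{dpi}x{dpi}", f"-sOutputFile={png}", page_pdf]


def redact_cmd(png, rects, out_pdf):
    """Build the ImageMagick command that blacks out each rectangle,
    then resizes and writes a JPEG-compressed single-page PDF."""
    cmd = ["convert", png, "-fill", "black"]
    for x0, y0, x1, y1 in rects:
        # Coordinates are in the pixel space of the rendered PNG (before resize).
        cmd += ["-draw", f"rectangle {x0},{y0} {x1},{y1}"]
    cmd += ["-resize", "850x1100", "-units", "PixelsPerInch",
            "-density", "100", "-compress", "jpeg", out_pdf]
    return cmd


def redact_pdf(src, rects_by_page, out):
    """Burst src, redact every page, and join the pages into out.

    rects_by_page (hypothetical) maps a page filename such as "pg_0001.pdf"
    to a list of (x0, y0, x1, y1) tuples; pages without an entry are only
    re-encoded, not redacted.
    """
    subprocess.run(["pdftk", src, "burst"], check=True)
    pages = sorted(glob.glob("pg_*.pdf"))
    for page in pages:
        subprocess.run(render_cmd(page), check=True)
        # Overwrites the burst page with its redacted version, as in the post.
        subprocess.run(redact_cmd("temp.png", rects_by_page.get(page, []), page),
                       check=True)
    subprocess.run(["pdftk", *pages, "cat", "output", out], check=True)
```

A call would then look something like redact_pdf("test.pdf", {"pg_0001.pdf": [(100, 100, 200, 200)]}, "redacted_filename.pdf"). Building the commands as lists keeps the quoting of the -draw argument out of the shell's hands.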
I chose to resize to 850x1100 because that was the native image size within the original PDF files, and I thought that would yield approximately the same file size. I'm still seeing a size increase of about 3x-4x though. Any suggestions? Thanks!
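To spell out the pixel arithmetic behind those numbers, here is a small Python sketch. It assumes the pages are 8.5 x 11 inches, which is what 850x1100 at a density of 100 works out to; the helper names pixels and scale_rect are just for illustration.

```python
def pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size in inches at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def scale_rect(rect, from_dpi, to_dpi):
    """Convert (x0, y0, x1, y1) pixel coordinates from one resolution to another."""
    factor = to_dpi / from_dpi
    return tuple(round(v * factor) for v in rect)


print(pixels(8.5, 11, 200))  # (1700, 2200): size of temp.png from Ghostscript
print(pixels(8.5, 11, 100))  # (850, 1100): size after -resize
print(scale_rect((100, 100, 200, 200), 200, 100))  # (50, 50, 100, 100)
```

So the intermediate PNG has four times as many pixels as the final page image, and a rectangle picked in the viewer at 200 dpi covers half those coordinates once the page is resized.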