Convert slowdown?
Posted: 2016-01-04T14:44:59-07:00
I have a problem and I don't quite know where to start.
I have a script running on Amazon EC2 Linux micro instances which does the following:
- Downloads a PDF
- Uses Ghostscript to convert the PDF to TIFF
- Uses IM convert to deskew the TIFF
- Runs Tesseract OCR on the TIFF
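Because the slowdown only appears after a while, it may help to time each step separately and log a timestamp with every run. A minimal sketch, assuming Python 3.6+. The gs, convert and tesseract command lines (tiffg4 device, 300 dpi, `-deskew 40%`) are placeholders, not the options the script actually uses. `timed_run`, `pipeline_steps` and `run_pipeline` are hypothetical helper names. The download step is left out.

```python
import subprocess
import time


def timed_run(args):
    """Run one pipeline step; return (seconds, returncode, stdout, stderr)."""
    start = time.monotonic()
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    elapsed = time.monotonic() - start
    return elapsed, proc.returncode, proc.stdout, proc.stderr


def pipeline_steps(pdf_path, tiff_path, deskewed_path, ocr_base):
    """Placeholder command lines for the Ghostscript -> convert -> Tesseract steps."""
    return [
        ("ghostscript", ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=tiffg4",
                         "-r300", f"-sOutputFile={tiff_path}", pdf_path]),
        ("deskew", ["convert", tiff_path, "-deskew", "40%", deskewed_path]),
        ("tesseract", ["tesseract", deskewed_path, ocr_base]),
    ]


def run_pipeline(steps):
    """Run the steps in order, printing wall-clock time and duration for each."""
    for name, args in steps:
        elapsed, rc, _, err = timed_run(args)
        print(f"{time.strftime('%H:%M:%S')} {name}: {elapsed:.1f}s (exit {rc})")
        if rc != 0:
            print(err)  # stop at the first failing step and show its stderr
            break
```

With per-step timings in the log, it is easy to confirm that only the deskew step grows while the Ghostscript and Tesseract steps stay constant.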
The script runs flawlessly for about 45 minutes. Then, inevitably, the ImageMagick deskew step slows down dramatically: it normally takes 20-30 seconds, but then jumps to 5-6 minutes. I've tried updating ImageMagick and compiling it as Q8 (I previously had a slightly older Q16 build), and that did not change anything. I've also checked the PDFs/TIFFs themselves; there is nothing different about them, and they are all roughly 3-10 pages.
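To pin down exactly when the deskew step jumps from 20-30 seconds to 5-6 minutes, one option is to compare each run against a baseline taken from the first few runs. A sketch; `SlowdownDetector`, the five-run baseline and the 3x factor are illustrative assumptions, not tuned values.

```python
import statistics


class SlowdownDetector:
    """Flag a step whose duration rises well above its early baseline."""

    def __init__(self, baseline_runs=5, factor=3.0):
        self.baseline_runs = baseline_runs  # how many initial runs form the baseline
        self.factor = factor                # how many times the baseline counts as "slow"
        self.durations = []

    def record(self, seconds):
        """Store one duration; return True if it looks like a slowdown."""
        self.durations.append(seconds)
        if len(self.durations) <= self.baseline_runs:
            return False  # still collecting the baseline
        baseline = statistics.median(self.durations[: self.baseline_runs])
        return seconds > self.factor * baseline
```

Logging the wall-clock time (and the script's uptime) whenever `record()` returns True shows whether the onset tracks the ~45 minute mark rather than any particular input file.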
Version: ImageMagick 6.9.3-0 Q8 x86_64 2016-01-04 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2016 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC OpenMP
Delegates (built-in): freetype jng jpeg png tiff web lib
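After rebuilding (Q16 to Q8), it is worth confirming which convert binary the script actually invokes, since an older build earlier on PATH would still be picked up. A sketch that parses the `convert -version` banner shown above. `parse_im_version` and `installed_im_version` are hypothetical helpers, and the regexes assume the banner format in this post.

```python
import re
import subprocess


def parse_im_version(text):
    """Extract version, quantum depth and feature list from `convert -version` output."""
    info = {"version": None, "quantum": None, "features": []}
    m = re.search(r"^Version: ImageMagick (\S+) (Q\d+)", text, re.MULTILINE)
    if m:
        info["version"], info["quantum"] = m.group(1), m.group(2)
    f = re.search(r"^Features: (.*)$", text, re.MULTILINE)
    if f:
        info["features"] = f.group(1).split()
    return info


def installed_im_version():
    """Ask the convert found on PATH for its banner and parse it."""
    out = subprocess.run(["convert", "-version"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_im_version(out)
```

Logging `installed_im_version()` once at startup records the build (including whether OpenMP is listed under Features) next to the timings.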
I checked free disk space and "top" while the problem was happening, and nothing was unusual. Whether convert is taking 20 seconds or 6 minutes, free disk space is at 66% and the convert process uses nearly 100% CPU and 50% memory.
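top shows ~100% CPU in both the fast and slow cases, which on its own doesn't say whether a slow run does more work or the same work spread over more wall-clock time. Comparing wall time with the child's user+system CPU time can help separate the two. A sketch, Unix only, using `resource.getrusage(resource.RUSAGE_CHILDREN)`; `run_with_cpu_accounting` is a hypothetical helper, and how closely CPU time reflects real work on a virtualized instance depends on the kernel's time accounting.

```python
import resource
import subprocess
import time


def run_with_cpu_accounting(args):
    """Run a command; return (wall_seconds, child_cpu_seconds, returncode)."""
    # RUSAGE_CHILDREN covers children that have been waited for, so take a
    # snapshot before and after and use the difference for this one command.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.run(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu, proc.returncode
```

If CPU time stays near the normal 20-30 seconds while wall time grows to minutes, the deskew itself is not doing more work; if both grow together, the input or the computation changed.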
The t2.micro instance I am running on has one core as far as I can tell, and I did find references online to funny stuff with OpenMP in the past, but I don't think that's it. I'm running everything from a Python script, launching the commands with subprocess.Popen. I tried adding -debug cache to the convert command, but the output is not captured. This is how I call it:
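import subprocess

# command_line_args is the convert command as a list of strings, built elsewhere in the script
command_line_process = subprocess.Popen(
    command_line_args,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
process_output, process_error = command_line_process.communicate()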
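With both streams piped, the -debug output is probably captured but sits unread in `process_error` until something prints or logs it. A sketch that decodes and prints stderr, and optionally sets MAGICK_THREAD_LIMIT (an ImageMagick environment variable that caps its thread count) so the OpenMP question can be tested directly. Assumptions: the debug log goes to stderr under the default log configuration, and `run_and_show` and the file names are hypothetical.

```python
import os
import subprocess


def run_and_show(args, env_overrides=None):
    """Run a command, decode both streams, and print stderr so debug logs are visible."""
    env = os.environ.copy()
    env.update(env_overrides or {})
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
    out, err = proc.communicate()
    out_text = out.decode("utf-8", errors="replace")
    err_text = err.decode("utf-8", errors="replace")
    if err_text:
        # ImageMagick -debug messages should end up here with the default log settings.
        print(err_text)
    return proc.returncode, out_text, err_text


# Example (hypothetical file names; MAGICK_THREAD_LIMIT=1 forces a single thread):
# run_and_show(["convert", "-debug", "cache", "input.tif", "-deskew", "40%", "output.tif"],
#              {"MAGICK_THREAD_LIMIT": "1"})
```

Passing `-limit thread 1` on the convert command line should have the same effect as the environment variable; running once with and once without it shows whether threading plays any part in the slowdown.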
I realize this may not be an ImageMagick problem per se, but I would love any ideas, as I'm not really sure where to start. I may just kill my instance every hour and start a new one.