Convert slow down?

Posted: 2016-01-04T14:44:59-07:00
by mcnnr27
I have a problem and I'm not quite sure where to start.

I have a script running on Amazon EC2 linux micro instances which does the following:

-Downloads a PDF
-Uses Ghostscript to convert PDF to TIFF
-Uses IM convert to deskew the TIFF
-Runs Tesseract OCR on the TIFF
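
For readers following along, the three conversion steps above might be sketched roughly as below. This is only an illustration, not the actual script: the function names, the file naming, the 300 dpi resolution, the tiffgray device and the 40% deskew threshold are all assumptions, and the download step is left out.

```python
import subprocess
from pathlib import Path


def build_pipeline_commands(pdf_path, work_dir, dpi=300, deskew_threshold="40%"):
    """Return the argv lists for the three conversion steps, in order.

    All names and default values here are illustrative assumptions.
    """
    work_dir = Path(work_dir)
    stem = Path(pdf_path).stem
    raw_tiff = work_dir / f"{stem}.tif"
    deskewed_tiff = work_dir / f"{stem}-deskewed.tif"
    ocr_base = work_dir / stem  # tesseract adds the .txt extension itself

    return [
        # Ghostscript: render the PDF to a multi-page grayscale TIFF.
        ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=tiffgray",
         f"-r{dpi}", f"-sOutputFile={raw_tiff}", str(pdf_path)],
        # ImageMagick: straighten the pages.
        ["convert", str(raw_tiff), "-deskew", deskew_threshold, str(deskewed_tiff)],
        # Tesseract: OCR the deskewed TIFF.
        ["tesseract", str(deskewed_tiff), str(ocr_base)],
    ]


def run_pipeline(pdf_path, work_dir):
    """Run each step in order, stopping at the first non-zero exit code."""
    for argv in build_pipeline_commands(pdf_path, work_dir):
        subprocess.run(argv, check=True)
```

Keeping the command construction separate from the execution makes it easy to log or time each step on its own.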

The script will run flawlessly for about 45 minutes. Then, inevitably, the ImageMagick deskew step slows down dramatically: it normally takes 20-30 seconds, then jumps to 5-6 minutes. I've tried updating ImageMagick and compiling it as Q8 (I had a slightly older version with Q16), and that did nothing to change the problem. I've also checked the PDFs and TIFFs themselves; there is nothing different about them, and they are all about 3-10 pages.

Version: ImageMagick 6.9.3-0 Q8 x86_64 2016-01-04 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2016 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC OpenMP
Delegates (built-in): freetype jng jpeg png tiff webp zlib

I examined the free disk space and the "top" output while the problem was happening and nothing was unusual. Whether convert takes 20 seconds or 6 minutes, free disk space is 66% and the convert process uses nearly 100% CPU and 50% memory.

The t2.micro instance I am running on has 1 core as far as I can tell, and I did find references online to odd behavior with OpenMP in the past, but I don't think that is the cause. I'm running everything from a Python script, invoking the shell using subprocess.Popen. I tried adding -debug cache to the convert command, but the output is not captured...

import subprocess

# command_line_args is the argv list for the step being run (e.g. convert)
command_line_process = subprocess.Popen(
    command_line_args,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

process_output, process_error = command_line_process.communicate()
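
One way to see whatever the child process wrote, and how long it took, is to decode and print both captured streams. The sketch below is a hypothetical helper (run_and_time is not part of the original script); which stream the -debug log ends up on may depend on the ImageMagick log configuration, so it returns both.

```python
import subprocess
import sys
import time


def run_and_time(argv):
    """Run argv and return (returncode, stdout, stderr, elapsed_seconds)."""
    started = time.monotonic()
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # waits for exit and drains both pipes
    elapsed = time.monotonic() - started
    return (
        proc.returncode,
        out.decode(errors="replace"),
        err.decode(errors="replace"),
        elapsed,
    )


if __name__ == "__main__":
    # Stand-in child that writes to both streams; swap in the real convert argv.
    demo = [sys.executable, "-c",
            "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
    code, out, err, secs = run_and_time(demo)
    print(f"exit={code} elapsed={secs:.2f}s")
    print("stdout:", out)
    print("stderr:", err)
```

Logging the elapsed time per file also gives a record of exactly when the slowdown begins.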


I realize this may not be an ImageMagick problem per se, but I would love any ideas as I'm not really sure where to start. I may just kill my instance every hour and start a new one.

Re: Convert slow down?

Posted: 2016-01-04T15:14:55-07:00
by snibgo
mcnnr27 wrote: It normally takes 20-30sec, then it will jump up to 5-6 minutes.
When IM tries to store an image in memory but there isn't enough free memory, it will use disk instead, which is far slower. While it is running, you might look for files named magick* in the temporary directory.
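
A check for those files could be scripted along these lines and called while convert is running. This is a rough sketch: find_magick_temp_files is a made-up helper name, and it assumes the temporary directory is either the one named by MAGICK_TMPDIR or the system default.

```python
import glob
import os
import tempfile


def find_magick_temp_files(tmp_dir=None):
    """List (path, size_in_bytes) for files named magick* in tmp_dir.

    Assumption: if tmp_dir is not given, use MAGICK_TMPDIR when set,
    otherwise the system temporary directory.
    """
    if tmp_dir is None:
        tmp_dir = os.environ.get("MAGICK_TMPDIR") or tempfile.gettempdir()
    found = []
    for path in sorted(glob.glob(os.path.join(tmp_dir, "magick*"))):
        try:
            found.append((path, os.path.getsize(path)))
        except OSError:
            # The file vanished between listing and stat; skip it.
            continue
    return found


if __name__ == "__main__":
    for path, size in find_magick_temp_files():
        print(f"{size:>12} {path}")
```

An empty result while a slow convert is in progress would suggest the pixel cache is not spilling to disk.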

If this occurs for a particular PDF, but stopping the job and restarting at that PDF cures the problem, this would strongly suggest a memory leak somewhere. I don't know enough about Python etc. to suggest where this might be happening.
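
To test the leak theory, one option is to log the machine's free memory before each file and watch for a steady downward drift. The sketch below assumes a Linux host with /proc/meminfo; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_snapshot(path="/proc/meminfo"):
    """Return (available_kb, swap_free_kb) for the whole machine.

    Falls back to MemFree on kernels that do not report MemAvailable.
    """
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    return info.get("MemAvailable", info.get("MemFree")), info.get("SwapFree")
```

Calling memory_snapshot() once per PDF and writing the result next to the file name would show whether available memory shrinks as the job runs.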

Re: Convert slow down?

Posted: 2016-01-04T16:06:42-07:00
by mcnnr27
No magick* files. Yes, the problem goes away with a restart, picking up with the same PDFs it was choking on before. Sounds like a Python memory issue, then... Thanks.