
Identify with RegEx + Strange Errors

Posted: 2012-05-05T18:46:43-07:00
by jonaz
Hi All,

I have two problems I was hoping to get help with. Both pertain to the attempted execution of the following commands:
(1):

Code: Select all

identify -format "%f %w %h %b \n" 'jewelclub_[a-z0-9]{1,}_(small|medium|large|zoom)\.jpg'
(2):

Code: Select all

identify -format "%f %w %h %b \n" 'jewelclub_*\.jpg'
Sample filenames:
Want to match: jewelclub_abc123_large.jpg
Want to ignore: jewelclub_ABc324_large.jpg, random_abc123_large.jpg, garbled-CrAp___.jpg

First, was hoping to get guidance on using a regular expression inside the identify command. It seems some expressions work (#2) while others don't (#1) and I'm not sure if the command (or the bash shell) takes POSIX (BRE? ERE?) or Perl regular expressions. Either way, was hoping to get guidance on the right syntax to use.

Second, when executing identify on a set of images I'm getting some strange errors. They look like this:

Code: Select all

identify: delegate failed `"html2ps" -U -o "%o" "%i"' @ error/delegate.c/InvokeDelegate/1055.
identify: unable to open image `/var/folders/nb/d4q1xhk9305gzqw6d43flx1c00010n/T/magick-thKKWnCo': No such file or directory @ error/blob.c/OpenBlob/2617.
identify: unable to open file `/var/folders/nb/d4q1xhk9305gzqw6d43flx1c00010n/T/magick-thKKWnCo': No such file or directory @ error/constitute.c/ReadImage/583.
Any advice on either of these issues?

Thanks in advance!

Re: Identify with RegEx + Strange Errors

Posted: 2012-05-05T18:57:19-07:00
by fmw42
I am not an expert on most of your questions. Wait for Anthony to reply, as he is the Unix guru.
identify: delegate failed `"html2ps" -U -o "%o" "%i"' @ error/delegate.c/InvokeDelegate/1055.
This would seem to imply that you need to install some html2ps related delegate or that the delegate does not recognize your string formats.

What version of IM are you using, and on what platform? If on Windows, then you probably need to escape % with %%.

See
http://www.imagemagick.org/Usage/windows/
http://www.imagemagick.org/Usage/windows/#conversion

Re: Identify with RegEx + Strange Errors

Posted: 2012-05-05T20:58:57-07:00
by jonaz
fmw42 wrote: This would seem to imply that you need to install some html2ps related delegate or that the delegate does not recognize your string formats.

What version of IM are you using, and on what platform? If on Windows, then you probably need to escape % with %%.

See
http://www.imagemagick.org/Usage/windows/
http://www.imagemagick.org/Usage/windows/#conversion
Hey fmw42-

I'm running the latest repository build installed with yum on the Amazon AWS flavor of CentOS UNIX, so I was assuming it was built with all resource dependencies including html2ps, but looks like it wasn't. Here's the full version info:

Code: Select all

$ identify --version
Version: ImageMagick 6.5.4-7 2011-01-17 Q16 OpenMP http://www.imagemagick.org
Copyright: Copyright (C) 1999-2009 ImageMagick Studio LLC
Let me try to rerun the job with html2ps now installed and try again in the morning. Appreciate your help!

Re: Identify with RegEx + Strange Errors

Posted: 2012-05-05T22:24:29-07:00
by fmw42
I am not an expert on this. So I am not even sure about what delegate does that. html2ps does not even show in my list of delegates (convert -list configure). But on my IM 6.7.6.9beta Q16 Mac OSX Snow Leopard, the following works.


convert rose: rose.jpg
convert rose: rose.gif
convert rose: rose.png
identify -format "%f %w %h %b \n" 'rose.*'
rose.gif 70 46 4113B
rose.jpg 70 46 2074B
rose.png 70 46 7009B


You probably should wait to hear from the IM developers on this.

My delegates are:

DELEGATES bzlib fftw fontconfig freetype gs jpeg jng jp2 lcms2 lqr openexr pango png rsvg tiff x11 xml zlib


Your version is somewhat old, but that should not be the issue. But you might re-install a newer version and see if that helps.

Re: Identify with RegEx + Strange Errors

Posted: 2012-05-06T07:12:56-07:00
by jonaz
Thanks for the help, fmw42! I reran after installing html2ps and it works like a charm (no errors). I'm guessing some of my jpegs in that path were HTML content with the wrong file extension or something. Now I'm just left with figuring out the right RegEx syntax accepted by identify.

Does anyone know what RegEx flavor identify accepts? Given that /jewelclub*.jpg/ processes fine but /jewelclub_[a-z0-9]{1,}_(small|medium|large|zoom)\.jpg/ is failing I'm assuming it's POSIX BRE or ERE? Any RegEx pros around that can help with my syntax?

Re: Identify with RegEx + Strange Errors

Posted: 2012-05-09T19:50:17-07:00
by anthony
The file name 'globbing', as this is more correctly called, in ImageMagick is restricted to ? and *.
Also [...] is used for a read_modifier postfix extension to the filename, and not for character selection.

IM isn't a BASH shell! But if you don't quote the filename, you can use the shell to expand the glob.
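
A rough sketch of the difference between quoting and not quoting the pattern. The temporary directory and the empty placeholder files are assumptions made only for illustration, and `printf` stands in for `identify` so the expansion itself is visible:

```shell
# Hypothetical scratch directory with empty placeholder files
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch jewelclub_abc123_large.jpg random_abc123_large.jpg

# Unquoted: the shell expands the glob before the command ever runs
unquoted=$(printf '%s\n' jewelclub_*.jpg)

# Quoted: the pattern is handed to the command literally
# (with identify, ImageMagick would then do its own * and ? globbing)
quoted=$(printf '%s\n' 'jewelclub_*.jpg')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Either way this is globbing, not a regular expression, so `{1,}` and `(small|medium)` have no special meaning in the quoted form.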


Globbing is provided for Windows DOS users, as the DOS shell does not provide globbing of arguments, and for situations where the filename has a CODER: prefix or a [...] read_modifier postfix. IM also allows the use of a printf-style %d format in the filename for enumerated multi-file reads, and a '@' prefix to read a list of filenames from a file!

In summary...

Code: Select all

      filelist       @filelist.txt
      globbing       use of * and ?
      coder          CODER: prefix
      read_mods      [...] postfix
      enumeration    %d formatting
See IM examples, File Handling, Reading.
http://www.imagemagick.org/Usage/files/#read


Future...
In IMv7, globbing has been further restricted (this is already done) to image read filenames only. This prevents incorrect usage in other arguments (like label strings) and in implicit writes, where globbing could otherwise cause an existing file to be overwritten by accident.

I also plan on adding the use of %[filename:...] image property substitution (percent escapes),
but only for properties specifically prefixed with 'filename:', just as currently used in write.

I am also thinking of adding a setting to disable any and all of these (6) filename modifiers, so that a given filename is treated as literal when security is an issue.

Re: Identify with RegEx + Strange Errors

Posted: 2012-05-10T05:04:34-07:00
by jonaz
anthony wrote: Globbing is provided for Windows DOS users, as the DOS shell does not provide globbing of arguments, and for situations where the filename has a CODER: prefix or a [...] read_modifier postfix. IM also allows the use of a printf-style %d format in the filename for enumerated multi-file reads, and a '@' prefix to read a list of filenames from a file!
Thanks, Anthony! I wasn't aware that I could execute on a file list (should have RTFM). Seems like the easiest solution to my issue would be to just generate a file list with my RegEx engine of choice and run IM using the @ modifier.
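
For anyone landing here later, a minimal sketch of that approach, assuming `grep -E` (POSIX ERE) as the RegEx engine. The scratch directory, the empty placeholder files, and the name `filelist.txt` are hypothetical and only there to exercise the filter:

```shell
# Hypothetical scratch directory; placeholder files named after the samples above
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch jewelclub_abc123_large.jpg jewelclub_ABc324_large.jpg \
      random_abc123_large.jpg garbled-CrAp___.jpg

# Anchored ERE filter; LC_ALL=C keeps [a-z] limited to lowercase ASCII
ls | LC_ALL=C grep -E '^jewelclub_[a-z0-9]+_(small|medium|large|zoom)\.jpg$' > filelist.txt

# Show which names survived the filter
cat filelist.txt

# With real images in place of the empty placeholders, hand the list to identify:
# identify -format "%f %w %h %b\n" @filelist.txt
```

The identify line is left commented out because the placeholder files here are empty and would not be readable as images; filenames containing newlines would also need different handling than this one-name-per-line list.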