[Windows] Cannot open/read files with non-ASCII filenames
Posted: 2011-08-05T04:18:35-07:00
by bananas2
1. Set the Windows locale to Japanese and try to process some images (PNG, for example) with Japanese filenames.
2. ImageMagick reports an improper image header (because ReadBlob returns a count of 0).
This happens because a UTF-8 -> UTF-16 (wide char) conversion is applied to the filename, BUT Windows does not use UTF-8 for the command line, and ImageMagick is not even compiled with Unicode (UTF-16) support (in which case no conversion would be needed).
For non-Unicode apps, argv is encoded in the system default code page. So there are 2 ways to fix this:
1. Compile with Unicode support (probably via wmain, see MSDN; I don't know the details).
2. Use the MultiByteToWideChar function instead of ConvertUTF8ToUTF16.
Code: Select all
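// Tested with code page 932 (Japanese).
// First call: get the required size in wide chars, including the terminating NUL.
wchars_num = MultiByteToWideChar(CP_ACP, 0, path, -1, NULL, 0);
unicode_path = (wchar_t *) AcquireQuantumMemory(wchars_num, sizeof(wchar_t));
// Second call: convert from the system ANSI code page to UTF-16.
if (unicode_path != (wchar_t *) NULL)
  MultiByteToWideChar(CP_ACP, 0, path, -1, unicode_path, wchars_num);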
See OpenMagickStream and GetPathAttributes; there are probably other places where ConvertUTF8ToUTF16 is used.
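To illustrate why the UTF-8 conversion fails on such arguments, here is a minimal sketch in portable C. It is not ImageMagick code: is_valid_utf8 is a hypothetical helper that only checks whether a byte string is well-formed UTF-8. A filename encoded in code page 932 usually is not, so a UTF-8 -> UTF-16 conversion has nothing sensible to produce from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of ImageMagick).
   Returns 1 if the NUL-terminated byte string is well-formed UTF-8, else 0. */
int is_valid_utf8(const char *text)
{
  const unsigned char *p = (const unsigned char *) text;

  assert(text != NULL);
  while (*p != 0)
  {
    int extra;            /* continuation bytes expected after the lead byte */
    int i;
    unsigned long cp;     /* decoded code point */
    unsigned long min;    /* smallest code point allowed for this length */

    if (*p < 0x80) { p++; continue; }                 /* plain ASCII */
    else if ((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; min = 0x80; }
    else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; min = 0x800; }
    else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; min = 0x10000; }
    else return 0;        /* stray continuation byte or invalid lead byte */
    p++;
    for (i = 0; i < extra; i++)
    {
      /* This test also rejects a string that ends in mid-sequence. */
      if ((*p & 0xC0) != 0x80) return 0;
      cp = (cp << 6) | (unsigned long) (*p & 0x3F);
      p++;
    }
    /* Reject overlong forms, out-of-range values, and surrogates. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;
  }
  return 1;
}
```

For example, the code page 932 bytes for 日本 are 93 FA 96 7B, and is_valid_utf8 rejects them because 0x93 cannot start a UTF-8 sequence. The UTF-8 form of the same text, E6 97 A5 E6 9C AC, is accepted.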
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-05T06:24:11-07:00
by magick
We'll get your patch into ImageMagick 6.7.1-3 Beta by sometime tomorrow. Thanks.
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-09T19:08:52-07:00
by Jason S
This change just seems like a bad idea to me. Having a function like OpenMagickStream() support Unicode filenames is a good thing, and you've broken that.
I would either
1) Implement wmain() instead of main(), and convert the (UTF-16) command-line parameters to UTF-8 using WideCharToMultiByte(CP_UTF8, ...).
or
2) Convert the command-line parameters to UTF-8 using MultiByteToWideChar(CP_ACP, ...) followed by WideCharToMultiByte(CP_UTF8, ...). This is worse than option (1) because you still aren't supporting characters that aren't in the user's current codepage. But it's a step in the right direction, and it can easily be improved later.
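To make the UTF-16 to UTF-8 step concrete, here is a portable sketch that does by hand roughly what WideCharToMultiByte(CP_UTF8, ...) does for well-formed input. It is only an illustration: utf16_to_utf8 is a hypothetical name, uint16_t stands in for the Windows wchar_t, and rejecting unpaired surrogates is a choice made for this sketch, not a description of the Windows behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Encodes a NUL-terminated UTF-16 string as UTF-8 into out.
   Returns the number of bytes written, including the terminating NUL,
   or 0 if out is too small or the input contains an unpaired surrogate. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
  size_t n = 0;

  assert(in != NULL && out != NULL);
  for (;;)
  {
    uint32_t cp = *in++;
    unsigned char buf[4];
    size_t len, i;

    if (cp >= 0xD800 && cp <= 0xDBFF)
    {
      /* High surrogate: it must be followed by a low surrogate. */
      uint32_t low = *in;
      if (low < 0xDC00 || low > 0xDFFF) return 0;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
      in++;
    }
    else if (cp >= 0xDC00 && cp <= 0xDFFF)
      return 0;   /* low surrogate without a preceding high surrogate */

    if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;
      len = 1;
    }
    else if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      len = 2;
    }
    else if (cp < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      len = 3;
    }
    else
    {
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      len = 4;
    }
    if (n + len > out_size) return 0;
    for (i = 0; i < len; i++) out[n++] = (char) buf[i];
    if (cp == 0) return n;   /* the terminating NUL has been copied */
  }
}
```

With option (2), the argument would first go through MultiByteToWideChar(CP_ACP, ...) to get UTF-16, so any character missing from the current code page is already lost before this step runs.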
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-09T19:50:30-07:00
by magick
Thanks. We'll revert the patch and rethink file handling in Windows. We're primarily Linux developers and have less confidence when coding for Windows. Patches from the Windows user community are welcome.
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-10T05:23:52-07:00
by bananas2
I agree that wmain is the only right way to go, but since it is open source, it was enough for me to support other locales at least with MultiByteToWideChar.
Was it really possible to open files (with standalone identify and convert) with Unicode filenames under Windows? When main is implemented, the command line uses the system default code page for non-Unicode apps (UTF-8 is meaningless here). We launch ImageMagick from Java and it also fails to open Japanese files (probably the arguments are converted internally to UTF-16 or the system default, but because the subprocess still uses main, UTF-8 is wrong again). So I don't think that this patch has broken anything.
Jason S wrote:
Convert the command-line parameters to UTF-8 using MultiByteToWideChar(CP_ACP, ...) followed by WideCharToMultiByte(CP_UTF8, ...).
UTF-8 should probably be used only to convert label/metadata commands.
It would be good to know how it really works.
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-10T07:18:26-07:00
by magick
Consider coding up a wmain() that converts the argv to UTF-8, which we can then pass to ImageMagick. If you post it here, we will get the patch into the next release of ImageMagick.
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-10T08:39:07-07:00
by Jason S
bananas2 wrote:
Was it really possible to open files (standalone identify and convert) with unicode filename under windows?
No; not by using 'convert' or 'identify', anyway. If you wrote your own program that calls OpenMagickStream() directly, then you could have done it by encoding the filename in UTF-8.
The patch does improve the behavior of 'convert', etc. But I don't know whether it breaks anything else. And it's sort of a step away from the full solution.
It occurred to me that the patch, by converting from "ANSI" to UTF-16 and then calling _wfopen, is probably doing exactly what fopen does. You could probably compile IM with MAGICKCORE_HAVE__WFOPEN undefined, and get the same result.
Somebody went to the trouble of writing the code in the "#if defined(MAGICKCORE_HAVE__WFOPEN)" sections, but then apparently didn't make the necessary changes elsewhere to make it useful. Strange.
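For reference, the general pattern behind such a section can be sketched as follows. This is not ImageMagick's actual code: fopen_utf8 and utf8_to_wide are hypothetical names, and the sketch assumes the caller always passes the path as UTF-8. On Windows it converts to UTF-16 and calls _wfopen; elsewhere it passes the bytes straight to fopen.

```c
#include <assert.h>
#include <stdio.h>
#if defined(_WIN32)
#include <stdlib.h>
#include <windows.h>

/* Hypothetical helper: convert NUL-terminated UTF-8 to a malloc'ed
   UTF-16 string. Returns NULL on failure; the caller frees the result. */
static wchar_t *utf8_to_wide(const char *utf8)
{
  wchar_t *wide;
  int count = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);

  if (count <= 0) return NULL;
  wide = (wchar_t *) malloc((size_t) count * sizeof(*wide));
  if (wide != NULL &&
      MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, count) == 0)
  {
    free(wide);
    wide = NULL;
  }
  return wide;
}
#endif

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
FILE *fopen_utf8(const char *path, const char *mode)
{
  assert(path != NULL && mode != NULL);
#if defined(_WIN32)
  {
    /* The ANSI fopen would read path in the system code page, so go
       through UTF-16 and the wide-character variant instead. */
    FILE *file = NULL;
    wchar_t *wide_path = utf8_to_wide(path);
    wchar_t *wide_mode = utf8_to_wide(mode);

    if (wide_path != NULL && wide_mode != NULL)
      file = _wfopen(wide_path, wide_mode);
    free(wide_mode);
    free(wide_path);
    return file;
  }
#else
  /* On other platforms the bytes are handed to the file system unchanged. */
  return fopen(path, mode);
#endif
}
```

A wrapper like this only helps if every caller really supplies UTF-8, which is exactly what a main() receiving argv in the system code page does not do.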
bananas2 wrote:
UTF-8 should probably be used only to convert label/metadata commands
Admittedly, it would be hard to make everything work perfectly (what if a filename needs to be printed to the terminal?). But if you have to support Unicode filenames, storing them internally as UTF-8 makes sense in this application, simply because all the other options are worse.
magick wrote:Consider coding up a wmain() that converts the argv to UTF8 which we can then pass to ImageMagick.
Although it was one of the things I suggested, I have growing concerns that this could open up a can of worms, and cause any number of subtle compatibility problems. I don't really know what to recommend. I may try it, but don't expect something that can be immediately released.
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-10T10:46:33-07:00
by bananas2
I think this is the simplest and least error-prone way (as long as the Linux version uses UTF-8 argv):
Code: Select all
//change to wmain(int argc, wchar_t **argv)
//FYI wmain is not supported by mingw
//another way: szArglist = CommandLineToArgvW(GetCommandLineW(), &nArgs);
//but it seems to have some issues
int main(int argc,char **argv)
{
  char
    *metadata;
  ExceptionInfo
    *exception;
  ImageInfo
    *image_info;
  MagickBooleanType
    status;
  //convert args UTF16->UTF8 using WideCharToMultiByte
  //pass new args array (char) and let im do ConvertUTF8ToUTF16 as it was before
  MagickCoreGenesis(*argv,MagickTrue);
  exception=AcquireExceptionInfo();
  image_info=AcquireImageInfo();
  metadata=(char *) NULL;
  status=MagickCommandGenesis(image_info,IdentifyImageCommand,argc,argv,
    &metadata,exception);
  if (metadata != (char *) NULL)
    metadata=DestroyString(metadata);
  image_info=DestroyImageInfo(image_info);
  exception=DestroyExceptionInfo(exception);
  MagickCoreTerminus();
  return(status);
}
Printing to the console is also quite tricky; I've seen at least 3 ways of doing it.
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-12T07:34:20-07:00
by bananas2
@Jason, Magick, what do you think?
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-12T19:57:33-07:00
by Jason S
bananas2 wrote:@Jason, Magick, what do you think?
What I think is that I'm not qualified to figure out what problems this might cause. I don't know enough about how IM handles character encodings, or about all the different platforms and configurations that need to be reviewed.
But I went ahead and tried it. Before the change, this is what happened:
Code: Select all
C:\prj\ImageMagick-6.7.0\VisualMagick\bin>.\convert.exe testΔ☺.jpg out.png
Magick: unable to open image `test??.jpg': Invalid argument @ error/blob.c/OpenBlob/2588.
Magick: missing an image filename `out.png' @ error/convert.c/ConvertImageCommand/3015.
Then I changed convert.c as follows:
Code: Select all
#define NEWSTUFF
#ifdef NEWSTUFF
int wmain(int argc, wchar_t **argvW)
#else
int main(int argc,char **argv)
#endif
{
  ExceptionInfo
    *exception;
  ImageInfo
    *image_info;
  MagickBooleanType
    status;
#ifdef NEWSTUFF
  char **argv;
  int i, len;
  // One extra slot so that argv[argc] is NULL, as with a regular main().
  argv = (char**)AcquireMagickMemory((argc+1)*sizeof(char*));
  for (i=0;i<argc;i++) {
    // Calculate number of bytes needed for this UTF-8 arg.
    len = WideCharToMultiByte(CP_UTF8,0,argvW[i],-1,NULL,0,NULL,NULL);
    // Allocate memory for the UTF-8 arg.
    argv[i] = (char*)AcquireMagickMemory(len*sizeof(char));
    // Convert arg to UTF-8.
    WideCharToMultiByte(CP_UTF8,0,argvW[i],-1,argv[i],len,NULL,NULL);
  }
  argv[argc] = NULL;
#endif
  MagickCoreGenesis(*argv,MagickTrue);
  [...]
  MagickCoreTerminus();
#ifdef NEWSTUFF
  for (i=0;i<argc;i++) {
    RelinquishMagickMemory((void*)argv[i]);
  }
  // Release the argument array itself as well.
  RelinquishMagickMemory((void*)argv);
#endif
  return(status);
}
And now here's what happens (on my computer):
Code: Select all
C:\prj\ImageMagick-6.7.0\VisualMagick\bin>.\convert.exe testΔ☺.jpg out.png
C:\prj\ImageMagick-6.7.0\VisualMagick\bin>
(It works.)
As expected, it causes cosmetic problems with terminal output. I know how to fix this in general, but I don't know how hard it would be in IM's case.
Code: Select all
C:\prj\ImageMagick-6.7.0\VisualMagick\bin>.\convert.exe notexistΔ☺.jpg out.png
Magick: unable to open image `notexistI"â~º.jpg': No such file or directory @ error/blob.c/OpenBlob/2588.
Magick: missing an image filename `out.png' @ error/convert.c/ConvertImageCommand/3015.
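One common way to avoid this kind of garbled output can be sketched as follows. It is only an assumption about how it could be done, not something taken from ImageMagick: print_utf8 is a hypothetical helper that, on Windows, checks whether the stream is attached to a console and, if so, converts the UTF-8 text to UTF-16 and writes it with WriteConsoleW. Redirected output and other platforms receive the UTF-8 bytes unchanged.

```c
#include <assert.h>
#include <stdio.h>
#if defined(_WIN32)
#include <io.h>
#include <stdlib.h>
#include <windows.h>
#endif

/* Hypothetical helper, for illustration only.
   Writes UTF-8 text to stream; returns 0 on success, -1 on failure. */
int print_utf8(FILE *stream, const char *utf8)
{
  assert(stream != NULL && utf8 != NULL);
#if defined(_WIN32)
  {
    HANDLE handle = (HANDLE) _get_osfhandle(_fileno(stream));
    DWORD mode;

    /* GetConsoleMode succeeds only for a real console, not for a
       file or a pipe, so redirected output falls through to fputs. */
    if (handle != INVALID_HANDLE_VALUE && GetConsoleMode(handle, &mode))
    {
      wchar_t *wide;
      DWORD written = 0;
      BOOL ok;
      int count = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);

      if (count <= 0) return -1;
      wide = (wchar_t *) malloc((size_t) count * sizeof(*wide));
      if (wide == NULL) return -1;
      MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, count);
      fflush(stream);   /* keep ordering with earlier buffered output */
      /* count includes the terminating NUL, which is not written. */
      ok = WriteConsoleW(handle, wide, (DWORD) (count - 1), &written, NULL);
      free(wide);
      return ok ? 0 : -1;
    }
  }
#endif
  return fputs(utf8, stream) == EOF ? -1 : 0;
}
```

Whether the characters then display correctly still depends on the console font, and every place that prints a filename would have to go through such a helper.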
Re: [Windows] Can not open/read files with non-asci filename
Posted: 2011-08-13T08:10:40-07:00
by magick
We'll get your patch into ImageMagick 6.7.1-6 by sometime tomorrow. Thanks.