
Using unicode from a file with label:@

Posted: 2011-03-26T16:01:00-07:00
by el_supremo
I can't get IM to generate text from a file containing unicode characters referenced from label:@. If I use a file containing the equivalent UTF-8 string it works fine.
For example, I created UTF-8 and Unicode text files which, in the Times-New-Roman font, contain four symbols: upper-case Greek pi, upper-case Greek sigma, the upper half of the integral sign and the lower half of the integral sign.
The command using utf-8 is:

Code:

convert -font Times-New-Roman -pointsize 36  "label:@times_utf.txt" times_utf.png
which creates this image: http://members.shaw.ca/el.supremo/times_utf.png
but the unicode command:

Code:

convert -font Times-New-Roman -pointsize 36 -encoding Unicode "label:@times_unicode_hi.txt" times_uni_hi.png
only produces this: http://members.shaw.ca/el.supremo/times_uni_hi.png

I thought it might be a problem with little/big-endianness so I swapped each pair of bytes in the file to produce times_unicode_lo.txt:

Code:

convert -font Times-New-Roman -pointsize 36 -encoding Unicode "label:@times_unicode_lo.txt" times_uni_lo.png
but this is no better: http://members.shaw.ca/el.supremo/times_uni_lo.png
The text files are (use right click to download):
times_utf.txt
times_unicode_hi.txt
times_unicode_lo.txt
How do I get IM to handle unicode in a file?

Pete

Re: Using unicode from a file with label:@

Posted: 2011-03-26T18:39:42-07:00
by fmw42
The text editor you used to create the file needs to be UTF compliant.

I downloaded your file and checked and it appears fine to me on my Mac in BBEdit.

http://www.fmwconcepts.com/misc_tests/times_utf.txt


Then I ran:

convert -font TimesNewRoman -pointsize 36 "label:@times_utf.txt" times_utf.png

BUT try moving the quotes as follows:

convert -font TimesNewRoman -pointsize 36 label:"@times_utf.txt" times_utf.png

OR remove the quotes:

convert -font TimesNewRoman -pointsize 36 label:@times_utf.txt times_utf3.png

Note I had to rename the font to match mine. This is the result. Is that what you expected?

Image

IM 6.6.8.8 Q16 Mac OSX Tiger

Re: Using unicode from a file with label:@

Posted: 2011-03-26T18:56:12-07:00
by el_supremo
Hi Fred,
The first command using UTF-8 works fine as it did for you, but the 2nd and 3rd commands using the unicode file don't work.

Pete
P.S. this is with Windows distribution of ImageMagick 6.6.8-6 2011-03-21 Q16 with OpenMP
and delegates: bzlib freetype jpeg jp2 lcms png tiff x11 xml wmf zlib

Re: Using unicode from a file with label:@

Posted: 2011-03-26T18:58:17-07:00
by fmw42
el_supremo wrote: Hi Fred,
The first command using UTF-8 works fine as it did for you, but the 2nd and 3rd commands using the unicode file don't work.

Pete

Are you saying it is the files or the -encoding Unicode?

Your other files are not special unicode characters and I am not familiar with -encoding Unicode in that context.


Fred

Re: Using unicode from a file with label:@

Posted: 2011-03-26T19:02:23-07:00
by fmw42
el_supremo wrote: P.S. this is with Windows distribution of ImageMagick 6.6.8-6 2011-03-21 Q16 with OpenMP
and delegates: bzlib freetype jpeg jp2 lcms png tiff x11 xml wmf zlib
Don't you need fontconfig to use font names only rather than the whole path to the font? On the Mac you need it. But I don't know about the PC?

Fred

Re: Using unicode from a file with label:@

Posted: 2011-03-26T19:13:04-07:00
by el_supremo
Your output for the first example is the same as I get.
The files used in the 2nd and 3rd examples are unicode (a fixed 16 bits per code) - not utf (which uses a variable number of bytes). So I had assumed that the "-encoding unicode" would make them work, but it doesn't.
I forgot to mention that if I create a string (e.g. $str) containing the unicode characters and then use "label:$str" that works. What fails is referencing a file containing those same unicode characters.
fmw42 wrote: Don't you need fontconfig to use font names only rather than the whole path to the font?
No. On Windows if the font is installed on the system, it is always installed in the same place (C:\Windows\Fonts) so you can refer to them by name without worrying about the path.

Pete

Re: Using unicode from a file with label:@

Posted: 2011-03-26T19:38:16-07:00
by fmw42
el_supremo wrote: The files used in the 2nd and 3rd examples are unicode (fixed 16-bits per code) - not utf
Sorry, I missed that. I thought they were all UTF. I have never used -encoding Unicode. So you may have to wait for Anthony to respond. He knows more about fonts than I.


I tried your second example and got results that were very strange.


Fred

Re: Using unicode from a file with label:@

Posted: 2011-03-27T00:35:17-07:00
by anthony
Well you asked for it Fred... :lol:

utf16 (or raw unicode) is not generally used, basically because the file does not permit a mix of normal ASCII characters and unicode 'multi-byte' characters. In utf16 ALL characters are 'multi-byte' characters.

ImageMagick does not handle utf16 format text files (you either handle utf16, or ASCII with utf8; you can't do both!).

The latter two example files are raw UTF16: the 'lo' is the correct ordering for utf16, the 'hi' is reversed byte order.
I do not know how to convert the 'hi' unicode file, that is, how to swap the byte order to make it proper utf16.
But converting the 'lo' file from utf16 to utf8, and feeding the string into ImageMagick...

Code:

iconv -f utf16 -t utf8 times_unicode_lo.txt | convert -font Mincho -pointsize 36 label:@- result.png
The PNG file shows the characters perfectly fine. So it is working.
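
As for the 'hi' file, a minimal sketch, assuming an iconv that accepts the explicit names UTF-16LE and UTF-16BE (GNU iconv does): naming the byte order in -f should make a manual byte swap unnecessary. The sample_* and from_* file names below are made up for illustration and are not the files from this thread.

```shell
# Hypothetical sample files: U+220F in each byte order, with no byte-order mark.
printf '\017\042' > sample_le.txt   # bytes 0F 22, low byte first
printf '\042\017' > sample_be.txt   # bytes 22 0F, high byte first

# Name the byte order explicitly instead of letting iconv guess it.
iconv -f UTF-16LE -t UTF-8 sample_le.txt > from_le.txt
iconv -f UTF-16BE -t UTF-8 sample_be.txt > from_be.txt

# Dump both results one byte at a time; they should be the same UTF-8 bytes.
od -An -t x1 from_le.txt
od -An -t x1 from_be.txt
```

Either result could then be fed to convert through label:@- in the same way as the pipeline above.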


Looking at the unicode codes and looking up on unicode charts

Code:

od -t x2 times_unicode_lo.txt
0000000 220f 0020 2211 0020 2320 0020 2321
The first character is 220F or N-ARY Product Sign
next is 0020 or UTF16 space (Ascii 20 hex)
then 2211 or N-ARY Summation Sign...

Etcetera.
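
A caveat on reading that dump, with a small sketch: od -t x2 prints each 16-bit word in the byte order of the machine it runs on, so the same file can look different on another host. Dumping one byte at a time with -t x1 avoids that. The sample file below is hypothetical, built to hold just the first two characters (220F, 0020) with the low byte first.

```shell
# Hypothetical sample: U+220F then U+0020, each stored low byte first.
printf '\017\042\040\000' > sample_lo.txt

# One byte per column: shows the order the bytes really have in the file.
od -An -t x1 sample_lo.txt

# First two bytes only: a byte-order mark, if present, would appear here
# as ff fe (low byte first) or fe ff (high byte first).
od -An -t x1 -N 2 sample_lo.txt
```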

You may like to try the STIXGeneral.otf font (yes, IM understands the OpenType Font Format, at least under linux).
It is a standard font package on my system! Or perhaps use the VERY complete unicode font DejaVu-Sans-Book, from the DejaVu reader font set (package).

I found the math symbols from these fonts particularly good, and use them to generate the math symbols in IM Examples... See some of the extracted symbols on
http://www.imagemagick.org/Usage/draw/#symbol_alts
and more directly in
http://www.imagemagick.org/Usage/img_www/INDEX.html

Re: Using unicode from a file with label:@

Posted: 2011-03-27T09:50:29-07:00
by el_supremo
anthony wrote: The PNG file shows the characters perfectly fine. So it is working.
That only demonstrates that iconv can translate unicode to utf-8 and that IM uses UTF-8.

Let me turn the question around. What does the convert option "-encoding" do? It implies that if I use "-encoding Unicode" then the following string is interpreted as unicode rather than utf-8. If that is so, then why doesn't the following command work?

Code:

convert -font Times-New-Roman -pointsize 36 -encoding Unicode label:@times_unicode_lo.txt times_uni_lo.png
Pete

Re: Using unicode from a file with label:@

Posted: 2011-03-27T15:42:30-07:00
by anthony
el_supremo wrote: Let me turn the question around. What does the convert option "-encoding" do? It implies that if I use "-encoding Unicode" then the following string is interpreted as unicode rather than utf-8. If that is so, then why doesn't the following command work?
Actually I'm not certain - I have not looked at that option yet.

Hmmm, all the option does is save it into the draw_info structure 'encoding', which means it is saved for some type of draw operation (like annotate) via a graphic_context. That is eventually passed to the FreeType library after converting it to a "ft_encoding_..." setting for that library. Only those listed in the options are translated; otherwise an error is generated. This however does not mean others are not possible; FreeType may have added other 'CharMaps'.
http://imagemagick.org/script/command-l ... p#encoding

Eventually a call to FT_Select_Charmap() is made.
It does seem to be the text encoding, as specified in part 6 of
http://www.freetype.org/freetype2/docs/ ... step1.html

Unicode, however, appears to mean UTF-8.

I do know GB2312 is the official encoding used for documents in the People's Republic of China, as I have had to translate such text files for my wife (from mainland China), even though I can't read it myself.

I do not know what other options are possible, or if any of them are specific to UTF-16 (big or little endian).