Annotate with utf-8 problem
Posted: 2007-11-23T14:02:54-07:00
by nitech
Hi,
I am trying to annotate an image with utf-8 text (russian) from an external text file. Problem is, I get a leading question mark (?) before the text.
Any idea why this happens? Any fix? My code (vbscript) is as follows. (The writeUnicodeADODB function writes a text file in UTF-8 format and then returns the path to the file in the following format: @c:\test.txt)
Code:
strResult = img.Convert( _
"-size" , "200x200", _
"-font" , "Arial-Bold", _
"-pointsize" , "12", _
"-fill" , "#B6B6B6", _
"-annotate" , "0x0+25+18" ,writeUnicodeADODB(sText, strDefaultPath & "top_" & sFilename), _
"-trim", _
strDefaultPath & "template.png", _
strDefaultPath & "top_" & sFilename)
Re: Annotate with utf-8 problem
Posted: 2007-11-24T06:00:16-07:00
by anthony
Could you have some extra character in the UTF file? Some UTF files have a special prefix that may cause this, or perhaps a TAB. Control characters are known not to be handled well by the font drawing library.
Re: Annotate with utf-8 problem
Posted: 2007-11-25T23:52:59-07:00
by nitech
Hi Anthony, and thanks for your reply.
I thought the same as you, but I can't seem to confirm it. I should of course have provided a link to the input text file that I was using. Here goes:
http://www.avento.as/devold/text_images ... 83.png.txt
By the way, when I create a new UTF-8 text file from Notepad and use it as the input to the annotate command, the same problem occurs. Like this example:
http://www.avento.as/devold/text_images/russian.txt
I thought this had something to do with the 8-bit versus 16-bit version of ImageMagick, so I installed the newest 8-bit installer. However, it did not seem to have any effect.
Re: Annotate with utf-8 problem
Posted: 2007-11-26T17:24:20-07:00
by el_supremo
I did a hex dump of your text file and it starts with the three-byte sequence ef bb bf.
From the wikipedia entry for UTF-8:
Although not part of the standard, many Windows programs (including Windows Notepad) use the byte sequence EF BB BF at the beginning of a file to indicate that the file is encoded using UTF-8. This is the Byte Order Mark U+FEFF encoded in UTF-8
It would appear that ImageMagick does not recognize this sequence, and so does not ignore it.
If that sequence is removed from the file, IM generates the correct annotation.
Pete
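For anyone who wants to repeat this check without a hex editor, here is a minimal sketch. It is in Python rather than the VBScript used in this thread, and the function names `has_utf8_bom` and `file_has_utf8_bom` are made up for illustration. `codecs.BOM_UTF8` is the standard-library constant for the bytes EF BB BF.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of the file and check them for the BOM."""
    with open(path, "rb") as f:
        return has_utf8_bom(f.read(3))
```

Calling `file_has_utf8_bom` on a file saved as UTF-8 from Notepad should return True, assuming Notepad writes the prefix as the Wikipedia quote above describes.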
Re: Annotate with utf-8 problem
Posted: 2007-11-26T19:20:57-07:00
by anthony
Thanks, el_supremo. I noticed the sequence but did not get the chance to analyze it before you responded. It seems to be a problem with the freetype library: its handling of some control characters, and of 'bad UTF character sequences', is just not done very well at all.
TAB characters are a case in point; this text just has a similar mishandled sequence. At least, however, it did something more constructive (it printed a question mark). Most UTF code displays just ignore it completely, which means you never know there was a problem with the input.
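To see whether a piece of text contains characters of the kind mentioned above before it is handed to the annotate step, something like the following could be used. This is only a sketch in Python (not VBScript), and `find_suspect_chars` is a hypothetical helper name. It flags Unicode categories Cc (control, which includes TAB) and Cf (format, which includes U+FEFF, the BOM), and lets line breaks through on the assumption that multi-line annotations are wanted.

```python
import unicodedata

def find_suspect_chars(text: str):
    """Return (index, code point) pairs for control/format characters in text.

    Line feeds and carriage returns are skipped (an assumption: they are
    treated here as legitimate line breaks rather than as problems).
    """
    allowed = {"\n", "\r"}
    hits = []
    for i, ch in enumerate(text):
        if ch in allowed:
            continue
        if unicodedata.category(ch) in ("Cc", "Cf"):
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits
```

An empty list means nothing suspicious was found; anything else gives the position and code point of each character to look at.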
Re: Annotate with utf-8 problem
Posted: 2007-11-27T00:40:03-07:00
by nitech
It's impressive to see what you knowledgeable people find out.
I use the ADODB.Stream object to write the file. I guess it won't let me create it without the Byte Order Mark. I also guess this problem must be relevant to most languages that use Unicode encoding.
I know this is not a vbScript support forum, but still, do you happen to know how I can save the file without the Byte Order Mark? My code as of today is something like:
Code:
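Function writeUnicodeADODB(txtInput, filePath)
    Dim objStream
    Set objStream = CreateObject("ADODB.Stream")
    ' The stream must be opened before it can be written to
    objStream.Open
    objStream.Position = 0
    objStream.Charset = "UTF-8"
    objStream.WriteText txtInput
    objStream.SaveToFile filePath
    ' Return the path prefixed with @ so ImageMagick reads the text from the file
    writeUnicodeADODB = "@" & filePath
End Function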
Regards,
nitech
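For comparison with other languages, here is how the same "write a UTF-8 file without the BOM" step might look in Python. This is a sketch, not a VBScript answer, and the names `encode_without_bom` and `write_annotate_file` are invented for illustration. Python's `"utf-8"` codec does not emit a BOM, while `"utf-8-sig"` does.

```python
def encode_without_bom(text: str) -> bytes:
    """Encode text as UTF-8. The "utf-8" codec never prepends a BOM."""
    return text.encode("utf-8")

def write_annotate_file(text: str, path: str) -> str:
    """Write text to path as BOM-free UTF-8 and return the "@path" form."""
    # Binary mode is used so that exactly these bytes end up in the file.
    with open(path, "wb") as f:
        f.write(encode_without_bom(text))
    return "@" + path
```

The returned string has the same @c:\test.txt shape that writeUnicodeADODB returns above.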
Re: Annotate with utf-8 problem
Posted: 2007-11-27T02:14:52-07:00
by nitech
I found a way to remove the BOM (or - at least the question mark disappeared when I did so.) Here is the vbscript code:
Code:
' txtInput is the Unicode text and filePath is the path to the image file we wish to create. First we create a UTF-8 file
' named the same as the image file (plus .txt), and then we use that UTF-8 file as the input when creating the image file.
Function writeUnicodeADODB(txtInput, filePath)
    ' Create and open stream
    Dim objStream
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Open
    ' Reset the position and indicate the character encoding
    objStream.Position = 0
    objStream.Charset = "UTF-8"
    ' Write to the stream
    objStream.WriteText txtInput
    ' Save the stream to a file
    filePath = filePath & ".txt"
    objStream.SaveToFile filePath, 2 ' overwrite if exists
    ' Return filepath with an @ so that ImageMagick understands that it's a file
    writeUnicodeADODB = "@" & RemoveBOM(filePath)
    ' Kill stream
    Set objStream = Nothing
End Function
' Removes the Byte Order Mark (BOM) from a text file with UTF-8 encoding.
' The BOM indicates that the file was stored with UTF-8 encoding.
Public Function RemoveBOM(filePath)
    ' Create a reader and a writer
    Dim writer, reader, fileSize
    Set writer = CreateObject("Adodb.Stream")
    Set reader = CreateObject("Adodb.Stream")
    ' Load from the text file we just wrote
    reader.Open
    reader.LoadFromFile filePath
    ' Copy all data from reader to writer, except the BOM
    writer.Mode = 3 ' read/write
    writer.Type = 1 ' binary
    writer.Open
    ' Skip past the BOM (the UTF-8 BOM is the three bytes EF BB BF; offset 5 is the value used here)
    reader.Position = 5
    reader.CopyTo writer, -1
    ' Overwrite file
    writer.SaveToFile filePath, 2
    ' Return file name
    RemoveBOM = filePath
    ' Kill objects
    Set writer = Nothing
    Set reader = Nothing
End Function
As you can see, I first create the text file based on the input, and I also set the character set to UTF-8. Then, before returning the file path to ImageMagick, I run the file through RemoveBOM(filePath). The RemoveBOM function reads the text file, sets its position to 5, and then copies everything from position five onward to another stream, which I then save by overwriting the text file we just read.
For anyone else who reads this post: my previously linked graphics file now displays correctly.
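The same strip-the-BOM step can be sketched in Python (not VBScript) for readers working in other languages. The names `strip_utf8_bom` and `remove_bom_from_file` are hypothetical. Unlike a fixed offset, this version checks for the three bytes EF BB BF first, so a file that has no BOM is left untouched.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, or unchanged if there is none."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def remove_bom_from_file(path: str) -> str:
    """Rewrite the file at path without its BOM (if any) and return the path."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if stripped != data:
        # Only rewrite the file when something was actually removed.
        with open(path, "wb") as f:
            f.write(stripped)
    return path
```

Running `remove_bom_from_file` twice on the same file is harmless, since the second call finds no BOM and changes nothing.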
The code I would use in vbscript to utilize these functions would be:
Code:
strResult = img.Convert( _
"-size", "200x200", _
"-font", "Arial-Bold", _
"-pointsize", "12", _
"-fill", "#B6B6B6", _
"-annotate", "0x0+25+18", writeUnicodeADODB(UCase(sText), strDefaultPath & "top_" & sFilename), _
"-trim", _
strDefaultPath & "template.png", _
strDefaultPath & "top_" & sFilename)
Thanks for your help!
Kind regards,
nitech