Bug in handling of RLE compressed bitmaps
Posted: 2010-06-08T11:18:56-07:00
This happens when 8-bit palette BMPs with RLE compression are written. In general, the RLE data that ImageMagick writes looks odd, and erroneous pixels can appear when the image width is 1 or 2.
The commands below should create an RLE-compressed 8-bit BMP in all ImageMagick versions:
convert -size 1x256 gradient: gradient.gif
convert gradient.gif gradient.bmp
The pixel values of gradient.gif:
# ImageMagick pixel enumeration: 1,256,255,rgb
0,0: (255,255,255) #FFFFFF white
0,1: (254,254,254) #FEFEFE rgb(254,254,254)
0,2: (253,253,253) #FDFDFD rgb(253,253,253)
0,3: (252,252,252) #FCFCFC grey99
0,4: (251,251,251) #FBFBFB rgb(251,251,251)
0,5: (250,250,250) #FAFAFA grey98
...
The pixel values of gradient.bmp (what gets read by IM):
# ImageMagick pixel enumeration: 1,256,255,rgb
0,0: (255,255,255) #FFFFFF white
0,1: (255,255,255) #FFFFFF white
0,2: (255,255,255) #FFFFFF white
0,3: (252,252,252) #FCFCFC grey99
0,4: (251,251,251) #FBFBFB rgb(251,251,251)
0,5: (250,250,250) #FAFAFA grey98
...
An attempt at a more detailed description:
Going by the Microsoft documentation of the format, the last few bytes of the BMP file (the bitmap data of the five topmost pixels) could look like this:
01 04 00 00 01 03 00 00 01 02 00 00 01 01 00 00 01 00 00 01
But ImageMagick stores it like this:
01 04 03 00 00 00 01 03 03 00 00 00 01 02 03 00 00 00 01 01 03 00 00 00 04 00 00 00 00 01
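The first, compact byte sequence above can be reproduced with a small encoder. This is only a sketch: it uses encoded mode only ('count, index' pairs), writes 'end of line' between rows and 'end of bitmap' after the last row, and the function names are made up for illustration. It is not ImageMagick code.

```python
def run_lengths(values, max_run=255):
    """Split a sequence into (count, value) runs of at most max_run items."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v and runs[-1][0] < max_run:
            runs[-1] = (runs[-1][0] + 1, v)
        else:
            runs.append((1, v))
    return runs


def encode_rle8_compact(rows):
    """Encode rows (file order, i.e. bottom-up) of palette indices.

    Only the real pixels of each row are encoded; no padding bytes.
    """
    out = bytearray()
    for n, row in enumerate(rows):
        if n:
            out += b"\x00\x00"          # end of line, before the next row
        for count, value in run_lengths(row):
            out += bytes([count, value])
    out += b"\x00\x01"                  # end of bitmap
    return bytes(out)


# The five topmost scanlines of the 1x256 gradient, in file order.
top_rows = [[4], [3], [2], [1], [0]]
print(encode_rle8_compact(top_rows).hex(" "))
```

Each row of width 1 costs four bytes here: one 'count, index' pair plus the two-byte 'end of line' marker.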
Perhaps IM first builds the whole bitmap data as if uncompressed 8-bit values were needed. Normally the scanlines have to be aligned to four bytes, but I think this only applies to uncompressed data. It looks as if IM run-length-encodes not only the color indices but also the padding bytes.
The real color value of the fifth highest scanline here is #FBFBFB. The uncompressed bitmap data would be '04 00 00 00': 04 is the color index of that pixel, followed by three padding bytes. That's probably why IM encodes it as '01 04 03 00' (followed by '00 00' for 'end of line'). The topmost scanline even gets encoded as '04 00', because its uncompressed bitmap data would be '00 00 00 00': the first 00 byte is the color index for #FFFFFF, again followed by three padding bytes.
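To check that guess, here is a sketch of an encoder that first pads every row to a multiple of four bytes and then run-length-encodes the padded row. The function names are hypothetical and this is not taken from the ImageMagick source; it only shows that the assumption reproduces the bytes IM writes.

```python
def run_lengths(values, max_run=255):
    """Split a sequence into (count, value) runs of at most max_run items."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v and runs[-1][0] < max_run:
            runs[-1] = (runs[-1][0] + 1, v)
        else:
            runs.append((1, v))
    return runs


def encode_rle8_padded(rows):
    """Encode rows (file order, bottom-up) after padding each to 4 bytes.

    Assumption: the padding bytes are zero and are fed to the run-length
    encoder together with the real palette indices.
    """
    out = bytearray()
    for row in rows:
        padded = list(row) + [0] * (-len(row) % 4)
        for count, value in run_lengths(padded):
            out += bytes([count, value])
        out += b"\x00\x00"              # end of line after every row
    out += b"\x00\x01"                  # end of bitmap
    return bytes(out)


top_rows = [[4], [3], [2], [1], [0]]
print(encode_rle8_padded(top_rows).hex(" "))
```

For the five topmost scanlines this prints exactly the 30 bytes quoted above as ImageMagick's output, including the '04 00' run for the topmost scanline.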
Now to the reading part: I think most programs simply discard any bitmap data that falls outside the real image dimensions. So most image viewers can read these BMPs, even with this strange RLE compression. But ImageMagick itself can have problems when it has to read its own run-length encoding.
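A reader that discards out-of-bounds data could look roughly like the sketch below. It is an assumption about how tolerant viewers behave, not code from any particular program; it handles encoded mode only, and the function name is made up.

```python
def decode_rle8_clipped(data, width, height):
    """Decode encoded-mode RLE8 and drop pixels beyond the row width.

    Returns the rows in file order (bottom-up).
    """
    rows = [[0] * width for _ in range(height)]
    x = y = 0
    i = 0
    while i + 1 < len(data) and y < height:
        count, value = data[i], data[i + 1]
        i += 2
        if count > 0:
            for _ in range(count):
                if x < width:           # anything past the row end is dropped
                    rows[y][x] = value
                x += 1
        elif value == 0:                # end of line
            x = 0
            y += 1
        elif value == 1:                # end of bitmap
            break
        else:
            raise NotImplementedError("delta and absolute mode are not handled")
    return rows


im_bytes = bytes.fromhex(
    "01 04 03 00 00 00 01 03 03 00 00 00 01 02 03 00 00 00 "
    "01 01 03 00 00 00 04 00 00 00 00 01"
)
print(decode_rle8_clipped(im_bytes, 1, 5))
```

With this kind of clipping, the encoded padding bytes are harmless and the five scanlines come back as indices 4, 3, 2, 1, 0.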
Let's start again at the fifth highest scanline. '01 04 03 00' can be read as "one time index 04 and three times index 00". As every scanline in this example has width 1, the "three times index 00" spills over into the pixels of the three scanlines above. At this point that is not a problem yet: ImageMagick knows that it still needs data for the remaining 4 scanlines, and the (wrong) data for the next three pixels isn't enough to fill the image, so it has to read more of the compressed bitmap data.
The next bytes are '00 00' ('end of line' of the fifth highest scanline), and then '01 03 03 00'. Here, the spilled data from the scanline below ("three times index 00") gets overwritten with "one time index 03 and three times index 00". So the pixel value of the fourth highest scanline is fine again ("one time index 03" is the index for #FCFCFC here).
Now the problem arises: ImageMagick thinks that it already has data for the next three pixels ("and three times index 00"), and only 3 scanlines (of width 1) are missing. This data is enough to fill the image, so ImageMagick probably doesn't read any more of the compressed bitmap data. These scanlines therefore get index 00, which is the color #FFFFFF (white).
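The reading behaviour guessed at above can be simulated with a small decoder. This is a simplified, hypothetical reader (the name decode_rle8_linear is made up), not ImageMagick's actual code: it writes runs straight into one flat buffer without clipping at the row width and stops as soon as the buffer is full.

```python
def decode_rle8_linear(data, width, height):
    """Decode encoded-mode RLE8 into one flat, bottom-up pixel buffer.

    Runs are not clipped at the row width, so they can spill into the
    scanlines above. Decoding stops once the buffer is full.
    """
    total = width * height
    pixels = [0] * total
    pos = 0                             # write position in the flat buffer
    y = 0                               # current scanline, from the bottom
    i = 0
    while y < height and pos < total and i + 1 < len(data):
        count, value = data[i], data[i + 1]
        i += 2
        if count > 0:
            for _ in range(count):
                if pos < total:
                    pixels[pos] = value
                    pos += 1
        elif value == 0:                # end of line: jump to next row start
            y += 1
            pos = y * width
        elif value == 1:                # end of bitmap
            break
        else:
            raise NotImplementedError("delta and absolute mode are not handled")
    return pixels


im_bytes = bytes.fromhex(
    "01 04 03 00 00 00 01 03 03 00 00 00 01 02 03 00 00 00 "
    "01 01 03 00 00 00 04 00 00 00 00 01"
)
print(decode_rle8_linear(im_bytes, 1, 5))  # [4, 3, 0, 0, 0]
```

The buffer is bottom-up, so the last three entries are the three topmost pixels. They all end up with index 00 instead of 02, 01, 00, which matches the pixel listing of gradient.bmp above (the topmost pixel has index 00 anyway, so only the two below it are visibly wrong).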
Some of this description is surely not spot-on, but it's the most probable explanation for the errors in some compressed BMPs. As a result of these run-length-encoded padding bytes, the 3 topmost pixels can be lost with image width 1, and the 2 topmost pixels with image width 2.
The padding bytes shouldn't be encoded at all; then the problem would be gone. (The only case within RLE in BMPs where alignment is needed seems to be absolute mode, where each run of indices has to be aligned on a WORD boundary. But I don't think that ImageMagick uses this uncompressed absolute mode within RLE compression.)
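For comparison, this is roughly what the WORD alignment in absolute mode looks like, as I understand the Microsoft description: an escape byte 00, a length byte (3 to 255), the literal indices, and one zero byte if the number of indices is odd. The helper name is made up for illustration.

```python
def encode_absolute_run(indices):
    """Build one absolute-mode run for 8-bit RLE.

    Layout: 00, number of indices, the indices themselves, and a single
    zero pad byte when the number of indices is odd (WORD alignment).
    """
    n = len(indices)
    if not 3 <= n <= 255:
        raise ValueError("absolute mode needs 3..255 indices per run")
    out = bytes([0, n]) + bytes(indices)
    if n % 2:
        out += b"\x00"                  # pad the run to a WORD boundary
    return out


print(encode_absolute_run([0x45, 0x56, 0x67]).hex(" "))  # 00 03 45 56 67 00
```

This padding belongs to a single absolute-mode run and is skipped by the reader; it is unrelated to the four-byte scanline alignment of uncompressed data.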
By the way: I also think that the default 8-bit palette BMP output should be uncompressed. Sometimes the compressed files are bigger than the same files uncompressed, so it's not very efficient anyway. (One could still write '-compress rle' if one wants this type of compression.)
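A rough size estimate for the 1x256 gradient illustrates the point. It only counts pixel data (no headers or palette) and assumes the byte layout quoted earlier in this post: six bytes per row for a nonzero index, four bytes for the single row with index 00, plus the two-byte 'end of bitmap' marker. The function names are made up.

```python
def uncompressed_size(width, height):
    """Pixel data size of an uncompressed 8-bit BMP (rows padded to 4 bytes)."""
    stride = (width + 3) // 4 * 4
    return stride * height


def padded_rle_size_width1(height):
    """Pixel data size for width 1 with run-length-encoded padding bytes.

    Assumes exactly one row has index 00 ('04 00 00 00', 4 bytes) and every
    other row is written as '01 idx 03 00 00 00' (6 bytes).
    """
    return (height - 1) * 6 + 4 + 2     # rows + end-of-bitmap marker


print(uncompressed_size(1, 256))        # 1024
print(padded_rle_size_width1(256))      # 1536
```

Under these assumptions the "compressed" pixel data is half as large again as the uncompressed data for this image.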