Page 1 of 1

Race condition in exception handling

Posted: 2009-03-19T05:08:18-07:00
by egb
I've seen a lot of failed asserts in a multithreaded environment like this one:
  • assertion=0xf797f128 "list_info != (LinkedListInfo *) ((void *)0)",
    file=0xf797f115 "magick/hashmap.c", line=1972,
    function=0xf797f1f8 "ResetLinkedListIterator"
which happen during function calls like this one:

Code: Select all

#1  0xf78d3e86 in ResetLinkedListIterator (list_info=0x0)
    at magick/hashmap.c:1972
#2  0xf78bf85c in InheritException (exception=0xd84fff04, relative=0xb17144c)
    at magick/exception.c:599
#3  0xf78db082 in CloneImage (image=0xb16e290, columns=122, rows=152, 
    orphan=MagickTrue, exception=0xe3efd670) at magick/image.c:1100
#4  0xf78a9f0a in DespeckleImage (image=0xb16e290, exception=0xe3efd670)
    at magick/effect.c:1429
#5  0xf7ad2fc0 in Magick::Image::despeckle () from /usr/lib/libMagick++.so.2
My opinion is that the locking of the ExceptionInfo during the InheritException call is insufficient:
It only locks the semaphore for the destination ExceptionInfo, but does nothing to prevent that the source ExceptionInfo is deleted during the run.
But since the invalid ExceptionInfo is a member of the source image struct, the problem could as well be that the source image is deleted during the "despeckle" run.
Also possible: the function invoked just before the despeckle (Zoom or Scale) somehow destroyed the image, and the next CloneImage call is not guilty at all, it just happens to notice it first.

In the backtrace above, the souce image object is only used inside one thread, and can't be concurrently accessed by a different one.

Unfortunatly, I couldn't reproduce this crash with a small test programm only executing the despeckle-method, so I don't have more detailed information. Except maybe that at the moment of the crash two other threads were also inside "Despeckle", but were working on different Images.

Re: Race condition in exception handling

Posted: 2009-03-19T07:31:53-07:00
by magick
This one has us perplexed. The exception list should be valid except if one thread destroys an image why another thread is accessing it. We'll investigate and see if we can spot the problem.

Re: Race condition in exception handling

Posted: 2009-03-20T16:52:24-07:00
by egb
Possibly related:
When using helgrind on a threaded program using imagemagick, it finds an unlocked read in DeleteNodeByValueFromSplayTree:

Code: Select all

==11048== Possible data race during read of size 4 at 0x88a0368 by thread #4
==11048==    at 0x60F2D57: DeleteNodeByValueFromSplayTree (splay-tree.c:525)
==11048==    by 0x5FFF8E6: DestroyPixelCacheInfo (cache.c:1499)
==11048==    by 0x5FFFBDD: DestroyPixelCache (cache.c:1414)
==11048==    by 0x5FFEDC8: DestroyImagePixels (cache.c:1381)
==11048==    by 0x6085D5A: DestroyImage (image.c:1623)
==11048==    by 0x60E1109: ResizeImage (resize.c:2237)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
==11048==  This conflicts with a previous write of size 4 by thread #27
==11048==    at 0x60F3050: AddValueToSplayTree (splay-tree.c:211)
==11048==    by 0x5FFFF19: AcquirePixelCacheInfo (cache.c:258)
==11048==    by 0x6005D00: GetImagePixelCache (cache.c:2093)
==11048==    by 0x6007EAF: QueueCacheViewAuthenticPixels (cache-view.c:836)
==11048==    by 0x60DD162: VerticalFilter (resize.c:2023)
==11048==    by 0x60E11C7: ResizeImage (resize.c:2229)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
But thats only the check for splay_tree->root == NULL, and probably harmless.

And another one here:

Code: Select all

==11048== Possible data race during read of size 4 at 0x88e2040 by thread #3
==11048==    at 0x6005CA7: GetImagePixelCache (cache.c:2078)
==11048==    by 0x6007EAF: QueueCacheViewAuthenticPixels (cache-view.c:836)
==11048==    by 0x60E0243: HorizontalFilter (resize.c:1782)
==11048==    by 0x60E11FA: ResizeImage (resize.c:2231)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
==11048==  This conflicts with a previous write of size 4 by thread #25
==11048==    at 0x6005D74: GetImagePixelCache (cache.c:2105)
==11048==    by 0x6007EAF: QueueCacheViewAuthenticPixels (cache-view.c:836)
==11048==    by 0x60DD162: VerticalFilter (resize.c:2023)
==11048==    by 0x60E11C7: ResizeImage (resize.c:2229)
==11048==    by 0x60E1364: ZoomImage (resize.c:2985)
Here, the cache_info->reference_count is read without lock on cache_info->semaphore, and it looks like two threads could enter with cache_info->reference_count==1, and both skip the Copy-on-Write part.

Maybe a few asserts on the reference_count variable could help to catch this sort of problem earlier if it really exist and isn't just a false report by helgrind.
For Example, GetImagePixelCache could assert (image->cache->reference_count==1) before leaving the lock and returning.
And ReferencePixelCache could assert (cache_info->reference_count!=0) before incrementing it.

Re: Race condition in exception handling

Posted: 2009-03-20T17:26:42-07:00
by magick
Good catch. We'll have a patch by sometime tomorrow.

Re: Race condition in exception handling

Posted: 2009-03-21T16:32:03-07:00
by Armani
during my tests I git this helgrind output, that could belong to the assertion Ernst already reported.

I think there is a lock missing in GetNextValueInLinkedList
  • ==6135== Possible data race during write of size 4 at 0x6e8e9a0 by thread #21
    ==6135== at 0x5F3A593: ResetLinkedListIterator (hashmap.c:1977)
    ==6135== by 0x5F25FA2: InheritException (exception.c:605)
    ==6135== by 0x5F417D1: CloneImage (image.c:1100)
    ==6135== by 0x5F9A1CD: ScaleImage (resize.c:2485)
    ==6135== by 0x5BE7AD3: Magick::Image::scale(Magick::Geometry const&) (in /usr/lib/libMagick++.so.2.0.0)
    ==6135== by 0x80493D1: testthread(void*) (in /root/magick_test/test)
    ==6135== by 0xB5: ???
    ==6135== This conflicts with a previous read of size 4 by thread #4
    ==6135== at 0x5F3A9A9: GetNextValueInLinkedList (hashmap.c:635)
    ==6135== by 0x5F25FAD: InheritException (exception.c:606)
    ==6135== by 0x5F417D1: CloneImage (image.c:1100)
    ==6135== by 0x5F9A1CD: ScaleImage (resize.c:2485)
    ==6135== by 0x5BE7AD3: Magick::Image::scale(Magick::Geometry const&) (in /usr/lib/libMagick++.so.2.0.0)
    ==6135== by 0x80493D1: testthread(void*) (in /root/magick_test/test)
    ==6135== by 0xAE: ???

Re: Race condition in exception handling

Posted: 2009-03-21T17:25:34-07:00
by magick
We have a patch for the problem you reported. Look for it in the ImageMAgick 6.5.0-4 release later on this evening. Thanks.