monit-dev

Re: SIGSEGV problem


From: Jan-Henrik Haukeland
Subject: Re: SIGSEGV problem
Date: Thu, 14 Aug 2003 02:11:50 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Civil Service, linux)

I ran a quick test with efence and managed to reproduce the SIGSEGV
(there may be more than one). The SIGSEGV is raised in
process/common.c:connectchild() on this line:

  parent->children[parent->children_num - 1] = (struct myprocesstree *) child;


From my gdb/efence session:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 1024 (LWP 1269)]
  0x0805b340 in connectchild (parent=0x41143fa0, child=0x41144740)
      at process/common.c:232
  
  (gdb) p *parent->children
  Cannot access memory at address 0x41365fcc
  
  (gdb) p parent->children[parent->children_num - 1]
  Cannot access memory at address 0x41365ffc

I suspect it's caused by accessing memory outside the array. Maybe
Christian can debug this, since it's his code :) I'm off to bed, it's
late.
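The gdb output is consistent with a write to parent->children[children_num - 1] landing outside the allocated array. A minimal sketch of an append that cannot do that, assuming children is a heap array of pointers sized exactly by children_num (the struct below is a hypothetical, cut-down stand-in, not monit's real definition): grow the array with realloc() first, then store the child.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the process tree node;
 * only the fields connectchild() touches are modelled. */
struct myprocesstree {
    int pid;
    struct myprocesstree *parent;
    struct myprocesstree **children;
    int children_num;
};

/* Grow parent->children by one slot before storing the child, so the
 * write to children[children_num - 1] is always inside the allocation.
 * Returns 0 on success, -1 if realloc() fails (parent left unchanged). */
static int connectchild(struct myprocesstree *parent,
                        struct myprocesstree *child)
{
    struct myprocesstree **grown =
        realloc(parent->children,
                (size_t)(parent->children_num + 1) * sizeof *grown);

    if (grown == NULL)
        return -1;
    parent->children = grown;
    parent->children_num++;
    parent->children[parent->children_num - 1] = child; /* no cast needed */
    child->parent = parent;
    return 0;
}
```

Connecting three children to an empty parent leaves children_num at 3 with the last child in children[2]. Growing by one slot per call keeps the sketch short; doubling the capacity would need a separate capacity field.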

BTW, Christian, why do you use casts in this code? I'm thinking of
things like (struct myprocesstree *). They are *not* necessary.
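To illustrate the point about casts, a small sketch with a hypothetical one-field struct myprocesstree: in C, the void * returned by malloc() converts implicitly to any object pointer type, and storing a struct myprocesstree * into a slot of the same type involves no conversion at all. A superfluous cast on malloc() can even hide a missing #include <stdlib.h> under pre-C99 implicit-declaration rules.

```c
#include <stdlib.h>

struct myprocesstree { int pid; };  /* hypothetical minimal stand-in */

/* malloc() returns void *, which converts implicitly to
 * struct myprocesstree **, so no cast is required. */
static struct myprocesstree **alloc_children(size_t n)
{
    struct myprocesstree **children = malloc(n * sizeof *children);
    return children;
}

/* child already has type struct myprocesstree *; casting it to that
 * same type before the assignment would be a no-op. */
static void store_last(struct myprocesstree **children, size_t n,
                       struct myprocesstree *child)
{
    children[n - 1] = child;
}
```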


Martin Pala <address@hidden> writes:

> Another SIGSEGV occurs if the monitored process has a pidfile but no
> process with the given pid exists. It fails in the same thread as
> described below, but later. See the other trace.
>
> Martin
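For this second case the code has to cope with a pidfile naming a process that no longer exists. One common POSIX way to check that, sketched here as a hypothetical helper rather than monit's actual get_pid() logic, is kill(pid, 0): it performs the error checking without sending a signal, fails with ESRCH if there is no such process, and fails with EPERM if the process exists but the caller may not signal it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: true if a process with this pid exists. */
static bool pid_is_alive(pid_t pid)
{
    if (pid <= 0)
        return false;        /* 0 and negative pids address process groups */
    if (kill(pid, 0) == 0)   /* signal 0: check only, nothing is sent */
        return true;
    return errno == EPERM;   /* exists, but owned by another user */
}
```

Treating EPERM as "alive" matters when the checking process runs as an unprivileged user; only ESRCH (or any other error) is reported as "not running".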
>
> Martin Pala wrote:
>
>> Hi,
>>
>> during tests, one of the monit threads occasionally receives a
>> SIGSEGV. It happens when the monitored process is not running.
>>
>> the path to the SIGSEGV is as follows:
>>
>> 1.) after spawning the process start method, the main thread waits
>> for the process to start:
>> ...
>> status = pthread_create(&thread, NULL, wait_start, s);
>> ...
>>
>> 2.) in the new thread (wait_start) we detach and poll until the
>> process starts or the timeout expires:
>> ...
>> if (is_process_running(s))
>>     break;
>> ...
>>
>> 3.) the thread crashed right after the first call chain
>> is_process_running(s) -> get_pid(s->path) -> exist_file(char *file)
>> -> stat(file, &buf). See the strace output for the complete trace:
>> ...
>> 3788 stat64("XE^^G^H/run/slapd.pid", <unfinished ...>
>> ...
>>
>>
>> As you can see, the pidfile path pointer seems to point to a strange
>> place.
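The garbage in front of "/run/slapd.pid" suggests that the bytes s->path points at were overwritten or freed before the thread called stat(). That could be a side effect of the out-of-bounds write in connectchild() that efence caught, in which case that is the fix that matters. As a defensive pattern only, here is a sketch of handing the detached thread its own copy of the path. All names are hypothetical, and the "is it running" check is reduced to a stat() of the pidfile, like exist_file(). A copy does not help if the corruption happens before the thread is started.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical argument block owned by the waiting thread. */
struct wait_args {
    char *pidfile;    /* strdup()'d copy of the path, freed by the thread */
    int timeout_sec;  /* give up after this many one-second polls */
};

/* Simplified stand-in for is_process_running(): pidfile exists. */
static int pidfile_exists(const char *path)
{
    struct stat buf;
    return stat(path, &buf) == 0;
}

static void *wait_start(void *arg)
{
    struct wait_args *a = arg;
    const struct timespec one_sec = {1, 0};

    pthread_detach(pthread_self());   /* nobody will join this thread */
    for (int waited = 0; waited < a->timeout_sec; waited++) {
        if (pidfile_exists(a->pidfile))
            break;
        nanosleep(&one_sec, NULL);
    }
    free(a->pidfile);
    free(a);
    return NULL;
}

/* Returns 0 on success or an errno-style code on failure. */
static int start_waiter(const char *pidfile, int timeout_sec)
{
    pthread_t thread;
    int status;
    struct wait_args *a = malloc(sizeof *a);

    if (a == NULL)
        return ENOMEM;
    a->pidfile = strdup(pidfile);
    if (a->pidfile == NULL) {
        free(a);
        return ENOMEM;
    }
    a->timeout_sec = timeout_sec;

    status = pthread_create(&thread, NULL, wait_start, a);
    if (status != 0) {   /* thread never started, so free the args here */
        free(a->pidfile);
        free(a);
    }
    return status;
}
```

Here start_waiter(s->path, timeout) would stand in for the pthread_create() call from step 1. The thread detaches itself as in step 2 and frees its private arguments when it finishes, so it never reads memory the main thread may reuse.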
>>
>> This problem happens only occasionally (about 5% of test runs failed
>> with this error; the others were OK).
>>
>> I tried adding some debug statements to trace it, something like
>> fprintf(stderr, "mark1"); etc., but as soon as I did, I could no
>> longer reproduce the problem. I tried it many times with and without
>> these statements and the result was always the same: with them it
>> worked, without them it failed. So there is probably some race
>> condition, maybe outside monit (in the libraries).
>>
>> Any ideas?
>>
>> Martin
>>
>>
>>
>>
>>------------------------------------------------------------------------
>>
>>_______________________________________________
>>monit-dev mailing list
>>address@hidden
>>http://mail.nongnu.org/mailman/listinfo/monit-dev
>>

-- 
Jan-Henrik Haukeland



