monit-dev

Handle multiple branches of the dependency tree


From: Yiwen Jiang
Subject: Handle multiple branches of the dependency tree
Date: Mon, 18 Apr 2005 12:43:43 -0400

Hi there,

I have a suspicion that v4.4 of monit does not support multiple branches in the dependency tree. I am wondering if anyone on the list can verify my suspicion.

Say I have my dependency tree as follows,
E->D->C->B->A
F->H->I->B->A
G->A

Where A is the root-most process, and if A dies, E, F, and G will all die. Depending on the relative positions of E, F, and G in the servicelist (the linked list that validate.c::validate() traverses to determine process status), different processes will be started or restarted when A fails.

If E comes before F and G in the servicelist, then because it has no dependents (i.e. no dependents need to be stopped), E will be started (line 156 of control.c). Function do_start() recurses through E's dependency chain (E->D->C->B->A) and starts A, B, C, D, E in that order. However, the processes on the other branches (I, H, F, and G), which also depend on A and in theory need to be restarted, are not touched by monit.

Similarly, if F is at the beginning of the servicelist (relative to E and G), the processes that will be restarted are: A, B, I, H, F.

This is because the servicelist is built with the leaf processes (those that nothing else depends on) at the beginning of the list. When validate() checks the processes, a leaf process is checked first. When a failure is detected, monit tries to start it. During do_start(), monit determines that it was actually A that failed, and starts A. However, because do_start() is recursive, only the downed processes on the branch that led to the detection of A being down are started.
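The effect described above can be reproduced with a small model. The following is only a sketch: the `service` struct and the functions `start_with_deps()` and `simulate_leaf_first()` are hypothetical names invented for illustration and are not monit's Service_T or do_start(). It assumes every service is down when the check begins, and starts one leaf the way a recursive start would.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 2

/* Hypothetical, simplified service record; not monit's real Service_T. */
typedef struct service {
    const char *name;
    struct service *deps[MAX_DEPS]; /* services this one depends on */
    int ndeps;
    int running;
} service;

/* Start whatever `s` depends on first, then `s` itself, appending each
 * started name to `log`. Services on other branches are never visited. */
static void start_with_deps(service *s, char *log, size_t cap) {
    assert(s != NULL && log != NULL);
    if (s->running)
        return;
    for (int i = 0; i < s->ndeps; i++)
        start_with_deps(s->deps[i], log, cap);
    s->running = 1;
    strncat(log, s->name, cap - strlen(log) - 1);
}

/* Rebuild the example tree with every service down, start from the leaf
 * whose name is `leaf`, write the start order to `log`, and return how
 * many services are still down afterwards. */
static int simulate_leaf_first(char leaf, char *log, size_t cap) {
    service sa = { "A", { NULL }, 0, 0 };
    service sb = { "B", { &sa }, 1, 0 };
    service sc = { "C", { &sb }, 1, 0 };
    service sd = { "D", { &sc }, 1, 0 };
    service se = { "E", { &sd }, 1, 0 };
    service si = { "I", { &sb }, 1, 0 };
    service sh = { "H", { &si }, 1, 0 };
    service sf = { "F", { &sh }, 1, 0 };
    service sg = { "G", { &sa }, 1, 0 };
    service *all[] = { &sa, &sb, &sc, &sd, &se, &sf, &sg, &sh, &si };
    size_t n = sizeof all / sizeof all[0];
    int down = 0;

    assert(log != NULL && cap > 0);
    log[0] = '\0';
    for (size_t k = 0; k < n; k++)
        if (all[k]->name[0] == leaf)
            start_with_deps(all[k], log, cap);
    for (size_t k = 0; k < n; k++)
        if (!all[k]->running)
            down++;
    return down;
}
```

Calling `simulate_leaf_first('E', log, sizeof log)` with a 16-byte buffer starts only the E branch and leaves the services on the other two branches down, which is the behaviour reported above.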

I emailed part of these findings to the mailing list on March 14th. I have now implemented a test case, as Jan suggested in his response, with the most depending services first in the list.

I have tested my changes against a complex, branched dependency tree that I am currently using. It looks like the root-most process that was down (say process A) was detected first. Following its existing logic, monit stopped all the dependents of process A and restarted the dependency tree from process A down, including all branches of the tree.
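The root-first behaviour can be sketched in the same simplified way. This is an illustration under stated assumptions, not monit's control.c logic: the edge table, `depends_on()`, `mark_down()`, and `restart_tree()` are all hypothetical names. The sketch marks the failed root and everything that transitively depends on it as down, then repeatedly starts any downed service whose own dependencies are already running.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSVC 9

/* Hypothetical table form of the example tree. Index: A0 B1 C2 D3 E4 F5 G6 H7 I8. */
static const char names[NSVC] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I' };

/* Each pair is { child, parent }: the child depends directly on the parent. */
static const int edges[][2] = {
    { 1, 0 }, { 2, 1 }, { 3, 2 }, { 4, 3 },   /* E->D->C->B->A */
    { 8, 1 }, { 7, 8 }, { 5, 7 },             /* F->H->I->B    */
    { 6, 0 }                                  /* G->A          */
};

static int depends_on(int child, int parent) {
    for (size_t k = 0; k < sizeof edges / sizeof edges[0]; k++)
        if (edges[k][0] == child && edges[k][1] == parent)
            return 1;
    return 0;
}

/* Mark `svc` and everything that transitively depends on it as down. */
static void mark_down(int svc, int running[NSVC]) {
    running[svc] = 0;
    for (int i = 0; i < NSVC; i++)
        if (running[i] && depends_on(i, svc))
            mark_down(i, running);
}

/* Stop the tree under `root`, then start services whose dependencies are
 * all running until nothing is left to start. The start order is written
 * to `log` as a string of service names. */
static void restart_tree(int root, char *log, size_t cap) {
    int running[NSVC];
    size_t n = 0;
    int progress = 1;

    assert(root >= 0 && root < NSVC && log != NULL && cap > NSVC);
    for (int i = 0; i < NSVC; i++)
        running[i] = 1;
    mark_down(root, running);

    while (progress) {
        progress = 0;
        for (int i = 0; i < NSVC; i++) {
            int ready = !running[i];
            for (int j = 0; ready && j < NSVC; j++)
                if (depends_on(i, j) && !running[j])
                    ready = 0;
            if (ready) {
                running[i] = 1;
                log[n++] = names[i];
                progress = 1;
            }
        }
    }
    log[n] = '\0';
}
```

With root A (index 0) every one of the nine services ends up in the log, A first, and each service appears only after the service it depends on. With root B (index 1), A and G are left alone because they do not depend on B.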

The code that I have changed is quite limited. Instead of a singly linked list, a doubly linked list is used for the servicelist data structure. Function validate() now uses servicelist_backward and the prev pointer to check process status. This means that the root-most processes are checked before the leaf processes.

This change allows monit to support a wider range of cases (including the case where some dependent processes exit once they detect that the process they depend on has exited). I do not know whether this change affects any other scenarios that monit covers, but it covers all the scenarios I can think of in my particular case.

My questions are:
        1. Do you think that the case I mentioned above is a valid scenario that monit should support?
        2. If so, do you have a more generic test driver where a regression test can be done for this set of changes? 
        3. Would you be interested in incorporating this change into the monit code that you maintain?

Thanks very much for your help.

Here are the changes (affecting three files):

1. p.y
$ diff monit-4.4/p.y monitchanged/p.y
2712c2712
<     for(d= depend_list; d; d= d->next_depend)
---
>     for(d= depend_list; d; d= d->next_depend) {
2714c2714,2719
<    
---
>       if (d->next_depend != NULL) {
>         d->next_depend->prev = d;
>         servicelist_backward = d->next_depend;
>       }
>     }
>

2. validate.c
$ diff monit-4.4/validate.c monitchanged/validate.c
141c141
<   for(s= servicelist; s; s= s->next) {
---
>   for(s= servicelist_backward; s; s= s->prev) {

3. monitor.h
$ diff monit-4.4/monitor.h monitchanged/monitor.h
617a618
>   struct myservice *prev;                         /**< prev service in chain */
631a633
> Service_T servicelist_backward;                /**< The service list (created in p.y) */
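For readers who want to experiment with the idea outside monit, here is a minimal, self-contained sketch of the same technique. The `node` struct and the functions `link_backward()` and `visit_backward()` are hypothetical stand-ins, not monit code. One difference from the diff above is an assumption on my part: in the diff as posted, servicelist_backward is assigned only when a node has a successor, so a list with a single entry would, as far as the diff shows, leave it unset. The sketch tracks the tail on every iteration so that case is covered as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the service record; only the chain pointers
 * matter for this illustration. */
typedef struct node {
    char name;
    struct node *next;
    struct node *prev;
} node;

/* Fill in `prev` for every node reachable from `head` and return the
 * tail. Returns NULL for an empty list and `head` for a one-entry list. */
static node *link_backward(node *head) {
    node *tail = NULL;
    for (node *n = head; n != NULL; n = n->next) {
        n->prev = tail;
        tail = n;
    }
    return tail;
}

/* Walk from `tail` towards the head via `prev`, writing each name into
 * `out` as a NUL-terminated string. Returns the number of nodes visited. */
static size_t visit_backward(const node *tail, char *out, size_t cap) {
    size_t k = 0;

    assert(out != NULL && cap > 0);
    for (const node *n = tail; n != NULL && k + 1 < cap; n = n->prev)
        out[k++] = n->name;
    out[k] = '\0';
    return k;
}
```

If the forward list is leaf-first (for example E, then B, then A), walking it with `visit_backward()` from the tail returned by `link_backward()` visits A before B before E, which is the root-first order the validate.c change relies on.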



Cheers,
Yiwen

--
Yiwen Jiang
Nortel Networks
E-mail: address@hidden
Phone: (613) 763-4286
ESN: 393-4286
 


> -----Original Message-----
> From: Jiang, Yiwen [CAR:9D10:EXCH]
> Sent: Wednesday, March 23, 2005 12:06 PM
> To: 'This is the general mailing list for monit'
> Cc: 'address@hidden'
> Subject: RE: Question on the dependency of processes
>
>
> Hi there,
>
> Sorry for the delayed response.... Busy with product delivery dates.
>
> > On Mar 14, 2005, at 15:38, Yiwen Jiang wrote:
> >
> > > I am not sure if this is the proper news group that I should post
> > > this question to, as there are monit implementation questions in
> > > this email...
> >
> > You should really take implementation issues to the monit-developer
> > list. But..
> K. This message is an attempt to bridge this topic over to
> the dev group.

> > > What I have found was that the order in the monitrc file for
> > > monitoring these processes generates different 'servicelist'
> > > content (in the source code). For example, the content of
> > > servicelist (when in validate.c::validate() to check for zombie
> > > processes) is different if the processes are listed in reverse
> > > order in the monitrc file.
> > >
> > > For example, say I have a service dependency tree like:
> > > E->D->C->B->A
> > > F->D->C->B->A
> > > G->A
> > > where A is the 'root' of the tree.
> > >
> > >  In my monitrc file, I have 'check process' in the following
> > > order: E, F, D, C, B, G, A.
> > >
> > >  If I turn debug on using -v option, the checks on the zombie
> > > processes are in the order of: G, F, E, D, C, B, A
> > >
> > >  If I reverse the order in the monitrc file, and restart monit
> > > using -v option, the checks on the zombie processes are in the
> > > order of: E, F, D, C, B, G, A. This is a different result than the
> > > previous one.
> >
> > The list is initially built during parsing and reshuffled afterwards
> > if dependencies are present. Because of this the final list may look
> > different if you change the order of the service entries. Note
> > however that in both cases the reshuffling is done so the leaf nodes
> > are first in the list.
> >
> > > I went through the code, and noticed that the 'servicelist' is
> > > actually re-organized based on the dependencies after the
> > > configuration file is parsed. However, the result puts the most
> > > visited process last on the servicelist.
> > >
> > >  I don't quite understand why the most visited process is not at
> > > the beginning of the list. If my understanding is correct, validate
> > > goes through the servicelist to check process status every poll
> > > interval. If we think of a scenario where, because process A
> > > crashed, process G exited, the current behaviour will result in G
> > > being restarted before A, despite the dependency.
> >
> > Hmm you have a point there, although the end result should be the
> > same it seems that you got one unnecessary restart of G. Have you
> > verified that this is the case? Browsing the code it does indeed
> > look that way.
>
> Well, it is not the unnecessary restart of G that I'm
> concerned about (actually, I didn't notice that one at all).
> You see, the product I am working on depends heavily on
> process dependencies and start-up order. In this product,
> if G detects A is down, it will exit as well. The way monit
> works, if I understand correctly, is that it will detect G
> being down first, and restart G. I have observed that the
> startup time dramatically increases when G is started before
> A, even though it is A that crashed.  Is this the expected behaviour?
>
>
> > >  Would it not make more sense to have the servicelist constructed
> > > the other way, where the most dependent process is the first
> > > process on the servicelist?
> > >
> > > Because of the dependencies between these processes, it really only
> > > makes sense to me if monit would check for the 'root' process
> > > first. Or am I mis-using monit?
> >
> > I don't remember why we ended up having the service list with the
> > least depending services first. It may be other scenarios that
> > justify this design, although no one comes to mind right now. Could
> > you implement a test case with the most depending services first in
> > the list and verify that dependencies continue to work as described
> > in the monit manual? If
> This will take some time, due to the way my development
> environment works, plus work schedule. I will try though.
>
> > it does, we'll certainly reverse the service list and accept a patch
> > from you or fix it ourself.
> What would happen if the test case fails? Shouldn't monit
> behave 'properly' (i.e. start the dead process that is
> closest to the trunk of the dependency tree first)?
>
> Thank you VERY MUCH for your help.

> > --
> > Jan-Henrik Haukeland
> > Mobil +47 97141255
> >
> >
> >
> > --
> > To unsubscribe:
> > http://lists.nongnu.org/mailman/listinfo/monit-general
> >
> >
>

