guix-devel

shepherd respawn frequency


From: Attila Lendvai
Subject: shepherd respawn frequency
Date: Sat, 27 May 2023 08:34:51 +0000

dear guix,

the issue at hand:

i have a daemon that simply quits when one of its running conditions is not
satisfied. this can depend on unpredictable external factors, like a remote
service being temporarily unreachable.

shepherd respawns it immediately in RESPAWN-SERVICE, without any delay, which
leads to a kind of busy loop (i noticed this through the fan noise of the
machine). i know that there's a stopgap measure that disables such services, but:

  1) some of these daemons struggle long enough before quitting that
     they do not trigger the default RESPAWN-LIMIT-HIT? stopgap
     measure

  2) i *do* want shepherd to keep restarting them indefinitely, but
     not immediately after their premature exit

proposed solution:

would the shepherd maintainers (looking at you Ludo :) accept a change that
introduces a new field into <service> called RESPAWN-DELAY, and issues a fiber
sleep in RESPAWN-SERVICE when that field is not #false and the daemon process
quits unexpectedly?

in an initial commit i'd also turn the global variable called RESPAWN-LIMIT 
into a field of <service>, and make it take its default value from a properly 
named %RESPAWN-LIMIT global variable.
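
to make the proposed behavior concrete, here is a minimal sketch in python (not guile, and not shepherd's actual code). the `Service` class, the `respawn_service` function, and the injectable `sleep` argument are hypothetical names used only for illustration; `None` plays the role of #false.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Service:
    # hypothetical stand-in for shepherd's <service> record
    start: Callable[[], None]
    respawn_delay: Optional[float] = None  # None plays the role of #false


def respawn_service(service: Service, sleep: Callable[[float], None] = time.sleep) -> None:
    """Restart a service that quit unexpectedly, waiting first if a delay is set."""
    if service.respawn_delay:
        # in shepherd this would be a fiber sleep, so only this service waits
        sleep(service.respawn_delay)
    service.start()
```

the `sleep` parameter is there only so that the sketch can be exercised without actually waiting.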

open questions:

 - what should be the default value of the respawn delay? i suggest 5
   seconds, and i'd argue against it being disabled by default:

    - premature exits happen more frequently at startup than in an
      already running process

    - an unwanted default respawn delay causes less headache than an
      unwanted busy loop.

 - if the respawn delay is set, then should respawn-limit be ignored?
   IOW, should the logic treat them as two independent variables, or
   should it not? and should there be some logic in how/where they
   take their defaults from?

   my pick: treat them as two independent variables, but when the user
   explicitly specifies a respawn delay for the service object, then
   there shouldn't be any respawn limit, unless the user also
   explicitly specifies it on the <service> object.

   corollary: the handling of defaults should be implemented so that
   the fields of <service> hold #false as default value, in which
   case the logic takes the default value from a global variable in
   shepherd.
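
   the defaulting rule i have in mind can be sketched like this, again in python and purely as an illustration. the function name, the two globals, and their values are placeholders (not shepherd's), and `None` stands in for #false, i.e. "not specified by the user".

```python
from typing import Optional, Tuple

# placeholder global defaults, chosen only for this sketch
DEFAULT_RESPAWN_DELAY = 5.0     # seconds
DEFAULT_RESPAWN_LIMIT = (5, 7)  # (count, seconds)

Limit = Tuple[int, int]


def effective_settings(delay: Optional[float] = None,
                       limit: Optional[Limit] = None) -> Tuple[float, Optional[Limit]]:
    """Return (delay, limit) to use; a returned limit of None means 'no limit'."""
    if delay is not None and limit is None:
        # explicit delay, no explicit limit: respawn indefinitely
        return delay, None
    return (DEFAULT_RESPAWN_DELAY if delay is None else delay,
            DEFAULT_RESPAWN_LIMIT if limit is None else limit)
```

   with this rule, `effective_settings()` yields both global defaults, while `effective_settings(delay=30.0)` yields a 30 second delay and no limit.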

 - should i bother detecting whether this is the first respawn in a
   given past period (e.g. 1 minute), and skip the delay in that
   case? this adds extra complexity, which may not be worth it. i'd
   go with a pass here.
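
   for reference, the extra logic would amount to something like the following python sketch. the function and parameter names are hypothetical, and timestamps are plain seconds.

```python
from typing import Optional


def delay_before_respawn(now: float,
                         previous_respawn: Optional[float],
                         delay: float = 5.0,
                         window: float = 60.0) -> float:
    """Return how long to wait before respawning.

    The first respawn within `window` seconds is immediate; later ones wait `delay`.
    """
    if previous_respawn is None or now - previous_respawn > window:
        return 0.0
    return delay
```

   the cost is that the timestamp of the previous respawn has to be kept per service.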

 - after a cursory look, i don't understand the relationship between
   RESPAWNS and FAILURES. the former seems to be an endlessly growing
   list of timestamps, while the latter is a ring buffer of
   timestamps. it's not crucial for me to understand it, but i wonder
   if there's a bug lurking there that eats up the heap when a service
   keeps respawning without any delay?
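
   i have not verified how shepherd handles this. as a point of comparison only, a bounded log of respawn timestamps that never grows past the limit count could look like the python sketch below. all names are hypothetical, and the check is my reading of what "N respawns within S seconds" would mean.

```python
from collections import deque


def make_respawn_log(limit_count: int) -> deque:
    # a deque with maxlen drops its oldest entry on overflow, like a ring buffer
    return deque(maxlen=limit_count)


def record_and_check(log: deque, now: float, limit_count: int, limit_seconds: float) -> bool:
    """Record a respawn at time `now`; return True if the limit has been hit."""
    log.append(now)
    # limit is hit when the last `limit_count` respawns fit within `limit_seconds`
    return len(log) == limit_count and now - log[0] <= limit_seconds
```

   this also illustrates point 1) above: a daemon that struggles for a while before each exit spreads its respawns out in time, so such a check never fires.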

i'm all ears for suggestions, and i'm also happy to hand over the 
implementation to someone else, who already had plans to do it, and knows the 
internals of shepherd better than me.

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Looking back on a 30-year teaching career full of rewards and prizes, somehow 
I can't completely believe that I spent my time on earth institutionalized; I 
can't believe that centralized schooling is allowed to exist at all as a 
gigantic indoctrination and sorting machine, robbing people of their children. 
Did it really happen? Was this my life? God help me.”
        — John Taylor Gatto (1935–2018), Teacher of the Year, both in New York 
City and State, multiple times



