shepherd respawn frequency
From: Attila Lendvai
Subject: shepherd respawn frequency
Date: Sat, 27 May 2023 08:34:51 +0000
dear guix,
the issue at hand:
i have a daemon that simply quits when some of its running conditions are not
satisfied. this can depend on unpredictable external factors, like the
temporary unreachability of a remote service.
shepherd respawns it immediately in RESPAWN-SERVICE, without any delay, which
leads to a kind of busy loop (i noticed this through the fan noise of the
machine). i know that there's a stopgap measure to disable such services, but:
1) some of these daemons struggle long enough before quitting that
they do not trigger the default RESPAWN-LIMIT-HIT? stopgap
measure
2) i *do* want shepherd to keep restarting them indefinitely, but
not immediately after their premature exit
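to make point 1) concrete, here is a minimal python sketch of a limit of the form "at most N respawns within T seconds". it is not shepherd's code: `RespawnLimiter`, `limit_hit`, `first_hit` and the (5, 7) values are hypothetical, chosen only for illustration. it keeps just the last N respawn timestamps in a bounded deque, and shows that a daemon which lives for a while before quitting never trips the limit, even though it keeps respawning forever.

```python
from collections import deque


class RespawnLimiter:
    """Hypothetical limiter: at most `max_respawns` respawns within `window`
    seconds. An illustration only, not shepherd's implementation."""

    def __init__(self, max_respawns, window):
        self.max_respawns = max_respawns
        self.window = window
        # only the most recent `max_respawns` timestamps matter, so a bounded
        # deque (a ring buffer) keeps memory use constant
        self.timestamps = deque(maxlen=max_respawns)

    def limit_hit(self, now):
        """Record a respawn at time `now`; return True if the limit is hit."""
        self.timestamps.append(now)
        return (len(self.timestamps) == self.max_respawns
                and now - self.timestamps[0] < self.window)


def first_hit(interval, limiter, attempts=100):
    """Respawn every `interval` seconds; return the index of the first
    respawn that hits the limit, or None if it is never hit."""
    for i in range(attempts):
        if limiter.limit_hit(i * interval):
            return i
    return None


print(first_hit(0.1, RespawnLimiter(5, 7)))  # 4: a fast crash loop is caught
print(first_hit(2.0, RespawnLimiter(5, 7)))  # None: a slow one never is
```

the slow case is exactly the busy loop described above: a respawn every 2 seconds never fits 5 respawns into 7 seconds, so the stopgap never kicks in.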
proposed solution:
would the shepherd maintainers (looking at you Ludo :) accept a change that
introduces a new field into <service> called RESPAWN-DELAY, and issues a fiber
sleep in RESPAWN-SERVICE when that field is not #false and the daemon process
has quit unexpectedly?
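a minimal python sketch of the proposed behavior, assuming respawn_service is only reached after an unexpected exit. `Service`, `respawn_delay` and `start` are hypothetical stand-ins for <service>, RESPAWN-DELAY and whatever actually starts the daemon; None plays the role of #false, and asyncio.sleep stands in for a fiber sleep, since it suspends only this coroutine and leaves the rest of the event loop free to serve other services.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    """Hypothetical stand-in for a <service> record."""
    name: str
    respawn_delay: Optional[float] = None  # None plays the role of #false


async def respawn_service(service, start):
    """Wait for the service's respawn delay (if any), then restart it."""
    if service.respawn_delay is not None:
        # suspends only this coroutine, like a fiber sleep would
        await asyncio.sleep(service.respawn_delay)
    return await start(service)


async def fake_start(service):
    """Hypothetical starter used for the demo below."""
    return f"started {service.name}"


async def main():
    t0 = time.monotonic()
    result = await respawn_service(Service("mydaemon", respawn_delay=0.2),
                                   fake_start)
    return result, time.monotonic() - t0


result, elapsed = asyncio.run(main())
print(result, f"after {elapsed:.2f}s")
```

the restart happens roughly 0.2 seconds after the call instead of immediately, and a service whose delay is None is restarted right away, as today.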
in an initial commit i'd also turn the global variable called RESPAWN-LIMIT
into a field of <service>, and make it take its default value from a properly
named %RESPAWN-LIMIT global variable.
open questions:
- what should be the default value of the respawn delay? i suggest 5
seconds, and i'd argue against it being disabled by default:
- premature exits happen more frequently at startup than in an
already running process
- an unwanted default respawn delay causes less headache than an
unwanted busy loop.
- if the respawn delay is set, then should respawn-limit be ignored?
IOW, should the logic treat them as two independent variables, or
should it not? and should there be some logic in how/where they
take their defaults from?
my pick: treat them as two independent variables, but when the user
explicitly specifies a respawn delay for the service object, then
there shouldn't be any respawn limit, unless the user also
explicitly specifies it on the <service> object.
corollary: the handling of defaults should be implemented so that
the fields of <service> hold #false as default value, in which
case the logic takes the default value from a global variable in
shepherd.
- should i bother with detecting the first respawn within a given past
  period (of e.g. 1 minute?), and skip the delay when this is the first
  respawn in that time window? this adds extra complexity, which may not
  be worth it. i'd go with a pass here.
- after a cursory look, i don't understand the relationship between
RESPAWNS and FAILURES. the former seems to be an endlessly growing
list of timestamps, while the latter is a ring buffer of
timestamps. it's not crucial for me to understand it, but i wonder
if there's a bug lurking there that eats up the heap when a service
keeps respawning without any delay?
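the defaulting rules proposed in the second open question could look roughly like the python sketch below. everything here is hypothetical: the globals stand in for %RESPAWN-LIMIT and a matching delay default, the 5 second delay is the value suggested above, and the (5, 7.0) limit is a placeholder, not a claim about shepherd's actual default. fields hold None (the #false of the proposal) when the user did not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# hypothetical globals, standing in for %RESPAWN-LIMIT and a delay default
DEFAULT_RESPAWN_LIMIT: Tuple[int, float] = (5, 7.0)  # (respawns, seconds), placeholder
DEFAULT_RESPAWN_DELAY: float = 5.0                   # seconds, as suggested above


@dataclass
class Service:
    """Hypothetical <service>: None means "not specified by the user"."""
    name: str
    respawn_delay: Optional[float] = None
    respawn_limit: Optional[Tuple[int, float]] = None


def effective_respawn_delay(service):
    """Explicit delay if given, else the global default."""
    if service.respawn_delay is not None:
        return service.respawn_delay
    return DEFAULT_RESPAWN_DELAY


def effective_respawn_limit(service):
    """Return the limit to enforce, or None for "respawn indefinitely"."""
    if service.respawn_limit is not None:
        return service.respawn_limit  # an explicit limit always applies
    if service.respawn_delay is not None:
        return None                   # explicit delay without explicit limit
    return DEFAULT_RESPAWN_LIMIT      # nothing specified: global default


print(effective_respawn_limit(Service("a")))                     # (5, 7.0)
print(effective_respawn_limit(Service("b", respawn_delay=10)))   # None
print(effective_respawn_limit(Service("c", respawn_delay=10,
                                      respawn_limit=(3, 60))))   # (3, 60)
```

keeping None as the field default is what makes "explicitly specified" distinguishable from "inherited from the global"; if the fields held the global values directly, the second rule could not be expressed.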
i'm all ears for suggestions, and i'm also happy to hand over the
implementation to someone else who already has plans to do it and knows the
internals of shepherd better than me.
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Looking back on a 30-year teaching career full of rewards and prizes, somehow
I can't completely believe that I spent my time on earth institutionalized; I
can't believe that centralized schooling is allowed to exist at all as a
gigantic indoctrination and sorting machine, robbing people of their children.
Did it really happen? Was this my life? God help me.”
— John Taylor Gatto (1935–2018), Teacher of the Year, both in New York
City and State, multiple times