[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[shepherd] 06/06: shepherd: Load config asynchronously and gracefully ha
From: |
Ludovic Courtès |
Subject: |
[shepherd] 06/06: shepherd: Load config asynchronously and gracefully handle errors. |
Date: |
Mon, 12 Jun 2023 09:39:21 -0400 (EDT) |
civodul pushed a commit to branch master
in repository shepherd.
commit 24c964021ebd3d63ce6e22808dd09dbe16116a6c
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Mon Jun 12 12:10:37 2023 +0200
shepherd: Load config asynchronously and gracefully handle errors.
Fixes <https://issues.guix.gnu.org/63982>.
Reported by Maxim Cournoyer <maxim.cournoyer@gmail.com>.
* modules/shepherd.scm (exception-with-kind-and-args?)
(configuration-file-loader): New procedures.
(run-daemon): Spawn 'configuration-file-loader' in a fiber.
(main): Remove "catch 'quit" around 'run-daemon' call: this was uncanny
because the 'primitive-exit' calls meant processes were left behind.
Additionally, exiting upon config file failures wasn't necessarily
desirable as it would prevent users from inspecting and reloading an
updated config file.
* tests/config-failure.sh: New file.
* Makefile.am (TESTS): Add it.
* tests/logging.sh: Wait for 'test-file-logging' to be running.
* tests/pid-file.sh: Wait for config file evaluation to complete.
* doc/shepherd.texi (Invoking shepherd): Document it.
* NEWS: Update.
---
Makefile.am | 1 +
NEWS | 16 ++++++++
doc/shepherd.texi | 12 ++++++
modules/shepherd.scm | 99 +++++++++++++++++++++++++++++++++----------------
tests/config-failure.sh | 77 ++++++++++++++++++++++++++++++++++++++
tests/logging.sh | 4 +-
tests/pid-file.sh | 9 ++++-
7 files changed, 184 insertions(+), 34 deletions(-)
diff --git a/Makefile.am b/Makefile.am
index fc53739..871b78d 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -246,6 +246,7 @@ SUFFIXES = .go
TESTS = \
tests/basic.sh \
+ tests/config-failure.sh \
tests/starting-status.sh \
tests/stopping-status.sh \
tests/startup-failure.sh \
diff --git a/NEWS b/NEWS
index c839187..05a3a1a 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,22 @@ Please send Shepherd bug reports to bug-guix@gnu.org.
* Changes in 0.10.2
+** ‘shepherd’ loads configuration file asynchronously
+
+Up to 0.10.1, ‘shepherd’ would load the user-provided configuration file
+synchronously: it would write its PID file and start listening for incoming
+connections only after the configuration file has been loaded. The
+configuration file is now loaded in the background, letting users interact
+with shepherd (using the ‘herd’ command) early on.
+
+** ‘shepherd’ keeps going upon configuration file errors
+ (<https://issues.guix.gnu.org/63982>)
+
+Up to 0.10.1, ‘shepherd’ would abruptly exit when an error would occur while
+loading the configuration file—service startup failure, uncaught exception,
+etc. It now reports the error but keeps going, again letting users fix any
+problems dynamically.
+
** New #:respawn-limit parameter to ‘service’
The ‘service’ form supports a new #:respawn-limit parameter to specify
diff --git a/doc/shepherd.texi b/doc/shepherd.texi
index dbd0756..75fcb49 100644
--- a/doc/shepherd.texi
+++ b/doc/shepherd.texi
@@ -419,6 +419,18 @@ starting some or all of those services with
@xref{Service Examples}, for examples of what @var{file} might look
like.
+Note that @var{file} is evaluated @emph{asynchronously}:
+@command{shepherd} may start listening for client connections (with the
+@command{herd} command) before @var{file} has been fully loaded. Errors
+in @var{file} such as service startup failures or uncaught exceptions do
+@emph{not} cause @command{shepherd} to stop. Instead the error is
+reported, leaving users the opportunity to inspect service state and,
+for example, to load an updated config file with:
+
+@example
+herd load root @var{file}
+@end example
+
@item -I
@itemx --insecure
@cindex security
diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 7058e6b..1207964 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -139,6 +139,53 @@ already ~a threads running, disabling 'signalfd' support")
(fcntl fd F_SETFD (logior FD_CLOEXEC flags)))))
(loop (+ fd 1)))))
+(cond-expand
+ ((not guile-3)
+ (define exception-with-kind-and-args?
+ (const #f)))
+ (else
+ (define exception-with-kind-and-args?
+ (exception-predicate &exception-with-kind-and-args))))
+
+(define (configuration-file-loader file)
+ "Return a thunk that loads @var{file}, the user's configuration file."
+ (define (failure)
+ (report-error
+ (l10n "~s: exception thrown while loading configuration file~%")
+ file)
+ #f)
+
+ (define (handle-key-and-args-exception key args)
+ (match key
+ ('system-error
+ (local-output
+ (l10n "While loading configuration file '~a': ~a")
+ file (strerror (system-error-errno (cons key args)))))
+ (_
+ (local-output
+ (l10n "Uncaught exception while loading configuration file '~a': ~s")
+ file (cons key args)))))
+
+ (lambda ()
+ (guard (c ((action-runtime-error? c)
+ (local-output (l10n "action '~a' on service '~a' failed: ~s")
+ (action-runtime-error-action c)
+ (action-runtime-error-service c)
+ (cons (action-runtime-error-key c)
+ (action-runtime-error-arguments c)))
+ (failure))
+ ((exception-with-kind-and-args? c)
+ (handle-key-and-args-exception
+ (exception-kind c) (exception-args c))
+ (failure)))
+ (catch #t
+ (lambda ()
+ (load-in-user-module file)
+ (local-output (l10n "Configuration successfully loaded from '~a'.")
+ file))
+ (lambda (key . args) ;for Guile 2.2
+ (handle-key-and-args-exception key args))))))
+
(define* (run-daemon #:key (config-file (default-config-file))
socket-file pid-file signal-port poll-services?)
(define signal-handler
@@ -169,15 +216,10 @@ already ~a threads running, disabling 'signalfd' support")
(spawn-fiber
(essential-task-thunk 'signal-handler signal-handler))
- ;; This _must_ succeed. (We could also put the `catch' around
- ;; `main', but it is often useful to get the backtrace, and
- ;; `caught-error' does not do this yet.)
- (catch #t
- (lambda ()
- (load-in-user-module (or config-file (default-config-file))))
- (lambda (key . args)
- (caught-error key args)
- (quit 1)))
+ ;; Load CONFIG-FILE in another fiber. If loading fails, report it but keep
+ ;; going: the user can use 'herd load root' with a new config file if
+ ;; needed.
+ (spawn-fiber (configuration-file-loader config-file))
;; Ignore SIGPIPE so that we don't die if a client closes the connection
;; prematurely.
@@ -399,29 +441,22 @@ fork in the child process."
(register-services (list root-service))
(start-service root-service)
- (catch 'quit
- (lambda ()
- (with-process-monitor
- ;; Replace the default 'system*' binding with one that
- ;; cooperates instead of blocking on 'waitpid'. Replace
- ;; 'primitive-load' (in C as of 3.0.9) with one that does
- ;; not introduce a continuation barrier.
- (replace-core-bindings!
- (system* (lambda command
- (spawn-command command)))
- (system spawn-shell-command)
- (primitive-load primitive-load*))
-
- (run-daemon #:socket-file socket-file
- #:config-file config-file
- #:pid-file pid-file
- #:signal-port signal-port
- #:poll-services? poll-services?)))
- (case-lambda
- ((key value . _)
- (primitive-exit value))
- ((key)
- (primitive-exit 0))))))
+ (with-process-monitor
+ ;; Replace the default 'system*' binding with one that
+ ;; cooperates instead of blocking on 'waitpid'. Replace
+ ;; 'primitive-load' (in C as of 3.0.9) with one that does
+ ;; not introduce a continuation barrier.
+ (replace-core-bindings!
+ (system* (lambda command
+ (spawn-command command)))
+ (system spawn-shell-command)
+ (primitive-load primitive-load*))
+
+ (run-daemon #:socket-file socket-file
+ #:config-file (or config-file (default-config-file))
+ #:pid-file pid-file
+ #:signal-port signal-port
+ #:poll-services? poll-services?))))
#:parallelism 1 ;don't create POSIX threads
#:hz 0))))) ;disable preemption, which would require POSIX
threads
diff --git a/tests/config-failure.sh b/tests/config-failure.sh
new file mode 100644
index 0000000..1cea2ad
--- /dev/null
+++ b/tests/config-failure.sh
@@ -0,0 +1,77 @@
+# GNU Shepherd --- Test shepherd behavior when config file errors out.
+# Copyright © 2023 Ludovic Courtès <ludo@gnu.org>
+#
+# This file is part of the GNU Shepherd.
+#
+# The GNU Shepherd is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or (at
+# your option) any later version.
+#
+# The GNU Shepherd is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with the GNU Shepherd. If not, see <http://www.gnu.org/licenses/>.
+
+shepherd --version
+herd --version
+
+socket="t-socket-$$"
+conf="t-conf-$$"
+confdir="t-confdir-$$"
+datadir="t-datadir-$$"
+log="t-log-$$"
+stamp="t-stamp-$$"
+pid="t-pid-$$"
+child_pid="t-child-pid-$$"
+
+herd="herd -s $socket"
+
+trap "cat $log || true; rm -f $socket $conf $stamp $log $child_pid;
+ test -f $pid && kill \`cat $pid\` || true; rm -f $pid" EXIT
+
+cat > "$conf" <<EOF
+(register-services
+ (list (service
+ '(succeeding)
+ #:start (make-forkexec-constructor
+ '("$SHELL" "-c" "echo \$\$ > $PWD/$child_pid; exec sleep
300"))
+ #:stop (make-kill-destructor)
+ #:respawn? #f)
+ (service
+ '(failing)
+ #:requirement '(succeeding)
+ #:start (lambda _
+ (call-with-output-file "$stamp" (const #t))
+ (error "faileddddd!"))
+ #:stop (const #f)
+ #:respawn? #f)))
+
+(start-service (lookup-service 'failing))
+EOF
+
+rm -f "$pid" "$stamp"
+shepherd -I -s "$socket" -c "$conf" -l "$log" --pid="$pid" &
+
+# The 'succeeding' service should be up and running.
+while ! test -f "$child_pid" ; do sleep 0.3 ; done
+kill -0 "$(cat "$child_pid")"
+
+# Wait till it's ready.
+while ! test -f "$pid" ; do sleep 0.3 ; done
+shepherd_pid="$(cat $pid)"
+
+# Then the 'failing' service should fail.
+while ! test -f "$stamp" ; do sleep 0.3 ; done
+
+# Despite the failure while loading $conf, shepherd must be up and running.
+$herd status failing | grep "stopped"
+$herd status succeeding | grep "running"
+
+$herd stop root
+
+while kill -0 "$shepherd_pid" ; do sleep 0.3 ; done
+if kill -0 "$(cat "$child_pid")"; then false; else true; fi
diff --git a/tests/logging.sh b/tests/logging.sh
index 95384d5..961e5b5 100644
--- a/tests/logging.sh
+++ b/tests/logging.sh
@@ -77,8 +77,10 @@ while ! test -f "$pid" ; do sleep 0.3 ; done
shepherd_pid="`cat $pid`"
+while ! test -f "$service_pid" ; do sleep 0.3 ; done
+until $herd status test-file-logging | grep running; do sleep 1; done
+
cat "$service_log"
-$herd status test-file-logging | grep running
for message in "STARTING" "STARTED" "café" "latin1 garbage: .* alors"
do
grep -E '^2[0-9]{3}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}
'"$message" "$service_log"
diff --git a/tests/pid-file.sh b/tests/pid-file.sh
index b2d5bec..55c02fd 100644
--- a/tests/pid-file.sh
+++ b/tests/pid-file.sh
@@ -23,11 +23,12 @@ socket="t-socket-$$"
conf="t-conf-$$"
log="t-log-$$"
pid="t-pid-$$"
+stamp="t-stamp-$$"
service_pid="t-service-pid-$$"
herd="herd -s $socket"
-trap "cat $log || true; rm -f $socket $conf $service_pid $log;
+trap "cat $log || true; rm -f $socket $conf $service_pid $stamp $log;
test -f $pid && kill \`cat $pid\` || true; rm -f $pid" EXIT
cat > "$conf"<<EOF
@@ -97,6 +98,9 @@ cat > "$conf"<<EOF
;; Start it upfront. This ensures the whole machinery works even
;; when called in a non-suspendable context (continuation barrier).
(start-service (lookup-service 'test-works))
+
+;; Note that the config file has been evaluated.
+(call-with-output-file "$PWD/$stamp" (const #t))
EOF
rm -f "$pid"
@@ -107,6 +111,9 @@ while ! test -f "$pid" ; do sleep 0.3 ; done
shepherd_pid="`cat $pid`"
+# The config file is evaluated asynchronously, so wait until it's been loaded.
+until test -f "$stamp" ; do sleep 0.3 ; done
+
# This service should already be running.
$herd status test-works | grep running
test -f "$service_pid"
- [shepherd] branch master updated (0703196 -> 24c9640), Ludovic Courtès, 2023/06/12
- [shepherd] 04/06: comm: 'open-server-socket' deletes the file before binding., Ludovic Courtès, 2023/06/12
- [shepherd] 02/06: shepherd: Spawn signal-handling fiber early on., Ludovic Courtès, 2023/06/12
- [shepherd] 01/06: service: 'service' errors out when 'provision' is invalid., Ludovic Courtès, 2023/06/12
- [shepherd] 03/06: shepherd: Simplify 'call-with-server-socket'., Ludovic Courtès, 2023/06/12
- [shepherd] 05/06: service: Add 'respawn-limit' slot to <service>., Ludovic Courtès, 2023/06/12
- [shepherd] 06/06: shepherd: Load config asynchronously and gracefully handle errors.,
Ludovic Courtès <=