bug#28211: Stack marking issue in multi-threaded code
From: Ludovic Courtès
Subject: bug#28211: Stack marking issue in multi-threaded code
Date: Fri, 29 Jun 2018 17:03:42 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Hey hey, comrades!
I have a fix for some (most?) of the crashes we were seeing while
running multi-threaded code such as (guix build compile), and,
presumably, the grafting code mentioned at the beginning of this bug
report, although I haven’t checked yet.
So, ‘scm_i_vm_mark_stack’ marks the stack precisely, but contrary to
what I suspected, precise marking is not at fault.
Instead, the problem has to do with the fact that some VM instructions
(‘call’, ‘call-label’, and ‘handle-interrupts’, patched below) change the
frame pointer (vp->fp) before they have set up the dynamic link for that
new frame.
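As a consequence, if a stop-the-world GC is triggered after vp->fp has
been changed and before its dynamic link has been set, the stack-walking
loop in ‘scm_i_vm_mark_stack’ follows whatever stale value is in the new
frame’s dynamic-link slot and can stop very early, leaving a lot of live
objects unmarked (and thus liable to be reclaimed while still in use).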
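To make the ordering problem concrete, here is a toy model in C. It is a
sketch under loud assumptions: ‘struct toy_frame’, ‘struct toy_vm’,
‘walk_frames’ and ‘simulate_gc_during_push’ are hypothetical names, the
frame layout is not Guile’s, and the "GC" is simply a call to the walker
placed between the two stores, where a stop-the-world collection could
interrupt the thread.

```c
#include <stddef.h>

/* Toy model only; NOT Guile's actual frame layout.  Each frame records
   the frame that was current before it (its "dynamic link"). */
struct toy_frame
{
  struct toy_frame *dynamic_link;  /* caller's frame; NULL = outermost */
  const void *return_address;      /* not used by the walker here */
};

struct toy_vm
{
  struct toy_frame *fp;            /* innermost (current) frame */
};

#define TOY_MAX_FRAMES 64

/* What a stack-marking pass does in this model: start at vm->fp and
   follow dynamic links until the chain ends. */
static size_t
walk_frames (const struct toy_vm *vm)
{
  size_t n = 0;
  for (const struct toy_frame *f = vm->fp; f != NULL; f = f->dynamic_link)
    n++;
  return n;
}

/* Push DEPTH frames correctly, then push one more using either the
   buggy or the fixed store order, running the walker at the point where
   a GC could strike.  Returns how many frames that walk reached
   (0 if DEPTH is too large for the toy stack). */
size_t
simulate_gc_during_push (size_t depth, int fixed_order)
{
  /* Zero-filled, so the new frame's not-yet-written link is NULL here;
     on a real stack it would be whatever stale value was in the slot. */
  struct toy_frame frames[TOY_MAX_FRAMES] = { { NULL, NULL } };
  struct toy_vm vm = { NULL };
  size_t seen;

  if (depth >= TOY_MAX_FRAMES)
    return 0;

  for (size_t i = 0; i < depth; i++)
    {
      frames[i].dynamic_link = vm.fp;
      vm.fp = &frames[i];
    }

  struct toy_frame *old_fp = vm.fp;
  struct toy_frame *new_fp = &frames[depth];

  if (!fixed_order)
    {
      vm.fp = new_fp;                  /* frame published too early... */
      seen = walk_frames (&vm);        /* ...GC runs here: link unset  */
      new_fp->dynamic_link = old_fp;
    }
  else
    {
      new_fp->dynamic_link = old_fp;   /* initialize the frame first,  */
      new_fp->return_address = NULL;
      seen = walk_frames (&vm);        /* (GC runs here: old chain OK) */
      vm.fp = new_fp;                  /* then publish it              */
    }
  return seen;
}
```

With three frames already on the chain, ‘simulate_gc_during_push (3, 0)’
(buggy order) reaches only 1 frame, while ‘simulate_gc_during_push (3, 1)’
(patched order) reaches all 3: in the fixed order the walker always sees
a complete chain, either the old one or the fully linked new one.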
The patch below fixes the problem for me. \o/
I’m thinking we could perhaps add a compiler barrier before the
‘vp->fp = new_fp’ statements, so that the compiler cannot sink the
dynamic-link stores below them, but in practice it’s not necessary here
(x86_64, GCC 7).
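If the barrier turns out to be needed, one portable way to spell it is
C11’s ‘atomic_signal_fence’, which GCC and Clang treat as a compiler-only
barrier (no CPU fence instruction is emitted); GCC’s
‘__asm__ __volatile__ ("" ::: "memory")’ is the traditional equivalent.
A signal fence seems a natural fit because the Boehm GC typically
suspends threads on GNU/Linux by sending them a signal, but that is an
assumption, not something tested here. ‘toy_push_frame’ and
‘TOY_COMPILER_BARRIER’ are hypothetical names, and the frame layout is a
toy, not Guile’s.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy frame, NOT Guile's layout. */
struct toy_frame
{
  struct toy_frame *dynamic_link;
  const void *return_address;
};

struct toy_vm
{
  struct toy_frame *fp;
};

/* Compiler-only barrier: the compiler may not move memory accesses
   across it, but no hardware fence is emitted. */
#define TOY_COMPILER_BARRIER() atomic_signal_fence (memory_order_seq_cst)

/* Push NEW_FP: fully initialize the frame, then publish it. */
void
toy_push_frame (struct toy_vm *vp, struct toy_frame *new_fp,
                const void *return_address)
{
  struct toy_frame *old_fp = vp->fp;

  new_fp->dynamic_link = old_fp;
  new_fp->return_address = return_address;

  /* Keep the two stores above from being sunk below the publication. */
  TOY_COMPILER_BARRIER ();

  vp->fp = new_fp;
}
```

The barrier costs nothing at run time; it only pins down the store order
that the patch already writes in source order.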
Thoughts?
I’d like to push this real soon. I’ll also do more testing on real
workloads from Guix, and then I’d like to release 2.2.4, hopefully
within a few days.
Thank you and thanks Andy for the discussions on IRC!
Ludo’, who’s going to party all night long. :-)
diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
index 1aa4e9699..19ff3e498 100644
--- a/libguile/vm-engine.c
+++ b/libguile/vm-engine.c
@@ -548,7 +548,7 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
VM_DEFINE_OP (1, call, "call", OP2 (X8_F24, X8_C24))
{
scm_t_uint32 proc, nlocals;
- union scm_vm_stack_element *old_fp;
+ union scm_vm_stack_element *old_fp, *new_fp;
UNPACK_24 (op, proc);
UNPACK_24 (ip[1], nlocals);
@@ -556,9 +556,10 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
PUSH_CONTINUATION_HOOK ();
old_fp = vp->fp;
- vp->fp = SCM_FRAME_SLOT (old_fp, proc - 1);
- SCM_FRAME_SET_DYNAMIC_LINK (vp->fp, old_fp);
- SCM_FRAME_SET_RETURN_ADDRESS (vp->fp, ip + 2);
+ new_fp = SCM_FRAME_SLOT (old_fp, proc - 1);
+ SCM_FRAME_SET_DYNAMIC_LINK (new_fp, old_fp);
+ SCM_FRAME_SET_RETURN_ADDRESS (new_fp, ip + 2);
+ vp->fp = new_fp;
RESET_FRAME (nlocals);
@@ -586,7 +587,7 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
{
scm_t_uint32 proc, nlocals;
scm_t_int32 label;
- union scm_vm_stack_element *old_fp;
+ union scm_vm_stack_element *old_fp, *new_fp;
UNPACK_24 (op, proc);
UNPACK_24 (ip[1], nlocals);
@@ -595,9 +596,10 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
PUSH_CONTINUATION_HOOK ();
old_fp = vp->fp;
- vp->fp = SCM_FRAME_SLOT (old_fp, proc - 1);
- SCM_FRAME_SET_DYNAMIC_LINK (vp->fp, old_fp);
- SCM_FRAME_SET_RETURN_ADDRESS (vp->fp, ip + 3);
+ new_fp = SCM_FRAME_SLOT (old_fp, proc - 1);
+ SCM_FRAME_SET_DYNAMIC_LINK (new_fp, old_fp);
+ SCM_FRAME_SET_RETURN_ADDRESS (new_fp, ip + 3);
+ vp->fp = new_fp;
RESET_FRAME (nlocals);
@@ -3893,7 +3895,7 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
NEXT (1);
{
- union scm_vm_stack_element *old_fp;
+ union scm_vm_stack_element *old_fp, *new_fp;
size_t old_frame_size = FRAME_LOCALS_COUNT ();
SCM proc = scm_i_async_pop (thread);
@@ -3907,9 +3909,10 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
handle-interrupts opcode to handle any additional
interrupts. */
old_fp = vp->fp;
- vp->fp = SCM_FRAME_SLOT (old_fp, old_frame_size + 1);
- SCM_FRAME_SET_DYNAMIC_LINK (vp->fp, old_fp);
- SCM_FRAME_SET_RETURN_ADDRESS (vp->fp, ip);
+ new_fp = SCM_FRAME_SLOT (old_fp, old_frame_size + 1);
+ SCM_FRAME_SET_DYNAMIC_LINK (new_fp, old_fp);
+ SCM_FRAME_SET_RETURN_ADDRESS (new_fp, ip);
+ vp->fp = new_fp;
SP_SET (0, proc);
diff --git a/libguile/vm.c b/libguile/vm.c
index c8ec6e1b2..7749159e5 100644
--- a/libguile/vm.c
+++ b/libguile/vm.c
@@ -1011,6 +1011,18 @@ scm_i_vm_mark_stack (struct scm_vm *vp, struct GC_ms_entry *mark_stack_ptr,
slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp), &cache);
}
+ size_t extra = 0;
+ for (; sp < vp->stack_top; sp++)
+ {
+ if (GC_is_heap_ptr (sp->as_ptr))
+ extra++;
+ }
+ if (extra)
+ {
+ printf ("%s extra: %zu\n", __func__, extra);
+ abort ();
+ }
+
return_unused_stack_to_os (vp);
return mark_stack_ptr;
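The vm.c hunk looks like a debugging aid rather than part of the fix:
after the frame walk, it counts the slots between where marking stopped
(‘sp’) and ‘vp->stack_top’ that ‘GC_is_heap_ptr’ says point into the GC
heap, and aborts if there are any, presumably because a complete walk
should leave no heap references behind. Below is a self-contained sketch
of the same check; ‘toy_heap’, ‘toy_is_heap_ptr’ and
‘count_unwalked_heap_refs’ are hypothetical stand-ins, with a static
array playing the role of the GC heap.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HEAP_WORDS 16

/* Hypothetical stand-in for the GC heap. */
static long toy_heap[TOY_HEAP_WORDS];

/* Hypothetical stand-in for GC_is_heap_ptr: nonzero if P points into
   toy_heap.  Compared as integers to avoid relational comparisons
   between pointers to unrelated objects. */
static int
toy_is_heap_ptr (const void *p)
{
  uintptr_t a = (uintptr_t) p;
  uintptr_t lo = (uintptr_t) &toy_heap[0];
  uintptr_t hi = (uintptr_t) &toy_heap[TOY_HEAP_WORDS];
  return a >= lo && a < hi;
}

/* Count slots in [FROM, TOP) that still look like heap references,
   i.e. slots the frame walk should have visited but did not. */
size_t
count_unwalked_heap_refs (void *const *from, void *const *top)
{
  size_t extra = 0;
  for (; from < top; from++)
    if (toy_is_heap_ptr (*from))
      extra++;
  return extra;
}
```

A nonzero result means the walk ended before reaching the top of the
stack, which is exactly the symptom of a frame published before its
dynamic link was set.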