bug-make
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug #64806] "invalid output sync mutex" on windows


From: Gergely Pinter
Subject: [bug #64806] "invalid output sync mutex" on windows
Date: Tue, 6 Feb 2024 13:52:23 -0500 (EST)

Follow-up Comment #29, bug#64806 (group make):

As proposed by Eli, I have investigated the process of mutex creation and
inheritance; please find below a brief summary regarding the implementation of
key functions in _posixos.c_ and _w32os.c_, a proposal for bringing the two
approaches closer to each other then finally eliminating the hang experienced
previously.

According to _os.h_, _osync_setup()_ is a function "called in the parent make
to set up output sync initially":
* In _posixos.c_ this function is implemented by opening a temporary file and
storing its descriptor and name in variables _osync_handle_ and
_osync_tmpfile_ respectively; the handle is configured for _not_ being
inherited.
* In _w32os.c_ this function is implemented by creating an unnamed mutex,
setting its handle to be _inherited_ and storing the handle in _osync_handle_
(since the mutex is unnamed, there is no variable equivalent to
_osync_tmpfile_).

According to _os.h_, _osync_get_mutex()_ is a function that "returns an
allocated buffer containing output sync info to pass to child instances, or
NULL if not needed":
* In _posixos.c_ this function returns a dynamically allocated copy of
_osync_tmpfile_ prefixed with _MUTEX_PREFIX_ ("fnm:"), i.e., a file name
prefixed with a known constant.
* In _w32os.c_ this function returns a dynamically allocated string that holds
_osync_handle_ printed in hexadecimal (note that apparently the _MUTEX_PREFIX_
macro is there in _w32os.c_, but not used).

According to _os.h_, _osync_parse_mutex()_ is a function "called in a child
instance to obtain info on the output sync mutex":
* In _posixos.c_ this function extracts the name of the temporary file
(created by _osync_setup()_ and name prefixed by _osync_get_mutex()_) and
opens it; note that this behavior does not fully conform to the description in
_os.h_, because here we not only "obtain info on" the mutex but actually _gain
access to it_ by opening the file.
* In _w32os.c_ this function extracts the numeric mutex handle (created by
_osync_setup()_ and written into string in hexadecimal by
_osync_get_mutex()_); since here we do not open any object just really extract
information about some inherited resource, this behavior seems to better match
the description in _os.h_.

According to _os.h_, _osync_clear()_ is a function to "clean up this
instance's output sync facilities":
* In _posixos.c_ this closes the handle; additionally if the actual process is
the topmost make process, also deletes the file.  Note that (unless we are in
the topmost make process), this function does not prevent children from using
the temporary file for synchronization since the file remains existing.
* In _w32os.c_ this function closes the handle of the mutex.  Note that,
closing the handle prevents its inheritance to children, furthermore, since we
only remembered the handle as a number (no name for the mutex), a child
process calling _osync_parse_mutex()_ will falsely consider the number
returned to be an inherited mutex handle; this is made even worse by the fact
that several pipes are opened upon child process creation thus the numeric
handle will likely be valid but refer to some pipe or file, not the mutex
(this is the explanation for apparent hang: child process tries to wait for a
"mutex" that is in reality some terminal stream).  This behavior is not a
problem, as long as no child process is created after calling _osync_clear()_
but triggered if a child process is started after _osync_clear()_ --
apparently this happens only under quite special circumstances, namely when a
Makefile includes another makefile (component.mk in my example) that triggers
inclusion of further generated makefiles (_.d_ files in my example).

Functions _osync_acquire()_ and _osync_release()_ are quite obvious:
* In _posixos.c osync_acquire()_ locks the temporary file, _osync_release()_
unlocks it.
* In _w32os.c osync_acquire()_ locks waits for the mutex
(_WaitForSingleObject()_), _osync_release()_ releases the mutex
(_ReleaseMutex()_).

Summary:
* _osync_setup()_ creates some object that processes can synchronize on (file
on POSIX, mutex on Windows).
* _osync_get_mutex()_ creates a string representation that can be used for
referring to the synchronization object and can be passed as command line
argument to child processes (prefixed file name on POSIX, handle number on
Windows).
* _osync_parse_mutex()_ extracts the information from the command line
argument and (i) on POSIX it additionally opens the temporary file while (ii)
on Windows the mutex is not opened but expected to be inherited.
* _osync_clear()_ closes the file or mutex; the POSIX implementation
explicitly deletes the temporary file in the topmost make process, while the
Windows implementation relies on the garbage collection upon exiting the last
process that had a handle on the unnamed mutex object.

The key difference between the two approaches is that _osync_parse_mutex()_ on
POSIX only expects that the file used for synchronization exists (i.e., has
been created by the parent make process) while on Windows it not only expects
that the mutex exists but also that the handle of that mutex is inherited --
this second assumption is broken if a child process is created after
_osync_clear()_.

> So I think the next step is to understand which call to osync_clear closes
the handle.  Maybe we shouldn't make that call, at least on Windows? 

Based on the observations above, I would not go in the direction proposed in
the most recent comment "[...] Maybe we shouldn't make that call, at least on
Windows? [...]".  I would say that this approach would further extend the
conceptual difference between POSIX and Windows implementations of the mutual
exclusion mechanism.  I would rather propose bringing them closer together by
not relying on inheritance of the mutex handle but -- similarly to named
temporary files on POSIX -- using a named mutex and instead of passing its
handle as a number to child processes (which will break if we close the handle
in _osync_clear()_), passing the _name_ of the mutex.

Please find below my proposals for implementation of mutex handling according
to this scheme:


/** Name prefix for alternative mutex usage (MUTEX_PREFIX was not used in the
original). */
#define MUTEX_ALT_PREFIX "make-mutex-"

/** Name of the mutex for alternative mutex usage. */
static char *osync_mutex_name = NULL;

/* Alternative mutex setup implementation: creates a mutex named
"make-mutex-<PID of the topmost make process>" */
void
osync_setup_alt()
{
  /* Construct name of the mutex by appending ID of the top-level make process
*/
  static const char * const format = MUTEX_ALT_PREFIX "-%" PRIu32;
  const uint32_t pid = (uint32_t) GetCurrentProcessId();
  const int name_len = 1 + snprintf(NULL, 0, format, pid);

  /* Remember name of the mutext */
  osync_mutex_name = xmalloc(name_len);
  snprintf(osync_mutex_name, name_len, format, pid);

  osync_handle = CreateMutex(NULL, FALSE, osync_mutex_name);
  if (0 == osync_handle) 
    {
      fprintf(stderr, "CreateMutex: error %lu\n", GetLastError());
      errno = ENOLCK;
    }
}



/* Alternative implementation for copying name of the mutext: copies
previously constructed name in allocated buffer */
char *
osync_get_mutex_alt()
{
  return osync_enabled() ? xstrdup(osync_mutex_name) : NULL;
}



/* Alternative implementation for osync_parse_mutex(): extract the name of the
mutex from command line argument and open it */
unsigned int
osync_parse_mutex_alt(const char *mutex)
{
  unsigned ret = 0;

  if (0 == strncmp(mutex, MUTEX_ALT_PREFIX, strlen(MUTEX_ALT_PREFIX)))
    {
      osync_handle = OpenMutex(MUTEX_ALL_ACCESS, FALSE, mutex);
      if (NULL != osync_handle)
        ret = 1;
      else
        fprintf(stderr, "Cannot open mutex: %s\n", mutex);
    }
  else
    {
      fprintf(stderr, "Ill-formed mutex name: %s\n", mutex);
    }

  return ret;
}


Having replaced _osync_setup()_, _osync_get_mutex()_ and _osync_parse_mutex()_
with these alternative implementations eliminated the hang experienced
previously.

Additionally, while analyzing the behavior of make regarding obtaining and
releasing mutexes, I experienced that _osync_parse_mutex()_ seems to be called
more than once.  Apparently _decode_output_sync_flags()_ does not only
"decode" the synchronization flags, but also calls _osync_parse_mutex()_ which
-- as highlighted above -- in contrary to its name, not only parses a string
but also opens a file on POSIX.  This was not a problem at least on Windows
until now, since _osync_parse_mutex()_ on Windows really only extracted a
number from a string (I suspect that the POSIX implementation wastes a few
file descriptors this way due to opening the same file several times, but have
not yet checked).  However when transitioning to the named mutex thus bringing
POSIX and Windows implementations closer, the mutex would be opened more then
once this way.  The reason for calling _osync_parse_mutex()_ in
_decode_output_sync_flags()_ is not clear for me, because
_osync_parse_mutex()_ function is called in _main()_ anyway.  Thus I would
propose removing the call to _osync_parse_mutex()_ from
_decode_output_sync_flags()_:


static void
decode_output_sync_flags (void)
{
#ifdef NO_OUTPUT_SYNC
  output_sync = OUTPUT_SYNC_NONE;
#else
  if (output_sync_option)
    {
      if (streq (output_sync_option, "none"))
        output_sync = OUTPUT_SYNC_NONE;
      else if (streq (output_sync_option, "line"))
        output_sync = OUTPUT_SYNC_LINE;
      else if (streq (output_sync_option, "target"))
        output_sync = OUTPUT_SYNC_TARGET;
      else if (streq (output_sync_option, "recurse"))
        output_sync = OUTPUT_SYNC_RECURSE;
      else
        OS (fatal, NILF,
            _("unknown output-sync type '%s'"), output_sync_option);
    }

#if 0 /* To be removed */
  if (sync_mutex)
    osync_parse_mutex (sync_mutex);
#endif /* To be removed */

#endif
}


With the above changes the hang experienced previously seems to have been
eliminated and the POSIX and Windows implementations are moved closed to each
other.  Unfortunately I am not aware of the reasons for having deviated from
the POSIX implementation originally (i.e., relying on inheritance instead of
opening a named synchronization object); this may be due to some performance
concerns which I am not familiar with.  Finally, _osync_parse_mutex()_ could
use some clarification: its name or at least the documentation comment should
reflect that here we not only do some string parsing but actually obtaining
resources.



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64806>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]