RE: make-3.79 on solaris8 broken


From: Howard Chu
Subject: RE: make-3.79 on solaris8 broken
Date: Mon, 19 Nov 2001 13:38:40 -0800

I've seen this kind of problem before in other programs, but usually only on
NFS-mounted filesystems. Generally on local UFS partitions the system calls
are atomic. It would be simpler if we could use sigaction() and set the
SA_RESTART flag for these signals, but the Solaris man pages don't mention
stat() as being one of the restartable system calls. (But I'd bet that it
is...)
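
For reference, here is a minimal sketch of the sigaction() approach, assuming a
SIGCHLD handler. The names on_sigchld and install_restarting_sigchld are
hypothetical and are not taken from the make sources; only sigaction() and
SA_RESTART are the real POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag and handler; they stand in for whatever make's
   real SIGCHLD handling does. */
static volatile sig_atomic_t children_pending = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    children_pending = 1;
}

/* Install the handler with SA_RESTART, asking the system to resume
   interrupted system calls that it treats as restartable instead of
   failing them with EINTR.  Returns 0 on success, -1 on failure. */
int install_restarting_sigchld(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

SA_RESTART only helps for calls the system actually restarts, so if stat() is
not among them on Solaris, explicit retry loops would still be needed.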

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: address@hidden [mailto:address@hidden] On Behalf Of Kevin Nomura
> Sent: Monday, November 19, 2001 1:07 PM
> To: address@hidden
> Subject: make-3.79 on solaris8 broken
>
>
> Using make-3.79 under solaris 6 and solaris 8, I have been seeing
> two intermittent problems.  It seems to get worse with higher values
> of -j.   One is "No rule to make target xxx" when there is, in fact,
> a rule to make target xxx.  As befits an intermittent problem, the
> make succeeds if rerun with no changes.
>
> The second problem is more insidious: make *quietly* fails to rebuild
> some of its targets that are out of date.  The symptom is link errors
> with unsat symbols owing to the incomplete build.  Again, rerunning
> make picks these up and succeeds.  Since this is a chronic problem for
> us I spent this past weekend debugging it with make -d and have some
> theories to offer.
>
> The first problem seems due to the stat() in remake.c not being protected
> by a retry loop for EINTR.  stat() on solaris is documented as failing
> with EINTR.  So, I fixed this, actually implementing the "safe_stat()"
> function that has a prototype in make.h but no definition (!?).  This
> cleared up the "No rule" errors but not the unsat link problems.
>
> For the second problem with failed links, the -d trace surrounding one of
> the files that should have been remade (but was not) looked like:
>
>         Considering target file `../netcache/server/obj/td/wccp2.o'.
>          Looking for an implicit rule for `../netcache/server/obj/td/wccp2.o'.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.r'.
> Got a SIGCHLD; 1 unreaped children.
> Got a SIGCHLD; 2 unreaped children.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.f'.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/wccp2.c'.
> Got a SIGCHLD; 3 unreaped children.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/wccp2.cpp'.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/wccp2.c'.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/wccp2.c'.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.c'.
>          Trying pattern rule with stem `wccp2'.
>          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.cc'.
>          Trying pattern rule with stem `wccp2'.
> ...
>          No implicit rule found for `../netcache/server/obj/td/wccp2.o'.
> ...
>         No commands for `../netcache/server/obj/td/wccp2.o' and no prerequisites actually changed.
>         No need to remake target `../netcache/server/obj/td/wccp2.o'.
>
> Seeing that a signal happened right about the time it was checking
> the prerequisite `../netcache/server/wccp2.c' (the source file, which
> does exist), I zeroed in on the readdir() in
> dir.c:dir_contents_file_exists_p().  Now, readdir() is not documented
> in solaris 6 or solaris 8 to fail on EINTR.  But I put in a retry loop
> anyway and CAUGHT readdir failing on EINTR, dozens of times in the
> build in fact.  So with stat() and readdir() (and opendir() and some
> others for good measure) guarded by retry loops, the problems have
> now subsided.
>
> So assuming these are in fact the causes of the problems I saw, I am
> wondering whether solaris is in error for returning EINTR (e.g. is this
> broken with respect to POSIX or some standard that Solaris claims
> adherence to)?  Should either or both of these be solved within make,
> at least as a practical issue?
>
> Kevin Nomura
> Network Appliance
>
> _______________________________________________
> Bug-make mailing list
> address@hidden
> http://mail.gnu.org/mailman/listinfo/bug-make
>