bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: a saner bootstrap script


From: Gary V. Vaughan
Subject: Re: a saner bootstrap script
Date: Mon, 26 Sep 2011 18:55:27 +0700

Hi Stefano!

[[Items I agree with elided]]

On 26 Sep 2011, at 17:27, Stefano Lattarini wrote:
> On Monday 26 September 2011, Gary V wrote:
>> On 25 Sep 2011, at 22:55, Stefano Lattarini wrote:
>>> On Thursday 22 September 2011, Gary V wrote:
>>>> Anyone:
>>>> 
>>>> It's beginning to look as though all this work is, once again, in very
>>>> real danger of just slipping quietly through the cracks.
>>>> 
>>> Hi Gary.  While I don't pretend to really understand the scope and purposes
>>> of your script, I might anyway throw in some few observations, ideas and
>>> nits; hopefully they will be helpful.
>> 
>> Thanks for the reviews, all feedback much appreciated :)  I've applied
>> upstream (the copy kept in Zile currently does double-duty as the master
>> copy while awaiting adoption into gnulib) wherever I didn't comment other-
>> wise below.
>> 
> Do you have a diff file to show what you've changed exactly?  I'd find that
> really helpful.

http://git.savannah.gnu.org/cgit/zile.git/diff/?id2=42498d861f5f5fbeb0a0b61d11c57a7553c855f1

>> Note also, there are many snippets of code donated from other scripts that
>> have been tested for years in other projects (like libtoolize) if at times
>> there is something that might look a little odd out of that context.
>> 
> I agree that keeping such snippets as unchanged as possible is a good idea.
> The real fix for the style inconsistencies they introduce is to add proper
> comments referencing the place from where any snipped had been copied.

Yes, I'm still kicking myself for not having kept track of that better.  Sorry.

>>>> # func_run_hooks FUNC_NAME [ARG]...
>>>> # ---------------------------------
>>>> # Run all hook functions registered to FUNC_NAME.
>>>> func_run_hooks ()
>>>> {
>>>>   $debug_cmd
>>>> 
>>>>   case " $hookable_funcs " in
>>>>     *" $1 "*) ;;
>>>>     *) func_fatal_error "Error: \`$1' does not support hook funcions.n" ;;
>>>>   esac
>>>> 
>>> Code duplication with 'func_add_hook' above, and with many other functions
>>> throughout the script.  Couldn't the logic to determine whether an item is
>>> already present in a space-separated list be factored out in its own
>>> subroutine?  E.g.,
>>> 
>>> # Usage: is_item_in_list ITEM [LIST ..]
>>> is_item_in_list ()
>>> {
>>>   item=$1; shift;
>>>   case " $* " in *" $item "*) return 0;; *) return 1;; esac
>>> }
>>> 
>>> Such a refactoring could as well be done in a follow-up patch, though
>>> (in fact, I'd personally prefer to do it that way).
>> 
>> I've avoided using return under the impression that it is not entirely
>> portable, although I would be happy to see evidence that there are no Bourne
>> shells with function support and yet with broken return support.
>> 
> Well, autoconf-generated configure script use "return", and no one has
> complained so far, so my guess is that their use is pretty safe.  And if
> you are going to worry about museum-piece shells that don't grasp `return',
> you should probably also worry about the fact that some of them won't
> preserve positional parameters after a call to a shell function; to quote
> the autoconf manual:
> 
>  Some ancient Bourne shell variants with function support did not reset
>  $1, $2, etc, upon function exit, so effectively the arguments of the
>  script were lost after the first function invocation. It is probably
>  not worth worrying about these shells any more.
> 
> And here <http://www.in-ulm.de/~mascheck/bourne/> (what an incredibly
> useful resource, BTW!) I read that shell functions (and the "return"
> built-in) had been introduced already in the SVR2 shell (1984) (although
> the positional parameters weren't local in that version yet; this
> limitation has been fixed only in the SVR3 shell (1986)).

I think perhaps I'm confusing my memory of slightly different things, and
on balance I've added another TODO item to factor in the use of return, and
return (0<=int<=127) as something that is extremely unlikely to be a problem
these days.

>>>>     # store returned options list back into positional
>>>>     # parameters for next `cmd' execution.
>>>>     eval set dummy "$func_run_hooks_result"; shift
>>>> 
>>> This (together with the `func_run_hooks_result' assignement above) will
>>> not proprly preserve spaces in positional arguments:
>>> 
>>> $ bash -c '(set "x  y"; a="$@"; eval set dummy "$a"; shift; echo "$@")'
>>> x y
>>> 
>>> I don't know whether this matters or not, but is something to consider.
>> 
>> I can't think of a situation where whitespace preservation would be
>> important in this case.
>> 
> OK, but I've now noticed that it gets worse.  Your implementation does in
> fact break up positional paramenters containing whitespace:
> 
> $ bash -c '(set "x y"; a="$@"; eval set dummy "$a"; shift; printf :%s:\\n 
> "$@")'
> :x:
> :y:
> 
> and worse again, expands wildcards in the positional paramenters:
> 
> $ cat /tmp && touch f1 f2 f3
> $ bash -c '(set "f*"; a="$@"; eval set dummy "$a"; shift; printf :%s:\\n 
> "$@")'
> :f1:
> :f2:
> :f3:
> 
> If these limitations are acceptable, you should state them explicitly in
> the comments, and could then simplify your code as follows:
> 
>  func_run_hooks_result=$*
>  ...
>  set dummy $func_run_hooks_result; shift

Good analysis, thanks.

I think, in practice, these limitations are not likely to cause problems with
whitespace delimited lists of legal hook function names.  However, if you can
suggest a better implementation that doesn't obfuscate the code any, I'd be
very happy to use something safer instead.

In the mean time, I'll use your simpler implementation, and add comments to
explain why the limitations are acceptable in this case.

>>>> # func_update_translations
>>>> # ------------------------
>>>> # Update package po files and translations.
>>>> func_hookable func_update_translations
>>>> func_update_translations ()
>>>> {
>>>>   $debug_cmd
>>>> 
>>>>   $opt_skip_po || {
>>>>     test -d po && {
>>>>       $require_package
>>>> 
>>>>       func_update_po_files po $package || exit $?
>>>>     }
>>>> 
>>>>     func_run_hooks func_update_translations
>>>>   }
>>>> }
>>>> 
>>> I personally find the code flow here quite unclear.  Something like
>>> this would be clearer IMHO:
>>> 
>>> $opt_skip_po && return
>>> if test -d po; then
>>>   $require_package
>>>   func_update_po_files po $package || exit $?
>>> fi
>>> func_run_hooks func_update_translations
>>> 
>>> The same goes for similar usages throughout the script.
>> 
>> I suppose it's just a matter of which idioms your eye have become accustomed
>> to.  Personally, I find the additional noise of 'if', '; then' and 'fi'
>> distracting, and have been happily using the braces idiom (when there's no
>> 'else' branch to consider) for many many years as a result (e.g. the shell
>> code in libtool).
>> 
>> If all that code churn and retesting is a prerequisite for acceptance into
>> gnulib,
>> 
> This is only for the gnulib maintainers to decide (I'm an "occasional
> small contributor" at the very most, so my opinion carries no weight in
> this regard).  What I can say is that the current style is very, *very*
> confusing to me, and the need to respect it would shy me away from the
> idea of contributing to the bootstrap script.

I'm not entirely opposed to changing it, but only if it is a show-stopper.
I can honestly say that I've never misspelled a closing '}', where I would
be quite rich if I had a penny for the number of times I'd typed endif or
some variant instead of 'fi', or forgotten the semicolon before 'then', or
forgotten to type 'then' entirely.

Once you start writing scripts in this style it's hard to go back to the
clumsy noisy 'if' '; then' 'fi' constructs.

Unless there are serious issues with a peculiarity of style, when accepting
contributions to my projects I tend to be sympathetic to the preferences of
the author and maintainer of a contributed piece of code... except of course
where it flies in the face of conventions set by the surrounding code.

Really, many stylistic conventions come down to programmer preferences,
although in common with many of the people who prefer conflicting conventions,
I can back up most of my choices with anti-examples of what can go wrong by
not following the conventions I've built up over the last quarter century of
writing shell scripts for the FSF :)

>> then I'll reluctantly go through and change the style to match...
>> but that will also prevent cross-patching with donor code in libtool and
>> various other scripts I have that share the current idioms.
>> 
> As I've said, occasional style inconsistencies are OK if they are meant to
> facilitate syncing with third-party projects (but I'd like to see big noisy
> comments about this fact, for every snippet of inconsistent code).

The main points of sharing are functions used between many of the scripts in
Libtool and to a lesser extent in M4, all of which I had the luxury of writing
in my preferred style since I was the lead maintainer and/or contributor to
those projects at the time.  I wouldn't dream of being arrogant enough to say
that the conventions I like and follow are any better than another set of
conflicting conventions such as those used in gnulib-tool... but I will say
that I'm very happy with mine, and that they have served me well in helping
to keep my shell-scripting productivity pretty high considering how fiddly and
annoying it can be to write clean elegant shell code, let alone portable shell
code! :-)

>>>> # require_gnulib_submodule
>>>> # ------------------------
>>>> # Ensure that there is a current gnulib submodule at `$gnulib_path'.
>>>> require_gnulib_submodule=func_require_gnulib_submodule
>>>> func_require_gnulib_submodule ()
>>>> {
>>>>   $debug_cmd
>>>> 
>>>> [SNIP]
>>>> 
>>>>     trap - 1 2 13 15
>>>> 
>>> Not portable to at least Solaris 10 /bin/sh; quoting the Autoconf manual:
>>> 
>>> Posix says that "trap - 1 2 13 15" resets the traps for the specified 
>>> signals to
>>> their default values, but many common shells (e.g., Solaris /bin/sh) 
>>> misinterpret
>>> this and attempt to execute a "command" named '-' when the specified 
>>> conditions
>>> arise. Posix 2008 also added a requirement to support "trap 1 2 13 15" to 
>>> reset
>>> traps, as this is supported by a larger set of shells, but there are still 
>>> shells
>>> like dash that mistakenly try to execute 1 instead of resetting the traps.
>>> Therefore, there is no portable workaround, except for "trap - 0", for which
>>> "trap '' 0" is a portable substitute.
>> 
>> You've lost me.  So, I should write:
>> 
>>  trap '' 0
>> 
>> instead of:
>> 
>>  trap - 1 2 13 15
>> 
>> ?
>> 
> No; the point is that you *can't* portably reset the signal handlers to their
> default.  I'm not offering a solution about this, since I don't know any.
> Maybe you could restructure the code to avoid having to deal with signal
> handlers altogether?  Or could you install proper handlers unconditionally?
> 
>> That seems like I might as well just miss out the signal resetting altogether
>> (effectively what trap '' 0 seems to do anyway), and let the gnulib cleanup 
>> function
>> fire impotently if those signals are trapped long after the window it would 
>> have been
>> useful.
>> 
> Could be a solution.  Or you might install a more general anf flexible
> exit trap earlier and unconditionally, and since you are at it, could
> make it user-extensible as well (isn't this the point of your rewrite
> of bootstrap? ;-)
> Of course, this could safely be done with a follow-up patch.

I suppose I must have been spared this particular oddity by carefully installing
bash and/or zsh on any solaris machines I have to spend any time with.

Another TODO item added to rethink signal handling in bootstrap first and 
foremost,
and if I ever get back to Libtool hacking, most likely in many places around 
there
too.  Thanks for bringing it to my attention.

>>>> # func_unset VAR
>>>> # --------------
>>>> # Portably unset VAR.
>>>> func_unset ()
>>>> {
>>>>   { eval $1=; unset $1; }
>>>> }
>>>> unset=func_unset
>>>> 
>>> What is the point of this function, if boostrap does not run under
>>> 'set -e'?  Using a simple 'unset' should be good enough, since even
>>> in shells where it returns a non-zero exit status for already-unset
>>> variables, it does not display any error message (at leat not that
>>> I know of).  Or are you just "erring on the side of safety", and
>>> deliberately?
>> 
>> It's boilerplate code that I have in all my scripts so that I can use
>> func_unset without having to stop and think, whether or not the script
>> is in set -e mode.
>> 
> Still, future users/maintainers of your script will have to stop and
> think about `func_unset', since they are not used to it, and its
> purpose might not be clear to them.  At the very list, you should
> add a more precise comment to the function definition, like this:
> 
>  Portably unset VAR.  In some shells, an `unset VAR' statement leaves
>  a non-zero return status if VAR is already unset, which might be
>  problematic if the statement is used at the end of a function (thus
>  poisoning its return value) or when `set -e' is active (causing even
>  a spurious abort of the script in this case).

Well, technically that would be $unset rather than func_unset, but I agree
and have added your comment text.

>>>> func_cmp_s ()
>>>> {
>>>>   $CMP "$@" >/dev/null 2>&1
>>>> }
>>>> func_grep_q ()
>>>> {
>>>>   $GREP "$@" >/dev/null 2>&1
>>>> 
>>> Why redirecting also the stderr of grep and cmp to /dev/null?  This will
>>> hide potential problems due to e.g., incorrect regular expressions or
>>> non-existent files ...
>> 
>> I could rename them to func_cmp_sshhh_dont_make_a_sound ;)
>> 
> I mean, is the silencing of such errors really wanted, or an unthought
> side effect?  Feature or bug?  If the former is true, then it should be
> explicitly stated in the functions' descriptions IMO.

True.  I've improved the comment to note that there really will be no output
whatsoever from these functions.

>> I inherited these problems from libtool, so I should go back and fix them
>> there too when I have time.
>> 
>>>> # func_show_eval CMD [FAIL_EXP]
>>>> # -----------------------------
>>>> # Unless opt_silent is true, then output CMD.  Then, if opt_dryrun is
>>>> # not true, evaluate CMD.  If the evaluation of CMD fails, and FAIL_EXP
>>>> # is given, then evaluate it.
>>>> func_show_eval ()
>>>> {
>>>>   $debug_cmd
>>>> 
>>>>   my_cmd="$1"
>>>>   my_fail_exp="${2-:}"
>>>> 
>>>>   ${opt_silent-false} || {
>>>>     func_quote_for_eval $my_cmd
>>>>     eval func_truncate_cmd $func_quote_for_eval_result
>>>>     func_echo "running: $func_truncate_cmd_result"
>>>>   }
>>>> 
>>>>   if ${opt_dry_run-false}; then :; else
>>>>     eval "$my_cmd"
>>>>     my_status=$?
>>>>     if test "$my_status" -eq 0;
>>>> 
>>> Useless use of quites around $my_status.    
>>> 
>>>>   then :; else
>>>>       eval "(exit $my_status); $my_fail_exp"
>>>>     fi
>>>>   fi
>>>> }
>>>> 
>> 
>> Actually, yuck for the mixed styles in that function.  Refactored the last
>> block to match the first:
>> 
>>    ${opt_dry_run-false} || {
>>      eval "$my_cmd"
>>      my_status=$?
>>      test 0 -eq $my_status || eval "(exit $my_status); $my_fail_exp"
>>    }
>> 
> Oh no!  Not these nested `||'!  They obfuscate a perfectly clear logic IMHO;
> couldn't you use this instead, please?
> 
>    ${opt_dry_run-false} || {
>      eval "$my_cmd"
>      my_status=$?
>      if test $my_status -ne 0; then
>         eval "(exit $my_status); $my_fail_exp"
>      fi
>    }

As much as you hate them that way, I find the opportunity to misspell a 'fi'
or forget the ';' before the 'then' equally painful.  Again though, it's just
a stylistic issue, and the apparent ugliness is just because of the difference
to what our eyes are used to seeing.

What could be clearer than 'try this' || 'if that failed, do this instead'? I
acknowledge that spending equal time programming C, Python, Lua, elisp and shell
has probably warped my sensibilities at least a little though.  Be thankful
that I never started to enjoy Perl, or the horrors would be so much worse for
us all!! :-)

>>>> # func_get_version APP
>>>> # --------------------
>>>> # echo the version number (if any) of APP, which is looked up along your
>>>> # PATH.
>>>> func_get_version ()
>>>> {
>>>>   $debug_cmd
>>>> 
>>>>   app=$1
>>>> 
>>>>   { $app --version || $app --version </dev/null; } >/dev/null 2>&1 \
>>>>     || return 1
>>>> 
>>> What's the point of the second `$app --version'?
>> 
>> To accomodate tools like git2cl, which error out with no stdin.
>> 
> Then why not a simpler:
>  $app --version </dev/null >/dev/null 2>&1
> instead?

Other applications (I think help2man was a culprit here) refuse to spit
out a version number when started with stdin pointing at /dev/null.

>>> Finally, a couple of more general observations:
>>> 
>>> 1. The bootstrap script is now complex enough to warrant the
>>>    introduction of a testsuite.
>> 
>> That's an excellent notion.  But after a year or more of prodding and
>> cajoling, I haven't even gotten the script itself accepted into gnulib.
>> I'm not ready to burn another month or two of my GNU hacking time yet,
>> at least until all the work I put into bootstrap itself is legitimized
>> by it's acceptance.
>> 
> Makes sense.  As long as you agree that adding a testsuite once your
> bootstrap rewrite has been accepted is something that should be done,
> I'm really fine.

Well, I won't promise to actually write the testsuite, but I do promise
that I would like to see it just as much as you, and will try to at
least put a framework in place for contributors to hang test cases on
not too long after bootstrap makes its way upstream.

>>> 2. IMHO, since bootstrap is a maintainer tool anyway, we should
>>>    assume that a POSIX shell is used to run it, thus allowing
>>>    various simplifications and optimizations.  If we want to be
>>>    helpful towards developer having an inferior default shell,
>>>    we could run a sanity check on the shell itself early in the
>>>    script, warning the user if his shell isn't able to handle
>>>    the required constructs and usages.
>> 
>> Because of the amount of boilerplate in this script (from my own code,
>> much of which I also share with libtool and m4) I'd rather not optimize
>> any of it for bootstraps particular use, since that reduces future
>> sharing opportunities.
>> 
> OK.  Still, I'm somewhat saddened by the fact that, when it comes to the
> shell, we are still coding as if we were in the late eighties :-(

You mean the late 90's, surely?  Before that, we still weren't comfortable
even using shell functions in portable shell scripts!!

Cheers,
-- 
Gary V. Vaughan (gary AT gnu DOT org)


reply via email to

[Prev in Thread] Current Thread [Next in Thread]