bug-bash

Re: "printf %d ''" should diagnose the empty string


From: Martin D Kealey
Subject: Re: "printf %d ''" should diagnose the empty string
Date: Fri, 22 Nov 2024 21:55:44 +1000

My main argument for the validity of empty==zero has nothing to do with
unset variables, and I'm fed up with people dismissing my position as
somehow invalid because "relying on unset variables is unsafe". I'm not
relying on them, so choose a more germane excuse if you're going to dismiss me.

For me, the most *logical* way to write zero is as the empty string, even
if that's not the *customary* way to write it. Feel free to disagree, but
be aware we're talking about opinions, not facts, and there are cogent
reasoned arguments to be made both ways.

On Fri, 22 Nov 2024 at 06:23, Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 2024-11-20 23:25, Martin D Kealey wrote:
> > 2. There exist deployed scripts that rely on the current behaviour.
> Any such scripts won't work on other shell implementations that do conform
> to POSIX here.
>

Who said POSIX? My entire point has been to explain why the *non*-POSIX
behaviour should not change.

If someone *asks* for POSIX mode, then I agree with you: fix any cases
where Bash does not conform to the spec, and be picky about details that
won't work on other POSIX shells; in short, act like "lint for POSIX sh".

> Where?


Who cares? They exist, they work, and we should not break them.
The fact that they're hard to find - and audit - is MORE reason not to
break them, not less. Even if it turns out that no scripts currently in use
actually rely on this feature, having to audit every script to be sure of
this would be an unreasonable impost.

If you mean "prove that they exist", then:

   1. I have numerous examples right here on my laptop, and also
   2. other places I can't tell you about because of NDAs and/or my feeble
   human memory; but
   3. scripts that I wrote 17+ years ago were still running at numerous
   clients when I last checked 7 years ago; I'm sure they will still be
   running at *some* clients, even though I'm no longer paid to support
   them.
   4. And since I often do a non-trivial amount of numerical calculation in
   my scripts, some significant portion of those will rely on this behaviour.

From time to time the hosts on which these run have their OS (typically
Red Hat or Ubuntu) updated, which can cause old scripts to find themselves
running on a newer version of Bash. There's no "release manager" to look
after these transitions; they're just expected to work.

Admittedly I'm a sample size of one, but this is one of those idioms that
is so obvious and so useful (when considering certain kinds of problems)
that I VERY much doubt I'm the only person ever to have made use of it.

I would guess perhaps 0.1% of shell script writers have used this idiom at
least once, possibly inadvertently, which means there could be tens or
hundreds of thousands of affected scripts, many of them doing their
work unattended, with no human who remembers they exist.

> > This behaviour is entirely consistent with strtol(arg,&end,0) where you
> > only check *end==0 and don't check end>arg.
>
> Yes, the behavior is entirely consistent with a common misuse of strtol.
> :-)
>

Smiley noted. Just saying, use or misuse is a subjective judgement.
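A minimal C sketch of the two strtol checks being contrasted, for illustration only: the helper names parse_permissive and parse_strict are hypothetical and not taken from Bash's source, and overflow (ERANGE) handling is omitted. When strtol converts nothing it sets end back to arg, so testing only *end == '\0' lets the empty string through as 0, while also requiring end > arg rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative helpers (hypothetical names), not Bash's implementation.
   Overflow (ERANGE) is deliberately not checked. */

/* Permissive: only require that strtol stopped at the terminating NUL.
   For "", strtol converts nothing, returns 0 and sets end = arg, which
   already points at '\0', so the empty string is accepted as 0. */
static bool parse_permissive(const char *arg, long *out)
{
    assert(arg != NULL && out != NULL);
    char *end;
    *out = strtol(arg, &end, 0);   /* base 0: honours 0x (hex) and leading-0 (octal) */
    return *end == '\0';
}

/* Strict: additionally require that at least one character was
   consumed (end > arg), so "" is rejected. */
static bool parse_strict(const char *arg, long *out)
{
    assert(arg != NULL && out != NULL);
    char *end;
    *out = strtol(arg, &end, 0);
    return end > arg && *end == '\0';
}
```

With these, parse_permissive("", &v) succeeds with v == 0 whereas parse_strict("", &v) fails; both accept "0x1f" (as 31) and both reject "12abc".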

> > At some point in the past it was deemed appropriate to add a check
> > tantamount to “if (end==arg && posix_mode) fail();”.
>

I was making this statement on the basis of an earlier comment which I
thought implied that Bash in POSIX mode already provided the enforcement.
Now that I'm at a PC and can check the actual behaviour, I see that's not
the case. Please consider this comment withdrawn.

That said, I would like to propose that any enforcement of "zero must have
at least one digit" should be gated on some equivalent of

  shopt -qo posix || shopt -q noemptyzero
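One way such a gate might look in C, as a sketch under stated assumptions: struct int_opts, its posix_mode and noemptyzero fields, and convert_int are hypothetical names, and noemptyzero is only the option proposed above, not an existing Bash setting. The empty-string rejection applies only when either flag is set; otherwise "" converts to 0, matching the current behaviour discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical option set: posix_mode mirrors `shopt -qo posix`;
   noemptyzero is the option proposed in this message, not a real one. */
struct int_opts {
    bool posix_mode;
    bool noemptyzero;
};

/* Convert ARG to a long (overflow is not checked in this sketch).
   Trailing non-numeric characters are always rejected; the empty
   string is rejected only when either gate flag is set, otherwise
   it converts to 0. */
static bool convert_int(const char *arg, const struct int_opts *opts,
                        long *out)
{
    assert(arg != NULL && opts != NULL && out != NULL);
    char *end;
    *out = strtol(arg, &end, 0);
    if (*end != '\0')
        return false;                     /* trailing garbage */
    if (end == arg && (opts->posix_mode || opts->noemptyzero))
        return false;                     /* "" under the gate */
    return true;
}
```

Leaving both flags false keeps existing scripts working, while POSIX mode or a script that opts in gets the stricter diagnosis.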

-Martin

PS: Why do I think zero is most logically expressed as an empty string?

For every natural number N, the number of digits needed to write 10**N is
one more than the number of digits needed to write 10**N-1, except at
N=0: having to write "0" (rather than use the empty string) to represent
zero is an ugly wart that breaks this otherwise elegant rule. This
observation also provides a simple way to calculate floor(log10(X)), as the
number of digits in X minus one, and in that case returning -1 when given 0
acts as a useful error indicator, as well as being directly useful in some
calculations. Working in the reverse direction, the empty string as
representation of zero is what you get by stripping all leading 0's.

The only reason we *need* to write "0" in normal writing is because
handwritten words are delimited by random whitespace so it's impossible to
discern the presence of an empty string. But in most programming languages
strings *always* have delimiters, so there's no technical reason why a zero
inside a string needs *any* digits at all. Having a rule requiring at least
one digit is a sop to the exigencies of handwriting.

Of course I'm not saying that we should *never* write zero as "0" - that's
clearly necessary when it's unquoted - but *requiring* it to be *always*
written that way hinders rather than helps.

If you never use logarithms then I guess you're unlikely to find my
arguments compelling.
But I *do* find them useful, so please don't break my tools.

