From: Christoph Anton Mitterer
Subject: Re: xargs: doesn't fail on missing terminator (which may have severe consequences)
Date: Sat, 14 Dec 2024 19:55:33 +0100
User-agent: Evolution 3.54.2-1

Hey.
On Fri, 2024-12-13 at 23:05 +0100, Bernhard Voelker wrote:
> I never saw a practical example why it would be dangerous.
Well, it seems to me that in this case even a one-in-a-million chance
might have consequences too catastrophic to wait for it to happen in
the wild.
Again, consider the find ... | xargs rm -rf example, in which a
"line" is truncated to an incomplete "/".
> Usually a data producer is buffered, and therefore atomically
> outputting entries in a consistent way.
It may well be unbuffered, for example when someone uses stdbuf with
mode 0 to execute some utility which internally calls xargs.
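
To make the truncation scenario concrete, here is a minimal sketch of
what happens when a producer stops in the middle of an item. It does not
run xargs or rm; build_argv is a hypothetical helper that only models a
reader treating NUL as a separator, and the paths are made up for
illustration.

```python
def build_argv(command: list[str], data: bytes) -> list[str]:
    """Build the argument vector a NUL-separator reader would run.

    Assumption: every chunk between NULs is an item, including a final
    chunk that has no NUL after it.
    """
    items = data.split(b"\0")
    if items[-1] == b"":
        items.pop()  # input ended in NUL, so there is no trailing chunk
    return command + [item.decode() for item in items]


# Hypothetical paths, not taken from the thread.
complete = b"/var/tmp/job/cache\0/var/tmp/job/build\0"
# The producer stops after the first byte of the second path.
truncated = b"/var/tmp/job/cache\0/"

print(build_argv(["rm", "-rf"], complete))
print(build_argv(["rm", "-rf"], truncated))
```

With the truncated input the last argument is "/" instead of the
intended path, which is exactly the case described above.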
> Is there proof from the wild that there was data loss?
Does one need proof to argue that a problem that is clearly there and
might cause severe problems should be fixed, even if it were extremely
unlikely to happen?
> Second, my main point, is that I believe that there is confusion
> about what -0, --null stands for.
> The usage output clarifies:
>
>   -0, --null    items are separated by a null, not whitespace;
>                 disables quote and backslash processing and
>                 logical EOF processing
>
> The crucial word is "separate" which means it is something in between
> 2 entries:
> entry1 <separator> entry2
> It is and was never a "terminator", i.e., something acknowledging
> that the previous entry is committed.
> entry1 <terminator>
> Consequently, the logical EOF processing is not necessary and
> therefore disabled, as stated above.
Well, the POSIX 2018 edition had no -0 option, and the 2024 edition
uses the word "delimit", not "separate".
Though I'd also argue that "delimit" is more like "separate" and not
like "terminate".
However, as written in my previous mail, the current POSIX 2024 also
strongly recommends to "ignore" any lines that are not NUL-terminated
("xargs should ignore the trailing non-null bytes (as this can signal
incomplete data)") and says that in the future this may become a MUST.
And the Austin Group issue I mentioned in the previous mail already
makes clear that Technical Corrigendum 1 for POSIX 2024 will change
the "ignore" to an "error in case of".
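
The three behaviours under discussion (accept the trailing bytes as an
item, ignore them, or treat them as an error) can be compared side by
side. This is only a sketch of the semantics as described in this
thread, not code from findutils or wording from the standard; the names
Trailing, UnterminatedInput and parse_items are made up for the example.

```python
from enum import Enum


class Trailing(Enum):
    ACCEPT = "accept"  # NUL as separator: trailing bytes become an item
    IGNORE = "ignore"  # drop trailing bytes that lack a NUL
    ERROR = "error"    # refuse input whose last item lacks a NUL


class UnterminatedInput(ValueError):
    """Raised when the input does not end with a NUL (ERROR policy)."""


def parse_items(data: bytes, policy: Trailing) -> list[bytes]:
    # Everything before the last NUL is a complete item; "rest" is
    # whatever follows the last NUL (empty if the input is well-formed).
    *items, rest = data.split(b"\0")
    if rest:
        if policy is Trailing.ACCEPT:
            items.append(rest)
        elif policy is Trailing.ERROR:
            raise UnterminatedInput(
                f"{len(rest)} trailing byte(s) without NUL"
            )
        # Trailing.IGNORE: fall through and drop "rest"
    return items


data = b"dir/a\0dir/b\0dir"  # hypothetical input, last item cut short
for policy in (Trailing.ACCEPT, Trailing.IGNORE):
    print(policy.value, parse_items(data, policy))
```

Note that this sketch reads the whole input before deciding. A streaming
tool may already have run the command on earlier batches by the time it
sees the unterminated tail, so an error at that point can only stop the
last, incomplete item from being used.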
The xargs manpage (and info page) even says - contrary to the program
usage:
> -0, --null
> Input items are terminated by a null character instead of by
> whitespace, and the quotes and backslash are not special
> (every character is taken literally).
*terminated*, not *separated*
And for text files (i.e. without -0) it should in principle always
have been clear that there must be a final \n.
> Third, a change like this one seems a tough one, because tons of
> scripts and users rely on existing behavior.
Indeed.
But at least I wouldn't want to explain to someone who lost all his
data that this happened "by design".
One could also argue that, given the contradictory usage output and
manpage documentation, no one could ever really have relied on the
current behaviour, and that it was simply a bug.
> Finally, xargs(1) is not alone: there are several tools in the same
> boat which have an option to treat input separated/terminated by
> '\0', and which usually accept regular newline or
> whitespace-separated input.
> The latter usually mandates to have a terminating newline at least,
> because POSIX says that text files have to end on a newline;
> otherwise they'd be treated as binary files.
> How about those?
Which tools are you thinking about?
When I think about grep, for example, then of course, if the input is
incomplete, grep's output could be wrong too, and that in turn could
again lead to very bad consequences.
But the difference to xargs is that grep itself does nothing, and
whoever called the foo | grep pipe could still examine whether foo
succeeded. Even before -o pipefail became part of POSIX 2024, this was
in principle already possible in a portable way with a hacky construct
of redirects.
For xargs, checking the exit status of foo afterwards would already be
too late, because the commands have been executed by then.
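
One way to see the difference is to defer the action until the producer
has finished and its exit status is known. The following is a rough
sketch of that idea, not a replacement for xargs: collect_then_act is a
hypothetical helper, the two producers are stand-ins started via the
Python interpreter, and buffering the whole output gives up the
streaming behaviour xargs has.

```python
import subprocess
import sys


def collect_then_act(producer_cmd, act):
    """Run the producer to completion; call act(items) only on success."""
    result = subprocess.run(producer_cmd, stdout=subprocess.PIPE)
    if result.returncode != 0:
        raise RuntimeError(
            f"producer failed with status {result.returncode}; "
            "nothing was acted on"
        )
    # Accept only NUL-terminated items; anything after the last NUL
    # is treated as incomplete data.
    *items, rest = result.stdout.split(b"\0")
    if rest:
        raise ValueError(f"unterminated trailing item: {rest!r}")
    act(items)
    return items


# Stand-in producers (assumptions for the demo, not real tools).
good = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(b'one\\x00two\\x00')"]
failing = [sys.executable, "-c",
           "import sys; sys.stdout.buffer.write(b'one\\x00tw'); sys.exit(1)"]

collect_then_act(good, print)
try:
    collect_then_act(failing, print)
except RuntimeError as err:
    print("skipped:", err)
```

Here the failing producer's partial output is never handed to the
action, whereas a streaming consumer would already have used it.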
> After all, at least #3 (known behavior) can strike back quite hard.
> Therefore I suggest thinking well through all the possible cases,
> and their pros and cons.
Definitely. Which is why this should be made a bug to track the various
opinions.
Perhaps one could also announce that this is being considered in the
next release of findutils, and ask for input from the community.
Another idea would be to leave the behaviour undefined at the POSIX
level, and (also there) introduce yet another option, which enforces
that a non-(NUL/LF)-terminated "line" is ignored.
That would have the benefit that every implementation could stay
backwards-compatible, but still allow people to go the safe way. The
only downside, of course, is that one doesn't get the safe behaviour
out of the box.
Cheers,
Chris.