bug-findutils

Re: xargs: doesn't fail on missing terminator (which may have severe con


From: Bernhard Voelker
Subject: Re: xargs: doesn't fail on missing terminator (which may have severe consequences)
Date: Sun, 15 Dec 2024 20:47:12 +0100
User-agent: Mozilla Thunderbird

Hi Chris,

On 12/14/24 19:55, Christoph Anton Mitterer wrote:
> On Fri, 2024-12-13 at 23:05 +0100, Bernhard Voelker wrote:
> > I never saw a practical example why it would be dangerous.
>
> Well it seems to me, that in that case even a 1 in Million chance might
> have too catastrophic consequences to wait for it happening in the
> wild.
> Again, consider the  find ... | xargs rm -rf  example, in which a
> "line" is truncated to an incomplete "/".

I'm interested in a real-life example, because a theoretical danger is
... theoretical.

WRT `rm -rf /` - this has become a famous one; I even have a t-shirt with it.
Clearly POSIX states that rm(1) shall not recursively delete "/", and the
GNU coreutils implementation - and probably all others as well - adheres to
that, and returns with an error diagnostic.

Obviously the situation dramatically changes with `rm -rf /opt` instead of
"/opticalmeasurementsresults", that is clear!
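
For illustration, the truncation scenario can be simulated harmlessly by letting xargs run `echo` instead of `rm`. This is only a sketch: the path is the made-up one from above, and `head -c 4` stands in (as an assumption) for a producer that is cut off after writing the first four bytes.

```shell
#!/bin/sh
# Hypothetical path; nothing is deleted because xargs only runs `echo`.
full='/opticalmeasurementsresults'

# head -c 4 simulates a producer that is cut off after "/opt",
# so the terminating NUL never reaches xargs.
truncated_cmd=$(printf '%s\0' "$full" | head -c 4 | xargs -0 echo rm -rf)

echo "$truncated_cmd"
```

With the current GNU xargs behavior the truncated item is passed on as if it were complete, so the command line that would have been run is `rm -rf /opt`.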

> > Usually a data producer is buffered, and therefore atomically
> > outputting entries in a consistent way.
>
> It may very well not be buffered, for example when someone uses stdbuf
> with mode 0 to execute some utility which internally calls xargs.

Ideally, - if the input is file names at all(!) - the producer should
atomically write(2) them into the pipe instead of assembling a file name
in several pieces.

> > Is there proof from the wild that there was data loss?
>
> Does one need proof to argue that a problem that is clearly there and
> might cause severe problems should be fixed, even if it were extremely
> unlikely to happen?

yes, please.
Because the change would cause a change in behavior which users rely on,
and which may also result in data loss.  E.g. having a backup script
passing the entries to `xargs -0`.

  printf "data-A\0data-B" | xargs -0 mybackup --to=/some/where

With the requested change, "data-B" would no longer go into the backup
(well, maybe with an error diagnostic that no one will ever see, because the
script has worked for many years).  I'd call that data loss as well.
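
The behavior such a script relies on can be checked with a small sketch. Here `printf 'item: %s\n'` merely stands in for the hypothetical `mybackup` command, and `-n 1` is added only to make each item visible on its own line.

```shell
#!/bin/sh
# The last item ("data-B") has no terminating NUL.
# printf stands in for the hypothetical mybackup command.
items=$(printf 'data-A\0data-B' | xargs -0 -n 1 printf 'item: %s\n')

echo "$items"
```

With the current behavior both `data-A` and `data-B` are passed to the command; under the requested change the second one would be rejected.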

It's easy to find good or bad examples for either behavior.

Therefore, I see 4 things to consider (no order):

a) Existing behavior:

The current behavior is good in the sense that it doesn't matter if there's
a final '\0' or not.  This is "nice" for the user as it follows the principle
of least surprise.

Changing a behavior which exists "since the beginning", i.e., since the oldest
GNU findutils commit in 1996-02-04, will for sure surprise users.

There are several changes in tools here and there which many users later
object to.  And discussing with upset users is not easy; one needs very
good and reasonable examples ... upfront.
E.g. the change in ls(1) to use proper shell-quoting: there the avoided security
risk and convenience when copy/paste-ing ls(1) output outweigh the optical
inconvenience by far.  Still that was quite a long way.

b) Compatibility with other xargs(1) implementations.

If other xargs(1) implementations diverged from GNU xargs and already
followed the requested behavior, then compatibility with those implementations
might serve as a good argument for a change.
How do other implementations behave?

c) Specification by POSIX.

GNU tries to follow POSIX as much as possible, unless there is a great
benefit for the user, or security considerations.

The current Base Specification Issue 8 documents and allows the behavior.

d) Common behavior in other tools.

If the same pattern is available for the user in N tools, then xargs(1)
should not behave differently (without a good reason).

See below.

> However, as written in my previous mail, the current POSIX 2024 also
> strongly recommends to "ignore" any lines that are not NUL terminated
> ("xargs should ignore the trailing non-null bytes (as this can signal
> incomplete data)") and says that in the future this may become a MUST.
> And the Austin Group issue I've mentioned in the previous mail already
> makes clear, that the technical corrigendum 1 for POSIX 2024 will
> change the "ignore" to an "error in case of".

An error diagnostic and therefore also a non-Zero exit code would be
necessary, indeed, because silently ignoring would simply be wrong.
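
Users who want the stricter behavior today could approximate it in their own scripts. The following is only a sketch under stated assumptions: `strict_xargs0` is a hypothetical helper, not part of findutils, and it buffers the whole input in a temporary file, so the streaming nature of xargs is lost.

```shell
#!/bin/sh
# strict_xargs0: hypothetical wrapper, not part of findutils.
# Fails with a diagnostic and a non-zero status if non-empty input
# does not end with a NUL byte; otherwise runs `xargs -0 "$@"`.
strict_xargs0() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # buffer all of stdin
    if [ -s "$tmp" ]; then
        # Hex value of the last input byte, e.g. "00" for NUL.
        last=$(tail -c 1 "$tmp" | od -An -tx1 | tr -d ' \n')
        if [ "$last" != "00" ]; then
            echo "strict_xargs0: input does not end with NUL" >&2
            rm -f "$tmp"
            return 1
        fi
    fi
    xargs -0 "$@" < "$tmp"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Usage would look like `find . -print0 | strict_xargs0 some-command`, where `some-command` is a placeholder.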

> The xargs manpage (and info page) even says - contrary to the program
> usage:
> -0, --null
>         Input items are terminated by a null character instead of by
>         whitespace, and the quotes and backslash are not special
>         (every character is taken literally).
>
> *terminated*, not *separated*

right, that's a documentation bug.

> > Finally, xargs(1) is not alone: there are several tools in the same boat which
> > have an option to treat input separated/terminated by '\0', and which usually
> > accept regular newline or whitespace-separated input.
> > The latter usually mandates to have a terminating newline at least, because POSIX
> > says that text files have to end on a newline; otherwise they'd be treated
> > as binary files.
> > How about those?
>
> Which tools are you thinking about?

`xargs -0` is similar to many tools allowing Zero-delimited input.
And often the last entry of input does not have to be Zero-terminated.
Here are some examples from the GNU coreutils.

* `sort -z`:

Handles a missing terminating '\0' as gracefully as xargs,
and even adds a '\0' on output:

  $ printf 'hello\0world\0' | sort -z | od -tx1z
  0000000 68 65 6c 6c 6f 00 77 6f 72 6c 64 00              >hello.world.<
  0000014

  $ printf 'hello\0world' | sort -z | od -tx1z
  0000000 68 65 6c 6c 6f 00 77 6f 72 6c 64 00              >hello.world.<
  0000014

* `uniq -z`:

Like sort(1) above:

  $ printf 'hello\0world' | uniq -z | od -tx1z
  0000000 68 65 6c 6c 6f 00 77 6f 72 6c 64 00              >hello.world.<
  0000014

  $ printf 'hello\0world\0' | uniq -z | od -tx1z
  0000000 68 65 6c 6c 6f 00 77 6f 72 6c 64 00              >hello.world.<
  0000014

  $ printf 'hello\0hello\0' | uniq -z | od -tx1z
  0000000 68 65 6c 6c 6f 00                                >hello.<
  0000006

  $ printf 'hello\0hello' | uniq -z | od -tx1z
  0000000 68 65 6c 6c 6f 00                                >hello.<
  0000006

* `cut -z`:

Like sort(1) above:

  $ printf 'hello X\0world Y\0' | cut -zd' ' -f1 | od -tx1z
  0000000 68 65 6c 6c 6f 00 77 6f 72 6c 64 00              >hello.world.<
  0000014

  $ printf 'hello X\0world Y' | cut -zd' ' -f1 | od -tx1z
  0000000 68 65 6c 6c 6f 00 77 6f 72 6c 64 00              >hello.world.<
  0000014

* `tail -z`:

Handles a missing terminating '\0' as gracefully as xargs,
but does not add a final '\0' to the output like sort(1) does:

  $ printf 'hello\0world\0' | tail -z -n 1 | od -tx1z
  0000000 77 6f 72 6c 64 00                                >world.<
  0000006

  $ printf 'hello\0world' | tail -z -n 1 | od -tx1z
  0000000 77 6f 72 6c 64                                   >world<
  0000005

The above shows that the behavior in tools is diverse.
I don't know any tool off the top of my head which would
reject non-Zero-terminated input.
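
The experiments above can be condensed into a small loop. This is a sketch only: `last_byte_hex` is a hypothetical helper, and the `-z` options assume GNU coreutils.

```shell
#!/bin/sh
# last_byte_hex: hypothetical helper. Feeds input whose final item lacks
# the terminating NUL to the given filter and prints the last output
# byte in hex ("00" means the filter emitted a final NUL).
last_byte_hex() {
    printf 'hello\0world' | "$@" | tail -c 1 | od -An -tx1 | tr -d ' \n'
}

for cmd in 'sort -z' 'uniq -z' 'tail -z -n 1'; do
    # Word splitting of $cmd is intentional here.
    printf '%-14s last byte: %s\n' "$cmd" "$(last_byte_hex $cmd)"
done
```

A result of `00` means the tool added the missing terminator on output; any other value means it passed the unterminated item through unchanged.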

If this is something to be addressed for security reasons wrt/ a
pitfall with pipes, then IMO this needs to be done on POSIX level
... for all tools.

I don't categorically object to the change; all I would like to say
at this point is that we need all information for a) - d) on the table,
and that there's more than xargs(1) which might have to be considered.

Summary of pros - i.e., to change current behavior:
- a vague "might change" in POSIX's "future" section,
- a hand-crafted reproducer for an effect which has not been seen
  in the wild in at least 28 years.

Summary of cons - i.e., to keep current behavior:
a) user convenience, compatibility of existing scripts,
c) it currently conforms to POSIX Issue 8, and
d) there is greater precedent for gracefully processing
   non-Zero-terminated input in several other tools.

Neutral:
b) no information about the behavior of other xargs(1) implementations


So currently I'm 10:90 for changing this.
Please convince me further.

Have a nice day,
Berny



