bug-bash

Re: readarray leaves a NULL char embedded in each element


From: Greg Wooledge
Subject: Re: readarray leaves a NULL char embedded in each element
Date: Mon, 24 Jun 2024 14:06:39 -0400

On Mon, Jun 24, 2024 at 10:50:15 -0600, Rob Gardner wrote:
> Description:
>         When using space or newline as a delimiter with readarray -d,
>         elements in the array have the delimiter replaced with NULL,
>         which is left embedded in each element of the array.

This isn't possible.  Bash doesn't allow the storing of NUL bytes in
variables, and further, Unix/Linux doesn't permit passing NUL bytes as
command-line arguments to programs.

>         This causes incorrect behavior when using array elements as
>         arguments to sub-processes.

(Bash cannot pass a NUL byte as an argument.)

>         I first noticed the problem when trying to use an array element as
>         part of an argument to sed:
>                 readarray -d ' ' x << "A B"
>                 sed -e s/X/${x[0]}/

First point, your readarray command is using the wrong redirection
operator.  I'm fairly sure you meant to write <<< instead of <<.  Using
the here-string operator <<<, we can see that the first array element
retains the space delimiter (because -t was not used), and the second
retains the newline character, which is added by <<<.

hobbit:~$ readarray -d ' ' x <<< "A B"
hobbit:~$ declare -p x
declare -a x=([0]="A " [1]=$'B\n')
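
For comparison, here is a small sketch of the same split with and
without -t.  The array names "kept" and "trimmed" are made up for this
example, and it assumes a bash new enough to support readarray -d (4.4
or later).

```shell
#!/usr/bin/env bash
# Split the same input twice; only the -t call strips the delimiter.
readarray -d ' '  kept    <<< "A B"   # space stays on the first element
readarray -td ' ' trimmed <<< "A B"   # space removed from the first element

# The last element keeps its newline in both cases, because the newline
# added by <<< is not the delimiter here.
declare -p kept trimmed
```

Note that -t only removes the delimiter passed to -d, so the trailing
newline from <<< has to be dealt with separately if it matters.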

Second point, your sed command is not using quotes.

>         This caused sed to complain "unterminated `s' command".

The space at the end of x[0] causes word splitting to occur, due to the
lack of quotes. The s/X/A part becomes one argument, and the / part
becomes a second argument.
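
The splitting can be made visible without involving sed at all.  This is
only a sketch: printf stands in for sed so that each argument word shows
up in its own pair of angle brackets, and it assumes the default IFS.

```shell
#!/usr/bin/env bash
readarray -d ' ' x <<< "A B"

# Unquoted: the trailing space in x[0] splits the expression into two words.
printf '<%s> ' s/X/${x[0]}/ ; echo

# Quoted: the expression stays one word (it still contains the space).
printf '<%s> ' "s/X/${x[0]}/" ; echo
```

The first line of output shows two words and the second shows one.
Adding -t to the readarray call would also keep the space out of the
replacement text.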

>         Using "read -a" instead of readarray produces correct results.

That one uses IFS to separate and trim the input fields.  The default
IFS contains a space, so none of the array elements contains a space.
Therefore, your lack of quoting probably doesn't cause any additional
word splitting.
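
A short sketch of what read -a stores, assuming the default IFS.  The
array names "fields" and "first" are just for illustration.

```shell
#!/usr/bin/env bash
# Without -d, read consumes the whole line and splits it on IFS.
read -a fields <<< "A B C"

# With -d ' ', read stops at the first space, so only "A" is read.
read -d ' ' -a first <<< "A B C"

# Neither array has an element containing a space.
declare -p fields first
```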

>         With a simple C program to print out the characters in argv[1], one
>         can see that a NULL character is left in the argument. Program:
> #include <stdio.h>
> #include <string.h>
> void main(int argc, char *argv[])
> {
>         int i, n;
>         if (argc > 1) {
>                 n = strlen(argv[1]);
>                 for (i=0; i<n+2; i++) printf("%d ", argv[1][i]);
>         }
> }

As far as I can tell, this C program prints the bytes of argv[1] as
decimal numbers.  Passing a char to printf with %d is fine, because
char arguments are promoted to int in a variadic call.  The problem is
the loop bound: strlen() does not count the terminating NUL, so i<n+2
prints the whole string, then the NUL terminator, then one byte from
beyond the end of the string.

> $ readarray -d ' ' X <<< "A B C"
> $ read -d ' ' -a   Y <<< "A B C"
> $ readarray -td ' ' Z <<< "A B C"
> $ ./printarg ${X[0]}A
> 65 0 65 $

In this command, ${X[0]} is a capital A plus a space character.  You're
not using quotes, so ${X[0]}A becomes the two argument words "A" and "A".

hobbit:~$ readarray -d ' ' X <<< "A B C"
hobbit:~$ declare -p X
declare -a X=([0]="A " [1]="B " [2]=$'C\n')
hobbit:~$ printf '<%s> ' ${X[0]}A ; echo
<A> <A> 

Your C program appears to look only at the first argument word, "A",
and ignores the second word.  It takes strlen("A"), which is 1, and
adds 2 to it, getting 3.  Thus, it loops 3 times, and thus, we see
the three numbers it writes to stdout.

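The argument words are stored internally as NUL-terminated strings, so
it's no surprise that the second loop iteration prints a 0.  The
third loop iteration reads one byte from beyond the end of the argument
string.  Nothing guarantees what that byte is, but on Linux the argument
strings are typically laid out back to back in memory, so the 65 is
most likely the "A" of the second argument word.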

> $ ./printarg ${Y[0]}A
> 65 65 0 83 $

Here, Y[0] contains "A", so you're passing "AA" as your sole argument.
The argument's string length is 2, so you're looping 4 times.  The
numbers 65 65 0 are from the internal storage of the argument words, and
the 83 is garbage from beyond the end of the string.

> $ ./printarg ${Z[0]}A
> 65 65 0 83 $

Here, Z[0] is "A" instead of "A ", because you used -t to trim the space.
So you're passing "AA" as your argument, just like the previous call.

So, in a nutshell, this is what I believe you need to see:

 1) readarray without -t retains the delimiter, even if it's a space
    or newline.  It does not convert the delimiter to a NUL byte.

 2) Unquoted ${X[0]} when X[0] ends with a space causes word splitting
    to occur, so anything after the ${X[0]} will become a new word
    (assuming IFS hasn't been modified).

 3) Arguments passed to a program via the Unix kernel are NUL-terminated
    strings.  Therefore, the NUL byte can't be part of the argument
    itself.  It's a signpost that the argument string has ended.
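
Points 1 and 3 can be checked from the shell without a C program.  This
is a sketch that assumes od is available; the variable name "v" is made
up for the example.

```shell
#!/usr/bin/env bash
readarray -d ' ' X <<< "A B C"

# Dump the bytes of X[0] in hex: 41 is "A" and 20 is the space.
# No 00 byte appears.
printf '%s' "${X[0]}" | od -An -tx1

# Bash drops NUL bytes that arrive through a command substitution
# (recent versions also print a warning on stderr), so v holds "AB".
v=$(printf 'A\0B')
echo "${#v}"
```

Piping into od sidesteps the argument list entirely, so it shows what
the variable really contains.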


