Re: (question) fast split/join of strings


From: Greg Wooledge
Subject: Re: (question) fast split/join of strings
Date: Tue, 17 Sep 2024 07:35:15 -0400

On Tue, Sep 17, 2024 at 02:56:05 -0400, Lawrence Velázquez wrote:
> This question is more appropriate for help-bash than bug-bash.
> 
> On Tue, Sep 17, 2024, at 2:21 AM, William Park wrote:
> > For splitting, I'm aware of
> >      old="a,b,c"
> >      IFS=, read -a arr <<< "$old"

<https://mywiki.wooledge.org/BashPitfalls#pf47> is relevant here.

The other thing you need to watch out for when using IFS and <<< to do
array splitting is an embedded newline character.  The read command
stops reading at the first newline, so everything after it is lost.
The workaround is to use -d '' to set the line delimiter to the NUL
byte.  Without a trailing NUL byte in the input stream, read still fills
the array but returns a status of 1 (failure) because it hit EOF first.
This is not a problem... unless you were insane enough to use set -e.

Of course, you can add a NUL byte to the input stream, but that
means you need to replace <<< "$old" with something like
< <(printf '%s\0' "$old") .
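
As a minimal sketch of the difference between these variants (the
variable names and the sample input are only illustrative, and the
process substitution assumes bash):

```shell
# Illustrative input with an embedded newline inside the second field.
old=$'a,b\nc,d'

# Plain read stops at the first newline, so the rest is lost.
IFS=, read -r -a short <<< "$old"

# With -d '' and an explicit trailing NUL, read sees the whole string
# and returns 0.
IFS=, read -r -d '' -a arr < <(printf '%s\0' "$old")

# With -d '' but no NUL (here-string), read hits EOF and returns 1.
# The array is still populated, but the newline that <<< appends ends
# up in the last element.
status=0
IFS=, read -r -d '' -a eof <<< "$old" || status=$?

declare -p short arr eof status
```

Only the second form both keeps the embedded newline and returns
success, which is why it is the one to prefer in scripts that check
exit statuses.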

Another way to do splitting is to use readarray/mapfile with -d.
Pitfall 47 still applies here: a trailing empty field is dropped unless
you append an extra delimiter to the input, as demonstrated:

    hobbit:~$ mapfile -t -d , array < <(printf %s "a,b,c,"); declare -p array
    declare -a array=([0]="a" [1]="b" [2]="c")

    hobbit:~$ input="a,b,c,"
    hobbit:~$ mapfile -t -d , array < <(printf %s, "$input"); declare -p array
    declare -a array=([0]="a" [1]="b" [2]="c" [3]="")
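
Unlike the plain read approach, mapfile with -d does not treat a newline
specially.  A small sketch, assuming bash 4.4 or later (where mapfile
gained -d) and using a made-up input string:

```shell
# Illustrative input: embedded newline in the second field, and a
# trailing empty field after the last comma.
input=$'a,b\nc,'

# Append one extra delimiter so the trailing empty field survives.
mapfile -t -d , array < <(printf '%s,' "$input")

declare -p array
```

The result has three elements: "a", the two-line string "b<newline>c",
and an empty string for the trailing field.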

A third way to do splitting is to loop over the input string using
parameter expansions.

    # Usage: split_str separator input_string
    # Stores results in the output array 'split'.
    split_str() {
        local sep=$1 str=$2
        [[ $1 && $2 ]] || return
        split=()
        while [[ $str = *"$sep"* ]]; do
            split+=("${str%%"$sep"*}")
            str=${str#*"$sep"}
        done
        split+=("$str")
    }

The biggest advantage of looping like this is that the separator may be
a multi-character string, instead of just one character:

    hobbit:~$ split_str ' - ' 'foo - bar - cricket bat - fleur-de-lis - baz'
    hobbit:~$ declare -p split
    declare -a split=([0]="foo" [1]="bar" [2]="cricket bat" [3]="fleur-de-lis" [4]="baz")

This is probably the slowest choice, but it's by far the safest (has
the fewest surprise pitfalls) and the most flexible.
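
To illustrate the "fewest surprises" point, here is a sketch that
repeats the split_str function from above and runs it on a few edge
cases.  The test inputs and the array names trailing, middle and glob
are made up for the example:

```shell
# Same function as above, repeated so the example is self-contained.
split_str() {
    local sep=$1 str=$2
    [[ $1 && $2 ]] || return
    split=()
    while [[ $str = *"$sep"* ]]; do
        split+=("${str%%"$sep"*}")
        str=${str#*"$sep"}
    done
    split+=("$str")
}

# A trailing separator yields a trailing empty field, with no need to
# append an extra delimiter.
split_str , 'a,b,c,'
trailing=("${split[@]}")

# Adjacent separators yield an empty field in the middle.
split_str , 'a,,b'
middle=("${split[@]}")

# The separator is quoted inside the patterns, so glob characters in it
# are matched literally.
split_str '*' 'x*y'
glob=("${split[@]}")

declare -p trailing middle glob
```

Note that the function returns early with a failure status, leaving
split untouched, when either argument is empty.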
