Re: (question) fast split/join of strings
From: Greg Wooledge
Subject: Re: (question) fast split/join of strings
Date: Tue, 17 Sep 2024 07:35:15 -0400
On Tue, Sep 17, 2024 at 02:56:05 -0400, Lawrence Velázquez wrote:
> This question is more appropriate for help-bash than bug-bash.
>
> On Tue, Sep 17, 2024, at 2:21 AM, William Park wrote:
> > For splitting, I'm aware of
> > old="a,b,c"
> > IFS=, read -a arr <<< "$old"
<https://mywiki.wooledge.org/BashPitfalls#pf47> is relevant here.
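Here is a small sketch of what Pitfall 47 means for the snippet above. It assumes bash, and it adds -r (not in the original snippet) so that read leaves backslashes alone. Appending one extra comma to the input is shown as one possible workaround. It mirrors the printf %s, trick used with mapfile further down, and is not necessarily the exact fix the wiki page recommends.

```bash
#!/usr/bin/env bash
old="a,b,c,"                 # the 4th field is empty

# read treats the final comma as a terminator, so the empty 4th field is lost.
IFS=, read -r -a arr <<< "$old"
declare -p arr               # only three elements

# Workaround (assumption of this sketch): append one extra delimiter so the
# empty field is no longer the last thing on the line.
IFS=, read -r -a arr2 <<< "$old,"
declare -p arr2              # four elements, the last one empty
```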
The other thing to watch out for when using IFS and <<< to split a string
into an array is an internal newline character: the read command stops
reading when it sees a newline. The workaround is to use -d '' to set the
line delimiter to the NUL byte. Without a trailing NUL byte in the input
stream, however, read returns a status of 1 (failure).
This is not a problem... unless you were insane enough to use set -e.
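A quick sketch showing both behaviours, assuming bash and a comma as the only IFS character. The status is captured with || so the script also survives set -e.

```bash
#!/usr/bin/env bash
old=$'a,b\nc,d'              # input with an internal newline

# Plain read stops at the first newline: only "a" and "b" are stored.
IFS=, read -r -a arr1 <<< "$old"
declare -p arr1

# With -d '' read keeps going, looking for a NUL that never arrives.
# It stores what it read but returns 1 at end of input.
status=0
IFS=, read -r -d '' -a arr2 <<< "$old" || status=$?
declare -p arr2              # newline is not in IFS, so the last field keeps
                             # the newline that <<< appends: $'d\n'
echo "read returned $status"
```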
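Of course, you can add a NUL byte to the input stream, but that means
replacing <<< "$old" with something like < <(printf '%s\0' "$old"), which
also avoids the trailing newline that <<< appends (with -d '', that newline
would otherwise end up in the last field).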
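A sketch of the NUL-terminated variant, assuming bash and the same kind of comma-separated input with an internal newline. Here read finds its delimiter, so it returns 0, and no stray newline lands in the last field. Pitfall 47 still applies to this form: a trailing empty field would still be dropped.

```bash
#!/usr/bin/env bash
old=$'a,b\nc,d'

# printf supplies the NUL terminator that read -d '' is waiting for.
status=0
IFS=, read -r -d '' -a arr < <(printf '%s\0' "$old") || status=$?
declare -p arr               # a, $'b\nc', d
echo "read returned $status" # 0: the delimiter was found
```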
Another way to do splitting is to use readarray/mapfile with -d.
Pitfall 47 still applies here: a trailing empty field is dropped unless
you append one extra delimiter, as demonstrated:
hobbit:~$ mapfile -t -d , array < <(printf %s "a,b,c,"); declare -p array
declare -a array=([0]="a" [1]="b" [2]="c")
hobbit:~$ input="a,b,c,"
hobbit:~$ mapfile -t -d , array < <(printf %s, "$input"); declare -p array
declare -a array=([0]="a" [1]="b" [2]="c" [3]="")
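The same approach also copes with the internal-newline problem that trips up read, because mapfile -d treats newlines as ordinary data. This is a sketch assuming bash 4.4 or newer, the first release with mapfile -d. Note that mapfile uses only the first character of the -d argument, so it is limited to single-character delimiters.

```bash
#!/usr/bin/env bash
input=$'a,b\nc,'             # internal newline plus a trailing empty field

# printf %s, appends one delimiter so the trailing empty field survives;
# -t strips the comma from each record.
mapfile -t -d , array < <(printf %s, "$input")
declare -p array             # a, $'b\nc', and an empty third element
```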
A third way to do splitting is to loop over the input string using
parameter expansions.
# Usage: split_str separator input_string
# Stores results in the output array 'split'.
split_str() {
    local sep=$1 str=$2
    [[ $1 && $2 ]] || return
    split=()
    while [[ $str = *"$sep"* ]]; do
        split+=("${str%%"$sep"*}")
        str=${str#*"$sep"}
    done
    split+=("$str")
}
The biggest advantage of looping like this is that the separator may be
a multi-character string, instead of just one character:
hobbit:~$ split_str ' - ' 'foo - bar - cricket bat - fleur-de-lis - baz'
hobbit:~$ declare -p split
declare -a split=([0]="foo" [1]="bar" [2]="cricket bat" [3]="fleur-de-lis" [4]="baz")
This is probably the slowest choice, since every iteration runs a pattern
match and copies the rest of the string, but it's by far the safest (it
has the fewest surprise pitfalls) and the most flexible.
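A few edge cases that support the "fewest surprise pitfalls" claim, sketched with split_str copied verbatim so the script runs on its own. The sample inputs are illustrative, not from the original post. One behaviour worth knowing: if either argument is empty, the function returns 1 without resetting split, so any previous result stays in place.

```bash
#!/usr/bin/env bash
# split_str as given above.
split_str() {
    local sep=$1 str=$2
    [[ $1 && $2 ]] || return
    split=()
    while [[ $str = *"$sep"* ]]; do
        split+=("${str%%"$sep"*}")
        str=${str#*"$sep"}
    done
    split+=("$str")
}

# A trailing separator yields a trailing empty field (no Pitfall 47).
split_str , 'a,b,'
declare -p split

# "$sep" is quoted inside the patterns, so glob characters match literally.
split_str '*' 'x*y'
declare -p split

# Internal newlines are just data.
split_str ' - ' $'foo\nbar - baz'
declare -p split
```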