[Help-bash] multiline random paste or how to properly use (named) streams


From: Garreau, Alexandre
Subject: [Help-bash] multiline random paste or how to properly use (named) streams
Date: Tue, 13 Mar 2018 16:49:47 +0100
User-agent: Gnus (5.13), GNU Emacs 25.1.1 (x86_64-pc-linux-gnu)

Hi,

Recently a friend of mine asked me whether I had a ready-made way to
interleave the lines of a file foo and a file bar (m3u files, actually),
so that each run of 1 to 5 lines of “foo” (length chosen at random) is
surrounded by 2 to 4 lines of “bar” (length again chosen at random),
the “bar” lines preferably, but not necessarily, being picked at random
and each used only once in total. The variable number of interleaved
lines is what puts this problem beyond the scope of paste from
coreutils.
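
To illustrate that limitation: with a newline as the delimiter, paste
alternates strictly one line from each file, so the run lengths cannot
vary. A small sketch, using throwaway files created only for the
demonstration:

```shell
#!/bin/bash
# Throwaway sample files; the names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'foo1\nfoo2\nfoo3\n' > "$tmpdir/foo"
printf 'bar1\nbar2\nbar3\n' > "$tmpdir/bar"

# With '\n' as delimiter, paste emits one line of foo, then one of bar.
pasted=$(paste -d '\n' "$tmpdir/foo" "$tmpdir/bar")
echo "$pasted"

rm -r "$tmpdir"
```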

As the problem interested me, I generalized it into “interleaving
random-length runs of lines from an arbitrary number of files”, with the
option of filtering some of the input through sort -R at the end. I
could have put all the contents in variables, but I thought that would
not end up simple and readable, and I didn’t want to open any file
several times. Hence I started learning named streams (and also process
substitution, but I didn’t find it useful here): even though I was going
to work on an arbitrary number of files (thus numbered rather than
named), I read in the Bash manual that redirections using file
descriptors greater than 9 should be used with care, and I wanted
something robust yet able to operate on more than 9 files.
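
A minimal sketch of the named-stream mechanism, assuming bash 4.1 or
later: with the {varname}<file form, bash itself picks a free
descriptor (numbered 10 or above) and stores its number in the
variable, so the script never hard-codes a descriptor. The sample files
and the names “fds” and “lines” are made up for the demonstration:

```shell
#!/bin/bash
# Throwaway sample files; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/a"
printf 'b1\nb2\n' > "$tmpdir/b"

declare -a fds lines
for file in "$tmpdir/a" "$tmpdir/b"; do
    exec {fd}<"$file"    # bash allocates the descriptor and sets fd
    fds+=("$fd")         # remember it under the next array index
done

# Alternate between the streams; each descriptor keeps its own position.
for n in 0 1 0 1; do
    IFS= read -r -u "${fds[n]}" line
    lines+=("$line")
done
echo "${lines[*]}"

# Close every descriptor, then remove the sample files.
for fd in "${fds[@]}"; do exec {fd}<&-; done
rm -r "$tmpdir"
```

Each file is opened exactly once, and any number of files can be
handled because the descriptor numbers live in an ordinary array.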

The interface I came up with (an option to specify the randomness may
come later) is:
interleave [<file> <interval-begin>-<interval-end>] ...

I ended up with the code below. The unreadability introduced by
correctly opening and naming each stream leaves me unconvinced that I
have learned and mastered named streams well enough (I don’t like
“parse all the arguments and only then do the actual work”). I’m
especially dissatisfied with how complicated it is to use my array
(which numbers the streams, since there is an arbitrary number of them),
a consequence of the file and interval arguments of my function being
interleaved. I also thought of changing my interface to something like
“[<file>:<interval>]...” in order to unify and simplify the retrieval of
the right arguments:

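#!/bin/bash

# interval N | interval A-B
# Print a random integer between A and B inclusive (or N itself).
interval()
{
    # Split the argument on "-" with parameter expansion. A prefix
    # assignment such as "IFS=- set -- $@" cannot do it, because $@ is
    # expanded before the temporary IFS takes effect.
    declare -i begin="${1%%-*}" end="${1##*-}" inter
    inter=end-begin
    # Accept reversed bounds such as 5-1.
    if (( inter < 0 )); then inter=-inter begin=end; fi
    echo "$(( begin + RANDOM % (inter + 1) ))"
}

# interleave FILE INTERVAL [FILE INTERVAL]...
interleave()
{
    declare -a fd
    declare -i i long n
    (( $# >= 2 && $# % 2 == 0 )) || return 2
    # Open each file (odd-numbered arguments) on a descriptor chosen by
    # bash; the descriptor for argument 2k+1 is stored in fd[k].
    for (( i = 1; i <= $#; i += 2 )); do
        exec {fd[i/2]}<"${!i}"
    done
    i=2
    long=$(interval "${!i}")
    # Copy lines from the current file; stop at the first end of file.
    while IFS= read -r -u "${fd[(i-1)/2]}"; do
        printf '%s\n' "$REPLY"
        # "<=" so that a drawn count of 0 still emits one line.
        if (( --long <= 0 )); then
            # Move to the next interval argument, wrapping around.
            i=$(( 1 + (i + 1) % $# ))
            long=$(interval "${!i}")
        fi
    done
    # Close the descriptors opened above.
    for n in "${fd[@]}"; do exec {n}<&-; done
}

For instance: interleave foo 1-5 bar 2-4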
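
The “[<file>:<interval>]...” interface considered above could be parsed
along these lines. This is only a sketch: split_spec, spec_file and
spec_interval are hypothetical names, and it assumes the interval is
whatever follows the last colon, so that file names may themselves
contain colons:

```shell
#!/bin/bash
# Hypothetical helper: split one FILE:INTERVAL argument into two
# global variables, spec_file and spec_interval.
split_spec()
{
    local spec=$1
    spec_file=${spec%:*}         # drop the shortest ":..." suffix
    spec_interval=${spec##*:}    # drop everything up to the last ":"
}

for spec in foo:1-5 bar:2-4 'my:list.m3u:3'; do
    split_spec "$spec"
    printf '%s -> %s\n' "$spec_file" "$spec_interval"
done
```

With one argument per file, the argument position maps directly onto
the stream index, which removes the i/2 and (i-1)/2 arithmetic.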

I’m thinking about adding more intermediate variables, using an array
for the intervals, changing the interface, or using associative arrays
(to allow non-contiguous indexes that don’t start at zero without
feeling that I waste space, though that seems dirtier to me than a
simple indexed array). I would welcome any other idea that improves
efficiency or readability.
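
On the concern about wasted space: bash indexed arrays are sparse, so
a plain indexed array already accepts non-contiguous indexes that do
not start at zero. A small sketch, in which the stored values are
placeholders standing in for descriptor numbers:

```shell
#!/bin/bash
# Indexes mimic the positions of the file arguments (1, 3, 7, ...).
declare -a fd
fd[1]=10    # placeholder values, not real open descriptors
fd[3]=11
fd[7]=12

echo "elements: ${#fd[@]}"    # counts only the assigned elements
echo "indexes: ${!fd[*]}"     # lists only the indexes in use
```

Under that assumption the stream for argument i could simply be stored
in fd[i], with its interval found at argument i+1.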
