parallel
--cat and --fifo


From: David
Subject: --cat and --fifo
Date: Wed, 2 Apr 2014 14:07:06 -0400 (EDT)

Ole,

It'd be great if parallel either used bash or emulated it to support <(...) and >(...) on all platforms.  Emulation might let parallel clean up the named pipes it creates at the end of a run, which bash is shy about.  Start thinking of parallel as a shell of sorts.  I am guessing bash and ksh93 features can be dynamically linked into anything.  One thing that ksh did, but bash didn't, is make <(...) and >(...) a 'word', so it has implicit whitespace around it; you cannot concatenate it into a sed script string after a 'w' command without passing it to a subroutine or the like to strip the whitespace.

$ echo X | sed 'w '>(sed 's/^/b/'>&2)'
s/^/a/' ; sleep 1
aX
bX
$

With <(...) and >(...) shell scripting, you can create a complex tree of pipeline-parallel processing with no temp files and minimal latency.  As the core counts blossom, we need such strategies to turn this resource into fast, low latency processing.
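
A minimal sketch of such a tree, assuming bash (or another shell with process substitution) and a system with /dev/fd or named pipes. The helper name `numbers` and the sample data are made up for illustration. Two filter branches run concurrently and `paste` merges their output, with no temp files:

```bash
#!/usr/bin/env bash
# Hypothetical data source, used only for this illustration.
numbers() { printf '%s\n' 1 2 3; }

# Each <(...) runs as its own concurrent pipeline; paste reads both
# through file names such as /dev/fd/63 and joins them line by line.
joined=$(paste -d, <(numbers | sed 's/^/a/') <(numbers | sed 's/^/b/'))
printf '%s\n' "$joined"
```

Each branch could itself contain further <(...) or >(...) stages, which is how the tree grows.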

On nice UNIXes, one can say the input file is '/dev/stdin', so you can present the standard input as a file name without an extra 'cat'.  I recall fixing a 32-bit app running under sh that read a file which had grown > 4 GB, by letting the shell (large-file ready) open it with < and telling the app to read /dev/stdin.  I am not sure if parallel could emulate this somehow for the other O/S.  Ditto for stdout and stderr.  (This raises the question of managing precise, time-annotated stderr and stdout logs that either keep each parallel run separate or keep log lines together.  Alas, too many apps think stdout is good for logging, while others treat it as a data stream and keep logging on stderr.  Occasionally I use stderr for data, on O/S and shells without better ways to have a second output stream.)
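
A small sketch of the /dev/stdin trick, assuming an OS that provides /dev/stdin (Linux and most current Unixes do). Here grep stands in for an app that insists on a file name argument:

```bash
#!/usr/bin/env bash
# grep is handed a file name, but the data arrives on standard input.
# The same form works with a redirect: app /dev/stdin < bigfile
count=$(printf 'alpha\nbeta\n' | grep -c . /dev/stdin)
printf '%s\n' "$count"
```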

Of course, some apps do seeks, so you need to make a temp file to satisfy such apps.  The temp file could be self-deleting: opened by the shell or parallel, deleted, and then passed as /dev/fd/#.  I guess if you have no /dev/fd/ or the like on your OS, you need a more complex temp file deletion strategy.  Not all /tmp are cleaned periodically or by reboot.  Can someone write a buffered/recording pipe that accepts seeks (data on 64 bit heap or tmpfile())?
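
A rough sketch of the open, delete, pass-as-/dev/fd idea, under loud assumptions: it relies on Linux behaviour, where opening /dev/fd/3 re-opens the file with a fresh offset (on systems where /dev/fd/N merely duplicates the descriptor, the offset would sit at end of file). The function name `with_seekable_stdin` is hypothetical, and file descriptor 3 is assumed to be free:

```bash
#!/usr/bin/env bash
# Buffer stdin into an already-unlinked temp file, then hand the
# command a seekable "file name". Linux-specific; see assumptions above.
with_seekable_stdin() {
    local tmp status
    tmp=$(mktemp) || return 1
    exec 3<>"$tmp"      # open read/write on fd 3
    rm -f -- "$tmp"     # unlink now; the data lives until fd 3 is closed
    cat >&3             # copy all of stdin into the temp file
    "$@" /dev/fd/3      # the command sees an ordinary seekable file
    status=$?
    exec 3>&-           # closing the descriptor frees the disk space
    return "$status"
}

last=$(printf 'one\ntwo\n' | with_seekable_stdin tail -n 1)
printf '%s\n' "$last"
```

Nothing is left in /tmp even if the command is killed, since the name is removed before the command starts.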

Best,

David

-----Original Message-----
From: parallel-request <parallel-request@gnu.org>
To: parallel <parallel@gnu.org>
Sent: Sun, Mar 23, 2014 12:00 pm
Subject: Parallel Digest, Vol 47, Issue 8

Send Parallel mailing list submissions to
	parallel@gnu.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://lists.gnu.org/mailman/listinfo/parallel
or, via email, send a message with subject or body 'help' to
	parallel-request@gnu.org

You can reach the person managing the list at
	parallel-owner@gnu.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Parallel digest..."


Today's Topics:

   1. --cat and --fifo (Ole Tange)
   2. Re: Recommendations for getting Parallel-like ::: behavior
      using Bash (Rhys Ulerich)


----------------------------------------------------------------------

Message: 1
Date: Sun, 23 Mar 2014 01:41:33 +0100
From: Ole Tange <tange@gnu.org>
To: "parallel@gnu.org" <parallel@gnu.org>
Subject: --cat and --fifo
Message-ID:
	<CA+4vN7wve3XrsfdbDaPGwpuJU-D+0q6vqHAst1R6LQ=YJR097w@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Sometimes I meet commands that cannot read from stdin, but only from a
file or a fifo. You may be lucky that you can do:

    parallel --pipe 'command <(cat)'

But you may have to wrap that kind of command to make it work with
'parallel --pipe':

    parallel --pipe 'cat > {#}; command {#}; _EXIT=$?; rm {#}; exit $_EXIT'
    parallel --pipe 'mkfifo {#}; (command {#}) & _PID=$!; cat > {#};
        wait $_PID; _EXIT=$?; rm {#}; exit $_EXIT'

Not really elegant, and if the file {#} already exists, it will be
overwritten. So I have implemented --cat and --fifo:

    parallel --pipe --cat 'command {}'
    parallel --pipe --fifo 'command {}'

They do the same as above, except the filename is a tmpfile, so the
chance of overwriting is 0 if run locally, and close to 0 if run
remotely.
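
For readers without the git version, the fifo wrapper can be sketched in plain bash for a single block of input. This is an illustration of the idea only, not how parallel implements --fifo; the function name `run_with_fifo` is hypothetical, and `mktemp -d` is assumed to be available:

```bash
#!/usr/bin/env bash
# Feed stdin to a command that only accepts a file name, via a fifo
# created in a private temp directory so that no existing file can be
# overwritten. The command's exit status is preserved.
run_with_fifo() {
    local dir fifo pid status
    dir=$(mktemp -d) || return 1
    fifo=$dir/input
    mkfifo "$fifo" || { rm -rf -- "$dir"; return 1; }
    "$@" "$fifo" &      # reader: the command opens the fifo by name
    pid=$!
    cat > "$fifo"       # writer: blocks until the command opens the fifo
    wait "$pid"
    status=$?
    rm -rf -- "$dir"
    return "$status"
}

lines=$(printf 'a\nb\nc\n' | run_with_fifo grep -c .)
printf '%s\n' "$lines"
```

As with the one-liner above, a command that never opens the fifo would leave the writer blocked.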

--cat and --fifo do not make sense without --pipe, and I am thinking
that I could probably autodetect this: if the command contains {}, then
it means '--pipe --cat'. But it might surprise the user that
including {} in the command makes it run slower (as the cat will first
save stdin to a tmpfile).

--cat and --fifo could also just imply --pipe.

What do you think? How would you like them to work? Do you have more
descriptive names?

Test --cat and --fifo by:

    git clone git://git.savannah.gnu.org/parallel.git


/Ole



------------------------------

Message: 2
Date: Sat, 22 Mar 2014 23:59:10 -0500
From: Rhys Ulerich <rhys.ulerich@gmail.com>
To: GNU Parallel <parallel@gnu.org>
Subject: Re: Recommendations for getting Parallel-like ::: behavior
	using Bash
Message-ID:
	<CAKDqugTBdCFeU0e+kQ=RxQ=3UuY_d2k6nvx9W-4bB5H-1_dQVA@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

> I like how GNU Parallel does its ::: magic ...
>
> Has anyone implemented something similar in pure bash?

My quick version of :::-like processing for bash looks like the following:

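declare -a cmd
# Everything before the first ::: is the command template.
while [ $# -gt 0 ]; do
    [ "$1" = ":::" ] && break
    cmd+=("$1")
    shift
done
if [ "$1" = ":::" ]; then
    # Arguments after ::: each yield one command line.
    while shift && [ $# -gt 0 ]; do
        echo "${cmd[@]}" "$1"
    done
else
    # No ::: given: take the arguments from stdin, one per line.
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        echo "${cmd[@]}" "$line"
    done
fi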

This breaks on multiple ::: and totally ignores ::::.
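
A sketch of the same parsing wrapped in a function that runs each command instead of printing it. The name `each` is made up for illustration, the commands run one after another rather than in parallel, and the same limitations apply (a single ::: only, no ::::):

```bash
#!/usr/bin/env bash
each() {
    local -a cmd=()
    local arg line
    # Collect the command template up to the first :::.
    while [ $# -gt 0 ] && [ "$1" != ":::" ]; do
        cmd+=("$1")
        shift
    done
    if [ "${1-}" = ":::" ]; then
        shift
        for arg in "$@"; do
            "${cmd[@]}" "$arg"
        done
    else
        # Arguments come from stdin; the command gets /dev/null so
        # it cannot swallow the remaining input lines.
        while IFS= read -r line; do
            "${cmd[@]}" "$line" </dev/null
        done
    fi
}

from_args=$(each echo hi ::: a b)
from_stdin=$(printf 'x\ny\n' | each echo got)
printf '%s\n' "$from_args" "$from_stdin"
```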

An "...Only experts do this on purpose...." homage might go in that
final else clause.

- Rhys



------------------------------

_______________________________________________
Parallel mailing list
Parallel@gnu.org
https://lists.gnu.org/mailman/listinfo/parallel


End of Parallel Digest, Vol 47, Issue 8
***************************************
