parallel

Re: Using -C with --pipe is this possible?


From: Ole Tange
Subject: Re: Using -C with --pipe is this possible?
Date: Wed, 2 May 2012 16:23:57 +0200

On Tue, May 1, 2012 at 9:42 AM, Matt Oates (Home) <mattoates@gmail.com> wrote:
> On 30 April 2012 21:51, Ole Tange <tange@gnu.org> wrote:
>> On Thu, Apr 26, 2012 at 12:20 AM, Matt Oates (Home) <mattoates@gmail.com> 
>> wrote:
:
>>> I then want to run something of the form:
>>>
>>> parallel -C '\t' -N 1 --pipe "myprogram /dev/stdin | cat <(echo {1}) -" < file.tab | output-processing-program > results.tab
>>
>> Will this work?
>>
>> cat file.tab | parallel -C '\t' 'echo {1}; echo {2} | myprogram /dev/stdin' | output-processing-program > results.tab
>>
>> Or maybe --tag is even better for your purpose?
>>
>> cat file.tab | parallel --tag -C '\t' 'echo {2} | myprogram /dev/stdin' | output-processing-program > results.tab
>
> Neither of these will work since they are changing the input that's
> going into "myprogram" rather than the output.

Ahh, so you want both {1} and {2} put into myprogram, and you want to
reuse {1}. If I have misunderstood again, please show me the
commands you want run given this input:

21501699        MSAFFPVISSLNPAVPSVAAP
21501700        MIGGILSCGITHTGITPLDVV
21501701        MVIAIAKYFGWPLDQLDVVTA
21501702        MKWHPDKNKNNLVEAQYRFQE

If I understand you correctly, you want:

  echo 21501699
  printf "21501699\tMSAFFPVISSLNPAVPSVAAP" | myprogram /dev/stdin

In that case would this work?

  cat foo.tab | parallel -C '\t' 'echo {1}; printf "{1}\t{2}\n" | myprogram /dev/stdin'
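To check what each job is meant to do without involving GNU Parallel, the command can be sketched as a plain sequential bash loop over the sample input. This is only an illustration of the per-line commands, not of the parallelism: `myprogram` is replaced by a hypothetical stand-in that prints what it reads from /dev/stdin, showing each tab as `<TAB>` so you can see that the tab survives.

```shell
#!/bin/bash
# Sequential sketch (no parallelism) of what each job should do for:
#   parallel -C '\t' 'echo {1}; printf "{1}\t{2}\n" | myprogram /dev/stdin'

# Hypothetical stand-in for myprogram: print the file it is given
# (here /dev/stdin), showing each tab as <TAB>.
myprogram() {
  awk '{ gsub(/\t/, "<TAB>"); print }' "$1"
}

# One "job" per tab-separated input line: id in $id, sequence in $seq.
run_tab_jobs() {
  while IFS=$'\t' read -r id seq; do
    echo "$id"                                             # echo {1}
    printf '%s\t%s\n' "$id" "$seq" | myprogram /dev/stdin  # printf "{1}\t{2}\n" | myprogram /dev/stdin
  done
}

printf '%s\t%s\n' 21501699 MSAFFPVISSLNPAVPSVAAP 21501700 MIGGILSCGITHTGITPLDVV | run_tab_jobs
```

For the first sample line this prints `21501699` followed by `21501699<TAB>MSAFFPVISSLNPAVPSVAAP`, i.e. the two commands listed above.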

In theory this should work, too:

  cat /tmp/a | parallel -C '\t' 'echo {1}; echo {} | myprogram /dev/stdin'

But it does not: in {} the \t is replaced with a space. This is
unfortunately hard to fix without breaking a lot of other stuff,
especially because colsep can be a regexp (see lines 4274 and 4591).

> The best feature I can imagine for what I want to do is a flag of the form:
>
> parallel --pipe --cut=1,3 -C ',' -N 1 'program -x {2} {1} /dev/stdin'
>
> So if the input was:
> 1,Hello world,11
> 2,Yay,3
>
> Then {1} and {2} would be:
> {1}=1 {2}=11
> {1}=2 {2}=3
>
> The input streams reaching 'program' would look like:
> Hello world
> EOF
> Yay
> EOF
>
> And the command lines per job would be:
> program -x 11 1 /dev/stdin
> program -x 3 2 /dev/stdin

But you can already accomplish this with:

parallel -C , 'echo "{2}" | program -x {3} {1} /dev/stdin'
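Run per input line, that command produces exactly the jobs Matt listed. A sequential bash sketch makes the field mapping visible: field 2 reaches the program on stdin via `echo`, while fields 3 and 1 become arguments. `program` here is a hypothetical stand-in that prints its arguments and the stdin it received; no parallelism is modelled.

```shell
#!/bin/bash
# Sequential sketch (no parallelism) of:
#   parallel -C , 'echo "{2}" | program -x {3} {1} /dev/stdin'

# Hypothetical stand-in for "program": print its arguments and its stdin.
# Called as: program -x FIELD3 FIELD1 /dev/stdin
program() {
  printf '%s %s %s | %s\n' "$1" "$2" "$3" "$(cat "$4")"
}

# Split each input line on "," into three fields and run one "job" per line.
run_csv_jobs() {
  while IFS=, read -r f1 f2 f3; do
    echo "$f2" | program -x "$f3" "$f1" /dev/stdin
  done
}

printf '%s\n' '1,Hello world,11' '2,Yay,3' | run_csv_jobs
# -x 11 1 | Hello world
# -x 3 2 | Yay
```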

> --cut=1,3 tells parallel to cut these fields out of the input stream
> before they get sent anywhere or chunked up, and that the cut parts
> populate the {1}{2} parameter place holders. If --cut isn't specified
> but -C is then assume that we shouldn't cut anything out, but all
> fields should populate the parameter place holders.
> The problem comes when a single line is not used, and a larger chunk
> of data is being piped, in this instance you could just ignore the
> populating of the parameters when -N is greater than 1 or a
> --rec-start/end is specified. Still being able to cut the piped input
> without using the cut program isn't an awful feature for other people
> as a default behaviour to minimise surprise on what it does.

One of my major concerns is that --pipe is already too slow (I cannot
get 1 GByte/s through it, whereas I easily can with 'cat'). Anything
added to the inner loop of --pipe slows it down further, so I need
really compelling reasons to add more there.

And it seems you could easily write a wrapper around myprogram that
takes the first two arguments from stdin and passes the rest of stdin
to myprogram:

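#!/bin/bash

# The first two lines of stdin are the arguments for myprogram
read -r VAR1
read -r VAR2

echo "Starting $VAR1 $VAR2"
# (cat included for emphasis - it is not needed: myprogram would
# inherit the rest of stdin anyway)
cat - | myprogram "$VAR1" "$VAR2"
echo "Ended $VAR1 $VAR2"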
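The wrapper idea can be tried locally without GNU Parallel. The sketch below puts the same logic in a shell function and feeds it two argument lines followed by data; `myprogram` is a hypothetical stand-in that prints its arguments and then its stdin. It relies on bash's `read` consuming only one line from a pipe, so whatever follows is still there for `myprogram`.

```shell
#!/bin/bash
# Hypothetical stand-in for myprogram: show its arguments, then its stdin.
myprogram() {
  echo "args: $1 $2"
  cat
}

# Same logic as the wrapper script, written as a function.
wrapper() {
  read -r VAR1
  read -r VAR2
  echo "Starting $VAR1 $VAR2"
  myprogram "$VAR1" "$VAR2"   # inherits the rest of stdin
  echo "Ended $VAR1 $VAR2"
}

printf '%s\n' 11 1 'Hello world' | wrapper
# Starting 11 1
# args: 11 1
# Hello world
# Ended 11 1
```

Untested assumption: to use this with --pipe, each input record would first be turned into such a three-line block (for Matt's comma-separated example, something like `awk -F, '{ print $3; print $1; print $2 }'`) and then passed on with `parallel --pipe -N 3 ./wrapper.sh`, since with --pipe, -N counts records (lines by default) per block. `wrapper.sh` is a hypothetical file name for the script above.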


/Ole
-- 
Did you get your GNU Parallel merchandise?
https://www.gnu.org/software/parallel/merchandise.html


