Re: Using -C with --pipe is this possible?
From: Ole Tange
Subject: Re: Using -C with --pipe is this possible?
Date: Wed, 2 May 2012 16:23:57 +0200
On Tue, May 1, 2012 at 9:42 AM, Matt Oates (Home) <mattoates@gmail.com> wrote:
> On 30 April 2012 21:51, Ole Tange <tange@gnu.org> wrote:
>> On Thu, Apr 26, 2012 at 12:20 AM, Matt Oates (Home) <mattoates@gmail.com>
>> wrote:
:
>>> I then want to run something of the form:
>>>
>>> parallel -C '\t' -N 1 --pipe "myprogram /dev/stdin | cat <(echo {1})
>>> -" < file.tab | output-processing-program > results.tab
>>
>> Will this work?
>>
>> cat file.tab | parallel -C '\t' 'echo {1}; echo {2} | myprogram
>> /dev/stdin' | output-processing-program > results.tab
>>
>> Or maybe --tag is even better for your purpose?
>>
>> cat file.tab | parallel --tag -C '\t' 'echo {2} | myprogram
>> /dev/stdin' | output-processing-program > results.tab
>
> Neither of these will work since they are changing the input that's
> going into "myprogram" rather than the output.
Ahh, so you want both {1} and {2} put into myprogram, and you want to
reuse {1}. If I have misunderstood again, please show me the commands
you want run, given the input:
21501699 MSAFFPVISSLNPAVPSVAAP
21501700 MIGGILSCGITHTGITPLDVV
21501701 MVIAIAKYFGWPLDQLDVVTA
21501702 MKWHPDKNKNNLVEAQYRFQE
If I understand you correctly, you want:
echo 21501699
printf "21501699\tMSAFFPVISSLNPAVPSVAAP" | myprogram /dev/stdin
In that case would this work?
cat foo.tab | parallel -C '\t' 'echo {1}; printf "{1}\t{2}\n" |
myprogram /dev/stdin'
In theory this should work, too:
cat /tmp/a | parallel -C '\t' 'echo {1}; echo {} | myprogram /dev/stdin'
But it does not: the \t is replaced with a space. This is unfortunately
hard to fix without breaking a lot of other stuff, especially because
colsep can be a regexp (lines 4274 and 4591).
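To check on a small sample what output to expect from the printf variant, the per-record commands can be mimicked sequentially in plain bash. This is only a sketch: the myprogram function below is a hypothetical stand-in (the real program is not shown in this thread), run_records is a name made up for illustration, and nothing here runs in parallel.

```shell
#!/bin/bash
# Hypothetical stand-in for myprogram: prefixes every line of the file it is given.
myprogram() { sed 's/^/got: /' "$1"; }

# For each tab-separated record: print the id, then feed the whole
# record (tab preserved) to myprogram on stdin.
run_records() {
  local id seq
  while IFS=$'\t' read -r id seq; do
    echo "$id"
    printf '%s\t%s\n' "$id" "$seq" | myprogram /dev/stdin
  done
}

# Two sample records taken from the input shown earlier.
output=$(printf '21501699\tMSAFFPVISSLNPAVPSVAAP\n21501700\tMIGGILSCGITHTGITPLDVV\n' | run_records)
printf '%s\n' "$output"
```

If the output of the parallel command differs from this only in job order, the tab handling is doing what you want.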
> The best feature I can imagine for what I want to do is a flag of the form:
>
> parallel --pipe --cut=1,3 -C ',' -N 1 'program -x {2} {1} /dev/stdin'
>
> So if the input was:
> 1,Hello world,11
> 2,Yay,3
>
> Then {1} and {2} would be:
> {1}=1 {2}=11
> {1}=2 {2}=3
>
> The input streams reaching 'program' would look like:
> Hello world
> EOF
> Yay
> EOF
>
> And the command lines per job would be:
> program -x 11 1 /dev/stdin
> program -x 3 2 /dev/stdin
But you can accomplish this with:
parallel -C , 'echo "{2}" | program -x {3} {1} /dev/stdin'
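The field mapping in that command (column 2 goes to stdin, columns 3 and 1 become arguments) can be illustrated with a sequential plain-bash sketch. The program function is a hypothetical stand-in that just reports what it received, and run_csv is a name invented for this example; it is not how GNU Parallel does it internally.

```shell
#!/bin/bash
# Hypothetical stand-in for 'program': reports its arguments and the
# contents of the file named by its last argument.
program() {
  local file=${!#}
  echo "args: $* stdin: $(cat "$file")"
}

# For each comma-separated record: column 2 is piped to stdin,
# columns 3 and 1 are passed as arguments.
run_csv() {
  local f1 f2 f3
  while IFS=, read -r f1 f2 f3; do
    echo "$f2" | program -x "$f3" "$f1" /dev/stdin
  done
}

# The sample input from the quoted message.
output=$(printf '1,Hello world,11\n2,Yay,3\n' | run_csv)
printf '%s\n' "$output"
```

This gives the command lines 'program -x 11 1 /dev/stdin' and 'program -x 3 2 /dev/stdin' from your description, with 'Hello world' and 'Yay' arriving on stdin.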
> --cut=1,3 tells parallel to cut these fields out of the input stream
> before they get sent anywhere or chunked up, and that the cut parts
> populate the {1}{2} parameter place holders. If --cut isn't specified
> but -C is then assume that we shouldn't cut anything out, but all
> fields should populate the parameter place holders.
> The problem comes when a single line is not used, and a larger chunk
> of data is being piped, in this instance you could just ignore the
> populating of the parameters when -N is greater than 1 or a
> --rec-start/end is specified. Still being able to cut the piped input
> without using the cut program isn't an awful feature for other people
> as a default behaviour to minimise surprise on what it does.
One of my major concerns is that --pipe is too slow as it is (I cannot
get 1 GByte/s through it, and I can easily do that with 'cat'). So I
need really compelling reasons to add more to the inner loop of
--pipe, thus slowing it down.
And it seems you could easily make a wrapper around myprogram that
takes the first two arguments off stdin and passes the rest of stdin
to myprogram:
#!/bin/bash
# The first two lines of stdin are the arguments
read -r VAR1
read -r VAR2
echo "Starting $VAR1 $VAR2"
# (cat included for emphasis - it is not needed)
cat - | myprogram "$VAR1" "$VAR2"
echo "Ended $VAR1 $VAR2"
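A quick way to try the wrapper idea without the real program is sketched below. It assumes the two arguments arrive as the first two lines of the input; myprogram is a hypothetical stand-in that tags each stdin line with its arguments, and the wrapper is written as a function only so the example fits in one file.

```shell
#!/bin/bash
# Hypothetical stand-in for myprogram: prefixes each stdin line with its two arguments.
myprogram() { sed "s/^/[$1 $2] /"; }

# Same logic as the wrapper script: the first two lines of stdin become
# the arguments, everything after that is passed through to myprogram.
wrapper() {
  local VAR1 VAR2
  read -r VAR1
  read -r VAR2
  echo "Starting $VAR1 $VAR2"
  myprogram "$VAR1" "$VAR2"
  echo "Ended $VAR1 $VAR2"
}

# Two argument lines followed by two data lines.
output=$(printf '11\n1\nHello world\nsecond line\n' | wrapper)
printf '%s\n' "$output"
```

This works because bash's read consumes only up to the end of each line when reading from a pipe, so the remaining data is still there for myprogram.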
/Ole
--
Did you get your GNU Parallel merchandise?
https://www.gnu.org/software/parallel/merchandise.html