Re: Using -C with --pipe is this possible?


From: Matt Oates (Home)
Subject: Re: Using -C with --pipe is this possible?
Date: Tue, 1 May 2012 08:42:23 +0100

On 30 April 2012 21:51, Ole Tange <tange@gnu.org> wrote:
> On Thu, Apr 26, 2012 at 12:20 AM, Matt Oates (Home) <mattoates@gmail.com> 
> wrote:
> Good to see protein people using GNU Parallel.

I think there are quite a few of us :)

>> I then want to run something of the form:
>>
>> parallel -C '\t' -N 1 --pipe "myprogram /dev/stdin | cat <(echo {1})
>> -" < file.tab | output-processing-program > results.tab
>
> Will this work?
>
> cat file.tab | parallel -C '\t' 'echo {1}; echo {2} | myprogram
> /dev/stdin' | output-processing-program > results.tab
>
> Or maybe --tag is even better for your purpose?
>
> cat file.tab | parallel --tag -C '\t' 'echo {2} | myprogram
> /dev/stdin' | output-processing-program > results.tab

Neither of these will work, since they change the input that goes
into "myprogram" rather than the output. I need to tag the output
before it goes into the output processing program (which I've
written), but after the input from --pipe has already passed through.
I can't change anything about "myprogram" since it's a precompiled
protein disorder predictor.

Is there a reason --pipe and -C can't be made to work at the same
time? If they could, my command line would work, and in general more
elaborate commands would become possible. I'm a heavy user of --pipe,
but it's very limited if I can't also use some of the input to
parallel as parameters to the commands being run. I need something
like a --tag-output or --tag-after flag in GNU parallel, so that the
job output associated with a chunk of split input is tagged with a
prefix. At the moment I'm faced with having to use something other
than GNU parallel, which is a scary concept considering how much GNU
parallel does for me already :)
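
To make the wished-for tagging concrete, here is a minimal sequential
sketch in plain bash. It is not GNU parallel and runs one job at a
time; it only illustrates the semantics. The names tag_output and
myprogram_stub are made up for this illustration, and myprogram_stub
(which just upper-cases its input) stands in for the real predictor.
It assumes two tab-separated fields per line: an identifier and the
data.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the precompiled program: reads the file
# named in $1 (here /dev/stdin) and upper-cases it.
myprogram_stub() { tr 'a-z' 'A-Z' < "$1"; }

# For each "id<TAB>data" record: send only the data to the program on
# stdin, then prefix every line of the program's output with the id.
tag_output() {
  local id data line
  while IFS=$'\t' read -r id data; do
    printf '%s\n' "$data" | myprogram_stub /dev/stdin |
      while IFS= read -r line; do
        printf '%s\t%s\n' "$id" "$line"
      done
  done
}

# Example input with two records.
printf 'P1\tmkv\nP2\tacd\n' | tag_output
```

The program never sees the identifier, yet every output line carries
it, which is what the downstream output processing needs.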

The best feature I can imagine for what I want to do is a flag of the form:

parallel --pipe --cut=1,3 -C ',' -N 1 'program -x {2} {1} /dev/stdin'

So if the input was:
1,Hello world,11
2,Yay,3

Then {1} and {2} would be:
{1}=1 {2}=11
{1}=2 {2}=3

The input streams reaching 'program' would look like:
Hello world
EOF
Yay
EOF

And the command lines per job would be:
program -x 11 1 /dev/stdin
program -x 3 2 /dev/stdin

--cut=1,3 tells parallel to cut these fields out of the input stream
before it is chunked up or sent anywhere, and the cut fields populate
the {1} {2} parameter placeholders. If -C is specified but --cut is
not, nothing is cut out and all fields populate the parameter
placeholders.

The problem comes when a larger chunk of data is piped rather than a
single line. In that case parallel could simply skip populating the
parameters whenever -N is greater than 1 or --recstart/--recend is
specified. Even then, still cutting the fields out of the piped input,
without needing the cut program, isn't an awful feature for other
people, and as the default behaviour it would minimise surprise about
what the flag does.
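
The single-line case of the proposal can be emulated in plain bash as
a rough sketch. Note that --cut is only the flag proposed here, not an
existing GNU parallel option, and this loop runs the jobs one after
another. The names emulate_cut and program are hypothetical; program
is a stub that reports its arguments and whatever arrives on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for 'program': prints its four arguments and
# the data it reads from the file named in $4 (here /dev/stdin).
program() {
  local stdin_data
  stdin_data=$(cat "$4")
  printf 'args: %s %s %s %s | stdin: %s\n' "$1" "$2" "$3" "$4" "$stdin_data"
}

# Emulates the proposed --cut=1,3 -C ',' -N 1 for three-field input.
emulate_cut() {
  local f1 f2 f3
  while IFS=',' read -r f1 f2 f3; do
    # {1} <- field 1 and {2} <- field 3 are cut out and used as
    # parameters; the remaining field 2 is what the job gets on stdin.
    printf '%s\n' "$f2" | program -x "$f3" "$f1" /dev/stdin
  done
}

printf '1,Hello world,11\n2,Yay,3\n' | emulate_cut
```

With the two example lines above this prints:

args: -x 11 1 /dev/stdin | stdin: Hello world
args: -x 3 2 /dev/stdin | stdin: Yay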

If you think this is a worthwhile addition to parallel I can work on a
patch for you, as I need this functionality myself ASAP.

Best,
Matt.

---
http://blog.mattoates.co.uk
http://bccs.bris.ac.uk
