Re: feature suggestion: --preserve-blocking-factor

From:	Ole Tange
Subject:	Re: feature suggestion: --preserve-blocking-factor
Date:	Fri, 17 Feb 2017 22:00:53 +0100

On Thu, Feb 16, 2017 at 7:16 PM, Cook, Malcolm <MEC@stowers.org> wrote:

> When using the --spreadstdin option, it may be desirable to ensure that the 
> blocks "keep together" certain blocks of data.

Yes. We use --recend --recstart for that.

> For example the input may be sorted on column 3, and it may be the case that 
> all lines having the same value for column 3 must be processed together.

So the record depends on column 3 having the same value.

Parsing a CSV file is expensive if it has to be done correctly (e.g.
values containing tabs, quotes, and newlines). I do not see that
becoming part of GNU Parallel.

So how do you deal with the column issue?

Let us use this as an example:

  paste <(seq 105) <(parallel yes {}'|head -n {#}' ::: {a..n}) <(seq 105 | shuf) > example

We want to group this by column 2, so all consecutive lines with the
same column 2 will be treated as a single record and not be split.
However, it is OK for several records to be passed to the same job.

We will make a small program to insert a record separator. This has to
be a string not found in the file. Here I have chosen '\0' but it
could be "p-O-P-p-y i'M poPpY", $(mmencode /dev/urandom|head), or
$(mktemp).

  cat example | perl -ape '$F[1] ne $old and print "\0"; $old = $F[1]'

Now it is suddenly trivially simple to tell GNU Parallel to group the
records together and remove the record separator. Pipe the output of
the perl one-liner into:

  parallel --recend '\0' --rrs --pipe --block 200 wc
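
One way to check that no group was split across jobs is sketched below.
It assumes GNU Parallel is installed; the function name split_groups is
made up for illustration. Instead of wc, each job prints the distinct
column 2 values it received, and any value reported by more than one
job is listed at the end.

```shell
# split_groups FILE  (hypothetical helper, not part of GNU Parallel)
# Lists the column-2 values that were seen by more than one job.
split_groups() {
  perl -ape '$F[1] ne $old and print "\0"; $old = $F[1]' "$1" |
    parallel --recend '\0' --rrs --pipe --block 200 'cut -f2 | uniq' |
    sort |
    uniq -d    # keep only values that occur in the output of 2+ jobs
}
```

Run it as: split_groups example. If the records are kept together as
intended, it should print nothing.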

We might need something for --pipepart, so you can feed in potential
split positions, but you would still have to write the program that
finds the positions yourself.

/Ole
