Re: parallel cat
From: Ole Tange
Subject: Re: parallel cat
Date: Sun, 17 Jul 2011 16:08:25 +0200
On Fri, Jul 15, 2011 at 8:39 PM, Dan Kokron <daniel.kokron@nasa.gov> wrote:
> I have a bunch (~200) small (1K to 100K) binary files that I want to
> 'cat' into a larger file. I usually use "cat pe* > diag", but this
> takes considerable time on the Lustre file system we are using. I am
> exploring using GNU parallel for this task but have run into some
> difficulties. Basically the resulting diag file only contains one of
> the input files.
>
> I've tried the following variations.
>
> parallel "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
> parallel cat {} ">"diag_amsua_n18_03.2011041700 ::: pe*
> ls pe* | parallel cat {} ">"diag_amsua_n18_03.2011041700
> ls pe* | parallel -j4 -k cat {} ">"diag_amsua_n18_03.2011041700
> ls pe* | parallel -k cat {} ">"diag_amsua_n18_03.2011041700
> parallel -j4 -k "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
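You are _so_ close. In all of your variations the redirection is
quoted, so it becomes part of every job: each job opens the file with
'>' and truncates what the other jobs wrote. Redirect the output of
GNU Parallel itself instead: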
parallel cat >diag_all ::: pe*
UNIX users will probably find this form more readable (it does exactly
the same):
parallel cat ::: pe* >diag_all
Or, if you want the output kept in the same order as the input:
parallel -k cat ::: pe* >diag_all
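The difference between redirecting inside each job and redirecting
once on the outside can be tried without GNU Parallel at all. The
sketch below is only an illustration: a plain sequential loop stands in
for the jobs, and the files pe1..pe3 and the names diag_quoted and
diag_all are made up for the demonstration.

```shell
#!/bin/sh
# Illustration only: pe1..pe3 and the diag_* names are made-up stand-ins.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'one\n'   > pe1
printf 'two\n'   > pe2
printf 'three\n' > pe3

# Redirect inside each job: every job opens diag_quoted with '>',
# which truncates whatever an earlier job wrote.
for f in pe*; do
    sh -c "cat $f > diag_quoted"
done

# Redirect outside the loop: the file is opened once and
# the output of every job lands in it.
for f in pe*; do
    sh -c "cat $f"
done > diag_all

# Show how many lines each result file ended up with.
wc -l diag_quoted diag_all
```

In this sequential stand-in diag_quoted holds only the last file, while
diag_all holds all three.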
I have no experience with Lustre, but I would imagine that Lustre is
slow at delivering the first byte of a file and pretty fast after
that, so most of the time is spent waiting rather than transferring
data. If that is the case then it will be OK to run a lot of cats
simultaneously:
parallel -j0 cat ::: pe* >diag_all
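Since the files are binary, it may be worth checking once that the
parallel result matches what plain cat produces. A minimal sketch,
assuming small made-up sample files and the hypothetical names
diag_serial and diag_all; it adds -k so that the order is comparable,
and it skips the comparison if GNU Parallel is not installed:

```shell
#!/bin/sh
# Sketch: the sample files and the diag_* names are hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1
for i in 1 2 3 4 5; do
    printf 'chunk %s\n' "$i" > "pe$i"
done

# Reference result from ordinary cat.
cat pe* > diag_serial

# Only run the comparison if the installed 'parallel' is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -j0 runs as many jobs as possible, -k keeps input order.
    parallel -j0 -k cat ::: pe* > diag_all
    if cmp -s diag_serial diag_all; then
        status=identical
    else
        status=different
    fi
else
    status=skipped
fi
echo "comparison: $status"
```

Without -k the jobs may finish in a different order, so the bytes would
all be there but the comparison could still fail.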
These sections of the man page touch on the subject of using the
output from GNU Parallel:
EXAMPLE: Rewriting a for-loop and a while-read-loop
EXAMPLE: Rewriting nested for-loops
EXAMPLE: Keep order of output same as order of input
EXAMPLE: Processing a big file using more cores
If you believe it can be explained better, please post your suggestion
for discussion here.
/Ole