From: Jim Meyering
Subject: Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Date: Sat, 08 Nov 2008 19:05:25 +0100
Andrew McGill <address@hidden> wrote:
> Greetings coreutils folks,
>
> There are a number of interesting filesystems (glusterfs, lustre? ... NFS)
> which could benefit from userspace utilities doing certain operations in
> parallel. (I have a very slow glusterfs installation that makes me think
> that some things can be done better.)
>
> For example, copying a number of files is currently done in series ...
> cp a b c d e f g h dest/
> but, on certain filesystems, it would be roughly twice as efficient if
> implemented in two parallel threads, something like:
> cp a c e g dest/ &
> cp b d f h dest/
> since the source and destination files can be stored on multiple physical
> volumes.
How about parallelizing it via xargs, e.g.,
$ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
--max-procs=2 -- cp --target-directory=dest
cp --target-directory=dest a b c d
cp --target-directory=dest e f g h
Obviously the above is tailored (-n4) to your 8-input example.
In practice, you'd use a larger number, unless latency is
so high as to dwarf the cost of extra "fork/exec" syscalls,
in which case even -n1 might make sense.
mv and ln also accept the --target-directory=dest option.
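The same xargs approach can be sketched for arbitrary file names. This is only an illustration, not something from the original exchange: the function name `parallel_copy` and its default batch size and process count are assumptions, and it relies on GNU find, xargs and cp. Using `-print0` with `-0` keeps names containing spaces or newlines intact.

```shell
# parallel_copy SRC DEST [FILES_PER_CP] [PROCS]
# Hypothetical helper: copy the regular files directly inside SRC into
# DEST, running up to PROCS cp processes at once, each handling at most
# FILES_PER_CP files.
parallel_copy() {
    src=$1
    dest=$2
    per_cp=${3:-4}   # assumed default, matching the -n4 example above
    procs=${4:-2}    # assumed default, matching --max-procs=2 above
    mkdir -p "$dest" || return 1
    # NUL-delimited names survive spaces and newlines; -r skips cp
    # entirely when find produces no input.
    find "$src" -maxdepth 1 -type f -print0 |
        xargs -0 -r -n "$per_cp" -P "$procs" cp --target-directory="$dest"
}
```

For example, `parallel_copy src dest 64 4` would copy in batches of up to 64 files with four cp processes running concurrently. Whether that beats a single cp depends on the filesystem's latency.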
> Similarly, ls -l . will readdir(), and then stat() each file in the directory.
> On a filesystem with high latency, it would be faster to issue the stat()
> calls asynchronously, and in parallel, and then collect the results for
If you can demonstrate a large performance gain on
systems that many people use, then maybe...
There is more than a little value in keeping programs
like those in the coreutils package relatively simple,
but if the cost(maintenance+portability burden)/benefit
ratio is low enough, then anything is possible.
For example, a well-encapsulated, optionally-threaded
"stat_all_dir_entries" API might be useful in some situations.
If getting any eventual patch into upstream coreutils is
important to you, be sure there is some consensus on this
list before doing a lot of work on it.
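Before writing any patch, one rough way to gauge whether concurrent stat() calls help on a given filesystem is to approximate them with separate processes. The sketch below is not the threaded "stat_all_dir_entries" API mentioned above. It is a process-level stand-in built from GNU find, xargs and stat, and the function name `parallel_stat` and its defaults are assumptions.

```shell
# parallel_stat DIR [ENTRIES_PER_STAT] [PROCS]
# Hypothetical helper: print "SIZE NAME" for each entry directly inside
# DIR, using up to PROCS concurrent stat processes, then sort by name.
parallel_stat() {
    dir=$1
    per_stat=${2:-16}   # assumed batch size
    procs=${3:-8}       # assumed number of concurrent stat processes
    find "$dir" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -r -n "$per_stat" -P "$procs" stat --format='%s %n' |
        sort -k2
}
```

Timing `parallel_stat somedir 16 8` against `parallel_stat somedir 16 1` on the slow filesystem gives a first indication of the possible gain. Output from concurrent processes can interleave when batches are large, so treat this as a measurement aid rather than a replacement for ls.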
> display. (This could improve performance for NFS, in proportion to the
> latency and the number of threads.)
>
>
> Question: Is there already a set of "improved" utilities that implement this
> kind of technique?
Not that I know of.
> If not, would this kind of performance enhancements be
> considered useful?
It's impossible to say without knowing more.