Re: [coreutils] cp --parents parallel


From: Bob Proulx
Subject: Re: [coreutils] cp --parents parallel
Date: Mon, 18 Oct 2010 16:45:32 -0600
User-agent: Mutt/1.5.20 (2009-06-14)

Rob Gom wrote:
> Regarding my example - I am aware of bashism. That was only an example
> and it was more convenient for me to use bash features.

But you didn't use any bash features that I noticed.

> As for the real case - I use it inside makefiles. I copy many
> directory structures into single root.

And the last one copied wins?  Okay.

I am inclined to suggest using 'rsync' if --update is important.  The
way that rsync updates the target is different from the way that cp
updates the target.  The cp command writes it in place.  The rsync
command uses a temporary file and a rename to ensure that the target
is never available in a half-written state.  Also I am not convinced
that cp is completely race-condition safe when multiple cp processes
are writing to the file at the same time.  Could it get duplicated
data in the file?  I would need to look.
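
To illustrate the temporary-file-and-rename idea, here is a minimal
sketch in plain shell.  It is not how rsync is implemented, only the
same general technique.  The function name atomic_copy and the .tmp.
prefix are made up for this example, and it assumes a mktemp utility
is available.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST so that DST never appears
# half-written to other processes.
atomic_copy() {
    src=$1
    dst=$2
    # Create the temporary file next to DST so the final mv is a
    # rename within one filesystem rather than a cross-device copy.
    tmp=$(mktemp "$(dirname "$dst")/.tmp.XXXXXX") || return 1
    # Note: mktemp creates the file with restrictive permissions and
    # cp keeps them, so adjust with chmod afterwards if that matters.
    if cp "$src" "$tmp" && mv -f "$tmp" "$dst"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Small demonstration in a scratch directory.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/src.txt"
atomic_copy "$demo/src.txt" "$demo/dst.txt"
```

With this approach parallel writers still race, and the last rename
wins, but a reader sees either the old file or a complete new one.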

> If --parents itself works as expected, that will be the easiest
> solution.

I think it would be reasonable to make --parents work the same as it
does for 'mkdir --parents' and not complain if the internal mkdir
fails because the directory already exists.  But even if that is
patched it will be years before the release containing it flows down
to most installations where you can count on having it.  Your 7.4 copy
was released on 2009-05-07 and yet you still have it in place and
probably will for a while I would imagine.  Current is version 8.6
released 2010-10-15.  Therefore it would still be wise to avoid the
race and use techniques that don't exhibit the problem.  The example
code I posted showed one such way.

> Generally it looks like:
> target:
>     cp --parents --update $(FILES_LIST) $(TARGET)

You could easily call 'mkdir -p $(TARGET)' before calling cp, remove
the --parents, and avoid the problem.

> (pseudo code, not working). As you can see, it's more complicated than
> simple cp, so I wanted to avoid that.

You could very easily convert to making the directory first and then
copying the files into place however.  That wouldn't change what you
were doing very much at all.
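
One way to read that suggestion, assuming the relative paths of the
source files still need to be preserved under the target, is sketched
below.  The function name copy_with_parents is invented for this
example, and it assumes the file names are relative paths.  It relies
only on 'mkdir -p' not failing when the directory already exists.

```shell
#!/bin/sh
# Hypothetical helper: copy each FILE to TARGET/FILE, creating the
# parent directories with mkdir -p instead of cp --parents.
copy_with_parents() {
    target=$1
    shift
    for f in "$@"; do
        # mkdir -p succeeds if the directory already exists, so
        # parallel callers do not fail on this step.
        mkdir -p "$target/$(dirname "$f")" || return 1
        cp "$f" "$target/$f" || return 1
    done
}

# Small demonstration in a scratch directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p src/a/b
printf 'x\n' > src/a/b/file.txt
copy_with_parents out src/a/b/file.txt
```

Where GNU cp is available, --update could be added to the cp call to
keep the behavior of the original rule.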

If all copies are equivalent (since you have a race and any of the
subprocesses might update it) then it would seem to be better to do it
once only instead of many times in parallel and discarding the first
ones to be copied and keeping the last.  You would probably speed up
your operation too since it wouldn't be doing unnecessary work.

> By the way, I have just reproduced the issue on Debian with coreutils
> 8.5. To my surprise, it's much less frequent, though. However this is
> a completely different machine - that may explain the behaviour (I
> haven't seen relevant change in general release notes between 7.4 and
> 8.5).

Since it depends upon the order of execution, the differences in
machines could cause quite a bit of change in the way the race
conditions are won.  You could see entirely different behavior.

Bob


