[coreutils] RE: cp command performance
From: Hemant Rumde
Subject: [coreutils] RE: cp command performance
Date: Thu, 23 Dec 2010 09:34:52 -0500
Hi Bob
Thanks for your quick response. I really appreciate your reply!
We are using HP storage, so I think our infrastructure is fine.
Let's discuss "cp A1 A1.bk". Correct me if I am wrong.
In this cp, when A1.bk already exists, I guess the OS needs to read
A1.bk's data blocks from storage into the cache before overwriting
them with A1's blocks, so some time would be spent on that.
However, if A1.bk is new, cp would allocate free data blocks
(tracked through the superblock's free-block accounting). I guess
this should be faster.
Apart from this, read/write cache hits can make some difference in
performance. When you use dd, I guess most of your data would
already be in the buffer cache, the read-hit rate would be higher,
and very few calls would go to the backend storage.
Does this make any sense?
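The read-hit point above can be illustrated with a quick, hypothetical shell experiment (the file name A1 is reused from the thread; the size is made up for illustration): the second read of the same file is normally served from the page cache, so far fewer requests reach the backend storage.

```shell
# Illustrative sketch only: the second read of A1 is usually a
# page-cache hit and is visibly faster than the first.
dd if=/dev/zero of=A1 bs=1M count=64 2>/dev/null  # small stand-in file
time cat A1 > /dev/null   # first read: may have to go to storage
time cat A1 > /dev/null   # second read: typically served from cache
```

On a quiet machine the second `real` time is usually much smaller than the first; exact numbers depend on available RAM and storage speed.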
Thanks
Hemant
-----Original Message-----
From: Bob Proulx [mailto:address@hidden]
Sent: Wednesday, December 22, 2010 9:17 PM
To: Hemant Rumde
Cc: address@hidden; address@hidden
Subject: Re: cp command performance
Hemant Rumde wrote:
> I did not log any bug for the cp command.
In that case I will close the bug report that you have opened.
Let's have the discussion on the discussion mailing list
address@hidden. That is the more appropriate place. I have set the
mail headers to direct discussion there but if your mailer doesn't
comply please manually redirect it.
> In our company, we copy huge Cobol files before processing the data,
> so that we can roll our data files back. Suppose A1 is my huge file
> of 60GB and A1.bk is its backup file, made before we process (change)
> the data in A1. Which of our methods would be faster?
>
> 1. Method-1 ( A1.bk exists )
> $ cp A1 A1.bk
>
> 2. Method-2
> $ rm -f A1.bk
> $ cp A1 A1.bk
>
> 3. Method-3
> $ cp --remove-destination A1 A1.bk
All three of those should be virtually the same, especially the last
two. But benchmarking it is always good. I created a 10G test file
using dd and copied it once to set up the test and then performed the
following operations on an ext3 filesystem.
$ time cp testdata testdata.bak
real 3m34.435s
user 0m0.108s
sys 0m30.950s
$ time ( rm -f testdata.bak ; cp testdata testdata.bak )
real 3m27.941s
user 0m0.092s
sys 0m30.914s
$ time cp --remove-destination testdata testdata.bak
real 3m36.931s
user 0m0.068s
sys 0m30.862s
As you can see, the times for all three operations are, within the
limits of measurement, essentially the same.
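Bob describes his setup (a 10G file made with dd, copied once) but does not show the commands; a scaled-down sketch of that preparation, with the size reduced so it runs quickly, might look like this:

```shell
# Hypothetical reconstruction of the benchmark setup: create a test
# file with dd, then copy it once so the destination already exists.
dd if=/dev/zero of=testdata bs=1M count=64 2>/dev/null  # scaled down from 10G
cp testdata testdata.bak                                # prime the destination
```

With the destination primed, each of the three copy methods can then be timed as shown above.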
> This operation is very simple, but our operators say that in some
> cases cp takes a long time. How can we reduce the copying time?
I do not doubt that there will be differences in the times consumed
for just the raw command. With such a large file I think this will
depend on outside influences, such as which filesystem you are using
for the copy, how much RAM you have available for the buffer cache,
whether extraneous sync and fsync calls are happening at the same
time, and so forth. I could send examples, but I don't want to send
you off in the wrong direction and so will resist.
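One of the outside influences named here, pending writeback being flushed by sync, can be sketched as follows (file names and sizes are made up for illustration, not taken from Bob's benchmark):

```shell
# cp can return before the data reaches the disk; a subsequent sync
# that flushes the dirty pages absorbs the remaining write time.
dd if=/dev/zero of=A1 bs=1M count=64 2>/dev/null  # small illustrative file
time cp A1 A1.bk          # may complete using only the buffer cache
time sync                 # flushing the dirty pages takes extra time
```

If another process issues a sync or fsync while a large copy is in flight, the copy's wall-clock time can grow accordingly, which may explain the operators' intermittent slow copies.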
Bob