Re: [Qemu-devel] memcpy speed (Re: [PATCH v2] netmap backend (revised))


From: Paolo Bonzini
Subject: Re: [Qemu-devel] memcpy speed (Re: [PATCH v2] netmap backend (revised))
Date: Thu, 24 Jan 2013 09:09:22 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 23/01/2013 17:03, Luigi Rizzo wrote:
> On Wed, Jan 23, 2013 at 02:03:17PM +0100, Stefan Hajnoczi wrote:
>> On Wed, Jan 23, 2013 at 12:50:26PM +0100, Luigi Rizzo wrote:
>>> On Wed, Jan 23, 2013 at 12:10:55PM +0100, Stefan Hajnoczi wrote:
>>>> On Tue, Jan 22, 2013 at 08:12:15AM +0100, Luigi Rizzo wrote:
> ...
>>>>> +// a fast copy routine only for multiples of 64 bytes, non overlapped.
>>>>> +static inline void
>>>>> +pkt_copy(const void *_src, void *_dst, int l)
>>> ...
>>>>> +                *dst++ = *src++;
>>>>> +        }
>>>>> +}
>>>>
>>>> I wonder how different FreeBSD bcopy() is from glibc memcpy() and if the
>>>> optimization is even a win.  The glibc code is probably hand-written
>>>> assembly that CPU vendors have contributed for specific CPU model
>>>> families.
>>>>
>>>> Did you compare glibc memcpy() against pkt_copy()?
>>>
>>> I haven't tried in detail on glibc but will run some tests.  In any
>>> case not all systems have glibc, and on FreeBSD this pkt_copy was
>>> a significant win for small packets (saving some 20ns each; of
>>> course this counts only when you approach the 10 Mpps range, which
>>> is what you get with netmap, and of course when data is in cache).
>>>
>>> One reason pkt_copy gains something is that if it can assume there
>>> is extra space in the buffer, it can work on large chunks avoiding the extra
>>> jumps and instructions for the remaining 1-2-4 bytes.
>>
>> I'd like to drop this code or at least make it FreeBSD-specific since
>> there's no guarantee that this is a good idea on any other libc.
>>
>> I'm even doubtful that it's always a win on FreeBSD.  You have a
>> threshold to fall back to bcopy() and who knows what the "best" value
>> for various CPUs is.
> 
> Indeed.
> With the attached program (which, however, might be affected by the
> fact that the data is not used after copying) it seems that on a recent
> Linux (using gcc 4.6.2) the fastest is __builtin_memcpy()
> 
>       ./testlock -m __builtin_memcpy -l 64
> 
> (by a factor of 2 or more) whereas all the other methods have
> approximately the same speed.
> 
> On FreeBSD (with clang, gcc 4.2.1, gcc 4.6.4) the pkt_copy() above
> 
>       ./testlock -m fastcopy -l 64
> 
> is much faster than the other methods. I am a bit puzzled why
> the builtin method on FreeBSD is not effective, but I will check
> on some other forum...

Perhaps the compilers on FreeBSD use a different default for -march/-mtune?
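
For readers following the thread, the idea quoted above (copy whole
64-byte chunks and skip the handling of the last 1-2-4 bytes, on the
assumption that the buffers have slack) could look roughly like the
sketch below. It is not the pkt_copy() from the patch, whose body is
elided in the quote; the names chunk_copy64 and
chunk_copy64_matches_memcpy are made up for illustration, and the
sketch assumes 8-byte-aligned, non-overlapping buffers padded to a
multiple of 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only (hypothetical name, not the patch's pkt_copy()).
 * Copies len bytes rounded UP to a multiple of 64, so the last chunk may
 * read and write up to 63 bytes past len. The caller must guarantee that
 * both buffers are padded accordingly, 8-byte aligned and non-overlapping.
 */
static inline void
chunk_copy64(const void *src_, void *dst_, size_t len)
{
    const uint64_t *src = src_;
    uint64_t *dst = dst_;

    assert(src != NULL && dst != NULL);

    /* One iteration moves eight 64-bit words; there is no tail loop. */
    for (size_t done = 0; done < len; done += 64) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst[4] = src[4];
        dst[5] = src[5];
        dst[6] = src[6];
        dst[7] = src[7];
        src += 8;
        dst += 8;
    }
}

/*
 * Hypothetical helper: returns 1 if the first len bytes produced by
 * chunk_copy64() equal what memcpy() produces, 0 otherwise (including
 * when len does not fit in the 256-byte scratch buffers).
 */
int
chunk_copy64_matches_memcpy(size_t len)
{
    uint64_t src[32], a[32], b[32];   /* 256 bytes each, 8-byte aligned */
    unsigned char *p = (unsigned char *)src;

    if (len > sizeof(src))
        return 0;

    for (size_t i = 0; i < sizeof(src); i++)
        p[i] = (unsigned char)(i * 7 + 1);
    memset(a, 0, sizeof(a));
    memset(b, 0, sizeof(b));

    chunk_copy64(src, a, len);
    memcpy(b, src, len);

    return memcmp(a, b, len) == 0;
}
```

Building the same file with different code-generation settings (for
example gcc -O2 with and without -march=native) and timing it against
memcpy() would be one way to check whether the compiler defaults
account for the difference between the two systems.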

Paolo



