Re: [Qemu-devel] memcpy speed (Re: [PATCH v2] netmap backend (revised))
From: Paolo Bonzini
Subject: Re: [Qemu-devel] memcpy speed (Re: [PATCH v2] netmap backend (revised))
Date: Thu, 24 Jan 2013 09:09:22 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 23/01/2013 17:03, Luigi Rizzo wrote:
> On Wed, Jan 23, 2013 at 02:03:17PM +0100, Stefan Hajnoczi wrote:
>> On Wed, Jan 23, 2013 at 12:50:26PM +0100, Luigi Rizzo wrote:
>>> On Wed, Jan 23, 2013 at 12:10:55PM +0100, Stefan Hajnoczi wrote:
>>>> On Tue, Jan 22, 2013 at 08:12:15AM +0100, Luigi Rizzo wrote:
> ...
>>>>> +// a fast copy routine only for multiples of 64 bytes, non overlapped.
>>>>> +static inline void
>>>>> +pkt_copy(const void *_src, void *_dst, int l)
>>> ...
>>>>> + *dst++ = *src++;
>>>>> + }
>>>>> +}
>>>>
>>>> I wonder how different FreeBSD bcopy() is from glibc memcpy() and if the
>>>> optimization is even a win. The glibc code is probably hand-written
>>>> assembly that CPU vendors have contributed for specific CPU model
>>>> families.
>>>>
>>>> Did you compare glibc memcpy() against pkt_copy()?
>>>
>>> I haven't tried in detail on glibc but will run some tests. In any
>>> case not all systems have glibc, and on FreeBSD this pkt_copy was
>>> a significant win for small packets (saving some 20ns each; of
>>> course this counts only when you approach the 10 Mpps range, which
>>> is what you get with netmap, and of course when data is in cache).
>>>
>>> One reason pkt_copy gains something is that if it can assume there
>>> is extra space in the buffer, it can work on large chunks avoiding the extra
>>> jumps and instructions for the remaining 1-2-4 bytes.
>>
>> I'd like to drop this code or at least make it FreeBSD-specific since
>> there's no guarantee that this is a good idea on any other libc.
>>
>> I'm even doubtful that it's always a win on FreeBSD. You have a
>> threshold to fall back to bcopy() and who knows what the "best" value
>> for various CPUs is.
>
> indeed.
> With the attached program (which however might be affected by the
> fact that data is not used after copying) it seems that on a recent
> Linux (using gcc 4.6.2) the fastest is __builtin_memcpy()
>
> ./testlock -m __builtin_memcpy -l 64
>
> (by a factor of 2 or more) whereas all the other methods have
> approximately the same speed.
>
> On FreeBSD (with clang, gcc 4.2.1, gcc 4.6.4) the pkt_copy() above
>
> ./testlock -m fastcopy -l 64
>
> is much faster than the other methods. I am a bit puzzled why
> the builtin method on FreeBSD is not effective, but I will check
> on some other forum...
Perhaps a different default for -march/-mtune?
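
One rough way to check this is to ask the compiler which x86 SIMD extensions it was allowed to use, since that follows from the effective -march. The sketch below is only an illustration: compiled_simd_level() and report_simd_level() are made-up names, not part of testlock or the patch, and it relies on the feature macros (__SSE2__, __AVX__, ...) that gcc and clang predefine. It says nothing about -mtune, which affects scheduling rather than the available instruction set.

```c
#include <stdio.h>

/*
 * Hypothetical helper: name the widest x86 SIMD extension this translation
 * unit was compiled for. The macros are predefined by gcc and clang
 * according to the effective -march (or -msse2, -mavx, ... flags).
 */
static const char *
compiled_simd_level(void)
{
#if defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE4_2__)
    return "sse4.2";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "baseline";
#endif
}

/* Print the level, e.g. at the start of a benchmark run. */
static void
report_simd_level(FILE *out)
{
    fprintf(out, "compiled for: %s\n", compiled_simd_level());
}
```

Calling report_simd_level(stdout) from the test program on both systems, built with the same command line, would show whether the two compilers start from different defaults.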
Paolo
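
P.S. For anyone reading the thread without the patch at hand, the idea quoted above (copy whole 64-byte blocks and skip the tail handling, assuming the buffers have slack) can be sketched roughly as follows. This is not the pkt_copy() from the patch: chunk_copy64() is a hypothetical name, and it rounds the length up instead of falling back to bcopy() above a threshold.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not the patch's pkt_copy(): copy `len` bytes rounded
 * up to the next multiple of 64. The caller must guarantee that both buffers
 * have room for the rounded-up length and that they do not overlap.
 * Returns the number of bytes actually copied.
 */
static inline size_t
chunk_copy64(const void *src_, void *dst_, size_t len)
{
    const unsigned char *src = src_;
    unsigned char *dst = dst_;
    size_t rounded = (len + 63) & ~(size_t)63;
    size_t off;

    for (off = 0; off < rounded; off += 64) {
        /* A constant-size memcpy can be expanded inline by the compiler,
         * so there is no per-call dispatch and no 1-2-4 byte tail. */
        memcpy(dst + off, src + off, 64);
    }
    return rounded;
}
```

Whether the constant-size memcpy() is actually expanded inline, and with which instructions, depends on the compiler and its flags, which is why the same source can behave differently on the two systems.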