[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH V14 0/5] VMXNET3 paravirtual NIC device implemen
From: |
Stefan Hajnoczi |
Subject: |
Re: [Qemu-devel] [PATCH V14 0/5] VMXNET3 paravirtual NIC device implementation |
Date: |
Tue, 12 Mar 2013 10:38:42 +0100 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Sat, Mar 09, 2013 at 11:21:01AM +0200, Dmitry Fleytman wrote:
> This set of patches implements VMWare VMXNET3 paravirtual NIC device.
> The device supports of all the device features including offload capabilties,
> VLANs and etc.
> The device is tested on different OSes:
> Fedora 15
> Ubuntu 10.4
> Centos 6.2
> Windows 2008R2
> Windows 2008 64bit
> Windows 2008 32bit
> Windows 2003 64bit
> Windows 2003 32bit
>
> Changes in V14:
> QOM-related fixes
>
> Changes in V13:
> Licensing of vmxnet3.h changed to match original
> VMWare's header file license
>
> Changes in V12:
> Reported-by: Andreas Farber <address@hidden>
> QOM-related fixes
>
> Changes in V11:
> Rebased to master HEAD to fix multiqueue compilation issues
>
> Changes in V10:
> Reported-by: Stefan Hajnoczi <address@hidden>
> Issues reported by Stefan Hajnoczi fixed.
>
> Changes in V9:
> Reported-by: Stefan Hajnoczi <address@hidden>
> Issues reported by Stefan Hajnoczi fixed.
>
> Changes in V8:
> Reported-by: Stefan Hajnoczi <address@hidden>
> Issues reported by Stefan Hajnoczi reviewed and mostly fixed:
>
> > + }
> > + curr_src_off += src[i].iov_len;
> > + }
> > + return j;
> > +}
>
> The existing iov_copy() function provides equivalent functionality. I
> don't think iov_rebuild() is needed.
>
> [DF] Done. Thanks, missed it.
>
>
> > + size -= len;
> > + }
> > + iovec_off += iov[i].iov_len;
> > + }
> > + return res;
> > +}
>
> Rename this net_checksum_add_iov() and place it in net/checksum.c,
> then the new dependency on net from block can be dropped.
>
> [DF] Done.
>
> > +vmw_shmem_read(hwaddr addr, void *buf, int len)
> > {
> > VMW_SHPRN("SHMEM r: %" PRIx64 ", len: %d to %p", addr, len, buf);
> > cpu_physical_memory_read(addr, buf, len);
> > }
>
> All changes to this file should be squashed with the previous patch.
>
> [DF] Done
>
> > +#ifdef VMXNET_DEBUG_SHMEM_ACCESS
> > +#define VMW_SHPRN(fmt, ...)
> >
> > \
> > + do {
> >
> > \
> > + printf("[%s][SH][%s]: " fmt "\n", VMXNET_DEVICE_NAME, __func__,
> >
> > \
> > + ## __VA_ARGS__);
> >
> > \
> > + } while (0)
> > +#else
> > +#define VMW_SHPRN(fmt, ...) do {} while (0)
> > +#endif
>
> Please use QEMU tracing. It eliminates all this boilerplate and
> conditional compilation. Tracing can be enabled/disabled at runtime
> and works with SystemTap/DTrace. See docs/tracing.txt.
>
> [DF] We'd like to stick with compile time logic in this case becase of 2
> reasons:
> [DF] 1. These printouts are intended for reverse engineering/development only
> and there is
> [DF] no need to enable them at run time
> [DF] 2. There is a big number of printouts, all driver-device communication
> is traced,
> [DF] they hit performance even on strongest x86 in case of run-time logic
>
>
> > +struct eth_header {
> > + uint8_t h_dest[ETH_ALEN]; /* destination eth addr */
> > + uint8_t h_source[ETH_ALEN]; /* source ether addr */
> > + uint16_t h_proto; /* packet type ID field */
> > +};
>
> Looks like it's copy-pasted stuff from /usr/include/linux/if_*.h,
> /usr/include/netinet/*.h, and friends. If the system-wide headers are
> included names will collide for some of the macros at least.
>
> Did you check if the slirp/ definitions can be reused?
>
> [DF] Yes, you are right. This is copy-pasted from different places.
> [DF] Slips definishing do not fully cover our needs.
>
>
> I'd rather we import network header definitions once in a generic
> place into the source tree. That way vmxnet and other components
> don't need to redefine these structs.
>
> [DF] Exaclty! Our intention is to create generic header with network
> definitions and make everyone use it.
> [DF] We can move our header to some shared place if you want, however I'd do
> it in parallel with cleanup
> [DF] of similar definitions in existing code and this is a big change that os
> out of scope of these patches.
>
> > +
> > *===========================================================================*/
>
> Is this huge comment box a sign that the code should be split into a
> foo_tx.c and an foo_rx.c file?
>
> [DF] As for me this file is not that big to be splitted (<800 lines), however
> I'll do this if you insist :)
>
> > +size_t vmxnet_tx_pkt_send(VmxnetTxPktH pkt, NetClientState *vc)
>
> 'vc' is an old name that was used for VLANClientState. The struct has
> since been renamed to NetClientState and the rest of QEMU uses 'nc'
> instead of 'vc'.
>
> [DF] Fixed. Thanks.
>
> > +/* tx module context handle */
> > +typedef void *VmxnetTxPktH;
>
> Forward-declaring the struct is nicer:
>
> typedef struct VmxnetTxPkt VmxnetTxPkt;
>
> The definition of VmxnetTxPkt is still hidden from the caller but you
> avoid the void* and casting. In vmxnet_pkt.c define using:
>
> struct VmxnetTxPkt {
> ...
> };
>
> [DF] Agree, fixed. Thanks.
>
>
> > diff --git a/hw/vmxnet_utils.h b/hw/vmxnet_utils.h
> > index 7fd9a01..fac3b7b 100644
> > --- a/hw/vmxnet_utils.h
> > +++ b/hw/vmxnet_utils.h
>
> Please squash these fixes into the previous patch.
>
>
> [DF] Oops, my bad. Fixed.
>
>
> > + uint8_t msix_used;
> > + /* Whether MSI support was installed successfully */
> > + uint8_t msi_used;
>
> These two fields should be bool.
>
> > + /* Whether automatic interrupts masking enabled */
> > + uint8_t auto_int_masking;
>
> bool
>
> [DF] Done.
>
>
> > +static inline void vmxnet3_flush_shmem_changes(void)
> > +{
> > + /*
> > + * Flush shared memory changes
> > + * Needed before sending interrupt to guest to ensure
> > + * it gets consistent memory state
> > + */
> > + smp_wmb();
> > +}
>
> It's useful to document why a memory barrier is being used in each
> instance. Therefore hiding smp_wmb() inside a wrapper function isn't
> great.
>
> [DF] Fixed.
>
> Also, it's suspicious that smb_wmb() is used but no other barriers are
> used. What about a read memory barrier when accessing shared memory
> written by the guest?
>
> [DF] VMWARE interface build a in safe way - you always read out data buffer
> (packet descriptor) with memcopy,
> [DF] and then check its last byte to see whether it was fully filled by
> driver.
> [DF] In this case device doesn't need read barriers. Drivers need write
> barriers of course to secure
> [DF] proper order of writes.
>
> Changes in V7:
>
> Reported-by: Michael S. Tsirkin <address@hidden>
> Issues reported by Michael S. Tsirkin reviewed and mostly fixed:
>
> File vmware_utils.h:
>
> ...
>
> > +static inline void
> > +vmw_shmem_st64(target_phys_addr_t addr, uint64_t value)
> > +{
> > + VMW_SHPRN("SHMEM store64: %" PRIx64 " (value %" PRIx64 ")", addr,
> > value);
> > + stq_le_phys(addr, value);
> > +}
> > +
>
> Pls remove these wrappers. These are just memory stores. Our codebase
> is too large as it is without every driver wrapping all standard calls.
>
> [DF] Idea behind this macros is to have printout in each of them
> [DF] Printouts are really needed when reverse-engineering windows guest
> drivers
> [DF] Since there is no windows drivers code this is the only way to understand
> [DF] sequences they use
>
> > +/* MACROS for simplification of operations on array-style registers */
>
> UPPERCASE ABUSE
>
> [DF] Fixed
>
> > +#define IS_MULTIREG_ADDR(addr, base, cnt, regsize) \
> > + (((addr + 1) > (base)) && ((addr) < (base) + (cnt) * (regsize)))
>
>
> Same as range_covers_byte(base, cnt * regsize, addr)?
>
> [DF] Fixed, thanks
>
> > +
> > +#define MULTIREG_IDX_BY_ADDR(addr, base, regsize) \
> > + (((addr) - (base)) / (regsize))
> > +
>
> Above two macros is all that's left. No objection but it does not say
> what they do - want to add minimal documentation?
> And please prefix with VMWARE_ or something.
>
> [DF] Prefix added, macros documented
>
> File vmxnet_utils.h (and related with the same problems):
>
> ...
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef _VMXNET_UTILS_H_
> > +#define _VMXNET_UTILS_H_
>
> Please do not start identifiers with _ followed by an uppercase
> lattters.
>
> [DF] Fixed
>
> ...
> > +
> > +struct eth_header {
> > + uint8_t h_dest[ETH_ALEN]; /* destination eth addr */
> > + uint8_t h_source[ETH_ALEN]; /* source ether addr */
> > + uint16_t h_proto; /* packet type ID field */
> > +};
> > +
>
> And fix struct definitions to
> 1. follow qemu coding style
> [DF] It's not clear what's wrong with the coding style. Please, elaborate.
> 2. start with vmxnet
> [DF] Since this is generic network definitions with no device specifics
> [DF] I'm not sure it makes sense to add vmxnet prefix.
> [DF] I'd say it makes sense to put them into some generic header files under
> "net" directory instead
> [DF] and clean up other devices that have their local definitions of the same
> structures.
> [DF] Please, advise.
>
>
> I also don't really understand why are these
> functions split out - vmxnet is the only user, no?
>
> [DF] Currently vmxnet3 is the only user, hovewer we have vmxnet2
> implementation as well (not submitted yet)
> [DF] and it uses the same header
>
> File vmxnet_pkt.h:
>
> ...
> > +
> > +/* tx module context handle */
> > +typedef void *VmxnetTxPktH;
>
> This gets you zero type safety and makes debugging impossible.
> Just use pointers like everyone does.
>
> [DF] Handle type is added to hide VmxnetTxPkt structure into .c header
> [DF] for better encapsulation.
> [DF] Should we drop it anyway?
>
> Files vmxnet3.*:
>
> ...
> > +
> > + if (zero_region) {
> > + vmw_shmem_set(pa, 0, size*cell_size);
>
> spaces around *
>
> [DF] Fixed
>
> ...
> > +#define vmxnet3_ring_dump(macro, ring_name, ridx, r)
> >
> > \
> > + macro("%s#%d: base %" PRIx64 " size %lu cell_size %lu gen %d next
> > %lu",
> > \
> > + (ring_name), (ridx),
> >
> > \
> > + (r)->pa, (r)->size, (r)->cell_size, (r)->gen, (r)->next)
>
> make macros upper case
>
> [DF] Fixed
>
> ...
> > +static inline void vmxnet3_flush_shmem_changes(void)
> > +{
> > + /*
> > + * Flush shared memory changes
> > + * Needed before transferring control to guest
>
> what does 'transferring control to guest' mean?
> [DF] Changed to
> [DF] /*
> [DF] * Flush shared memory changes
> [DF] * Needed before sending interrupt to guest to ensure
> [DF] * it gets consistent memory state
> [DF] */
>
> ...
> > + */
> > + smp_wmb();
> > +}
>
> Don't use wrappers like this. They just hide bugs. For example
> it's not helpful before an interrupt in the function below.
>
> [DF] I guess you are talking about vmxnet3_complete_packet()
> [DF] Strictly speaking barrier is a must because we change shared memory in
> [DF] vmxnet3_complete_packet()
>
> [DF] And the wrapper is a good thing because its name explains its effect
> [DF] in a formal way as opposed to comments
>
> ...
> > + switch (status) {
> > + case VMXNET3_PKT_STATUS_OK: {
>
> don't put {} around cases: they align incorrectly
> if it's too big move to a function.
>
> [DF] Fixed
>
> ...
> > +static bool
> > +vmxnet3_send_packet(VMXNET3State *s, uint32_t qidx)
> > +{
> > + size_t bytes_sent = 0;
> > + bool res = true;
>
> why = true? don't initialize just because.
>
> [DF] Fixed
>
> ...
> > +/*
> > + * VMWARE headers we got from Linux kernel do not fully comply QEMU coding
> > + * standards in sense of types and defines used.
> > + * Since we didn't want to change VMWARE code, following set of typedefs
> > + * and defines needed to compile these headers with QEMU introduced.
> > + */
>
> No need for this now.
> You can export headers and put them under linux-headers.
>
> [DF] Not sure it is possible because the header as-is is not stand-alone and
> won't compile
> [DF] without changes. We extracted definitions we use from their header and
> dropped unused
> [DF] and kernel-specific stuff.
> [DF} Please, advise.
>
> ...
>
> > + if (VMXNET3_OM_TSO == s->offload_mode) {
>
> Don't do Yoda style like this
>
> [DF] "Yoda" style removed everywhere
>
>
> Changes in V6:
> Fixed most of problems pointed out by Michael S. Tsirkin
> The only issue still open is creation of shared place
> with generic network structures and functions. Currently
> all generic network code introduced by VMXNET3 resides in
> vmxnet_utils.c/h files. It could be moved to some shared location however
> we believe it is a matter of separate refactoring as there are a lot of
> copy-pasted
> definitions in almost every device and code cleanup efforts requred in
> order
> to create truly shared codebase.
>
> Reported-by: Michael S. Tsirkin <address@hidden>
>
> Implemented suggestions by Anthony Liguori
>
> Reported-by: Anthony Liguori <address@hidden>
>
> Fixed incorrect checksum caclulation for some packets in SW offloads mode
>
> Reported-by: Gerhard Wiesinger <address@hidden>
>
> Changes in V5:
> MSI-X save/load implemented in the device instead of pci bus as
> suggested by Michael S. Tsirkin
>
> Reported-by: Michael S. Tsirkin <address@hidden>
>
> Patches regrouped as suggested by Paolo Bonzini
>
> Reported-by: Paolo Bonzini <address@hidden>
>
> Changes in V4:
> Fixed a few problems uncovered by NETIO test suit
> Assertion on failure to initialize MSI/MSI-X replaced with warning
> message and fallback to Legacy/MSI respectively
>
> Reported-by: Gerhard Wiesinger <address@hidden>
>
> Various coding style adjustments and patch split-up as suggested by
> Anthony
> Liguori
>
> Reported-by: Anthony Liguori <address@hidden>
>
> Live migration support added
>
> Changes in V3:
> Fixed crash when net device that is used as network fronted has no
> virtio HDR support.
> Task offloads emulation for cases when net device that is used as
> network fronted has no virtio HDR support.
>
> Reported-by: Gerhard Wiesinger <address@hidden>
>
> Changes in V2:
> License text changed accoring to community suggestions
> Standard license header from GPLv2+ - licensed QEMU files used
>
> Dmitry Fleytman (5):
> Checksum-related utility functions
> net: iovec checksum calculator
> Common definitions for VMWARE devices
> Packet abstraction for VMWARE network devices
> VMXNET3 device implementation
>
> default-configs/pci.mak | 1 +
> hw/Makefile.objs | 2 +
> hw/pci/pci.h | 1 +
> hw/vmware_utils.h | 143 +++
> hw/vmxnet3.c | 2461
> +++++++++++++++++++++++++++++++++++++++++++++++
> hw/vmxnet3.h | 760 +++++++++++++++
> hw/vmxnet_debug.h | 115 +++
> hw/vmxnet_rx_pkt.c | 187 ++++
> hw/vmxnet_rx_pkt.h | 174 ++++
> hw/vmxnet_tx_pkt.c | 567 +++++++++++
> hw/vmxnet_tx_pkt.h | 148 +++
> include/net/checksum.h | 26 +-
> include/net/eth.h | 347 +++++++
> net/Makefile.objs | 1 +
> net/checksum.c | 42 +-
> net/eth.c | 217 +++++
> 16 files changed, 5185 insertions(+), 7 deletions(-)
> create mode 100644 hw/vmware_utils.h
> create mode 100644 hw/vmxnet3.c
> create mode 100644 hw/vmxnet3.h
> create mode 100644 hw/vmxnet_debug.h
> create mode 100644 hw/vmxnet_rx_pkt.c
> create mode 100644 hw/vmxnet_rx_pkt.h
> create mode 100644 hw/vmxnet_tx_pkt.c
> create mode 100644 hw/vmxnet_tx_pkt.h
> create mode 100644 include/net/eth.h
> create mode 100644 net/eth.c
>
> --
> 1.8.1.2
>
Thanks, applied to my net tree:
https://github.com/stefanha/qemu/commits/net
Stefan
- [Qemu-devel] [PATCH V14 1/5] Checksum-related utility functions, (continued)
- [Qemu-devel] [PATCH V14 1/5] Checksum-related utility functions, Dmitry Fleytman, 2013/03/09
- [Qemu-devel] [PATCH V14 2/5] net: iovec checksum calculator, Dmitry Fleytman, 2013/03/09
- [Qemu-devel] [PATCH V14 3/5] Common definitions for VMWARE devices, Dmitry Fleytman, 2013/03/09
- [Qemu-devel] [PATCH V14 4/5] Packet abstraction for VMWARE network devices, Dmitry Fleytman, 2013/03/09
- [Qemu-devel] [PATCH V14 5/5] VMXNET3 device implementation, Dmitry Fleytman, 2013/03/09
Re: [Qemu-devel] [PATCH V14 0/5] VMXNET3 paravirtual NIC device implementation,
Stefan Hajnoczi <=