Re: [Qemu-devel] [RFC PATCH 0/5] Introduce Intel 82574 GbE Controller Em

On 2 Nov 2015, at 05:35 AM, Jason Wang <address@hidden> wrote:

On 10/31/2015 01:52 PM, Dmitry Fleytman wrote:
Hello Jason,

Thanks for reviewing. See my answers inline.

On 30 Oct 2015, at 07:28 AM, Jason Wang <address@hidden
<mailto:address@hidden>> wrote:

On 10/28/2015 01:44 PM, Jason Wang wrote:

On 10/26/2015 01:00 AM, Leonid Bloch wrote:
Hello qemu-devel,

This patch series is an RFC for the new networking device emulation
we're developing for QEMU.

This new device emulates the Intel 82574 GbE Controller and works
with unmodified Intel e1000e drivers from the Linux/Windows kernels.

The status of the current series is "Functional Device Ready, work
on Extended Features in Progress".

More precisely, these patches represent a functional device, which
is recognized by the standard Intel drivers, and is able to transfer
TX/RX packets with CSO/TSO offloads, according to the spec.

Extended features not supported yet (work in progress):
1. TX/RX Interrupt moderation mechanisms
2. RSS
3. Full-featured multi-queue (use of multiqueued network backend)

Also, there will be some code refactoring and performance
optimization efforts.

This series was tested on Linux (Fedora 22) and Windows (2012R2)
guests, using Iperf, with TX/RX and TCP/UDP streams, and various
packet sizes.

More thorough testing, including data streams with different MTU
sizes, and Microsoft Certification (HLK) tests, are pending missing
features' development.

See commit messages (esp. "net: Introduce e1000e device emulation")
for more information about the development approaches and the
architecture options chosen for this device.

This series is based upon v2.3.0 tag of the upstream QEMU repository,
and it will be rebased to latest before the final submission.

Please share your thoughts - any feedback is highly welcomed :)

Best Regards,
Dmitry Fleytman.
Thanks for the series. Will go through this in next few days.

Have a quick glance at the series, got the following questions:

- Though e1000e differs from e1000 in many places, I still see lots of
code duplications. We need consider to reuse e1000.c (or at least part
of). I believe we don't want to fix a bug twice in two places in the
future and I expect hundreds of lines could be saved through this way.

That’s a good question :)

This is how we started, we had a common “core” code base meant to
implement all common logic (this split is still present in the patches
- there are e1000e_core.c and e1000e.c files).
Unfortunately at some point it turned out that there are more
differences that commons. We noticed that the code becomes filled with
many minor differences handling.
This also made the code base more complicated and harder to follow.

So at some point of time it was decided to split the code base and
revert all changes done to the e1000 device (except a few
fixes/improvements Leonid submitted a few days ago).

Although there was common code between devices, total SLOC of e1000
and e1000e devices became smaller after the split.

Amount of code that may be shared between devices will be even smaller
after we complete the implementation which still misses a few features
(see cover letter) that will change many things.

Still after the device implementation is done, we plan to review code
similarities again to see if there are possibilities for code sharing.

I see, but if we can try to re-use or unify the codes from beginning, it
would be a little bit easier. Looks like the differences were mainly:

1) MSI-X support
2) offloading support through virtio-net header
3) trace points
4) other new functions through e1000e specific registers

So we could first unify the code through implementing the support of 2
and 3 for e1000. For MSI-X and other e1000e specific new functions, it
could be done through:

1) model specific callbacks, e.g realize, transmission and reception
2) A new register flags e.g PHY_RW_E1000E which means the register is
for e1000e only. Or even model specific wirteops and readops
3) For other subtle differences, it could be done in the code by
checking the model explicitly.

What's your opinion? (A good example of code sharing is freebsd's e1000
driver which covers both 8254x and 8257x).

Hi Jason,

This is exactly how we started.

Issues that made us split the code base were following:

1. The majority of registers are different. Even same registers in many cases have different bits and require different processing logic. This introduced too much if’s into the code;

2. The data path is totally different not just because of virtio headers but also because these devices use different descriptor layouts and require different logic in order to parse and fill those. There are legacy descriptors that look almost the same but of course we must support all descriptor types described by spec;

3. Interrupt handling logic is different because of MSI support;

4. Mutli-queue and RSS make the code even less similar to the old device;

5. Changes required to isolate shared code base required changes in migration tables and fishy tricks to preserve compatibility with current implementation;

6. e1000 code suffered from massive changes which are very hard to verify because spec is big and there are no drivers that use all device features.

Overall, code for handling differences in device behaviours was bigger and more complicated then the device logic itself. The difference is not subtle when it comes to the full featured device implementation.

As for FreeBSD’s driver, I’m not deeply familiar with its code but I suspect it works with device in legacy mode which is pretty similar to an old device indeed. Since we must support all modes our situation is different.

Again, I’m totally into shared code and would like to have some common code base anyway. Currently it looks like the best way to achieve this is to finish with all device features and then see what parts of logic may be shared between the old and the new devices. It’s better to have slight code duplication and smaller shared code base than to have bloated and tricky shared code that will introduce its own problems to both devices.

Best Regards,

Dmitry

- For e1000e it self, since it was a new device, so no need to care
about compatibility stuffs (e.g auto negotiation and mit). We can just
enable them forever.

Yes, we have this in plans.

- And for the generic packet abstraction layer, what's the advantages of
this? If it has lot, maybe we can use it in other nic model (e.g
virtio-net)?

These abstractions were initially developed by me as a part of vmxnet3
device to be generic and re-usable. Their main advantage is support
for virtio headers for virtio-enabled backends and emulation of
network offloads in software for backends that do not support virtio.

Of course they may be re-used by virtio, however I’m not sure if it
will be really useful because virtio has feature negotiation
facilities and do not require SW emulation for network task offloads.

For other devices they are useful because each and every device that
requires SW offloads implementation need to do exactly the same things
and it doesn’t make sense to have a few implementations for this.

Best Regards,
Dmitry

Ok, thanks for the explanation.

Thanks

Since 2.5 is in soft freeze, this looks a 2.6 material.

From:	Dmitry Fleytman
Subject:	Re: [Qemu-devel] [RFC PATCH 0/5] Introduce Intel 82574 GbE Controller Emulation (e1000e)
Date:	Mon, 2 Nov 2015 09:49:02 +0200