Re: [Qemu-devel] [RFC PATCH 0/5] Introduce Intel 82574 GbE Controller Em

On 3 Nov 2015, at 07:44 AM, Jason Wang <address@hidden> wrote:

On 11/02/2015 03:49 PM, Dmitry Fleytman wrote:

On 2 Nov 2015, at 05:35 AM, Jason Wang <address@hidden
<mailto:address@hidden>> wrote:

On 10/31/2015 01:52 PM, Dmitry Fleytman wrote:
Hello Jason,

Thanks for reviewing. See my answers inline.

On 30 Oct 2015, at 07:28 AM, Jason Wang <address@hidden
<mailto:address@hidden>
<mailto:address@hidden>> wrote:

On 10/28/2015 01:44 PM, Jason Wang wrote:

On 10/26/2015 01:00 AM, Leonid Bloch wrote:
Hello qemu-devel,

This patch series is an RFC for the new networking device emulation
we're developing for QEMU.

This new device emulates the Intel 82574 GbE Controller and works
with unmodified Intel e1000e drivers from the Linux/Windows kernels.

The status of the current series is "Functional Device Ready, work
on Extended Features in Progress".

More precisely, these patches represent a functional device, which
is recognized by the standard Intel drivers, and is able to transfer
TX/RX packets with CSO/TSO offloads, according to the spec.

Extended features not supported yet (work in progress):
1. TX/RX Interrupt moderation mechanisms
2. RSS
3. Full-featured multi-queue (use of multiqueued network backend)

Also, there will be some code refactoring and performance
optimization efforts.

This series was tested on Linux (Fedora 22) and Windows (2012R2)
guests, using Iperf, with TX/RX and TCP/UDP streams, and various
packet sizes.

More thorough testing, including data streams with different MTU
sizes, and Microsoft Certification (HLK) tests, are pending missing
features' development.

See commit messages (esp. "net: Introduce e1000e device emulation")
for more information about the development approaches and the
architecture options chosen for this device.

This series is based upon v2.3.0 tag of the upstream QEMU repository,
and it will be rebased to latest before the final submission.

Please share your thoughts - any feedback is highly welcomed :)

Best Regards,
Dmitry Fleytman.
Thanks for the series. Will go through this in next few days.

Have a quick glance at the series, got the following questions:

- Though e1000e differs from e1000 in many places, I still see lots of
code duplications. We need consider to reuse e1000.c (or at least part
of). I believe we don't want to fix a bug twice in two places in the
future and I expect hundreds of lines could be saved through this way.

That’s a good question :)

This is how we started, we had a common “core” code base meant to
implement all common logic (this split is still present in the patches
- there are e1000e_core.c and e1000e.c files).
Unfortunately at some point it turned out that there are more
differences that commons. We noticed that the code becomes filled with
many minor differences handling.
This also made the code base more complicated and harder to follow.

So at some point of time it was decided to split the code base and
revert all changes done to the e1000 device (except a few
fixes/improvements Leonid submitted a few days ago).

Although there was common code between devices, total SLOC of e1000
and e1000e devices became smaller after the split.

Amount of code that may be shared between devices will be even smaller
after we complete the implementation which still misses a few features
(see cover letter) that will change many things.

Still after the device implementation is done, we plan to review code
similarities again to see if there are possibilities for code sharing.

I see, but if we can try to re-use or unify the codes from beginning, it
would be a little bit easier. Looks like the differences were mainly:

1) MSI-X support
2) offloading support through virtio-net header
3) trace points
4) other new functions through e1000e specific registers

So we could first unify the code through implementing the support of 2
and 3 for e1000. For MSI-X and other e1000e specific new functions, it
could be done through:

1) model specific callbacks, e.g realize, transmission and reception
2) A new register flags e.g PHY_RW_E1000E which means the register is
for e1000e only. Or even model specific wirteops and readops
3) For other subtle differences, it could be done in the code by
checking the model explicitly.

What's your opinion? (A good example of code sharing is freebsd's e1000
driver which covers both 8254x and 8257x).

Hi Jason,

This is exactly how we started.

Issues that made us split the code base were following:

1. The majority of registers are different. Even same registers in
many cases have different bits and require different processing logic.
This introduced too much if’s into the code;

Then we can probably have different writeops and readops. This way, we
can at least save the codes of common registers.

2. The data path is totally different not just because of virtio
headers but also because these devices use different descriptor
layouts and require different logic in order to parse and fill those.
There are legacy descriptors that look almost the same but of course
we must support all descriptor types described by spec;

Yes, but looks like the only extend rx/tx descriptors is 8257x specific.
And 8257x fully supports both legacy and context descriptor of 8254x.
This give us another chance to reuse the code.

3. Interrupt handling logic is different because of MSI support;

Right, but this is not hard to address, probably a new helper.

4. Mutli-queue and RSS make the code even less similar to the old device;

Yes, this could be in 8275x specific file.

5. Changes required to isolate shared code base required changes in
migration tables and fishy tricks to preserve compatibility with
current implementation;

Since 8257x is a totally new device, it can has its own vmstate if it's
simpler to be implemented and we don't even need to care migration
compatibility.

6. e1000 code suffered from massive changes which are very hard to
verify because spec is big and there are no drivers that use all
device features.

Then we can try to change e1000 as mini as possible. E.g just factor out
the common logic to helpers to reuse it in e1000e.

Overall, code for handling differences in device behaviours was bigger
and more complicated then the device logic itself. The difference is
not subtle when it comes to the full featured device implementation.
As for FreeBSD’s driver, I’m not deeply familiar with its code but I
suspect it works with device in legacy mode which is pretty similar to
an old device indeed. Since we must support all modes our situation is
different.

Yes, it does not use extended descriptor format.

Again, I’m totally into shared code and would like to have some common
code base anyway. Currently it looks like the best way to achieve this
is to finish with all device features and then see what parts of logic
may be shared between the old and the new devices. It’s better to have
slight code duplication and smaller shared code base than to have
bloated and tricky shared code that will introduce its own problems to
both devices.

The code duplication is not slight in this rfc :). So the code has the
possibility to be unified. But I'm ok to evaluate this after all
features were developed.

Hello Jason,

I’m strongly into making code as much shared as possible :)

Let’s see how it goes with the final code…

Thanks for your comments,

Dmitry.

Thanks

Best Regards,
Dmitry

- For e1000e it self, since it was a new device, so no need to care
about compatibility stuffs (e.g auto negotiation and mit). We can just
enable them forever.

Yes, we have this in plans.

- And for the generic packet abstraction layer, what's the
advantages of
this? If it has lot, maybe we can use it in other nic model (e.g
virtio-net)?

These abstractions were initially developed by me as a part of vmxnet3
device to be generic and re-usable. Their main advantage is support
for virtio headers for virtio-enabled backends and emulation of
network offloads in software for backends that do not support virtio.

Of course they may be re-used by virtio, however I’m not sure if it
will be really useful because virtio has feature negotiation
facilities and do not require SW emulation for network task offloads.

For other devices they are useful because each and every device that
requires SW offloads implementation need to do exactly the same things
and it doesn’t make sense to have a few implementations for this.

Best Regards,
Dmitry

Ok, thanks for the explanation.

Thanks

Since 2.5 is in soft freeze, this looks a 2.6 material.

From:	Dmitry Fleytman
Subject:	Re: [Qemu-devel] [RFC PATCH 0/5] Introduce Intel 82574 GbE Controller Emulation (e1000e)
Date:	Tue, 3 Nov 2015 11:17:44 +0200