On 11/02/2015 03:49 PM, Dmitry Fleytman wrote:
On 2 Nov 2015, at 05:35 AM, Jason Wang <address@hidden <mailto:address@hidden>> wrote:
On 10/31/2015 01:52 PM, Dmitry Fleytman wrote:
Hello Jason,
Thanks for reviewing. See my answers inline.
On 30 Oct 2015, at 07:28 AM, Jason Wang <address@hidden <mailto:address@hidden> <mailto:address@hidden>> wrote:
On 10/28/2015 01:44 PM, Jason Wang wrote:
On 10/26/2015 01:00 AM, Leonid Bloch wrote:
Hello qemu-devel,
This patch series is an RFC for the new networking device emulation we're developing for QEMU.
This new device emulates the Intel 82574 GbE Controller and works with unmodified Intel e1000e drivers from the Linux/Windows kernels.
The status of the current series is "Functional Device Ready, work on Extended Features in Progress".
More precisely, these patches represent a functional device, which is recognized by the standard Intel drivers, and is able to transfer TX/RX packets with CSO/TSO offloads, according to the spec.
Extended features not supported yet (work in progress): 1. TX/RX Interrupt moderation mechanisms 2. RSS 3. Full-featured multi-queue (use of multiqueued network backend)
Also, there will be some code refactoring and performance optimization efforts.
This series was tested on Linux (Fedora 22) and Windows (2012R2) guests, using Iperf, with TX/RX and TCP/UDP streams, and various packet sizes.
More thorough testing, including data streams with different MTU sizes, and Microsoft Certification (HLK) tests, are pending missing features' development.
See commit messages (esp. "net: Introduce e1000e device emulation") for more information about the development approaches and the architecture options chosen for this device.
This series is based upon v2.3.0 tag of the upstream QEMU repository, and it will be rebased to latest before the final submission.
Please share your thoughts - any feedback is highly welcomed :)
Best Regards, Dmitry Fleytman.
Thanks for the series. Will go through this in next few days.
Have a quick glance at the series, got the following questions:
- Though e1000e differs from e1000 in many places, I still see lots of code duplications. We need consider to reuse e1000.c (or at least part of). I believe we don't want to fix a bug twice in two places in the future and I expect hundreds of lines could be saved through this way.
That’s a good question :)
This is how we started, we had a common “core” code base meant to implement all common logic (this split is still present in the patches - there are e1000e_core.c and e1000e.c files). Unfortunately at some point it turned out that there are more differences that commons. We noticed that the code becomes filled with many minor differences handling. This also made the code base more complicated and harder to follow.
So at some point of time it was decided to split the code base and revert all changes done to the e1000 device (except a few fixes/improvements Leonid submitted a few days ago).
Although there was common code between devices, total SLOC of e1000 and e1000e devices became smaller after the split.
Amount of code that may be shared between devices will be even smaller after we complete the implementation which still misses a few features (see cover letter) that will change many things.
Still after the device implementation is done, we plan to review code similarities again to see if there are possibilities for code sharing.
I see, but if we can try to re-use or unify the codes from beginning, it would be a little bit easier. Looks like the differences were mainly:
1) MSI-X support 2) offloading support through virtio-net header 3) trace points 4) other new functions through e1000e specific registers
So we could first unify the code through implementing the support of 2 and 3 for e1000. For MSI-X and other e1000e specific new functions, it could be done through:
1) model specific callbacks, e.g realize, transmission and reception 2) A new register flags e.g PHY_RW_E1000E which means the register is for e1000e only. Or even model specific wirteops and readops 3) For other subtle differences, it could be done in the code by checking the model explicitly.
What's your opinion? (A good example of code sharing is freebsd's e1000 driver which covers both 8254x and 8257x).
Hi Jason,
This is exactly how we started.
Issues that made us split the code base were following:
1. The majority of registers are different. Even same registers in many cases have different bits and require different processing logic. This introduced too much if’s into the code;
Then we can probably have different writeops and readops. This way, wecan at least save the codes of common registers.2. The data path is totally different not just because of virtio headers but also because these devices use different descriptor layouts and require different logic in order to parse and fill those. There are legacy descriptors that look almost the same but of course we must support all descriptor types described by spec;
Yes, but looks like the only extend rx/tx descriptors is 8257x specific.And 8257x fully supports both legacy and context descriptor of 8254x.This give us another chance to reuse the code.3. Interrupt handling logic is different because of MSI support;
Right, but this is not hard to address, probably a new helper.4. Mutli-queue and RSS make the code even less similar to the old device;
Yes, this could be in 8275x specific file.
5. Changes required to isolate shared code base required changes in migration tables and fishy tricks to preserve compatibility with current implementation;
Since 8257x is a totally new device, it can has its own vmstate if it'ssimpler to be implemented and we don't even need to care migrationcompatibility.
6. e1000 code suffered from massive changes which are very hard to verify because spec is big and there are no drivers that use all device features.
Then we can try to change e1000 as mini as possible. E.g just factor outthe common logic to helpers to reuse it in e1000e.
Overall, code for handling differences in device behaviours was bigger and more complicated then the device logic itself. The difference is not subtle when it comes to the full featured device implementation. As for FreeBSD’s driver, I’m not deeply familiar with its code but I suspect it works with device in legacy mode which is pretty similar to an old device indeed. Since we must support all modes our situation is different.
Yes, it does not use extended descriptor format.
Again, I’m totally into shared code and would like to have some common code base anyway. Currently it looks like the best way to achieve this is to finish with all device features and then see what parts of logic may be shared between the old and the new devices. It’s better to have slight code duplication and smaller shared code base than to have bloated and tricky shared code that will introduce its own problems to both devices.
The code duplication is not slight in this rfc :). So the code has thepossibility to be unified. But I'm ok to evaluate this after allfeatures were developed.
I’m strongly into making code as much shared as possible :) Let’s see how it goes with the final code…
Thanks for your comments, Dmitry.
Thanks Best Regards, Dmitry
- For e1000e it self, since it was a new device, so no need to care about compatibility stuffs (e.g auto negotiation and mit). We can just enable them forever.
Yes, we have this in plans.
- And for the generic packet abstraction layer, what's the advantages of this? If it has lot, maybe we can use it in other nic model (e.g virtio-net)?
These abstractions were initially developed by me as a part of vmxnet3 device to be generic and re-usable. Their main advantage is support for virtio headers for virtio-enabled backends and emulation of network offloads in software for backends that do not support virtio.
Of course they may be re-used by virtio, however I’m not sure if it will be really useful because virtio has feature negotiation facilities and do not require SW emulation for network task offloads.
For other devices they are useful because each and every device that requires SW offloads implementation need to do exactly the same things and it doesn’t make sense to have a few implementations for this.
Best Regards, Dmitry
Ok, thanks for the explanation.
Thanks
Since 2.5 is in soft freeze, this looks a 2.6 material.
|