First of all, thank you for all your feedback.
Please help me summarize it so that we better understand what to do in v2.
Major questions are:
1. Building eBPF from source during the QEMU build vs. regenerating it on
demand and keeping it in the repository
Solution 1a (roughly as in v1): keep the instructions or the ELF in a header
file and generate it outside of the QEMU build. In general we'll need to have
both BE and LE binaries.
Solution 1b: build the ELF or the instructions during the QEMU build if llvm +
clang are present. Then we will have only one binary (BE or LE, depending on
the current QEMU build).
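To make the "BE and LE binaries" point of 1a concrete, here is a minimal sketch of how two pre-generated instruction arrays could sit in a header and be selected at run time. Everything in it is an assumption for illustration only: the names rss_prog_le, rss_prog_be, host_is_big_endian() and rss_prog_for_host() are hypothetical, and the arrays hold a trivial two-instruction program ("r0 = 1; exit"), not the actual RSS program from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-generated bytecode: "r0 = 1; exit".
 * Each eBPF instruction is 8 bytes: opcode, registers, 16-bit offset,
 * 32-bit immediate. Only the multi-byte fields differ between LE and BE. */
static const uint8_t rss_prog_le[] = {
    0xb7, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, /* mov64 r0, 1 (imm LE) */
    0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* exit */
};

static const uint8_t rss_prog_be[] = {
    0xb7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, /* mov64 r0, 1 (imm BE) */
    0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* exit */
};

/* Detect host byte order by inspecting the first byte of a 16-bit value. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const uint8_t *)&probe == 0x01;
}

/* Return the embedded program matching the host byte order (hypothetical helper). */
static const uint8_t *rss_prog_for_host(size_t *len)
{
    if (host_is_big_endian()) {
        *len = sizeof(rss_prog_be);
        return rss_prog_be;
    }
    *len = sizeof(rss_prog_le);
    return rss_prog_le;
}
```

With 1b only one of the two arrays would exist, produced for the byte order of the current build, so the run-time selection would not be needed.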
We are fine with either solution; I believe you know the requirements better.
Jason, Daniel, Michael,
can you please let us know what you think and why?
On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> >
> > On 2020/11/4 5:31 PM, Daniel P. Berrangé wrote:
> > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > > On 2020/11/3 6:32 PM, Yuri Benditovich wrote:
> > > > >
> > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > >
> > > > > > On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> > > > > > > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > > > > > RSS (Receive Side Scaling) is used to distribute network packets
> > > > > > > to guest virtqueues by calculating packet hash.
> > > > > > > eBPF RSS allows us to use RSS with vhost TAP.
> > > > > > >
> > > > > > > This set of patches introduces the usage of eBPF for packet steering
> > > > > > > and RSS hash calculation:
> > > > > > > * RSS (Receive Side Scaling) is used to distribute network packets to
> > > > > > > guest virtqueues by calculating packet hash
> > > > > > > * eBPF RSS is supposed to be faster than the already existing 'software'
> > > > > > > implementation in QEMU
> > > > > > > * Additionally adding support for the usage of RSS with vhost
> > > > > > >
> > > > > > Supported kernels: 5.8+
> > > > > >
> > > > > > Implementation notes:
> > > > > > > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > > > > > Added eBPF support to qemu directly through a system call, see the
> > > > > > > bpf(2) for details.
> > > > > > > The eBPF program is part of the qemu and presented as an array of bpf
> > > > > > > instructions.
> > > > > > > The program can be recompiled by provided Makefile.ebpf (need to adjust
> > > > > > > 'linuxhdrs'),
> > > > > > > although it's not required to build QEMU with eBPF support.
> > > > > > > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > > > > > 'Software' RSS used in the case of hash population and as a fallback option.
> > > > > > > For vhost, the hash population feature is not reported to the guest.
> > > > > >
> > > > > > Please also see the documentation in PATCH 6/6.
> > > > > >
> > > > > > > I am sending those patches as RFC to initiate the discussions and get
> > > > > > > feedback on the following points:
> > > > > > * Fallback when eBPF is not supported by the kernel
> > > > >
> > > > >
> > > > > > Yes, and it could also be a lack of CAP_BPF.
> > > > >
> > > > >
> > > > > > > * Live migration to the kernel that doesn't have eBPF support
> > > > >
> > > > >
> > > > > > Is there anything that needs special treatment here?
> > > > >
> > > > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > > > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> > > > > > functions, but all the steering does not use proper queues.
> > > >
> > > > Right, I think we need to disable vhost on dest.
> > > >
> > > >
> > > > >
> > > > >
> > > > > > * Integration with current QEMU build
> > > > >
> > > > >
> > > > > Yes, a question here:
> > > > >
> > > > > > 1) Any reason for not using libbpf, e.g. it has been shipped with some
> > > > > > distros
> > > > > >
> > > > > >
> > > > > > We intentionally do not use libbpf, as it is present only on some distros.
> > > > > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > > > > installed
> > > >
> > > > That's better I think.
> > > >
> > > >
> > > > > > 2) It would be better if we can avoid shipping bytecodes
> > > > >
> > > > >
> > > > >
> > > > > This creates new dependencies: llvm + clang + ...
> > > > > > We would prefer byte code and ability to generate it if prerequisites
> > > > > > are installed.
> > > >
> > > > It's probably ok if we treat the bytecode as a kind of firmware.
> > > That is explicitly *not* OK for inclusion in Fedora. They require that
> > > BPF is compiled from source, and rejected my suggestion that it could
> > > be considered a kind of firmware and thus have an exception from building
> > > from source.
> >
> >
> > Please refer to what was done in DPDK:
> >
> > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> >
> > I don't think what is proposed here is any different.
>
> I'm not convinced that what DPDK does is acceptable to Fedora either
> based on the responses I've received when asking about BPF handling
> during build. It wouldn't surprise me, however, if this was simply
> missed by reviewers when accepting DPDK into Fedora, because it is
> not entirely obvious unless you are looking closely.
FWIW, I'm pushing back against the idea that we have to compile the
BPF code from master source, as I think it is reasonable to have the
program embedded as a static array in the source code similar to what
DPDK does. It doesn't feel much different from other places where apps
use generated sources, and don't build them from the original source
every time. eg "configure" is never re-generated from "configure.ac"
by Fedora packagers, they just use the generated "configure" script
as-is.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|