qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 04/12] usb: Add support for input pipelining


From: Hans de Goede
Subject: Re: [Qemu-devel] [PATCH 04/12] usb: Add support for input pipelining
Date: Mon, 08 Oct 2012 16:50:32 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1

Hi,

On 10/08/2012 02:50 PM, Gerd Hoffmann wrote:
On 10/08/12 09:51, Hans de Goede wrote:
Currently we effectively only do pipelining for output endpoints, since
controllers will stop filling the queue for a packet with stop the queue
on a short read semantics enabled.

This causes large input transfers to get split into multiple packets, which
comes with a serious performance penalty.

This patch makes it possible to do pipelining for input endpoints, by
re-combining packets which belong to 1 transfer at the guest device-driver
level into 1 large packet before sending them to the device, this is mostly
useful as a significant performance improvement for redirection of real USB
devices.

This splitting and recombining is too much magic in the core layer for
my taste.

I disagree, let me try to explain below.

I think the core work flow should stay as-is.  Instead we should add
more meta data to USBPacket (stop-queue-on-short-read bit, maybe
interrupt-on-complete too).

We agree on the meta data :)

Continue queuing packets even with
stop-queue-on-short-read set.

This can only happen if we combine them first. We *must* ensure that
we do not read additional data after a short completion and add
that to a packet belonging to the previous short ended transfer, or
if we end up cancelling after the short completion, throwing away that
data, otherwise the guest and device can get out of sync. Which is exactly
what was happening with the uhci code, before I added the SPD tests there.

Another important reason for combining is that in most cases, we only
see short-not-ok flags because the device-driver in the guest has asked
for a single bulk in transfer which does not fit in a single td, so
it gets split, and to make sure that if it ends up short, the problem
of data from another part of the protocol (ie usb mass storage bulk only
transport data versus sense) ending up in tds belonging to the previous
transfer does not happen, the short-not-ok flag gets set on all
packets but the last one. Notice btw that the xhci controller has no
notion of short-not-ok, instead it simply allows for very large
bulk transfers to be submitted in one "atomic" entity.

So far to not cause this mixup of to which protocol phase in-data belongs,
we do not queue up packets to the device past a short-not-ok boundary,
but this means for serial <-> usb converters that instead of doing:

submit 1st 256 byte bulk in
submit 2nd 256 byte bulk in
<wait for completions>
<after a single transfer completion>
resubmit 256 byte bulk in

We do:
submit 1st 64/256 byte bulk in
<wait for completion>
<on full 64 byte completion>
submit 2nd 64/256 byte bulk in
<etc>

Meaning that for receiving 256 bytes we pay 4 times the round trip
time instead of 1 time, and since we do not submit further packets
after a short-not-ok packet, that we don't have a 2nd 256 byte
transfer queued up ready to receive more data while the 1st
one is being processed. Making reading from such a device highly
unreliable if there is any sort of load on the host.

The whole combining "magic" results in a single 256 byte packet, which
does not have its short-not-ok flag set, allowing true pipelining and
having another in transfer queued up to keep the device busy will the
1st one is being processed.

Re-combining the packets into a single transfer, is the *only* way to
ensure that a short completion cannot end up reading some data from
the next protocol phase.

One possible alternative to re-combining would be to pass the
short-not-ok flag along the chain all the way down into usbfs, but
that won't work since the USBFS_URB_SHORT_NOT_OK flag does not
work with XHCI controllers! There has even been discussion about
outright refusing packet submissions with that flag set in newer
Linux kernels, but that would cause regressions for apps which set
it when they don't really need it (and most don't). So since more
and more machines are using XHCI controllers, passing the
short-not-ok flag along is not a valid option.

So lets try to split this discussion into multiple questions:

1) Can we agree that re-combining input packets back into their original
transfers as done at the guest device driver level, is useful and given
the serial <-> usb converters case, even necessary to have ?


2) If we agree on 1), what is the right place to do them ?

To be able to properly combine packets into one larger transfer, an
uncombine them again later, some coordination is needed between the
entity doing the combining / uncombining and the hcd code:

2a) The combining entity needs to know when the hcd is done with
queuing up packets for an ep.

2b) When uncombining it is possible that less packets are actually
used / filled with data, then were combined into one transfer. When
this happens the hcd code will stop processing the queue after the
short packet, and wait for the guest to restart it. When the guest
restarts the queue it will be modified and the un-used packets will
be gone from it. Since the hcd code keeps its own queue, when
when uncombining the hcd code needs to be told to forget about
the unused packets.

Note that currently we have the same problem on a halt of the ep
queue already, and currently we expect the hcd to completely empty
the queue on a stall. I'm planning to change this code path over
to also let the core tell the hcd to forget about the packets it
had queued beyond the stall, as this seems something to do at the
core level rather then reproducing it in every hcd emulation
separately.


Given the necessary close interaction between the hcd code and
the combine / uncombine code + not wanting to reproduce the
same code in both host-linux.c and redirect.c I believe that
the usb core is the right place for this.

> Maybe add a callback to notify USBDevice
when the host controller is done filling the queue, so USBDevices can
process all queued packets in one go (bottom half would work too
though).

Changing the callback which I added for this to a bh is fine with me,
I think that both from a making clear what is going on when reading
the code, and from a having all the data hot in cache pov, the callback
is better though.

Then expect USBDevices to get things right based on the
USBPacket flags passed on (i.e. have host-linux use
USBFS_URB_BULK_CONTINUATION as needed).

Note that the need for host-linux to use USBFS_URB_BULK_CONTINUATION +
SHORT_NOT_OK flags with older kernels, and to simply no longer split with
newer kernels, already exists! A single ehci td can span 20k, and we
split at 16k, so an input transfer ending short in the first 16k will
already cause the wrong thing to happen (the sense data of usb-ms gets read
as the next 4k), without input pipelining coming into play at all, and with
the xhci emulation we can already get much larger in transfers...

The reason why I disable input pipelining for host-linux for now in my
patchset is because it will make the problem worse, not because the
problem is new.

For the full details on all the magic needed to correctly handle
larger input transfers on usbdevfs with all possible different
kernel versions, please see:
https://github.com/libusbx/libusbx/commit/ede02ba91920f9be787a7f3cd006c5a4b92b5eab

Or just switch to libusb ...

This way usb-host and usbredir will see what is really going on instead
of having the usb core magically fixing up stuff for them.

I believe this is based on you thinking simply passing along a short-not-ok
flag all the way down into usbfs will be enough, but it definitely is not
enough, since the XHCI hardware has no notion of short-not-ok, the only
way to do input pipelinig with an XHCI attached device is to re-combine
the split packets into a single large input transfer. And the proper
place for such combining code is in the core IMHO.

I think this
will serve us better long-term.  Maybe you are seeing this "data *and*
stall" issue (patch 1) exactly because of all this combining & splitting
logic?

I'm not seeing the data and stall condition in practice at all, but while
designing all this I noticed that this can happen. a single ehci td
can be a multiple of the endpoints maxPacketSize and then the endpoint
can respond with maxPacketSize data, maxPacketSize data, stall at which
point we will have both data and a stall condition in one.

And the Linux ehci + usbfs + libusb + usbredir code all handle passing
along both data and a stall as a result for one packet / transfer,
so this is something which eventually we will need to support in
the qemu usb core too. For now I decided on the quick fix in usbredir
to cover the eventuality of this actually happening and added a
warning to see if this actually ever happens (which it seems not to
in my testing done sofar).

Regards,

Hans




reply via email to

[Prev in Thread] Current Thread [Next in Thread]