
Re: [Qemu-devel] "usb: uhci: Look up queue by address, not token"


From: Hans de Goede
Subject: Re: [Qemu-devel] "usb: uhci: Look up queue by address, not token"
Date: Sat, 17 Nov 2012 11:20:09 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1

Hi,

On 11/16/2012 04:06 PM, Jan Kiszka wrote:
On 2012-11-16 14:29, Hans de Goede wrote:
Hi,

On 11/15/2012 04:40 PM, Jan Kiszka wrote:
Hi Hans,

On 2012-11-15 16:19, Hans de Goede wrote:
Hi Jan,

I just saw your $subject patch in Gerd's usb-next tree, and I've a question
about it. The token should be enough to uniquely identify a device + ep,
and unless a guest uses multiple qhs for a single ep, that _should_ be enough.

But what prevents the guest from issuing multiple requests (QH + series
of TDs) for a single endpoint? I'm not finding any trace in the spec
that disallows this.

It may not be explicitly forbidden in the spec, but the whole text of the
spec implies a one-on-one relationship between qhs and eps, and this is how
all major guests do it.
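
For reference, the reason the token works as a key is that the TD token
dword already carries both the device address and the endpoint number.
A rough sketch of what that gives us (field layout as per the UHCI 1.1
spec; the helper name and the exact key encoding are just made up for
illustration, this is not the actual hcd-uhci.c code):

#include <stdint.h>

/* UHCI TD token dword layout: PID [7:0], device address [14:8],
 * endpoint [18:15], data toggle [19], max length [31:21]. */
uint32_t example_queue_key(uint32_t td_token)
{
    uint32_t devaddr = (td_token >> 8) & 0x7f;   /* 7-bit device address */
    uint32_t ep      = (td_token >> 15) & 0x0f;  /* 4-bit endpoint number */

    /* device + endpoint identifies the target pipe, which is all we need
     * as long as the guest keeps one qh per endpoint */
    return (devaddr << 4) | ep;
}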

Was afraid of this...


And my special guest is stumbling over that limitation in QEMU.

I guess you're not at liberty to disclose what guest this is?


Hairy legacy code with additions I'm not yet able to share.

<snip>

The reason why I want to know is that identifying the queue by qh has a
disadvantage; to be precise, I believe the following can then happen:

1) The guest queues up multiple requests for a queue
2) We execute one, go async
3) The guest changes its mind and unlinks the qh
4) The guest will think the queue is cancelled after frindex has
changed by 1, but we keep it around for 64 frames (which sucks,
and I want to improve on this, but we need to for various reasons)
5) The guest submits some new requests to the queue, using a
new qh (which possibly has a different address).
6) We see the new qh, and execute a request from there
7) The 1st request on the old qh completes, and we execute the next
8) Now things are a mess as we're executing requests from the old
(cancelled from the guest pov) and new queue intermixed...

Using the token to identify the queue fixes this, because we will
find the same queue for the old and new qh, and uhci_queue_verify()
will then fail because of the qh addr change, and then we cancel
the old queue. IOW we then do the right thing.
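
To make that concrete, here is a rough sketch of the idea (the struct
and the lookup / cancel helpers are hypothetical, named only for
illustration; this is not the actual hcd-uhci.c code):

#include <stdint.h>
#include <stddef.h>

typedef struct ExampleQueue {
    uint32_t token;    /* device address + endpoint, taken from the td token */
    uint32_t qh_addr;  /* guest address of the qh this queue belongs to */
    /* ... pending async packets ... */
} ExampleQueue;

/* hypothetical helpers, for illustration only */
ExampleQueue *find_queue_by_token(uint32_t token);
ExampleQueue *new_queue(uint32_t token, uint32_t qh_addr);
void cancel_queue(ExampleQueue *q);

/* Look the queue up by token, then verify the qh address. If the guest
 * unlinked the old qh and queued new work on a new qh for the same ep,
 * the address check fails, the stale queue gets cancelled, and old and
 * new requests never get intermixed. */
ExampleQueue *example_get_queue(uint32_t token, uint32_t qh_addr)
{
    ExampleQueue *q = find_queue_by_token(token);

    if (q && q->qh_addr != qh_addr) {
        cancel_queue(q);   /* qh changed -> guest cancelled the old queue */
        q = NULL;
    }
    return q ? q : new_queue(token, qh_addr);
}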

I was afraid of triggering some conflict but, as pointed out above, the
current code also does something wrong. It associates two different
active QHs with the same queue and then triggers a failure for the first
QH as it wrongly thinks the guest has unlinked it.


So I'm wondering if there is another, potentially better fix for
what you are seeing?

/me too. I'm not fully understanding your scenario above yet,
specifically when and how the guest can correctly remove an active QH,

A guest can unlink an active qh when it wants to cancel the requests
in there (i.e. a driver-level timeout kicks in). This is quite a normal
thing to do (unlike using multiple qhs for a single ep).

For ehci we can reliably detect this happening (except for interrupt
endpoints, sigh) thanks to the "doorbell" mechanism. But for uhci it
is harder, and things are further complicated by the fact that
Linux' full speed recovery code for bulk endpoints temporarily unlinks
and then relinks a pending request to make some changes to the qh
while requests are in flight ... (bad Linux, bad!).

So we rely on a combination of measures to detect cancels in the
qemu uhci code, of which the qh validation is one, and this gets
broken by your patch.

How should unlink be defined here? A qh is no longer present in any
chain of any frame? For how long?

That is the problem. For async and control transfers it would
be "the qh is no longer present in the chain for the current frame",
which we could use if it were not for Linux doing its unlink / relink
trick (one could argue a uhci spec violation) for full speed recovery
on bulk endpoints.

Another problem is interrupt queue heads, which are only present in the
schedule once every N frames, where N is the interrupt ep's polling interval.

Real hardware does not have any issues with this, since it simply
stops talking to the device if it stops encountering tds for it, which
allows dirty tricks like the one Linux is using ... But if we've sent
off a request to the device, and the device is handling it async, then
we need to cancel the request, which means actively detecting unlinks.
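
Just to illustrate what "no longer present in the chain for the current
frame" would look like in code, a rough sketch (link pointer bit layout
as per the UHCI spec; the DMA read helper is assumed, the step limit is
arbitrary, and this deliberately ignores the Linux relink trick and the
interrupt qh problem described above):

#include <stdint.h>
#include <stdbool.h>

#define UHCI_LINK_TERM  0x1   /* T: terminate, end of chain */
#define UHCI_LINK_QH    0x2   /* Q: 1 = entry points to a qh, 0 = to a td */

/* assumed guest-memory read helper, stands in for the emulator's DMA access */
uint32_t guest_read_le32(uint32_t addr);

/* Is qh_addr still reachable through the current frame's chain? Walk the
 * horizontal links starting at the frame list entry, with a step limit
 * to guard against looping schedules. */
bool example_qh_still_linked(uint32_t frame_entry, uint32_t qh_addr)
{
    uint32_t link = frame_entry;

    for (int steps = 0; steps < 256; steps++) {
        if (link & UHCI_LINK_TERM) {
            return false;             /* end of chain, qh not found */
        }
        uint32_t addr = link & ~0xfu;
        if ((link & UHCI_LINK_QH) && addr == qh_addr) {
            return true;              /* qh is still linked in this frame */
        }
        /* dword 0 of both qhs and tds is the (horizontal) link pointer */
        link = guest_read_le32(addr);
    }
    return false;                     /* give up on overly long chains */
}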




so I cannot provide good suggestions at this point.

I see 2 solutions at this point:
1) You modify your special guest if you can
2) We add a property to the uhci emulation to search for queues
by address rather than by token, which would of course default
to false (see the sketch after this list for what I mean).
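
Roughly something like this, with the property just selecting the lookup
key (the field name, the helpers and how the property would get
registered are all made up for illustration; this is not the actual
uhci code):

#include <stdint.h>
#include <stdbool.h>

typedef struct ExampleQueue ExampleQueue;

typedef struct ExampleUHCIState {
    bool lookup_by_qh;   /* hypothetical property, default false */
    /* ... */
} ExampleUHCIState;

/* hypothetical lookup helpers, for illustration only */
ExampleQueue *example_lookup_by_qh(ExampleUHCIState *s, uint32_t qh_addr);
ExampleQueue *example_lookup_by_token(ExampleUHCIState *s, uint32_t token);

/* The property only changes which key we search by, everything else
 * (verification, cancellation) would stay the same. */
ExampleQueue *example_find_queue(ExampleUHCIState *s,
                                 uint32_t token, uint32_t qh_addr)
{
    return s->lookup_by_qh ? example_lookup_by_qh(s, qh_addr)
                           : example_lookup_by_token(s, token);
}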

Well, this particular thing worked on real hw,

I guess this means that the code is using depth first rather than
breadth first, thus avoiding the problem of intermixing tds from the
2 qhs pointing to the same ep. This means, however, that traffic
to other eps, including eps on the same device, will be bandwidth
starved, as it won't get serviced at all until all the tds in the qh
which is using depth first schedule traversal are finished!

Normally one would use depth first traversal *only* for qhs for
interrupt endpoints, and use breadth first traversal for qhs
for control / bulk endpoints, dividing the remaining bandwidth
fairly between all eps. But this of course requires a one-on-one
relationship between qhs and eps, as having multiple qhs point to
1 ep with breadth first traversal will lead to chaos.
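
For anyone following along: the depth / breadth choice is made per td,
via the Vf bit in the td link pointer. A rough sketch of the traversal
rule (bit layout as per the UHCI spec, the rest is a simplified
illustration, not qemu's actual scheduler):

#include <stdint.h>

#define UHCI_LINK_TERM   0x1  /* T:  end of chain */
#define UHCI_LINK_QH     0x2  /* Q:  next element is a qh */
#define UHCI_LINK_DEPTH  0x4  /* Vf: 1 = depth first, 0 = breadth first */

/* After finishing a td, where does the controller go next? Depth first
 * keeps draining the same queue (follow the td's link pointer); breadth
 * first moves on to the next qh in the frame chain (follow the qh's
 * horizontal link), which is what shares the bandwidth between eps. */
uint32_t example_next_element(uint32_t td_link, uint32_t qh_link)
{
    if (!(td_link & UHCI_LINK_TERM) && (td_link & UHCI_LINK_DEPTH)) {
        return td_link;   /* stay on this qh: other qhs get starved */
    }
    return qh_link;       /* breadth first: advance to the next qh */
}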

so I thought it would be good to adjust QEMU.

I agree that something which works on real hardware should work on
qemu too, without requiring special flags, but old (pre-xhci) usb
controllers are just very, very hard to emulate.

One of the reasons is that cancellation of outstanding requests is
handled by the guest simply dropping the request from the schedule,
without ever telling the controller. A real controller won't care,
since it does not keep any state for outstanding requests; it just
sends data to / requests data from the device each time it encounters a td.

But as I'm struggling with my guest anyway, it may
turn out that we will no longer need this in the end - at least in this
case. Still, I'm left with a bad feeling regarding the current heuristics
for identifying invalidated queues.

As the person who wrote the recent invalid queue detection code, I can
say that I'm actually reasonably confident in it :)

Regards,

Hans


