Re: [Qemu-devel] scsi-generic and max request size


From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] scsi-generic and max request size
Date: Wed, 22 Dec 2010 09:05:26 +1100

> > So back to square 1 ... my vscsi (and virtio-blk too btw) can
> > technically pass a max size to the guest, but we don't have a way to
> > interrogate scsi-generic (and the underlying block driver) which is the
> > main issue (that plus the fact that the ioctl seems to be broken in
> > "compat" mode for /dev/sg specifically)...
> > 
> Ah, the warm and fuzzy feeling of knowing one is not alone in this ...
> 
> This is basically the same issue I brought up with the first
> submission round of my megasas emulation.

heh.

> As we're passing scatter-gather lists directly to the underlying
> device, we might end up sending a request which is improperly
> formatted. The Linux block layer has three limits that a request
> has to fit within:
> - Max length of the scatter-gather list (max_sectors)
> - Max overall request size (max_segments)

Didn't you swap the two above? max_sectors is the max overall req. size
and max_segments the max number of SG elements afaik :-)

> - Max length of individual sg elements (max_segment_size)

> Newer kernels export these limits via sysfs, since commit
> c77a5710b7e23847bfdb81fcaa10b585f65c960a.
> For older kernels, however, we're left in the dark here.

Well, first of all, "sg" is not there, so that doesn't help much with
the scsi-generic problem; and then parsing sysfs... yuck.
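
For what it's worth, on a kernel new enough to have those attributes, a
quick-and-dirty read of the queue limits would look something like the
sketch below (the device name is just a placeholder, and as said there
is no "sg" entry, so this only helps when we know the backing block
device):

    /* Minimal sketch: read the block queue limits that newer kernels
     * expose under /sys/block/<dev>/queue/.  Returns -1 if the
     * attribute is missing (older kernel) or unreadable. */
    #include <stdio.h>

    static long read_queue_limit(const char *dev, const char *attr)
    {
        char path[256];
        FILE *f;
        long val = -1;

        snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
        f = fopen(path, "r");
        if (!f) {
            return -1;
        }
        if (fscanf(f, "%ld", &val) != 1) {
            val = -1;
        }
        fclose(f);
        return val;
    }

    int main(void)
    {
        const char *dev = "sda";   /* placeholder backing device */

        printf("max_sectors_kb:   %ld\n", read_queue_limit(dev, "max_sectors_kb"));
        printf("max_segments:     %ld\n", read_queue_limit(dev, "max_segments"));
        printf("max_segment_size: %ld\n", read_queue_limit(dev, "max_segment_size"));
        return 0;
    }

(Mind the mixed units: max_sectors_kb is in kilobytes, max_segment_size
is in bytes, and max_segments is a plain count.)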

> So on newer kernels we could probably do a quick check on the
> block queue limits and reformat the I/O if required.

Maybe, but then "sg" isn't there. We "could", I suppose, use "sr" as an
indication though, when we know it's a cdrom.

> Instead of reformatting we could send each element of an sg
> list individually. That would introduce some slowdown, as
> the sg lists have to be reassembled again by the lower layers, but
> we would be insulated from any sg list mismatch.
> However, this won't cover requests with sg elements that are too large.
> For those we could probably use some simple divide-by-two algorithm
> on the elements to make them fit.

How can we? We need a single request to match a single sg list anyway,
no?

Let's say you get a READ10 from the guest for 200KB and your underlying
max_sectors works out to 128KB. How do you want to "break that up"? The
only way would be to make it two different READ10s, and that's a can of
worms, especially if you start putting tags into the picture...
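
Just to illustrate the can of worms: below is a rough sketch (not real
code from anywhere) of what splitting one READ10 into two means at the
CDB level alone; tagging, sense data and completion ordering are
deliberately ignored, and those are precisely the hard part:

    /* READ(10): LBA in bytes 2..5 (big-endian), transfer length in
     * blocks in bytes 7..8.  Split the original CDB into two commands,
     * the first capped at max_blocks, the second picking up the rest. */
    #include <stdint.h>
    #include <string.h>

    static void split_read10(const uint8_t cdb[10], uint16_t max_blocks,
                             uint8_t first[10], uint8_t second[10])
    {
        uint32_t lba  = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
                        ((uint32_t)cdb[4] << 8)  | cdb[5];
        uint16_t len  = (uint16_t)((cdb[7] << 8) | cdb[8]);
        uint16_t len1 = len > max_blocks ? max_blocks : len;
        uint16_t len2 = len - len1;
        uint32_t lba2 = lba + len1;

        memcpy(first, cdb, 10);
        first[7] = len1 >> 8;
        first[8] = len1 & 0xff;

        memcpy(second, cdb, 10);
        second[2] = lba2 >> 24;
        second[3] = (lba2 >> 16) & 0xff;
        second[4] = (lba2 >> 8) & 0xff;
        second[5] = lba2 & 0xff;
        second[7] = len2 >> 8;
        second[8] = len2 & 0xff;
    }

With 512-byte blocks, the 200KB example above is 400 blocks against a
256-block limit, so the second command carries the remaining 144 blocks;
and now the two completions have to be merged, errors reconciled, tags
kept straight, etc.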

> But seeing that we have to split the I/O requests anyway, we might as
> well use the divide-by-two algorithm for the sg lists, too.
> 
> The easiest thing would be if we could just transfer the available bits
> and push the request back to the guest as a partial completion.
> Sadly, the I/O stack on the guest will interpret this as an
> I/O error instead of retrying the remainder :-(
> 
> So in the long run I fear we have to implement some sort of I/O
> request splitting in Qemu, using the values from sysfs.

So in my case, I'm happy for the time being to continue doing bounce
buffering, so my only problem at the moment is the max request size
(aka max_sectors). Also, I -can- tell the guest what my limitation is;
it's part of the vscsi login protocol. I can look into doing DMA
directly to the guest SG lists later, maybe.

However, I can't quite figure out how to reliably obtain that
information in my driver since, on the one hand, the ioctl doesn't seem
to work in mixed 32/64-bit environments, and on the other hand, sysfs
doesn't seem to have anything for "sg" in /sys/class/block... Besides,
those are both Linux-isms... so we'd have to be extra careful there too.
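
To make the ioctl part concrete, assuming the query in question is the
usual BLKSECTGET (it isn't spelled out above, so treat this as a guess),
the call itself is trivial; the open question is what /dev/sg actually
returns for it and whether the 32-on-64-bit compat path translates it
at all:

    /* Hypothetical sketch, assuming BLKSECTGET is the ioctl meant. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>            /* BLKSECTGET */

    int main(int argc, char **argv)
    {
        int fd, max_sectors = 0;

        if (argc < 2) {
            fprintf(stderr, "usage: %s /dev/sgN\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (ioctl(fd, BLKSECTGET, &max_sectors) < 0) {
            perror("BLKSECTGET");    /* the broken compat case ends up here */
        } else {
            printf("max request size: %d sectors\n", max_sectors);
        }
        close(fd);
        return 0;
    }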

Cheers,
Ben.

> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke                 zSeries & Storage
> address@hidden                              +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)




