Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file


From: Gabriel L. Somlo
Subject: Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
Date: Tue, 14 Jul 2015 15:24:39 -0400
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Jul 14, 2015 at 07:48:30PM +0100, Richard W.M. Jones wrote:
> > > > /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > > static inline void fw_cfg_read_blob(uint16_t select,
> > > >                                      void *buf, loff_t pos, size_t count)
> > > > {
> > > >         mutex_lock(&fw_cfg_dev_lock);
> > > >         outw(select, FW_CFG_PORT_CTL);
> > > >         while (pos-- > 0)
> > > >                 inb(FW_CFG_PORT_DATA);
> > > >         insb(FW_CFG_PORT_DATA, buf, count);
> > > >         mutex_unlock(&fw_cfg_dev_lock);
> > > > }
> > > 
> > > How slow is this?
> > 
> > Well, I think each outw() and inb() will result in a vmexit, with
> > userspace handling emulation, so much slower comparatively than
> > inserting into a list (hence mutex here, vs. spinlock there).
> 
> I wonder if using a string instruction (ie. rep insb etc) would be
> faster.  On x86, qemu specifically optimizes these.  Maybe GCC turns
> the above into a string instruction?

After some digging...

The insb call is indeed implemented as a "rep ins" in the kernel, and
"rep ins" appears to be optimized on the host/kvm side, so we might be
in luck.

The "while (pos-- > 0) inb(FW_CFG_PORT_DATA);" loop only matters for
reads that start at a nonzero offset. Most of the time pos==0, so we
don't need to skip any bytes from the given fw_cfg blob before getting
to the optimized insb.

I guess partial interleaved raw reads of different blobs are
*theoretically* possible, but I expect in practice they'll be
rather unlikely...
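
To put rough numbers on the skip-then-insb pattern, here is a minimal
userspace sketch. It is a model, not kernel or QEMU code: struct
mock_fw_cfg, the mock_* helpers and the io_ops counter are all
hypothetical names made up for illustration. It assumes that each
outw()/inb() costs one trip to the emulator, that a whole insb is
handled as a single operation, and that reads past the end of the blob
return zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one fw_cfg blob behind the ctl/data ports. */
struct mock_fw_cfg {
    const uint8_t *blob;   /* contents of the blob held by this model */
    size_t size;           /* blob length in bytes */
    size_t offset;         /* device-side read offset */
    unsigned long io_ops;  /* port operations issued by the "guest" */
};

/* Models outw(select, FW_CFG_PORT_CTL): a select rewinds the offset. */
static void mock_select(struct mock_fw_cfg *dev)
{
    dev->offset = 0;
    dev->io_ops++;
}

/* Models inb(FW_CFG_PORT_DATA): one byte per port operation. */
static uint8_t mock_inb(struct mock_fw_cfg *dev)
{
    uint8_t b = dev->offset < dev->size ? dev->blob[dev->offset] : 0;

    dev->offset++;
    dev->io_ops++;
    return b;
}

/*
 * Models insb(FW_CFG_PORT_DATA, buf, count). Counted as ONE operation,
 * on the assumption that the host handles the whole "rep ins" at once.
 */
static void mock_insb(struct mock_fw_cfg *dev, void *buf, size_t count)
{
    uint8_t *out = buf;
    size_t avail = dev->offset < dev->size ? dev->size - dev->offset : 0;
    size_t n = count < avail ? count : avail;

    if (n)
        memcpy(out, dev->blob + dev->offset, n);
    memset(out + n, 0, count - n);   /* zero-fill past end of blob */
    dev->offset += count;
    dev->io_ops++;
}

/* Same shape as fw_cfg_read_blob() above, minus the mutex. */
static void mock_read_blob(struct mock_fw_cfg *dev, void *buf,
                           size_t pos, size_t count)
{
    assert(buf != NULL);
    mock_select(dev);
    while (pos-- > 0)
        (void)mock_inb(dev);         /* skip bytes one at a time */
    mock_insb(dev, buf, count);
}
```

Under those assumptions a read at pos==0 costs two operations (select
plus insb) regardless of count, while a read at offset pos costs
pos + 2. Because every call re-selects and skips from the start,
interleaved partial reads of different blobs would stay correct in
this model; they would just pay the skip each time.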

Thanks,
--Gabriel

> The reason I note all this is because there has been an ongoing
> discussion about the slowness of fw_cfg.  Starting in 2010 in fact:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2010-07/msg00962.html
> https://lists.gnu.org/archive/html/qemu-devel/2011-10/msg00996.html
> 
> On aarch64 kernel loading is really slow because it can only transfer
> (IIRC) 8 bytes at a time, and there are no string instructions we can
> use to speed it up.
> 
> A long time ago I wrote a memcpy and a "pseudo-DMA" interface for
> fw_cfg, but they were both roundly rejected as you can find in the
> archives.


