Re: [Qemu-devel] Block I/O optimizations


From: Abel Gordon
Subject: Re: [Qemu-devel] Block I/O optimizations
Date: Thu, 28 Feb 2013 20:20:08 +0200

Stefan Hajnoczi <address@hidden> wrote on 28/02/2013 04:43:04 PM:

> > I see your point, but the shared process only needs access to
> > the virtio rings/buffers (not necessarily the entire memory of
> > all the guests), the network sockets and the image files opened by
> > all the qemu user-space processes. So, if you have a security hole,
> > an attacker can get access only to all these resources.
>
> You must never be able to get access to other VMs disk/memory/network.
> That is game over:
>  * Disk - you can steal their data or tamper with it.
>  * Memory - same as disk really, because you can inject code to do
>    anything you want.
>  * Network - you can spoof the guest or monitor its traffic.  Although
>    due to crypto this is the least dangerous of the three resources.

I agree with all the points you mentioned, but you need to differentiate
between getting access to the "qemu" process and getting access to the
"shared I/O process".  Assuming some attacker got access to the "qemu"
process, the attacker can read/write the I/O queues of this qemu
process but not the I/O queues of the other processes.
If the attacker got access to the shared I/O process, then he would
get access to the resources of all the VMs, as you mentioned. However,
remember that the shared I/O process does NOT execute/emulate the
VM code. The shared process just processes I/O requests, so the VM code
cannot attack the shared I/O process directly. The VM code must first
exploit some security bug in the I/O path to inject code. With a good
design, proper precautions, and the assumption that a qemu process can
become malicious, it is possible to secure the shared I/O process.
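
As an illustration (a hypothetical sketch, not existing QEMU code; all
names are made up), the shared I/O process can treat every field it
reads from a guest-visible ring as untrusted and bounds-check it against
the memory region registered for that particular VM before touching
guest memory:

  /* Hypothetical sketch: validate a guest-supplied descriptor against
   * the single memory region registered for that VM.  A compromised
   * qemu can corrupt its own ring, but these checks keep it from
   * steering the shared process at other VMs' memory. */
  #include <stdint.h>
  #include <errno.h>

  struct desc { uint64_t addr; uint32_t len; };

  static int validate_desc(const struct desc *d,
                           uint64_t base, uint64_t size)
  {
      if (d->len == 0 || d->len > size)
          return -EINVAL;
      /* Overflow-safe containment check: [addr, addr+len) in region. */
      if (d->addr < base || d->addr - base > size - d->len)
          return -EINVAL;
      return 0;
  }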

> > With the traditional model (not shared thread), if you have a security
> > hole in qemu then an attacker will be able to exploit exactly the same
> > security hole to obtain access to the resources "all the qemu
> > instances" have access to. I don't see why a security hole in qemu
> > will work only for VM1 and not VM2...they are hosted using exactly
> > the same qemu code.
>
> Most QEMU security holes will not be remote exploitable.  Now if the
> attacker has access to VM1 (they are renting a VM on your cloud) they
> cannot get access to VM2 due to isolation.

Let's assume you are right: most qemu security holes will not be
remotely exploitable and will require planting some code in the
VM image.  That means a malicious VM can take control of its host
qemu process but not the shared I/O process. In this case, the malicious
qemu process can't access the resources of other VMs. It can try to
"attack" the shared I/O process (e.g. inject code via the I/O path,
impersonate another VM...), but the shared I/O process can be designed
assuming a qemu process may become compromised.

> > If you move the virtio back-end from qemu to a different user-space
> > process,
> > it will be easier to analyze, maintain the code, and detect security
> > bugs.
>
> I agree with this to some extent.  It's the micro-kernel vs monolithic
> kernel debate.  In theory micro-kernel is a nicer design.
>
> > Maybe, you can use this model also to improve security:
> > you can give access to the network/disk only to the shared virtio
> > back-end process and not to the qemu processes...
>
> That's not possible to achieve as long as the QEMU process has the guest
> memory and can control guest execution.  QEMU could inject guest code
> that accesses the disk.

I think it is possible, but it requires major changes (e.g. moving "all"
the I/O logic from qemu to the shared I/O process). In this case, the
QEMU process only controls the execution/emulation and memory of the VM
and has no access to the network/disk.
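
As a hedged sketch of what that could look like (using libseccomp; the
function name is illustrative and nothing like this exists in qemu
today), the per-VM qemu process could give up the ability to open files
or create sockets once the shared I/O process owns all the disk/network
descriptors:

  /* Hypothetical sketch: after setup, this qemu process can no longer
   * acquire disk or network resources on its own; only the shared I/O
   * process holds them. */
  #include <seccomp.h>
  #include <errno.h>

  static int drop_io_syscalls(void)
  {
      scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
      if (!ctx)
          return -1;
      /* Fail any attempt to open files or create sockets. */
      seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(open), 0);
      seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(openat), 0);
      seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(socket), 0);
      int rc = seccomp_load(ctx);
      seccomp_release(ctx);
      return rc;
  }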


> > The logic to select which queue should be processed, and how many
> > requests should be processed from that specific queue, is implemented
> > in user-space and depends on the ongoing I/O activity in all the
> > queues. Note that with this model you can process many queues in less
> > than one scheduler time slice.
> > With the traditional thread-per-device model, it is actually the
> > Linux scheduler that decides which queue will be processed and the
> > amount of requests that will be processed for that specific queue
> > (the cycles the thread runs). Note that Linux has no information
> > about the ongoing activity and status of the queues. The scheduler
> > only knows if a thread is waiting (empty queue) or is ready to run
> > (queue has data = event signaled).
> >
> > Finally, with the shared-thread model you have significantly fewer
> > thread/process context switches compared to one I/O thread per qemu
> > process.
>
> This final point is the only one that I can 100% agree with.

And it may be a major issue from the performance perspective
if you take into account the additional TLB flushes and the TLB
and data cache misses caused by the process context switches.
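
To make the context-switch point concrete, here is a minimal sketch
(hypothetical names, assuming one kick eventfd per virtqueue) of a
single shared thread waiting on many VMs' queues with epoll, so moving
from one queue to the next never crosses a thread boundary:

  /* Hypothetical sketch: one thread services N queues; switching
   * between queues is just another loop iteration, not a process
   * context switch with its TLB/cache cost. */
  #include <sys/epoll.h>

  struct queue { int kick_fd; /* per-virtqueue eventfd */ };

  void process_queue(struct queue *q);  /* assumed defined elsewhere */

  void shared_loop(struct queue *qs, int nq)
  {
      int ep = epoll_create1(0);
      for (int i = 0; i < nq; i++) {
          struct epoll_event ev = { .events = EPOLLIN,
                                    .data.ptr = &qs[i] };
          epoll_ctl(ep, EPOLL_CTL_ADD, qs[i].kick_fd, &ev);
      }
      for (;;) {
          struct epoll_event evs[16];
          int n = epoll_wait(ep, evs, 16, -1);
          for (int i = 0; i < n; i++)
              process_queue(evs[i].data.ptr);  /* stays in one thread */
      }
  }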

> Everything else can be handled by a system that is designed to use the
> Linux scheduler rather than bypass it:
>
> 1. Use a budget to set a hard limit on the amount of resources to expend
>    per queue per iteration.  (We don't do this today.)

But remember there are two things here: how many CPU cycles you consume
to process a request and how many requests (I/O operations) you process
per iteration. The cycles-per-byte cost is not constant: it depends on
the system, the type of request, the load, and so on.
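
A hedged sketch of how both limits could be combined (all names are
illustrative): cap the requests per iteration and also the time budget,
since a fixed request count alone does not bound the cycles spent:

  /* Hypothetical sketch: per-queue budget with two limits, a request
   * count and a time budget, because cycles-per-request varies with
   * the request type and the load. */
  #include <stdint.h>
  #include <stdbool.h>
  #include <time.h>

  bool queue_pop_and_process(void *q);  /* assumed: false = empty */

  static uint64_t now_ns(void)
  {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
  }

  void serve_queue(void *q, int max_reqs, uint64_t budget_ns)
  {
      uint64_t start = now_ns();
      for (int done = 0; done < max_reqs; done++) {
          if (!queue_pop_and_process(q))
              break;                        /* queue is empty */
          if (now_ns() - start >= budget_ns)
              break;                        /* budget exhausted */
      }
  }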

> 2. Use an I/O resource controller (cgroups blkio controller or QEMU I/O
>    throttling, which are both supported today) to set a per-guest
>    quality of service (max IOPS, max bandwidth, priorities).  Note that
>    this isn't about CPU scheduling, it's about I/O request scheduling.

Exactly, but with the thread-per-device model the I/O is actually
scheduled by the CPU scheduler, which has no inner knowledge of the
content of the queues. From the Linux perspective, the queues are just
a bunch of memory, not "pending" I/O requests...

> 3. Choose an appropriate I/O scheduler on the host (e.g. deadline) to
>    meet your requirements.  This is possible today.

I am not sure it will work properly. You need to differentiate between
I/O requests pending in the virtual device queues and I/O requests
processed/pending in the host. These are not the same.
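
For completeness, a small sketch of point 3 (writing the standard sysfs
knob; the disk name is illustrative), which only affects requests that
have already reached the host block layer:

  /* Sketch: select the deadline elevator for one host disk.  This
   * reorders requests already submitted to the host; it cannot see
   * requests still sitting in a guest's virtqueue. */
  #include <stdio.h>

  int set_host_elevator(const char *disk)   /* e.g. "sda" */
  {
      char path[128];
      snprintf(path, sizeof(path),
               "/sys/block/%s/queue/scheduler", disk);
      FILE *f = fopen(path, "w");
      if (!f)
          return -1;
      fputs("deadline", f);
      return fclose(f);
  }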

> 4. Use thread priorities to favor specific guests.  (We don't do this
>    today.)

You would be using a CPU prioritization mechanism to prioritize I/O.
I don't think this is a good direction, and in general it is very
difficult to tune thread priorities to prioritize I/O.
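
For reference, point 4 would amount to something like the following
(a sketch using standard Linux calls; nothing in qemu does this today),
which biases CPU time only, not the order in which I/O requests are
served:

  /* Sketch: give the calling I/O thread a better nice level.  This
   * favours its CPU scheduling but says nothing about which pending
   * I/O request is more urgent. */
  #include <sys/resource.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int favour_current_thread(int nice_level)   /* e.g. -5 */
  {
      pid_t tid = (pid_t)syscall(SYS_gettid);
      return setpriority(PRIO_PROCESS, tid, nice_level);
  }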

> I think extending and tuning the existing mechanisms is the way to go.
> I don't see obvious advantages other than reducing context switches.

Maybe it is worth checking...
We did experiments using vhost-net and vhost-blk. We measured and compared
the traditional model (a kernel thread per VM/virtual device) to the
shared-thread model with fine-grained I/O scheduling (a single kernel
thread used to serve multiple VMs). We saw improvements of up to 2.5x
in throughput and almost half the latency when running up to 14 VMs.

> You also lose out on the I/O scheduler since everything is being
> submitted by a single shared thread.

...but the single thread actually schedules the I/O more efficiently.

> Since I don't see a killer advantage, both approaches can achieve good
> results.

Based on our experiments, the shared thread with fine-grained
I/O scheduling can significantly improve performance and scalability.

> KVM's philosophy is to make use of Linux instead of
> duplicating its functionality, so using the scheduler is in the spirit
> of that.

It's not about duplicating Linux functionality, it's about improving
the qemu "virtual I/O" model. The code you use to handle virtual queues
will always be different from kernel I/O code. I don't see why QEMU
queue handling should be treated/designed differently than any other
user-space process that handles (I/O) requests from different
clients/queues, such as web servers, file servers, message queue
servers...



