
Re: [Qemu-devel] [RFC] [PATCHv2 2/2] Adding basic calls to libseccomp in


From: Corey Bryant
Subject: Re: [Qemu-devel] [RFC] [PATCHv2 2/2] Adding basic calls to libseccomp in vl.c
Date: Tue, 19 Jun 2012 12:51:21 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0



On 06/19/2012 11:37 AM, Will Drewry wrote:
On Tue, Jun 19, 2012 at 8:35 AM, Corey Bryant <address@hidden> wrote:


On 06/18/2012 06:14 PM, Will Drewry wrote:

[-all]

On Mon, Jun 18, 2012 at 4:53 PM, Corey Bryant <address@hidden> wrote:



On 06/18/2012 04:18 PM, Blue Swirl wrote:


On Mon, Jun 18, 2012 at 3:22 PM, Corey Bryant <address@hidden> wrote:




On 06/18/2012 04:33 AM, Daniel P. Berrange wrote:



On Fri, Jun 15, 2012 at 07:04:45PM +0000, Blue Swirl wrote:



On Wed, Jun 13, 2012 at 8:33 PM, Daniel P. Berrange <address@hidden> wrote:



On Wed, Jun 13, 2012 at 07:56:06PM +0000, Blue Swirl wrote:



On Wed, Jun 13, 2012 at 7:20 PM, Eduardo Otubo <address@hidden> wrote:



I added a syscall struct using priority levels as described in the
libseccomp man page. The priority numbers are based on the frequency
with which each syscall appears in a sample strace from a regular qemu
guest run under libvirt.

Libseccomp generates linear BPF code to filter system calls; the
rules are evaluated one after another. The priority system places the
most common rules first in order to reduce the overhead of processing
them.
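
To illustrate why ordering matters for a linear filter, here is a toy
model in C. It is only a sketch and not libseccomp's actual code
generator: struct rule, order_rules() and comparisons_until_match() are
hypothetical names invented for this illustration. Rules are sorted by
descending priority, and the cost of a lookup is the number of
comparisons made before a match.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rule record: a syscall number plus a 0-255 priority. */
struct rule {
    int syscall_nr;
    unsigned char priority;
};

/* qsort comparator: higher priority sorts first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct rule *ra = a;
    const struct rule *rb = b;

    return (int)rb->priority - (int)ra->priority;
}

/* Reorder the rule list so the most frequent syscalls are tested first. */
static void order_rules(struct rule *rules, size_t n)
{
    qsort(rules, n, sizeof(rules[0]), by_priority_desc);
}

/*
 * Model of a linear filter: count the comparisons made before `nr`
 * matches. A syscall with no rule costs a full pass over the list.
 */
static size_t comparisons_until_match(const struct rule *rules, size_t n,
                                      int nr)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].syscall_nr == nr) {
            return i + 1;
        }
    }
    return n;
}
```

In this model a syscall given the highest priority is matched after a
single comparison, while one with no rule at all pays for the whole
list before the default action applies.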

Also, since this is just a first RFC, the whitelist is a little raw.
We might need your help to improve, test and fine-tune the set of
system calls.

v2: Fixed some style issues
        Removed code from vl.c and created qemu-seccomp.[ch]
        Now using ARRAY_SIZE macro
        Added more syscalls without priority/frequency set yet

Signed-off-by: Eduardo Otubo <address@hidden>
---
  qemu-seccomp.c |   73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  qemu-seccomp.h |    9 +++++++
  vl.c           |    7 ++++++
  3 files changed, 89 insertions(+)
  create mode 100644 qemu-seccomp.c
  create mode 100644 qemu-seccomp.h

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
new file mode 100644
index 0000000..048b7ba
--- /dev/null
+++ b/qemu-seccomp.c
@@ -0,0 +1,73 @@




Copyright and license info missing.

+#include <stdio.h>
+#include <seccomp.h>
+#include "qemu-seccomp.h"
+
+static struct QemuSeccompSyscall seccomp_whitelist[] = {




'const'

+    { SCMP_SYS(timer_settime), 255 },
+    { SCMP_SYS(timer_gettime), 254 },
+    { SCMP_SYS(futex), 253 },
+    { SCMP_SYS(select), 252 },
+    { SCMP_SYS(recvfrom), 251 },
+    { SCMP_SYS(sendto), 250 },
+    { SCMP_SYS(read), 249 },
+    { SCMP_SYS(brk), 248 },
+    { SCMP_SYS(clone), 247 },
+    { SCMP_SYS(mmap), 247 },
+    { SCMP_SYS(mprotect), 246 },
+    { SCMP_SYS(ioctl), 245 },
+    { SCMP_SYS(recvmsg), 245 },
+    { SCMP_SYS(sendmsg), 245 },
+    { SCMP_SYS(accept), 245 },
+    { SCMP_SYS(connect), 245 },
+    { SCMP_SYS(bind), 245 },




It would be nice to avoid connect() and bind(). Perhaps seccomp init
should be postponed to after all sockets have been created?




If you want to migrate your guest, you need to be able to
call connect() at an arbitrary point in the QEMU process'
lifecycle. So you can't avoid allowing connect(). Similarly
if you want to allow hotplug of NICs (and their backends)
then you need to have both bind() + connect() available.




That's bad. Migration could conceivably be extended to use file
descriptor passing, but hotplug is more tricky.




As with execve(), I'm reporting this on the basis that on the previous
patch posting I was told we must whitelist any syscalls QEMU can
conceivably use to avoid any loss in functionality.




Thanks for pointing out syscalls needed for the whitelist.

As Paul has already mentioned, it was recommended that we restrict all
of QEMU (as a single process) from the start of execution.  This is as
opposed to the other options: restricting QEMU from the time that the
vCPUs start, further restricting based on syscall parameters, or
decomposing QEMU into multiple processes that are individually
restricted with their own seccomp whitelists.



Can each thread have separate seccomp whitelists? For example CPU
threads should not need pretty much anything but the I/O thread needs
I/O.


No, seccomp filters are defined and enforced at the process level.


I'll keep lurking :) especially since I don't know the internals of
qemu well, but you can do per-thread seccomp filters since
processes==threads on linux. The real risk is that threads share so
much that an attack on the CPU thread may be able to parlay that into
a syscall proxy on another thread.  Probably what would make sense
in that case is a loose global filter, then have each sub-thread
install a functionality-specific second filter.

I may be way off base though, so feel free to just tell me to keep lurking
:)

Thanks again for all the support and for pushing hard to get this
functionality in qemu!


Please keep lurking!  I appreciate the input and education.  :)

So whether it's a thread or a process, I assume it will have its own
task_struct, allowing us to set a filter per thread or per process.  The
difference being that threads share more resources than processes.  Sort of
thinking out loud here to see if I'm right.

Exactly!

It doesn't seem ideal vs process separation, but it's do-able.

Yep -- so for something like qemu, you could install a global baseline
policy (e.g., union of all needed syscalls) then for each thread, they
can install a more restrictive set.  The actual security guarantees
will be the total synthesis because of cross-thread attacks, but it
would make exploitation pretty painful.
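
The per-thread half of that layering can be sketched with the raw
prctl() interface. This is a minimal illustration under stated
assumptions, not code from the patch: deny_syscall_for_this_thread(),
filtered_worker() and struct filter_probe are hypothetical names, the
filter denies a single harmless syscall (getppid) instead of applying a
real whitelist, and the architecture check that a production filter
needs is omitted for brevity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Attach a filter to the calling thread only: `denied_nr` fails with
 * EPERM, every other syscall is allowed. Filters stack, so this adds
 * to (and can only tighten) whatever the thread already inherited.
 */
static int deny_syscall_for_this_thread(int denied_nr)
{
    struct sock_filter insns[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* Match: fall through to the ERRNO return; else skip to ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (uint32_t)denied_nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };

    /* Required for unprivileged callers before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        return -1;
    }
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Result record filled in by the worker thread. */
struct filter_probe {
    int install_rc;   /* 0 if the filter was installed */
    int worker_errno; /* errno seen by the filtered getppid call */
};

/* Thread body: restrict this thread, then try the denied syscall. */
static void *filtered_worker(void *arg)
{
    struct filter_probe *p = arg;
    long rc;

    p->install_rc = deny_syscall_for_this_thread(SYS_getppid);
    errno = 0;
    rc = syscall(SYS_getppid);
    p->worker_errno = (rc == -1) ? errno : 0;
    return NULL;
}
```

If the filter installs, getppid fails with EPERM inside the worker
while the thread that created it can still call it, since the filter is
attached to the worker's task only. A baseline filter installed before
any threads are created would be inherited by all of them, with each
thread adding its own filter on top.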

If you want better guarantees, then process separation is needed.  One
option is even doing brokering for complex syscalls using either
ptrace or a sigsys handler, but that is likely too much to get into
while establishing a baseline.


In response to "Can each thread have separate seccomp whitelists?" please take a look at the thread above from Will Drewry. seccomp *can* be used per thread. However, it's not ideal vs per process seccomp filters.

--
Regards,
Corey


You don't mind if I share your input with the others, do you?

Of course not!

cheers!


--
Regards,
Corey




I think this approach is a good starting point that can be further
tuned in the future.  And as with most security measures, defense in
depth improves the cause (e.g. combining seccomp with DAC or MAC).



Agreed.


--
Regards,
Corey












