qemu-devel

Re: [Qemu-devel] [PATCH] linux-user: implement pipe2 syscall


From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH] linux-user: implement pipe2 syscall
Date: Wed, 6 May 2009 12:08:32 +0100
User-agent: Mutt/1.5.13 (2006-08-11)

Riku Voipio wrote:
> > The point of pipe2() with FD_CLOEXEC is to be atomic: make sure
> > another thread can never see the file descriptor with FD_CLOEXEC not set.
> 
> > If you can't guarantee that, it's better to return ENOSYS as every
> > application using pipe2() like this has a fallback to use pipe() and
> > FD_CLOEXEC itself, and probably has application logic to protect
> > against the race condition.
> 
> > If there's only one thread, or if you can arrange to block any
> > concurrent clone/fork/execve calls in other threads (in QEMU) during
> > the race window, then it's fine to emulate it with fcntl.
> 
> We haven't returned from the pipe2 syscall when setting the flag with fcntl.
> Before returning from the syscall, the pipe file descriptors could point
> to anything (uninitialized memory, zeros, ...)

That's not possible with file descriptors.  A user program never sees
an uninitialized descriptor - because descriptors aren't visible to
the user program (in any threads) until they are stored into the file
descriptor table for the process.  That happens once the descriptor is
completely initialised, and for pipe2() that means _after_ FD_CLOEXEC
is set.
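
To make that concrete, here is a minimal sketch (plain Linux C, not QEMU
code) of what a caller observes when pipe2() returns: both descriptors
already carry FD_CLOEXEC.  The helper names atomic_pipe_cloexec() and
fd_is_cloexec() are made up for this example, and checking after the
call only shows the end state; it cannot by itself demonstrate atomicity.

```c
#define _GNU_SOURCE             /* needed for the pipe2() declaration on glibc */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if fd has the close-on-exec flag set. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

/* Hypothetical helper: create a pipe whose two descriptors are
 * close-on-exec from the moment they enter the descriptor table.
 * Returns 0 on success, -1 with errno set on failure. */
int atomic_pipe_cloexec(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}
```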

Of course it's usually an application bug to use a specific file
descriptor from another thread, when that descriptor is still being
created :-)

But it's not a bug to call execve(), or fork() then execve(), in
another thread at the same time as descriptors are being created.
Those calls scan the whole file descriptor table, and look at the
FD_CLOEXEC flags.

The bug is that execve() in parallel with pipe()+fcntl() can result in
the file descriptor leaking into the exec'd program or a child process,
because execve() may scan the table after pipe() has stored the
descriptor but before fcntl() has set FD_CLOEXEC on it.  That's why
pipe2() exists, to fix that bug properly by making it impossible.
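
For illustration, the two-step emulation looks roughly like the sketch
below, with the race window marked.  This is a simplified example under
my own assumptions, not the code from the patch, and the function name
racy_pipe_cloexec() is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical two-step emulation of pipe2(fds, O_CLOEXEC).
 * Returns 0 on success, -1 on failure. */
int racy_pipe_cloexec(int fds[2])
{
    if (pipe(fds) != 0)
        return -1;

    /*
     * Race window: both descriptors are already in the descriptor table
     * but do not have FD_CLOEXEC yet.  A fork()+execve() or execve() in
     * another thread at this point passes them on.
     */

    if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) != 0 ||
        fcntl(fds[1], F_SETFD, FD_CLOEXEC) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}
```

In a single-threaded process the window is harmless, which is why the
emulation is fine there.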

I haven't looked too closely at how guest file descriptors are handled
in QEMU these days.  In an older version I'm looking at, guest file
descriptors are simply host file descriptors so the pipe2 emulation is
broken in this way.

If QEMU maintained a guest file descriptor table internally, emulating
what a kernel does, this would be solved automatically, but it doesn't.

You can solve it quite simply for any host kernel with the lock
solution I just posted in another mail on this thread.  The same
method works for all the other syscalls taking *_CLOEXEC flags, so
it's probably a good idea :-)
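
As a rough idea of what a lock-based approach can look like, here is a
minimal sketch assuming one process-wide pthread mutex.  It is not a
copy of the other mail and not QEMU code; fd_creation_lock,
locked_pipe_cloexec() and locked_fork() are hypothetical names.  The
emulated descriptor creation and the emulated fork() take the same lock,
so fork() can never run inside the window between pipe() and fcntl().
An emulated execve() would need the same treatment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lock shared by descriptor creation and fork/exec emulation. */
static pthread_mutex_t fd_creation_lock = PTHREAD_MUTEX_INITIALIZER;

/* Emulate pipe2(fds, O_CLOEXEC) with pipe()+fcntl() under the lock.
 * Returns 0 on success, -1 with errno set on failure. */
int locked_pipe_cloexec(int fds[2])
{
    int ret = -1;
    int saved_errno = 0;

    pthread_mutex_lock(&fd_creation_lock);
    if (pipe(fds) == 0) {
        if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) == 0 &&
            fcntl(fds[1], F_SETFD, FD_CLOEXEC) == 0) {
            ret = 0;
        } else {
            saved_errno = errno;
            close(fds[0]);
            close(fds[1]);
        }
    } else {
        saved_errno = errno;
    }
    pthread_mutex_unlock(&fd_creation_lock);

    if (ret != 0)
        errno = saved_errno;
    return ret;
}

/* Emulated fork(): waits until no descriptor is half-initialised. */
pid_t locked_fork(void)
{
    pid_t pid;

    pthread_mutex_lock(&fd_creation_lock);
    pid = fork();
    /* Runs in both parent and child; the forking thread owns the lock. */
    pthread_mutex_unlock(&fd_creation_lock);
    return pid;
}
```

A single mutex serialises all descriptor creation, which is simple but
coarse; a reader/writer lock would let several creations overlap while
still excluding fork and exec.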

-- Jamie



