I like the ioctl() interface. If its overhead matters in your hot path, I suspect you're doing it wrong; use irqfds and ioeventfds instead. You could fix the semantic mismatch by having a notion of a "current process's VM" and a "current thread's VCPU", and then use just the one /dev/kvm file descriptor.
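For what it's worth, moving a doorbell off the ioctl() path looks roughly like the sketch below. KVM_IOEVENTFD and struct kvm_ioeventfd are the real interface from <linux/kvm.h>; the helper name register_doorbell() and its parameters are made up for illustration, and vm_fd is assumed to come from KVM_CREATE_VM.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask KVM to signal an eventfd whenever the guest
 * writes `len` bytes to guest-physical MMIO address `addr`, so the write
 * does not cause a return to userspace from KVM_RUN.
 * Returns the eventfd on success, -1 on failure.
 */
int register_doorbell(int vm_fd, uint64_t addr, uint32_t len)
{
    struct kvm_ioeventfd req;
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);

    if (efd < 0)
        return -1;

    memset(&req, 0, sizeof req);   /* flags == 0: MMIO, no datamatch */
    req.addr = addr;
    req.len = len;
    req.fd = efd;

    if (ioctl(vm_fd, KVM_IOEVENTFD, &req) < 0) {
        close(efd);                /* don't leak the fd on failure */
        return -1;
    }
    return efd;
}
```

A device thread can then poll() the returned descriptor alongside its other fds instead of sitting in the VCPU's exit path.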
Or you could go the other way and break the connection between VMs and processes, and between VCPUs and threads. I don't know how easy this is to do in Linux, but a VCPU might be backed by a kernel thread and operated on via ioctl()s, with each VCPU indicating that it has exited the guest by having its descriptor become readable (the VMM would then use either read() or mmap() to pull off the reason why the VCPU exited). This would allow for a variety of different programming styles for the VMM--I'm a fan of the CSP model myself, but that's hard to do with the current API.
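To make the decoupled idea a bit more concrete, here is a rough sketch of what the VMM side could look like. None of this is an existing KVM interface: plain eventfds stand in for the hypothetical per-VCPU descriptors, sim_vcpu_exit() fakes the kernel side, and the exit reason is simply carried as the eventfd counter value.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_VCPUS 16

/* Simulation only: pretend the VCPU behind vcpu_fd left the guest for
 * `reason` (must be nonzero), which makes the descriptor readable. */
int sim_vcpu_exit(int vcpu_fd, uint64_t reason)
{
    ssize_t n = write(vcpu_fd, &reason, sizeof reason);
    return n == (ssize_t)sizeof reason ? 0 : -1;
}

/* Block until some VCPU descriptor becomes readable, read its exit
 * reason, and return the index of that VCPU (or -1 on error). */
int wait_for_exit(const int *vcpu_fds, int n, uint64_t *reason)
{
    struct pollfd pfds[MAX_VCPUS];
    int i;

    if (n <= 0 || n > MAX_VCPUS)
        return -1;
    for (i = 0; i < n; i++) {
        pfds[i].fd = vcpu_fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, -1) < 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            ssize_t got = read(vcpu_fds[i], reason, sizeof *reason);
            return got == (ssize_t)sizeof *reason ? i : -1;
        }
    }
    return -1;
}
```

One thread can service any number of VCPUs this way, or each descriptor can be handed to its own lightweight process that turns exits into messages, which is the CSP-ish style I had in mind.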
It'd be nice to be able to kick a VCPU out of the guest without messing around with signals. One possibility would be to tie the kick to an eventfd; another might be to add a pseudo-register that indicates whether the VCPU is explicitly suspended. (Combined with the decoupling idea, you'd want another pseudo-register to indicate whether the VCPU is implicitly suspended due to an intercept; a single "runnable" bit is racy if both the VMM and VCPU are setting it.)
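The two-pseudo-register point is easier to see in code. With a single "runnable" bit, the VMM finishing an intercept would set the bit and silently undo an explicit suspend that arrived in the meantime. A minimal sketch with two separate flags follows; all the names here are invented, and C11 atomics stand in for whatever the kernel would actually use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VCPU state: two independent reasons for not running. */
struct vcpu_run_state {
    atomic_bool suspended;    /* explicit: set and cleared by the VMM */
    atomic_bool intercepted;  /* implicit: set on exit, cleared once handled */
};

void vcpu_state_init(struct vcpu_run_state *s)
{
    atomic_init(&s->suspended, false);
    atomic_init(&s->intercepted, false);
}

/* VMM: explicitly park or unpark the VCPU. */
void vcpu_suspend(struct vcpu_run_state *s) { atomic_store(&s->suspended, true); }
void vcpu_resume(struct vcpu_run_state *s)  { atomic_store(&s->suspended, false); }

/* VCPU side: an intercept forced an exit from the guest. */
void vcpu_intercept(struct vcpu_run_state *s) { atomic_store(&s->intercepted, true); }

/* VMM: the intercept has been handled. This never touches `suspended`. */
void vcpu_intercept_done(struct vcpu_run_state *s) { atomic_store(&s->intercepted, false); }

/* The VCPU may enter the guest only when neither reason applies. */
bool vcpu_runnable(struct vcpu_run_state *s)
{
    return !atomic_load(&s->suspended) && !atomic_load(&s->intercepted);
}
```

If the VMM suspends the VCPU, the VCPU then takes an intercept, and the intercept is then completed, vcpu_runnable() still returns false until vcpu_resume() is called. A single shared bit would have lost the suspend.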
ioeventfds are definitely useful. It might be cute if they could synchronously set the VRING_USED_F_NO_NOTIFY bit. The guest could do this itself, but that'd require giving the guest write access to the used side of the virtio queue, and I kind of like the idea that it doesn't need write access there. Then again, I don't have any perf data to back up the need for this.
The rest of it sounds great.