From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Thu, 11 Feb 2016 13:18:29 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Am 11.02.2016 um 12:00 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:address@hidden]
> > Am 11.02.2016 um 07:05 hat Pavel Dovgalyuk geschrieben:
> > > > From: Kevin Wolf [mailto:address@hidden]
> > > > Am 10.02.2016 um 13:51 hat Pavel Dovgalyuk geschrieben:
> > > > > However, I don't understand yet which layer you suggest as the
> > > > > candidate for record/replay. Which functions should be changed?
> > > > > I would like to investigate this approach, but I don't get it yet.
> > > >
> > > > At the core, I wouldn't change any existing function, but introduce a
> > > > new block driver. You could copy raw_bsd.c for a start and then tweak
> > > > it. Leave out functions that you don't want to support, and add the
> > > > necessary magic to .bdrv_co_readv/writev.
> > > >
> > > > Something like this (can probably be generalised for more than just
> > > > reads, as the part after the bdrv_co_readv() call should be the same
> > > > for reads, writes and any other request types):
> > > >
> > > > int coroutine_fn blkreplay_co_readv(BlockDriverState *bs, ...)
> > > > {
> > > >     BlockReplayState *s = bs->opaque;
> > > >     int reqid = s->reqid++;
> > > >
> > > >     bdrv_co_readv(bs->file, ...);
> > > >
> > > >     if (mode == record) {
> > > >         log(reqid, time);
> > > >     } else {
> > > >         assert(mode == replay);
> > > >         bool *done = req_replayed_list_get(reqid);
> > > >         if (done) {
> > > >             *done = true;
> > > >         } else {
> > > >             req_completed_list_insert(reqid, qemu_coroutine_self());
> > > >             qemu_coroutine_yield();
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > /* called by replay.c */
> > > > int blkreplay_run_event(int reqid)
> > > > {
> > > >     if (mode == replay) {
> > > >         co = req_completed_list_get(reqid);
> > > >         if (co) {
> > > >             qemu_coroutine_enter(co);
> > > >         } else {
> > > >             bool done = false;
> > > >             req_replayed_list_insert(reqid, &done);
> > > >             /* wait synchronously for completion */
> > > >             while (!done) {
> > > >                 aio_poll();
> > > >             }
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > Where we could consider changing existing code is that it might be
> > > > desirable to automatically put an instance of this block driver on top
> > > > of every block device when record/replay is used. If we don't do that,
> > > > you need to explicitly specify -drive driver=blkreplay,...
> > >
> > > As far as I understand, all synchronous read/write requests are also
> > > passed through this coroutine layer.
> > 
> > Yes, all read/write requests go through the same function internally, no
> > matter which external interface was used.
> > 
> > > This means that every disk access in the replay phase must match the
> > > recording phase.
> > 
> > Right. If I'm not mistaken, this was the fundamental requirement you
> > have, so I wouldn't have suggested this otherwise.
> > 
> > > Record/replay is intended to be used for debugging and analysis.
> > > When execution is replayed, the guest machine must not notice the
> > > analysis overhead.
> > > Some analysis methods may include disk image reading. E.g., the
> > > qemu-based analysis framework DECAF uses sleuthkit for disk forensics
> > > ( https://github.com/sycurelab/DECAF ).
> > > If a similar framework is used with replay, forensic disk access
> > > operations won't work if we record/replay the coroutines.
> > 
> > Sorry, I'm not sure if I can follow.
> > 
> > If such analysis software runs in the guest, it's not a replay any more
> > and I completely fail to see what you're doing.
> > 
> > If it's a qemu component independent from the guest, then my method
> > gives you a clean way to bypass the replay driver that wouldn't be
> > possible with yours.
> 
> The second one. qemu may be extended with some components that
> perform guest introspection.
> 
> > If your plan was to record/replay only async requests and then use sync
> > requests to bypass the record/replay, let me clearly state that this is
> > the wrong approach: There are still guest devices which do synchronous
> > I/O and need to be considered in the replay log, and you shouldn't
> > prevent the analysis code from using AIO (in fact, using sync I/O in new
> > code is very much frowned upon).
> 
> Why do guest synchronous requests have to be recorded?
> Aren't they completely deterministic?

Good point. I think you're right in practice. In theory, with dataplane
(i.e. when running the request in a separate thread) it could happen,
but I guess that isn't very compatible with replay anyway - and at
first sight I couldn't see it performing synchronous requests either.

> > I can explain in more detail what the block device structure looks like
> > and how to access an image with and without record/replay, but first let
> > me please know whether I guessed right what your problem is. Or if I
> > missed your point, can you please describe in detail a case that
> > wouldn't work?
> 
> You have understood it correctly.
> And what is the solution for bypassing one of the layers from a
> component that should not affect the replay?

For this, you need to understand how block drivers are stacked in qemu.
Each driver in the stack has a separate struct BlockDriverState, which
can be used to access its data. You could hook up things like this:

      virtio-blk              NBD server
    --------------           ------------
          |                        |
          v                        |
    +------------+                 |
    | blkreplay  |                 |
    +------------+                 |
          |                        |
          v                        |
    +------------+                 |
    |   qcow2    | <---------------+
    +------------+
          |
          v
    +------------+
    | raw-posix  |
    +------------+
          |
          v
    --------------
      filesystem

As you can see, what I've chosen for the external analysis interface is
just an NBD server, as this is a component that we already have today. You
could hook up any other (new) code there; the important part is that it
doesn't work on the BDS of the blkreplay driver, but directly on the BDS
of the qcow2 driver.

On the command line, it could look like this (this assumes that we don't
add syntactic sugar that creates the blkreplay part automatically - we
can always do that):

    -drive file=image.qcow2,if=none,id=img-direct
    -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
    -device virtio-blk-pci,drive=img-blkreplay

The NBD server can't be started on the command line, so you'd go to the
monitor and start it there with the direct access:

    (qemu) nbd_server_start unix:/tmp/my_socket
    (qemu) nbd_server_add img-direct

(Exact syntax is untested, but this should roughly be how it works.)
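From the analysis side, the export can then be opened with any NBD client. With qemu-io it might look roughly like this (again an untested sketch; the socket path and export name are taken from the example above, and the client sees the guest-visible raw data, not the qcow2 container):

```shell
# Dump the first 4 KiB of the guest disk through the NBD export,
# bypassing the blkreplay layer entirely.
qemu-io -f raw -c 'read -v 0 4096' \
    'nbd+unix:///img-direct?socket=/tmp/my_socket'
```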

Kevin


