Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

qemu-devel

From:	Michael Roth
Subject:	Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw
Date:	Thu, 03 Feb 2011 11:41:43 -0600
User-agent:	Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7

On 02/02/2011 02:48 AM, Jes Sorensen wrote:

On 02/02/11 08:57, Stefan Hajnoczi wrote:

On Tue, Feb 1, 2011 at 10:58 AM, <address@hidden> wrote:

From: Jes Sorensen <address@hidden>

Implement freeze/thaw support in the guest, allowing the host to
request that the guest freeze all its file systems before a live snapshot
is performed.
  - fsfreeze(): Walk the list of mounted local real file systems,
               and freeze them.
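The walk-and-freeze step above can be sketched as follows. This is a minimal sketch, not the patch's code: it assumes Linux's FIFREEZE ioctl from <linux/fs.h> (the ioctl util-linux's fsfreeze(8) uses), and the is_local_real_fs() filter, which keeps only /dev/ block devices and rejects a few network types, is a hypothetical stand-in for whatever rule the patch actually applies.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* FIFREEZE */
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical "local real file system" rule: the source must be a /dev/
 * block device, which excludes proc, sysfs, tmpfs and server:/export
 * sources, and a few network file system types are rejected explicitly. */
int is_local_real_fs(const char *fsname, const char *type)
{
    static const char *const network_types[] = { "nfs", "nfs4", "cifs", "smbfs", NULL };

    if (strncmp(fsname, "/dev/", 5) != 0)
        return 0;
    for (int i = 0; network_types[i] != NULL; i++)
        if (strcmp(type, network_types[i]) == 0)
            return 0;
    return 1;
}

/* Freeze every matching mount listed in /proc/mounts.
 * Returns the number of file systems frozen, or -1 on error.
 * FIFREEZE requires CAP_SYS_ADMIN. */
int fsfreeze_all(void)
{
    FILE *mounts = setmntent("/proc/mounts", "r");
    struct mntent *ent;
    int frozen = 0;

    if (mounts == NULL)
        return -1;
    while ((ent = getmntent(mounts)) != NULL) {
        if (!is_local_real_fs(ent->mnt_fsname, ent->mnt_type))
            continue;
        int fd = open(ent->mnt_dir, O_RDONLY);
        if (fd < 0) {
            endmntent(mounts);
            return -1;
        }
        int ret = ioctl(fd, FIFREEZE, 0);
        int err = errno;              /* save before close() can overwrite it */
        close(fd);
        if (ret == 0)
            frozen++;
        else if (err != EBUSY) {      /* EBUSY: already frozen, e.g. via a bind mount */
            endmntent(mounts);
            return -1;
        }
    }
    endmntent(mounts);
    return frozen;
}
```

If fsfreeze_all() fails partway, the file systems frozen so far stay frozen, so a real caller would need to thaw them (FITHAW) before reporting the error.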


Does this add a requirement that guest agent code issues no disk I/O
in its main loop (e.g. logging)?  Otherwise we might deadlock
ourselves waiting for I/O which is never issued.


Yes, very much so[1]. This is one reason why it would be nice to have virtagent
use threads to execute the actual commands. We should probably add a
flag to agent commands indicating whether they issue disk I/O or not, so
we can block attempts to execute commands that do so while the guest is
frozen.
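A per-command disk I/O flag could look something like the sketch below. The command names, the va_command table and va_guest_frozen are all hypothetical, chosen for illustration rather than taken from virtagent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command descriptor: does_disk_io marks handlers that may
 * read or write a guest file system and could therefore block on a freeze. */
struct va_command {
    const char *name;
    bool does_disk_io;
};

/* Illustrative table; these names are not virtagent's actual RPC list. */
static const struct va_command va_commands[] = {
    { "ping",     false },
    { "fsstatus", false },
    { "fsthaw",   false },
    { "getfile",  true  },
};

bool va_guest_frozen;   /* set after a successful freeze, cleared by thaw */

/* Return true if the named command may run in the current freeze state. */
bool va_command_allowed(const char *name)
{
    for (size_t i = 0; i < sizeof(va_commands) / sizeof(va_commands[0]); i++) {
        if (strcmp(va_commands[i].name, name) == 0)
            return !(va_guest_frozen && va_commands[i].does_disk_io);
    }
    return false;   /* unknown commands are refused */
}
```

Refusing unknown commands means a new RPC has to be classified before it can run at all.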


**Warning, epic response**

For things like logging and I/O on a frozen system... I agree we'd need some flag for these kinds of situations. Maybe a disable_logging() flag... I really don't like this, though. I'd imagine even syslogd() could block virtagent in this type of situation, so that would need to be disabled as well.

But doing so completely subverts our attempts at providing proper accounting of what the agent is doing to the user. A user can freeze the filesystem, knowing that logging would be disabled, then prod at whatever he wants. So the handling should be something specific to fsfreeze, with stricter requirements:

If a user calls fsfreeze(), we disable logging, but also disable the ability to do anything other than fsthaw() or fsstatus(). This actually solves the potential deadlocking problem for other RPCs as well, since they can't be executed in the first place.
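A minimal sketch of this stricter fsfreeze-specific mode, assuming a single global freeze state. va_log(), va_rpc_permitted() and va_set_frozen() are hypothetical names, not existing virtagent functions; log messages are dropped rather than buffered while frozen.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical agent freeze state. */
enum va_fs_state { VA_FS_THAWED, VA_FS_FROZEN };
static enum va_fs_state va_state = VA_FS_THAWED;

/* Called by the (hypothetical) fsfreeze/fsthaw handlers on success. */
void va_set_frozen(bool frozen)
{
    va_state = frozen ? VA_FS_FROZEN : VA_FS_THAWED;
}

/* Logging wrapper: while frozen, drop the message so the agent never blocks
 * writing to a frozen file system. Returns true if the message was written. */
bool va_log(FILE *out, const char *fmt, ...)
{
    va_list ap;

    if (va_state == VA_FS_FROZEN)
        return false;
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    return true;
}

/* While frozen, only fsthaw and fsstatus are accepted. */
bool va_rpc_permitted(const char *rpc)
{
    if (va_state != VA_FS_FROZEN)
        return true;
    return strcmp(rpc, "fsthaw") == 0 || strcmp(rpc, "fsstatus") == 0;
}
```

Because the gate admits only fsthaw and fsstatus, a second fsfreeze issued while frozen is refused as well.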


So I think that addresses the agent deadlocking itself, post-freeze.

However, fsfreeze() itself might lock up the agent as well. I'm not confident we can really put any kind of bound on how long it'll take to execute, and if we time out on the client side, the agent can still block here.

Plus, there are any number of other situations where an RPC can still hang things. In the future, when we potentially allow things like script execution, a script might attempt to connect to a socket that's already in use and wait on the server for an arbitrary amount of time, or open a file on an NFS share that is currently unresponsive.

So a solution for these situations is still needed, and I'm starting to agree that threads are needed, but I don't think we should execute RPCs concurrently (not sure if that's what is being suggested or not). At least, there's no pressing reason for it as things currently stand: there aren't currently any RPCs where fast response times are all that important, so it's okay to serialize them behind previous RPCs, and HMP/QMP are command-at-a-time. It's also something I'm fairly confident can be added if the need arises in the future.

But for dealing with a situation where an RPC can hang the agent, I think one thread should do it. Basically:

We associate each RPC with a time limit. Some RPCs, very special ones that we'd trust with our kids, could potentially specify an unlimited timeout. The client side should use this same timeout on its end. In the future we might allow the user to explicitly disable the timeout for a certain RPC. The logic would then be:


- read in a client RPC request
- start a thread to execute the RPC

- if there's a timeout, register an alarm(<timeout>) with a handler that will call something like pthread_kill(current_worker_thread). On the thread side, this signal will induce a pthread_exit()

- wait for the thread to return (pthread_join(current_worker_thread))

- return its response to the caller if it finished; return a timeout indication otherwise


Cheers,
Jes

[1] speaking from experience ... a Linux desktop gets really upset if
you freeze the file systems from a command in an xterm.... ho hum
