Re: [Qemu-devel] Hibernate and qemu-nbd
From: Mark Trumpold
Subject: Re: [Qemu-devel] Hibernate and qemu-nbd
Date: Thu, 19 Sep 2013 20:44:12 +0000
>-----Original Message-----
>From: Stefan Hajnoczi [mailto:address@hidden
>Sent: Wednesday, September 18, 2013 06:12 AM
>To: 'Mark Trumpold'
>Cc: address@hidden, 'Paul Clements', address@hidden,
>address@hidden, address@hidden
>Subject: Re: [Qemu-devel] Hibernate and qemu-nbd
>
>On Tue, Sep 17, 2013 at 07:10:44AM -0700, Mark Trumpold wrote:
>> I am using the kernel functionality directly with the commands:
>> echo platform >/sys/power/disk
>> echo disk >/sys/power/state
>>
>> The following appears in dmesg when I attempt to hibernate:
>>
>> ====================================================
>> [ 38.881397] nbd (pid 1473: qemu-nbd) got signal 0
>> [ 38.881401] block nbd0: shutting down socket
>> [ 38.881404] block nbd0: Receive control failed (result -4)
>> [ 38.881417] block nbd0: queue cleared
>> [ 87.463133] block nbd0: Attempted send on closed socket
>> [ 87.463137] end_request: I/O error, dev nbd0, sector 66824
>> ====================================================
>>
>> My environment:
>> Debian: 6.0.5
>> Kernel: 3.3.1
>> Qemu userspace: 1.2.0
>
>This could be a bug in the nbd client kernel module.
>drivers/block/nbd.c:sock_xmit() does the following:
>
>    result = kernel_recvmsg(sock, &msg, &iov, 1, size,
>                            msg.msg_flags);
>
>    if (signal_pending(current)) {
>        siginfo_t info;
>        printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
>               task_pid_nr(current), current->comm,
>               dequeue_signal_lock(current, &current->blocked, &info));
>        result = -EINTR;
>        sock_shutdown(nbd, !send);
>        break;
>    }
>
>The signal number in the log output looks bogus, we shouldn't get 0.
>sock_xmit() actually blocks all signals except SIGKILL before calling
>kernel_recvmsg(). I guess this is an artifact of the suspend-to-disk
>operation, maybe the signal pending flag is set on the process.
>
>Perhaps someone with a better understanding of the kernel internals can
>check this?
>
>What happens next is that the nbd kernel module shuts down the NBD connection.
>
>As a workaround, please try running a separate nbd-client(1) process and drop
>the qemu-nbd -c command-line argument. This way nbd-client(1) uses the
>nbd kernel module instead of the qemu-nbd process and you'll get the
>benefit of nbd-client's automatic reconnect.
>
>Stefan
>
Hi Stefan,
Thank you for the information.
I did some experiments per your suggestion. I wasn't sure if the following
was what you had in mind:
1) Configured 'nbd-server' and started it (/etc/nbd-server/config):
     [generic]
     [export]
       exportname = /root/qemu/q1.img
       port = 2000
2) Started 'nbd-client':
     -> nbd-client localhost 2000 /dev/nbd0
3) Verified '/dev/nbd0' is in use (it will appear in the list):
     -> cat /proc/partitions
At this point I could mount '/dev/nbd0' as expected, but that is not
necessary to demonstrate the problem.
If I now enter S1 (standby), S3 (suspend to RAM), or S4 (suspend to disk),
I get the same dmesg output as before, indicating that 'nbd0' caught
signal 0 and exited.
When I resume, I simply repeat step #3 to verify.
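
For repeated suspend/resume runs, the check in step #3 could be scripted
rather than read by eye. This is only a rough sketch: the function name
'nbd_listed' and its optional second argument (a saved copy of the
partitions file) are my own inventions for illustration, not part of any
nbd tool.

```shell
#!/bin/sh
# Return 0 if the named block device is listed in a /proc/partitions-style
# file, 1 otherwise. "nbd_listed" and the optional file argument are
# hypothetical helpers for this sketch only.
nbd_listed() {
    dev=$1
    file=${2:-/proc/partitions}
    # The device name is the fourth column; the header line never matches
    # a real device name, so it needs no special handling.
    awk -v dev="$dev" '$4 == dev { found = 1 } END { exit !found }' "$file"
}
```

After a resume, something like
  nbd_listed nbd0 && echo "nbd0 still listed" || echo "nbd0 gone"
would then stand in for the manual 'cat /proc/partitions'.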
==================
Also, before contacting the group, I had modified the same kernel source
that you identified in 'drivers/block/nbd.c:sock_xmit()' so that it takes
no action on a pending signal. This was strictly for troubleshooting:
199     result = kernel_recvmsg(sock, &msg, &iov, 1, size,
200                             msg.msg_flags);
201
202     if (signal_pending(current)) {
203         siginfo_t info;
204         printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
205                task_pid_nr(current), current->comm,
206                dequeue_signal_lock(current, &current->blocked, &info));
207
208         //result = -EINTR;
209         //sock_shutdown(nbd, !send);
210         //break;
211     }
We then got errors ("Wrong magic ...") in the following section:
/* NULL returned = something went wrong, inform userspace */
static struct request *nbd_read_stat(struct nbd_device *lo)
{
    int result;
    struct nbd_reply reply;
    struct request *req;

    reply.magic = 0;
    result = sock_xmit(lo, 0, &reply, sizeof(reply), MSG_WAITALL);
    if (result <= 0) {
        dev_err(disk_to_dev(lo->disk),
                "Receive control failed (result %d)\n", result);
        goto harderror;
    }
    if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
        dev_err(disk_to_dev(lo->disk), "Wrong magic (0x%lx)\n",
                (unsigned long)ntohl(reply.magic));
        result = -EPROTO;
        goto harderror;
So, it seemed to me that the call at line #199 above must be returning
with an error after we commented out the signal action logic.
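
To tell which of the two nbd_read_stat() failure paths fired in a given
run, the relevant dmesg lines can be pulled out and reduced to the value
they report. Again just a sketch; the function name 'nbd_read_stat_errors'
and the 'recv-failed' / 'wrong-magic' labels are made up for illustration.

```shell
#!/bin/sh
# Read dmesg-style text on stdin and print one line per nbd_read_stat()
# failure message, reduced to a label plus the reported value.
# The function name and the output labels are hypothetical.
nbd_read_stat_errors() {
    sed -n \
        -e 's/.*Receive control failed (result \(-\{0,1\}[0-9][0-9]*\)).*/recv-failed \1/p' \
        -e 's/.*Wrong magic (\(0x[0-9a-fA-F]*\)).*/wrong-magic \1/p'
}
```

Run as 'dmesg | nbd_read_stat_errors'. The "Receive control failed
(result -4)" line from the original log comes out as 'recv-failed -4';
-4 is -EINTR on Linux, which matches the 'result = -EINTR' assignment in
the unmodified sock_xmit().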
Thank you for your attention to this.
Let me know if I followed your suggestion correctly, and whether there are
other tests I can do.
Regards,
Mark T.