qemu-devel

Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
Date: Wed, 18 Jan 2017 17:30:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1


On 18/01/2017 17:19, Fabian Grünbichler wrote:
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Sense Key : Illegal Request [current]
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Add. Sense: Invalid field in cdb
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 CDB: Write(10) 2a 00 0d d6 51 48 00 08 00 00
> Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232149320
> Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29018921)
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018409
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018410
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018411
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018412
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018413
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018414
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018415
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018416
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018417
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018418
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Sense Key : Illegal Request [current]
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Add. Sense: Invalid field in cdb
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 CDB: Write(10) 2a 00 0d d6 59 48 00 08 00 00
> Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232151368
> Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29019177)
> Jan 18 17:07:52 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
> Jan 18 17:07:58 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
> 
> 
> strace (with some random grep-ing):
> [pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 51, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=17, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`\235=c\177\0\0\0\0\1\0\0\0\0\0\0`\236=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
> [pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 59, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=16, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`-=c\177\0\0\0\0\1\0\0\0\0\0\0`.=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
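Decoding the two WRITE(10) CDBs ties the kernel log to the strace output: the LBA in bytes 2-5 equals the "sector" reported by blk_update_request, and the transfer length in bytes 7-8 times the block size equals dxfer_len, i.e. each failing request is a single 1 MiB write. The sketch below assumes 512-byte logical blocks (consistent with the LBA matching the kernel's 512-byte sector number above); `decode_write10` and `Write10` are hypothetical helper names, not part of any QEMU or kernel API.

```python
import struct
from typing import NamedTuple


class Write10(NamedTuple):
    lba: int      # starting logical block address
    blocks: int   # transfer length in logical blocks


def decode_write10(cdb: bytes) -> Write10:
    """Decode a SCSI WRITE(10) CDB (opcode 0x2A).

    Byte 0 is the opcode, bytes 2-5 hold the big-endian LBA and
    bytes 7-8 hold the big-endian transfer length in logical blocks.
    """
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    (lba,) = struct.unpack(">I", cdb[2:6])
    (blocks,) = struct.unpack(">H", cdb[7:9])
    return Write10(lba, blocks)


if __name__ == "__main__":
    # CDB of the first failing request (tag#109) from the log above.
    w = decode_write10(bytes.fromhex("2a000dd6514800080000"))
    # Assumes 512-byte logical blocks.
    print(w.lba, w.blocks * 512)  # 232149320 1048576
```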

This is useful, thanks.  I suspect blk_rq_map_user_iov is failing
because the scatter/gather list has more segments than the HBA in the
host can handle.  (The limit can be found in /sys/block/sda/queue/max_segments.)
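To make the suspected failure mode concrete, here is a minimal Python sketch under a simplified model: every run of physically consecutive 4 KiB page frames becomes one scatter/gather segment, and a request is rejected when the count exceeds max_segments. The helper names (`count_segments`, `request_fits`, `read_queue_limit`) are hypothetical; only the sysfs path comes from the message above. The real block layer also splits segments on max_segment_size and segment boundary masks, which this model ignores.

```python
import os
from typing import Iterable


def count_segments(pfns: Iterable[int]) -> int:
    """Count runs of physically consecutive page frame numbers.

    Simplified model: one run of adjacent PFNs == one S/G segment.
    """
    segments = 0
    prev = None
    for pfn in pfns:
        if prev is None or pfn != prev + 1:
            segments += 1  # a gap in physical memory starts a new segment
        prev = pfn
    return segments


def request_fits(pfns: Iterable[int], max_segments: int) -> bool:
    """True if the buffer fits within the HBA's segment limit."""
    return count_segments(pfns) <= max_segments


def read_queue_limit(dev: str, attr: str = "max_segments",
                     sysfs_root: str = "/sys/block") -> int:
    """Read an integer queue limit such as /sys/block/sda/queue/max_segments."""
    with open(os.path.join(sysfs_root, dev, "queue", attr)) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    contiguous = list(range(1000, 1256))  # 256 pages in one physical run
    scattered = list(range(0, 512, 2))    # 256 pages, none adjacent
    print(count_segments(contiguous))  # 1
    print(count_segments(scattered))   # 256
    try:
        limit = read_queue_limit("sda")
        print("sda max_segments:", limit,
              "scattered fits:", request_fits(scattered, limit))
    except OSError:
        print("no /sys/block/sda on this machine")
```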

This is consistent with your finding here:

> disabling THP on the hypervisor host with
> 
> # echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
> 
> allows reproducing the bug very reliably, shutting the VM down, then
> enabling THP (with 'always') and trying again makes it go away.

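because without THP the guest's RAM is backed by individual 4 KiB host
pages that need not be physically adjacent, so the same 1 MiB request is
split into more scatter/gather segments.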
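A rough upper bound using the numbers from the first strace line (dxfer_len=1048576, iovec_count=17) shows how much THP can matter. This assumes 4 KiB base pages, 2 MiB huge pages that actually back every buffer when THP is on, and ignores max_segment_size; these are bounds, not measurements. Without THP each 4 KiB page may sit anywhere in host physical memory; with THP each iovec (host-virtually contiguous and shorter than 2 MiB) crosses at most one huge-page boundary, so it splits into at most two physically contiguous pieces.

```latex
\begin{aligned}
N_{\text{pages}} &= \frac{\texttt{dxfer\_len}}{\text{page size}}
                  = \frac{1\,048\,576\ \text{B}}{4\,096\ \text{B}} = 256,\\[4pt]
N_{\text{seg}}^{\text{no THP}} &\le N_{\text{pages}} = 256,\\[4pt]
N_{\text{seg}}^{\text{THP}} &\le 2 \times \texttt{iovec\_count} = 2 \times 17 = 34.
\end{aligned}
```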

I'm not sure how to fix it, unfortunately. :(

Paolo


