Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1


From: Fabian Grünbichler
Subject: Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
Date: Wed, 18 Jan 2017 18:17:28 +0100 (CET)

> Paolo Bonzini <address@hidden> wrote on 18 January 2017 at 17:30:
> 
> On 18/01/2017 17:19, Fabian Grünbichler wrote:
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Sense Key : Illegal Request [current]
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Add. Sense: Invalid field in cdb
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 CDB: Write(10) 2a 00 0d d6 51 48 00 08 00 00
> > Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232149320
> > Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29018921)
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018409
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018410
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018411
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018412
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018413
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018414
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018415
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018416
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018417
> > Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018418
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Sense Key : Illegal Request [current]
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Add. Sense: Invalid field in cdb
> > Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 CDB: Write(10) 2a 00 0d d6 59 48 00 08 00 00
> > Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232151368
> > Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29019177)
> > Jan 18 17:07:52 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
> > Jan 18 17:07:58 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
> > 
> > 
> > strace (with some random grep-ing):
> > [pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 51, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=17, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`\235=c\177\0\0\0\0\1\0\0\0\0\0\0`\236=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
> > [pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 59, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=16, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`-=c\177\0\0\0\0\1\0\0\0\0\0\0`.=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
> 
> This is useful, thanks.  I suspect blk_rq_map_user_iov is failing,
> meaning that the scatter/gather list has too many segments for the HBA
> in the host.  (The limit can be found in /sys/block/sda/queue/max_segments).

I can try to get some more info tomorrow.
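
For reference, the failing writes above carry a 1 MiB payload (dxfer_len=1048576), and with THP off the kernel may end up with close to one physical segment per 4 KiB page when mapping it, so a low HBA limit would be easy to exceed. A quick, untested sketch for dumping that limit on the host (assuming sda is the host device backing the passthrough, as in the sysfs path Paolo mentioned):

#include <stdio.h>

int main(void)
{
    /* Paolo's suggested sysfs attribute; adjust "sda" to whichever host
     * device is actually passed through via scsi-block. */
    FILE *f = fopen("/sys/block/sda/queue/max_segments", "r");
    long max_segments;

    if (!f || fscanf(f, "%ld", &max_segments) != 1) {
        fprintf(stderr, "could not read max_segments\n");
        return 1;
    }
    fclose(f);

    /* Per Paolo's suspicion, blk_rq_map_user_iov() fails (and the ioctl
     * returns EINVAL) once the 1 MiB payload maps to more physical
     * segments than this limit; without THP that count can approach 256
     * (one per 4 KiB page). */
    printf("host max_segments: %ld\n", max_segments);
    return 0;
}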

> 
> This is consistent with your finding here:
> 
> > disabling THP on the hypervisor host with
> > 
> > # echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
> > 
> > allows reproducing the bug very reliably, shutting the VM down, then
> > enabling THP (with 'always') and trying again makes it go away.
> 
> because no THP means more memory fragmentation and thus more segments.
> 
> I'm not sure how to fix it, unfortunately. :(
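
To make that concrete: roughly speaking, the block layer can only merge physically contiguous pages into a single segment, so the same 1 MiB payload can need anywhere from 1 segment (one contiguous THP-backed run) up to 256 (every 4 KiB page isolated). A toy model of that counting, with made-up page-frame layouts rather than anything taken from qemu:

/* Toy model: count block-layer segments for a buffer, given the physical
 * page frame numbers (PFNs) backing it. Adjacent PFNs merge into one
 * segment; every discontinuity starts a new one. */
#include <stdio.h>
#include <stddef.h>

static size_t count_segments(const unsigned long *pfns, size_t npages)
{
    size_t segs = npages ? 1 : 0;

    for (size_t i = 1; i < npages; i++) {
        if (pfns[i] != pfns[i - 1] + 1) {
            segs++;     /* physical discontinuity -> extra segment */
        }
    }
    return segs;
}

int main(void)
{
    /* 1 MiB = 256 x 4 KiB pages; the PFN values are invented. */
    unsigned long thp[256], frag[256];

    for (size_t i = 0; i < 256; i++) {
        thp[i]  = 0x100000 + i;         /* one contiguous run (THP-like) */
        frag[i] = 0x200000 + i * 7;     /* every page physically isolated */
    }

    printf("THP-backed buffer: %zu segment(s)\n", count_segments(thp, 256));
    printf("fragmented buffer: %zu segment(s)\n", count_segments(frag, 256));
    return 0;
}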

Well, at least this gives us a (potentially too conservative) criterion for 
deciding when to use scsi-disk instead of scsi-block (maybe this could also 
be detected in qemu itself?).
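
In case it helps the discussion, here is a sketch of what such a deliberately conservative check could look like; the helper name and the policy are made up for illustration, this is not existing qemu code. It compares the device's max_segments against the worst case of one segment per 4 KiB page of the largest request that would be submitted:

/* Sketch only: decide whether SG_IO passthrough (scsi-block) looks safe
 * for a device, or whether falling back to scsi-disk is the conservative
 * choice. Names and policy are invented for illustration. */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE        4096
#define MAX_REQUEST_SIZE (1 * 1024 * 1024)  /* 1 MiB, as in the traces above */

static bool scsi_block_looks_safe(long max_segments)
{
    /* Worst case without THP: every 4 KiB page of the request ends up as
     * a separate physical segment (plus one for a misaligned start). */
    long worst_case_segments = MAX_REQUEST_SIZE / PAGE_SIZE + 1;

    return max_segments >= worst_case_segments;
}

int main(void)
{
    long max_segments = 0;

    if (scanf("%ld", &max_segments) != 1) {
        fprintf(stderr, "usage: pipe in /sys/block/<dev>/queue/max_segments\n");
        return 1;
    }
    printf("%s\n", scsi_block_looks_safe(max_segments)
                   ? "scsi-block looks safe"
                   : "fall back to scsi-disk");
    return 0;
}

(e.g. fed with /sys/block/<dev>/queue/max_segments of the passed-through device; a real implementation would presumably derive both numbers from the device and from qemu's own request size limits.)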

This seems especially troublesome since the (hypervisor) admin can change the 
THP setting at runtime, and it seems like there are widespread recommendations 
to disable THP for e.g. database workloads.

> 
> Paolo
>



