qemu-devel


Re: [Qemu-devel] [PATCH] block: fix big write


From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH] block: fix big write
Date: Wed, 10 Dec 2014 23:47:25 +0800

On Wed, Dec 10, 2014 at 11:02 PM, Paolo Bonzini <address@hidden> wrote:
>
>
> On 10/12/2014 15:35, Ming Lei wrote:
>>>> It is _not_ something that never happens, and it is easy to trigger
>>>> when using mkfs.
>>>
>>> mkfs is not something to optimize for, it's just something that should
>>> work.  (Also, some hardware may time out if you do write same with too
>>> high a block count).
>>
>> I don't think it is related to the hardware timeout issue, since your
>> patch still splits the block count at 2G - 1, and both are the same wrt.
>> block count.
>
> If the guest sends a 1TB WRITE SAME, it's more likely to time out.

The guest implementation should have an upper limit on the block count to
stay within the corresponding timeout, like SD_MAX_WS16_BLOCKS in Linux.

>
>>> Both Linux and Windows will always use UNMAP on QEMU, except for the
>>> small time period where Linux used WRITE SAME and this bug was
>>> discovered.  And all versions of Linux that used WRITE SAME honored the
>>> max_ws_blocks field.
>>
>> Not sure how you reached that conclusion.
>
> Because the WRITE SAME patch was submitted ~1 month ago.
>
> Windows uses UNMAP because Microsoft says so.
>
>> Secondly, the SBC-3 draft doesn't explicitly describe any priority among
>> UNMAP, WRITE SAME 10, and WRITE SAME 16, so in theory the driver is
>> free to use any of them.
>
> Sure, but WRITE SAME with UNMAP doesn't make sense if you do not have
> LBPRZ, which QEMU does not set.  In fact the only sensible things to do are:
>
> - use WRITE SAME if LBPRZ
>
> - use UNMAP if !LBPRZ
>
> So any sensible guest will use UNMAP.
>
>> Finally, blkdev_issue_zeroout() can send WRITE SAME(10/16) directly,
>> and the request can come from user space, filesystems, and block drivers.
>
> That is WRITE SAME without UNMAP, it is not used by mkfs, and Linux has
> always honored max_write_same_blocks for it (defaulting to a 65535 block
> limit for older devices that did not report a limit).

From QEMU's point of view, blk_aio_write_zeroes() still needs to handle
the case without UNMAP. The default of 65535 is just Linux's current
implementation, and a recent patch even tries to increase that default.
The default limit might also be larger on other OSes.
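
To make the split under discussion concrete, here is a minimal sketch in C
of chopping one large write-zeroes request into pieces no bigger than a
cap. It is not QEMU code: split_write_zeroes(), zero_fn, struct zero_log
and log_zero() are hypothetical names invented for illustration, and the
cap is just a parameter (INT_MAX sectors corresponds to the 2G - 1 figure
mentioned above).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical callback: issues one write-zeroes request of nb_sectors
 * sectors starting at sector. Returns 0 on success, negative on error. */
typedef int (*zero_fn)(uint64_t sector, uint32_t nb_sectors, void *opaque);

/* Split [sector, sector + nb_sectors) into requests of at most max_chunk
 * sectors each. Returns the number of requests issued, or the first
 * negative error returned by the callback. */
static int split_write_zeroes(uint64_t sector, uint64_t nb_sectors,
                              uint32_t max_chunk, zero_fn issue,
                              void *opaque)
{
    int count = 0;

    assert(max_chunk > 0);
    while (nb_sectors > 0) {
        /* Take a full chunk, or whatever remains if that is smaller. */
        uint32_t n = nb_sectors > max_chunk ? max_chunk
                                            : (uint32_t)nb_sectors;
        int ret = issue(sector, n, opaque);

        if (ret < 0) {
            return ret;
        }
        sector += n;
        nb_sectors -= n;
        count++;
    }
    return count;
}

/* Hypothetical stand-in for the real backend: it only records what it
 * was asked to zero, so the split can be inspected. */
struct zero_log {
    uint64_t total;    /* sum of all sectors requested */
    uint32_t largest;  /* largest single request seen */
};

static int log_zero(uint64_t sector, uint32_t nb_sectors, void *opaque)
{
    struct zero_log *log = opaque;

    (void)sector;
    log->total += nb_sectors;
    if (nb_sectors > log->largest) {
        log->largest = nb_sectors;
    }
    return 0;
}
```

Assuming 512-byte sectors, a 1TB request is 2^31 sectors, so with a cap
of INT_MAX this issues two requests: one of 2^31 - 1 sectors and one of
a single sector.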

>
> So what *concrete* case would be fixed by adding extra little-used code
> in QEMU to do the split?
>
> Paolo
>
>> Thanks,
>> Ming Lei
>>
>>


