Re: [Qemu-ppc] Tests with the latest ppc repo


From: Mark Cave-Ayland
Subject: Re: [Qemu-ppc] Tests with the latest ppc repo
Date: Tue, 13 Jan 2015 11:31:43 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0

On 13/01/15 10:17, Kevin Wolf wrote:

>> 2) The fix for mixing synchronous and asynchronous requests in the macio
>> code corrupts a fresh install of my Darwin test ISO when running through
>> the complete installation:
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=3e300fa6ad4ee19b16339c25773dec8df0bfb982
> 
> First of all, the commit message says:
> 
>     However, the block infrastructure changed below our feet and now
>     it's impossible to call a synchronous block read/write from the aio
>     callback handler of a previous block access.
> 
> That's news to me. If it's true, this is probably not intentional at
> all and the block layer code should have been fixed instead.
> 
> What kind of failure do you get when you try such a call?

The original patch was supplied by Alex based on a report from John
which I suspect may have been provided off-list (see
http://lists.gnu.org/archive/html/qemu-ppc/2014-05/msg00461.html) and
was not something that I was seeing locally at the time.

>> So far I don't see anything obvious with the above commit except that
>> the multi-write combining seems to be more aggressive afterwards.
> 
> Maybe I'm missing something, but I don't see any combining in this code?

This is something that I found by enabling tracepoints for both the good
and bad commits and trying to compare the differences. It looks like
when the unaligned requests were submitted as before (i.e. the bulk of
the request issued via dma_blk_read()/dma_blk_write() followed by the
remainder with a synchronous bdrv_read()/bdrv_write()), the block layer
disk requests looked fairly similar to the DMA engine requests.

When the above commit changed the unaligned accesses from
bdrv_read()/bdrv_write() into the bdrv_aio_readv()/bdrv_aio_writev()
calls used internally by dma_blk_read() and dma_blk_write(), individual
DMA read/write requests started getting merged together, presumably by
the request-merging code in block.c.
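
To make the difference concrete, the two variants look roughly like this
(a simplified sketch from memory rather than the actual macio.c code; the
helper names and the bounce-buffer handling are made up for illustration,
and the signatures are the 2.x-era block API as I remember it):

/* Sketch only -- hypothetical helpers, not the real hw/ide/macio.c code */

/* Before 3e300fa6: the unaligned remainder was read synchronously, so it
 * had completed before the next DBDMA descriptor was processed. */
static void remainder_read_sync(IDEState *s, int64_t sector_num,
                                uint8_t *bounce)
{
    bdrv_read(s->bs, sector_num, bounce, 1);   /* blocks until done */
    /* ... copy the partial sector from bounce into the guest buffer ... */
}

/* After 3e300fa6: the remainder goes through the AIO path with
 * pmac_ide_transfer_cb() as its callback, so it is in flight alongside the
 * dma_blk_read()/dma_blk_write() requests and can be merged with them. */
static void remainder_read_aio(IDEState *s, int64_t sector_num,
                               QEMUIOVector *qiov, DBDMA_io *io)
{
    bdrv_aio_readv(s->bs, sector_num, qiov, 1, pmac_ide_transfer_cb, io);
    /* returns immediately; data is only valid once the callback fires */
}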

Note that this is on a laptop with a slow internal (non-SSD) HD with
encryption enabled, so perhaps this could be encouraging more request
merging?

>> I'm wondering whether or not the synchronous requests for the
>> unaligned accesses had a side effect of introducing read/write
>> barriers which have now been removed?
> 
> Let's start a bit simpler: While a request is running, its iov is in
> use. pmac_ide_transfer_cb() seems to issue an AIO request and
> immediately continue working on the same iov. I wouldn't be surprised if
> some internal data structures in the macio driver weren't made for
> multiple requests at the same time either.
> 
> The usual pattern is that after issuing an AIO command, you return, and
> only continue what you were doing in its callback.
> 
> That said, you have so many operations in pmac_ide_transfer_cb() using
> the function recursively as their callback that I find it really hard to
> read by now. Perhaps it would benefit from a conversion to coroutines,
> which would make the code linear again.

Yeah, it's a fairly horrible piece of code, which is why I've been
spending time trying to fix it up :/  I do have an alpha-quality patch
that separates the DMA and disk logic back out again, but it also
suffers from the same problem as the existing code in that, while the
installation completes, the resulting disk image is corrupted in a
seemingly similar way.

I can temporarily push this code, which I believe is easier to read, to
my github account if that helps?
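
For reference, the shape I'm aiming for in that patch is the pattern you
describe: issue a single AIO request, return, and only advance the
transfer state from the completion callback. Something along these lines
(hypothetical names and structure, not the actual patch):

/* Hypothetical per-transfer state threaded through the callback chain */
typedef struct MacioTransfer {
    IDEState *s;
    int64_t sector_num;
    int remaining_sectors;
    QEMUIOVector qiov;
} MacioTransfer;

static void macio_transfer_cb(void *opaque, int ret)
{
    MacioTransfer *t = opaque;
    int64_t sector = t->sector_num;
    int chunk;

    if (ret < 0 || t->remaining_sectors == 0) {
        /* report status to the guest, free t, kick the DBDMA engine */
        return;
    }

    /* Issue exactly one request and return; nothing else touches t->qiov
     * until this callback fires again for its completion. */
    chunk = MIN(t->remaining_sectors, 0x80);   /* arbitrary chunk size */
    /* ... (re)build t->qiov to cover this chunk ... */
    t->sector_num += chunk;
    t->remaining_sectors -= chunk;
    bdrv_aio_readv(t->s->bs, sector, &t->qiov, chunk, macio_transfer_cb, t);
}

The first request would then be kicked off by calling
macio_transfer_cb(t, 0) from the DBDMA handler, so the whole transfer is
driven from one place.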

Another solution to explore could be to move the AIO context over to
using an offset and length rather than sector_num and a sector count, so
that the existing code for allowing 512-byte sector accesses on
4096-byte sector devices can be reused. The caller could potentially
provide an "allow unaligned accesses" flag when creating the AIO
context, and then all that would remain would be to create custom DMA
helpers for macio, similar to what I've tried to do in my experimental
patch.
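
In other words, something with roughly this shape (entirely hypothetical,
just to show what I mean by a macio-specific DMA helper that takes a byte
offset and length instead of sector_num/nb_sectors):

/* Hypothetical prototype -- not an existing QEMU API */
BlockDriverAIOCB *macio_dma_blk_read(BlockDriverState *bs, QEMUSGList *sg,
                                     uint64_t offset, uint64_t bytes,
                                     BlockDriverCompletionFunc *cb,
                                     void *opaque);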

>> I can provide an easy, although annoyingly time-consuming test case if
>> either Kevin or Stefan (added to CC) are able to help point me in the
>> right direction as to how to go about debugging this?
> 
> I'm not familiar enough with the controller to help in debugging it, but
> as long as we know which one is the faulty patch, we can probably solve
> this by code review.

That would be great! For reference the test case is fairly simple:

1) Grab and decompress the darwinppc-602.iso file from
https://opensource.apple.com/static/iso/

2) Grab and decompress the pre-partitioned qcow2 HD image from
http://www.ilande.co.uk/tmp/darwin_empty.qcow2.xz

3) Switch to something around the v2.0 release, e.g.

git checkout v2.0.0

4) Run the installation like this:

qemu-system-ppc -hda darwin_empty.qcow2 -cdrom darwinppc-602.iso -boot d

5) Verify that the installation completes successfully with no errors and
that the HD image can be booted

6) Now cherry-pick commit 3e300fa6ad4ee19b16339c25773dec8df0bfb982 on
top of your v2.0.0 git checkout, e.g.
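
git cherry-pick 3e300fa6ad4ee19b16339c25773dec8df0bfb982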

7) Re-run the entire installation as per step 4 on a fresh copy of
darwin_empty.qcow2

At the very end of the installation, the kernel extension cache build
fails with "Segmentation Fault" on "chroot /mnt kextcache -el -a `arch`
2>/dev/null". The resulting disk image is subsequently unusable and
hangs on reboot.


ATB,

Mark.



