
Re: [Qemu-devel] [PATCH 2/2] ide: fix crash in IDE cdrom read


From: Denis V. Lunev
Subject: Re: [Qemu-devel] [PATCH 2/2] ide: fix crash in IDE cdrom read
Date: Wed, 24 Jan 2018 13:25:01 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0

Sorry, I caught this letter too late.

On 12/18/2017 08:49 PM, John Snow wrote:
>
> On 12/14/2017 06:29 AM, Denis V. Lunev wrote:
>>> If this has been broken since 2.9, 2.11-rc3 is too late for a bandaid
>>> applied to something I can't diagnose. Let's discuss this for 2.12 and I
>>> will keep trying to figure out what the root cause is.
>> I have read the entire letter twice, but
>> unfortunately I cannot add much more :(
>>
> No problem, sometimes I don't understand myself. And the IDE code isn't
> exactly the nicest stuff to read. If I was smart enough I'd refactor the
> whole thing, but without breaking migration it's a little hard :(
>
>>> Some questions for you:
>>>
>>> (1) Is the guest Linux? Do we know why this one machine might be
>>> tripping up QEMU? (Is it running a fuzzer, a weird OS, etc...?)
>> This is run by an end user at one of our customers, and we do not have
>> access to that machine or customer. This is an anonymized crash report
>> from the node. This is not a single crash: we observe 1-2 reports of
>> this crash per day.
>>
> Yikes. Is this still on a 2.9-based VM, or have you upgraded to 2.10 or
> 2.11 at this point?
>
> (From memory this was a problem with a 2.9 based machine)
The problem is with 2.9.

>>> (2) Does the VM actually have a CDROM inserted at some point? Is it
>>> possible we're racing on some kind of eject or graph manipulation failure?
>> Unclear, but IMHO probable.
>>
> If they're using a 2.10+ based VM, could you look at some trace points?
>
> either:
> trace_ide_atapi_cmd (just scsi byte 0), or
> trace_ide_atapi_cmd_packet (the entire scsi cdb)
>
> and
>
> trace_ide_exec_cmd
>
> the actual command bytes never get saved in the state struct, so it's
> hard to tell from traces what commands were being processed, but these
> traces help.
Unfortunately, I do not have access to the crashing node :(
That is the problem.


>>> (3) Is this using AHCI or IDE?
>> IDE. We are 120% sure of this. We do not provide the ability to enable
>> AHCI without manual tweaking.
>>
> At least that helps narrow down the path...
>
>>> If I can't figure it out within a week or so from here I'll just check
>>> in the band-aid with some /* FIXME */ comments attached.
>> No problem. We are going to ship my band-aid and watch the crash report statistics.
>>
>> Thank you in advance,
>>     Den
> I'll stage the band-aid with some FIXME comments, and maybe some scary
> error_report prints with some information in them. I'll send it to the list.
I do not see them merged. Have you?

For now I have merged my patch downstream. I do not see how it could be
wrong. The release is scheduled for late this spring, and if the crashes
stop happening, I'll let you know.

Den


