qemu-devel

From: Artyom Tarasenko
Subject: [Qemu-devel] Re: [PATCH 1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)
Date: Fri, 21 May 2010 19:23:04 +0200

2010/5/10 Blue Swirl <address@hidden>:
> On 5/10/10, Artyom Tarasenko <address@hidden> wrote:
>> 2010/5/10 Blue Swirl <address@hidden>:
>>
>> > On 5/10/10, Artyom Tarasenko <address@hidden> wrote:
>>  >> 2010/5/9 Blue Swirl <address@hidden>:
>>  >>  > On 5/9/10, Artyom Tarasenko <address@hidden> wrote:
>>  >>  >> 2010/5/9 Blue Swirl <address@hidden>:
>>  >>  >>
>>  >>  >> > On 5/8/10, Artyom Tarasenko <address@hidden> wrote:
>>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>>  >>  >>  >
>>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>>  >>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>>  >>  >>
>>  >>  >>
>>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>  >>  >>  SS-20 doesn't have any aliasing.
>>  >>  >
>>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>>  >>
>>  >>
>>  >> Good point, I must confess, I missed the word "Turbo" in your first
>>  >>  answer. LX and SS-20 don't.
>>  >>  But SS-5 must have a TurboSPARC CPU:
>>  >>
>>  >>  ok cd /FMI,MB86904
>>  >>  ok .attributes
>>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>  >>  psr-implementation       00000000
>>  >>  psr-version              00000004
>>  >>  implementation           00000000
>>  >>  version                  00000004
>>  >>  cache-line-size          00000020
>>  >>  cache-nlines             00000200
>>  >>  page-size                00001000
>>  >>  dcache-line-size         00000010
>>  >>  dcache-nlines            00000200
>>  >>  dcache-associativity     00000001
>>  >>  icache-line-size         00000020
>>  >>  icache-nlines            00000200
>>  >>  icache-associativity     00000001
>>  >>  ncaches                  00000002
>>  >>  mmu-nctx                 00000100
>>  >>  sparc-version            00000008
>>  >>  mask_rev                 00000026
>>  >>  device_type              cpu
>>  >>  name                     FMI,MB86904
>>  >>
>>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>  >>
>>  >>  ok 10000000 20 spacel@ .
>>  >>  4000009
>>  >>  ok 14000000 20 spacel@ .
>>  >>  4000009
>>  >>  ok 14000004 20 spacel@ .
>>  >>  23000
>>  >>  ok 1f000004 20 spacel@ .
>>  >>  23000
>>  >>  ok 10000008 20 spacel@ .
>>  >>  4000009
>>  >>  ok 14000028 20 spacel@ .
>>  >>  4000009
>>  >>  ok 1000000c 20 spacel@ .
>>  >>  23000
>>  >>  ok 10000010 20 spacel@ .
>>  >>  4000009
>>  >>
>>  >>
>>  >>  LX is the same except for the IOMMU-version:
>>  >>
>>  >>  ok 10000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 14000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 18000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 1f000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 1ff00000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 1fff0004 20 spacel@ .
>>  >>  1fe000
>>  >>  ok 10000004 20 spacel@ .
>>  >>  1fe000
>>  >>  ok 10000108 20 spacel@ .
>>  >>  41000005
>>  >>  ok 10000040 20 spacel@ .
>>  >>  41000005
>>  >>  ok 1fff0040 20 spacel@ .
>>  >>  41000005
>>  >>  ok 1fff0044 20 spacel@ .
>>  >>  1fe000
>>  >>  ok 1fff0024 20 spacel@ .
>>  >>  1fe000
>>  >>
>>  >>
>>  >>  >>  At what address the additional AFX registers are located?
>>  >>  >
>>  >>  > Here's complete TurboSPARC IOMMU address map:
>>  >>  >  PA[30:0]    Register                     Access
>>  >>  >  1000_0000   IOMMU Control                R/W
>>  >>  >  1000_0004   IOMMU Base Address           R/W
>>  >>  >  1000_0014   Flush All IOTLB Entries      W
>>  >>  >  1000_0018   Address Flush                W
>>  >>  >  1000_1000   Asynchronous Fault Status    R/W
>>  >>  >  1000_1004   Asynchronous Fault Address   R/W
>>  >>  >  1000_1010   SBus Slot Configuration 0    R/W
>>  >>  >  1000_1014   SBus Slot Configuration 1    R/W
>>  >>  >  1000_1018   SBus Slot Configuration 2    R/W
>>  >>  >  1000_101C   SBus Slot Configuration 3    R/W
>>  >>  >  1000_1020   SBus Slot Configuration 4    R/W
>>  >>  >  1000_1050   Memory Fault Status          R/W
>>  >>  >  1000_1054   Memory Fault Address         R/W
>>  >>  >  1000_2000   Module Identification        R/W
>>  >>  >  1000_3018   Mask Identification          R
>>  >>  >  1000_4000   AFX Queue Level              W
>>  >>  >  1000_6000   AFX Queue Level              R
>>  >>  >  1000_7000   AFX Queue Status             R
>>  >>
>>  >>
>>  >>
>>  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>>  >>  well above this limit.
>>  >
>>  > Oh, so I also misread something. You are not talking about the
>>  > adjacent pages, but 16MB increments.
>>  >
>>  > Earlier I sent a patch for a generic address alias device, would it be
>>  > useful for this?
>>
>>
>> Should do as well. But I thought empty_slot is less overhead and
>>  easier to debug.
>>

Also, the aliasing patch would require one more parameter: the size of
the area which has to be aliased. Unless we implement stubs for all the
missing devices and alias the connected port ranges. And then again,
SS-20 doesn't have aliasing in this area at all.
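
To make the difference concrete, here is a minimal sketch in C. It is not
QEMU code: alias_offset(), in_empty_slot() and the three constants are
hypothetical names, and the 0x20-byte register window is only an assumption
based on the "0x20 registers" guess quoted further down. Aliasing has to know
the window size to fold an address back onto a register, while an empty slot
only needs a base and a length.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOMMU_BASE  0x10000000u  /* start of the IOMMU region on SS-5/LX */
#define REGION_LEN  0x10000000u  /* assumed: aliases repeat up to 0x1fffffff */
#define REG_WINDOW  0x20u        /* assumed size of the decoded register window */

/* Aliasing: only the low address bits are decoded, so every address in the
 * region folds back onto an offset inside the register window. */
static uint32_t alias_offset(uint32_t addr)
{
    return (addr - IOMMU_BASE) & (REG_WINDOW - 1);
}

/* Empty slot: no decoding at all, just a range check. The unsigned
 * subtraction wraps for addr < pad_base, so one comparison is enough. */
static bool in_empty_slot(uint32_t addr, uint32_t pad_base, uint32_t pad_len)
{
    return addr - pad_base < pad_len;
}
```

Under these assumptions 14000004 and 1f000004 both fold to offset 4, which
matches the SS-5 probes quoted above where both reads return 23000.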

What do you think about this (empty_slot) solution (apart from the
missing SoB line)? Meanwhile it has been tested with SunOS 4.1.3U1 too.

>>> Maybe we have a general design problem, perhaps unassigned access
>>> faults should only be triggered inside SBus slots and ignored
>>> elsewhere. If this is true, generic Sparc32 unassigned access handler
>>> should just ignore the access and special fault generating slots
>>> should be installed for empty SBus address ranges.

Agreed that they should be special for SBus, because SS-20 OBP is not
happy with the fault we are currently generating. But otherwise I think
qemu does it correctly. On SS-5:

ok f7ff0000 2f spacel@ .
Data Access Error
ok sfar@ .
f7ff0000
ok 20000000 2f spacel@ .
Data Access Error
ok sfar@ .
20000000
ok 40000000 20 spacel@ .
Data Access Error
ok sfar@ .
40000000

Neither ff7ff0000, nor f20000000, nor 40000000 is in the SBus range, right?
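
For reference, this is how the 36-bit addresses above follow from the
"addr asi spacel@" arguments. A minimal sketch, assuming the SPARC reference
MMU convention that ASIs 0x20..0x2f are physical pass-through and carry the
upper four physical address bits in their low nibble; bypass_phys_addr() is a
hypothetical helper name, not an existing function.

```c
#include <stdint.h>

/* ASIs 0x20..0x2f bypass the MMU: the low nibble of the ASI supplies
 * physical address bits 35:32, the 32-bit operand supplies bits 31:0. */
static uint64_t bypass_phys_addr(uint8_t asi, uint32_t addr)
{
    return ((uint64_t)(asi & 0x0f) << 32) | addr;
}
```

So "f7ff0000 2f" addresses ff7ff0000, "20000000 2f" addresses f20000000, and
"40000000 20" stays at 40000000.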

>> My impression was that SS-5 and SS-20 do unassigned accesses a bit differently.
>>  The current IOMMU implementation fits SS-20, which has no aliasing.
>
> It's probably rather the board design than just IOMMU.

Agreed. That's why I bound the patch to the machine hwdef and not to the iommu.
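
Roughly what I mean by binding it to the machine, as a self-contained sketch
rather than the actual patch: struct machine_hwdef, its field names and the
pad values below are all hypothetical and only illustrate the idea that each
board opts in by describing its own padded range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down machine description; not QEMU's real struct. */
struct machine_hwdef {
    const char *name;
    uint32_t iommu_pad_base;  /* start of the padded range */
    uint32_t iommu_pad_len;   /* 0 means: this board gets no pad */
};

static const struct machine_hwdef hwdefs[] = {
    { "SS-5",  0x10001000u, 0x0ffff000u },  /* illustrative values only */
    { "LX",    0x10001000u, 0x0ffff000u },  /* illustrative values only */
    { "SS-20", 0,           0           },  /* no aliasing, so no pad */
};

/* The board description, not the IOMMU model, decides whether to pad. */
static bool needs_iommu_pad(const struct machine_hwdef *m)
{
    return m->iommu_pad_len != 0;
}
```

This way the IOMMU model itself stays as it is for SS-20, and only the boards
that show aliasing on the real hardware get the extra slot.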

>>  >>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>>  >>  >>  > these registers (with the bump in savevm version field) and
>>  >>  >>  > iommu_init1() should check the version field to see how much MMIO to
>>  >>  >>  > provide.
>>  >>  >>
>>  >>  >>
>>  >>  >> The problem I see here is that we already have too many registers: we
>>  >>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>  >>  >>  0x20 registers which are aliased all the way.
>>  >>  >>
>>  >>  >>
>>  >>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>>  >>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>  >>  >>  > if the read back data matches what has been written earlier. Because
>>  >>  >>  > from OBP point of view this is identical to what your patch results
>>  >>  >>  > in, I'd suppose this approach would also work.
>>  >>  >>
>>  >>  >>
>>  >>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>  >>  >>  SunOS 4.1.4 kernel that does. The "MUNIX" kernel is the only kernel available
>>  >>  >>  during the installation, so it is currently not possible to install 4.1.4.
>>  >>  >>  Surprisingly, the "GENERIC" kernel which is on the disk after the
>>  >>  >>  installation doesn't try to access these address ranges, so a disk
>>  >>  >>  image taken from a live system works.
>>  >>  >>
>>  >>  >>  Actually access to the non-connected/aliased addresses may also be a
>>  >>  >>  consequence of phys_page_find bug I mentioned before. When I run
>>  >>  >>  install with -m 64 and -m 256 it tries to access different
>>  >>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>>  >>  >>  to be a lot back then.
>>  >>  >
>>  >>  > Perhaps with 256MB, memory probing advances blindly from memory to
>>  >>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>  >>  > results :-). If this is true, 64M, 128M and 192M should show identical
>>  >>  > results and only with close or equal to 256M the accesses happen.
>>  >>
>>  >>
>>  >> 32m: 0x12fff294
>>  >>  64m: 0x14fff294
>>  >>  192m:0x1cfff294
>>  >>  256m:0x20fff294
>>  >>
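
A quick check of these four numbers, written as a small C sketch (the struct
and function names are hypothetical, the values are the ones listed above):
it subtracts the RAM size in bytes from each faulting address and tests
whether the difference is the same for every run.

```c
#include <stddef.h>
#include <stdint.h>

struct probe {
    uint32_t ram_mb;      /* value passed to -m */
    uint32_t fault_addr;  /* faulting physical address observed */
};

static const struct probe probes[] = {
    {  32, 0x12fff294u },
    {  64, 0x14fff294u },
    { 192, 0x1cfff294u },
    { 256, 0x20fff294u },
};

/* Faulting address minus the RAM size in bytes. */
static uint32_t fault_minus_ram(const struct probe *p)
{
    return p->fault_addr - (p->ram_mb << 20);
}

/* Returns 1 and stores the common difference in *delta if all n runs
 * share it, 0 otherwise. */
static int constant_delta(const struct probe *p, size_t n, uint32_t *delta)
{
    *delta = fault_minus_ram(&p[0]);
    for (size_t i = 1; i < n; i++) {
        if (fault_minus_ram(&p[i]) != *delta) {
            return 0;
        }
    }
    return 1;
}
```

For the four runs the difference is 0x10fff294 every time, so the faulting
address simply tracks the RAM size instead of showing up only near 256M.
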
>>  >>  Memory probing? It would be strange if the OS did it itself. The OS
>>  >>  could just ask OBP how much it has. Here is the listing where it happens:
>>  >>
>>  >>  _swift_vac_rgnflush:            rd      %psr, %g2
>>  >>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>>  >>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>>  >>  _swift_vac_rgnflush+0xc:        nop
>>  >>  _swift_vac_rgnflush+0x10:       nop
>>  >>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>>  >>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>>  >>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>>  >>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>>  >>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>>  >>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>>  >>
>>  >>  _swift_vac_rgnflush+0x28: is the fatal one.
>>  >>
>>  >>  kadb> $c
>>  >>  _swift_vac_rgnflush(?)
>>  >>  _vac_rgnflush() + 4
>>  >>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>  >>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>  >>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>  >>
>>  >>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>  >>  cache-flush code, so I can't check what is in
>>  >>  [%g5] (aka sfar) on the real machine when this happens.
>>  >
>>  > Linux code for Swift/TurboSPARC VAC flush should be similar.

Do you have any idea why anyone would try reading a value referenced in sfar?
Especially during flushing? I can't imagine a case where it wouldn't
produce a fault.

>>  >>  But the bug in phys_page_find would explain these accesses: sfar gets
>>  >>  the wrong address, and then the secondary access happens on this wrong
>>  >>  address instead of the original one.
>>  >
>>  > I doubt phys_page_find can be buggy, it is so vital for all architectures.
>>
>>
>> But you've seen the example of buggy behaviour I posted last Friday, right?
>>  If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>>  is also pretty generic), or
>>  the way SS-20 registers devices. Can it be that all the pages must be
>>  registered in the proper order?
>
> How about unassigned access handler, could it be suspected?

Doesn't look like it: it gets a physical address as a parameter. How
would it know the address is wrong?

>>  I think it's a pretty rare use case where you have a memory fault (not
>>  a translation fault) on an unknown address. You may have such fault
>>  during device probing, but in such case you know what address you are
>>  probing, so you don't care about the sync fault address register.
>>
>>  Besides, do all architectures have sync fault address register?
>
> No, I think system level checks like that and IOMMU-like controls on
> most architectures are very poor compared to Sparc32. Server and
> mainframe systems may be a bit better.

And do we have any mainframe emulated well enough to have a user base
and hence bug reports?

>>  >>  fwiw the routine is called only once on the real hardware. It sort of
>>  >>  speaks for your hypothesis about the memory probing. Although it may
>>  >>  not necessarily probe for memory...
>>  >>
>>  >>


-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/


