qemu-arm

Re: [PATCH 12/14] aspeed: Make aspeed_board_init_flashes public


From: Alex Bennée
Subject: Re: [PATCH 12/14] aspeed: Make aspeed_board_init_flashes public
Date: Wed, 29 Jun 2022 15:14:10 +0100
User-agent: mu4e 1.7.27; emacs 28.1.50

Cédric Le Goater <clg@kaod.org> writes:

> On 6/24/22 18:50, Cédric Le Goater wrote:
>> On 6/23/22 20:43, Peter Delevoryas wrote:
>>>
>>>
>>>> On Jun 23, 2022, at 8:09 AM, Cédric Le Goater <clg@kaod.org> wrote:
>>>>
>>>> On 6/23/22 12:26, Peter Delevoryas wrote:
>>>>> Signed-off-by: Peter Delevoryas <pdel@fb.com>
>>>>
>>>> Let's start simple without flash support. We should be able to
>>>> load FW blobs in each CPU address space using loader devices.
>>>
>>> Actually, I was unable to do this, perhaps because the fb OpenBMC
>>> boot sequence is a little weird. I specifically _needed_ to have
>>> a flash device which maps the firmware in at 0x2000_0000, because
>>> the fb OpenBMC U-Boot SPL jumps to that address to start executing
>>> from flash? I think this is also why fb OpenBMC machines can be so slow.
>>>
>>> $ ./build/qemu-system-arm -machine fby35 \
>>>      -device loader,file=fby35.mtd,addr=0,cpu-num=0 -nographic \
>>>      -d int -drive file=fby35.mtd,format=raw,if=mtd
>> Ideally we should be booting from the flash device directly using
>> the machine option '-M ast2600-evb,execute-in-place=true' like HW
>> does. Instructions are fetched using SPI transfers. But the amount
>> of code generated is tremendous.

Yeah, because there is a potential race when reading from HW, we throw
away TBs after executing them: we have no way of knowing whether the
code has changed under our feet. See 873d64ac30 (accel/tcg: re-factor
non-RAM execution code), which cleaned up this handling.
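To illustrate the cost of that policy, here is a toy model in Python. It is
only a sketch and not QEMU code: the function name, the `cacheable` flag and
the three-instruction loop are all hypothetical. It counts how often a block
has to be translated when translations can be cached (RAM-like) versus when
each one is discarded after a single execution (non-RAM).

```python
# Toy model only, not QEMU code. All names here are hypothetical.
def count_translations(trace, cacheable):
    """Return how many translations a sequence of block start PCs needs."""
    cache = set()
    translations = 0
    for pc in trace:
        if cacheable and pc in cache:
            continue  # reuse the previously translated block
        translations += 1  # translate the block again
        if cacheable:
            cache.add(pc)  # only RAM-backed code is kept for reuse
    return translations

# A three-block loop executed 1000 times.
loop = [0x0, 0x4, 0x8] * 1000
ram_translations = count_translations(loop, cacheable=True)
non_ram_translations = count_translations(loop, cacheable=False)
print(ram_translations, non_ram_translations)
```

With caching, the loop body is translated once per block; without it, every
execution pays for a fresh translation.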

>> See some profiling below for a
>> run which barely reaches DRAM training in U-Boot.
>
> Some more profiling on both ast2500 and ast2600 machines shows :
>
>
> * ast2600-evb,execute-in-place=true :
>
> Type               Object  Call site                Wait Time (s)       Count  Average (us)
> ---------------------------------------------------------------------------------------------
> BQL mutex  0x564dc03922e0  accel/tcg/cputlb.c:1365       14.21443    32909927          0.43

This is unavoidable: a HW access needs the BQL held, so we go through
this cycle for every executed instruction.

Did I miss why the flash contents are not mapped into the physical
address space? Isn't that how it appears to the processor?
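As a sanity check on the profile, the "Average (us)" column can be recomputed
from the wait time and count. The sketch below uses the
accel/tcg/cputlb.c:1365 rows quoted in this mail for both machines; the
helper name is made up for illustration.

```python
# Figures copied from the accel/tcg/cputlb.c:1365 rows of the two profiles.
rows = {
    "ast2600-evb": (14.21443, 32909927),  # (wait time in s, count)
    "ast2500-evb": (0.29886, 14394475),
}

def average_wait_us(total_wait_s, count):
    """Average wait per BQL acquisition, in microseconds."""
    return total_wait_s / count * 1e6

averages = {
    machine: round(average_wait_us(wait_s, count), 2)
    for machine, (wait_s, count) in rows.items()
}
for machine, avg in averages.items():
    print(f"{machine}: {avg:.2f} us per acquisition")
```

Each individual wait is tiny; the total only becomes significant because the
lock is taken tens of millions of times.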

> condvar    0x564dc0f02988  util/thread-pool.c:90         10.02312          56     178984.32
> condvar    [           2]  softmmu/cpus.c:423             0.10051           6      16752.04
> BQL mutex  0x564dc03922e0  util/rcu.c:269                 0.04372           4      10930.60
> BQL mutex  0x564dc03922e0  cpus-common.c:341              0.00151           8        189.16
> condvar    0x564dc0390360  cpus-common.c:176              0.00092           8        115.04
> condvar    0x564dc0392280  softmmu/cpus.c:642             0.00013           2         65.04
> condvar    0x564dc0392240  softmmu/cpus.c:571             0.00010           2         49.54
> BQL mutex  0x564dc03922e0  accel/tcg/cputlb.c:1426        0.00006         467          0.14
> condvar    0x564dc03903a0  cpus-common.c:206              0.00004           8          5.28
> ---------------------------------------------------------------------------------------------
>
>
> * ast2500-evb,execute-in-place=true :
>
> Type               Object  Call site                Wait Time (s)       Count  Average (us)
> ---------------------------------------------------------------------------------------------
> condvar    0x55a581137f88  util/thread-pool.c:90         10.01158          28     357556.50
> BQL mutex  0x55a57f0e02e0  accel/tcg/cputlb.c:1365        0.29886    14394475          0.02
> condvar    0x55a5814cb5a0  softmmu/cpus.c:423             0.02182           2      10912.44
> BQL mutex  0x55a57f0e02e0  util/rcu.c:269                 0.01420           4       3549.56
> mutex      0x55a5813381c0  tcg/region.c:204               0.00007        3052          0.02
> condvar    0x55a57f0e0280  softmmu/cpus.c:642             0.00006           1         59.79
> mutex      [           2]  chardev/char.c:118             0.00003        1492          0.02
> BQL mutex  0x55a57f0e02e0  util/main-loop.c:318           0.00002          34          0.72
> BQL mutex  0x55a57f0e02e0  accel/tcg/cputlb.c:1426        0.00002         973          0.02
> condvar    0x55a57f0e0240  softmmu/cpus.c:571             0.00002           1         15.16
> ---------------------------------------------------------------------------------------------
>
> C.
>
>
>
>> * execute-in-place=true
>> Each sample counts as 0.01 seconds.
>>    %   cumulative   self              self     total
>>   time   seconds   seconds    calls  ns/call  ns/call  name
>> 100.00      0.02     0.02   164276   121.75   121.75  memory_region_init_rom_device
>>    0.00      0.02     0.00 1610346008     0.00     0.00  tcg_code_capacity
>>    0.00      0.02     0.00 567612621     0.00     0.00  type_register_static_array
>>    0.00      0.02     0.00 328886191     0.00     0.00  do_common_semihosting
>>    0.00      0.02     0.00 297215811     0.00     0.00  container_get
>>    0.00      0.02     0.00 292670030     0.00     0.00  arm_cpu_tlb_fill
>>    0.00      0.02     0.00 195416119     0.00     0.00  arm_cpu_register_gdb_regs_for_features
>>    0.00      0.02     0.00 193326677     0.00     0.00  object_type_get_instance_size
>>    0.00      0.02     0.00 182365829     0.00     0.00  tcg_op_insert_after
>>    0.00      0.02     0.00 150668458     0.00     0.00  plugin_gen_tb_end
>>    0.00      0.02     0.00 142171940     0.00     0.00  gen_new_label
>>    0.00      0.02     0.00 133200628     0.00     0.00  smbios_build_type_38_table
>>    0.00      0.02     0.00 130540338     0.00     0.00  object_dynamic_cast_assert
>>    0.00      0.02     0.00 129223195     0.00     0.00  cpu_loop_exit_atomic
>>    0.00      0.02     0.00 121759298     0.00     0.00  tcg_remove_ops_after
>>    0.00      0.02     0.00 116887887     0.00     0.00  in_code_gen_buffer
>>    0.00      0.02     0.00 111803833     0.00     0.00  tcg_emit_op
>>    0.00      0.02     0.00 106052221     0.00     0.00  object_class_dynamic_cast_assert
>>    0.00      0.02     0.00 99704054     0.00     0.00  __jit_debug_register_code
>>    0.00      0.02     0.00 97812458     0.00     0.00  object_get_class
>>    0.00      0.02     0.00 88952594     0.00     0.00  tcg_splitwx_to_rx
>>    0.00      0.02     0.00 85790920     0.00     0.00  object_class_dynamic_cast
>>    0.00      0.02     0.00 73780673     0.00     0.00  helper_exit_atomic
>>    0.00      0.02     0.00 65337482     0.00     0.00  tcg_op_supported
>>    0.00      0.02     0.00 61213619     0.00     0.00  tcg_func_start
>>    0.00      0.02     0.00 54477684     0.00     0.00  tcg_flush_softmmu_tlb
>>    0.00      0.02     0.00 53968980     0.00     0.00  tcg_temp_new_internal
>>    0.00      0.02     0.00 51526008     0.00     0.00  qemu_in_vcpu_thread
>>    0.00      0.02     0.00 40750952     0.00     0.00  pflash_cfi02_register
>>    0.00      0.02     0.00 38039442     0.00     0.00  tcg_gen_op2
>>    0.00      0.02     0.00 37068039     0.00     0.00  tcg_gen_op1
>>    0.00      0.02     0.00 36473276     0.00     0.00  tcg_gen_op3
>>    0.00      0.02     0.00 36310225     0.00     0.00  gen_gvec_uaba
>>    0.00      0.02     0.00 30985436     0.00     0.00  tb_set_jmp_target
>>    0.00      0.02     0.00 30291796     0.00     0.00  tcg_constant_internal
>>    0.00      0.02     0.00 29857950     0.00     0.00  ssi_transfer
>> * execute-in-place=false
>> Each sample counts as 0.01 seconds.
>>    %   cumulative   self              self     total
>>   time   seconds   seconds    calls  ns/call  ns/call  name
>>   40.00      0.02     0.02   551149    36.29    36.29  aspeed_board_init_flashes
>>   20.00      0.03     0.01  3937238     2.54     2.54  register_cp_regs_for_features
>>   20.00      0.04     0.01   674096    14.83    14.83  gen_gvec_uaba
>>   20.00      0.05     0.01   457461    21.86    21.86  finalize_target_page_bits
>>    0.00      0.05     0.00  5364258     0.00     0.00  arm_gt_hvtimer_cb
>>    0.00      0.05     0.00  2467532     0.00     0.00  helper_neon_narrow_sat_s8
>>    0.00      0.05     0.00  2431860     0.00     0.00  opb_opb2fsi_address
>>    0.00      0.05     0.00  1828453     0.00     0.00  cpsr_read
>>    0.00      0.05     0.00  1820659     0.00     0.00  cpu_get_tb_cpu_state
>>    0.00      0.05     0.00  1441344     0.00     0.00  arm_cpu_tlb_fill
>>    0.00      0.05     0.00  1427177     0.00     0.00  cxl_usp_to_cstate
>>    0.00      0.05     0.00  1161059     0.00     5.85  aarch64_sync_64_to_32
>>    0.00      0.05     0.00   886523     0.00     0.00  helper_iwmmxt_maxsb
>>    0.00      0.05     0.00   831393     0.00     0.00  arm_log_exception
>>    0.00      0.05     0.00   746940     0.00     0.00  helper_v7m_preserve_fp_state
>>    0.00      0.05     0.00   728354     0.00     0.00  hmp_calc_dirty_rate
>>    0.00      0.05     0.00   681634     0.00     0.00  helper_sadd8
>>    0.00      0.05     0.00   487743     0.00     7.14  qmp_query_cpu_definitions
>>    0.00      0.05     0.00   420528     0.00     0.00  arm_v7m_cpu_do_interrupt
>>    0.00      0.05     0.00   382245     0.00     0.00  helper_ssub8
>>    0.00      0.05     0.00   374192     0.00     0.00  helper_usub8
>>    0.00      0.05     0.00   347199     0.00     0.00  usb_msd_load_request
>>    0.00      0.05     0.00   325862     0.00     0.00  target_disas
>>    0.00      0.05     0.00   322375     0.00     0.00  arm_hcrx_el2_eff
>>    0.00      0.05     0.00   317835     0.00     0.00  virtio_bus_device_iommu_enabled
>>    0.00      0.05     0.00   309559     0.00     0.00  mig_throttle_counter_reset
>>    0.00      0.05     0.00   301557     0.00     0.00  ram_bytes_remaining
>>    0.00      0.05     0.00   292888     0.00     0.00  helper_v7m_blxns
>>    0.00      0.05     0.00   289093     0.00     0.00  tpm_util_show_buffer
>>    0.00      0.05     0.00   274156     0.00     0.00  helper_sxtb16
>>    0.00      0.05     0.00   273588     0.00     0.00  write_v7m_exception
>>    0.00      0.05     0.00   271619     0.00     0.00  page_size_init
>>    0.00      0.05     0.00   270247     0.00     0.00  qemu_fdt_setprop_sized_cells_from_array
>>    0.00      0.05     0.00   229643     0.00    14.69  helper_neon_addl_u32


-- 
Alex Bennée


