From: Michael S. Tsirkin
Subject: Re: [PATCH v5 00/13] WIP: Use Intel DSA accelerator to offload zero page checking in multifd live migration.
Date: Thu, 11 Jul 2024 18:49:27 -0400

On Thu, Jul 11, 2024 at 02:52:35PM -0700, Yichen Wang wrote:
> * Performance:
> 
> We use two Intel 4th generation Xeon servers for testing.
> 
> Architecture:        x86_64
> CPU(s):              192
> Thread(s) per core:  2
> Core(s) per socket:  48
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               143
> Model name:          Intel(R) Xeon(R) Platinum 8457C
> Stepping:            8
> CPU MHz:             2538.624
> CPU max MHz:         3800.0000
> CPU min MHz:         800.0000
> 
> We perform multifd live migration with the setup below:
> 1. The VM has 100GB of memory.
> 2. Use the new migration option multifd-set-normal-page-ratio to control
> the total size of the payload sent over the network.
> 3. Use 8 multifd channels.
> 4. Use TCP for live migration.
> 5. Use the CPU to perform zero page checking as the baseline.
> 6. Use one DSA device to offload zero page checking to compare with the
> baseline.
> 7. Use "perf sched record" and "perf sched timehist" to analyze CPU usage.
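
(For anyone wanting to reproduce a comparable setup, here is a rough
Python/QMP sketch. The socket path, destination URI and the commented-out
DSA parameter name are assumptions for illustration, not the exact knobs
used for these measurements.)

#!/usr/bin/env python3
# Rough sketch of driving the multifd setup above over QMP.
import json
import socket

QMP_SOCK = "/tmp/src-qmp.sock"     # assumed QMP socket of the source VM
DEST_URI = "tcp:192.0.2.2:4444"    # assumed migration destination

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(QMP_SOCK)
reader = sock.makefile("r")

def qmp(cmd, args=None):
    """Send one QMP command and wait for its reply, skipping async events."""
    msg = {"execute": cmd}
    if args is not None:
        msg["arguments"] = args
    sock.sendall((json.dumps(msg) + "\n").encode())
    while True:
        reply = json.loads(reader.readline())
        if "return" in reply or "error" in reply:
            return reply

reader.readline()                  # consume the QMP greeting
qmp("qmp_capabilities")

# Items 3 and 4: 8 multifd channels over TCP.
qmp("migrate-set-capabilities",
    {"capabilities": [{"capability": "multifd", "state": True}]})
qmp("migrate-set-parameters", {"multifd-channels": 8})

# Item 6: offload zero page checking to one DSA device.  The parameter
# name/value here are an assumption about this series' interface.
# qmp("migrate-set-parameters", {"zero-page-detection": "dsa-accel"})

qmp("migrate", {"uri": DEST_URI})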
> 
> A) Scenario 1: 50% (50GB) normal pages in a 100GB VM.
> 
>       CPU usage
> 
>       |---------------|---------------|---------------|---------------|
>       |               |comm           |runtime(msec)  |totaltime(msec)|
>       |---------------|---------------|---------------|---------------|
>       |Baseline       |live_migration |5657.58        |               |
>       |               |multifdsend_0  |3931.563       |               |
>       |               |multifdsend_1  |4405.273       |               |
>       |               |multifdsend_2  |3941.968       |               |
>       |               |multifdsend_3  |5032.975       |               |
>       |               |multifdsend_4  |4533.865       |               |
>       |               |multifdsend_5  |4530.461       |               |
>       |               |multifdsend_6  |5171.916       |               |
>       |               |multifdsend_7  |4722.769       |41922          |
>       |---------------|---------------|---------------|---------------|
>       |DSA            |live_migration |6129.168       |               |
>       |               |multifdsend_0  |2954.717       |               |
>       |               |multifdsend_1  |2766.359       |               |
>       |               |multifdsend_2  |2853.519       |               |
>       |               |multifdsend_3  |2740.717       |               |
>       |               |multifdsend_4  |2824.169       |               |
>       |               |multifdsend_5  |2966.908       |               |
>       |               |multifdsend_6  |2611.137       |               |
>       |               |multifdsend_7  |3114.732       |               |
>       |               |dsa_completion |3612.564       |32568          |
>       |---------------|---------------|---------------|---------------|
> 
> Baseline total runtime is calculated by adding up the runtime of all
> multifdsend_X threads and the live_migration thread. DSA offloading
> total runtime is calculated by adding up the runtime of all
> multifdsend_X threads, the live_migration thread and the
> dsa_completion thread. That is 41922 msec vs. 32568 msec of runtime,
> roughly a 22% saving in total CPU usage.
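
A quick check of that figure from the two quoted totals (plain arithmetic):

baseline_msec = 41922   # live_migration + multifdsend_0..7
dsa_msec = 32568        # the same threads plus dsa_completion
saving = (baseline_msec - dsa_msec) / baseline_msec
print(f"{saving:.1%}")  # prints 22.3%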


Here the DSA was mostly idle.

Sounds good, but a question: what happens if several QEMU instances are
migrated in parallel?

Some accelerators tend to basically stall if several tasks are trying to
use them at the same time.

Where is the boundary here?




-- 
MST



