[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 0/6] trace: [tcg] Optimize per-vCPU tracing s
From: |
Lluís Vilanova |
Subject: |
Re: [Qemu-devel] [PATCH v3 0/6] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches |
Date: |
Mon, 26 Dec 2016 20:41:07 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) |
Richard Henderson writes:
> On 12/23/2016 10:51 AM, Lluís Vilanova wrote:
>>> On 12/22/2016 10:35 AM, Lluís Vilanova wrote:
>>>> To handle both issues, this series replicates the shared physical TB cache,
>>>> creating a separate physical TB cache for every combination of event states
>>>> (those with the 'vcpu' and 'tcg' properties). Then, all vCPUs tracing the
>>>> same
>>>> events will use the same physical TB cache.
>>
>>> Why do we need to "split the physical TB cache" as opposed to simply
>>> including
>>> the trace state into the TB hash function?
>>
>> Mmmm, that's an interesting alternative I did not consider. Are you aiming at
>> minimizing the changes, or do you also think it would be more efficient?
> I suspect that it will be more efficient.
I discovered that I ran my tests with "--enable-debug". Without it, the
performance changes for this series look within the noise range (great news!
guest code events are now for free), and the TB hash function approach should
show the same type of results on the hot path:
* vanilla, statically disabled
real 0m2,259s
user 0m2,252s
sys 0m0,004s
* vanilla, statically enabled (overhead: 2.18x)
real 0m4,921s
user 0m4,912s
sys 0m0,008s
* multi-tb, statically disabled (overhead: 1.00x)
real 0m2,269s
user 0m2,248s
sys 0m0,020s
* multi-tb, statically enabled (overhead: 0.98x)
real 0m2,223s
user 0m2,204s
sys 0m0,016s
The only drawback I see with going for your proposal is the possibility of a
higher TB collision rate when users set vCPUs with different tracing states
(after all, my series has a separate QHT for each combination of tracing
states).
>> The dynamic tracing state would then be an arbitrarily long bitmap (defined
>> by
>> the number of events with the 'vcpu' property), so I'm not sure how to fit it
>> into the hashing function with minimal collisions (the bitmap is now limited
>> to
>> an unsigned long to use it as an index to the TB cache "matrix").
> You could consider that index a unique identifier for the tracing state, and
> then only compare and hash that integer.
Yes, I get that. I meant that the bitmap has a theoretically unbounded size. But
we can limit that to some arbitrary number to keep the hashing function fixed
(in your example, you're using 32 bits, which should be enough).
>> The other drawback I see is that then it would also take longer to compute
>> the
>> hashing function, instead of the simpler array indexing. As a benefit,
>> workloads
>> with a high frequency of TB-flushing operations might be a bit faster (there
>> would be a single QHT).
> I don't see adding one more integer to the hashing function to be significant
> at all. Certainly not the 15% that you describe in your cover letter.
See results above.
>> If someone can provide me the code for the modified hash lookup function to
>> account for the trace dstate bitmap contents, I will integrate it and
>> measure if
>> there is any important change in performance.
> Something like the following should do it. There are two /* cpu->??? */
> markers that would need to be filled in.
Thanks for the code! If you think the TB hash function is a less invasive
change, then I'll send a new series with that.
> If you can reduce the tracing identifier to 8 bits, that would be excellent.
> I've been wanting to make some other changes to TB hashing, and that would fit
> in well with a second "flags" value.
With all the vCPU events I have in the queue, it should be on the order of 16
bits. But we cannot shrink it down to 8 bits at compile time now, since we no
longer have a define with the number of vCPU events.
Cheers,
Lluis
- [Qemu-devel] [PATCH v3 0/6] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches, Lluís Vilanova, 2016/12/22
- [Qemu-devel] [PATCH v3 2/6] trace: Make trace_get_vcpu_event_count() inlinable, Lluís Vilanova, 2016/12/22
- [Qemu-devel] [PATCH v3 6/6] trace: [tcg, trivial] Re-align generated code, Lluís Vilanova, 2016/12/22
- [Qemu-devel] [PATCH v3 4/6] exec: [tcg] Switch physical TB cache based on vCPU tracing state, Lluís Vilanova, 2016/12/22
- [Qemu-devel] [PATCH v3 1/6] exec: [tcg] Refactor flush of per-CPU virtual TB cache, Lluís Vilanova, 2016/12/22
- [Qemu-devel] [PATCH v3 3/6] exec: [tcg] Use multiple physical TB caches, Lluís Vilanova, 2016/12/22
- [Qemu-devel] [PATCH v3 5/6] trace: [tcg] Do not generate TCG code to trace dinamically-disabled events, Lluís Vilanova, 2016/12/22
- Re: [Qemu-devel] [PATCH v3 0/6] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches, Richard Henderson, 2016/12/23