Re: [PATCH 2/3] target/arm: add FEAT_TLBIRANGE support
From: Richard Henderson
Subject: Re: [PATCH 2/3] target/arm: add FEAT_TLBIRANGE support
Date: Tue, 15 Dec 2020 08:55:14 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
On 12/14/20 2:23 PM, Rebecca Cran wrote:
> ARMv8.4 adds the mandatory FEAT_TLBIRANGE, which provides instructions
> for invalidating ranges of entries.
>
> Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
> ---
>  accel/tcg/cputlb.c      |  24 ++
>  include/exec/exec-all.h |  39 +++
>  target/arm/helper.c     | 273 ++++++++++++++++++++
>  3 files changed, 336 insertions(+)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 42ab79c1a582..103f363b42f3 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -603,6 +603,30 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>      tlb_flush_page_by_mmuidx(cpu, addr, ALL_MMUIDX_BITS);
>  }
>  
> +void tlb_flush_page_range_by_mmuidx(CPUState *cpu, target_ulong addr,
> +                                    int num_pages, uint16_t idxmap)
> +{
> +    int i;
> +
> +    for (i = 0; i < num_pages; i++) {
> +        tlb_flush_page_by_mmuidx(cpu, addr + (i * TARGET_PAGE_SIZE), idxmap);
> +    }
> +}
> +
> +void tlb_flush_page_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
> +                                                    target_ulong addr,
> +                                                    int num_pages,
> +                                                    uint16_t idxmap)
> +{
> +    int i;
> +
> +    for (i = 0; i < num_pages; i++) {
> +        tlb_flush_page_by_mmuidx_all_cpus_synced(src_cpu,
> +                                                 addr + (i * TARGET_PAGE_SIZE),
> +                                                 idxmap);
> +    }
> +}
This is a poor way to structure these functions, because each of these calls is
synchronized. You want to do the cross-cpu call once for the entire set of
pages, synchronizing once at the end.
In addition, tlb_flush_page is insufficient for aarch64, because of TBI. We
need a version of tlb_flush_page_bits that takes the length of the flush.
This *could* be implemented as a full flush, in the short term.
You could round the length outward to a mask, then merge the low-bit mask of
the length with the high-bit mask of TBI. That will catch a few more pages
than architecturally required, but fewer than a full flush would.
Certainly I don't think you ever want to perform this loop 32 (max num) * 16
(max scale) * 64 (max page size) = 32768 times.
r~