
Re: [Qemu-devel] [PATCH v4 12/21] target-arm: A64: Implement DC ZVA


From: Peter Maydell
Subject: Re: [Qemu-devel] [PATCH v4 12/21] target-arm: A64: Implement DC ZVA
Date: Fri, 7 Mar 2014 15:11:37 +0000

On 7 March 2014 14:51, Richard Henderson <address@hidden> wrote:
> On 03/06/2014 11:32 AM, Peter Maydell wrote:
>> +/**
>> + * tlb_vaddr_to_host:
>> + * @env: CPUArchState
>> + * @addr: guest virtual address to look up
>> + * @mmu_idx: MMU index to use for lookup
>> + *
>> + * Look up the specified guest virtual index in the TCG softmmu TLB.
>> + * If the TLB contains a host virtual address suitable for direct RAM
>> + * access, then return it. Otherwise (TLB miss, TLB entry is for an
>> + * I/O access, etc) return NULL.
>> + *
>> + * This is the equivalent of the initial fast-path code used by
>> + * TCG backends for guest load and store accesses.
>> + */
>> +static inline void *tlb_vaddr_to_host(CPUArchState *env, target_ulong addr,
>> +                                      int mmu_idx)
>> +{
>> +    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>> +    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>
> Somewhere I think the function name or at least the block comment should
> indicate that this lookup is for writing, since we hard-code addr_write here.

Doh, yes. I forgot that when I was shifting the code into
a more general function. Is it worth parameterising this for
read vs write lookups?
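For illustration, the read-vs-write parameterisation suggested here might look something like the following standalone sketch. The types and names (MockTLBEntry, mock_tlb_vaddr_to_host, TLB_INVALID) are stand-ins for QEMU's real softmmu structures, not actual QEMU API:

```c
#include <assert.h>
#include <stdint.h>

/* Mock stand-ins for QEMU's softmmu TLB types (illustrative only). */
#define TARGET_PAGE_BITS 10
#define CPU_TLB_SIZE 256
#define TLB_INVALID (~(uint64_t)0)

typedef struct {
    uint64_t addr_read;   /* page address valid for loads, or TLB_INVALID */
    uint64_t addr_write;  /* page address valid for stores, or TLB_INVALID */
    uintptr_t addend;     /* guest-vaddr-to-host-pointer offset */
} MockTLBEntry;

typedef enum { ACCESS_READ, ACCESS_WRITE } MockAccessType;

/* Parameterised lookup: compare against addr_read or addr_write
 * depending on the intended access, instead of hard-coding one. */
static void *mock_tlb_vaddr_to_host(MockTLBEntry *tlb, uint64_t addr,
                                    MockAccessType type)
{
    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
    MockTLBEntry *e = &tlb[index];
    uint64_t tlb_addr = (type == ACCESS_WRITE) ? e->addr_write : e->addr_read;

    if (tlb_addr != (addr & ~(((uint64_t)1 << TARGET_PAGE_BITS) - 1))) {
        return 0; /* miss (or, in the real thing, an I/O page) */
    }
    return (void *)(uintptr_t)(addr + e->addend);
}
```

The same entry can then legitimately hit for stores but miss for loads (or vice versa), which is the distinction the hard-coded addr_write loses.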

>> +void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
>> +{
>> +    /* Implement DC ZVA, which zeroes a fixed-length block of memory.
>> +     * Note that we do not implement the (architecturally mandated)
>> +     * alignment fault for attempts to use this on Device memory
>> +     * (which matches the usual QEMU behaviour of not implementing either
>> +     * alignment faults or any memory attribute handling).
>> +     */
>> +
>> +    ARMCPU *cpu = arm_env_get_cpu(env);
>> +    uint64_t blocklen = 4 << cpu->dcz_blocksize;
>> +    uint64_t vaddr = vaddr_in & ~(blocklen - 1);
>> +
>> +#ifndef CONFIG_USER_ONLY
>> +    {
>> +        /* Slightly awkwardly, QEMU's TARGET_PAGE_SIZE may be less than
>> +         * the block size so we might have to do more than one TLB lookup.
>> +         * We know that in fact for any v8 CPU the page size is at least 4K
>> +     * and the block size must be 2K or less, but TARGET_PAGE_SIZE is only
>> +     * 1K as an artefact of legacy v5 subpage support being present in the
>> +     * same QEMU executable.
>> +         */
>> +        int maxidx = DIV_ROUND_UP(blocklen, TARGET_PAGE_SIZE);
>> +        void *hostaddr[maxidx];
>
> What's the maximum blocksize?  Did you really need dynamic allocation here?

Max block size architecturally is currently 2K. Dynamic
allocation seemed cleaner code than hardcoding "this will
always have either 1 or 2 elements", though. (The field
in the "how big is a block?" register would allow more than
2K, since 2K is encoded as 0b1001 in a 4-bit field.)
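Checking the arithmetic: with blocklen = 4 << dcz_blocksize, the 0b1001 (9) encoding gives 4 << 9 = 2048 bytes, i.e. 2K, and DIV_ROUND_UP(2048, 1024) = 2 pages, matching the "1 or 2 elements" claim above. A minimal standalone check (dc_zva_blocklen and dc_zva_maxidx are illustrative helpers, not QEMU functions):

```c
#include <assert.h>
#include <stdint.h>

/* The DC ZVA block-size field holds log2 of the block size in words,
 * so the block length in bytes is 4 << field (a word is 4 bytes). */
static uint64_t dc_zva_blocklen(unsigned bs_field)
{
    return (uint64_t)4 << bs_field;
}

/* Number of TARGET_PAGE_SIZE-sized pieces the block can span,
 * i.e. DIV_ROUND_UP(blocklen, page_size). */
static int dc_zva_maxidx(uint64_t blocklen, uint64_t page_size)
{
    return (int)((blocklen + page_size - 1) / page_size);
}
```

With a 1K TARGET_PAGE_SIZE the 4-bit field maxing out at 15 would allow 4 << 15 = 128K blocks, i.e. up to 128 pieces, which is why the encoding technically permits more than the "1 or 2" that current CPUs need.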


>
>> +        int try, i;
>> +
>> +        for (try = 0; try < 2; try++) {
>> +
>> +            for (i = 0; i < maxidx; i++) {
>> +                hostaddr[i] = tlb_vaddr_to_host(env,
>> +                                                vaddr + TARGET_PAGE_SIZE * i,
>> +                                                cpu_mmu_index(env));
>> +                if (!hostaddr[i]) {
>> +                    break;
>> +                }
>> +            }
>> +            if (i == maxidx) {
>> +                /* If it's all in the TLB it's fair game for just writing to;
>> +                 * we know we don't need to update dirty status, etc.
>> +                 */
>> +                for (i = 0; i < maxidx - 1; i++) {
>> +                    memset(hostaddr[i], 0, TARGET_PAGE_SIZE);
>> +                }
>> +                memset(hostaddr[i], 0, blocklen - (i * TARGET_PAGE_SIZE));
>> +                return;
>> +            }
>> +            /* OK, try a store and see if we can populate the tlb. This
>> +             * might cause an exception if the memory isn't writable,
>> +             * in which case we will longjmp out of here. We must for
>> +             * this purpose use the actual register value passed to us
>> +             * so that we get the fault address right.
>> +             */
>> +            cpu_stb_data(env, vaddr_in, 0);
>> +            /* Now we can populate the other TLB entries, if any */
>> +            for (i = 0; i < maxidx; i++) {
>> +                uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
>> +                if (va != (vaddr_in & TARGET_PAGE_MASK)) {
>> +                    cpu_stb_data(env, va, 0);
>> +                }
>> +            }
>
> cpu_stb_data doesn't take into account user vs kernel mode accesses.

...so what does it use for the mmu index?

>  Maybe
> better off using helper_ret_stb_mmu, and passing along GETRA().

OK.
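Stripped of QEMU's machinery, the two-try shape of the helper under discussion can be sketched standalone. Here lookup() and store_byte() are mock stand-ins for tlb_vaddr_to_host() and the byte-store slow path (cpu_stb_data / helper_ret_stb_mmu); the mock slow path simply populates a per-page host-pointer table, where the real one can fault and longjmp out:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 1024u
#define BLOCKLEN 2048u
#define NPAGES 8

/* Mock "TLB": one host pointer per guest page, NULL until populated. */
static uint8_t ram[NPAGES * PAGE_SIZE];
static uint8_t *page_host[NPAGES];

static uint8_t *lookup(uint64_t vaddr) /* fast path: may miss */
{
    return page_host[vaddr / PAGE_SIZE];
}

static void store_byte(uint64_t vaddr, uint8_t val) /* slow path */
{
    unsigned pg = (unsigned)(vaddr / PAGE_SIZE);
    page_host[pg] = &ram[pg * PAGE_SIZE]; /* real code could fault here */
    page_host[pg][vaddr % PAGE_SIZE] = val;
}

/* Two-try structure from the patch: probe every page the block spans;
 * if all hit, memset the host memory directly; otherwise take the slow
 * path once per page and retry. */
static void mock_dc_zva(uint64_t vaddr_in)
{
    uint64_t vaddr = vaddr_in & ~(uint64_t)(BLOCKLEN - 1);
    uint8_t *hostaddr[(BLOCKLEN + PAGE_SIZE - 1) / PAGE_SIZE];
    int maxidx = (int)(sizeof hostaddr / sizeof hostaddr[0]);
    int try, i;

    for (try = 0; try < 2; try++) {
        for (i = 0; i < maxidx; i++) {
            hostaddr[i] = lookup(vaddr + PAGE_SIZE * i);
            if (!hostaddr[i]) {
                break;
            }
        }
        if (i == maxidx) {
            for (i = 0; i < maxidx - 1; i++) {
                memset(hostaddr[i], 0, PAGE_SIZE);
            }
            memset(hostaddr[i], 0, BLOCKLEN - (unsigned)i * PAGE_SIZE);
            return;
        }
        for (i = 0; i < maxidx; i++) {
            store_byte(vaddr + PAGE_SIZE * i, 0);
        }
    }
}
```

In the real helper the first slow-path store uses the original vaddr_in rather than the aligned address, precisely so that a fault reports the register value the guest actually supplied.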

> As a bonus, you'll have accurate exceptions should the access throw, so you
> don't need to force the save of PC before calling the helper.  Which... I don't
> see you doing, so perhaps there's a bug here at the moment.

Mmm. (In system mode we'll save PC as a side effect of having
an accessfn defined for the DC_ZVA reginfo.)

thanks
-- PMM


