qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 11/11] translate-all: add tb hash bucket info


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v3 11/11] translate-all: add tb hash bucket info to 'info jit' dump
Date: Sun, 24 Apr 2016 12:46:08 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1

On 04/22/2016 04:57 PM, Emilio G. Cota wrote:
On Fri, Apr 22, 2016 at 12:59:52 -0700, Richard Henderson wrote:
FWIW, so that I could get an idea of how the stats change as we improve the
hashing, I inserted the attachment 1 patch between patches 5 and 6, and with
attachment 2 attempting to fix the accounting for patches 9 and 10.

For qht, I dislike the approach of reporting "avg chain" per-element,
instead of per-bucket. Performance for a bucket whose entries are
all valid is virtually the same as that of a bucket that only
has one valid element; thus, with per-bucket reporting, we'd say that
the chain lenght is 1 in both cases, i.e. "perfect". With per-element
reporting, we'd report 4 (on a 64-bit host, since that's the value of
QHT_BUCKET_ENTRIES) when the bucket is full, which IMO gives the
wrong idea (users would think they're in trouble, when they're not).

But otherwise you have no way of knowing how full the buckets are. The bucket size is just something that you have to keep in mind.

If those numbers are off, then either this
     assert(hinfo.used_entries ==
            tcg_ctx.tb_ctx.nb_tbs - tcg_ctx.tb_ctx.tb_phys_invalidate_count);
should trigger, or the accounting isn't right.

I think I used an NDEBUG build, so these weren't effective.

Only the second report ("after 7/11") seems good (taking into account
lack of precision of just 3 decimals):
   5.26*32582=171381.32 ~= 171367
which leads me to believe that you've used the TB and invalidate
counts from that test.

The TB and invalidate numbers are repeatable; the same every time.

You can try, but I think performance wouldn't be great, because
the comparison function would be called way too often due to the
ht using open addressing. The problem there is not only the comparisons
themselves, but the all the cache lines needed to read the fields of
the comparison. I haven't tested libiberty's htable but I did test
the htable in concurrencykit[1], which also uses open addressing.

You are right that having the full hash for primary comparison is a big win, especially with how complex our comparison functions are. And you're right that we have to have two of them.

This led me to a design that had buckets with a small set of
hash & pointer pairs, all in the same cache line as the head (then
I discovered somebody else had thought of this, and that's why there's
a link to the CLHT paper in qht.c).

Fair.  It's a good design.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]