
Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi


From: Shin-ichiro KAWASAKI
Subject: Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi
Date: Tue, 13 Jan 2009 09:21:18 +0900
User-agent: Thunderbird 2.0.0.19 (Windows/20081209)

I had completely overlooked Vladimir-san's nice patch on movca-ocbi,
even though it is already in the qemu-sh staging repository.

http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00605.html

But now I can review the patch; I'll do it later.

Regards,
Shin-ichiro KAWASAKI


Shin-ichiro KAWASAKI wrote:
Thank you for your comments!

Edgar E. Iglesias wrote:
On Sun, Jan 11, 2009 at 06:14:40PM +0900, Shin-ichiro KAWASAKI wrote:
Currently, sh4's "movca.l" instruction is implemented in the same way
as "mov.l", so its write to memory cannot be canceled by the "ocbi"
cache control instruction.  This corrupts the text area when the
linux kernel flushes the cache.

This patch delays "movca.l" execution and gives "ocbi" the chance to cancel it.
# Thank you, Edgar, for your advice!

This patch does the following:
- on executing "movca.l", it just records what data movca should store
  and where.
- on executing "ocbi", it finds the corresponding record made by a movca
  and deletes it to cancel the store.
- it lets TCG emit a "delayed_movca" operation at the first
  instruction that is neither "movca.l" nor "ocbi".
- on executing "delayed_movca", it performs the pending stores
  according to the records (a rough sketch of this step follows below).
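
For illustration only, a minimal sketch of that flush step could look like
the following; helper_delayed_movca, env->movca_list and the stl() accessor
are assumptions of mine based on the description above and the patch below,
not code taken from the actual patch:

    /* Sketch: perform every recorded movca store that was not
       cancelled by an ocbi, then mark the entries as reusable. */
    void helper_delayed_movca(void)
    {
        delayed_movca_t *cur = &env->movca_list;

        while (cur) {
            if (cur->valid) {
                stl(cur->addr, cur->value); /* do the pending store */
                cur->valid = 0;             /* entry may be reused   */
            }
            cur = cur->next;
        }
    }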

Hello,

I think your patch will catch the linux dflush sequence but I've got a
few comments.

There is a complication with the delayed stores and potential mmu
exceptions that they might generate. I think with this approach
you'll take those faults at the next instruction which might not
even be a load/store. Not sure if that can cause problems for sh.
Maybe you could avoid that by making movca a load + store and the
ocbi a store with the original value loaded at movca. But you'll
still get into trouble with exceptions that might break the
movca/ocbi sequence.
That's an idea.  The current implementation takes a "delay and
do-or-cancel" approach, while your idea is a "do and commit-or-rollback"
approach.  I'll try to implement it to avoid the mmu fault problem of the
delayed movca.  An mmu fault at the movca itself would still invoke the
exception handler and break the movca/ocbi sequence; it should be quite a
rare case, but I'll consider it for the next version.  (A rough sketch of
the alternative follows below.)
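
Purely to illustrate the "do and commit-or-rollback" idea discussed above,
a sketch could look like the following; the helper names and the
alloc_movca_entry()/find_movca_entry()/ldl()/stl() calls are placeholders
of mine, not part of the patch:

    /* Sketch: movca stores immediately but remembers the old word, so
       any mmu fault is taken at the movca itself; ocbi writes the old
       word back to cancel the effect. */
    void helper_movca_commit(uint32_t val, uint32_t addr)
    {
        delayed_movca_t *e = alloc_movca_entry();    /* hypothetical */
        e->addr = addr;
        e->value = ldl(addr);   /* save the original word; may fault here */
        e->valid = 1;
        stl(addr, val);         /* perform the store right away */
    }

    void helper_ocbi_rollback(uint32_t addr)
    {
        delayed_movca_t *e = find_movca_entry(addr); /* hypothetical */
        if (e && e->valid) {
            stl(e->addr, e->value);  /* restore the original word */
            e->valid = 0;
        }
    }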

I don't think it's worth having a dynamically sized movca delay buffer.
The movca insn has a restricted addressing mode, making it a bit hard
to use back-to-back without alu insns in between.  If I understand your
code correctly, you flush the buffer before every non-movca/ocbi insn, so
I guess that for most realistic use cases you'll only need very few
(maybe only 1) entries in the delay buffer.
That is a point I wavered on.
I investigated the linux kernel and found the following lines in
"__flush_dcache_segment_4way()" in "arch/sh/mm/cache-sh4.c".

    a0 = base_addr;
    a1 = a0 + way_incr;
    a2 = a1 + way_incr;
    a3 = a2 + way_incr;
    a0e = base_addr + extent_per_way;
    do {
        asm volatile("ldc %0, sr" : : "r" (sr_with_bl));
        asm volatile("movca.l r0, @%0\n\t"
                 "movca.l r0, @%1\n\t"
                 "movca.l r0, @%2\n\t"
                 "movca.l r0, @%3\n\t"
                 "ocbi @%0\n\t"
                 "ocbi @%1\n\t"
                 "ocbi @%2\n\t"
                 "ocbi @%3\n\t" : :
                 "r" (a0), "r" (a1), "r" (a2), "r" (a3));

Multiple movca.l insns are executed consecutively, and they access
different lines, so at least 4 entries are needed here.  This code shows
that the kernel implementation decides how many movcas are invoked
consecutively.  That's the reason why I introduced the dynamic size change.
Signed-off-by: Shin-ichiro KAWASAKI <address@hidden>

Index: trunk/target-sh4/op_helper.c
===================================================================
--- trunk/target-sh4/op_helper.c    (revision 6133)
+++ trunk/target-sh4/op_helper.c    (working copy)
@@ -20,6 +20,7 @@
 #include <assert.h>
 #include "exec.h"
 #include "helper.h"
+void *qemu_mallocz(size_t size);
 #ifndef CONFIG_USER_ONLY
@@ -604,3 +605,57 @@
     d.ll = t0;
     return float64_to_int32_round_to_zero(d.d, &env->fp_status);
 }
+
+void helper_movca(uint32_t val, uint32_t addr)
+{
+    delayed_movca_t *cur = &env->movca_list;
+    delayed_movca_t *prev = NULL;
+    while (cur) {
+    if (!cur->valid) {

Might be overkill but you can merge here, i.e:

    if (!cur->valid || cur->addr == addr) {


+        cur->valid = 1;
+        cur->value = val;
+        cur->addr = addr;
+        return;
+    }
+    prev = cur;
+    cur = cur->next;
+    }
+
+    /* movca entry shortage. allocate it. */
+    prev->next = cur = qemu_mallocz(sizeof(delayed_movca_t));
+    if (cur == NULL) {
+    printf("out of memory for delayed movca. @%08x\n", addr);
+    return;
+    }
+    cur->valid = 1;
+    cur->value = val;
+    cur->addr = addr;
+}
+
+void helper_ocbi(uint32_t addr)
+{
+    delayed_movca_t *cur = &env->movca_list;
+
+    while (cur) {
+    if (cur->valid && cur->addr == addr) { /* found! */
+        cur->valid = 0;
+        return;
+    }
+    cur = cur->next;
+    }

I think you can quite easily catch a few more movca use cases if you
don't compare line offsets, i.e. for sh4 you'd mask the addr with
~(32 - 1).  Also, you should not return until you've scanned the entire
movca delay buffer, as there might be multiple stores to the same line
that need to be ignored.

example:
    addr &= ~31;
    while (cur) {
       if (cur->valid && (cur->addr & ~31) == addr)
           cur->valid = 0;
       cur = cur->next;
    }

That's good advice.  I'll try to implement it for the next version.

Thanks again for your review!

Regards,
Shin-ichiro KAWASAKI





