qemu-s390x

Re: chacha20-s390 broken in 8.2.0 in TCG on s390x


From: Philippe Mathieu-Daudé
Subject: Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
Date: Wed, 3 Jan 2024 15:01:02 +0100
User-agent: Mozilla Thunderbird

On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:
Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:
03.01.2024 03:22, Richard Henderson wrote:
On 12/22/23 01:51, Michael Tokarev wrote:
...
git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Wed Aug 23 23:04:24 2023 -0700

     tcg/optimize: Optimize env memory operations

So far, this seems to work on an amd64 host but fails on an s390x host,
which is the only place it has been observed.  It may fail in other
combinations too; I don't know yet.  I just finished bisecting it on
s390x.

I haven't been able to build a reproducer for this.
Do you have an image or kernel you can share?

Sure.

Here's my actual testing "image": http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz

It contains vmlinuz and initrd, generated on a Debian s390x system using standard
Debian tools.

The actual command line I used when bisecting:

  ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot

I had a quick look at the reproducer and reduced the code
area to:

void tcg_optimize(TCGContext *s)
{
     ...
         switch (opc) {
         case INDEX_op_ld_vec:
             done = fold_tcg_ld_memcopy(&ctx, op);


static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
{
     ...
     if (src && src->base_type == type) {
         return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
     }


static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
{
     ...
     switch (ctx->type) {
     case TCG_TYPE_V128:
         new_op = INDEX_op_mov_vec;


With this optimization disabled, the test succeeds.
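
For readers unfamiliar with this kind of fold, here is a toy model of the
idea, not QEMU's code: every name below (toy_ctx, toy_record_store,
toy_fold_load) is hypothetical, and the 8/16-byte sizes are an assumption
made for the sketch. An optimizer remembers which temp was last stored to
an offset in the CPU state, and a later load of the same offset and type
can then be replaced by a move from that temp.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value types, loosely standing in for TCG_TYPE_*. */
enum toy_type { TOY_I64, TOY_V128 };

/* Hypothetical record: "temp" currently mirrors the bytes at "offset". */
struct toy_copy {
    bool valid;
    size_t offset;
    enum toy_type type;
    int temp;
};

#define TOY_MAX_COPIES 8

struct toy_ctx {
    struct toy_copy copies[TOY_MAX_COPIES];
};

static size_t toy_type_size(enum toy_type t)
{
    return t == TOY_V128 ? 16 : 8;
}

/* A store invalidates every overlapping record, then records itself. */
static void toy_record_store(struct toy_ctx *ctx, size_t off,
                             enum toy_type type, int temp)
{
    size_t end = off + toy_type_size(type);
    struct toy_copy *free_slot = NULL;

    for (size_t i = 0; i < TOY_MAX_COPIES; i++) {
        struct toy_copy *c = &ctx->copies[i];

        if (c->valid && c->offset < end &&
            off < c->offset + toy_type_size(c->type)) {
            c->valid = false;   /* fully or partly overwritten */
        }
        if (!c->valid && !free_slot) {
            free_slot = c;
        }
    }
    /* With no free slot the store goes unrecorded: a missed fold, not a bug. */
    if (free_slot) {
        *free_slot = (struct toy_copy){ true, off, type, temp };
    }
}

/* Return the temp a load can be replaced by, or -1 to keep the load. */
static int toy_fold_load(const struct toy_ctx *ctx, size_t off,
                         enum toy_type type)
{
    for (size_t i = 0; i < TOY_MAX_COPIES; i++) {
        const struct toy_copy *c = &ctx->copies[i];

        if (c->valid && c->offset == off && c->type == type) {
            return c->temp;
        }
    }
    return -1;
}
```

The type check mirrors the src->base_type == type condition in the excerpt
above: a load is only folded when the remembered copy has the same type.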

Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
and remembering the constraints change on PPC LQ in
https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.henderson@linaro.org/
I wondered whether the LPQ constraints are correct, but the bug
persists with TCG_TARGET_HAS_qemu_ldst_i128 disabled (so I
re-enabled it).

Then, with TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 disabled, the bug
disappears.

Reducing a bit further, the test passes when the rotli_vec opcode is
disabled (commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..5f147661e8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_orc_vec:
+        return 1;
     case INDEX_op_rotli_vec:
+        return TCG_TARGET_HAS_roti_vec;
     case INDEX_op_rotls_vec:
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..5c18146a40 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_abs_vec        1
-#define TCG_TARGET_HAS_roti_vec       1
+#define TCG_TARGET_HAS_roti_vec       0
 #define TCG_TARGET_HAS_rots_vec       1
---
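
For cross-checking, a scalar reference of the operation involved may help.
ChaCha20's quarter round is built from 32-bit additions, XORs and rotations
by constant amounts (16, 12, 8, 7), which is presumably why a vectorized
implementation ends up exercising a rotate-by-immediate path. The sketch
below is plain C following RFC 8439, not code from QEMU or from the kernel's
chacha20-s390 driver; the function names are made up for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is reduced modulo 32). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* One ChaCha quarter round as specified in RFC 8439, section 2.1. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

With the RFC 8439 section 2.1.1 test vector (a = 0x11111111, b = 0x01020304,
c = 0x9b8d6f43, d = 0x01234567) the quarter round yields a = 0xea2a92f4,
b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb, so a guest-side comparison
against these values would show whether a rotation result is off.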


