Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

From:	Philippe Mathieu-Daudé
Subject:	Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
Date:	Wed, 3 Jan 2024 15:01:02 +0100
User-agent:	Mozilla Thunderbird

On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:

Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:

03.01.2024 03:22, Richard Henderson wrote:
On 12/22/23 01:51, Michael Tokarev wrote:
...
git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Wed Aug 23 23:04:24 2023 -0700

     tcg/optimize: Optimize env memory operations

So far, this seems to work on amd64 host, but fails on s390x host -
where this has been observed so far.  Maybe it also fails in some
other combinations too, I don't yet know.  Just finished bisecting
it on s390x.

I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?

Sure.
Here's my actual testing "image": http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz
It contains vmlinuz and initrd - generated on a debian s390x system using standard
debian tools.

Actual command line I used when doing bisection:
~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot


I had a quick look at the reproducer and reduced the code
area to:

void tcg_optimize(TCGContext *s)
{
     ...
         switch (opc) {
         case INDEX_op_ld_vec:
             done = fold_tcg_ld_memcopy(&ctx, op);


static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
{
     ...
     if (src && src->base_type == type) {
         return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
     }

static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
{
     ...
     switch (ctx->type) {
     case TCG_TYPE_V128:
         new_op = INDEX_op_mov_vec;


With this optimization disabled, the test succeeds.
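
To make the fold above easier to follow, here is a toy model of the idea as I read it from the excerpt: remember which temp was last stored at an env offset, and turn a later load from that offset into a mov when the types match. This is only a sketch under that assumption. The toy_* names and types are hypothetical and do not exist in QEMU, and the real optimizer tracks considerably more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; none of these names exist in QEMU. */
enum toy_type { TOY_I64, TOY_V128 };

struct toy_copy {
    bool valid;           /* true once a store has been recorded */
    size_t offset;        /* offset into the (toy) env structure */
    enum toy_type type;   /* type of the value last stored there */
    int temp;             /* id of the temp that holds that value */
};

/* Remember that 'temp' was just stored at 'offset'. */
static void toy_record_store(struct toy_copy *c, size_t offset,
                             enum toy_type type, int temp)
{
    c->valid = true;
    c->offset = offset;
    c->type = type;
    c->temp = temp;
}

/*
 * Try to fold a load.  If the same offset was last written from a temp
 * of the same type, return that temp id so the caller can emit a mov
 * instead of the load.  Otherwise return -1 and the load must be kept.
 */
static int toy_fold_load(const struct toy_copy *c, size_t offset,
                         enum toy_type type)
{
    if (c->valid && c->offset == offset && c->type == type) {
        return c->temp;
    }
    return -1;
}
```

The type check mirrors the `src->base_type == type` test in the excerpt: a V128 load is only forwarded from a V128 store.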

Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
and remembering the constraints change on PPC LQ in
https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.henderson@linaro.org/
I wondered whether the LPQ constraints are correct, but with
TCG_TARGET_HAS_qemu_ldst_i128 disabled the bug persists (so I
re-enabled it).

Then, with TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 disabled, the bug
disappears.


Reducing a bit further, it works when the rotli_vec opcode is disabled
(commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..5f147661e8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_orc_vec:
+        return 1;
     case INDEX_op_rotli_vec:
+        return TCG_TARGET_HAS_roti_vec;
     case INDEX_op_rotls_vec:
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..5c18146a40 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_abs_vec        1
-#define TCG_TARGET_HAS_roti_vec       1
+#define TCG_TARGET_HAS_roti_vec       0
 #define TCG_TARGET_HAS_rots_vec       1
---
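
For context on why a rotate opcode would matter for this particular test: the ChaCha20 quarter round (RFC 8439, section 2.1) is built from 32-bit additions, XORs and left rotations by 16, 12, 8 and 7, so a vectorized implementation presumably leans on rotate-by-immediate operations such as rotli_vec. Below is a minimal scalar sketch in C. The names rotl32 and chacha_quarter_round are illustrative only, not QEMU or kernel functions, and it is an assumption that the guest code path maps onto rotli_vec in exactly this way.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits; n must be in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on four 32-bit state words (RFC 8439, 2.1). */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

With the test vector from RFC 8439 section 2.1.1 (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567) the result should be a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb. A wrong rotate in any of the four steps changes every output word, which would be consistent with a self-test failing outright.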
