[PULL,52/80] tcg/s390x: Support 128-bit load/store

Message ID	20230516194145.1749305-53-richard.henderson@linaro.org
State	Superseded

Headers

From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>
Subject: [PULL 52/80] tcg/s390x: Support 128-bit load/store
Date: Tue, 16 May 2023 12:41:17 -0700
Message-Id: <20230516194145.1749305-53-richard.henderson@linaro.org>
In-Reply-To: <20230516194145.1749305-1-richard.henderson@linaro.org>
References: <20230516194145.1749305-1-richard.henderson@linaro.org>

Commit Message

Richard Henderson May 16, 2023, 7:41 p.m. UTC

Use LPQ/STPQ when 16-byte atomicity is required.
Note that these instructions require 16-byte alignment.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target-con-set.h |   2 +
 tcg/s390x/tcg-target.h         |   2 +-
 tcg/s390x/tcg-target.c.inc     | 103 ++++++++++++++++++++++++++++++++-
 3 files changed, 103 insertions(+), 4 deletions(-)
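
Reading aid, not part of the patch: a minimal sketch of the path selection the commit message describes. The names `Path`, `choose_path`, `need_atomic16` and `align_checked` are hypothetical and do not exist in QEMU. The sketch assumes only what the commit message and the comment in tcg_out_qemu_ldst_i128 state: LPQ/STPQ provide 16-byte atomicity but require 16-byte alignment, and an access that arrives with lesser alignment may be done as two 8-byte operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the choice; not QEMU code. */
typedef enum { PATH_QUAD, PATH_PAIR } Path;

/*
 * need_atomic16: the memory operation asks for 16-byte atomicity.
 * align_checked: 16-byte alignment was already enforced earlier,
 *                so no runtime test of the address is needed.
 */
static Path choose_path(uint64_t addr, bool need_atomic16, bool align_checked)
{
    if (!need_atomic16) {
        /* Two 8-byte accesses are sufficient. */
        return PATH_PAIR;
    }
    if (align_checked) {
        /* The earlier check guarantees the low four bits are clear. */
        assert((addr & 15) == 0);
        return PATH_QUAD;
    }
    /* Runtime test of the low four address bits, as TMLL with mask 15 does. */
    return (addr & 15) == 0 ? PATH_QUAD : PATH_PAIR;
}
```

With these assumptions, `choose_path(0x1000, true, false)` selects the LPQ/STPQ path, while `choose_path(0x1008, true, false)` falls back to the pair.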

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index ecc079bb6d..cbad91b2b5 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -14,6 +14,7 @@  C_O0_I2(r, r)
 C_O0_I2(r, ri)
 C_O0_I2(r, rA)
 C_O0_I2(v, r)
+C_O0_I3(o, m, r)
 C_O1_I1(r, r)
 C_O1_I1(v, r)
 C_O1_I1(v, v)
@@ -36,6 +37,7 @@  C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rI, r)
 C_O1_I4(r, r, rA, rI, r)
+C_O2_I1(o, m, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 170007bea5..ec96952172 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -140,7 +140,7 @@  extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_muluh_i64      0
 #define TCG_TARGET_HAS_mulsh_i64      0
 
-#define TCG_TARGET_HAS_qemu_ldst_i128 0
+#define TCG_TARGET_HAS_qemu_ldst_i128 1
 
 #define TCG_TARGET_HAS_v64            HAVE_FACILITY(VECTOR)
 #define TCG_TARGET_HAS_v128           HAVE_FACILITY(VECTOR)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 8e34b214fc..835daa51fa 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -243,6 +243,7 @@  typedef enum S390Opcode {
     RXY_LLGF    = 0xe316,
     RXY_LLGH    = 0xe391,
     RXY_LMG     = 0xeb04,
+    RXY_LPQ     = 0xe38f,
     RXY_LRV     = 0xe31e,
     RXY_LRVG    = 0xe30f,
     RXY_LRVH    = 0xe31f,
@@ -253,6 +254,7 @@  typedef enum S390Opcode {
     RXY_STG     = 0xe324,
     RXY_STHY    = 0xe370,
     RXY_STMG    = 0xeb24,
+    RXY_STPQ    = 0xe38e,
     RXY_STRV    = 0xe33e,
     RXY_STRVG   = 0xe32f,
     RXY_STRVH   = 0xe33f,
@@ -1577,7 +1579,18 @@  typedef struct {
 
 bool tcg_target_has_memory_bswap(MemOp memop)
 {
-    return true;
+    TCGAtomAlign aa;
+
+    if ((memop & MO_SIZE) <= MO_64) {
+        return true;
+    }
+
+    /*
+     * Reject 16-byte memop with 16-byte atomicity,
+     * but do allow a pair of 64-bit operations.
+     */
+    aa = atom_and_align_for_opc(tcg_ctx, memop, MO_ATOM_IFALIGN, true);
+    return aa.atom <= MO_64;
 }
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, MemOp opc, TCGReg data,
@@ -1734,13 +1747,13 @@  static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 {
     TCGLabelQemuLdst *ldst = NULL;
     MemOp opc = get_memop(oi);
+    MemOp s_bits = opc & MO_SIZE;
     unsigned a_mask;
 
-    h->aa = atom_and_align_for_opc(s, opc, MO_ATOM_IFALIGN, false);
+    h->aa = atom_and_align_for_opc(s, opc, MO_ATOM_IFALIGN, s_bits == MO_128);
     a_mask = (1 << h->aa.align) - 1;
 
 #ifdef CONFIG_SOFTMMU
-    unsigned s_bits = opc & MO_SIZE;
     unsigned s_mask = (1 << s_bits) - 1;
     int mem_index = get_mmuidx(oi);
     int fast_off = TLB_MASK_TABLE_OFS(mem_index);
@@ -1865,6 +1878,80 @@  static void tcg_out_qemu_st(TCGContext* s, TCGReg data_reg, TCGReg addr_reg,
     }
 }
 
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+    TCGLabel *l1 = NULL, *l2 = NULL;
+    TCGLabelQemuLdst *ldst;
+    HostAddress h;
+    bool need_bswap;
+    bool use_pair;
+    S390Opcode insn;
+
+    ldst = prepare_host_addr(s, &h, addr_reg, oi, is_ld);
+
+    use_pair = h.aa.atom < MO_128;
+    need_bswap = get_memop(oi) & MO_BSWAP;
+
+    if (!use_pair) {
+        /*
+         * Atomicity requires we use LPQ.  If we've already checked for
+         * 16-byte alignment, that's all we need.  If we arrive with
+         * lesser alignment, we have determined that less than 16-byte
+         * alignment can be satisfied with two 8-byte loads.
+         */
+        if (h.aa.align < MO_128) {
+            use_pair = true;
+            l1 = gen_new_label();
+            l2 = gen_new_label();
+
+            tcg_out_insn(s, RI, TMLL, addr_reg, 15);
+            tgen_branch(s, 7, l1); /* CC in {1,2,3} */
+        }
+
+        tcg_debug_assert(!need_bswap);
+        tcg_debug_assert(datalo & 1);
+        tcg_debug_assert(datahi == datalo - 1);
+        insn = is_ld ? RXY_LPQ : RXY_STPQ;
+        tcg_out_insn_RXY(s, insn, datahi, h.base, h.index, h.disp);
+
+        if (use_pair) {
+            tgen_branch(s, S390_CC_ALWAYS, l2);
+            tcg_out_label(s, l1);
+        }
+    }
+    if (use_pair) {
+        TCGReg d1, d2;
+
+        if (need_bswap) {
+            d1 = datalo, d2 = datahi;
+            insn = is_ld ? RXY_LRVG : RXY_STRVG;
+        } else {
+            d1 = datahi, d2 = datalo;
+            insn = is_ld ? RXY_LG : RXY_STG;
+        }
+
+        if (h.base == d1 || h.index == d1) {
+            tcg_out_insn(s, RXY, LAY, TCG_TMP0, h.base, h.index, h.disp);
+            h.base = TCG_TMP0;
+            h.index = TCG_REG_NONE;
+            h.disp = 0;
+        }
+        tcg_out_insn_RXY(s, insn, d1, h.base, h.index, h.disp);
+        tcg_out_insn_RXY(s, insn, d2, h.base, h.index, h.disp + 8);
+    }
+    if (l2) {
+        tcg_out_label(s, l2);
+    }
+
+    if (ldst) {
+        ldst->type = TCG_TYPE_I128;
+        ldst->datalo_reg = datalo;
+        ldst->datahi_reg = datahi;
+        ldst->raddr = tcg_splitwx_to_rx(s->code_ptr);
+    }
+}
+
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t a0)
 {
     /* Reuse the zeroing that exists for goto_ptr.  */
@@ -2222,6 +2309,12 @@  static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_qemu_st_i64:
         tcg_out_qemu_st(s, args[0], args[1], args[2], TCG_TYPE_I64);
         break;
+    case INDEX_op_qemu_ld_i128:
+        tcg_out_qemu_ldst_i128(s, args[0], args[1], args[2], args[3], true);
+        break;
+    case INDEX_op_qemu_st_i128:
+        tcg_out_qemu_ldst_i128(s, args[0], args[1], args[2], args[3], false);
+        break;
 
     case INDEX_op_ld16s_i64:
         tcg_out_mem(s, 0, RXY_LGH, args[0], args[1], TCG_REG_NONE, args[2]);
@@ -3099,6 +3192,10 @@  static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_qemu_st_i64:
     case INDEX_op_qemu_st_i32:
         return C_O0_I2(r, r);
+    case INDEX_op_qemu_ld_i128:
+        return C_O2_I1(o, m, r);
+    case INDEX_op_qemu_st_i128:
+        return C_O0_I3(o, m, r);
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
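
Reading aid, not part of the patch: a small portable C model of how the pair path in tcg_out_qemu_ldst_i128 assigns the two 8-byte halves on a load. `U128`, `load_be64`, `load_le64` and `load128_pair` are hypothetical names. The model assumes a big-endian host, as s390x is, so LG reads eight bytes in memory order and LRVG reads them byte-reversed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct { uint64_t hi, lo; } U128;

/* Models LG on a big-endian host: first byte is most significant. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Models LRVG: same eight bytes, byte-reversed. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

static U128 load128_pair(const uint8_t *p, bool need_bswap)
{
    U128 r;

    assert(p != NULL);
    if (need_bswap) {
        /* d1 = datalo at offset 0, d2 = datahi at offset 8. */
        r.lo = load_le64(p);
        r.hi = load_le64(p + 8);
    } else {
        /* d1 = datahi at offset 0, d2 = datalo at offset 8. */
        r.hi = load_be64(p);
        r.lo = load_be64(p + 8);
    }
    return r;
}
```

The point of the swap between d1 and d2 is that a byte-reversed 128-bit value has its low half first in memory, so the register roles trade places along with the instruction.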
