diff mbox

Fix PR rtl-optimization/78437

Message ID 1482247.PF6A0dtnAY@polaris
State New
Headers show

Commit Message

Eric Botcazou Nov. 23, 2016, 9:35 a.m. UTC
Hi,

this is a wrong code regression at -O2 on the mainline for Alpha coming from 
the REE pass (Alpha is one of the 3 architectures enabling REE at -O2 but I'm 
probably going to enable it for 64-bit SPARC too).  The problem arises when a 
copy is needed in combine_reaching_defs:

  /* If the destination operand of the extension is a different
     register than the source operand, then additional restrictions
     are needed.  Note we have to handle cases where we have nested
     extensions in the source operand.  */
  bool copy_needed
    = (REGNO (SET_DEST (PATTERN (cand->insn)))
       != REGNO (get_extended_src_reg (SET_SRC (PATTERN (cand->insn)))));
  if (copy_needed)
    {
      /* Considering transformation of
	 (set (reg1) (expression))
	 ...
	 (set (reg2) (any_extend (reg1)))

	 into

	 (set (reg2) (any_extend (expression)))
	 (set (reg1) (reg2))

Note that the mode of reg1 is changed by the transformation, e.g. from QImode 
to DImode, so the transformation can change the upper bits of reg1.  But there 
is no general check that this will not affect the other reaching uses of reg1:

	 (set (reg1:QI) (expression:QI))
	 ...
	 (set (reg2:DI) (any_extend:DI (reg1:QI)))
         ...
         (use (reg1:DI))

I initially thought that this would only matter for WORD_REGISTER_OPERATIONS 
targets like Alpha, where the first set in QImode can implicitly set the whole 
DImode register so the use reads well-defined upper bits, but I now wonder 
whether this also matters for !WORD_REGISTER_OPERATIONS targets, e.g. x86:

         (set (reg1:DI) ...
         ...
	 (set (reg1:QI) (expression:QI))
	 ...
	 (set (reg2:DI) (any_extend:DI (reg1:QI)))
         ...
         (use (reg1:DI))

where the use reads well-defined upper bits from the very first set.

Hence the 2 attached versions of the fix: pr78437-1.diff only adds the check 
for WORD_REGISTER_OPERATIONS targets and was confirmed by Uros as fixing the 
regression of gcc.dg/atomic/stdatomic-compare-exchange-[1,2].c on Alpha, while 
pr78437-2.diff adds the check for all targets and was also tested on x86-64.

Thoughts?


	PR rtl-optimization/78437
	* ree.c (get_uses): New function.
	(combine_reaching_defs): When a copy is needed, return false if any
	reaching use of the source register reads it in a mode larger than
	the mode it is set in[ and if WORD_REGISTER_OPERATIONS is true].

-- 
Eric Botcazou

Comments

Paolo Bonzini Nov. 23, 2016, 10:18 a.m. UTC | #1
On 23/11/2016 10:35, Eric Botcazou wrote:
> I now wonder 

> whether this also matters for !WORD_REGISTER_OPERATIONS targets, e.g. x86:

> 

>          (set (reg1:DI) ...

>          ...

> 	 (set (reg1:QI) (expression:QI))

> 	 ...

> 	 (set (reg2:DI) (any_extend:DI (reg1:QI)))

>          ...

>          (use (reg1:DI))

> 

> where the use reads well-defined upper bits from the very first set.


Does it really do that with a (set (reg1:QI)), as opposed to a
(set (strict_low_part (subreg:QI (reg1:DI)))?

Paolo
Eric Botcazou Nov. 23, 2016, 10:26 a.m. UTC | #2
> Does it really do that with a (set (reg1:QI)), as opposed to a

> (set (strict_low_part (subreg:QI (reg1:DI)))?


That's the question (note that REE runs after register allocation).

-- 
Eric Botcazou
Paolo Bonzini Nov. 23, 2016, 10:32 a.m. UTC | #3
On 23/11/2016 11:26, Eric Botcazou wrote:
>> Does it really do that with a (set (reg1:QI)), as opposed to a

>> (set (strict_low_part (subreg:QI (reg1:DI)))?

> 

> That's the question (note that REE runs after register allocation).


IIRC, strict_low_part is required even after register allocation, and
patterns such as this one in config/i386/i386.md seem to confirm it:

(define_split
  [(set (match_operand:SWI248 0 "any_QIreg_operand")
        (and:SWI248 (match_dup 0)
                    (const_int -256)))
   (clobber (reg:CC FLAGS_REG))]
  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
   && reload_completed"
  [(set (strict_low_part (match_dup 1)) (const_int 0))]
  "operands[1] = gen_lowpart (QImode, operands[0]);")

(Note the "&& reload_completed").

Thanks,

Paolo
Bernd Schmidt Nov. 23, 2016, 12:26 p.m. UTC | #4
On 11/23/2016 11:18 AM, Paolo Bonzini wrote:
>

>

> On 23/11/2016 10:35, Eric Botcazou wrote:

>> I now wonder

>> whether this also matters for !WORD_REGISTER_OPERATIONS targets, e.g. x86:

>>

>>          (set (reg1:DI) ...

>>          ...

>> 	 (set (reg1:QI) (expression:QI))

>> 	 ...

>> 	 (set (reg2:DI) (any_extend:DI (reg1:QI)))

>>          ...

>>          (use (reg1:DI))

>>

>> where the use reads well-defined upper bits from the very first set.

>

> Does it really do that with a (set (reg1:QI)), as opposed to a

> (set (strict_low_part (subreg:QI (reg1:DI)))?


It would need strict_low_part unless we're dealing with subwords. I 
think the patch should maybe check for that in the !W_R_O case.

Does WORD_REGISTER_OPERATIONS really buy much on targets that use it? 
This sort of situation seems very surprising and unfortunate.


Bernd
Jeff Law Nov. 23, 2016, 6:02 p.m. UTC | #5
On 11/23/2016 05:26 AM, Bernd Schmidt wrote:
> On 11/23/2016 11:18 AM, Paolo Bonzini wrote:

>>

>>

>> On 23/11/2016 10:35, Eric Botcazou wrote:

>>> I now wonder

>>> whether this also matters for !WORD_REGISTER_OPERATIONS targets, e.g.

>>> x86:

>>>

>>>          (set (reg1:DI) ...

>>>          ...

>>>      (set (reg1:QI) (expression:QI))

>>>      ...

>>>      (set (reg2:DI) (any_extend:DI (reg1:QI)))

>>>          ...

>>>          (use (reg1:DI))

>>>

>>> where the use reads well-defined upper bits from the very first set.

>>

>> Does it really do that with a (set (reg1:QI)), as opposed to a

>> (set (strict_low_part (subreg:QI (reg1:DI)))?

>

> It would need strict_low_part unless we're dealing with subwords. I

> think the patch should maybe check for that in the !W_R_O case.

>

> Does WORD_REGISTER_OPERATIONS really buy much on targets that use it?

> This sort of situation seems very surprising and unfortunate.

It certainly was helpful on the PA back in the day.  Less sure on modern 
systems.
jeff
Jeff Law Nov. 23, 2016, 6:56 p.m. UTC | #6
On 11/23/2016 03:32 AM, Paolo Bonzini wrote:
>

>

> On 23/11/2016 11:26, Eric Botcazou wrote:

>>> Does it really do that with a (set (reg1:QI)), as opposed to a

>>> (set (strict_low_part (subreg:QI (reg1:DI)))?

>>

>> That's the question (note that REE runs after register allocation).

>

> IIRC, strict_low_part is required even after register allocation, and

> patterns such as this one in config/i386/i386.md seem to confirm it:

Right.  The strict_low_part is required after register allocation. 
Without the strict_low_part, the contents of the bits outside the mode 
of the strict_low_part are undefined.

Jeff
Eric Botcazou Nov. 23, 2016, 11:19 p.m. UTC | #7
> It would need strict_low_part unless we're dealing with subwords. I

> think the patch should maybe check for that in the !W_R_O case.


The code already does the check, it simply won't mess with strict_low_part.

> Does WORD_REGISTER_OPERATIONS really buy much on targets that use it?


Yes, it makes it possible for the combiner to eliminate 2 redundant extensions 
on Alpha for the reduced testcase provided by Uros (that's why there is the 
mode mismatch).  And it's even more helpful if the loads are not explicitly 
extended, like on SPARC (in combination with LOAD_EXTEND_OP of course).

-- 
Eric Botcazou
diff mbox

Patch

Index: ree.c
===================================================================
--- ree.c	(revision 242632)
+++ ree.c	(working copy)
@@ -499,6 +499,35 @@  get_defs (rtx_insn *insn, rtx reg, vec<r
   return ref_chain;
 }
 
+/* Get all the reaching uses of an instruction.  The uses are desired for REG
+   set in INSN.  Return use list or NULL if a use is missing or irregular.  */
+
+static struct df_link *
+get_uses (rtx_insn *insn, rtx reg)
+{
+  df_ref def;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_DEF (def, insn)
+    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
+      break;
+
+  gcc_assert (def != NULL);
+
+  ref_chain = DF_REF_CHAIN (def);
+
+  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+    {
+      /* Problem getting some use for this instruction.  */
+      if (ref_link->ref == NULL)
+        return NULL;
+      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
+	return NULL;
+    }
+
+  return ref_chain;
+}
+
 /* Return true if INSN is
      (SET (reg REGNO (def_reg)) (if_then_else (cond) (REG x1) (REG x2)))
    and store x1 and x2 in REG_1 and REG_2.  */
@@ -827,6 +856,24 @@  combine_reaching_defs (ext_cand *cand, c
       if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn))))
 	return false;
 
+      /* We must make sure that changing the mode of SRC_REG as destination
+	 register will not affect its reaching uses, which may read its value
+	 in a larger mode because either 1) DEF_INSN implicitly sets it in
+	 word mode for WORD_REGISTER_OPERATIONS targets or 2) DEF_INSN does
+	 not modify its upper bits for !WORD_REGISTER_OPERATIONS targets.  */
+      const unsigned int prec
+	= GET_MODE_PRECISION (GET_MODE (SET_DEST (*dest_sub_rtx)));
+      if (prec < BITS_PER_WORD)
+	{
+	  struct df_link *uses = get_uses (def_insn, src_reg);
+	  if (!uses)
+	    return false;
+
+	  for (df_link *use = uses; use; use = use->next)
+	    if (GET_MODE_PRECISION (GET_MODE (*DF_REF_LOC (use->ref))) > prec)
+	      return false;
+	}
+
       /* The destination register of the extension insn must not be
 	 used or set between the def_insn and cand->insn exclusive.  */
       if (reg_used_between_p (SET_DEST (PATTERN (cand->insn)),