Message ID | 20231122111415.815147-2-maxim.kuvyrkov@linaro.org |
---|---|
State | Accepted |
Commit | 0c42d1782e48d8ad578ace2065cce9b3615f97c0 |
Headers | show |
Series | [v3,1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior | expand |
Hi Vladimir, Hi Jeff, Richard and Alexander have reviewed this patch and [I assume] have no further comments. OK to merge? On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote: > This patch avoids sched-deps.cc:find_inc() creating exponential number > of dependencies, which become memory and compilation time hogs. > Consider example (simplified from PR96388) ... > === > sp=sp-4 // sp_insnA > mem_insnA1[sp+A1] > ... > mem_insnAN[sp+AN] > sp=sp-4 // sp_insnB > mem_insnB1[sp+B1] > ... > mem_insnBM[sp+BM] > === > > [For simplicity, let's assume find_inc(backwards==true)]. > In this example find_modifiable_mems() will arrange for mem_insnA* > to be able to pass sp_insnA, and, while doing this, will create > dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB > is a consumer of sp_insnA. After this sp_insnB will have N new > backward dependencies. > Then find_modifiable_mems() gets to mem_insnB*s and starts to create > N new dependencies for _every_ mem_insnB*. This gets us N*M new > dependencies. > > In PR96833's testcase N and M are 10k-15k, which causes RAM usage of > 30GB and compilation time of 30 minutes, with sched2 accounting for > 95% of both metrics. After this patch the RAM usage is down to 1GB > and compilation time is down to 3-4 minutes, with sched2 no longer > standing out on -ftime-report or memory usage. > > gcc/ChangeLog: > > PR rtl-optimization/96388 > PR rtl-optimization/111554 > * sched-deps.cc (find_inc): Avoid exponential behavior. > --- > gcc/sched-deps.cc | 48 +++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 44 insertions(+), 4 deletions(-) > > diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > index c23218890f3..005fc0f567e 100644 > --- a/gcc/sched-deps.cc > +++ b/gcc/sched-deps.cc > @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, > rtx_insn *insn, bool before_mem) > /* Once a suitable mem reference has been found and the corresponding data > in MII has been filled in, this function is called to find a suitable > add or inc insn involving the register we found in the memory > - reference. */ > + reference. > + If successful, this function will create additional dependencies > between > + - mii->inc_insn's producers and mii->mem_insn as a consumer (if > backwards) > + - mii->inc_insn's consumers and mii->mem_insn as a producer (if > !backwards). > +*/ > > static bool > find_inc (struct mem_inc_info *mii, bool backwards) > { > sd_iterator_def sd_it; > dep_t dep; > + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : > SD_LIST_FORW; > + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); > > - sd_it = sd_iterator_start (mii->mem_insn, > - backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); > + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); > while (sd_iterator_cond (&sd_it, &dep)) > { > dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); > rtx_insn *pro = DEP_PRO (dep); > rtx_insn *con = DEP_CON (dep); > - rtx_insn *inc_cand = backwards ? pro : con; > + rtx_insn *inc_cand; > + int n_inc_deps; > + > if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) > goto next; > + > + if (backwards) > + { > + inc_cand = pro; > + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); > + } > + else > + { > + inc_cand = con; > + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); > + } > + > + /* In the FOR_EACH_DEP loop below we will create additional > n_inc_deps > + for mem_insn. This by itself is not a problem, since each > mem_insn > + will have only a few inc_insns associated with it. However, if > + we consider that a single inc_insn may have a lot of mem_insns, > AND, > + on top of that, a few other inc_insns associated with it -- > + those _other inc_insns_ will get (n_mem_deps * number of MEM > insns) > + dependencies created for them. This may cause an exponential > + growth of memory usage and scheduling time. > + See PR96388 for details. > + We [heuristically] use n_inc_deps as a proxy for the number of MEM > + insns, and drop opportunities for breaking modifiable_mem > dependencies > + when dependency lists grow beyond reasonable size. */ > + if (n_mem_deps * n_inc_deps > + >= param_max_pending_list_length * param_max_pending_list_length) > + goto next; > + > if (parse_add_or_inc (mii, inc_cand, backwards)) > { > struct dep_replacement *desc; > @@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards) > desc->insn = mii->mem_insn; > move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con), > INSN_SPEC_BACK_DEPS (con)); > + > + /* Make sure that n_inc_deps above is consistent with > dependencies > + we create. */ > + gcc_assert (mii->inc_insn == inc_cand); > + > if (backwards) > { > FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep) > -- > 2.34.1 > >
On 1/15/24 07:56, Maxim Kuvyrkov wrote: > Hi Vladimir, > Hi Jeff, > > Richard and Alexander have reviewed this patch and [I assume] have no > further comments. OK to merge? > > I trust Richard and Alexander therefore I did not do additional review of the patches and have no any comment. Richard's or Alexander's approval is enough for comitting the patches.
On 1/15/24 05:56, Maxim Kuvyrkov wrote: > Hi Vladimir, > Hi Jeff, > > Richard and Alexander have reviewed this patch and [I assume] have no > further comments. OK to merge? I think the question is whether or not we're too late. I know that Richard S has held off on his late-combine pass and I'm holding off on the ext-dce work due to the fact that we're well past stage1 close. I think the release managers ought to have the final say on this. jeff
On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > > > > On 1/15/24 05:56, Maxim Kuvyrkov wrote: > > Hi Vladimir, > > Hi Jeff, > > > > Richard and Alexander have reviewed this patch and [I assume] have no > > further comments. OK to merge? > I think the question is whether or not we're too late. I know that > Richard S has held off on his late-combine pass and I'm holding off on > the ext-dce work due to the fact that we're well past stage1 close. > > I think the release managers ought to have the final say on this. I'm fine with this now, it doesn't change code generation. Richard. > > jeff
> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote: >> >> >> >> On 1/15/24 05:56, Maxim Kuvyrkov wrote: >>> Hi Vladimir, >>> Hi Jeff, >>> >>> Richard and Alexander have reviewed this patch and [I assume] have no >>> further comments. OK to merge? >> I think the question is whether or not we're too late. I know that >> Richard S has held off on his late-combine pass and I'm holding off on >> the ext-dce work due to the fact that we're well past stage1 close. >> >> I think the release managers ought to have the final say on this. > > I'm fine with this now, it doesn't change code generation. Thanks, Richard. I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens. Regards, -- Maxim Kuvyrkov https://www.linaro.org
On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote: > > > On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > >> > >> > >> > >> On 1/15/24 05:56, Maxim Kuvyrkov wrote: > >>> Hi Vladimir, > >>> Hi Jeff, > >>> > >>> Richard and Alexander have reviewed this patch and [I assume] have no > >>> further comments. OK to merge? > >> I think the question is whether or not we're too late. I know that > >> Richard S has held off on his late-combine pass and I'm holding off on > >> the ext-dce work due to the fact that we're well past stage1 close. > >> > >> I think the release managers ought to have the final say on this. > > > > I'm fine with this now, it doesn't change code generation. > > Thanks, Richard. > > I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens. This seems to have caused a compare-debug bootstrap issue on x86_64-linux, gcc/fortran/f95-lang.o differs does n_mem_deps or n_inc_deps include debug insns? Richard. > Regards, > > -- > Maxim Kuvyrkov > https://www.linaro.org >
> On Jan 17, 2024, at 19:02, Richard Biener <richard.guenther@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov > <maxim.kuvyrkov@linaro.org> wrote: >> >>> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote: >>> >>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote: >>>> >>>> >>>> >>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote: >>>>> Hi Vladimir, >>>>> Hi Jeff, >>>>> >>>>> Richard and Alexander have reviewed this patch and [I assume] have no >>>>> further comments. OK to merge? >>>> I think the question is whether or not we're too late. I know that >>>> Richard S has held off on his late-combine pass and I'm holding off on >>>> the ext-dce work due to the fact that we're well past stage1 close. >>>> >>>> I think the release managers ought to have the final say on this. >>> >>> I'm fine with this now, it doesn't change code generation. >> >> Thanks, Richard. >> >> I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens. > > This seems to have caused a compare-debug bootstrap issue on x86_64-linux, > > gcc/fortran/f95-lang.o differs > > does n_mem_deps or n_inc_deps include debug insns? Thanks, investigating. -- Maxim Kuvyrkov https://www.linaro.org
> On Jan 17, 2024, at 19:05, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote: > >> On Jan 17, 2024, at 19:02, Richard Biener <richard.guenther@gmail.com> wrote: >> >> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov >> <maxim.kuvyrkov@linaro.org> wrote: >>> >>>> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote: >>>> >>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote: >>>>> >>>>> >>>>> >>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote: >>>>>> Hi Vladimir, >>>>>> Hi Jeff, >>>>>> >>>>>> Richard and Alexander have reviewed this patch and [I assume] have no >>>>>> further comments. OK to merge? >>>>> I think the question is whether or not we're too late. I know that >>>>> Richard S has held off on his late-combine pass and I'm holding off on >>>>> the ext-dce work due to the fact that we're well past stage1 close. >>>>> >>>>> I think the release managers ought to have the final say on this. >>>> >>>> I'm fine with this now, it doesn't change code generation. >>> >>> Thanks, Richard. >>> >>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens. >> >> This seems to have caused a compare-debug bootstrap issue on x86_64-linux, >> >> gcc/fortran/f95-lang.o differs >> >> does n_mem_deps or n_inc_deps include debug insns? > > Thanks, investigating. Hi Richard, Yes, both n_mem_deps or n_inc_deps include debug insns, I posted a patch for this in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643267.html . Testing it now. If you prefer, I can revert the fix for PR96388 and PR111554. Kind regards, -- Maxim Kuvyrkov https://www.linaro.org
On Wed, Jan 17, 2024 at 7:02 AM Richard Biener <richard.guenther@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov > <maxim.kuvyrkov@linaro.org> wrote: > > > > > On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote: > > > > > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > > >> > > >> > > >> > > >> On 1/15/24 05:56, Maxim Kuvyrkov wrote: > > >>> Hi Vladimir, > > >>> Hi Jeff, > > >>> > > >>> Richard and Alexander have reviewed this patch and [I assume] have no > > >>> further comments. OK to merge? > > >> I think the question is whether or not we're too late. I know that > > >> Richard S has held off on his late-combine pass and I'm holding off on > > >> the ext-dce work due to the fact that we're well past stage1 close. > > >> > > >> I think the release managers ought to have the final say on this. > > > > > > I'm fine with this now, it doesn't change code generation. > > > > Thanks, Richard. > > > > I'll merge the fix for PR96388 and PR111554 -- patch 1/8. I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens. > > This seems to have caused a compare-debug bootstrap issue on x86_64-linux, > > gcc/fortran/f95-lang.o differs > > does n_mem_deps or n_inc_deps include debug insns? > > Richard. FWIW, I opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113456
diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc index c23218890f3..005fc0f567e 100644 --- a/gcc/sched-deps.cc +++ b/gcc/sched-deps.cc @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem) /* Once a suitable mem reference has been found and the corresponding data in MII has been filled in, this function is called to find a suitable add or inc insn involving the register we found in the memory - reference. */ + reference. + If successful, this function will create additional dependencies between + - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards) + - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards). +*/ static bool find_inc (struct mem_inc_info *mii, bool backwards) { sd_iterator_def sd_it; dep_t dep; + sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW; + int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps); - sd_it = sd_iterator_start (mii->mem_insn, - backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW); + sd_it = sd_iterator_start (mii->mem_insn, mem_deps); while (sd_iterator_cond (&sd_it, &dep)) { dep_node_t node = DEP_LINK_NODE (*sd_it.linkp); rtx_insn *pro = DEP_PRO (dep); rtx_insn *con = DEP_CON (dep); - rtx_insn *inc_cand = backwards ? pro : con; + rtx_insn *inc_cand; + int n_inc_deps; + if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) goto next; + + if (backwards) + { + inc_cand = pro; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK); + } + else + { + inc_cand = con; + n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW); + } + + /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps + for mem_insn. This by itself is not a problem, since each mem_insn + will have only a few inc_insns associated with it. However, if + we consider that a single inc_insn may have a lot of mem_insns, AND, + on top of that, a few other inc_insns associated with it -- + those _other inc_insns_ will get (n_mem_deps * number of MEM insns) + dependencies created for them. This may cause an exponential + growth of memory usage and scheduling time. + See PR96388 for details. + We [heuristically] use n_inc_deps as a proxy for the number of MEM + insns, and drop opportunities for breaking modifiable_mem dependencies + when dependency lists grow beyond reasonable size. */ + if (n_mem_deps * n_inc_deps + >= param_max_pending_list_length * param_max_pending_list_length) + goto next; + if (parse_add_or_inc (mii, inc_cand, backwards)) { struct dep_replacement *desc; @@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards) desc->insn = mii->mem_insn; move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con), INSN_SPEC_BACK_DEPS (con)); + + /* Make sure that n_inc_deps above is consistent with dependencies + we create. */ + gcc_assert (mii->inc_insn == inc_cand); + if (backwards) { FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)