Message ID | 20170614140209.29847-4-alex.bennee@linaro.org |
---|---|
State | New |
Headers | show |
Series | Fixes for TCG hangs | expand |
On 06/14/2017 07:02 AM, Alex Bennée wrote: > While the next TB would detect the exit flag has been set there is no > point if we can exit sooner. We also check cpu->interrupt_request as > some front-ends can set it rather than using the cpu_interrupt() API > call and would normally be expecting the IRQ to get picked up on the > previously fairly regular exits from the run loop. > > Signed-off-by: Alex Bennée<alex.bennee@linaro.org> > --- > tcg-runtime.c | 47 +++++++++++++++++++++++++++-------------------- > 1 file changed, 27 insertions(+), 20 deletions(-) I can confirm this fixes the Alpha hang as well. r~
On 14/06/2017 17:45, Richard Henderson wrote: > While the next TB would detect the exit flag has been set there is no > point if we can exit sooner. We also check cpu->interrupt_request as > some front-ends can set it rather than using the cpu_interrupt() API > call and would normally be expecting the IRQ to get picked up on the > previously fairly regular exits from the run loop. This is not what happens actually; it's not about front-ends setting cpu->interrupt_request, it's about front-ends doing exit_tb when they wanted to re-evaluate cpu_handle_interrupt. cpu_exit is used when device code causes a rising edge in cpu->interrupt_request. What we have here is that the MSR write causes cc->cpu_exec_interrupt's return value to change from false to true. I think this is a band-aid, and would rather fix the front-ends as in Emilio's patch. For Alpha my guess would be: It's okay as a last resort I guess, but there's still a lot of time before 2.9. Paolodiff --git a/target/alpha/translate.c b/target/alpha/translate.c index 7c45ae360c..6e2ee3f958 100644 --- a/target/alpha/translate.c +++ b/target/alpha/translate.c @@ -1198,7 +1198,9 @@ static ExitStatus gen_call_pal(DisasContext *ctx, int palcode) tcg_gen_andi_i64(tmp, ctx->ir[IR_A0], PS_INT_MASK); tcg_gen_st8_i64(tmp, cpu_env, offsetof(CPUAlphaState, ps)); tcg_temp_free(tmp); - break; + + /* Reevaluate interrupts */ + return EXIT_PC_STALE; case 0x36: /* RDPS */
On 06/14/2017 09:08 AM, Paolo Bonzini wrote: > I think this is a band-aid, and would rather fix the front-ends as in > Emilio's patch. For Alpha my guess would be: > > diff --git a/target/alpha/translate.c b/target/alpha/translate.c > index 7c45ae360c..6e2ee3f958 100644 > --- a/target/alpha/translate.c > +++ b/target/alpha/translate.c > @@ -1198,7 +1198,9 @@ static ExitStatus gen_call_pal(DisasContext *ctx, int palcode) > tcg_gen_andi_i64(tmp, ctx->ir[IR_A0], PS_INT_MASK); > tcg_gen_st8_i64(tmp, cpu_env, offsetof(CPUAlphaState, ps)); > tcg_temp_free(tmp); > - break; > + > + /* Reevaluate interrupts */ > + return EXIT_PC_STALE; > > case 0x36: > /* RDPS */ Thanks! You're right that adjusting SWPIPL along these lines does fix the problem for Alpha. Given that Alpha would typically hang in arch_idle, I'd been focusing primarily on WTINT. r~
On 14/06/2017 18:51, Richard Henderson wrote: > On 06/14/2017 09:08 AM, Paolo Bonzini wrote: >> I think this is a band-aid, and would rather fix the front-ends as in >> Emilio's patch. For Alpha my guess would be: >> >> diff --git a/target/alpha/translate.c b/target/alpha/translate.c >> index 7c45ae360c..6e2ee3f958 100644 >> --- a/target/alpha/translate.c >> +++ b/target/alpha/translate.c >> @@ -1198,7 +1198,9 @@ static ExitStatus gen_call_pal(DisasContext >> *ctx, int palcode) >> tcg_gen_andi_i64(tmp, ctx->ir[IR_A0], PS_INT_MASK); >> tcg_gen_st8_i64(tmp, cpu_env, offsetof(CPUAlphaState, >> ps)); >> tcg_temp_free(tmp); >> - break; >> + >> + /* Reevaluate interrupts */ >> + return EXIT_PC_STALE; >> case 0x36: >> /* RDPS */ > > Thanks! > > You're right that adjusting SWPIPL along these lines does fix the > problem for Alpha. Given that Alpha would typically hang in arch_idle, > I'd been focusing primarily on WTINT. And MIPS: The others seem okay. Paolodiff --git a/target/mips/translate.c b/target/mips/translate.c index 559f8fed89..244f3cb9ab 100644 --- a/target/mips/translate.c +++ b/target/mips/translate.c @@ -13403,8 +13403,9 @@ static void gen_pool32axf (CPUMIPSState *env, DisasContext *ctx, int rt, int rs) save_cpu_state(ctx, 1); gen_helper_ei(t0, cpu_env); gen_store_gpr(t0, rs); - /* Stop translation as we may have switched the execution mode */ - ctx->bstate = BS_STOP; + /* BS_STOP isn't good enough here, reevaluate cpu_mips_hw_interrupts_enabled. */ + gen_save_pc(ctx->pc + 4); + ctx->bstate = BS_EXCP; tcg_temp_free(t0); } break;
Paolo Bonzini <pbonzini@redhat.com> writes: > On 14/06/2017 17:45, Richard Henderson wrote: >> While the next TB would detect the exit flag has been set there is no >> point if we can exit sooner. We also check cpu->interrupt_request as >> some front-ends can set it rather than using the cpu_interrupt() API >> call and would normally be expecting the IRQ to get picked up on the >> previously fairly regular exits from the run loop. > > This is not what happens actually; it's not about front-ends setting > cpu->interrupt_request, it's about front-ends doing exit_tb when they > wanted to re-evaluate cpu_handle_interrupt. > > cpu_exit is used when device code causes a rising edge in > cpu->interrupt_request. What we have here is that the MSR write causes > cc->cpu_exec_interrupt's return value to change from false to true. > > I think this is a band-aid, and would rather fix the front-ends as in > Emilio's patch. It seems a shame to cause all msr accesses to trigger and exit when we only care about the unmasking case. How about: Author: Alex Bennée <alex.bennee@linaro.org> Date: Wed Jun 14 18:46:01 2017 +0100 target/arm/op_helper: ensure we exit the run-loop When IRQs are un-masked we need to ensure the run-loop is exited so we can evaluate arm_cpu_do_interrupt. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c index 2a85666579..7e67bb3db2 100644 --- a/target/arm/op_helper.c +++ b/target/arm/op_helper.c @@ -835,6 +835,9 @@ void HELPER(msr_i_pstate)(CPUARMState *env, uint32_t op, uint32_t imm) break; case 0x1f: /* DAIFClear */ env->daif &= ~((imm << 6) & PSTATE_DAIF); + /* This may result in pending IRQs being unmasked so ensure we + exit the loop */ + cpu_exit(ENV_GET_CPU(env)); break; default: g_assert_not_reached();
On 06/14/2017 10:49 AM, Alex Bennée wrote: >> I think this is a band-aid, and would rather fix the front-ends as in >> Emilio's patch. > > It seems a shame to cause all msr accesses to trigger and exit when we > only care about the unmasking case. How about: > > > Author: Alex Bennée <alex.bennee@linaro.org> > Date: Wed Jun 14 18:46:01 2017 +0100 > > target/arm/op_helper: ensure we exit the run-loop > > When IRQs are un-masked we need to ensure the run-loop is exited so we > can evaluate arm_cpu_do_interrupt. > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > > diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c > index 2a85666579..7e67bb3db2 100644 > --- a/target/arm/op_helper.c > +++ b/target/arm/op_helper.c > @@ -835,6 +835,9 @@ void HELPER(msr_i_pstate)(CPUARMState *env, uint32_t op, uint32_t imm) > break; > case 0x1f: /* DAIFClear */ > env->daif &= ~((imm << 6) & PSTATE_DAIF); > + /* This may result in pending IRQs being unmasked so ensure we > + exit the loop */ > + cpu_exit(ENV_GET_CPU(env)); That works for me. And I guess that takes care of any potential problems with A32 as well? r~
On 06/14/2017 10:08 AM, Paolo Bonzini wrote: > And MIPS: > > diff --git a/target/mips/translate.c b/target/mips/translate.c > index 559f8fed89..244f3cb9ab 100644 > --- a/target/mips/translate.c > +++ b/target/mips/translate.c > @@ -13403,8 +13403,9 @@ static void gen_pool32axf (CPUMIPSState *env, DisasContext *ctx, int rt, int rs) > save_cpu_state(ctx, 1); > gen_helper_ei(t0, cpu_env); > gen_store_gpr(t0, rs); > - /* Stop translation as we may have switched the execution mode */ > - ctx->bstate = BS_STOP; > + /* BS_STOP isn't good enough here, reevaluate cpu_mips_hw_interrupts_enabled. */ > + gen_save_pc(ctx->pc + 4); > + ctx->bstate = BS_EXCP; > tcg_temp_free(t0); > } > break; > > The others seem okay. Thanks for this bit. We also need to fix SSM for s390x. r~
Richard Henderson <rth@twiddle.net> writes: > On 06/14/2017 10:08 AM, Paolo Bonzini wrote: >> And MIPS: >> >> diff --git a/target/mips/translate.c b/target/mips/translate.c >> index 559f8fed89..244f3cb9ab 100644 >> --- a/target/mips/translate.c >> +++ b/target/mips/translate.c >> @@ -13403,8 +13403,9 @@ static void gen_pool32axf (CPUMIPSState *env, DisasContext *ctx, int rt, int rs) >> save_cpu_state(ctx, 1); >> gen_helper_ei(t0, cpu_env); >> gen_store_gpr(t0, rs); >> - /* Stop translation as we may have switched the execution mode */ >> - ctx->bstate = BS_STOP; >> + /* BS_STOP isn't good enough here, reevaluate cpu_mips_hw_interrupts_enabled. */ >> + gen_save_pc(ctx->pc + 4); >> + ctx->bstate = BS_EXCP; >> tcg_temp_free(t0); >> } >> break; >> >> The others seem okay. > > Thanks for this bit. We also need to fix SSM for s390x. If your rolling a series for all these can you also pick up Thomas Huth's fix for --accel? -- Alex Bennée
On 14 June 2017 at 18:49, Alex Bennée <alex.bennee@linaro.org> wrote: > diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c > index 2a85666579..7e67bb3db2 100644 > --- a/target/arm/op_helper.c > +++ b/target/arm/op_helper.c > @@ -835,6 +835,9 @@ void HELPER(msr_i_pstate)(CPUARMState *env, uint32_t op, uint32_t imm) > break; > case 0x1f: /* DAIFClear */ > env->daif &= ~((imm << 6) & PSTATE_DAIF); > + /* This may result in pending IRQs being unmasked so ensure we > + exit the loop */ > + cpu_exit(ENV_GET_CPU(env)); > break; > default: > g_assert_not_reached(); The 'op' field we're switching on here is just a constant from the instruction encoding, so I'd rather see us identify that in translate-a64.c and end the TB or whatever when we need to, rather than doing the longjump-out-of-here that cpu_exit() does at runtime. thanks -- PMM
On 06/14/2017 12:11 PM, Peter Maydell wrote: > On 14 June 2017 at 18:49, Alex Bennée <alex.bennee@linaro.org> wrote: >> diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c >> index 2a85666579..7e67bb3db2 100644 >> --- a/target/arm/op_helper.c >> +++ b/target/arm/op_helper.c >> @@ -835,6 +835,9 @@ void HELPER(msr_i_pstate)(CPUARMState *env, uint32_t op, uint32_t imm) >> break; >> case 0x1f: /* DAIFClear */ >> env->daif &= ~((imm << 6) & PSTATE_DAIF); >> + /* This may result in pending IRQs being unmasked so ensure we >> + exit the loop */ >> + cpu_exit(ENV_GET_CPU(env)); >> break; >> default: >> g_assert_not_reached(); > > The 'op' field we're switching on here is just a constant > from the instruction encoding, so I'd rather see us > identify that in translate-a64.c and end the TB or > whatever when we need to, rather than doing the > longjump-out-of-here that cpu_exit() does at runtime. cpu_exit isn't the longjmp; this is just a set of exit_request and icount_decr. That said, you're right that we can do this more directly from the translator. r~
On 06/14/2017 12:07 PM, Alex Bennée wrote: > > Richard Henderson <rth@twiddle.net> writes: > >> On 06/14/2017 10:08 AM, Paolo Bonzini wrote: >>> And MIPS: >>> >>> diff --git a/target/mips/translate.c b/target/mips/translate.c >>> index 559f8fed89..244f3cb9ab 100644 >>> --- a/target/mips/translate.c >>> +++ b/target/mips/translate.c >>> @@ -13403,8 +13403,9 @@ static void gen_pool32axf (CPUMIPSState *env, DisasContext *ctx, int rt, int rs) >>> save_cpu_state(ctx, 1); >>> gen_helper_ei(t0, cpu_env); >>> gen_store_gpr(t0, rs); >>> - /* Stop translation as we may have switched the execution mode */ >>> - ctx->bstate = BS_STOP; >>> + /* BS_STOP isn't good enough here, reevaluate cpu_mips_hw_interrupts_enabled. */ >>> + gen_save_pc(ctx->pc + 4); >>> + ctx->bstate = BS_EXCP; >>> tcg_temp_free(t0); >>> } >>> break; >>> >>> The others seem okay. >> >> Thanks for this bit. We also need to fix SSM for s390x. > > If your rolling a series for all these can you also pick up Thomas > Huth's fix for --accel? Will do. r~
On Wed, Jun 14, 2017 at 12:43:00 -0700, Richard Henderson wrote: > On 06/14/2017 12:07 PM, Alex Bennée wrote: > >If your rolling a series for all these can you also pick up Thomas > >Huth's fix for --accel? > > Will do. Just a heads up since I see the patch is in your tcg-next branch: Paolo included this patch in a recent pull request ("Misc patches for 2017-06-15"), although the pull has yet to happen. https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg03682.html E.
diff --git a/tcg-runtime.c b/tcg-runtime.c index f4bfa9cea6..a025a6f194 100644 --- a/tcg-runtime.c +++ b/tcg-runtime.c @@ -147,28 +147,35 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg) void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr) { CPUState *cpu = ENV_GET_CPU(env); - unsigned int addr_hash = tb_jmp_cache_hash_func(addr); void *code_ptr = NULL; - TranslationBlock *tb; - - tb = atomic_rcu_read(&cpu->tb_jmp_cache[addr_hash]); - if (likely(tb)) { - target_ulong cs_base, pc; - uint32_t flags; - - cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); - - if (likely(tb->pc == addr && tb->cs_base == cs_base && - tb->flags == flags)) { - code_ptr = tb->tc_ptr; - } else { - /* If we didn't find it in the jmp_cache we still might - * find it in the global tb_htable - */ - tb = tb_htable_lookup(cpu, addr, cs_base, flags); - if (likely(tb)) { - atomic_set(&cpu->tb_jmp_cache[addr_hash], tb); + + /* If there is an interrupt pending request or the TCG exit flag + * has been set we might as well stop here and return to the main + * loop. + */ + if (!cpu->icount_decr.u16.high && !cpu->interrupt_request) { + unsigned int addr_hash = tb_jmp_cache_hash_func(addr); + TranslationBlock *tb; + + tb = atomic_rcu_read(&cpu->tb_jmp_cache[addr_hash]); + if (likely(tb)) { + target_ulong cs_base, pc; + uint32_t flags; + + cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); + + if (likely(tb->pc == addr && tb->cs_base == cs_base && + tb->flags == flags)) { code_ptr = tb->tc_ptr; + } else { + /* If we didn't find it in the jmp_cache we still might + * find it in the global tb_htable + */ + tb = tb_htable_lookup(cpu, addr, cs_base, flags); + if (likely(tb)) { + atomic_set(&cpu->tb_jmp_cache[addr_hash], tb); + code_ptr = tb->tc_ptr; + } } } }
While the next TB would detect the exit flag has been set there is no point if we can exit sooner. We also check cpu->interrupt_request as some front-ends can set it rather than using the cpu_interrupt() API call and would normally be expecting the IRQ to get picked up on the previously fairly regular exits from the run loop. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- tcg-runtime.c | 47 +++++++++++++++++++++++++++-------------------- 1 file changed, 27 insertions(+), 20 deletions(-) -- 2.13.0