Message ID | 20170403124524.10824-1-alex.bennee@linaro.org |
---|---|
Series | MTTCG and record/replay fixes for rc3 |
On 03/04/2017 14:45, Alex Bennée wrote:
> cpus: check cpu->running in cpu_get_icount_raw()
>
> I'm not sure the race happens and once outside of cpu->running the
> icount counters should be zero. However it seems a sensible
> precaution.

Yeah, I think this is unnecessary with patch 7's new assertions.

> I think the cpus: patches should probably go into the next
> pull-request while we see if we can come up with a better final
> solution for fixing record/replay. However given how long this
> regression has run during the release candidate process I wanted to
> update everyone on the current status and get feedback ASAP.

I agree. I'm not sure exactly how the final race happens, but if it
causes divergence it would be caught later by the record/replay
mechanism, I think.

Paolo
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/04/2017 14:45, Alex Bennée wrote:
>> cpus: check cpu->running in cpu_get_icount_raw()
>>
>> I'm not sure the race happens and once outside of cpu->running the
>> icount counters should be zero. However it seems a sensible
>> precaution.
>
> Yeah, I think this is unnecessary with patch 7's new assertions.

I can drop the patch.

>> I think the cpus: patches should probably go into the next
>> pull-request while we see if we can come up with a better final
>> solution for fixing record/replay. However given how long this
>> regression has run during the release candidate process I wanted to
>> update everyone on the current status and get feedback ASAP.
>
> I agree. I'm not sure exactly how the final race happens, but if it
> causes divergence it would be caught later by the record/replay
> mechanism, I think.

It's odd because everything should be sequenced by the BQL. The
main-loop holds the BQL while writing out checkpoints and everything
that can trigger output to the replay stream should be under BQL as
well:

  - VIRTUAL timers in the outer loop
  - MMIO triggered events (block, char, audio)
  - Interrupt processing

In fact I wonder if replay_mutex could just be dropped and the BQL used
to protect all of this stuff. I'll have to experiment with some asserts
to see if this is ever not the case.

--
Alex Bennée
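
For illustration only, the kind of experiment mentioned above could be
sketched roughly as follows. This is not QEMU code: big_lock,
big_lock_held, stream_put and unlocked_writes are hypothetical
stand-ins for the BQL, a "do I hold it" check, a write to the replay
stream and a violation counter. The sketch counts offending writes
instead of aborting, on the assumption that it is more useful to see
every caller that writes without the lock than to stop at the first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the BQL: one global mutex plus a per-thread
 * flag recording whether the calling thread currently holds it. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool big_lock_held;

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
}

static void big_lock_release(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
}

/* Hypothetical replay stream: a small fixed buffer of event ids. */
#define STREAM_CAP 16
static int stream[STREAM_CAP];
static size_t stream_len;

/* Number of attempted writes made without the big lock held. */
static atomic_size_t unlocked_writes;

/* Append an event to the stream. Returns false, and records a
 * violation, if the caller does not hold the big lock; also returns
 * false if the buffer is full. */
static bool stream_put(int event)
{
    if (!big_lock_held) {
        atomic_fetch_add(&unlocked_writes, 1);
        return false;
    }
    if (stream_len == STREAM_CAP) {
        return false;
    }
    stream[stream_len++] = event;
    return true;
}
```

If the counter stays at zero across a full record run, the big lock
alone already serialises every write in this model, which is the
condition under which a separate stream mutex would be redundant.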