Message ID | fe235a1f65bb6c86d2afcdf52d85f80ae728dcc5.1683722688.git.geert+renesas@glider.be |
---|---|
State | Superseded |
Headers | show |
Series | iopoll: Busy loop and timeout improvements | expand |
* Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > It is considered good practice to call cpu_relax() in busy loops, see > Documentation/process/volatile-considered-harmful.rst. This can not > only lower CPU power consumption or yield to a hyperthreaded twin > processor, but also allows an architecture to mitigate hardware issues > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > architecture-specific cpu_relax() implementation. > > In addition, cpu_relax() is also a compiler barrier. It is not > immediately obvious that the @op argument "function" will result in an > actual function call (e.g. in case of inlining). > > Where a function call is a C sequence point, this is lost on inlining. > Therefore, with agressive enough optimization it might be possible for > the compiler to hoist the: > > (val) = op(args); > > "load" out of the loop because it doesn't see the value changing. The > addition of cpu_relax() would inhibit this. > > As the iopoll helpers lack calls to cpu_relax(), people are sometimes > reluctant to use them, and may fall back to open-coded polling loops > (including cpu_relax() calls) instead. > > Fix this by adding calls to cpu_relax() to the iopoll helpers: > - For the non-atomic case, it is sufficient to call cpu_relax() in > case of a zero sleep-between-reads value, as a call to > usleep_range() is a safe barrier otherwise. However, it doesn't > hurt to add the call regardless, for simplicity, and for similarity > with the atomic case below. > - For the atomic case, cpu_relax() must be called regardless of the > sleep-between-reads value, as there is no guarantee all > architecture-specific implementations of udelay() handle this. Reviewed-by: Tony Lindgren <tony@atomide.com>
On Wed, 10 May 2023 at 15:23, Geert Uytterhoeven <geert+renesas@glider.be> wrote: > > It is considered good practice to call cpu_relax() in busy loops, see > Documentation/process/volatile-considered-harmful.rst. This can not > only lower CPU power consumption or yield to a hyperthreaded twin > processor, but also allows an architecture to mitigate hardware issues > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > architecture-specific cpu_relax() implementation. > > In addition, cpu_relax() is also a compiler barrier. It is not > immediately obvious that the @op argument "function" will result in an > actual function call (e.g. in case of inlining). > > Where a function call is a C sequence point, this is lost on inlining. > Therefore, with agressive enough optimization it might be possible for > the compiler to hoist the: > > (val) = op(args); > > "load" out of the loop because it doesn't see the value changing. The > addition of cpu_relax() would inhibit this. > > As the iopoll helpers lack calls to cpu_relax(), people are sometimes > reluctant to use them, and may fall back to open-coded polling loops > (including cpu_relax() calls) instead. > > Fix this by adding calls to cpu_relax() to the iopoll helpers: > - For the non-atomic case, it is sufficient to call cpu_relax() in > case of a zero sleep-between-reads value, as a call to > usleep_range() is a safe barrier otherwise. However, it doesn't > hurt to add the call regardless, for simplicity, and for similarity > with the atomic case below. > - For the atomic case, cpu_relax() must be called regardless of the > sleep-between-reads value, as there is no guarantee all > architecture-specific implementations of udelay() handle this. > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Acked-by: Arnd Bergmann <arnd@arndb.de> Makes sense to me! Feel free to add: Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Kind regards Uffe > --- > v2: > - Add Acked-by, > - Add compiler barrier and inlining explanation (thanks, Peter!). > --- > include/linux/iopoll.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h > index 2c8860e406bd8cae..0417360a6db9b0d6 100644 > --- a/include/linux/iopoll.h > +++ b/include/linux/iopoll.h > @@ -53,6 +53,7 @@ > } \ > if (__sleep_us) \ > usleep_range((__sleep_us >> 2) + 1, __sleep_us); \ > + cpu_relax(); \ > } \ > (cond) ? 0 : -ETIMEDOUT; \ > }) > @@ -95,6 +96,7 @@ > } \ > if (__delay_us) \ > udelay(__delay_us); \ > + cpu_relax(); \ > } \ > (cond) ? 0 : -ETIMEDOUT; \ > }) > -- > 2.34.1 >
diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h index 2c8860e406bd8cae..0417360a6db9b0d6 100644 --- a/include/linux/iopoll.h +++ b/include/linux/iopoll.h @@ -53,6 +53,7 @@ } \ if (__sleep_us) \ usleep_range((__sleep_us >> 2) + 1, __sleep_us); \ + cpu_relax(); \ } \ (cond) ? 0 : -ETIMEDOUT; \ }) @@ -95,6 +96,7 @@ } \ if (__delay_us) \ udelay(__delay_us); \ + cpu_relax(); \ } \ (cond) ? 0 : -ETIMEDOUT; \ })