Message ID: c7db7e804c453629c116d508558eaf46477a2d73.1731708405.git.len.brown@intel.com
State: New
Series: [v2] ACPI: Replace msleep() with usleep_range() in acpi_os_sleep().
On Sat, Nov 16, 2024 at 12:11 AM Len Brown <lenb@kernel.org> wrote:
>
> From: Len Brown <len.brown@intel.com>
>
> Replace msleep() with usleep_range() in acpi_os_sleep().
>
> This has a significant user-visible performance benefit
> on some ACPI flows on some systems. eg. Kernel resume
> time of a Dell XPS-13-9300 drops from 1943ms to 1127ms (42%).

Sure.

And the argument seems to be that it is better to always use more
resources in a given path (ACPI sleep in this particular case) than to
be somewhat inaccurate, which is visible in some cases.

This would mean that hrtimers should always be used everywhere, but they aren't.

While I have nothing against addressing the short-sleeps issue where
the msleep() inaccuracy is too large, I don't see why this requires
using a hrtimer with no slack in all cases.

The argument seems to be that the short-sleeps case is hard to
distinguish from the other cases, but I'm not sure about this.

Also, something like this might work, but for some reason you don't
want to do it:

    if (ms >= 12 * MSEC_PER_SEC / HZ) {
            msleep(ms);
    } else {
            u64 us = ms * USEC_PER_MSEC;

            usleep_range(us, us / 8);
    }

> usleep_range(min, min) is used because there is scant
> opportunity for timer coalescing during ACPI flows
> related to system suspend, resume (or initialization).
>
> ie. During these flows usleep_range(min, max) is observed to
> be effectively the same as usleep_range(max, max).
>
> Similarly, msleep() for long sleeps is not considered because
> these flows almost never have opportunities to coalesce
> with other activity on jiffie boundaries, leaving no
> measurable benefit to rounding up to jiffie boundaries.
>
> Background:
>
> acpi_os_sleep() supports the ACPI AML Sleep(msec) operator,
> and it must not return before the requested number of msec.
>
> Until Linux-3.13, this contract was sometimes violated by using
> schedule_timeout_interruptible(j), which could return early.
>
> Since Linux-3.13, acpi_os_sleep() uses msleep(),
> which doesn't return early, but is still subject
> to long delays due to the low resolution of the jiffie clock.
>
> Linux-6.12 removed a stray jiffie from msleep: commit 4381b895f544
> ("timers: Remove historical extra jiffie for timeout in msleep()")
> The 4ms savings is material for some durations,
> but msleep is still generally too coarse. eg msleep(5)
> on a 250HZ system still takes 11.9ms.
>
> System resume performance of a Dell XPS 13 9300:
>
> Linux-6.11:
>   msleep  HZ 250   2460 ms
>
> Linux-6.12:
>   msleep  HZ 250   1943 ms
>   msleep  HZ 1000  1233 ms
>   usleep  HZ 250   1127 ms
>   usleep  HZ 1000  1130 ms
>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216263
> Signed-off-by: Len Brown <len.brown@intel.com>
> Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
> Tested-by: Todd Brandt <todd.e.brandt@intel.com>
> ---
>  drivers/acpi/osl.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 70af3fbbebe5..daf87e33b8ea 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -607,7 +607,9 @@ acpi_status acpi_os_remove_interrupt_handler(u32 gsi, acpi_osd_handler handler)
>
>  void acpi_os_sleep(u64 ms)
>  {
> -       msleep(ms);
> +       u64 us = ms * USEC_PER_MSEC;
> +
> +       usleep_range(us, us);
>  }
>
>  void acpi_os_stall(u32 us)
> --
> 2.43.0
Hi Rafael, Len,

On 18-Nov-24 12:03 PM, Rafael J. Wysocki wrote:
> On Sat, Nov 16, 2024 at 12:11 AM Len Brown <lenb@kernel.org> wrote:
[...]
> Also, something like this might work, but for some reason you don't
> want to do it:
>
>     if (ms >= 12 * MSEC_PER_SEC / HZ) {
>             msleep(ms);
>     } else {
>             u64 us = ms * USEC_PER_MSEC;
>
>             usleep_range(us, us / 8);
>     }

FWIW I was thinking the same thing, that it would be good to still
use msleep() when the sleep is > (MSEC_PER_SEC / HZ); not sure
why you added the 12 there? Surely something like a sleep longer
than 3 timer ticks (I know we have NOHZ but still) would already be
long enough to not worry about msleep slack?

And I assume the usleep_range(us, us / 8); is a typo? max can
never be less than min, maybe you meant: usleep_range(us, us + 8)?

OTOH it is not like we will hit these ACPI acpi_os_sleep()
calls multiple times per second all the time. On a normal idle
system I expect there to not be that many calls (could still
be a few from ACPI managed devices going into + out of
runtime-pm regularly). And if we don't hit acpi_os_sleep() calls
multiple times per second then the chances of timer coalescing
are not that big anyways.

Still I think that finding some middle ground between always
sleeping the exact min time and the old msleep() call, as Rafael
is proposing, would be good IMHO.

Regards,

Hans

[...]
Hi Hans,

On Mon, Nov 18, 2024 at 12:38 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi Rafael, Len,
>
> On 18-Nov-24 12:03 PM, Rafael J. Wysocki wrote:
> > On Sat, Nov 16, 2024 at 12:11 AM Len Brown <lenb@kernel.org> wrote:
[...]
> > Also, something like this might work, but for some reason you don't
> > want to do it:
> >
> >     if (ms >= 12 * MSEC_PER_SEC / HZ) {
> >             msleep(ms);
> >     } else {
> >             u64 us = ms * USEC_PER_MSEC;
> >
> >             usleep_range(us, us / 8);

Should be

    usleep_range(us, us + us / 8);

(I notoriously confuse this API).

> >     }
>
> FWIW I was thinking the same thing, that it would be good to still
> use msleep() when the sleep is > (MSEC_PER_SEC / HZ); not sure
> why you added the 12 there? Surely something like a sleep longer
> than 3 timer ticks (I know we have NOHZ but still) would already be
> long enough to not worry about msleep slack?

The typical msleep() overhead in 6.12 appears to be 1.5 jiffy, which is
1.5 * MSEC_PER_SEC / HZ, and I want the usleep() delta to be less than
this, so

    delta = ms / 8 <= 1.5 * MSEC_PER_SEC / HZ

> And I assume the usleep_range(us, us / 8); is a typo? max can
> never be less than min, maybe you meant: usleep_range(us, us + 8)?

No, please see above.

> OTOH it is not like we will hit these ACPI acpi_os_sleep()
> calls multiple times per second all the time. On a normal idle
> system I expect there to not be that many calls (could still
> be a few from ACPI managed devices going into + out of
> runtime-pm regularly). And if we don't hit acpi_os_sleep() calls
> multiple times per second then the chances of timer coalescing
> are not that big anyways.
>
> Still I think that finding some middle ground between always
> sleeping the exact min time and the old msleep() call, as Rafael
> is proposing, would be good IMHO.

Thanks for the feedback!

[...]
Hi Rafael,

On 18-Nov-24 1:02 PM, Rafael J. Wysocki wrote:
> Hi Hans,
>
> On Mon, Nov 18, 2024 at 12:38 PM Hans de Goede <hdegoede@redhat.com> wrote:
[...]
>>> Also, something like this might work, but for some reason you don't
>>> want to do it:
>>>
>>>     if (ms >= 12 * MSEC_PER_SEC / HZ) {
>>>             msleep(ms);
>>>     } else {
>>>             u64 us = ms * USEC_PER_MSEC;
>>>
>>>             usleep_range(us, us / 8);
>
> Should be
>
>     usleep_range(us, us + us / 8);
>
> (I notoriously confuse this API).

I see.

> The typical msleep() overhead in 6.12 appears to be 1.5 jiffy, which is
> 1.5 * MSEC_PER_SEC / HZ, and I want the usleep() delta to be less than
> this, so
>
>     delta = ms / 8 <= 1.5 * MSEC_PER_SEC / HZ

Ok, that makes sense. But this probably requires a comment explaining
this so that when someone looks at this in the future they understand
where the 12 comes from.

Whereas the / 8 is just a choice, right? I think it is a decent choice,
but still this is just a value you picked which should work nicely,
right?

[...]

> Thanks for the feedback!

You're welcome.

Len, any chance you can give Rafael's proposal a test run on the
same Dell XPS 13 9300 and see what this means for the resume time?

If this gets close enough to your patch I think we should go with
what Rafael is proposing.

Regards,

Hans

[...]
Hi Hans,

On Mon, Nov 18, 2024 at 1:10 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi Rafael,
>
> On 18-Nov-24 1:02 PM, Rafael J. Wysocki wrote:
[...]
> Ok, that makes sense. But this probably requires a comment explaining
> this so that when someone looks at this in the future they understand
> where the 12 comes from.

Sure.

> Whereas the / 8 is just a choice, right? I think it is a decent choice,
> but still this is just a value you picked which should work nicely,
> right?

Right.  I chose a power of 2 close to 10%.

[...]

> Len, any chance you can give Rafael's proposal a test run on the
> same Dell XPS 13 9300 and see what this means for the resume time?
>
> If this gets close enough to your patch I think we should go with
> what Rafael is proposing.

Thanks!

[...]
> And the argument seems to be that it is better to always use more
> resources in a given path (ACPI sleep in this particular case) than to
> be somewhat inaccurate which is visible in some cases.
>
> This would mean that hrtimers should always be used everywhere, but they aren't.

More or less. The rule of thumb is that regular timers are optimized for
the not-firing case (e.g. timeouts that get deleted when the actual event
happens) while hrtimers are optimized for the case where the timer is
expected to fire.

(I'm with you on the slack argument; some amount of slack, even if it is
only a usec or two, is very helpful.)
On Mon, Nov 18, 2024 at 3:35 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> > And the argument seems to be that it is better to always use more
> > resources in a given path (ACPI sleep in this particular case) than to
> > be somewhat inaccurate which is visible in some cases.
> >
> > This would mean that hrtimers should always be used everywhere, but they aren't.
>
> more or less rule of thumb is that regular timers are optimized for not firing case
> (e.g. timeouts that get deleted when the actual event happens) while hrtimers
> are optimized for the case where the timer is expected to fire.

I've heard that, which makes me wonder why msleep() is still there.

One thing that's rarely mentioned is that programming a timer in HW
actually takes time, so if it is done too often, it hurts performance
through latency (even if this is the TSC deadline timer).

> (I'm with you on the slack argument, some amount of slack, even if it is only a usec or two,
> is very helpful)

Thanks!
On 11/19/2024 5:42 AM, Rafael J. Wysocki wrote:
> On Mon, Nov 18, 2024 at 3:35 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
[...]
> I've heard that, which makes me wonder why msleep() is still there.
>
> One thing that's rarely mentioned is that programming a timer in HW
> actually takes time, so if it is done too often, it hurts performance
> through latency (even if this is the TSC deadline timer).

yup, and this is why you want to group events together "somewhat", and
which is why we have slack, to allow that to happen
On 20/11/2024 10:01, Pierre Gondois wrote:
>
> On 11/18/24 13:02, Rafael J. Wysocki wrote:
>> Hi Hans,
>>
>> On Mon, Nov 18, 2024 at 12:38 PM Hans de Goede <hdegoede@redhat.com>
>> wrote:
[...]
> FWIW, testing the above version on an Arm Juno platform by executing
> the following method:
>
>     Method (SLEE, 1, Serialized) {
>         Sleep(Arg0)
>     }
>
> _wo: without patch
> _w: with patch
> - Values in ns.
> - Requesting to sleep X ms
> - Tested over 10 iterations
> - HZ=250
>
> +------+------------+----------+------------+---------+-----------+
> |   ms |    mean_wo |   std_wo |     mean_w |   std_w |     ratio |
> +------+------------+----------+------------+---------+-----------+
> |    1 |    8087797 |  2079703 |    1313920 |   55066 | -83.75429 |
> |    2 |    7942471 |  2201985 |    2416064 |  111604 | -69.58044 |
> |    3 |    8373704 |   144274 |    3632537 |  111037 | -56.61970 |
> |    4 |    7946013 |  2214330 |    4606028 |  255838 | -42.03346 |
> |    5 |   11418920 |  1673914 |    5955548 |  131862 | -47.84490 |
> |    6 |   11427042 |  1677519 |    7045713 |  211439 | -38.34176 |
> |    7 |   12301242 |   221580 |    8174633 |  330050 | -33.54628 |
> |    8 |   11411606 |  1672182 |    9191048 |  431767 | -19.45877 |
> |    9 |   16722304 |  1288625 |   10517284 |  103274 | -37.10625 |
> |   10 |   16746542 |  1280385 |   11564426 |  417218 | -30.94439 |
> |   20 |   24294957 |    70703 |   22756497 |  673936 |  -6.33243 |
> |   30 |   36284782 |    74340 |   34131455 |  391473 |  -5.93452 |
> |   40 |   44703162 |  1199709 |   45407108 |  289715 |   1.57471 |
> |   50 |   56311282 |   281418 |   56098040 |  607739 |  -0.37868 |
> |   60 |   64225811 |   247587 |   64302246 |  132059 |   0.11901 |
> |   70 |   76299457 |    99853 |   76282497 |   83910 |  -0.02223 |
> |  100 |  104214393 |    38642 |  104212524 |  244424 |  -0.00179 |
> | 1000 | 1016131215 |   245725 | 1017051744 | 2748280 |   0.09059 |
> | 2000 | 2007711297 |  1325094 | 2007628922 | 1421807 |  -0.00410 |
> +------+------------+----------+------------+---------+-----------+
>
> - With the patch, the min sleep duration is never below the requested
>   sleep duration
>
> So indeed the penalty of using msleep is big for small sleep durations.

Just to make sure, this is with Rafael's proposal, using msleep() for
values >= 48 ms = (12 * 1000/250) ms and usleep_range() otherwise?
On 11/20/2024 10:03 AM, Rafael J. Wysocki wrote:
> On Tue, Nov 19, 2024 at 4:08 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
[...]
>> yup, and this is why you want to group events together "somewhat", and
>> which is why we have slack, to allow that to happen
>
> So what do you think would be the minimum slack to use in this case?
>
> I thought about something on the order of 199 us, but now I'm thinking
> that 50 us would work too. Less than this - I'm not sure.

50 usec is likely more than enough in practice.
On Wed, Nov 20, 2024 at 1:50 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> > 50 usec is likely more than enough in practice.
>
> And would you use the same slack value regardless of the sleep
> duration, or make it somehow depend on the sleep duration?

timerslack_ns is 50 usec for all user-space, no matter the duration.
This part was done right -- it doesn't depend on the sleep duration.

Coalescing depends on the pattern of wakeups over time. That pattern
doesn't necessarily depend on sleep duration, so it is a hard sell to
tie them together.
On Wed, Nov 20, 2024 at 8:18 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> On 11/20/2024 10:49 AM, Rafael J. Wysocki wrote:
> >>> I thought about something on the order of 199 us, but now I'm thinking
> >>> that 50 us would work too. Less than this - I'm not sure.
> >>
> >> 50 usec is likely more than enough in practice.
> >
> > And would you use the same slack value regardless of the sleep
> > duration, or make it somehow depend on the sleep duration?
>
> I don't see why you'd make it dependent on the sleep duration.
> Sure, in theory, the longer the sleep -- you could pick a fixed
> percentage, but you're trying to amortize a theoretical timer register
> write, and a CPU wakeup. The timer write is fixed cost and not THAT
> expensive after some amount of this. The C-state wakeup -- sure, that
> is more variable, but that is super important for high-occurrence
> things (thousands to millions of times per hour).
> If your ACPI sleeps are high-occurrence on a system I suspect you have
> way bigger problems than an occasional extra wakeup.

I generally think that if you are sleeping relatively long, you may as
well sacrifice some precision for avoiding system stress so to speak,
so I've been considering something like flat 50 us for sleeps between
1 and 5 ms and then 1% of the sleep duration for longer sleeps.
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 70af3fbbebe5..daf87e33b8ea 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -607,7 +607,9 @@ acpi_status acpi_os_remove_interrupt_handler(u32 gsi, acpi_osd_handler handler)

 void acpi_os_sleep(u64 ms)
 {
-	msleep(ms);
+	u64 us = ms * USEC_PER_MSEC;
+
+	usleep_range(us, us);
 }

 void acpi_os_stall(u32 us)