Message ID | 13679187.uLZWGnKmhe@rjwysocki.net |
---|---|
State | New |
Headers | show |
Series | cpuidle: teo: Fix and cleanups | expand |
On Fri, 2025-01-10 at 13:16 +0000, Christian Loehle wrote: > This would then enable intercept-detection only for <50% of the time, > another option is to not allow intercepts selecting a polling state, but > there were recent complaints about this exact behavior from Aboorva (+TO). > They don't have a low-latency non-polling state. What folks think about the following idea. Idle governor algorithm essentially predicts the sleep time (step 1), based on that, the idle state gets selected (step 2). What if we add a sleep time factor and expose it via sysfs. The predicted sleep time would be multiplied by the factor (between steps 1 and 2). Default factor value is 1 (or 100%). If users want teo be more hesitant selecting deeper idle states, they can set it to 0.5 (or 50%) or some other value < 1. I users want teo to be more enthusiastic about selecting deeper idle states, they set the factor to a value > 1. We could expose it via sysfs and allow changing in some reasonable range. The idea is to let users adjust the level of idle governor (teo in this case) enthusiasm regarding deep C-state. Thanks, Artem.
On Thu, Jan 23, 2025 at 5:54 PM Artem Bityutskiy <dedekind1@gmail.com> wrote: > > On Fri, 2025-01-10 at 13:16 +0000, Christian Loehle wrote: > > This would then enable intercept-detection only for <50% of the time, > > another option is to not allow intercepts selecting a polling state, but > > there were recent complaints about this exact behavior from Aboorva (+TO). > > They don't have a low-latency non-polling state. > > What folks think about the following idea. > > Idle governor algorithm essentially predicts the sleep time (step 1), based on > that, the idle state gets selected (step 2). No, it's kind of the reverse nowadays. Step 1 is to preselect an idle state using the idle duration information from the recent past and step 2 is to refine the selection using the sleep time information (unless step 1 by itself is sufficient). In particular, teo doesn't attempt to predict the idle duration. It basically picks up the deepest idle state that wouldn't have been too deep in the majority of cases taken into account. > What if we add a sleep time factor and expose it via sysfs. The predicted sleep > time would be multiplied by the factor (between steps 1 and 2). > > Default factor value is 1 (or 100%). If users want teo be more hesitant > selecting deeper idle states, they can set it to 0.5 (or 50%) or some other > value < 1. I users want teo to be more enthusiastic about selecting deeper idle > states, they set the factor to a value > 1. > > We could expose it via sysfs and allow changing in some reasonable range. > > The idea is to let users adjust the level of idle governor (teo in this case) > enthusiasm regarding deep C-state. For teo, that value from user space would effectively work as an upper boundary for the target residency of idle states to select and there is not too much difference between this and the existing PM (CPU latency) QoS (limiting target residency is equivalent to limiting exit latency). If you want a "latency bias" knob, for teo it might be something related to what is regarded as a "sufficient majority". Right now it is hardcoded to "over 50%", but it may be a different fraction in principle ((e.g. 5/8, 2/3 etc.) and it may be tunable within some limits.
--- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -422,7 +422,8 @@ first_suitable_idx = i; } } - if (!idx && prev_intercept_idx) { + if (!idx && prev_intercept_idx && + !(drv->states[0].flags & CPUIDLE_FLAG_POLLING)) { /* * We have to query the sleep length here otherwise we don't * know after wakeup if our guess was correct.