Message ID | 20231211104855.558096-4-vincent.guittot@linaro.org |
---|---|
State | Accepted |
Commit | b3edde44e5d4504c23a176819865cd603fd16d6c |
Headers | show |
Series | consolidate and cleanup CPU capacity | expand |
On Mon, Dec 11, 2023 at 11:48:51AM +0100, Vincent Guittot wrote: > cpuinfo.max_freq can change at runtime because of boost as an example. This > implies that the value could be different than the one that has been > used when computing the capacity of a CPU. So, this seems very weird but I just finished a bisection of a failure that showed up during the merge window with the DT kselftests on meson-gxl-s905x-libretech-cc (Libretech Le Potato) running an arm64 defconfig. Looking at the commit this seems like a very surprising false result but I verified that the commit before this one runs fine and reproduced the failure, the bisect does seem to converge well also so it appears like there's something going on. Does this ring any bells for anyone? The test is timed out by the kselftest wrapper: kselftest: Running tests in dt TAP version 13 1..1 # timeout set to 45 # selftests: dt: test_unprobed_devices.sh # TAP version 13 # not ok 1 selftests: dt: test_unprobed_devices.sh # TIMEOUT 45 seconds with no obvious effect on the system, a successful run takes less than 20 seconds so it shouldn't be that the timing is marginal. I'm guessing that reading the one of the files under /sys/devices hung. No other boards in my lab are affected and I'm not immediately seeing anything remarkable about this board. Full log from a bad run at: https://lava.sirena.org.uk/scheduler/job/493329 and a good run: https://lava.sirena.org.uk/scheduler/job/492453 Bisect log (the skipped commits didn't build): git bisect start # status: waiting for both good and bad commits # bad: [4fbbed7872677b0a28ba8237169968171a61efbd] Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 4fbbed7872677b0a28ba8237169968171a61efbd # status: waiting for good commit(s), bad commit known # good: [0dd3ee31125508cd67f7e7172247f05b7fd1753a] Linux 6.7 git bisect good 0dd3ee31125508cd67f7e7172247f05b7fd1753a # bad: [ba5afb9a84df2e6b26a1b6389b98849cd16ea757] fs: rework listmount() implementation git bisect bad ba5afb9a84df2e6b26a1b6389b98849cd16ea757 # bad: [de927f6c0b07d9e698416c5b287c521b07694cac] Merge tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux git bisect bad de927f6c0b07d9e698416c5b287c521b07694cac # bad: [35f11a3710cdcbbc5090d14017a6295454e0cc73] Merge tag 'mtd/for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux git bisect bad 35f11a3710cdcbbc5090d14017a6295454e0cc73 # bad: [d30e51aa7b1f6fa7dd78d4598d1e4c047fcc3fb9] Merge tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab git bisect bad d30e51aa7b1f6fa7dd78d4598d1e4c047fcc3fb9 # skip: [968b80332432172dbbb773e749a43bdc846d1a13] Merge tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux git bisect skip 968b80332432172dbbb773e749a43bdc846d1a13 # good: [372a34e66fb7f95124fadae9c600b231c35696a7] fs: replace f_rcuhead with f_task_work git bisect good 372a34e66fb7f95124fadae9c600b231c35696a7 # good: [ae24db43b3b427eb290b58d55179c32f0a7539d1] powerpc/ftrace: Remove nops after the call to ftrace_stub git bisect good ae24db43b3b427eb290b58d55179c32f0a7539d1 # skip: [3cf1d6a5fbf3f724d12b01635319924239d42c00] Merge tag 'm68k-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k git bisect skip 3cf1d6a5fbf3f724d12b01635319924239d42c00 # good: [2a33e2ddc6ebf9b5468091aded8a38f57de9a580] splice: remove permission hook from do_splice_direct() git bisect good 2a33e2ddc6ebf9b5468091aded8a38f57de9a580 # good: [9b7e9e2f5d5c3d079ec46bc71b114012e362ea6e] fs: factor out backing_file_splice_{read,write}() helpers git bisect good 9b7e9e2f5d5c3d079ec46bc71b114012e362ea6e # good: [12c1b632d970c0138b4c5c65a1065e7d0604d272] fs: reformat idmapped mounts entry git bisect good 12c1b632d970c0138b4c5c65a1065e7d0604d272 # good: [d23627a7688fabff0096a7beaff1a93de76afaad] EDAC/igen6: Add Intel Raptor Lake-P SoCs support git bisect good d23627a7688fabff0096a7beaff1a93de76afaad # skip: [ab5f3fcb7c72094684760e0cd8954d8d570b5e83] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux git bisect skip ab5f3fcb7c72094684760e0cd8954d8d570b5e83 # good: [eb183b2cd0a6549992eca3c4ada0b1bc1d9340f5] Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" git bisect good eb183b2cd0a6549992eca3c4ada0b1bc1d9340f5 # good: [1ab33c03145d0f6c345823fc2da935d9a1a9e9fc] asm-generic: make sparse happy with odd-sized put_unaligned_*() git bisect good 1ab33c03145d0f6c345823fc2da935d9a1a9e9fc # good: [794c68b20408bb6899f90314e36e256924cc85a1] x86/CPU/AMD: Get rid of amd_erratum_1485[] git bisect good 794c68b20408bb6899f90314e36e256924cc85a1 # bad: [11137d384996bb05cf33c8163db271e1bac3f4bf] sched/fair: Simplify util_est git bisect bad 11137d384996bb05cf33c8163db271e1bac3f4bf # good: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d] sched/cpufreq: Rework schedutil governor performance estimation git bisect good 9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d # good: [599457ba15403037b489fe536266a3d5f9efaed7] cpufreq: Use the fixed and coherent frequency for scaling capacity git bisect good 599457ba15403037b489fe536266a3d5f9efaed7 # bad: [50b813b147e9eb6546a1fc49d4e703e6d23691f2] cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}() git bisect bad 50b813b147e9eb6546a1fc49d4e703e6d23691f2 # bad: [15cbbd1d317e07b4e5c6aca5d4c5579539a82784] energy_model: Use a fixed reference frequency git bisect bad 15cbbd1d317e07b4e5c6aca5d4c5579539a82784 # bad: [b3edde44e5d4504c23a176819865cd603fd16d6c] cpufreq/schedutil: Use a fixed reference frequency git bisect bad b3edde44e5d4504c23a176819865cd603fd16d6c # first bad commit: [b3edde44e5d4504c23a176819865cd603fd16d6c] cpufreq/schedutil: Use a fixed reference frequency
Hi Mark, Could you tried this fix: https://lore.kernel.org/lkml/20240117190545.596057-1-vincent.guittot@linaro.org/ Regards, Vincent On Tue, 23 Jan 2024 at 02:55, Mark Brown <broonie@kernel.org> wrote: > > On Mon, Dec 11, 2023 at 11:48:51AM +0100, Vincent Guittot wrote: > > cpuinfo.max_freq can change at runtime because of boost as an example. This > > implies that the value could be different than the one that has been > > used when computing the capacity of a CPU. > > So, this seems very weird but I just finished a bisection of a failure > that showed up during the merge window with the DT kselftests on > meson-gxl-s905x-libretech-cc (Libretech Le Potato) running an arm64 > defconfig. Looking at the commit this seems like a very surprising > false result but I verified that the commit before this one runs fine > and reproduced the failure, the bisect does seem to converge well also > so it appears like there's something going on. Does this ring any bells > for anyone? > > The test is timed out by the kselftest wrapper: > > kselftest: Running tests in dt > TAP version 13 > 1..1 > # timeout set to 45 > # selftests: dt: test_unprobed_devices.sh > # TAP version 13 > # > not ok 1 selftests: dt: test_unprobed_devices.sh # TIMEOUT 45 seconds > > with no obvious effect on the system, a successful run takes less than > 20 seconds so it shouldn't be that the timing is marginal. I'm guessing > that reading the one of the files under /sys/devices hung. No other > boards in my lab are affected and I'm not immediately seeing anything > remarkable about this board. Full log from a bad run at: > > https://lava.sirena.org.uk/scheduler/job/493329 > > and a good run: > > https://lava.sirena.org.uk/scheduler/job/492453 > > Bisect log (the skipped commits didn't build): > > git bisect start > # status: waiting for both good and bad commits > # bad: [4fbbed7872677b0a28ba8237169968171a61efbd] Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > git bisect bad 4fbbed7872677b0a28ba8237169968171a61efbd > # status: waiting for good commit(s), bad commit known > # good: [0dd3ee31125508cd67f7e7172247f05b7fd1753a] Linux 6.7 > git bisect good 0dd3ee31125508cd67f7e7172247f05b7fd1753a > # bad: [ba5afb9a84df2e6b26a1b6389b98849cd16ea757] fs: rework listmount() implementation > git bisect bad ba5afb9a84df2e6b26a1b6389b98849cd16ea757 > # bad: [de927f6c0b07d9e698416c5b287c521b07694cac] Merge tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux > git bisect bad de927f6c0b07d9e698416c5b287c521b07694cac > # bad: [35f11a3710cdcbbc5090d14017a6295454e0cc73] Merge tag 'mtd/for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux > git bisect bad 35f11a3710cdcbbc5090d14017a6295454e0cc73 > # bad: [d30e51aa7b1f6fa7dd78d4598d1e4c047fcc3fb9] Merge tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab > git bisect bad d30e51aa7b1f6fa7dd78d4598d1e4c047fcc3fb9 > # skip: [968b80332432172dbbb773e749a43bdc846d1a13] Merge tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux > git bisect skip 968b80332432172dbbb773e749a43bdc846d1a13 > # good: [372a34e66fb7f95124fadae9c600b231c35696a7] fs: replace f_rcuhead with f_task_work > git bisect good 372a34e66fb7f95124fadae9c600b231c35696a7 > # good: [ae24db43b3b427eb290b58d55179c32f0a7539d1] powerpc/ftrace: Remove nops after the call to ftrace_stub > git bisect good ae24db43b3b427eb290b58d55179c32f0a7539d1 > # skip: [3cf1d6a5fbf3f724d12b01635319924239d42c00] Merge tag 'm68k-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k > git bisect skip 3cf1d6a5fbf3f724d12b01635319924239d42c00 > # good: [2a33e2ddc6ebf9b5468091aded8a38f57de9a580] splice: remove permission hook from do_splice_direct() > git bisect good 2a33e2ddc6ebf9b5468091aded8a38f57de9a580 > # good: [9b7e9e2f5d5c3d079ec46bc71b114012e362ea6e] fs: factor out backing_file_splice_{read,write}() helpers > git bisect good 9b7e9e2f5d5c3d079ec46bc71b114012e362ea6e > # good: [12c1b632d970c0138b4c5c65a1065e7d0604d272] fs: reformat idmapped mounts entry > git bisect good 12c1b632d970c0138b4c5c65a1065e7d0604d272 > # good: [d23627a7688fabff0096a7beaff1a93de76afaad] EDAC/igen6: Add Intel Raptor Lake-P SoCs support > git bisect good d23627a7688fabff0096a7beaff1a93de76afaad > # skip: [ab5f3fcb7c72094684760e0cd8954d8d570b5e83] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux > git bisect skip ab5f3fcb7c72094684760e0cd8954d8d570b5e83 > # good: [eb183b2cd0a6549992eca3c4ada0b1bc1d9340f5] Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" > git bisect good eb183b2cd0a6549992eca3c4ada0b1bc1d9340f5 > # good: [1ab33c03145d0f6c345823fc2da935d9a1a9e9fc] asm-generic: make sparse happy with odd-sized put_unaligned_*() > git bisect good 1ab33c03145d0f6c345823fc2da935d9a1a9e9fc > # good: [794c68b20408bb6899f90314e36e256924cc85a1] x86/CPU/AMD: Get rid of amd_erratum_1485[] > git bisect good 794c68b20408bb6899f90314e36e256924cc85a1 > # bad: [11137d384996bb05cf33c8163db271e1bac3f4bf] sched/fair: Simplify util_est > git bisect bad 11137d384996bb05cf33c8163db271e1bac3f4bf > # good: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d] sched/cpufreq: Rework schedutil governor performance estimation > git bisect good 9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d > # good: [599457ba15403037b489fe536266a3d5f9efaed7] cpufreq: Use the fixed and coherent frequency for scaling capacity > git bisect good 599457ba15403037b489fe536266a3d5f9efaed7 > # bad: [50b813b147e9eb6546a1fc49d4e703e6d23691f2] cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}() > git bisect bad 50b813b147e9eb6546a1fc49d4e703e6d23691f2 > # bad: [15cbbd1d317e07b4e5c6aca5d4c5579539a82784] energy_model: Use a fixed reference frequency > git bisect bad 15cbbd1d317e07b4e5c6aca5d4c5579539a82784 > # bad: [b3edde44e5d4504c23a176819865cd603fd16d6c] cpufreq/schedutil: Use a fixed reference frequency > git bisect bad b3edde44e5d4504c23a176819865cd603fd16d6c > # first bad commit: [b3edde44e5d4504c23a176819865cd603fd16d6c] cpufreq/schedutil: Use a fixed reference frequency
On Tue, Jan 23, 2024 at 08:24:00AM +0100, Vincent Guittot wrote: > Could you tried this fix: > https://lore.kernel.org/lkml/20240117190545.596057-1-vincent.guittot@linaro.org/ That seems to fix the issue, thanks.
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 4ee8ad70be99..95c3c097083e 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -114,6 +114,28 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy) } } +/** + * get_capacity_ref_freq - get the reference frequency that has been used to + * correlate frequency and compute capacity for a given cpufreq policy. We use + * the CPU managing it for the arch_scale_freq_ref() call in the function. + * @policy: the cpufreq policy of the CPU in question. + * + * Return: the reference CPU frequency to compute a capacity. + */ +static __always_inline +unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy) +{ + unsigned int freq = arch_scale_freq_ref(policy->cpu); + + if (freq) + return freq; + + if (arch_scale_freq_invariant()) + return policy->cpuinfo.max_freq; + + return policy->cur; +} + /** * get_next_freq - Compute a new frequency for a given cpufreq policy. * @sg_policy: schedutil policy object to compute the new frequency for. @@ -140,9 +162,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, unsigned long util, unsigned long max) { struct cpufreq_policy *policy = sg_policy->policy; - unsigned int freq = arch_scale_freq_invariant() ? - policy->cpuinfo.max_freq : policy->cur; + unsigned int freq; + freq = get_capacity_ref_freq(policy); freq = map_util_freq(util, freq, max); if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)