Message ID | 20240806174647.1050398-1-longman@redhat.com |
---|---|
State | Accepted |
Commit | 6d45e1c948a8b7ed6ceddb14319af69424db730c |
Series | padata: Fix possible divide-by-0 panic in padata_mt_helper() |
Waiman Long <longman@redhat.com> wrote:
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 53f4bc912712..0fa6c2895460 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>
> +	/*
> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
> +	 */
> +	if (!ps.chunk_size)
> +		ps.chunk_size = 1U;

Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
instead?

Thanks,
On Sat, Aug 10, 2024 at 09:30:25PM -0400, Waiman Long wrote:
>
> On 8/10/24 00:05, Herbert Xu wrote:
> > Waiman Long <longman@redhat.com> wrote:
> > > diff --git a/kernel/padata.c b/kernel/padata.c
> > > index 53f4bc912712..0fa6c2895460 100644
> > > --- a/kernel/padata.c
> > > +++ b/kernel/padata.c
> > > @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
> > >  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
> > >  	ps.chunk_size = roundup(ps.chunk_size, job->align);
> > >
> > > +	/*
> > > +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> > > +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
> > > +	 */
> > > +	if (!ps.chunk_size)
> > > +		ps.chunk_size = 1U;
> >
> > Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
> > instead?
>
> I think DIV_ROUND_UP() will have exactly the same problem: if chunk_size is
> 0, you still get a 0 result. round_up() works only if the 2nd argument is a
> power of 2, while with DIV_ROUND_UP() the second argument can be any number
> except 0.

Unless I'm missing something, chunk_size cannot be zero before the
division because that's the first thing we check upon entry into this
function.

Cheers,
On Sat, Aug 10, 2024 at 11:11:07PM -0400, Waiman Long wrote:
>
> > Unless I'm missing something, chunk_size cannot be zero before the
> > division because that's the first thing we check upon entry into
> > this function.
>
> chunk_size is initialized as
>
> 	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
>
> chunk_size will be 0 if job->size < (ps.nworks * load_balance_factor). If
> min_chunk is 0, chunk_size will remain 0.

That's why I was suggesting that you replace the division by
DIV_ROUND_UP. That should ensure that ps.chunk_size is not zero.

Cheers,
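To make the arithmetic in this exchange concrete, here is a minimal userspace sketch (the values are illustrative, not from any real caller; DIV_ROUND_UP is written out the way the kernel's math.h defines it) showing how the truncating division produces a 0 chunk_size while the rounded-up division cannot, as long as job->size is nonzero:

```c
#include <stdio.h>

/* Written out the same way as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	/* Illustrative values: a tiny job spread over many workers. */
	unsigned long size = 3;                 /* job->size */
	unsigned long nworks = 8;               /* ps.nworks */
	unsigned long load_balance_factor = 4;

	unsigned long truncated = size / (nworks * load_balance_factor);
	unsigned long rounded = DIV_ROUND_UP(size, nworks * load_balance_factor);

	/* Truncating division gives 0; the rounded-up form gives 1, so a
	 * later division by chunk_size cannot fault. */
	printf("truncated=%lu rounded=%lu\n", truncated, rounded);
	return 0;
}
```

With size = 3 spread over 32 units of parallelism, the plain division truncates to 0 while DIV_ROUND_UP returns 1, which is exactly why it keeps coming up as an alternative fix.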
On 8/10/24 13:44, Kamlesh Gurudasani wrote:
> Waiman Long <longman@redhat.com> writes:
>
>> We are hit with a not easily reproducible divide-by-0 panic in padata.c
>> at bootup time.
>>
>> [ 10.017908] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
>> [ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
>> [ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
>> [ 10.017908] Workqueue: events_unbound padata_mt_helper
>> [ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
>>   :
>> [ 10.017963] Call Trace:
>> [ 10.017968]  <TASK>
>> [ 10.018004]  ? padata_mt_helper+0x39/0xb0
>> [ 10.018084]  process_one_work+0x174/0x330
>> [ 10.018093]  worker_thread+0x266/0x3a0
>> [ 10.018111]  kthread+0xcf/0x100
>> [ 10.018124]  ret_from_fork+0x31/0x50
>> [ 10.018138]  ret_from_fork_asm+0x1a/0x30
>> [ 10.018147]  </TASK>
>>
>> Looking at the padata_mt_helper() function, the only way a divide-by-0
>> panic can happen is when ps->chunk_size is 0. The way that chunk_size is
>> initialized in padata_do_multithreaded(), chunk_size can be 0 when the
>> min_chunk in the passed-in padata_mt_job structure is 0.
>>
>> Fix this divide-by-0 panic by making sure that chunk_size will be at
>> least 1 no matter what the input parameters are.
>>
>> Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  kernel/padata.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/padata.c b/kernel/padata.c
>> index 53f4bc912712..0fa6c2895460 100644
>> --- a/kernel/padata.c
>> +++ b/kernel/padata.c
>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> +	/*
>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>> +	 */
> Thanks for the patch and detailed comment.
>> +	if (!ps.chunk_size)
>> +		ps.chunk_size = 1U;
>> +
> could it be
> 	ps.chunk_size = max(ps.chunk_size, 1U);
>
> or can be merged with earlier max()
> 	ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>
> which sits well with how the entire file is written, and the compiler
> optimizes them to the same level.

I had actually thought about doing that as an alternative. I used the
current patch to avoid putting too many max() calls there. I can go this
route if you guys prefer this.

Cheers,
Longman
Waiman Long <longman@redhat.com> writes:

> On 8/10/24 13:44, Kamlesh Gurudasani wrote:
>> Waiman Long <longman@redhat.com> writes:
>>
>> ...
>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>> index 53f4bc912712..0fa6c2895460 100644
>>> --- a/kernel/padata.c
>>> +++ b/kernel/padata.c
>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>
>>> +	/*
>>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>>> +	 */
>> Thanks for the patch and detailed comment.
>>> +	if (!ps.chunk_size)
>>> +		ps.chunk_size = 1U;
>>> +
>> could it be
>> 	ps.chunk_size = max(ps.chunk_size, 1U);
>>
>> or can be merged with earlier max()
>> 	ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> which sits well with how the entire file is written, and the compiler
>> optimizes them to the same level.
>
> I had actually thought about doing that as an alternative. I used the
> current patch to avoid putting too many max() calls there. I can go this
> route if you guys prefer this.

Just curious, what is your reason for avoiding too many max() calls? Both

	if (!ps.chunk_size)
		ps.chunk_size = 1U;

and

	ps.chunk_size = max(ps.chunk_size, 1U);

have the same number of instructions [1].

[1] https://godbolt.org/z/ajrK59c67

We can avoid nested max(), though; the following would make it easier to
understand.

	ps.chunk_size = max(ps.chunk_size, 1U);

Cheers,
Kamlesh

>
> Cheers,
> Longman
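As an aside for readers comparing the two spellings, here is a standalone sketch of both guards (max() is simplified here from the kernel's type-checked macro); per the godbolt link above they compile to equivalent code, so the choice between them is purely stylistic:

```c
#include <stdio.h>

/* Simplified from the kernel's type-checked max() macro. */
#define max(a, b) ((a) > (b) ? (a) : (b))

/* Spelling 1: explicit zero test, as in the merged patch. */
static unsigned int clamp_with_if(unsigned int chunk)
{
	if (!chunk)
		chunk = 1U;
	return chunk;
}

/* Spelling 2: max()-based clamp, as Kamlesh suggests. */
static unsigned int clamp_with_max(unsigned int chunk)
{
	return max(chunk, 1U);
}

int main(void)
{
	printf("%u %u\n", clamp_with_if(0), clamp_with_max(0)); /* 1 1 */
	return 0;
}
```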
On 8/11/24 01:44, Kamlesh Gurudasani wrote:
> Waiman Long <longman@redhat.com> writes:
>
>> On 8/10/24 13:44, Kamlesh Gurudasani wrote:
>>> Waiman Long <longman@redhat.com> writes:
>>>
> ...
>>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>>> index 53f4bc912712..0fa6c2895460 100644
>>>> --- a/kernel/padata.c
>>>> +++ b/kernel/padata.c
>>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>>>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>>>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>>
>>>> +	/*
>>>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>>>> +	 */
>>> Thanks for the patch and detailed comment.
>>>> +	if (!ps.chunk_size)
>>>> +		ps.chunk_size = 1U;
>>>> +
>>> could it be
>>> 	ps.chunk_size = max(ps.chunk_size, 1U);
>>>
>>> or can be merged with earlier max()
>>> 	ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
>>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>
>>> which sits well with how the entire file is written, and the compiler
>>> optimizes them to the same level.
>> I had actually thought about doing that as an alternative. I used the
>> current patch to avoid putting too many max() calls there. I can go this
>> route if you guys prefer this.
> Just curious, what is your reason for avoiding too many max() calls? Both
>
> 	if (!ps.chunk_size)
> 		ps.chunk_size = 1U;
> and
> 	ps.chunk_size = max(ps.chunk_size, 1U);
>
> have the same number of instructions [1].
>
> [1] https://godbolt.org/z/ajrK59c67
>
> We can avoid nested max(), though; the following would make it easier to
> understand.
>
> 	ps.chunk_size = max(ps.chunk_size, 1U);

That will certainly work. My current patch has been merged into the
Linus tree. You are welcome to post another patch to clean it up if you
want.

Cheers,
Longman

> Cheers,
> Kamlesh
>
>> Cheers,
>> Longman
On Mon, Aug 12, 2024 at 10:04:07AM -0400, Waiman Long wrote:
>
> Anyway, using DIV_ROUND_UP() is a slight change in behavior as chunk_size
> will be increased by 1 in most cases. I am a bit hesitant to make this
> change without looking into more detail about the rationale behind the
> current code.

I don't think it matters much. Just look at the two lines after the
division: they're both rounding the value up. So clearly this is
expected to handle the case where work gets bunched up into the first N
CPUs, potentially leaving some CPUs unused.

But Daniel wrote the code, so he can have the last say on whether we
should round up after the division or after the other two ops. Daniel?

Cheers,
On Sat, Aug 17, 2024 at 03:12:37PM GMT, Herbert Xu wrote:
> On Mon, Aug 12, 2024 at 10:04:07AM -0400, Waiman Long wrote:
> >
> > Anyway, using DIV_ROUND_UP() is a slight change in behavior as chunk_size
> > will be increased by 1 in most cases. I am a bit hesitant to make this
> > change without looking into more detail about the rationale behind the
> > current code.
>
> I don't think it matters much. Just look at the two lines after the
> division: they're both rounding the value up. So clearly this is
> expected to handle the case where work gets bunched up into the
> first N CPUs, potentially leaving some CPUs unused.

Yeah, the caller is supposed to use min_chunk as a hint for what a
reasonable amount of work is per thread and so avoid wasteful amounts of
threads.

> But Daniel wrote the code, so he can have the last say on whether
> we should round up after the division or after the other two ops.

I think either way works fine with the three existing users and how they
choose job->min_chunk and job->size.

The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
oddball cases where rounding up is undesirable (say, near-zero values
for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
works than requested; and a single unit of work is very expensive) so
that rounding up makes a bigger difference. So, the way it is now seems
ok.

By the way, this bug must've happened coming from
hugetlb_pages_alloc_boot(), right, Waiman? Because the other padata
users have hardcoded min_chunk. I guess it was a case of

	h->max_huge_pages < num_node_state(N_MEMORY) * 2
On 8/19/24 18:29, Daniel Jordan wrote:
> On Sat, Aug 17, 2024 at 03:12:37PM GMT, Herbert Xu wrote:
>> On Mon, Aug 12, 2024 at 10:04:07AM -0400, Waiman Long wrote:
>>> Anyway, using DIV_ROUND_UP() is a slight change in behavior as chunk_size
>>> will be increased by 1 in most cases. I am a bit hesitant to make this
>>> change without looking into more detail about the rationale behind the
>>> current code.
>> I don't think it matters much. Just look at the two lines after the
>> division: they're both rounding the value up. So clearly this is
>> expected to handle the case where work gets bunched up into the
>> first N CPUs, potentially leaving some CPUs unused.
> Yeah, the caller is supposed to use min_chunk as a hint for what a
> reasonable amount of work is per thread and so avoid wasteful amounts of
> threads.
>
>> But Daniel wrote the code, so he can have the last say on whether
>> we should round up after the division or after the other two ops.
> I think either way works fine with the three existing users and how they
> choose job->min_chunk and job->size.
>
> The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> oddball cases where rounding up is undesirable (say, near-zero values
> for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> works than requested; and a single unit of work is very expensive) so
> that rounding up makes a bigger difference. So, the way it is now seems
> ok.
>
> By the way, this bug must've happened coming from
> hugetlb_pages_alloc_boot(), right, Waiman? Because the other padata
> users have hardcoded min_chunk. I guess it was a case of
>
> 	h->max_huge_pages < num_node_state(N_MEMORY) * 2

Yes, I guess the hugetlbfs caller is the cause of this div-by-0 problem.
This is likely a bug that needs to be fixed. The current patch does
guarantee that padata won't crash like that even with a rogue caller.

Cheers,
Longman
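To make the suspected trigger concrete, below is a hypothetical reconstruction of how a hugetlb boot request can hand padata a zero min_chunk. The real expression lives in hugetlb_pages_alloc_boot(); the names, values, and the exact division here are illustrative, derived only from the inequality Daniel quotes:

```c
#include <stdio.h>

int main(void)
{
	/* Hypothetical values: e.g. hugepages=3 on a two-node machine. */
	unsigned long max_huge_pages = 3;   /* h->max_huge_pages */
	unsigned long memory_nodes = 2;     /* num_node_state(N_MEMORY) */

	/* Illustrative per-node split; it truncates to 0 whenever
	 * max_huge_pages < memory_nodes * 2, matching the inequality
	 * quoted above. */
	unsigned long min_chunk = max_huge_pages / memory_nodes / 2;

	/* A 0 min_chunk then fails to rescue a 0 chunk_size in
	 * padata_do_multithreaded(), leading to the divide-by-0. */
	printf("min_chunk=%lu\n", min_chunk);   /* 0 */
	return 0;
}
```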
On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
>
> The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> oddball cases where rounding up is undesirable (say, near-zero values
> for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> works than requested; and a single unit of work is very expensive) so
> that rounding up makes a bigger difference. So, the way it is now seems
> ok.

In that case let's do the max ahead of the align check:

	ps.chunk_size = max(ps.chunk_size, 1ul);
	ps.chunk_size = roundup(ps.chunk_size, job->align);

If we do it after then it may come out unaligned (e.g., job->align = 8
and ps.chunk_size = 1).

Cheers,
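Herbert's ordering argument can be checked in a few lines of standalone C. The macros below are simplified from the kernel's (note that the kernel's roundup(), unlike round_up(), accepts any non-zero multiple), and the values are illustrative:

```c
#include <stdio.h>

/* Simplified from the kernel macros; roundup() works for any
 * non-zero multiple, as the kernel version does. */
#define max(a, b)      ((a) > (b) ? (a) : (b))
#define roundup(x, y)  ((((x) + (y) - 1) / (y)) * (y))

int main(void)
{
	unsigned long chunk = 0, align = 8;

	/* Clamp first, then align: roundup(1, 8) == 8, still aligned. */
	unsigned long clamp_then_align = roundup(max(chunk, 1ul), align);

	/* Align first, then clamp: roundup(0, 8) == 0, and the clamp
	 * produces an unaligned 1 -- the case Herbert flags. */
	unsigned long align_then_clamp = max(roundup(chunk, align), 1ul);

	printf("%lu vs %lu\n", clamp_then_align, align_then_clamp); /* 8 vs 1 */
	return 0;
}
```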
On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
> >
> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> > oddball cases where rounding up is undesirable (say, near-zero values
> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> > works than requested; and a single unit of work is very expensive) so
> > that rounding up makes a bigger difference. So, the way it is now seems
> > ok.
>
> In that case let's do the max ahead of the align check:
>
> 	ps.chunk_size = max(ps.chunk_size, 1ul);
> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>
> If we do it after then it may come out unaligned (e.g., job->align = 8
> and ps.chunk_size = 1).

Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
would like to make the change. I'll send a patch otherwise.
Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
>> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
>> >
>> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
>> > oddball cases where rounding up is undesirable (say, near-zero values
>> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
>> > works than requested; and a single unit of work is very expensive) so
>> > that rounding up makes a bigger difference. So, the way it is now seems
>> > ok.
>>
>> In that case let's do the max ahead of the align check:
>>
>> 	ps.chunk_size = max(ps.chunk_size, 1ul);
>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> If we do it after then it may come out unaligned (e.g., job->align = 8
>> and ps.chunk_size = 1).
>
> Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
> would like to make the change. I'll send a patch otherwise.

Thanks for consideration, Daniel. I'll send a patch.

cheers,
Kamlesh
Kamlesh Gurudasani <kamlesh@ti.com> writes:

> Daniel Jordan <daniel.m.jordan@oracle.com> writes:
>
>> On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
>>> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
>>> >
>>> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
>>> > oddball cases where rounding up is undesirable (say, near-zero values
>>> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
>>> > works than requested; and a single unit of work is very expensive) so
>>> > that rounding up makes a bigger difference. So, the way it is now seems
>>> > ok.
>>>
>>> In that case let's do the max ahead of the align check:
>>>
>>> 	ps.chunk_size = max(ps.chunk_size, 1ul);
>>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>
>>> If we do it after then it may come out unaligned (e.g., job->align = 8
>>> and ps.chunk_size = 1).
>>
>> Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
>> would like to make the change. I'll send a patch otherwise.
> Thanks for consideration, Daniel. I'll send a patch.

Sent.

Just curious about one thing on line 495:

	nworks = max(job->size / max(job->min_chunk, job->align), 1ul);

What happens if both min_chunk and align are 0?

cheers,
Kamlesh
On Thu, Aug 22, 2024 at 02:40:57AM GMT, Kamlesh Gurudasani wrote:
> Kamlesh Gurudasani <kamlesh@ti.com> writes:
>
> > Daniel Jordan <daniel.m.jordan@oracle.com> writes:
> >
> >> On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
> >>> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
> >>> >
> >>> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> >>> > oddball cases where rounding up is undesirable (say, near-zero values
> >>> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> >>> > works than requested; and a single unit of work is very expensive) so
> >>> > that rounding up makes a bigger difference. So, the way it is now seems
> >>> > ok.
> >>>
> >>> In that case let's do the max ahead of the align check:
> >>>
> >>> 	ps.chunk_size = max(ps.chunk_size, 1ul);
> >>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
> >>>
> >>> If we do it after then it may come out unaligned (e.g., job->align = 8
> >>> and ps.chunk_size = 1).
> >>
> >> Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
> >> would like to make the change. I'll send a patch otherwise.
> > Thanks for consideration, Daniel. I'll send a patch.
> Sent.
>
> Just curious about one thing on line 495:
>
> 	nworks = max(job->size / max(job->min_chunk, job->align), 1ul);
>
> What happens if both min_chunk and align are 0?

That's a fair point. It's another of those things that's not supposed to
happen, but it's worth making padata robust to it.
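For completeness, here is a small sketch of the failure mode Kamlesh is asking about, with one possible hardening at the end. The guard is hypothetical, not a merged patch, and the macros are simplified from the kernel's:

```c
#include <stdio.h>

#define max(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
	unsigned long size = 100;       /* job->size */
	unsigned long min_chunk = 0;    /* both hints left at 0... */
	unsigned long align = 0;

	unsigned long divisor = max(min_chunk, align);  /* == 0 */

	/* nworks = max(size / divisor, 1ul) would fault right here --
	 * the same class of divide-by-0 the merged patch fixes for
	 * chunk_size.  One possible (hypothetical) hardening: */
	if (!divisor)
		divisor = 1;

	unsigned long nworks = max(size / divisor, 1ul);
	printf("nworks=%lu\n", nworks);
	return 0;
}
```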
diff --git a/kernel/padata.c b/kernel/padata.c
index 53f4bc912712..0fa6c2895460 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
 	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
 	ps.chunk_size = roundup(ps.chunk_size, job->align);
 
+	/*
+	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
+	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
+	 */
+	if (!ps.chunk_size)
+		ps.chunk_size = 1U;
+
 	list_for_each_entry(pw, &works, pw_list)
 		if (job->numa_aware) {
 			int old_node = atomic_read(&last_used_nid);
We are hit with a not easily reproducible divide-by-0 panic in padata.c
at bootup time.

[ 10.017908] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
[ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
[ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
[ 10.017908] Workqueue: events_unbound padata_mt_helper
[ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
  :
[ 10.017963] Call Trace:
[ 10.017968]  <TASK>
[ 10.018004]  ? padata_mt_helper+0x39/0xb0
[ 10.018084]  process_one_work+0x174/0x330
[ 10.018093]  worker_thread+0x266/0x3a0
[ 10.018111]  kthread+0xcf/0x100
[ 10.018124]  ret_from_fork+0x31/0x50
[ 10.018138]  ret_from_fork_asm+0x1a/0x30
[ 10.018147]  </TASK>

Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.

Fix this divide-by-0 panic by making sure that chunk_size will be at
least 1 no matter what the input parameters are.

Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/padata.c | 7 +++++++
 1 file changed, 7 insertions(+)