
[V3] firmware: google: Test spinlock on panic path to avoid lockups

Message ID 20220819155059.451674-1-gpiccoli@igalia.com
State New
Series [V3] firmware: google: Test spinlock on panic path to avoid lockups

Commit Message

Guilherme G. Piccoli Aug. 19, 2022, 3:50 p.m. UTC
Currently the gsmi driver registers a panic notifier as well as
reboot and die notifiers. The registered callbacks are called in
an atomic and very limited context - for instance, panic disables
preemption and local IRQs, and all secondary CPUs (those not
executing the panic path) are shut down.

Given that, taking a spinlock in this scenario is a dangerous
invitation to a lockup. So, fix that by checking whether the
spinlock is free to acquire in the panic notifier callback - if
not, bail out and avoid a potential hang.

Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Julius Werner <jwerner@chromium.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---


This is a re-submission of the patch - it was part of a series [0], but
Greg suggested that I resubmit it individually so it gets picked up by
the relevant maintainers, instead of asking them to merge individual
patches from a series. Notice I've trimmed the CC list a bit; it was
bigger due to the patch being part of a series...

This is truly the V3 of the patch; below are the differences between versions:

V3:
- added Evan's review tag - thanks!

V2:
- do not use spin_trylock anymore, to avoid messing with
non-panic paths; now we just check the spinlock state in
the panic notifier before taking it. Thanks Evan for the review!

[0] https://lore.kernel.org/lkml/20220719195325.402745-4-gpiccoli@igalia.com/


 drivers/firmware/google/gsmi.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Greg KH Sept. 1, 2022, 3:52 p.m. UTC | #1
On Fri, Aug 19, 2022 at 12:50:59PM -0300, Guilherme G. Piccoli wrote:
> Currently the gsmi driver registers a panic notifier as well as
> reboot and die notifiers. The callbacks registered are called in
> atomic and very limited context - for instance, panic disables
> preemption and local IRQs, also all secondary CPUs (not executing
> the panic path) are shutdown.
> 
> With that said, taking a spinlock in this scenario is a dangerous
> invitation for lockup scenarios. So, fix that by checking if the
> spinlock is free to acquire in the panic notifier callback - if not,
> bail-out and avoid a potential hang.
> 
> Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: David Gow <davidgow@google.com>
> Cc: Julius Werner <jwerner@chromium.org>
> Reviewed-by: Evan Green <evgreen@chromium.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> 
> This is a re-submission of the patch - it was in a series [0], but
> Greg suggested me to resubmit individually in order it gets picked
> by the relevant maintainers, instead of asking them to merge
> individual patches from a series. Notice I've trimmed a bit the CC
> list, it was bigger due to the patch being in a series...
> 
> This is truly the V3 of the patch, below is the diff between versions:
> 
> V3:
> - added Evan's review tag - thanks!
> 
> V2:
> - do not use spin_trylock anymore, to avoid messing with
> non-panic paths; now we just check the spinlock state in
> the panic notifier before taking it. Thanks Evan for the review!
> 
> [0] https://lore.kernel.org/lkml/20220719195325.402745-4-gpiccoli@igalia.com/
> 
> 
>  drivers/firmware/google/gsmi.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
> index adaa492c3d2d..3ef5f3c0b4e4 100644
> --- a/drivers/firmware/google/gsmi.c
> +++ b/drivers/firmware/google/gsmi.c
> @@ -681,6 +681,14 @@ static struct notifier_block gsmi_die_notifier = {
>  static int gsmi_panic_callback(struct notifier_block *nb,
>  			       unsigned long reason, void *arg)
>  {
> +	/*
> +	 * Perform the lock check before effectively trying
> +	 * to acquire it on gsmi_shutdown_reason() to avoid
> +	 * potential lockups in atomic context.
> +	 */
> +	if (spin_is_locked(&gsmi_dev.lock))
> +		return NOTIFY_DONE;
> +

What happens if the lock is grabbed right after testing for it?
Shouldn't you use lockdep_assert_held() instead as the documentation
says to?


>  	gsmi_shutdown_reason(GSMI_SHUTDOWN_PANIC);

You are grabbing the lock further down in this call; again, you have a
window where the check above would not have worked :(

I don't think this is fixing anything properly, sorry.

greg k-h
Guilherme G. Piccoli Sept. 1, 2022, 3:59 p.m. UTC | #2
+ Petr, since this was extensively discussed in the original thread [0]
and maybe he can help with the argument.

[0]
https://lore.kernel.org/lkml/20220427224924.592546-1-gpiccoli@igalia.com/


On 01/09/2022 12:52, Greg KH wrote:
> [...]

>> +	 * Perform the lock check before effectively trying
>> +	 * to acquire it on gsmi_shutdown_reason() to avoid
>> +	 * potential lockups in atomic context.
>> +	 */
>> +	if (spin_is_locked(&gsmi_dev.lock))
>> +		return NOTIFY_DONE;
>> +
> 
> What happens if the lock is grabbed right after testing for it?
> Shouldn't you use lockdep_assert_held() instead as the documentation
> says to?

How, if at this point only a single CPU (this one, executing the code)
is running?

Remember this is the panic path - before this point we have disabled all
other CPUs except the one executing this code. So, either the lock was
taken (and we bail), or it wasn't and it's safe to continue.
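
For reference, a rough sketch of the order of events in panic()
(paraphrased from kernel/panic.c from memory; the exact helpers and
ordering vary by kernel version and config):

	panic()
	  local_irq_disable();                      /* IRQs off on the panicking CPU */
	  ...
	  smp_send_stop() / crash_smp_send_stop();  /* halt every other CPU */
	  ...
	  atomic_notifier_call_chain(&panic_notifier_list, ...);
	                                            /* gsmi_panic_callback() runs here,
	                                               on the one surviving CPU */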
Greg KH Sept. 1, 2022, 4:04 p.m. UTC | #3
On Thu, Sep 01, 2022 at 12:59:36PM -0300, Guilherme G. Piccoli wrote:
> + Petr, since this was extensively discussed in the original thread [0]
> and maybe he can help with the argument.
> 
> [0]
> https://lore.kernel.org/lkml/20220427224924.592546-1-gpiccoli@igalia.com/
> 
> 
> On 01/09/2022 12:52, Greg KH wrote:
> > [...]
> 
> >> +	 * Perform the lock check before effectively trying
> >> +	 * to acquire it on gsmi_shutdown_reason() to avoid
> >> +	 * potential lockups in atomic context.
> >> +	 */
> >> +	if (spin_is_locked(&gsmi_dev.lock))
> >> +		return NOTIFY_DONE;
> >> +
> > 
> > What happens if the lock is grabbed right after testing for it?
> > Shouldn't you use lockdep_assert_held() instead as the documentation
> > says to?
> 
> How, if in this point only a single CPU (this one, executing the code)
> is running?

How are we supposed to know this here?

> Remember this is the panic path - before this point we disabled all
> other CPUs, except this one executing the code. So, either the lock was
> taken (and we bail), or it wasn't and it's safe to continue.

Then who else could have taken the lock?  And if all other CPUs are
stopped, who cares about the lock at all?  Just don't grab it (you
should check for that when you want to grab it) and then you can work
properly at that point in time.

thanks,

greg k-h
Guilherme G. Piccoli Sept. 1, 2022, 4:24 p.m. UTC | #4
On 01/09/2022 13:04, Greg KH wrote:
> [...]
>>> What happens if the lock is grabbed right after testing for it?
>>> Shouldn't you use lockdep_assert_held() instead as the documentation
>>> says to?
>>
>> How, if in this point only a single CPU (this one, executing the code)
>> is running?
> 
> How are we supposed to know this here?
> 

Reading the code?
Or do you mean this should be mentioned in the commit description?
I can do that, if you prefer.


>> other CPUs, except this one executing the code. So, either the lock was
>> taken (and we bail), or it wasn't and it's safe to continue.
> 
> Then who else could have taken the lock?  And if all other CPUs are
> stopped, who cares about the lock at all?  Just don't grab it (you
> should check for that when you want to grab it) and then you can work
> properly at that point in time.
> 

I don't think it is so simple - we are in the panic path.
So, imagine the lock was taken on CPU0, where GSMI is doing some
operation. During that operation, CPU1 panics!

When that happens, panic() executes on CPU1, disabling CPU0 through
"strong" mechanisms (NMI). So, CPU0 had the lock, it is now off, and
when CPU1 goes through the panic notifiers, it'll eventually wait
forever for this lock in the GSMI handler, unless we have this patch,
which prevents the handler from running in that case.
Makes sense?
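
To put that in a picture (illustrative only - the CPU numbers are
arbitrary):

	CPU0                                    CPU1
	----                                    ----
	gsmi_shutdown_reason(...)
	  spin_lock_irqsave(&gsmi_dev.lock)
	  ... SMI work in progress ...          panic()
	                                          stops CPU0 via NMI
	                                          (the lock is never released)
	                                          panic notifiers run:
	                                            gsmi_panic_callback()
	                                              gsmi_shutdown_reason()
	                                                spin_lock_irqsave() <- spins forever

The spin_is_locked() check in the patch bails out before that last step.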
Greg KH Sept. 1, 2022, 4:44 p.m. UTC | #5
On Thu, Sep 01, 2022 at 01:24:46PM -0300, Guilherme G. Piccoli wrote:
> On 01/09/2022 13:04, Greg KH wrote:
> > [...]
> >>> What happens if the lock is grabbed right after testing for it?
> >>> Shouldn't you use lockdep_assert_held() instead as the documentation
> >>> says to?
> >>
> >> How, if in this point only a single CPU (this one, executing the code)
> >> is running?
> > 
> > How are we supposed to know this here?
> > 
> 
> Reading the code?
> Or you mean, in the commit description this should be mentioned?

Yes, and in the comment as this type of call is very rare and should
almost never be used.

> >> other CPUs, except this one executing the code. So, either the lock was
> >> taken (and we bail), or it wasn't and it's safe to continue.
> > 
> > Then who else could have taken the lock?  And if all other CPUs are
> > stopped, who cares about the lock at all?  Just don't grab it (you
> > should check for that when you want to grab it) and then you can work
> > properly at that point in time.
> > 
> 
> I don't think it is so simple - we are in the panic path.

Great, then the lock doesn't matter :)

> So, imagine the lock was taken in CPU0, where GSMI is doing some
> operation. During that operation, CPU1 panics!
> 
> When that happens, panic() executes in CPU1, disabling CPU0 through
> "strong" mechanisms (NMI). So, CPU0 had the lock, it is now off, and
> when CPU1 goes through the panic notifiers, it'll eventually wait
> forever for this lock in the GSMI handler, unless we have this patch
> that would prevent the handler to run in such case.
> Makes sense?

I'm trying to say "if you are in panic, never grab the lock in the first
place".  So change the place where you grab the lock, not here.

thanks,

greg k-h
Guilherme G. Piccoli Sept. 1, 2022, 5:46 p.m. UTC | #6
On 01/09/2022 13:44, Greg KH wrote:
> [...]
>>> How are we supposed to know this here?
>>>
>>
>> Reading the code?
>> Or you mean, in the commit description this should be mentioned?
> 
> Yes, and in the comment as this type of call is very rare and should
> almost never be used.

OK, I can add that, for sure.


>> [...]
>> I don't think it is so simple - we are in the panic path.
> 
> Great, then the lock doesn't matter :)
> 
>> So, imagine the lock was taken in CPU0, where GSMI is doing some
>> operation. During that operation, CPU1 panics!
>>
>> When that happens, panic() executes in CPU1, disabling CPU0 through
>> "strong" mechanisms (NMI). So, CPU0 had the lock, it is now off, and
>> when CPU1 goes through the panic notifiers, it'll eventually wait
>> forever for this lock in the GSMI handler, unless we have this patch
>> that would prevent the handler to run in such case.
>> Makes sense?
> 
> I'm trying to say "if you are in panic, never grab the lock in the first
> place".  So change the place when you grab the lock, not here.
> 

Evan, any comment here?
I think the patch is still well suited for this case. Suggestions on how
to improve it are welcome, of course.

I honestly didn't understand exactly what you're suggesting, Greg...
Mind clarifying?

Cheers,


Guilherme
Greg KH Sept. 1, 2022, 6:28 p.m. UTC | #7
On Thu, Sep 01, 2022 at 02:46:48PM -0300, Guilherme G. Piccoli wrote:
> On 01/09/2022 13:44, Greg KH wrote:
> > [...]
> >>> How are we supposed to know this here?
> >>>
> >>
> >> Reading the code?
> >> Or you mean, in the commit description this should be mentioned?
> > 
> > Yes, and in the comment as this type of call is very rare and should
> > almost never be used.
> 
> OK, I can add that, for sure.
> 
> 
> >> [...]
> >> I don't think it is so simple - we are in the panic path.
> > 
> > Great, then the lock doesn't matter :)
> > 
> >> So, imagine the lock was taken in CPU0, where GSMI is doing some
> >> operation. During that operation, CPU1 panics!
> >>
> >> When that happens, panic() executes in CPU1, disabling CPU0 through
> >> "strong" mechanisms (NMI). So, CPU0 had the lock, it is now off, and
> >> when CPU1 goes through the panic notifiers, it'll eventually wait
> >> forever for this lock in the GSMI handler, unless we have this patch
> >> that would prevent the handler to run in such case.
> >> Makes sense?
> > 
> > I'm trying to say "if you are in panic, never grab the lock in the first
> > place".  So change the place when you grab the lock, not here.
> > 
> 
> Evan, any comment here?
> I think the patch is still well suited for this case. Suggestions on how
> to improve it are welcome, of course.
> 
> I honestly didn't understand exactly what you're suggesting Greg...
> Mind clarifying?

Something like this totally untested code:

diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index adaa492c3d2d..6ad41b22671c 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -19,6 +19,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/panic.h>
 #include <linux/panic_notifier.h>
 #include <linux/ioctl.h>
 #include <linux/acpi.h>
@@ -611,6 +612,11 @@ static const struct attribute *gsmi_attrs[] = {
 	NULL,
 };
 
+static bool panic_in_progress(void)
+{
+	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
+}
+
 static int gsmi_shutdown_reason(int reason)
 {
 	struct gsmi_log_entry_type_1 entry = {
@@ -629,7 +635,8 @@ static int gsmi_shutdown_reason(int reason)
 	if (saved_reason & (1 << reason))
 		return 0;
 
-	spin_lock_irqsave(&gsmi_dev.lock, flags);
+	if (!panic_in_progress())
+		spin_lock_irqsave(&gsmi_dev.lock, flags);
 
 	saved_reason |= (1 << reason);
 
@@ -644,7 +651,8 @@ static int gsmi_shutdown_reason(int reason)
 
 	rc = gsmi_exec(GSMI_CALLBACK, GSMI_CMD_SET_EVENT_LOG);
 
-	spin_unlock_irqrestore(&gsmi_dev.lock, flags);
+	if (!panic_in_progress())
+		spin_unlock_irqrestore(&gsmi_dev.lock, flags);
 
 	if (rc < 0)
 		printk(KERN_ERR "gsmi: Log Shutdown Reason failed\n");



That being said, are you sure spinlocks are still held in the panic
notifier?  What about the call to bust_spinlocks() that is called in
panic() already?  Wouldn't that have already dropped whatever you were
worried about here?

thanks,

greg k-h
Guilherme G. Piccoli Sept. 1, 2022, 6:46 p.m. UTC | #8
On 01/09/2022 15:28, Greg KH wrote:
> [...]
>> I honestly didn't understand exactly what you're suggesting Greg...
>> Mind clarifying?
> 
> Something like this totally untested code:
> 
> diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
> index adaa492c3d2d..6ad41b22671c 100644
> --- a/drivers/firmware/google/gsmi.c
> +++ b/drivers/firmware/google/gsmi.c
> @@ -19,6 +19,7 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/fs.h>
>  #include <linux/slab.h>
> +#include <linux/panic.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/ioctl.h>
>  #include <linux/acpi.h>
> @@ -611,6 +612,11 @@ static const struct attribute *gsmi_attrs[] = {
>  	NULL,
>  };
>  
> +static bool panic_in_progress(void)
> +{
> +	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
> +}
> +
>  static int gsmi_shutdown_reason(int reason)
>  {
>  	struct gsmi_log_entry_type_1 entry = {
> @@ -629,7 +635,8 @@ static int gsmi_shutdown_reason(int reason)
>  	if (saved_reason & (1 << reason))
>  		return 0;
>  
> -	spin_lock_irqsave(&gsmi_dev.lock, flags);
> +	if (!panic_in_progress())
> +		spin_lock_irqsave(&gsmi_dev.lock, flags);
>  
>  	saved_reason |= (1 << reason);
>  
> @@ -644,7 +651,8 @@ static int gsmi_shutdown_reason(int reason)
>  
>  	rc = gsmi_exec(GSMI_CALLBACK, GSMI_CMD_SET_EVENT_LOG);
>  
> -	spin_unlock_irqrestore(&gsmi_dev.lock, flags);
> +	if (!panic_in_progress())
> +		spin_unlock_irqrestore(&gsmi_dev.lock, flags);
>  
>  	if (rc < 0)
>  		printk(KERN_ERR "gsmi: Log Shutdown Reason failed\n");
> 
> 
>

Thanks! Personally, I feel this approach is a bit more complex than
mine, and... racy!
Imagine CPU0 runs your test: right after the if (!panic_in_progress())
check passes, the spinlock is taken and boom - panic on CPU1. This
would cause the same issue...

My approach has no race, since it checks whether the spinlock was taken
at a moment when the machine is effectively non-SMP, with only a single
CPU running...


> That being said, are you sure spinlocks are still held in the panic
> notifier?  What about the call to bust_spinlocks() that is called in
> panic() already?  Wouldn't that have already dropped whatever you were
> worried about here?

This function is very weird. Basically, the call to "bust_spinlocks(1);"
in panic() effectively means "++oops_in_progress;", IIUC.
So, I still think we can have lockups in panic notifiers with locks
previously taken =)
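
For reference, bust_spinlocks() is roughly the following (paraphrased
from lib/bust_spinlocks.c from memory - the exact config guards differ
across versions). It only pokes oops_in_progress and the console; it
cannot release gsmi_dev.lock or any other driver lock:

void bust_spinlocks(int yes)
{
	if (yes) {
		++oops_in_progress;	/* this is all panic() asks for */
	} else {
#ifdef CONFIG_VT
		unblank_screen();
#endif
		console_unblank();
		if (--oops_in_progress == 0)
			wake_up_klogd();
	}
}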

Cheers,


Guilherme
Greg KH Sept. 1, 2022, 6:59 p.m. UTC | #9
On Thu, Sep 01, 2022 at 03:46:17PM -0300, Guilherme G. Piccoli wrote:
> On 01/09/2022 15:28, Greg KH wrote:
> > [...]
> >> I honestly didn't understand exactly what you're suggesting Greg...
> >> Mind clarifying?
> > 
> > Something like this totally untested code:
> > 
> > diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
> > index adaa492c3d2d..6ad41b22671c 100644
> > --- a/drivers/firmware/google/gsmi.c
> > +++ b/drivers/firmware/google/gsmi.c
> > @@ -19,6 +19,7 @@
> >  #include <linux/dma-mapping.h>
> >  #include <linux/fs.h>
> >  #include <linux/slab.h>
> > +#include <linux/panic.h>
> >  #include <linux/panic_notifier.h>
> >  #include <linux/ioctl.h>
> >  #include <linux/acpi.h>
> > @@ -611,6 +612,11 @@ static const struct attribute *gsmi_attrs[] = {
> >  	NULL,
> >  };
> >  
> > +static bool panic_in_progress(void)
> > +{
> > +	return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
> > +}
> > +
> >  static int gsmi_shutdown_reason(int reason)
> >  {
> >  	struct gsmi_log_entry_type_1 entry = {
> > @@ -629,7 +635,8 @@ static int gsmi_shutdown_reason(int reason)
> >  	if (saved_reason & (1 << reason))
> >  		return 0;
> >  
> > -	spin_lock_irqsave(&gsmi_dev.lock, flags);
> > +	if (!panic_in_progress())
> > +		spin_lock_irqsave(&gsmi_dev.lock, flags);
> >  
> >  	saved_reason |= (1 << reason);
> >  
> > @@ -644,7 +651,8 @@ static int gsmi_shutdown_reason(int reason)
> >  
> >  	rc = gsmi_exec(GSMI_CALLBACK, GSMI_CMD_SET_EVENT_LOG);
> >  
> > -	spin_unlock_irqrestore(&gsmi_dev.lock, flags);
> > +	if (!panic_in_progress())
> > +		spin_unlock_irqrestore(&gsmi_dev.lock, flags);
> >  
> >  	if (rc < 0)
> >  		printk(KERN_ERR "gsmi: Log Shutdown Reason failed\n");
> > 
> > 
> >
> 
> Thanks! Personally, I feel the approach a bit more complex than mine,
> and...racy!
> Imagine CPU0 runs your tests, right after the if (!panic_in_progress())
> is done, spinlock is taken and boom - panic on CPU1. This would cause
> the same issue...

True, it would, but so would yours if the unlock happens and then your
test passes and then this lock is taken and then a panic happens.

There's no "race free" way here perhaps.  The joys of notifier chains (I
hate the things...)

> My approach is zero racy, since it checks if spinlock was taken in a
> moment that the machine is like a no-SMP, only a single CPU running...

Ah, I missed that this path is only called if a panic is happening.
Well, also a reboot.

Ick, I don't know, this all feels odd.  I want someone else to review
this and give their ack on the patch before I'll take it so someone else
can share in the blame :)

thanks,

greg k-h
Guilherme G. Piccoli Sept. 1, 2022, 7:02 p.m. UTC | #10
On 01/09/2022 15:59, Greg KH wrote:
> [...]
> Ick, I don't know, this all feels odd.  I want someone else to review
> this and give their ack on the patch before I'll take it so someone else
> can share in the blame :)
> 
> thanks,
> 
> greg k-h

LOL, that's OK for me! Evan seems to be fine with it BTW.

Let's see if Petr can jump in, also adding Andrew here since he's
usually merging stuff for panic.
Cheers,


Guilherme
Andrew Morton Sept. 1, 2022, 10:13 p.m. UTC | #11
On Thu, 1 Sep 2022 16:02:08 -0300 "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:

> On 01/09/2022 15:59, Greg KH wrote:
> > [...]
> > Ick, I don't know, this all feels odd.  I want someone else to review
> > this and give their ack on the patch before I'll take it so someone else
> > can share in the blame :)
> > 
> > thanks,
> > 
> > greg k-h
> 
> LOL, that's OK for me! Evan seems to be fine with it BTW.
> 
> Let's see if Petr can jump in, also adding Andrew here since he's
> usually merging stuff for panic.

Are the usual gsmi developers not operational?

Patch seems sensible to me, although the deadlock sounds pretty
theoretical.  A better code comment might be simply

	/*
	 * Panic callbacks are executed with all other CPUs stopped, so we must
	 * not attempt to spin waiting for gsmi_dev.lock to be released.
	 */

?
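
(For illustration, folding Andrew's suggested comment into the patch as
posted, the callback would look roughly like this - a sketch of a
possible V4, not what was actually applied:)

static int gsmi_panic_callback(struct notifier_block *nb,
			       unsigned long reason, void *arg)
{
	/*
	 * Panic callbacks are executed with all other CPUs stopped, so we
	 * must not attempt to spin waiting for gsmi_dev.lock to be released.
	 */
	if (spin_is_locked(&gsmi_dev.lock))
		return NOTIFY_DONE;

	gsmi_shutdown_reason(GSMI_SHUTDOWN_PANIC);
	return NOTIFY_DONE;
}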
Evan Green Sept. 6, 2022, 5:09 p.m. UTC | #12
On Thu, Sep 1, 2022 at 3:13 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 1 Sep 2022 16:02:08 -0300 "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
>
> > On 01/09/2022 15:59, Greg KH wrote:
> > > [...]
> > > Ick, I don't know, this all feels odd.  I want someone else to review
> > > this and give their ack on the patch before I'll take it so someone else
> > > can share in the blame :)
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > LOL, that's OK for me! Evan seems to be fine with it BTW.
> >
> > Let's see if Petr can jump in, also adding Andrew here since he's
> > usually merging stuff for panic.
>
> Are the usual gsmi developers not operational?

I'm unsure who that is - I sort of Mr. Beaned my way in here, having
touched the file recently. A lot of the people who historically
touched this file have gone.

>
> Patch seems sensible to me, although the deadlock sounds pretty
> theoretical.  A better code comment might be simply
>
>         /*
>          * Panic callbacks are executed with all other CPUs stopped, so we must
>          * not attempt to spin waiting for gsmi_dev.lock to be released.
>          */
>
> ?

I basically came to the same conclusion as Andrew. It seems like this
patch does fix a problem: a panic coming in on another CPU and NMIing
on top of a CPU doing a normal operation while holding this lock.
The problem seems pretty theoretical, but I suppose I don't have
numbers one way or another, since any attempt to gather numbers would
rely on this very mechanism. My Reviewed-by tag is already on
there.
-Evan
Guilherme G. Piccoli Sept. 8, 2022, 12:35 a.m. UTC | #13
On 06/09/2022 14:09, Evan Green wrote:
> [...]
> 
> I basically came to the same conclusion as Andrew. It seems like this
> patch does fix a problem, which is a panic coming in on another CPU
> and NMIing on top of a CPU doing a normal operation holding this lock.
> The problem seems pretty theoretical, but I suppose I don't have
> numbers one way or another since any attempt to gather numbers would
> be reliant on this very mechanism. My Reviewed-by tag is already on
> there.
> -Evan

Thanks for the feedback, Evan and Andrew.

So if Greg (or anybody else) doesn't object, I'll send a V4 with the
commit message/comment suggestions.

Cheers,


Guilherme

Patch

diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index adaa492c3d2d..3ef5f3c0b4e4 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -681,6 +681,14 @@  static struct notifier_block gsmi_die_notifier = {
 static int gsmi_panic_callback(struct notifier_block *nb,
 			       unsigned long reason, void *arg)
 {
+	/*
+	 * Perform the lock check before effectively trying
+	 * to acquire it on gsmi_shutdown_reason() to avoid
+	 * potential lockups in atomic context.
+	 */
+	if (spin_is_locked(&gsmi_dev.lock))
+		return NOTIFY_DONE;
+
 	gsmi_shutdown_reason(GSMI_SHUTDOWN_PANIC);
 	return NOTIFY_DONE;
 }