diff mbox series

wifi: cfg80211: fix WARN_ON during CAC cancelling

Message ID 20241113-mlo_dfs_fix-v1-1-e4326736347b@quicinc.com
State New
Headers show
Series wifi: cfg80211: fix WARN_ON during CAC cancelling | expand

Commit Message

Aditya Kumar Singh Nov. 13, 2024, 6:27 a.m. UTC
In cfg80211_cac_event(), there’s a check to ensure that for MLO, the
link_id argument passed must be a valid link_id in the wdev. The various
callers of this function (during MLO) are -
 * ieee80211_stop_ap()
 * ieee80211_link_stop()
 * ieee80211_dfs_cac_timer_work()
 * ieee80211_dfs_cac_cancel()

Now, in ieee80211_stop_ap() the wdev->valid_links is still having the link
ID which is being stopped. ieee80211_dfs_cac_timer_work() is triggered
after CAC time and the link ID is still valid in wdev->valid_links.
Similarly in ieee80211_dfs_cac_cancel() as well, the link ID is valid in
wdev->valid_links.

However, during the link stop via ieee80211_link_stop() flow, when this
function is called, the link_id is removed from the bitmap, triggering the
WARN_ON. The flow during the stop link is -

nl80211_remove_link
  > cfg80211_remove_link
    > ieee80211_del_intf_link
      > ieee80211_vif_set_links
        > ieee80211_vif_update_links
          > ieee80211_link_stop
            > cfg80211_cac_event

In cfg80211_remove_link(), the link_id is removed from the valid_links
bitmap before ieee80211_del_intf_link() is called. Consequently, in
cfg80211_cac_event(), the WARN_ON is triggered.

Since having link_id set in valid_links is not a necessary condition now,
remove the check.

Fixes: 81f67d60ebf2 ("wifi: cfg80211: handle DFS per link")
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
---
 net/wireless/mlme.c | 4 ----
 1 file changed, 4 deletions(-)


---
base-commit: 11597043d74809daf5d14256b96d6781749b3f82
change-id: 20241113-mlo_dfs_fix-1123060109bc

Comments

Johannes Berg Nov. 13, 2024, 9:29 a.m. UTC | #1
> 
> diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
> index a5eb92d93074e6ce1e08fcc2790b80cf04ff08f8..2a932a036225a3e0587cf5c18a4e80e91552313b 100644
> --- a/net/wireless/mlme.c
> +++ b/net/wireless/mlme.c
> @@ -1112,10 +1112,6 @@ void cfg80211_cac_event(struct net_device *netdev,
>  	struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
>  	unsigned long timeout;
>  
> -	if (WARN_ON(wdev->valid_links &&
> -		    !(wdev->valid_links & BIT(link_id))))
> -		return;
> -
>  	trace_cfg80211_cac_event(netdev, event, link_id);
>  
>  	if (WARN_ON(!wdev->links[link_id].cac_started &&
> 

This really doesn't seem right.

Perhaps the order in teardown should be changed?

johannes
Aditya Kumar Singh Nov. 13, 2024, 2:43 p.m. UTC | #2
On 11/13/24 14:59, Johannes Berg wrote:
>>
>> diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
>> index a5eb92d93074e6ce1e08fcc2790b80cf04ff08f8..2a932a036225a3e0587cf5c18a4e80e91552313b 100644
>> --- a/net/wireless/mlme.c
>> +++ b/net/wireless/mlme.c
>> @@ -1112,10 +1112,6 @@ void cfg80211_cac_event(struct net_device *netdev,
>>   	struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
>>   	unsigned long timeout;
>>   
>> -	if (WARN_ON(wdev->valid_links &&
>> -		    !(wdev->valid_links & BIT(link_id))))
>> -		return;
>> -
>>   	trace_cfg80211_cac_event(netdev, event, link_id);
>>   
>>   	if (WARN_ON(!wdev->links[link_id].cac_started &&
>>
> 
> This really doesn't seem right.
> 
> Perhaps the order in teardown should be changed?

I thought about it but couldn't really come down to a convincing approach.

The thing is when CAC in ongoing and hostapd process is killed, there is 
no specific event apart from link delete which hostapd sends. Will it be 
okay to add a new NL command to stop radar detection? Something opposite 
of what start_radar_detection command does?
Johannes Berg Nov. 13, 2024, 3:48 p.m. UTC | #3
On Wed, 2024-11-13 at 20:13 +0530, Aditya Kumar Singh wrote:
> On 11/13/24 14:59, Johannes Berg wrote:
> > > 
> > > diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
> > > index a5eb92d93074e6ce1e08fcc2790b80cf04ff08f8..2a932a036225a3e0587cf5c18a4e80e91552313b 100644
> > > --- a/net/wireless/mlme.c
> > > +++ b/net/wireless/mlme.c
> > > @@ -1112,10 +1112,6 @@ void cfg80211_cac_event(struct net_device *netdev,
> > >   	struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
> > >   	unsigned long timeout;
> > >   
> > > -	if (WARN_ON(wdev->valid_links &&
> > > -		    !(wdev->valid_links & BIT(link_id))))
> > > -		return;
> > > -
> > >   	trace_cfg80211_cac_event(netdev, event, link_id);
> > >   
> > >   	if (WARN_ON(!wdev->links[link_id].cac_started &&
> > > 
> > 
> > This really doesn't seem right.
> > 
> > Perhaps the order in teardown should be changed?
> 
> I thought about it but couldn't really come down to a convincing approach.
> 
> The thing is when CAC in ongoing and hostapd process is killed, there is 
> no specific event apart from link delete which hostapd sends.
> 

so we do have link removal, why doesn't that work?

> Will it be 
> okay to add a new NL command to stop radar detection? Something opposite 
> of what start_radar_detection command does?
> 

No, obviously not.

johannes
Aditya Kumar Singh Nov. 13, 2024, 4:20 p.m. UTC | #4
On 11/13/24 21:18, Johannes Berg wrote:
> On Wed, 2024-11-13 at 20:13 +0530, Aditya Kumar Singh wrote:
>> On 11/13/24 14:59, Johannes Berg wrote:
>>>>
>>>> diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
>>>> index a5eb92d93074e6ce1e08fcc2790b80cf04ff08f8..2a932a036225a3e0587cf5c18a4e80e91552313b 100644
>>>> --- a/net/wireless/mlme.c
>>>> +++ b/net/wireless/mlme.c
>>>> @@ -1112,10 +1112,6 @@ void cfg80211_cac_event(struct net_device *netdev,
>>>>    	struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
>>>>    	unsigned long timeout;
>>>>    
>>>> -	if (WARN_ON(wdev->valid_links &&
>>>> -		    !(wdev->valid_links & BIT(link_id))))
>>>> -		return;
>>>> -
>>>>    	trace_cfg80211_cac_event(netdev, event, link_id);
>>>>    
>>>>    	if (WARN_ON(!wdev->links[link_id].cac_started &&
>>>>
>>>
>>> This really doesn't seem right.
>>>
>>> Perhaps the order in teardown should be changed?
>>
>> I thought about it but couldn't really come down to a convincing approach.
>>
>> The thing is when CAC in ongoing and hostapd process is killed, there is
>> no specific event apart from link delete which hostapd sends.
>>
> 
> so we do have link removal, why doesn't that work?

Because link ID is cleared from the bitmap well before link stop is 
called. As mentioned in commit message, this is the flow -

nl80211_remove_link
   > cfg80211_remove_link                -> link ID gets updated here
     > ieee80211_del_intf_link
       > ieee80211_vif_set_links
         > ieee80211_vif_update_links
           > ieee80211_link_stop         -> this ultimately tries to stop
					   CAC if it is ongoing.

> 
>> Will it be
>> okay to add a new NL command to stop radar detection? Something opposite
>> of what start_radar_detection command does?
>>
> 
> No, obviously not.
> 
> johannes
Johannes Berg Nov. 15, 2024, 8:14 a.m. UTC | #5
On Wed, 2024-11-13 at 21:50 +0530, Aditya Kumar Singh wrote:
> 
> Because link ID is cleared from the bitmap well before link stop is 
> called. As mentioned in commit message, this is the flow -
> 
> nl80211_remove_link
>    > cfg80211_remove_link                -> link ID gets updated here
>      > ieee80211_del_intf_link
>        > ieee80211_vif_set_links
>          > ieee80211_vif_update_links
>            > ieee80211_link_stop         -> this ultimately tries to stop
> 					   CAC if it is ongoing.
> 

OK, but why does it have to be that way? It's all under wiphy mutex, so
perhaps we could just clear it later?

There's necessarily going to be some temporary inconsistency here, I'm
not sure it matters too much if it isn't very visible?

Alternatively, could do something like

  wdev->valid_links &= ~BIT(link_id);
  wdev->removing_link = link_id;
  ...
  wdev->removing_link = -1;

and accept the wdev->removing_link in these APIs like CAC?

johannes
Johannes Berg Nov. 20, 2024, 8:59 a.m. UTC | #6
On Tue, 2024-11-19 at 11:23 +0530, Aditya Kumar Singh wrote:
> 
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -5046,10 +5046,11 @@ static void ieee80211_del_intf_link(struct wiphy 
> *wiphy,
>                                      unsigned int link_id)
>   {
>          struct ieee80211_sub_if_data *sdata = 
> IEEE80211_WDEV_TO_SUB_IF(wdev);
> +       u16 new_links = wdev->valid_links & ~BIT(link_id);
> 
>          lockdep_assert_wiphy(sdata->local->hw.wiphy);
> 
> -       ieee80211_vif_set_links(sdata, wdev->valid_links, 0);
> +       ieee80211_vif_set_links(sdata, new_links, 0);
>   }

Needing this part is sort of clear then, but maybe not entirely obvious?
Should probably be accompanied by some documentation updates.

> So I will submit this as patch then?

Seems reasonable.

> > There's necessarily going to be some temporary inconsistency here, I'm
> > not sure it matters too much if it isn't very visible?
> > 
> 
> Any particular case you suspect and want me to test?

No, not really.

> I'm more inclining towards the first suggestion you gave - clearing the 
> bitmap later. What's your suggestion?
> 

Sure.

johannes
diff mbox series

Patch

diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
index a5eb92d93074e6ce1e08fcc2790b80cf04ff08f8..2a932a036225a3e0587cf5c18a4e80e91552313b 100644
--- a/net/wireless/mlme.c
+++ b/net/wireless/mlme.c
@@ -1112,10 +1112,6 @@  void cfg80211_cac_event(struct net_device *netdev,
 	struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
 	unsigned long timeout;
 
-	if (WARN_ON(wdev->valid_links &&
-		    !(wdev->valid_links & BIT(link_id))))
-		return;
-
 	trace_cfg80211_cac_event(netdev, event, link_id);
 
 	if (WARN_ON(!wdev->links[link_id].cac_started &&