
[v3] rhashtable: Fix rhashtable_try_insert test

Message ID Z4XWx5X0doetOJni@gondor.apana.org.au
State New
Series [v3] rhashtable: Fix rhashtable_try_insert test

Commit Message

Herbert Xu Jan. 14, 2025, 3:15 a.m. UTC
On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote:
>
> This patch passes my tests. I'm doing a narrow test to verify that
> the boot failure when opening the Mellanox NIC is no longer occurring.
> I also unloaded/reloaded the mlx5 driver a couple of times. For good
> measure, I then did a full Linux kernel build, and all is good. My testing
> does not broadly verify correct operation of rhashtable except as it
> gets exercised implicitly by these basic tests.

Thanks for testing! The patch needs one more change though as
moving the atomic_inc outside of the lock was a bad idea on my
part.  This could cause atomic_inc/atomic_dec to be reordered
thus resulting in an underflow.
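
A minimal sketch of the ordering this restores (illustrative userspace C,
not the kernel code; struct table, insert_obj and remove_obj are made-up
stand-ins for the bucket lock, ht->nelems and the insert/remove paths):

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdbool.h>

	struct table {
		pthread_mutex_t bucket_lock;	/* stands in for rht_lock();
						 * assumed initialised with
						 * PTHREAD_MUTEX_INITIALIZER */
		atomic_int nelems;		/* stands in for ht->nelems */
	};

	/* Insert side: bump the counter while still holding the bucket
	 * lock.  The object only becomes visible to other threads once
	 * the lock is dropped, so any later removal of it is guaranteed
	 * to observe the increment first. */
	static void insert_obj(struct table *t, bool inserted)
	{
		pthread_mutex_lock(&t->bucket_lock);
		/* ... link the object into the bucket ... */
		if (inserted)
			atomic_fetch_add(&t->nelems, 1);
		pthread_mutex_unlock(&t->bucket_lock);
		/* the growth check (schedule_work) can stay out here */
	}

	/* Remove side: the decrement can only run after the object has
	 * been found and unlinked under the lock, so the inc and dec for
	 * the same object cannot be reordered and nelems cannot
	 * underflow.  With the inc done after the insert-side unlock, a
	 * concurrent removal could decrement first. */
	static void remove_obj(struct table *t)
	{
		pthread_mutex_lock(&t->bucket_lock);
		/* ... unlink the object ... */
		pthread_mutex_unlock(&t->bucket_lock);
		atomic_fetch_sub(&t->nelems, 1);
	}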

Thanks,

---8<---
The test on whether rhashtable_insert_one did an insertion relies
on the value returned by rhashtable_lookup_one.  Unfortunately that
value is overwritten after rhashtable_insert_one returns.  Fix this
by moving the test before data gets overwritten.

Simplify the test as only data == NULL matters.

Finally move atomic_inc back within the lock as otherwise it may
be reordered with the atomic_dec on the removal side, potentially
leading to an underflow.

Reported-by: Michael Kelley <mhklinux@outlook.com>
Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Breno Leitao Jan. 15, 2025, 3:15 p.m. UTC | #1
On Tue, Jan 14, 2025 at 11:15:19AM +0800, Herbert Xu wrote:
> On Fri, Jan 10, 2025 at 06:22:40PM +0000, Michael Kelley wrote:
> >
> > This patch passes my tests. I'm doing a narrow test to verify that
> > the boot failure when opening the Mellanox NIC is no longer occurring.
> > I also unloaded/reloaded the mlx5 driver a couple of times. For good
> > measure, I then did a full Linux kernel build, and all is good. My testing
> > does not broadly verify correct operation of rhashtable except as it
> > gets exercised implicitly by these basic tests.
> 
> Thanks for testing! The patch needs one more change though as
> moving the atomic_inc outside of the lock was a bad idea on my
> part.  This could cause atomic_inc/atomic_dec to be reordered
> thus resulting in an underflow.
> 
> Thanks,
> 
> ---8<---
> The test on whether rhashtable_insert_one did an insertion relies
> on the value returned by rhashtable_lookup_one.  Unfortunately that
> value is overwritten after rhashtable_insert_one returns.  Fix this
> by moving the test before data gets overwritten.
> 
> Simplify the test as only data == NULL matters.
> 
> Finally move atomic_inc back within the lock as otherwise it may
> be reordered with the atomic_dec on the removal side, potentially
> leading to an underflow.
> 
> Reported-by: Michael Kelley <mhklinux@outlook.com>
> Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Reviewed-by: Breno Leitao <leitao@debian.org>

> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bf956b85455a..0e9a1d4cf89b 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -611,21 +611,23 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>  			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>  			data = ERR_PTR(-EAGAIN);
>  		} else {
> +			bool inserted;
> +
>  			flags = rht_lock(tbl, bkt);
>  			data = rhashtable_lookup_one(ht, bkt, tbl,
>  						     hash, key, obj);
>  			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
>  							hash, obj, data);
> +			inserted = data && !new_tbl;
> +			if (inserted)
> +				atomic_inc(&ht->nelems);
>  			if (PTR_ERR(new_tbl) != -EEXIST)
>  				data = ERR_CAST(new_tbl);
>  
>  			rht_unlock(tbl, bkt, flags);
>  
> -			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
> -				atomic_inc(&ht->nelems);
> -				if (rht_grow_above_75(ht, tbl))
> -					schedule_work(&ht->run_work);
> -			}
> +			if (inserted && rht_grow_above_75(ht, tbl))
> +				schedule_work(&ht->run_work);

That makes sense, since data could be ERR_PTR(-ENOENT) or ERR_PTR(-EAGAIN)
while the object is still being inserted, which means that nelems should be
increased.

This patch was hard to review: rhashtable_insert_one() basically returns
three types of values, and you are interested in only one case, when the
obj was inserted.

These are the types of values that come from
rhashtable_insert_one():

  1) NULL: if object was inserted OR if data is NULL
  2) Non error and !NULL: A new table to look at
  3) ERR: Definitely not added
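
Restating those three cases from the caller's point of view (illustrative
only; classify() and the enum are made-up names, and void * stands in for
the internal struct bucket_table *), this mirrors the
inserted = data && !new_tbl test in the patch:

	#include <linux/err.h>

	enum insert_outcome { INSERTED, APPENDED, RETRY_NEW_TBL, FAILED };

	static enum insert_outcome classify(void *data, void *new_tbl)
	{
		if (IS_ERR(new_tbl))
			return FAILED;		/* case 3: nothing was added */
		if (new_tbl)
			return RETRY_NEW_TBL;	/* case 2: retry in the new table */
		/* case 1: NULL.  It is a genuine insertion only when the
		 * preceding lookup missed (data == ERR_PTR(-ENOENT));
		 * data == NULL means the key already existed in an rhltable
		 * and the object was appended to its list. */
		return data ? INSERTED : APPENDED;
	}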

I am wondering whether, if we decoupled the first case and only returned
NULL iff the object was added, it would simplify this logic.

Something like the following (not tested):

	diff --git a/lib/rhashtable.c b/lib/rhashtable.c
	index 3e555d012ed60..5a0ec71e990ee 100644
	--- a/lib/rhashtable.c
	+++ b/lib/rhashtable.c
	@@ -554,7 +554,7 @@ static struct bucket_table *rhashtable_insert_one(
			return ERR_PTR(-EEXIST);

		if (PTR_ERR(data) != -EAGAIN && PTR_ERR(data) != -ENOENT)
	-               return ERR_CAST(data);
	+               return ERR_PTR(-EINVAL);

		new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
		if (new_tbl)


Thanks for fixing it,
--breno
Herbert Xu Jan. 16, 2025, 9:10 a.m. UTC | #2
On Wed, Jan 15, 2025 at 07:15:46AM -0800, Breno Leitao wrote:
>
> Something like the following (not tested):
> 
> 	diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> 	index 3e555d012ed60..5a0ec71e990ee 100644
> 	--- a/lib/rhashtable.c
> 	+++ b/lib/rhashtable.c
> 	@@ -554,7 +554,7 @@ static struct bucket_table *rhashtable_insert_one(
> 			return ERR_PTR(-EEXIST);
> 
> 		if (PTR_ERR(data) != -EAGAIN && PTR_ERR(data) != -ENOENT)
> 	-               return ERR_CAST(data);
> 	+               return ERR_PTR(-EINVAL);

But data == NULL is not an error, it is a successful return value
for rhltable.  It's when a key already exists in the rhltable
and we're simply appending the new object to it.  Thus the insertion
was successful but the hash table did not grow.
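
A tiny sketch of that rhltable case (illustrative only; demo_obj,
demo_params and demo_insert_pair are made-up names):

	#include <linux/rhashtable.h>

	struct demo_obj {
		u32 key;
		struct rhlist_head list;	/* rhlist_head, not rhash_head */
	};

	static const struct rhashtable_params demo_params = {
		.key_len     = sizeof(u32),
		.key_offset  = offsetof(struct demo_obj, key),
		.head_offset = offsetof(struct demo_obj, list),
	};

	static struct demo_obj a = { .key = 1 };
	static struct demo_obj b = { .key = 1 };	/* same key on purpose */

	static int demo_insert_pair(struct rhltable *hlt)
	{
		int err;

		err = rhltable_insert(hlt, &a.list, demo_params);
		if (err)
			return err;

		/* Second object with the same key: this still succeeds.
		 * The object is appended to the existing key's list, so
		 * the insertion is successful but it does not count as
		 * growing the hash table. */
		return rhltable_insert(hlt, &b.list, demo_params);
	}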

Cheers,
Alexander Gordeev Jan. 16, 2025, 11:48 a.m. UTC | #3
On Tue, Jan 14, 2025 at 11:15:19AM +0800, Herbert Xu wrote:

Hi Herbert,

> Thanks for testing! The patch needs one more change though as
> moving the atomic_inc outside of the lock was a bad idea on my
> part.  This could cause atomic_inc/atomic_dec to be reordered
> thus resulting in an underflow.

I want to confirm that this patch fixes massive strangeness and a few
crashes observed on s390 systems, in addition to the OOMs reported by
Mikhail Zaslonko earlier.

> Thanks,

Thanks!

Patch

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bf956b85455a..0e9a1d4cf89b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -611,21 +611,23 @@  static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 			data = ERR_PTR(-EAGAIN);
 		} else {
+			bool inserted;
+
 			flags = rht_lock(tbl, bkt);
 			data = rhashtable_lookup_one(ht, bkt, tbl,
 						     hash, key, obj);
 			new_tbl = rhashtable_insert_one(ht, bkt, tbl,
 							hash, obj, data);
+			inserted = data && !new_tbl;
+			if (inserted)
+				atomic_inc(&ht->nelems);
 			if (PTR_ERR(new_tbl) != -EEXIST)
 				data = ERR_CAST(new_tbl);
 
 			rht_unlock(tbl, bkt, flags);
 
-			if (PTR_ERR(data) == -ENOENT && !new_tbl) {
-				atomic_inc(&ht->nelems);
-				if (rht_grow_above_75(ht, tbl))
-					schedule_work(&ht->run_work);
-			}
+			if (inserted && rht_grow_above_75(ht, tbl))
+				schedule_work(&ht->run_work);
 		}
 	} while (!IS_ERR_OR_NULL(new_tbl));