[10/12] block: move blk_exit_queue into disk_release

Message ID	20220222141450.591193-11-hch@lst.de
State	Superseded
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Christoph Hellwig <hch@lst.de> To: Jens Axboe <axboe@kernel.dk> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Ming Lei <ming.lei@redhat.com>, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org Subject: [PATCH 10/12] block: move blk_exit_queue into disk_release Date: Tue, 22 Feb 2022 15:14:48 +0100 Message-Id: <20220222141450.591193-11-hch@lst.de> In-Reply-To: <20220222141450.591193-1-hch@lst.de> References: <20220222141450.591193-1-hch@lst.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[01/12] blk-mq: do not include passthrough requests in I/O accounting \| expand [01/12] blk-mq: do not include passthrough requests in I/O accounting [02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs [03/12] scsi: don't use disk->private_data to find the scsi_driver [04/12] sd: make use of ->free_disk to simplify refcounting [05/12] sd: remove the extra sdev_gendev reference [06/12] sr: implement ->free_disk [07/12] block: move blkcg initialization/destroy into disk allocation/release handler [08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue [09/12] block: move q_usage_counter release into blk_queue_release [10/12] block: move blk_exit_queue into disk_release [11/12] block: do more work in elevator_exit [12/12] block: move rq_qos_exit() into disk_release()

Message ID

20220222141450.591193-11-hch@lst.de

State

Superseded

Headers

From: Christoph Hellwig <hch@lst.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
        Ming Lei <ming.lei@redhat.com>, linux-block@vger.kernel.org,
        linux-scsi@vger.kernel.org
Subject: [PATCH 10/12] block: move blk_exit_queue into disk_release
Date: Tue, 22 Feb 2022 15:14:48 +0100
Message-Id: <20220222141450.591193-11-hch@lst.de>
In-Reply-To: <20220222141450.591193-1-hch@lst.de>
References: <20220222141450.591193-1-hch@lst.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

[01/12] blk-mq: do not include passthrough requests in I/O accounting | expand

Commit Message

Christoph Hellwig Feb. 22, 2022, 2:14 p.m. UTC

From: Ming Lei <ming.lei@redhat.com>

There can't be FS IO in disk_release(), so move blk_exit_queue() there.

We still need to freeze queue here since the request is freed after the
bio is completed and passthrough request rely on scheduler tags as well.

The disk can be released before or after queue is cleaned up, and we have
to free the scheduler request pool before blk_cleanup_queue returns,
while the static request pool has to be freed before exiting the
I/O scheduler.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-sysfs.c | 16 ----------------
 block/genhd.c     | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 17 deletions(-)

Comments

Ming Lei Feb. 23, 2022, 6:56 a.m. UTC | #1

On Tue, Feb 22, 2022 at 10:29:47AM -0800, Bart Van Assche wrote:
> On 2/22/22 06:14, Christoph Hellwig wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> > 
> > There can't be FS IO in disk_release(), so move blk_exit_queue() there.
> > 
> > We still need to freeze queue here since the request is freed after the
> > bio is completed and passthrough request rely on scheduler tags as well.
> > 
> > The disk can be released before or after queue is cleaned up, and we have
> > to free the scheduler request pool before blk_cleanup_queue returns,
> > while the static request pool has to be freed before exiting the
> > I/O scheduler.
> 
> This patch looks dubious to me because:
> - The blk_freeze_queue() call in blk_cleanup_queue() waits for pending
>   requests to finish, so why to move blk_exit_queue() from
>   blk_cleanup_queue() into disk_release()?

scsi disk may be released before calling blk_cleanup_queue(), and we
want to tear down all FS related stuff(cgroup, rqos, elevator) in disk_release().

And FS bios have been drained already when releasing disk.

> - I'm concerned that this patch will break user space, e.g. scripts that
>   try to unload an I/O scheduler kernel module immediately after having
>   removed a request queue.

When removing a request queue, the associated disk has been removed
already, and queue's kobject has been deleted too, so how can userspace
unload I/O scheduler at that time?

> 
> > +static void blk_mq_release_queue(struct request_queue *q)
> > +{
> > +	blk_mq_cancel_work_sync(q);
> > +
> > +	/*
> > +	 * There can't be any non non-passthrough bios in flight here, but
> > +	 * requests stay around longer, including passthrough ones so we
> > +	 * still need to freeze the queue here.
> > +	 */
> > +	blk_mq_freeze_queue(q);
> 
> The above comment should be elaborated since what matters in this context is
> not whether or not any bios are still in flight but what happens with the
> request structures.

Yeah, bios have been done, but request is done after bio is ended, see
blk_update_request(), that is why we added blk_mq_freeze_queue() here.

> As you know blk_queue_enter() fails after the DYING flag
> has been set, a flag that is set by blk_cleanup_queue().blk_cleanup_queue()
> already freezes the queue. So why is it necessary to call
> blk_mq_freeze_queue() from blk_mq_release_queue()?

disk may be released before calling blk_cleanup_queue().

But I admit here the name of blk_mq_release_queue() is very misleading,
maybe blk_mq_release_io_queue() is better?


Thanks,
Ming

Bart Van Assche Feb. 23, 2022, 8:04 p.m. UTC | #2

On 2/22/22 22:56, Ming Lei wrote:
> But I admit here the name of blk_mq_release_queue() is very misleading,
> maybe blk_mq_release_io_queue() is better?

I'm not sure what the best name for that function would be. Anyway, 
thanks for having clarified that disk structures are removed before the 
request queue is cleaned up. That's something I was missing.

Bart.

Christoph Hellwig Feb. 24, 2022, 7:25 a.m. UTC | #3

On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote:
> On 2/22/22 22:56, Ming Lei wrote:
>> But I admit here the name of blk_mq_release_queue() is very misleading,
>> maybe blk_mq_release_io_queue() is better?
>
> I'm not sure what the best name for that function would be. Anyway, thanks 
> for having clarified that disk structures are removed before the request 
> queue is cleaned up. That's something I was missing.

Maybe disk_release_mq?

Ming Lei Feb. 25, 2022, 1:26 a.m. UTC | #4

On Thu, Feb 24, 2022 at 08:25:24AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote:
> > On 2/22/22 22:56, Ming Lei wrote:
> >> But I admit here the name of blk_mq_release_queue() is very misleading,
> >> maybe blk_mq_release_io_queue() is better?
> >
> > I'm not sure what the best name for that function would be. Anyway, thanks 
> > for having clarified that disk structures are removed before the request 
> > queue is cleaned up. That's something I was missing.
> 
> Maybe disk_release_mq?

disk_release_mq() looks much better.

Thanks,
Ming

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4ea22169b5186..faf8577578929 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -739,20 +739,6 @@  static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 	kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
 }
 
-/* Unconfigure the I/O scheduler and dissociate from the cgroup controller. */
-static void blk_exit_queue(struct request_queue *q)
-{
-	/*
-	 * Since the I/O scheduler exit code may access cgroup information,
-	 * perform I/O scheduler exit before disassociating from the block
-	 * cgroup controller.
-	 */
-	if (q->elevator) {
-		ioc_clear_queue(q);
-		elevator_exit(q);
-	}
-}
-
 /**
  * blk_release_queue - releases all allocated resources of the request_queue
  * @kobj: pointer to a kobject, whose container is a request_queue
@@ -786,8 +772,6 @@  static void blk_release_queue(struct kobject *kobj)
 		blk_stat_remove_callback(q, q->poll_cb);
 	blk_stat_free_callback(q->poll_cb);
 
-	blk_exit_queue(q);
-
 	blk_free_queue_stats(q->stats);
 	kfree(q->poll_stat);
 
diff --git a/block/genhd.c b/block/genhd.c
index ebf0e0be1c545..40ef013382872 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -29,6 +29,7 @@ 
 #include "blk.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
+#include "blk-cgroup.h"
 
 static struct kobject *block_depr;
 
@@ -1092,6 +1093,34 @@  static const struct attribute_group *disk_attr_groups[] = {
 	NULL
 };
 
+static void blk_mq_release_queue(struct request_queue *q)
+{
+	blk_mq_cancel_work_sync(q);
+
+	/*
+	 * There can't be any non non-passthrough bios in flight here, but
+	 * requests stay around longer, including passthrough ones so we
+	 * still need to freeze the queue here.
+	 */
+	blk_mq_freeze_queue(q);
+
+	/*
+	 * Since the I/O scheduler exit code may access cgroup information,
+	 * perform I/O scheduler exit before disassociating from the block
+	 * cgroup controller.
+	 */
+	if (q->elevator) {
+		ioc_clear_queue(q);
+
+		mutex_lock(&q->sysfs_lock);
+		blk_mq_sched_free_rqs(q);
+		elevator_exit(q);
+		mutex_unlock(&q->sysfs_lock);
+	}
+
+	__blk_mq_unfreeze_queue(q, true);
+}
+
 /**
  * disk_release - releases all allocated resources of the gendisk
  * @dev: the device representing this disk
@@ -1113,7 +1142,8 @@  static void disk_release(struct device *dev)
 	might_sleep();
 	WARN_ON_ONCE(disk_live(disk));
 
-	blk_mq_cancel_work_sync(disk->queue);
+	if (queue_is_mq(disk->queue))
+		blk_mq_release_queue(disk->queue);
 
 	/*
 	 * Remove all references to @q from the block cgroup controller before

[10/12] block: move blk_exit_queue into disk_release

Commit Message

Comments

Patch