Message ID | 20220222141450.591193-11-hch@lst.de |
---|---|
State | Superseded |
Headers | show |
Series | [01/12] blk-mq: do not include passthrough requests in I/O accounting | expand |
On Tue, Feb 22, 2022 at 10:29:47AM -0800, Bart Van Assche wrote: > On 2/22/22 06:14, Christoph Hellwig wrote: > > From: Ming Lei <ming.lei@redhat.com> > > > > There can't be FS IO in disk_release(), so move blk_exit_queue() there. > > > > We still need to freeze queue here since the request is freed after the > > bio is completed and passthrough request rely on scheduler tags as well. > > > > The disk can be released before or after queue is cleaned up, and we have > > to free the scheduler request pool before blk_cleanup_queue returns, > > while the static request pool has to be freed before exiting the > > I/O scheduler. > > This patch looks dubious to me because: > - The blk_freeze_queue() call in blk_cleanup_queue() waits for pending > requests to finish, so why to move blk_exit_queue() from > blk_cleanup_queue() into disk_release()? scsi disk may be released before calling blk_cleanup_queue(), and we want to tear down all FS related stuff(cgroup, rqos, elevator) in disk_release(). And FS bios have been drained already when releasing disk. > - I'm concerned that this patch will break user space, e.g. scripts that > try to unload an I/O scheduler kernel module immediately after having > removed a request queue. When removing a request queue, the associated disk has been removed already, and queue's kobject has been deleted too, so how can userspace unload I/O scheduler at that time? > > > +static void blk_mq_release_queue(struct request_queue *q) > > +{ > > + blk_mq_cancel_work_sync(q); > > + > > + /* > > + * There can't be any non non-passthrough bios in flight here, but > > + * requests stay around longer, including passthrough ones so we > > + * still need to freeze the queue here. > > + */ > > + blk_mq_freeze_queue(q); > > The above comment should be elaborated since what matters in this context is > not whether or not any bios are still in flight but what happens with the > request structures. Yeah, bios have been done, but request is done after bio is ended, see blk_update_request(), that is why we added blk_mq_freeze_queue() here. > As you know blk_queue_enter() fails after the DYING flag > has been set, a flag that is set by blk_cleanup_queue().blk_cleanup_queue() > already freezes the queue. So why is it necessary to call > blk_mq_freeze_queue() from blk_mq_release_queue()? disk may be released before calling blk_cleanup_queue(). But I admit here the name of blk_mq_release_queue() is very misleading, maybe blk_mq_release_io_queue() is better? Thanks, Ming
On 2/22/22 22:56, Ming Lei wrote: > But I admit here the name of blk_mq_release_queue() is very misleading, > maybe blk_mq_release_io_queue() is better? I'm not sure what the best name for that function would be. Anyway, thanks for having clarified that disk structures are removed before the request queue is cleaned up. That's something I was missing. Bart.
On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote: > On 2/22/22 22:56, Ming Lei wrote: >> But I admit here the name of blk_mq_release_queue() is very misleading, >> maybe blk_mq_release_io_queue() is better? > > I'm not sure what the best name for that function would be. Anyway, thanks > for having clarified that disk structures are removed before the request > queue is cleaned up. That's something I was missing. Maybe disk_release_mq?
On Thu, Feb 24, 2022 at 08:25:24AM +0100, Christoph Hellwig wrote: > On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote: > > On 2/22/22 22:56, Ming Lei wrote: > >> But I admit here the name of blk_mq_release_queue() is very misleading, > >> maybe blk_mq_release_io_queue() is better? > > > > I'm not sure what the best name for that function would be. Anyway, thanks > > for having clarified that disk structures are removed before the request > > queue is cleaned up. That's something I was missing. > > Maybe disk_release_mq? disk_release_mq() looks much better. Thanks, Ming
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 4ea22169b5186..faf8577578929 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -739,20 +739,6 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head) kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q); } -/* Unconfigure the I/O scheduler and dissociate from the cgroup controller. */ -static void blk_exit_queue(struct request_queue *q) -{ - /* - * Since the I/O scheduler exit code may access cgroup information, - * perform I/O scheduler exit before disassociating from the block - * cgroup controller. - */ - if (q->elevator) { - ioc_clear_queue(q); - elevator_exit(q); - } -} - /** * blk_release_queue - releases all allocated resources of the request_queue * @kobj: pointer to a kobject, whose container is a request_queue @@ -786,8 +772,6 @@ static void blk_release_queue(struct kobject *kobj) blk_stat_remove_callback(q, q->poll_cb); blk_stat_free_callback(q->poll_cb); - blk_exit_queue(q); - blk_free_queue_stats(q->stats); kfree(q->poll_stat); diff --git a/block/genhd.c b/block/genhd.c index ebf0e0be1c545..40ef013382872 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -29,6 +29,7 @@ #include "blk.h" #include "blk-mq-sched.h" #include "blk-rq-qos.h" +#include "blk-cgroup.h" static struct kobject *block_depr; @@ -1092,6 +1093,34 @@ static const struct attribute_group *disk_attr_groups[] = { NULL }; +static void blk_mq_release_queue(struct request_queue *q) +{ + blk_mq_cancel_work_sync(q); + + /* + * There can't be any non non-passthrough bios in flight here, but + * requests stay around longer, including passthrough ones so we + * still need to freeze the queue here. + */ + blk_mq_freeze_queue(q); + + /* + * Since the I/O scheduler exit code may access cgroup information, + * perform I/O scheduler exit before disassociating from the block + * cgroup controller. + */ + if (q->elevator) { + ioc_clear_queue(q); + + mutex_lock(&q->sysfs_lock); + blk_mq_sched_free_rqs(q); + elevator_exit(q); + mutex_unlock(&q->sysfs_lock); + } + + __blk_mq_unfreeze_queue(q, true); +} + /** * disk_release - releases all allocated resources of the gendisk * @dev: the device representing this disk @@ -1113,7 +1142,8 @@ static void disk_release(struct device *dev) might_sleep(); WARN_ON_ONCE(disk_live(disk)); - blk_mq_cancel_work_sync(disk->queue); + if (queue_is_mq(disk->queue)) + blk_mq_release_queue(disk->queue); /* * Remove all references to @q from the block cgroup controller before