Message ID: 20220218184157.176457-1-melanieplageman@gmail.com
Series: Add SCSI per device tagsets
On Fri, Feb 18, 2022 at 06:41:52PM +0000, Melanie Plageman (Microsoft) wrote:
> Currently a single blk_mq_tag_set is associated with a Scsi_Host. When SCSI
> controllers are limited, attaching multiple devices to the same controller
> is required. In cloud environments with relatively high latency persistent
> storage, requiring all devices on a controller to share a single
> blk_mq_tag_set negatively impacts performance.

So add more controllers instead of completely breaking the architecture.
On 18/02/2022 18:41, Melanie Plageman (Microsoft) wrote:
> For example: a device provisioned with high IOPS and BW limits on the same
> controller as a smaller and slower device can starve the slower device of
> tags. This is especially noticeable when the slower device's workload has
> low I/O depth tasks.

If you check hctx_may_queue() in the block code, you will notice that a fair
share of the HW queue depth is allocated per request queue to ensure we don't
get starvation.

Thanks,
John
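[A minimal userspace sketch of the fair-share arithmetic John refers to; this is an illustration only, not kernel code, and fair_share() is a hypothetical name invented for the example. The shared tag depth is ceiling-divided among the active queues, with a floor of four tags, so no queue is starved outright.]

	#include <stdio.h>

	/*
	 * Fair-share cap per active queue, mirroring the calculation in
	 * hctx_may_queue(): depth = max(ceil(total / users), 4).
	 */
	static unsigned int fair_share(unsigned int total_depth,
				       unsigned int users)
	{
		unsigned int share = (total_depth + users - 1) / users;

		return share > 4U ? share : 4U;
	}

	int main(void)
	{
		/* Two devices sharing a 128-tag set each get up to 64 tags. */
		printf("%u\n", fair_share(128, 2));
		/* With 64 active queues, the floor of 4 tags applies. */
		printf("%u\n", fair_share(128, 64));
		return 0;
	}

[Note that the ceiling divide means the per-queue shares can sum to slightly more than the pool; the kernel applies this as a soft per-queue cap over one shared tag bitmap, not as a hard partition.]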
On 2/18/22 10:41, Melanie Plageman (Microsoft) wrote:
> Currently a single blk_mq_tag_set is associated with a Scsi_Host. When SCSI
> controllers are limited, attaching multiple devices to the same controller
> is required. In cloud environments with relatively high latency persistent
> storage, requiring all devices on a controller to share a single
> blk_mq_tag_set negatively impacts performance.
>
> For example: a device provisioned with high IOPS and BW limits on the same
> controller as a smaller and slower device can starve the slower device of
> tags. This is especially noticeable when the slower device's workload has
> low I/O depth tasks.

The Cc-list for this patch series is way too long. Cc-ing linux-scsi and the
most active SCSI contributors would have been sufficient.

Is the reported behavior reproducible with an upstream Linux kernel? I'm
asking this because I think that the following block layer code should
prevent the reported starvation:

	/*
	 * For shared tag users, we track the number of currently active users
	 * and attempt to provide a fair share of the tag depth for each of them.
	 */
	static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
					  struct sbitmap_queue *bt)
	{
		unsigned int depth, users;

		if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
			return true;

		/*
		 * Don't try dividing an ant
		 */
		if (bt->sb.depth == 1)
			return true;

		if (blk_mq_is_shared_tags(hctx->flags)) {
			struct request_queue *q = hctx->queue;

			if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
				return true;
		} else {
			if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
				return true;
		}

		users = atomic_read(&hctx->tags->active_queues);

		if (!users)
			return true;

		/*
		 * Allow at least some tags
		 */
		depth = max((bt->sb.depth + users - 1) / users, 4U);

		return __blk_mq_active_requests(hctx) < depth;
	}

Bart.
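[To make the effect of the quoted code concrete, here is a standalone simulation; it is a deliberately simplified model, not kernel code, and names like get_tag() and fair_depth() are invented for the example. A greedy queue is capped at its fair share, so a second, slower queue can still obtain tags.]

	#include <stdio.h>

	#define TAG_DEPTH 16U	/* shared tag pool size */
	#define NR_QUEUES 2U	/* active request queues sharing it */

	static unsigned int in_flight[NR_QUEUES];
	static unsigned int tags_used;

	/* Fair-share cap from hctx_may_queue(): max(ceil(depth/users), 4). */
	static unsigned int fair_depth(void)
	{
		unsigned int share = (TAG_DEPTH + NR_QUEUES - 1) / NR_QUEUES;

		return share > 4U ? share : 4U;
	}

	/* Take one tag for queue @q; fail if the pool or fair share is spent. */
	static int get_tag(unsigned int q)
	{
		if (tags_used == TAG_DEPTH || in_flight[q] >= fair_depth())
			return -1;
		tags_used++;
		in_flight[q]++;
		return 0;
	}

	int main(void)
	{
		unsigned int i, got = 0;

		/* Queue 0 (the fast device) greedily requests every tag... */
		for (i = 0; i < TAG_DEPTH; i++)
			if (get_tag(0) == 0)
				got++;
		/* ...but is capped at its fair share, leaving room for queue 1. */
		printf("queue 0 holds %u of %u tags\n", got, TAG_DEPTH);
		printf("queue 1 can still take a tag: %s\n",
		       get_tag(1) == 0 ? "yes" : "no");
		return 0;
	}

[With a 16-tag pool and two active queues, the greedy queue ends up holding 8 tags and the second queue still gets one, which is the starvation protection Bart points at.]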