Message ID | 20250403211937.2225615-1-bvanassche@acm.org |
---|---|
Headers | show |
Series | Optimize the hot path in the UFS driver | expand |
On 03/04/2025 22:17, Bart Van Assche wrote: > Hi Martin, > > This patch series increases IOPS by 1% and reduces CPU per I/O by 10% on my > UFS 4.0 test setup. Please consider this patch series for the next merge > window. That cover letter is a bit thin for something which now implements SCSI reserved command handling (in addition to any UFS optimisation). > > Thanks, > > Bart. > > Bart Van Assche (23): > scsi: core: Make scsi_cmd_to_rq() accept const arguments > scsi: core: Make scsi_cmd_priv() accept const arguments > scsi: core: Use scsi_cmd_priv() instead of open-coding it > scsi: core: Introduce scsi_host_update_can_queue() > scsi: ufs: core: Change the type of one ufshcd_add_cmd_upiu_trace() > argument > scsi: ufs: core: Only call ufshcd_add_command_trace() for SCSI > commands > scsi: ufs: core: Change the type of one ufshcd_add_command_trace() > argument > scsi: ufs: core: Change the type of one ufshcd_send_command() argument > scsi: ufs: core: Only call ufshcd_should_inform_monitor() for SCSI > commands > scsi: ufs: core: Change the monitor function argument types > scsi: ufs: core: Rework ufshcd_mcq_compl_pending_transfer() > scsi: ufs: core: Rework ufshcd_eh_device_reset_handler() > scsi: ufs: core: Cache the DMA buffer sizes > scsi: ufs: core: Add an argument to ufshcd_mcq_decide_queue_depth() > scsi: ufs: core: Add an argument to ufshcd_alloc_mcq() > scsi: ufs: core: Call ufshcd_mcq_init() once > scsi: ufs: core: Allocate the SCSI host earlier > scsi: ufs: core: Call ufshcd_init_lrb() later > scsi: ufs: core: Use hba->reserved_slot > scsi: ufs: core: Allocate the reserved slot as a reserved request > scsi: ufs: core: Do not clear driver-private command data > scsi: ufs: core: Optimize the hot path > scsi: ufs: core: Remove the ufshcd_lrb task_tag member > > Hannes Reinecke (1): > scsi: core: Implement reserved command handling > > drivers/scsi/hosts.c | 3 + > drivers/scsi/scsi.c | 26 ++ > drivers/scsi/scsi_lib.c | 6 +- > drivers/scsi/scsi_logging.c | 10 +- > drivers/ufs/core/ufs-mcq.c | 31 +- > drivers/ufs/core/ufshcd-crypto.h | 18 +- > drivers/ufs/core/ufshcd-priv.h | 27 +- > drivers/ufs/core/ufshcd.c | 660 +++++++++++++++++-------------- > include/scsi/scsi_cmnd.h | 17 +- > include/scsi/scsi_host.h | 24 +- > include/ufs/ufshcd.h | 11 +- > 11 files changed, 487 insertions(+), 346 deletions(-) > >
On 4/4/25 3:15 AM, John Garry wrote: > On 03/04/2025 22:17, Bart Van Assche wrote: >> Hi Martin, >> >> This patch series increases IOPS by 1% and reduces CPU per I/O by 10% >> on my >> UFS 4.0 test setup. Please consider this patch series for the next merge >> window. > > That cover letter is a bit thin for something which now implements SCSI > reserved command handling (in addition to any UFS optimisation). Hi John, Here is a more detailed version: This patch series optimizes the hot path of the UFS driver by making struct scsi_cmnd and struct ufshcd_lrb adjacent. Making these two data structures adjacent is realized as follows: @@ -9040,6 +9046,7 @@ static const struct scsi_host_template ufshcd_driver_template = { .name = UFSHCD, .proc_name = UFSHCD, .map_queues = ufshcd_map_queues, + .cmd_size = sizeof(struct ufshcd_lrb), .init_cmd_priv = ufshcd_init_cmd_priv, .queuecommand = ufshcd_queuecommand, .mq_poll = ufshcd_poll, The following changes had to be made prior to making these two data structures adjacent: * Instead of making the reserved command slot (hba->reserved_slot) invisible to the SCSI core, let the SCSI core allocate memory for the reserved slot and tell the block layer to reserve one slot. This is why patch 04/24 adds minimal support in the SCSI core for reserved command handling. * Remove all UFS data structure members that are no longer needed because struct scsi_cmnd and struct ufshcd_lrb are now adjacent * Call ufshcd_init_lrb() from inside ufshcd_queuecommand() instead of calling this function before I/O starts. This is necessary because ufshcd_memory_alloc() allocates fewer instances than the block layer allocates requests. See also the following code in the block layer core: if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx, hctx->numa_node)) Although the UFS driver could be modified such that ufshcd_init_lrb() is called from ufshcd_init_cmd_priv(), realizing this would require moving the memory allocations that happen from inside ufshcd_memory_alloc() into ufshcd_init_cmd_priv(). That would make this patch series even larger. * ufshcd_add_scsi_host() happens now before any device management commands are submitted. This change is necessary because this patch makes device management command allocation happen when the SCSI host is allocated. * Start with allocating 32 command slots and increase this number later after it is clear whether or not the UFS device supports more than 32 command slots. Introduce scsi_host_update_can_queue() to support this approach. Bart.
> On 4/4/25 3:15 AM, John Garry wrote: > > On 03/04/2025 22:17, Bart Van Assche wrote: > >> Hi Martin, > >> > >> This patch series increases IOPS by 1% and reduces CPU per I/O by 10% > >> on my UFS 4.0 test setup. Please consider this patch series for the > >> next merge window. > > > > That cover letter is a bit thin for something which now implements > > SCSI reserved command handling (in addition to any UFS optimisation). > Hi John, > > Here is a more detailed version: > > This patch series optimizes the hot path of the UFS driver by making struct > scsi_cmnd and struct ufshcd_lrb adjacent. Making these two data structures > adjacent is realized as follows: > > @@ -9040,6 +9046,7 @@ static const struct scsi_host_template > ufshcd_driver_template = { > .name = UFSHCD, > .proc_name = UFSHCD, > .map_queues = ufshcd_map_queues, > + .cmd_size = sizeof(struct ufshcd_lrb), > .init_cmd_priv = ufshcd_init_cmd_priv, > .queuecommand = ufshcd_queuecommand, > .mq_poll = ufshcd_poll, > > The following changes had to be made prior to making these two data > structures adjacent: > * Instead of making the reserved command slot (hba->reserved_slot) > invisible to the SCSI core, let the SCSI core allocate memory for the > reserved slot and tell the block layer to reserve one slot. This is > why patch 04/24 adds minimal support in the SCSI core for reserved > command handling. > * Remove all UFS data structure members that are no longer needed > because struct scsi_cmnd and struct ufshcd_lrb are now adjacent > * Call ufshcd_init_lrb() from inside ufshcd_queuecommand() instead of > calling this function before I/O starts. This is necessary because > ufshcd_memory_alloc() allocates fewer instances than the block layer > allocates requests. See also the following code in the block layer > core: > > if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx, > hctx->numa_node)) > > Although the UFS driver could be modified such that ufshcd_init_lrb() > is called from ufshcd_init_cmd_priv(), realizing this would require > moving the memory allocations that happen from inside > ufshcd_memory_alloc() into ufshcd_init_cmd_priv(). That would make > this patch series even larger. > * ufshcd_add_scsi_host() happens now before any device management > commands are submitted. This change is necessary because this patch > makes device management command allocation happen when the SCSI host > is allocated. > * Start with allocating 32 command slots and increase this number later > after it is clear whether or not the UFS device supports more than 32 > command slots. Introduce scsi_host_update_can_queue() to support this > approach. Maybe add one extra sentence to your commit log, now that ufshcd_init_lrb() is called on each command: The benefits of reduced indirection and better cache efficiency outweigh the small overhead of per-command lrb initialization in most high-throughput workloads. Thanks, Avri > > Bart.