Message ID | 20250613062909.2505759-3-dlemoal@kernel.org |
---|---|
State | Superseded |
Headers | show |
Series | Improve optimal IO size initialization | expand |
On 6/13/25 23:31, John Garry wrote: > On 13/06/2025 07:29, Damien Le Moal wrote: >> Introduce the helper function sd_set_io_opt() to set a disk io_opt >> limit. This new way of setting this limit falls back to using the >> max_sectors limit if the host does not define an optimal sector limit >> and the device did not indicate an optimal transfer size (e.g. as is >> the case for ATA devices). io_opt calculation is done using a local >> 64-bits variable to avoid overflows. The final value is clamped to >> UINT_MAX aligned down to the device physical block size. >> >> This fallback io_opt limit avoids setting up the disk with a zero >> io_opt limit, which result in the rather small 128 KB read_ahead_kb >> attribute. The larger read_ahead_kb value set with the default non-zero >> io_opt limit significantly improves buffered read performance with file >> systems without any intervention from the user. > > Out of curiosity, why do this just for sd.c and not always set up the > default like this in blk_validate_limits()? Good point. Though I think we do not want to have a large io_opt for slow devices like MMC/SD Cards. So something like this, which is indeed simpler than hacking lim->io_opt in sd.c. diff --git a/block/blk-settings.c b/block/blk-settings.c index a000daafbfb4..d3ec6f4100f4 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -58,16 +58,24 @@ EXPORT_SYMBOL(blk_set_stacking_limits); void blk_apply_bdi_limits(struct backing_dev_info *bdi, struct queue_limits *lim) { + u64 io_opt = lim->io_opt; + /* * For read-ahead of large files to be effective, we need to read ahead - * at least twice the optimal I/O size. + * at least twice the optimal I/O size. For rotational devices that do + * not report an optimal I/O size (e.g. ATA HDDs), use the maximum I/O + * size to avoid falling back to the (rather inefficient) small default + * read-ahead size. * * There is no hardware limitation for the read-ahead size and the user * might have increased the read-ahead size through sysfs, so don't ever * decrease it. */ + if (!io_opt && (lim->features & BLK_FEAT_ROTATIONAL)) + io_opt = lim->max_sectors; + bdi->ra_pages = max3(bdi->ra_pages, - lim->io_opt * 2 / PAGE_SIZE, + io_opt * 2 >> PAGE_SHIFT, VM_READAHEAD_PAGES); bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT; } I will make a proper patch of this and send it out as a replacement.
On 6/16/25 14:34, Damien Le Moal wrote: > On 6/13/25 23:31, John Garry wrote: >> On 13/06/2025 07:29, Damien Le Moal wrote: >>> Introduce the helper function sd_set_io_opt() to set a disk io_opt >>> limit. This new way of setting this limit falls back to using the >>> max_sectors limit if the host does not define an optimal sector limit >>> and the device did not indicate an optimal transfer size (e.g. as is >>> the case for ATA devices). io_opt calculation is done using a local >>> 64-bits variable to avoid overflows. The final value is clamped to >>> UINT_MAX aligned down to the device physical block size. >>> >>> This fallback io_opt limit avoids setting up the disk with a zero >>> io_opt limit, which result in the rather small 128 KB read_ahead_kb >>> attribute. The larger read_ahead_kb value set with the default non-zero >>> io_opt limit significantly improves buffered read performance with file >>> systems without any intervention from the user. >> >> Out of curiosity, why do this just for sd.c and not always set up the >> default like this in blk_validate_limits()? > > Good point. Though I think we do not want to have a large io_opt for slow > devices like MMC/SD Cards. So something like this, which is indeed simpler than > hacking lim->io_opt in sd.c. > > diff --git a/block/blk-settings.c b/block/blk-settings.c > index a000daafbfb4..d3ec6f4100f4 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -58,16 +58,24 @@ EXPORT_SYMBOL(blk_set_stacking_limits); > void blk_apply_bdi_limits(struct backing_dev_info *bdi, > struct queue_limits *lim) > { > + u64 io_opt = lim->io_opt; > + > /* > * For read-ahead of large files to be effective, we need to read ahead > - * at least twice the optimal I/O size. > + * at least twice the optimal I/O size. For rotational devices that do > + * not report an optimal I/O size (e.g. ATA HDDs), use the maximum I/O > + * size to avoid falling back to the (rather inefficient) small default > + * read-ahead size. > * > * There is no hardware limitation for the read-ahead size and the user > * might have increased the read-ahead size through sysfs, so don't ever > * decrease it. > */ > + if (!io_opt && (lim->features & BLK_FEAT_ROTATIONAL)) > + io_opt = lim->max_sectors; Oops... This should of course be: io_opt = (u64)lim->max_sectors << SECTOR_SHIFT; > + > bdi->ra_pages = max3(bdi->ra_pages, > - lim->io_opt * 2 / PAGE_SIZE, > + io_opt * 2 >> PAGE_SHIFT, > VM_READAHEAD_PAGES); > bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT; > } > > I will make a proper patch of this and send it out as a replacement. >
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index daddef2e9e87..8070356285a7 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3681,6 +3681,40 @@ static void sd_read_block_zero(struct scsi_disk *sdkp) kfree(buffer); } +/* + * Set the optimal I/O size: limit the default to the SCSI host optimal sector + * limit if it is set. There may be an impact on performance when the size of + * a request exceeds this host limit. If the host did not set any optimal + * sector limit and the device did not indicate an optimal transfer size + * (e.g. ATA devices), default to using the device max_sectors limit. + */ +static void sd_set_io_opt(struct scsi_disk *sdkp, unsigned int dev_max, + struct queue_limits *lim) +{ + struct scsi_device *sdp = sdkp->device; + struct Scsi_Host *shost = sdp->host; + u64 io_opt; + + io_opt = (u64)shost->opt_sectors << SECTOR_SHIFT; + if (sd_validate_opt_xfer_size(sdkp, dev_max)) + io_opt = min_not_zero(io_opt, + logical_to_bytes(sdp, sdkp->opt_xfer_blocks)); + if (io_opt) { + lim->io_opt = ALIGN_DOWN(min_t(u64, io_opt, UINT_MAX), + sdkp->physical_block_size - 1); + return; + } + + /* Set default */ + io_opt = (u64)lim->max_sectors << SECTOR_SHIFT; + lim->io_opt = ALIGN_DOWN(min_t(u64, io_opt, UINT_MAX), + sdkp->physical_block_size - 1); + + sd_first_printk(KERN_INFO, sdkp, + "Using default optimal transfer size of %u bytes\n", + lim->io_opt); +} + /** * sd_revalidate_disk - called the first time a new disk is seen, * performs disk spin up, read_capacity, etc. @@ -3777,16 +3811,7 @@ static int sd_revalidate_disk(struct gendisk *disk) else lim.io_min = 0; - /* - * Limit default to SCSI host optimal sector limit if set. There may be - * an impact on performance for when the size of a request exceeds this - * host limit. - */ - lim.io_opt = sdp->host->opt_sectors << SECTOR_SHIFT; - if (sd_validate_opt_xfer_size(sdkp, dev_max)) { - lim.io_opt = min_not_zero(lim.io_opt, - logical_to_bytes(sdp, sdkp->opt_xfer_blocks)); - } + sd_set_io_opt(sdkp, dev_max, &lim); sdkp->first_scan = 0;