Message ID | be78dc2cfeecaafd171060fbebda2d268d2a94e5.camel@suse.com |
---|---|
State | New |
Headers | show |
Series | mpt3sas fails to allocate budget_map and detects no devices | expand |
On Thu, 2022-01-06 at 23:41 +0800, Ming Lei wrote: > On Thu, Jan 06, 2022 at 03:22:53PM +0000, Martin Wilck wrote: > > > > > > I'd suggest to fix mpt3sas for avoiding this memory waste. > > > > Let's wait for Sreekanth's comment on that. > > > > mpt3sas is not the only driver using a low value. Qlogic drivers > > set > > cmd_per_lun=3, for example (with 3, our logic would use shift=6, so > > the > > issue I observed wouldn't occur - but it would be prone to cache > > line > > bouncing). > > But qlogic has smaller .can_queue which looks at most 512, .can_queue > is > the depth for allocating sbitmap, since each sdev->queue_depth is <= > .can_queue. I'm seeing here (on an old kernel, admittedly) cmd_per_lun=3 and can_queue=2038 for qla2xxx and cmd_per_lun=3 and can_queue=5884 for lpfc. Both drivers change the queue depth for devices to 64 in their slave_configure() methods. Many drivers do this, as it's recommended in scsi_host.h. That's quite bad in view of the current bitmap allocation logic - we lay out the bitmap assuming the depth used will be cmd_per_lun, but that doesn't match the actual depth when the device comes online. For qla2xxx, it means that we'd allocate the sbitmap with shift=6 (64 bits per word), thus using just a single cache line for 64 requests. Shift=4 (16 bits per word) would be the default shift for depth 64. Am I misreading the code? Perhaps we should only allocate a preliminary sbitmap in scsi_alloc_sdev, and reallocate it after slave_configure() has been called, to get the shift right for the driver's default settings? Thanks, Martin
On Mon, 2022-01-10 at 10:59 +0800, Ming Lei wrote: > > Hello Martin Wilck, > > Can you test the following change and report back the result? > > From 480a61a85e9669d3487ebee8db3d387df79279fc Mon Sep 17 00:00:00 > 2001 > From: Ming Lei <ming.lei@redhat.com> > Date: Mon, 10 Jan 2022 10:26:59 +0800 > Subject: [PATCH] scsi: core: reallocate scsi device's budget map if > default > queue depth is changed > > Martin reported that sdev->queue_depth can often be changed in > ->slave_configure(), and now we uses ->cmd_per_lun as initial queue > depth for setting up sdev->budget_map. And some extreme ->cmd_per_lun > or ->can_queue won't be used at default actually, if we they are used > to allocate sdev->budget_map, huge memory may be consumed just > because > of bad ->cmd_per_lun. > > Fix the issue by reallocating sdev->budget_map after - > >slave_configure() > returns, at that time, queue_depth should be much more reasonable. > > Reported-by: Martin Wilck <martin.wilck@suse.com> > Suggested-by: Martin Wilck <martin.wilck@suse.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> This looks good. I added a few pr_infos, and for the strange mpt3sas devices I reported, I get this: # first allocation with depth=7 (cmds_per_lun) Jan 12 17:05:52 localhost kernel: scsi_realloc_sdev_budget_map: 7 0->0 (these numbers are: depth old_shift->new_shift) Jan 12 17:05:52 localhost kernel: scsi_realloc_sdev_budget_map: map_nr = 1024 # after slave_alloc() with depth 254 Jan 12 17:05:52 localhost kernel: scsi_realloc_sdev_budget_map: 254 0->5 Jan 12 17:05:52 localhost kernel: scsi_realloc_sdev_budget_map: map_nr = 32 So the depth changed from 7 to 254, the shift from 0 to 5, and the memory size of the sbitmap was reduced by a factor of 32. Nice! Tested-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Martin Wilck <mwilck@suse.com> > --- > drivers/scsi/scsi_scan.c | 56 ++++++++++++++++++++++++++++++++++++-- > -- > 1 file changed, 51 insertions(+), 5 deletions(-) > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > index 23e1c0acdeae..9593c9111611 100644 > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -214,6 +214,48 @@ static void scsi_unlock_floptical(struct > scsi_device *sdev, > SCSI_TIMEOUT, 3, NULL); > } > > +static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev, > + unsigned int depth) > +{ > + int new_shift = sbitmap_calculate_shift(depth); > + bool need_alloc = !sdev->budget_map.map; > + bool need_free = false; > + int ret; > + struct sbitmap sb_back; > + > + /* > + * realloc if new shift is calculated, which is caused by > setting > + * up one new default queue depth after calling - > >slave_configure > + */ > + if (!need_alloc && new_shift != sdev->budget_map.shift) > + need_alloc = need_free = true; > + > + if (!need_alloc) > + return 0; > + > + /* > + * Request queue has to be freezed for reallocating budget > map, > + * and here disk isn't added yet, so freezing is pretty fast > + */ > + if (need_free) { > + blk_mq_freeze_queue(sdev->request_queue); > + sb_back = sdev->budget_map; > + } > + ret = sbitmap_init_node(&sdev->budget_map, > + scsi_device_max_queue_depth(sdev), > + new_shift, GFP_KERNEL, > + sdev->request_queue->node, false, > true); > + if (need_free) { > + if (ret) > + sdev->budget_map = sb_back; > + else > + sbitmap_free(&sb_back); > + ret = 0; > + blk_mq_unfreeze_queue(sdev->request_queue); > + } > + return ret; > +} > + > /** > * scsi_alloc_sdev - allocate and setup a scsi_Device > * @starget: which target to allocate a &scsi_device for > @@ -306,11 +348,7 @@ static struct scsi_device > *scsi_alloc_sdev(struct scsi_target *starget, > * default device queue depth to figure out sbitmap shift > * since we use this queue depth most of times. > */ > - if (sbitmap_init_node(&sdev->budget_map, > - scsi_device_max_queue_depth(sdev), > - sbitmap_calculate_shift(depth), > - GFP_KERNEL, sdev->request_queue- > >node, > - false, true)) { > + if (scsi_realloc_sdev_budget_map(sdev, depth)) { > put_device(&starget->dev); > kfree(sdev); > goto out; > @@ -1017,6 +1055,14 @@ static int scsi_add_lun(struct scsi_device > *sdev, unsigned char *inq_result, > } > return SCSI_SCAN_NO_RESPONSE; > } > + > + /* > + * queue_depth is often changed in ->slave_configure, > so > + * setup budget map again for getting better memory > uses > + * since memory consumption of the map depends on > queue > + * depth heavily > + */ > + scsi_realloc_sdev_budget_map(sdev, sdev- > >queue_depth); > } > > if (sdev->scsi_level >= SCSI_3) > -- > 2.31.1 > > >
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 23e1c0acdeae..3e51f8183b42 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -231,7 +231,8 @@ static void scsi_unlock_floptical(struct scsi_device *sdev, static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, u64 lun, void *hostdata) { - unsigned int depth; + static const unsigned int BITMAP_CHUNKS = 16; + unsigned int depth, sb_map_chunks, sb_depth; struct scsi_device *sdev; struct request_queue *q; int display_failure_msg = 1, ret; @@ -298,17 +299,26 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, __scsi_init_queue(sdev->host, q); WARN_ON_ONCE(!blk_get_queue(q)); - depth = sdev->host->cmd_per_lun ?: 1; - /* * Use .can_queue as budget map's depth because we have to * support adjusting queue depth from sysfs. Meantime use * default device queue depth to figure out sbitmap shift * since we use this queue depth most of times. + * Avoid extremely small shift if cmd_per_lun is small but + * greater than 3 (see sbitmap_calculate_shift()). + * Assume that usually no more than BITMAP_CHUNKS + * CPUs will access this bitmap simultaneously. */ + depth = sdev->host->cmd_per_lun ?: 1; + sb_map_chunks = max_t(unsigned int, 1U, + min_t(unsigned int, nr_cpu_ids, BITMAP_CHUNKS)); + sb_depth = max_t(unsigned int, + scsi_device_max_queue_depth(sdev) / sb_map_chunks, + depth); + if (sbitmap_init_node(&sdev->budget_map, scsi_device_max_queue_depth(sdev), - sbitmap_calculate_shift(depth), + sbitmap_calculate_shift(sb_depth), GFP_KERNEL, sdev->request_queue->node, false, true)) { put_device(&starget->dev);