Message ID | 20220226230435.38733-1-michael.christie@oracle.com |
---|---|
Series | iscsi: Speed up failover with lots of devices. |
Hey Lee,

On 2/26/22 5:04 PM, Mike Christie wrote:
> In:
>
> https://lore.kernel.org/all/CAK3e-EZbJMDHkozGiz8LnMNAZ+SoCA+QeK0kpkqM4vQ4pz86SQ@mail.gmail.com/t/
>
> Zhengyuan Liu found an issue where failovers are taking a long time
> with lots of devices (/dev/sdXYZ nodes). The problem is that iscsid
> expects most nl operations to be fast (ignoring mem issues) and when
> the session block code was written blocking a queue/scsi_device was
> just setting some flag bits and state values more or less. Now a block
> call will actually handle IO that has been sent to the driver, so it
> can be expensive. When you add in more and more devices, then a
> session block call will take longer and longer.
>
> This patchset moves the recovery and unbind operations to a per
> session work queue instead of the mix or per session, host and module.

If you get to the end of the patchset and wonder whether a patch is missing, one is :) I have one more patchset that is related to this, but it is not required to handle Zhengyuan Liu's issue.

I think I can also kill iscsi_conn_cleanup_workq and use the iscsi wq instead, but I want to think about it some more and test it out. Since it's not needed to handle the issue in the linked thread, it should be ok to do separately. It might just be a simple matter of killing iscsi_conn_cleanup_workq and using the iscsi wq, or I might be able to be more invasive and drop some code. I'm not sure yet.
Mike,

> Zhengyuan Liu found an issue where failovers are taking a long time
> with lots of devices (/dev/sdXYZ nodes). The problem is that iscsid
> expects most nl operations to be fast (ignoring mem issues) and when
> the session block code was written blocking a queue/scsi_device was
> just setting some flag bits and state values more or less. Now a block
> call will actually handle IO that has been sent to the driver, so it
> can be expensive. When you add in more and more devices, then a
> session block call will take longer and longer.
>
> This patchset moves the recovery and unbind operations to a per
> session work queue instead of the mix or per session, host and module.

Applied to 5.18/scsi-staging, thanks!
On Sat, 26 Feb 2022 17:04:29 -0600, Mike Christie wrote:
> In:
>
> https://lore.kernel.org/all/CAK3e-EZbJMDHkozGiz8LnMNAZ+SoCA+QeK0kpkqM4vQ4pz86SQ@mail.gmail.com/t/
>
> Zhengyuan Liu found an issue where failovers are taking a long time
> with lots of devices (/dev/sdXYZ nodes). The problem is that iscsid
> expects most nl operations to be fast (ignoring mem issues) and when
> the session block code was written blocking a queue/scsi_device was
> just setting some flag bits and state values more or less. Now a block
> call will actually handle IO that has been sent to the driver, so it
> can be expensive. When you add in more and more devices, then a
> session block call will take longer and longer.
>
> [...]

Applied to 5.18/scsi-queue, thanks!

[1/6] scsi: iscsi: Fix recovery and ublocking race.
      https://git.kernel.org/mkp/scsi/c/8dd3dff3bf3e
[2/6] scsi: iscsi: Speed up session unblocking and removal.
      https://git.kernel.org/mkp/scsi/c/b07c348f8ffb
[3/6] scsi: iscsi: Remove iscsi_scan_finished.
      https://git.kernel.org/mkp/scsi/c/d8ec5d67b8bb
[4/6] scsi: iscsi, ql4: Use per session workqueue for unbinding.
      https://git.kernel.org/mkp/scsi/c/5842ea366831
[5/6] scsi: iscsi: Use the session workqueue for recovery.
      https://git.kernel.org/mkp/scsi/c/7cb6683ce761
[6/6] scsi: iscsi: Drop temp workq_name.
      https://git.kernel.org/mkp/scsi/c/69af1c9577aa