| Message ID | 20211119195743.2817-1-bvanassche@acm.org |
|---|---|
| Series | UFS patches for kernel v5.17 |
On 11/30/21 7:40 AM, Bean Huo wrote:
> It is the test case in your commit message. If iodepth=1, there is no
> performance improvement. Increase the io-depth, for example, let
> iodepth = the CPU core count. I see that the interrupt overhead is
> significantly increased when the request is completed, so IO polling
> will win compared to interrupt mode.

Hi Bean,

It is not guaranteed that polling will result in a significant performance
improvement. I assume that on my test setup the improvement is so
significant because of the interrupt coalescing delay. Maybe the interrupt
coalescing delay is much smaller on your test setup.

Bart.
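(For context on what "polling" means here from the application side: once a driver exposes poll queues, a direct I/O read can opt into completion polling with the RWF_HIPRI flag to preadv2(), which busy-waits for the completion instead of sleeping until the completion interrupt fires. This is how polled I/O sidesteps interrupt coalescing latency. A minimal sketch; the device path and block size are placeholders, and the flag only takes effect for O_DIRECT I/O on a queue with poll queues configured:)

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder device node; adjust for the system under test. */
	int fd = open("/dev/sda", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT requires an aligned buffer. */
	void *buf;
	if (posix_memalign(&buf, 4096, 4096)) {
		close(fd);
		return 1;
	}
	struct iovec iov = { .iov_base = buf, .iov_len = 4096 };

	/*
	 * RWF_HIPRI asks the kernel to poll for the completion rather than
	 * wait for the completion interrupt.
	 */
	ssize_t ret = preadv2(fd, &iov, 1, 0, RWF_HIPRI);
	if (ret < 0)
		perror("preadv2");
	else
		printf("read %zd bytes (polled)\n", ret);

	free(buf);
	close(fd);
	return 0;
}
```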
On 11/30/21 12:43 AM, Bean Huo wrote:
> On Fri, 2021-11-19 at 11:57 -0800, Bart Van Assche wrote:
>> +	return IRQ_HANDLED;
>>  }
>>
>
> ufshcd_transfer_req_compl() will never return IRQ_NONE,
> but ufshcd_intr() needs this return value to detect a fake (spurious)
> interrupt.

I don't think that it is possible to implement polling and detect spurious
interrupts without affecting performance negatively. Hence the choice to
never return IRQ_NONE. Is spurious interrupt detection that important, or
is this only required for debugging host controllers? I consider helping
with hardware debugging as out of scope for the UFS driver.

>> @@ -9437,6 +9481,7 @@ int ufshcd_alloc_host(struct device *dev,
>> struct ufs_hba **hba_handle)
>>  		err = -ENOMEM;
>>  		goto out_error;
>>  	}
>> +	host->nr_maps = 3;
>
> I don't understand why this does not directly use HCTX_MAX_TYPES; 3 is
> already the maximum.

Even if new map types are introduced in the future, we will still need
only 3 maps. Hence the choice for '3' instead of HCTX_MAX_TYPES.

Thanks,

Bart.
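(For readers wondering what the three maps are: at this point HCTX_MAX_TYPES is 3, covering HCTX_TYPE_DEFAULT, HCTX_TYPE_READ and HCTX_TYPE_POLL, and those three are exactly what a polling-capable driver needs regardless of future additions. A sketch of the ->map_queues shape such a series implies for a controller with a single hardware queue; names and details are illustrative, not the exact patch under review:)

```c
#include <linux/blk-mq.h>
#include <scsi/scsi_host.h>

/*
 * Illustrative ->map_queues implementation for a SCSI LLD whose single
 * hardware queue also supports completion polling. The poll map shares
 * the one hardware queue; no dedicated read queues exist.
 */
static int example_map_queues(struct Scsi_Host *shost)
{
	int i;

	for (i = 0; i < shost->nr_maps; i++) {
		struct blk_mq_queue_map *map = &shost->tag_set.map[i];

		switch (i) {
		case HCTX_TYPE_DEFAULT:
		case HCTX_TYPE_POLL:
			/* One hardware queue backs both map types. */
			map->nr_queues = 1;
			break;
		case HCTX_TYPE_READ:
			/* No dedicated read queues. */
			map->nr_queues = 0;
			continue;
		default:
			WARN_ON_ONCE(true);
		}
		map->queue_offset = 0;
		blk_mq_map_queues(map);
	}

	return 0;
}
```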
On 30/11/2021 19:51, Bart Van Assche wrote:
> On 11/29/21 10:41 PM, Adrian Hunter wrote:
>> On 29/11/2021 21:32, Bart Van Assche wrote:
>>> * The code in blk_cleanup_queue() that waits for pending requests to
>>> finish before resources associated with the request queue are freed.
>>> ufshcd_remove() calls blk_cleanup_queue(hba->cmd_queue) and hence waits
>>> until pending device management commands have finished. That would no
>>> longer be the case if the block layer is bypassed to submit device
>>> management commands.
>>
>> cmd_queue is used only by the UFS driver, so if the driver is racing
>> with itself at "remove", then that should be fixed. The risk is not that
>> the UFS driver might use requests, but that it might still be operating
>> when hba or other resources get freed.
>>
>> So the question remains: for device commands we do not need the block
>> layer, nor SCSI commands, which still begs the question, why involve
>> them at all?
>
> By using the block layer request allocation functions, the block layer
> guarantees that each tag is in use in only one context. When bypassing
> the block layer, code would have to be inserted in ufshcd_exec_dev_cmd()
> and ufshcd_issue_devman_upiu_cmd() to serialize these functions.

They already are serialized, but you are essentially saying the
functionality being duplicated is just a lock. What you are proposing seems
awfully complicated just to get the functionality of a lock.

> In other words, we would be duplicating existing functionality if we
> bypass the block layer. The recommended approach in the Linux kernel is
> not to duplicate existing functionality.

More accurately, the functionality would not be used at all, so there is
not really any duplication.
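(The serialization Adrian alludes to already exists in ufshcd in the form of hba->dev_cmd.lock: only one device management command may be outstanding at a time, so a single mutex around issue-and-wait is enough to guarantee exclusive use of the slot. A simplified sketch of that pattern; the structure and function names here are illustrative, not the actual driver code:)

```c
#include <linux/mutex.h>

/* Illustrative stand-in for the part of struct ufs_hba that matters here. */
struct example_hba {
	struct mutex dev_cmd_lock;	/* serializes device commands */
	/* ... controller state ... */
};

static int example_exec_dev_cmd(struct example_hba *hba)
{
	int ret;

	/*
	 * With at most one device management command in flight, holding
	 * this mutex across issue and completion is all the tag-exclusivity
	 * the block layer would otherwise provide for this path.
	 */
	mutex_lock(&hba->dev_cmd_lock);
	ret = 0;	/* issue the command and wait for completion here */
	mutex_unlock(&hba->dev_cmd_lock);

	return ret;
}
```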
On 11/30/21 11:15 AM, Adrian Hunter wrote:
> On 30/11/2021 19:51, Bart Van Assche wrote:
>> By using the block layer request allocation functions, the block layer
>> guarantees that each tag is in use in only one context. When bypassing
>> the block layer, code would have to be inserted in
>> ufshcd_exec_dev_cmd() and ufshcd_issue_devman_upiu_cmd() to serialize
>> these functions.
>
> They already are serialized, but you are essentially saying the
> functionality being duplicated is just a lock. What you are proposing
> seems awfully complicated just to get the functionality of a lock.

Are you perhaps referring to hba->dev_cmd.lock? I had overlooked that lock
when I wrote my previous email. I will take a closer look at the reserved
slot approach.

Thanks,

Bart.
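(For reference, the "reserved slot" approach mentioned here maps onto an existing blk-mq facility: a tag set can declare reserved tags, and a request allocated with BLK_MQ_REQ_RESERVED is guaranteed to draw from that pool even when regular I/O saturates the queue, while the block layer still guarantees single-context tag use. A hedged sketch of the API shape, not the eventual ufshcd implementation; ops, nr_hw_queues and the rest of the tag set are assumed to be configured elsewhere:)

```c
#include <linux/blk-mq.h>

/* Carve one tag out of the queue depth for device management commands. */
static int example_setup_tag_set(struct blk_mq_tag_set *set)
{
	set->queue_depth = 32;
	set->reserved_tags = 1;	/* slot for device management commands */
	return blk_mq_alloc_tag_set(set);
}

static struct request *example_get_dev_cmd_request(struct request_queue *q)
{
	/*
	 * BLK_MQ_REQ_RESERVED allocates from the reserved pool, so this
	 * cannot starve behind regular I/O, and the tag lifetime rules of
	 * the block layer still apply.
	 */
	return blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
}
```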