Message ID | 20220308002747.122682-9-michael.christie@oracle.com |
---|---|
State | Superseded |
Series | misc iscsi patches |
On 2022/3/8 8:27, Mike Christie wrote:
> The patch:
>
> commit 5923d64b7ab6 ("scsi: libiscsi: Drop taskqueuelock")
>
> added an extra task->state because for
>
> commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix
> list corruption regression")
>
> we didn't know why we ended up with cmds on the list and thought it
> might have been a bad target sending a response while we were still
> sending the cmd. We were never able to get a target to send us a response
> early, because it turns out the bug was just a race in libiscsi/
> libiscsi_tcp where
>
> 1. iscsi_tcp_r2t_rsp queues a r2t to tcp_task->r2tqueue.
> 2. iscsi_tcp_task_xmit runs iscsi_tcp_get_curr_r2t and sees we have a r2t.
> It dequeues it and iscsi_tcp_task_xmit starts to process it.
> 3. iscsi_tcp_r2t_rsp runs iscsi_requeue_task and puts the task on the
> requeue list.
> 4. iscsi_tcp_task_xmit sends the data for r2t. This is the final chunk of
> data, so the cmd is done.
> 5. target sends the response.
> 6. On a different CPU from #3, iscsi_complete_task processes the response.
> Since there was no common lock for the list, the lists/tasks pointers are
> not fully in sync, so could end up with list corruption.
>
> Since it was just a race on our side, this patch removes the extra check
> and fixes up the comments.
>
> Reviewed-by: Lee Duncan <lduncan@suse.com>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/libiscsi.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 0a0076144874..5c74ab92725f 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -567,16 +567,19 @@ static bool cleanup_queued_task(struct iscsi_task *task)
>  	struct iscsi_conn *conn = task->conn;
>  	bool early_complete = false;
> 
> -	/* Bad target might have completed task while it was still running */
> +	/*
> +	 * We might have raced where we handled a R2T early and got a response
> +	 * but have not yet taken the task off the requeue list, then a TMF or
> +	 * recovery happened and so we can still see it here.
> +	 */
>  	if (task->state == ISCSI_TASK_COMPLETED)
>  		early_complete = true;
> 
>  	if (!list_empty(&task->running)) {
>  		list_del_init(&task->running);
>  		/*
> -		 * If it's on a list but still running, this could be from
> -		 * a bad target sending a rsp early, cleanup from a TMF, or
> -		 * session recovery.
> +		 * If it's on a list but still running this could be cleanup
> +		 * from a TMF or session recovery.
>  		 */
>  		if (task->state == ISCSI_TASK_RUNNING ||
>  		    task->state == ISCSI_TASK_COMPLETED)
> @@ -1484,7 +1487,7 @@ static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task,
>  	}
>  	/* regular RX path uses back_lock */
>  	spin_lock(&conn->session->back_lock);
> -	if (rc && task->state == ISCSI_TASK_RUNNING) {
> +	if (rc) {
>  		/*
>  		 * get an extra ref that is released next time we access it
>  		 * as conn->task above.

Reviewed-by: Wu Bo <wubo40@huawei.com>
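The numbered steps in the quoted commit message pin the bug on the requeue-list add in step 3 racing with the completion-path delete in step 6 on another CPU, with no common lock around the list. As a rough illustration only (a userspace sketch with simplified list helpers and one hand-picked interleaving of the stores; the real list_add_tail()/list_del_init() order their writes differently, and none of the code below is the driver's), here is how the list head and the task can end up disagreeing:

```c
/*
 * Userspace model of the "lists/tasks pointers are not fully in sync"
 * state from the commit message.  Deterministic: the stores from the two
 * CPUs are written out by hand in one possible interleaving.
 */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

int main(void)
{
	struct list_head requeue_list = { &requeue_list, &requeue_list };
	struct list_head task_running = { &task_running, &task_running };

	/* CPU A (step 3): iscsi_requeue_task() starts adding the task to the
	 * requeue list and gets as far as setting up the node's pointers.   */
	task_running.prev = requeue_list.prev;
	task_running.next = &requeue_list;

	/* CPU B (step 6): the response arrives and the completion path runs
	 * its list_del_init(&task->running) to completion in the middle.    */
	task_running.prev->next = task_running.next;
	task_running.next->prev = task_running.prev;
	task_running.next = &task_running;
	task_running.prev = &task_running;

	/* CPU A resumes and finishes the add using its stale view. */
	requeue_list.prev->next = &task_running;
	requeue_list.prev = &task_running;

	/*
	 * Result: the requeue list head points at the task, but the task's
	 * own pointers say it is on no list.  list_empty(&task->running)
	 * reports true while the head still references the node, and the
	 * next add or delete on either structure corrupts things further.
	 */
	printf("head: next=%p prev=%p\n",
	       (void *)requeue_list.next, (void *)requeue_list.prev);
	printf("task (%p): next=%p prev=%p\n",
	       (void *)&task_running,
	       (void *)task_running.next, (void *)task_running.prev);
	return 0;
}
```

Traversing the requeue list from the head now loops on the task forever and never gets back to the head, which is the list corruption the commit message describes.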
On 3/7/22 16:27, Mike Christie wrote:
> The patch:
>
> commit 5923d64b7ab6 ("scsi: libiscsi: Drop taskqueuelock")
>
> added an extra task->state because for
>
> commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix
> list corruption regression")
>
> we didn't know why we ended up with cmds on the list and thought it
> might have been a bad target sending a response while we were still
> sending the cmd. We were never able to get a target to send us a response
> early, because it turns out the bug was just a race in libiscsi/
> libiscsi_tcp where
>
> 1. iscsi_tcp_r2t_rsp queues a r2t to tcp_task->r2tqueue.
> 2. iscsi_tcp_task_xmit runs iscsi_tcp_get_curr_r2t and sees we have a r2t.
> It dequeues it and iscsi_tcp_task_xmit starts to process it.
> 3. iscsi_tcp_r2t_rsp runs iscsi_requeue_task and puts the task on the
> requeue list.
> 4. iscsi_tcp_task_xmit sends the data for r2t. This is the final chunk of
> data, so the cmd is done.
> 5. target sends the response.
> 6. On a different CPU from #3, iscsi_complete_task processes the response.
> Since there was no common lock for the list, the lists/tasks pointers are
> not fully in sync, so could end up with list corruption.
>
> Since it was just a race on our side, this patch removes the extra check
> and fixes up the comments.
>
> Reviewed-by: Lee Duncan <lduncan@suse.com>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/libiscsi.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 0a0076144874..5c74ab92725f 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -567,16 +567,19 @@ static bool cleanup_queued_task(struct iscsi_task *task)
>  	struct iscsi_conn *conn = task->conn;
>  	bool early_complete = false;
> 
> -	/* Bad target might have completed task while it was still running */
> +	/*
> +	 * We might have raced where we handled a R2T early and got a response
> +	 * but have not yet taken the task off the requeue list, then a TMF or
> +	 * recovery happened and so we can still see it here.
> +	 */
>  	if (task->state == ISCSI_TASK_COMPLETED)
>  		early_complete = true;
> 
>  	if (!list_empty(&task->running)) {
>  		list_del_init(&task->running);
>  		/*
> -		 * If it's on a list but still running, this could be from
> -		 * a bad target sending a rsp early, cleanup from a TMF, or
> -		 * session recovery.
> +		 * If it's on a list but still running this could be cleanup
> +		 * from a TMF or session recovery.
>  		 */
>  		if (task->state == ISCSI_TASK_RUNNING ||
>  		    task->state == ISCSI_TASK_COMPLETED)
> @@ -1484,7 +1487,7 @@ static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task,
>  	}
>  	/* regular RX path uses back_lock */
>  	spin_lock(&conn->session->back_lock);
> -	if (rc && task->state == ISCSI_TASK_RUNNING) {
> +	if (rc) {
>  		/*
>  		 * get an extra ref that is released next time we access it
>  		 * as conn->task above.

Reviewed-by: Lee Duncan <lduncan@suse.com>
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0a0076144874..5c74ab92725f 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -567,16 +567,19 @@ static bool cleanup_queued_task(struct iscsi_task *task)
 	struct iscsi_conn *conn = task->conn;
 	bool early_complete = false;
 
-	/* Bad target might have completed task while it was still running */
+	/*
+	 * We might have raced where we handled a R2T early and got a response
+	 * but have not yet taken the task off the requeue list, then a TMF or
+	 * recovery happened and so we can still see it here.
+	 */
 	if (task->state == ISCSI_TASK_COMPLETED)
 		early_complete = true;
 
 	if (!list_empty(&task->running)) {
 		list_del_init(&task->running);
 		/*
-		 * If it's on a list but still running, this could be from
-		 * a bad target sending a rsp early, cleanup from a TMF, or
-		 * session recovery.
+		 * If it's on a list but still running this could be cleanup
+		 * from a TMF or session recovery.
 		 */
 		if (task->state == ISCSI_TASK_RUNNING ||
 		    task->state == ISCSI_TASK_COMPLETED)
@@ -1484,7 +1487,7 @@ static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task,
 	}
 	/* regular RX path uses back_lock */
 	spin_lock(&conn->session->back_lock);
-	if (rc && task->state == ISCSI_TASK_RUNNING) {
+	if (rc) {
 		/*
 		 * get an extra ref that is released next time we access it
 		 * as conn->task above.
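For contrast with the unlocked interleaving sketched earlier, here is a minimal sketch of the same two list operations when both run under one lock. The pthread mutex stands in for the single per-session lock that serializes libiscsi's task-list updates; the helper and thread names are illustrative, not the driver's:

```c
/* Build with: cc -pthread locked_sketch.c */
#include <pthread.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry;
	entry->prev = entry;
}

static struct list_head requeue_list = { &requeue_list, &requeue_list };
static struct list_head task_running = { &task_running, &task_running };
static pthread_mutex_t session_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models the R2T path requeueing the task (step 3 above). */
static void *requeue_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&session_lock);
	list_add_tail(&task_running, &requeue_list);
	pthread_mutex_unlock(&session_lock);
	return NULL;
}

/* Models the completion path taking the task back off whatever list it
 * is on (list_del_init() on an already-detached node is a no-op).     */
static void *completion_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&session_lock);
	list_del_init(&task_running);
	pthread_mutex_unlock(&session_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, requeue_side, NULL);
	pthread_create(&t2, NULL, completion_side, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/*
	 * Whichever thread wins the lock, each operation sees a consistent
	 * list: the task ends up either cleanly on the requeue list (head
	 * and node point at each other) or cleanly off it (both self-looped).
	 * The mixed state from the unlocked sketch cannot occur.
	 */
	printf("head: next=%p prev=%p  task (%p): next=%p prev=%p\n",
	       (void *)requeue_list.next, (void *)requeue_list.prev,
	       (void *)&task_running,
	       (void *)task_running.next, (void *)task_running.prev);
	return 0;
}
```

With the list manipulation serialized like this, a completed task can still be found on a list only through the TMF/recovery window that the patch's new comment in cleanup_queued_task() describes, not because a target completed it early.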