Message ID | 20250220130546.2289555-2-yangxingui@huawei.com |
---|---|
State | New |
Headers | show |
Series | scsi: hisi_sas: Fixed IO error caused by port id not updated | expand |
Hi, John On 2025/2/21 1:35, John Garry wrote: > On 20/02/2025 13:05, Xingui Yang wrote: > > -john.garry@huawei.com (this has not worked in over 2 years ...) Sorry, I used the wrong one. > >> the SAS controller determines the disk to which I/Os are delivered based >> on the port id in the DQ entry when SATA disk directly connected. >> >> When many phys were disconnected immediately and connected again during >> I/O sending and port id of phys were changed and used by other link, I/O >> may be sent to incorrect disk and data inconsistency on the SATA disk may > > > So is the disk reported gone (from libsas point-of-view) after you > unplug? If not, why not? The problem may occur in a scenario where multiple SATA disks are inserted almost at the same time. When phy reset is executed in error processing, other phys are also up, which may cause the hw port id corresponding to the phy to change. The log is as follows: [ 4588.608924] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy2 link_rate=10(sata) [ 4588.609039] sas: phy-8:2 added to port-8:4, phy_mask:0x4 (5000000000000802) [ 4588.609267] sas: DOING DISCOVERY on port 4, pid:69294 [ 4588.609276] hisi_sas_v3_hw 0000:b4:02.0: dev[13:5] found [ 4588.671362] sas: ata40: end_device-8:4: dev error handler [ 4588.846387] hisi_sas_v3_hw 0000:b4:02.0: phydown: phy2 phy_state=0xc3 // phy2's hw port id assign by chip is released [ 4588.846393] hisi_sas_v3_hw 0000:b4:02.0: ignore flutter phy2 down [ 4588.919837] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy3 link_rate=10(sata) // phy3 is assigned the hw port id previously used by phy2 [ 4589.029656] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy2 link_rate=10(sata) // phy2's hw port id is assigned a new one [ 4589.220662] ata40.00: ATA-9: HUH721010ALE600, T3C0, max UDMA/133 [ 4589.220666] ata40.00: 19532873728 sectors, multi 0: LBA48 NCQ (depth 32), AA [ 4589.233022] ata40.00: configured for UDMA/133 In view of the situation corresponding to the above log, the hisi_sas_port.id corresponding to phy2 has not been updated, and the old port id is still used, which will cause the IO delivered to phy2 to be abnormally delivered to the disk of phy3. After force phy, the chip will check whether the phy information matches the port id and intercept this abnormal IO. Thanks. Xingui
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c index 6e7f99fcc824..3af991cad07e 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c @@ -2501,6 +2501,7 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_port *port = to_hisi_sas_port(sas_port); struct sas_ata_task *ata_task = &task->ata_task; struct sas_tmf_task *tmf = slot->tmf; + int phy_id; u8 *buf_cmd; int has_data = 0, hdr_tag = 0; u32 dw0, dw1 = 0, dw2 = 0; @@ -2508,10 +2509,14 @@ static void prep_ata_v2_hw(struct hisi_hba *hisi_hba, /* create header */ /* dw0 */ dw0 = port->id << CMD_HDR_PORT_OFF; - if (parent_dev && dev_is_expander(parent_dev->dev_type)) + if (parent_dev && dev_is_expander(parent_dev->dev_type)) { dw0 |= 3 << CMD_HDR_CMD_OFF; - else + } else { + phy_id = device->phy->identify.phy_identifier; + dw0 |= (1U << phy_id) << CMD_HDR_PHY_ID_OFF; + dw0 |= CMD_HDR_FORCE_PHY_MSK; dw0 |= 4 << CMD_HDR_CMD_OFF; + } if (tmf && ata_task->force_phy) { dw0 |= CMD_HDR_FORCE_PHY_MSK; diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 095bbf80c34e..6a0656f3b596 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -359,6 +359,10 @@ #define CMD_HDR_RESP_REPORT_MSK (0x1 << CMD_HDR_RESP_REPORT_OFF) #define CMD_HDR_TLR_CTRL_OFF 6 #define CMD_HDR_TLR_CTRL_MSK (0x3 << CMD_HDR_TLR_CTRL_OFF) +#define CMD_HDR_PHY_ID_OFF 8 +#define CMD_HDR_PHY_ID_MSK (0x1ff << CMD_HDR_PHY_ID_OFF) +#define CMD_HDR_FORCE_PHY_OFF 17 +#define CMD_HDR_FORCE_PHY_MSK (0x1U << CMD_HDR_FORCE_PHY_OFF) #define CMD_HDR_PORT_OFF 18 #define CMD_HDR_PORT_MSK (0xf << CMD_HDR_PORT_OFF) #define CMD_HDR_PRIORITY_OFF 27 @@ -1429,15 +1433,21 @@ static void prep_ata_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_cmd_hdr *hdr = slot->cmd_hdr; struct asd_sas_port *sas_port = device->port; struct hisi_sas_port *port = to_hisi_sas_port(sas_port); + int phy_id; u8 *buf_cmd; int has_data = 0, hdr_tag = 0; u32 dw1 = 0, dw2 = 0; hdr->dw0 = cpu_to_le32(port->id << CMD_HDR_PORT_OFF); - if (parent_dev && dev_is_expander(parent_dev->dev_type)) + if (parent_dev && dev_is_expander(parent_dev->dev_type)) { hdr->dw0 |= cpu_to_le32(3 << CMD_HDR_CMD_OFF); - else + } else { + phy_id = device->phy->identify.phy_identifier; + hdr->dw0 |= cpu_to_le32((1U << phy_id) + << CMD_HDR_PHY_ID_OFF); + hdr->dw0 |= CMD_HDR_FORCE_PHY_MSK; hdr->dw0 |= cpu_to_le32(4U << CMD_HDR_CMD_OFF); + } switch (task->data_dir) { case DMA_TO_DEVICE: