Message ID | 20221111144433.2421680-1-yangyingliang@huawei.com |
---|---|
State | New |
Headers | show |
Series | [v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add() | expand |
+Cc: Greg Hi Greg, On 2022/11/11 23:51, James Bottomley wrote: > On Fri, 2022-11-11 at 22:44 +0800, Yang Yingliang wrote: >> In sas_rphy_add(), if transport_add_device() fails, the device >> is not added, the return value is not checked, it won't goto >> error path, when removing rphy in normal remove path, it causes >> null-ptr-deref, because transport_remove_device() is called to >> remove the device that was not added. >> >> Unable to handle kernel NULL pointer dereference at virtual address >> 0000000000000108 >> pc : device_del+0x54/0x3d0 >> lr : device_del+0x37c/0x3d0 >> Call trace: >> device_del+0x54/0x3d0 >> attribute_container_class_device_del+0x28/0x38 >> transport_remove_classdev+0x6c/0x80 >> attribute_container_device_trigger+0x108/0x110 >> transport_remove_device+0x28/0x38 >> sas_rphy_remove+0x50/0x78 [scsi_transport_sas] >> sas_port_delete+0x30/0x148 [scsi_transport_sas] >> do_sas_phy_delete+0x78/0x80 [scsi_transport_sas] >> device_for_each_child+0x68/0xb0 >> sas_remove_children+0x30/0x50 [scsi_transport_sas] >> sas_rphy_remove+0x38/0x78 [scsi_transport_sas] >> sas_port_delete+0x30/0x148 [scsi_transport_sas] >> do_sas_phy_delete+0x78/0x80 [scsi_transport_sas] >> device_for_each_child+0x68/0xb0 >> sas_remove_children+0x30/0x50 [scsi_transport_sas] >> sas_remove_host+0x20/0x38 [scsi_transport_sas] >> scsih_remove+0xd8/0x420 [mpt3sas] >> >> Fix this by checking and handling return value of >> transport_add_device() >> in sas_rphy_add(). >> >> Fixes: c7ebbbce366c ("[SCSI] SAS transport class") >> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> >> --- >> v1 -> v2: >> Update commit message. >> --- >> drivers/scsi/scsi_transport_sas.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/scsi_transport_sas.c >> b/drivers/scsi/scsi_transport_sas.c >> index 74b99f2b0b74..accc0afa8f77 100644 >> --- a/drivers/scsi/scsi_transport_sas.c >> +++ b/drivers/scsi/scsi_transport_sas.c >> @@ -1526,7 +1526,11 @@ int sas_rphy_add(struct sas_rphy *rphy) >> error = device_add(&rphy->dev); >> if (error) >> return error; >> - transport_add_device(&rphy->dev); >> + error = transport_add_device(&rphy->dev); >> + if (error) { >> + device_del(&rphy->dev); >> + return error; >> + } >> transport_configure_device(&rphy->dev); >> if (sas_bsg_initialize(shost, rphy)) >> printk("fail to a bsg device %s\n", dev_name(&rphy- >>> dev)); > There is a slight problem with doing this in that if > transport_device_add() ever fails it's likely because memory pressure > caused the allocation of the internal_container to fail. What that > means is that the visible sysfs attributes don't get added, but > otherwise the rphy is fully functional as far as the driver sees it, so > this condition doesn't have to be a fatal error which kills the device. > > There are two ways of handling this: > > 1. The above to move the condition from an ignored to a fatal error. > It's so rare that we almost never see it in practice and if it > ever happened, the machine is so low on memory that something > else is bound to fail an allocation and kill the device anyway, > so treating it as non-fatal likely serves no purpose. > 2. Simply to make the assumption that transport_remove_device() is > idempotent true by adding a flag in the internal_class to signify > removal is required. This would preserve current behaviour and > have the bonus that it only requires a single patch, not one > patch per transport class object that has this problem. > > I'd probably prefer 2. since it's way less work, but others might have > different opinions. Current some callers ignore the return value of transport_add_device(), if it fails, it will cause null-ptr-deref in transport_remove_device(). James suggested that add some check in transport_remove_device(), so all can be fix in one patch. Do you have any suggestion for this ? Thanks, Yang > > James > > .
On 2022/11/18 17:18, John Garry wrote: > On 18/11/2022 03:11, Yang Yingliang wrote: >>>>> ); >>> There is a slight problem with doing this in that if >>> transport_device_add() ever fails it's likely because memory pressure >>> caused the allocation of the internal_container to fail. What that >>> means is that the visible sysfs attributes don't get added, but >>> otherwise the rphy is fully functional as far as the driver sees it, so >>> this condition doesn't have to be a fatal error which kills the device. >>> >>> There are two ways of handling this: >>> >>> 1. The above to move the condition from an ignored to a fatal >>> error. >>> It's so rare that we almost never see it in practice and if it >>> ever happened, the machine is so low on memory that something >>> else is bound to fail an allocation and kill the device anyway, >>> so treating it as non-fatal likely serves no purpose. >>> 2. Simply to make the assumption that transport_remove_device() is >>> idempotent true by adding a flag in the internal_class to >>> signify >>> removal is required. This would preserve current behaviour and >>> have the bonus that it only requires a single patch, not one >>> patch per transport class object that has this problem. >>> >>> I'd probably prefer 2. since it's way less work, but others might have >>> different opinions. >> Current some callers ignore the return value of >> transport_add_device(), if it fails, >> it will cause null-ptr-deref in transport_remove_device(). >> >> James suggested that add some check in transport_remove_device(), so >> all can >> be fix in one patch. >> >> Do you have any suggestion for this ? > > Personally I prefer 1. However did you develop a prototype patch for > how 2. would look? And how many changes are still required for 1.? For 1, in total, there are 8 places need be checked in drivers/scsi/scsi_transport_sas.c, 2 places in drivers/scsi/scsi_sysfs.c, 3 places in drivers/scsi/scsi_transport_fc.c, 2 places in drivers/scsi/scsi_transport_srp.c, 1 place For 2, I think we can use device_is_registered() to check if add operation is successful, may be like this (not test yet): diff --git a/drivers/base/transport_class.c b/drivers/base/transport_class.c index ccc86206e508..ac41be7b724e 100644 --- a/drivers/base/transport_class.c +++ b/drivers/base/transport_class.c @@ -227,9 +227,11 @@ static int transport_remove_classdev(struct attribute_container *cont, tclass->remove(tcont, dev, classdev); if (tclass->remove != anon_transport_dummy_function) { - if (tcont->statistics) - sysfs_remove_group(&classdev->kobj, tcont->statistics); - attribute_container_class_device_del(classdev); + if (device_is_registered(classdev)) { + if (tcont->statistics) + sysfs_remove_group(&classdev->kobj, tcont->statistics); + attribute_container_class_device_del(classdev); + } } return 0; Thanks, Yang > > Thanks, > John > .
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c index 74b99f2b0b74..accc0afa8f77 100644 --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -1526,7 +1526,11 @@ int sas_rphy_add(struct sas_rphy *rphy) error = device_add(&rphy->dev); if (error) return error; - transport_add_device(&rphy->dev); + error = transport_add_device(&rphy->dev); + if (error) { + device_del(&rphy->dev); + return error; + } transport_configure_device(&rphy->dev); if (sas_bsg_initialize(shost, rphy)) printk("fail to a bsg device %s\n", dev_name(&rphy->dev));
In sas_rphy_add(), if transport_add_device() fails, the device is not added, the return value is not checked, it won't goto error path, when removing rphy in normal remove path, it causes null-ptr-deref, because transport_remove_device() is called to remove the device that was not added. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000108 pc : device_del+0x54/0x3d0 lr : device_del+0x37c/0x3d0 Call trace: device_del+0x54/0x3d0 attribute_container_class_device_del+0x28/0x38 transport_remove_classdev+0x6c/0x80 attribute_container_device_trigger+0x108/0x110 transport_remove_device+0x28/0x38 sas_rphy_remove+0x50/0x78 [scsi_transport_sas] sas_port_delete+0x30/0x148 [scsi_transport_sas] do_sas_phy_delete+0x78/0x80 [scsi_transport_sas] device_for_each_child+0x68/0xb0 sas_remove_children+0x30/0x50 [scsi_transport_sas] sas_rphy_remove+0x38/0x78 [scsi_transport_sas] sas_port_delete+0x30/0x148 [scsi_transport_sas] do_sas_phy_delete+0x78/0x80 [scsi_transport_sas] device_for_each_child+0x68/0xb0 sas_remove_children+0x30/0x50 [scsi_transport_sas] sas_remove_host+0x20/0x38 [scsi_transport_sas] scsih_remove+0xd8/0x420 [mpt3sas] Fix this by checking and handling return value of transport_add_device() in sas_rphy_add(). Fixes: c7ebbbce366c ("[SCSI] SAS transport class") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> --- v1 -> v2: Update commit message. --- drivers/scsi/scsi_transport_sas.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)