mbox series

[0/2] iommu: Fix the longstanding probe issues

Message ID cover.1739486121.git.robin.murphy@arm.com
Headers show
Series iommu: Fix the longstanding probe issues | expand

Message

Robin Murphy Feb. 13, 2025, 11:48 p.m. UTC
Hi all,

Finally, it's that thing I've been mentioning on and off for the last 5
years or so! After far too much thinking about "clean" ways to refactor
everything towards an eventual goal of working properly, I finally gave
up and decided to see how bad it would be to just get it working, then
worry about cleaning up afterwards. Frankly I still can't quiet believe
how small the patch turned out to be...

Patch #1 is a fix for what I think is the only "expected" race in the
core API design - we knew we had that window where a default domain is
NULL, but the potential consequences weren't clear until Charan figured
it out the hard way. That should be good to go already, and it's not
strictly a dependency for patch #2, but it seemed worth posting them
together on the common theme.

Patch #2 is then the first big bite out of the problem I feel compelled
to fix because it's my name on d7b0558230e4 ("iommu/of: Prepare for
deferred IOMMU configuration") which, in hindsight, started it. I hope
this will be the only patch which has to touch absolutely everything all
at once. I've not got very far with the follow-up patches yet, but I do
foresee being able to tackle the driver core, firmware subsystems and
bus drivers more independently. I don't expect anyone to love it, but I
hope you can at least see where I'm going and give a bit of leeway for
now :)

I also do rather expect that WARN_ON to be hit, but the intent is that
I'd like to get the attention of any of_dma_configure() abusers so they
can help me figure out what the heck their code *should* be doing...

Thanks,
Robin.


Robin Murphy (2):
  iommu: Handle race with default domain setup
  iommu: Get DT/ACPI parsing into the proper probe path

 drivers/acpi/arm64/dma.c        |  5 ++++
 drivers/acpi/scan.c             | 10 +++----
 drivers/amba/bus.c              |  2 +-
 drivers/base/platform.c         |  2 +-
 drivers/bus/fsl-mc/fsl-mc-bus.c |  2 +-
 drivers/cdx/cdx.c               |  2 +-
 drivers/iommu/iommu.c           | 48 +++++++++++++++++++++++++--------
 drivers/iommu/of_iommu.c        | 10 ++++++-
 drivers/of/device.c             |  7 ++++-
 drivers/pci/pci-driver.c        |  2 +-
 10 files changed, 65 insertions(+), 25 deletions(-)

Comments

Charan Teja Kalla Feb. 14, 2025, 12:57 p.m. UTC | #1
Thanks a lot for posting these patches, Robin.

On 2/14/2025 5:18 AM, Robin Murphy wrote:
>  drivers/iommu/iommu.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 870c3cdbd0f6..2486f6d6ef68 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3097,6 +3097,11 @@ int iommu_device_use_default_domain(struct device *dev)
>  		return 0;
>  
>  	mutex_lock(&group->mutex);
> +	/* We may race against bus_iommu_probe() finalising groups here */
> +	if (!group->default_domain) {
> +		ret = -EPROBE_DEFER;
> +		goto unlock_out;
> +	}

We just hit the issue again even after picking up this patch, though
very hard to reproduce, on 6.6 LTS.

After code inspection, it seems the issue is that - default domain is
setup in the bus_iommu_probe() before hitting of this replay.

A:async client probe in platform_dma_configure(), B:bus_iommu_probe() :-

1) A: sets up iommu_fwspec under iommu_probe_device_lock.

2) B: Sets the dev->iommu_group under iommu_probe_device_lock. Domain
setup is deferred.

3) A: Returns with out allocating the default domain, as
dev->iommu_group is set, whose checks are also made under the same
'iommu_probe_device_lock'. __This miss setting of the valid dev->dma_ops__.

4) B: Sets up the group->default_domain under group->mutex.

5) A: iommu_device_use_default_domain(): Relies on this
group->default_domain, under the same mutex, to decide if need to go for
replay, which is skipped. This is skipping the setting up of valid
dma_ops and that's an issue.

But I don't think that the same issue exists on 6.13 because of your
patch, b67483b3c44e ("iommu/dma: Centralise iommu_setup_dma_ops()").
bus_iommu_probe():
     list_for_each_entry_safe(group, next, &group_list, entry) {
		mutex_lock(&group->mutex);
		for_each_group_device(group, gdev)
			iommu_setup_dma_ops(gdev->dev);
		mutex_unlock(&group->mutex);
     }

This makes the step4 above force to use the valid dma_iommu api, thus I
see no issue when there is no probe deferral.

So, I think we are good with this patch on 6.13.

Now coming back to 6.6 LTS, any ideas you have here, please?

>  	if (group->owner_cnt) {
>  		if (group->domain != group->default_domain || group->owner ||
>  		    !xa_empty(&group->pasid_array)) {


Thanks,
Charan
Jason Gunthorpe Feb. 14, 2025, 7:43 p.m. UTC | #2
On Thu, Feb 13, 2025 at 11:48:59PM +0000, Robin Murphy wrote:
> It turns out that deferred default domain creation leaves a subtle
> race window during iommu_device_register() wherein a client driver may
> asynchronously probe in parallel and get as far as performing DMA API
> operations with dma-direct, only to be switched to iommu-dma underfoot
> once the default domain attachment finally happens, with obviously
> disastrous consequences. Even the wonky of_iommu_configure() path is at
> risk, since iommu_fwspec_init() will no longer defer client probe as the
> instance ops are (necessarily) already registered, and the "replay"
> iommu_probe_device() call can see dev->iommu_group already set and so
> think there's nothing to do either.
> 
> Fortunately we already have the right tool in the right place in the
> form of iommu_device_use_default_domain(), which just needs to ensure
> that said default domain is actually ready to *be* used. Deferring the
> client probe shouldn't have too much impact, given that this only
> happens while the IOMMU driver is probing, and thus due to kick the
> deferred probe list again once it finishes.
> 
> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
> Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/iommu.c | 5 +++++
>  1 file changed, 5 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Robin Murphy Feb. 17, 2025, 4:29 p.m. UTC | #3
On 14/02/2025 12:57 pm, Charan Teja Kalla wrote:
> Thanks a lot for posting these patches, Robin.
> 
> On 2/14/2025 5:18 AM, Robin Murphy wrote:
>>   drivers/iommu/iommu.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 870c3cdbd0f6..2486f6d6ef68 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -3097,6 +3097,11 @@ int iommu_device_use_default_domain(struct device *dev)
>>   		return 0;
>>   
>>   	mutex_lock(&group->mutex);
>> +	/* We may race against bus_iommu_probe() finalising groups here */
>> +	if (!group->default_domain) {
>> +		ret = -EPROBE_DEFER;
>> +		goto unlock_out;
>> +	}
> 
> We just hit the issue again even after picking up this patch, though
> very hard to reproduce, on 6.6 LTS.
> 
> After code inspection, it seems the issue is that - default domain is
> setup in the bus_iommu_probe() before hitting of this replay.
> 
> A:async client probe in platform_dma_configure(), B:bus_iommu_probe() :-
> 
> 1) A: sets up iommu_fwspec under iommu_probe_device_lock.
> 
> 2) B: Sets the dev->iommu_group under iommu_probe_device_lock. Domain
> setup is deferred.
> 
> 3) A: Returns with out allocating the default domain, as
> dev->iommu_group is set, whose checks are also made under the same
> 'iommu_probe_device_lock'. __This miss setting of the valid dev->dma_ops__.
> 
> 4) B: Sets up the group->default_domain under group->mutex.
> 
> 5) A: iommu_device_use_default_domain(): Relies on this
> group->default_domain, under the same mutex, to decide if need to go for
> replay, which is skipped. This is skipping the setting up of valid
> dma_ops and that's an issue.
> 
> But I don't think that the same issue exists on 6.13 because of your
> patch, b67483b3c44e ("iommu/dma: Centralise iommu_setup_dma_ops()").
> bus_iommu_probe():
>       list_for_each_entry_safe(group, next, &group_list, entry) {
> 		mutex_lock(&group->mutex);
> 		for_each_group_device(group, gdev)
> 			iommu_setup_dma_ops(gdev->dev);
> 		mutex_unlock(&group->mutex);
>       }
> 
> This makes the step4 above force to use the valid dma_iommu api, thus I
> see no issue when there is no probe deferral.
> 
> So, I think we are good with this patch on 6.13.
> 
> Now coming back to 6.6 LTS, any ideas you have here, please?

Thanks for the analysis once again! I've not looked closely at 6.6 yet, 
but if we're all happy this looks like the right fix for mainline then 
I'll start having a look at the backport as soon as I can (so much for 
hoping it would be simple...)

Cheers,
Robin.

> 
>>   	if (group->owner_cnt) {
>>   		if (group->domain != group->default_domain || group->owner ||
>>   		    !xa_empty(&group->pasid_array)) {
> 
> 
> Thanks,
> Charan