Message ID: 1606905417-183214-5-git-send-email-john.garry@huawei.com
State: New
Series: Support managed interrupts for platform devices
On Wed, Dec 02, 2020 at 06:36:56PM +0800, John Garry wrote:
> Drivers for multi-queue platform devices may also want managed interrupts
> for handling HW queue completion interrupts, so add support.

Why would a platform device want all of this? Shouldn't such a device
be on a "real" bus instead?

What in-kernel driver needs this complexity? I can't take new apis
without a real user in the tree, sorry.

thanks,

greg k-h
On 09/12/2020 18:32, Greg KH wrote:
> On Wed, Dec 02, 2020 at 06:36:56PM +0800, John Garry wrote:
>> Drivers for multi-queue platform devices may also want managed interrupts
>> for handling HW queue completion interrupts, so add support.

Hi Greg,

> Why would a platform device want all of this? Shouldn't such a device
> be on a "real" bus instead?

For this HW version, the device is on the system bus, directly
addressable by the CPU. Motivation is that I wanted to switch the HW
completion queues to use managed interrupts.

> What in-kernel driver needs this complexity? I can't take new apis
> without a real user in the tree, sorry.

It's in the final patch in the series:
https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.garry@huawei.com/T/#m0df7e7cd6f0819b99aaeb6b7f8939ef1e17b8a83

I don't anticipate a huge number of users of this API in future, as most
multi-queue devices are PCI devices; so we could do the work of this API
in the driver itself, but the preference was not to export genirq
functions like irq_update_affinity_desc() or irq_create_affinity_masks(),
and rather have a common helper in the core platform code.

Thanks,
John
On Wed, Dec 09, 2020 at 07:04:02PM +0000, John Garry wrote:
> On 09/12/2020 18:32, Greg KH wrote:
>> Why would a platform device want all of this? Shouldn't such a device
>> be on a "real" bus instead?
>
> For this HW version, the device is on the system bus, directly
> addressable by the CPU.

What do you mean by "system bus"?

> Motivation is that I wanted to switch the HW completion queues to use
> managed interrupts.

Fair enough, seems like overkill for a "platform" bus though :)

>> What in-kernel driver needs this complexity? I can't take new apis
>> without a real user in the tree, sorry.
>
> It's in the final patch in the series:
> https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.garry@huawei.com/T/#m0df7e7cd6f0819b99aaeb6b7f8939ef1e17b8a83

Ah, I missed that, I thought that was some high-speed scsi thing, not a
tiny platform driver...

> I don't anticipate a huge number of users of this API in future, as most
> multi-queue devices are PCI devices; so we could do the work of this API
> in the driver itself, but the preference was not to export genirq
> functions like irq_update_affinity_desc() or irq_create_affinity_masks(),
> and rather have a common helper in the core platform code.

Ok, I'd like to have the irq maintainers/developers ack this before
taking it in the driver core, as someone is going to have to maintain
this crazy thing for forever if it gets merged.

thanks,

greg k-h
On 09/12/2020 19:13, Greg KH wrote:

Hi Greg,

>> For this HW version, the device is on the system bus, directly
>> addressable by the CPU.
>
> What do you mean by "system bus"?

Maybe my terminology is wrong; the point is that we have a platform
device driver.

>> Motivation is that I wanted to switch the HW completion queues to use
>> managed interrupts.
>
> Fair enough, seems like overkill for a "platform" bus though :)
>
>>> What in-kernel driver needs this complexity? I can't take new apis
>>> without a real user in the tree, sorry.
>>
>> It's in the final patch in the series:
>> https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.garry@huawei.com/T/#m0df7e7cd6f0819b99aaeb6b7f8939ef1e17b8a83
>
> Ah, I missed that, I thought that was some high-speed scsi thing, not a
> tiny platform driver...

It actually is a high-speed SCSI thing as well, SAS 3.0 :)

>> I don't anticipate a huge number of users of this API in future, as
>> most multi-queue devices are PCI devices; so we could do the work of
>> this API in the driver itself, but the preference was not to export
>> genirq functions like irq_update_affinity_desc() or
>> irq_create_affinity_masks(), and rather have a common helper in the
>> core platform code.
>
> Ok, I'd like to have the irq maintainers/developers ack this before
> taking it in the driver core, as someone is going to have to maintain
> this crazy thing for forever if it gets merged.

The irq experts are cc'ed and have been very helpful here.

The API mushroomed a bit over time, as I realized that we needed to
support tearing down the irq mapping, make it a devm method, and use
irq_calc_affinity_vectors(). Not sure how we could factor any of it out
to become less of your problem.

Thanks,
John
On 2020-12-09 19:13, Greg KH wrote:
> On Wed, Dec 09, 2020 at 07:04:02PM +0000, John Garry wrote:
>> Motivation is that I wanted to switch the HW completion queues to use
>> managed interrupts.
>
> Fair enough, seems like overkill for a "platform" bus though :)

You should see the box, really... ;-)

>> I don't anticipate a huge number of users of this API in future, as
>> most multi-queue devices are PCI devices; so we could do the work of
>> this API in the driver itself, but the preference was not to export
>> genirq functions like irq_update_affinity_desc() or
>> irq_create_affinity_masks(), and rather have a common helper in the
>> core platform code.
>
> Ok, I'd like to have the irq maintainers/developers ack this before
> taking it in the driver core, as someone is going to have to maintain
> this crazy thing for forever if it gets merged.

I'm actually quite happy with this, and as it turns out, the crazy
system that has this SAS thing keeps my backside warm all year long.
As long as this machine keeps ticking, I'm happy to help with this.

So if that helps:

Acked-by: Marc Zyngier <maz@kernel.org>

We need to work out the merge strategy for the whole lot though, given
that it crosses 3 subsystems over two series...

M.
--
Jazz is not dead. It just smells funny...
On 09/12/2020 19:39, Marc Zyngier wrote:
>> Ok, I'd like to have the irq maintainers/developers ack this before
>> taking it in the driver core, as someone is going to have to maintain
>> this crazy thing for forever if it gets merged.
>
> I'm actually quite happy with this, and as it turns out, the crazy
> system that has this SAS thing keeps my backside warm all year long.
> As long as this machine keeps ticking, I'm happy to help with this.
>
> So if that helps:
>
> Acked-by: Marc Zyngier <maz@kernel.org>

Cheers

> We need to work out the merge strategy for the whole lot though, given
> that it crosses 3 subsystems over two series...

Thomas originally suggested taking the genirq change himself and then
providing a tag for others to merge:
https://lore.kernel.org/linux-scsi/87h7qf1yp0.fsf@nanos.tec.linutronix.de/
Not sure if that still stands.

The small ACPI change could go in a cycle after the rest is merged, but
it may be best not to split things up. The worst that can happen without
Marc's series is that remove + re-probe of the SCSI driver is broken, so
I'm happy as long as both end up in the same kernel version somehow:
https://lore.kernel.org/lkml/20201129135208.680293-1-maz@kernel.org/

Thanks,
John
On Wed, Dec 09, 2020 at 07:39:03PM +0000, Marc Zyngier wrote:
> On 2020-12-09 19:13, Greg KH wrote:
>> Fair enough, seems like overkill for a "platform" bus though :)
>
> You should see the box, really... ;-)

{sigh} why do hardware engineers ignore sane busses...

Anyway, if you all are going to maintain this, no objection from me, it
should go through the irq tree.

thanks,

greg k-h
Hi Greg,

> {sigh} why do hardware engineers ignore sane busses...

The next HW version is an integrated PCI endpoint, so there is hope.

> Anyway, if you all are going to maintain this, no objection from me, it
> should go through the irq tree.

OK, thanks. So this is getting quite late for 5.11, and none of it has
seen -next, obviously. However, the changes are additive and should only
affect a single driver now. I'm talking about this series now, not
Marc's companion series.

I just need to hear from Thomas on any merge preference.

Thanks,
John
On 12/2/20 2:36 AM, John Garry wrote:
> +	for (i = 0; i < nvec; i++) {
> +		int irq = platform_get_irq(dev, i);
> +		if (irq < 0) {
> +			ret = irq;
> +			goto err_free_devres;
> +		}
> +		ptr->irq[i] = irq;
> +	}

(replying to an email from four years ago)

Why does this function call platform_get_irq(dev, i) instead of
platform_get_irq(dev, affd->pre_vectors + i)? Is there perhaps something
about the hisi_sas driver that I'm missing? I'm asking this because this
function would be useful for UFS controller drivers if the
affd->pre_vectors offset would be added when calling platform_get_irq().

Thanks,

Bart.
On 09/08/2024 19:11, Bart Van Assche wrote:
> Why does this function call platform_get_irq(dev, i) instead of
> platform_get_irq(dev, affd->pre_vectors + i)? Is there perhaps something
> about the hisi_sas driver that I'm missing? I'm asking this because this
> function would be useful for UFS controller drivers if the
> affd->pre_vectors offset would be added when calling platform_get_irq().

int devm_platform_get_irqs_affinity(struct platform_device *dev,
				    struct irq_affinity *affd,
				    unsigned int minvec,
				    unsigned int maxvec,
				    int **irqs)

devm_platform_get_irqs_affinity() gets the irq numbers for a total of
between @minvec and @maxvec interrupts, and fills them into the @irqs
arg. It does not get only the interrupts at indexes @minvec to @maxvec.

For context, as I remember, hisi_sas v2 hw has 128 interrupt lines.
Interrupts at indexes [96, 112) are completion queue interrupts, which
we want to spread over all CPUs. See interrupt_init_v2_hw() in that
driver for how the control interrupts, like phy up/down, are used.
On 8/12/24 3:46 AM, John Garry wrote:
> devm_platform_get_irqs_affinity() gets the irq numbers for a total of
> between @minvec and @maxvec interrupts, and fills them into the @irqs
> arg. It does not get only the interrupts at indexes @minvec to @maxvec.
>
> For context, as I remember, hisi_sas v2 hw has 128 interrupt lines.
> Interrupts at indexes [96, 112) are completion queue interrupts, which
> we want to spread over all CPUs. See interrupt_init_v2_hw() in that
> driver for how the control interrupts, like phy up/down, are used.

Hi John,

In interrupt_init_v2_hw() and also elsewhere in the hisi_sas_v2_hw.c
source file I see that the CQ interrupts start at offset 96. However,
devm_platform_get_irqs_affinity() passes the arguments 0..(num_cqs-1) to
platform_get_irq(). Shouldn't that function pass arguments in the range
96..(96+num_cqs-1) to platform_get_irq(), since that is the CQ interrupt
range for this storage controller?

My understanding is that the devm_platform_get_irqs_affinity() call from
hisi_sas_v2_hw.c will affect the affinity of the interrupts
0..(num_cqs-1) instead of the interrupts in the range
96..(96+num_cqs-1). Isn't that wrong?

Thanks,

Bart.
On 12/08/2024 21:29, Bart Van Assche wrote:

> In interrupt_init_v2_hw() and also elsewhere in the hisi_sas_v2_hw.c
> source file I see that the CQ interrupts start at offset 96. However,
> devm_platform_get_irqs_affinity() passes the arguments 0..(num_cqs-1)
> to platform_get_irq().

The total interrupt line count is 128, and the affinity descriptor has
{pre, post}_vectors = {96, 16}, which gives:

	resv = 96 + 16 = 112
	minvec = resv + 1 = 113
	maxvec = 128

> Shouldn't that function pass arguments in the range
> 96..(96+num_cqs-1) to platform_get_irq() since that is the CQ
> interrupt range for this storage controller?

The API is trying to be like pci_alloc_irq_vectors_affinity(), in that
it accepts a min and max count of vectors to allocate.

platform_get_irq() actually does more than just look up an irq number -
it also sets up a mapping if one does not already exist. Furthermore, as
I recall, either the platform msi code, the mbigen code, or the hisi
chipset had a restriction that platform_get_irq() had to be called for
each and every interrupt line, so having a single point to do this made
sense. Check the interrupt_init_v2_hw() code and comments from before
devm_platform_get_irqs_affinity() existed, like in a v5.10 kernel - it
called platform_get_irq() for all 128 interrupt lines.

> My understanding is that the devm_platform_get_irqs_affinity() call
> from hisi_sas_v2_hw.c will affect the affinity of the interrupts
> 0..(num_cqs-1) instead of the interrupts in the range
> 96..(96+num_cqs-1). Isn't that wrong?

devm_platform_get_irqs_affinity() will only touch the affinity mapping
of the completion queue interrupts, but it still calls platform_get_irq()
for all interrupts.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 88aef93eb4dd..ea8add164b89 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -15,6 +15,8 @@
 #include <linux/of_irq.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/ioport.h>
 #include <linux/dma-mapping.h>
 #include <linux/memblock.h>
 #include <linux/err.h>
@@ -289,6 +291,125 @@ int platform_irq_count(struct platform_device *dev)
 }
 EXPORT_SYMBOL_GPL(platform_irq_count);
 
+struct irq_affinity_devres {
+	unsigned int count;
+	unsigned int irq[];
+};
+
+static void platform_disable_acpi_irq(struct platform_device *pdev, int index)
+{
+	struct resource *r;
+
+	r = platform_get_resource(pdev, IORESOURCE_IRQ, index);
+	if (r)
+		irqresource_disabled(r, 0);
+}
+
+static void devm_platform_get_irqs_affinity_release(struct device *dev,
+						    void *res)
+{
+	struct irq_affinity_devres *ptr = res;
+	int i;
+
+	for (i = 0; i < ptr->count; i++) {
+		irq_dispose_mapping(ptr->irq[i]);
+
+		if (has_acpi_companion(dev))
+			platform_disable_acpi_irq(to_platform_device(dev), i);
+	}
+}
+
+/**
+ * devm_platform_get_irqs_affinity - devm method to get a set of IRQs for a
+ *				device using an interrupt affinity descriptor
+ * @dev: platform device pointer
+ * @affd: affinity descriptor
+ * @minvec: minimum count of interrupt vectors
+ * @maxvec: maximum count of interrupt vectors
+ * @irqs: pointer holder for IRQ numbers
+ *
+ * Gets a set of IRQs for a platform device, and updates IRQ affinity according
+ * to the passed affinity descriptor
+ *
+ * Return: Number of vectors on success, negative error number on failure.
+ */
+int devm_platform_get_irqs_affinity(struct platform_device *dev,
+				    struct irq_affinity *affd,
+				    unsigned int minvec,
+				    unsigned int maxvec,
+				    int **irqs)
+{
+	struct irq_affinity_devres *ptr;
+	struct irq_affinity_desc *desc;
+	size_t size;
+	int i, ret, nvec;
+
+	if (!affd)
+		return -EPERM;
+
+	if (maxvec < minvec)
+		return -ERANGE;
+
+	nvec = platform_irq_count(dev);
+
+	if (nvec < minvec)
+		return -ENOSPC;
+
+	nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
+	if (nvec < minvec)
+		return -ENOSPC;
+
+	if (nvec > maxvec)
+		nvec = maxvec;
+
+	size = sizeof(*ptr) + sizeof(unsigned int) * nvec;
+	ptr = devres_alloc(devm_platform_get_irqs_affinity_release, size,
+			   GFP_KERNEL);
+	if (!ptr)
+		return -ENOMEM;
+
+	ptr->count = nvec;
+
+	for (i = 0; i < nvec; i++) {
+		int irq = platform_get_irq(dev, i);
+		if (irq < 0) {
+			ret = irq;
+			goto err_free_devres;
+		}
+		ptr->irq[i] = irq;
+	}
+
+	desc = irq_create_affinity_masks(nvec, affd);
+	if (!desc) {
+		ret = -ENOMEM;
+		goto err_free_devres;
+	}
+
+	for (i = 0; i < nvec; i++) {
+		ret = irq_update_affinity_desc(ptr->irq[i], &desc[i]);
+		if (ret) {
+			dev_err(&dev->dev, "failed to update irq%d affinity descriptor (%d)\n",
+				ptr->irq[i], ret);
+			goto err_free_desc;
+		}
+	}
+
+	devres_add(&dev->dev, ptr);
+
+	kfree(desc);
+
+	*irqs = ptr->irq;
+
+	return nvec;
+
+err_free_desc:
+	kfree(desc);
+err_free_devres:
+	devres_free(ptr);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(devm_platform_get_irqs_affinity);
+
 /**
  * platform_get_resource_byname - get a resource for a device by name
  * @dev: platform device
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 77a2aada106d..4d75633e6735 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -15,6 +15,7 @@
 #define PLATFORM_DEVID_NONE	(-1)
 #define PLATFORM_DEVID_AUTO	(-2)
 
+struct irq_affinity;
 struct mfd_cell;
 struct property_entry;
 struct platform_device_id;
@@ -70,6 +71,11 @@ devm_platform_ioremap_resource_byname(struct platform_device *pdev,
 extern int platform_get_irq(struct platform_device *, unsigned int);
 extern int platform_get_irq_optional(struct platform_device *, unsigned int);
 extern int platform_irq_count(struct platform_device *);
+extern int devm_platform_get_irqs_affinity(struct platform_device *dev,
+					   struct irq_affinity *affd,
+					   unsigned int minvec,
+					   unsigned int maxvec,
+					   int **irqs);
 extern struct resource *platform_get_resource_byname(struct platform_device *,
 						     unsigned int,
 						     const char *);
Drivers for multi-queue platform devices may also want managed interrupts
for handling HW queue completion interrupts, so add support.

The function accepts an affinity descriptor pointer, which covers all
IRQs expected for the device.

The function is devm class, as the only current in-tree user will also
use devm methods for requesting the interrupts; making the function devm
also ensures the correct ordering of freeing the irq and disposing of
the mapping.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/base/platform.c         | 121 ++++++++++++++++++++++++++++++++
 include/linux/platform_device.h |   6 ++
 2 files changed, 127 insertions(+)

--
2.26.2