
selftest: drivers: Add support to check duplicate hwirq

Message ID 20240904014426.3404397-1-jjang@nvidia.com
State New
Series selftest: drivers: Add support to check duplicate hwirq

Commit Message

Joseph Jang Sept. 4, 2024, 1:44 a.m. UTC
Validate that there are no duplicate hwirqs per chip name in the irq
debug file system /sys/kernel/debug/irq/irqs/*.

The following example log shows two duplicated hwirqs in the irq debug
file system.

$ sudo cat /sys/kernel/debug/irq/irqs/163
handler:  handle_fasteoi_irq
device:   0019:00:00.0
     <SNIP>
node:     1
affinity: 72-143
effectiv: 76
domain:  irqchip@0x0000100022040000-3
 hwirq:   0xc8000000
 chip:    ITS-MSI
  flags:   0x20

$ sudo cat /sys/kernel/debug/irq/irqs/174
handler:  handle_fasteoi_irq
device:   0039:00:00.0
    <SNIP>
node:     3
affinity: 216-287
effectiv: 221
domain:  irqchip@0x0000300022040000-3
 hwirq:   0xc8000000
 chip:    ITS-MSI
  flags:   0x20

The irq-check.sh can help to collect hwirq and chip name from
/sys/kernel/debug/irq/irqs/* and print an error log when it finds
duplicate hwirqs per chip name.

Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
[1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/

Signed-off-by: Joseph Jang <jjang@nvidia.com>
Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
---
 tools/testing/selftests/drivers/irq/Makefile  |  5 +++
 tools/testing/selftests/drivers/irq/config    |  2 +
 .../selftests/drivers/irq/irq-check.sh        | 39 +++++++++++++++++++
 3 files changed, 46 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/irq/Makefile
 create mode 100644 tools/testing/selftests/drivers/irq/config
 create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh

Comments

Joseph Jang Oct. 18, 2024, 4:29 a.m. UTC | #1
On 2024/9/4 9:44 AM, Joseph Jang wrote:
> Validate that there are no duplicate hwirqs per chip name in the irq
> debug file system /sys/kernel/debug/irq/irqs/*.
> 
> The following example log shows two duplicated hwirqs in the irq debug
> file system.
> 
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler:  handle_fasteoi_irq
> device:   0019:00:00.0
>       <SNIP>
> node:     1
> affinity: 72-143
> effectiv: 76
> domain:  irqchip@0x0000100022040000-3
>   hwirq:   0xc8000000
>   chip:    ITS-MSI
>    flags:   0x20
> 
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler:  handle_fasteoi_irq
> device:   0039:00:00.0
>      <SNIP>
> node:     3
> affinity: 216-287
> effectiv: 221
> domain:  irqchip@0x0000300022040000-3
>   hwirq:   0xc8000000
>   chip:    ITS-MSI
>    flags:   0x20
> 
> The irq-check.sh can help to collect hwirq and chip name from
> /sys/kernel/debug/irq/irqs/* and print an error log when it finds
> duplicate hwirqs per chip name.
> 
> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
> 
> Signed-off-by: Joseph Jang <jjang@nvidia.com>
> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>

Hi Tglx,

I followed your suggestion
https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg16952.html
to enable the IRQ debugfs and created a new script to scan for duplicated
hwirqs. If you have time, would you please take another look at the new
patch?


https://lore.kernel.org/all/20240904014426.3404397-1-jjang@nvidia.com/T/


Hi Shuah,

If you have time, could you take a look at the new patch?


Thank you,
Joseph.
Shuah Khan Oct. 18, 2024, 3:23 p.m. UTC | #2
On 10/17/24 22:29, Joseph Jang wrote:
> 
> 
> On 2024/9/4 9:44 AM, Joseph Jang wrote:
>> Validate that there are no duplicate hwirqs per chip name in the irq
>> debug file system /sys/kernel/debug/irq/irqs/*.
>>
>> The following example log shows two duplicated hwirqs in the irq debug
>> file system.
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/163
>> handler:  handle_fasteoi_irq
>> device:   0019:00:00.0
>>       <SNIP>
>> node:     1
>> affinity: 72-143
>> effectiv: 76
>> domain:  irqchip@0x0000100022040000-3
>>   hwirq:   0xc8000000
>>   chip:    ITS-MSI
>>    flags:   0x20
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/174
>> handler:  handle_fasteoi_irq
>> device:   0039:00:00.0
>>      <SNIP>
>> node:     3
>> affinity: 216-287
>> effectiv: 221
>> domain:  irqchip@0x0000300022040000-3
>>   hwirq:   0xc8000000
>>   chip:    ITS-MSI
>>    flags:   0x20
>>
>> The irq-check.sh can help to collect hwirq and chip name from
>> /sys/kernel/debug/irq/irqs/* and print an error log when it finds
>> duplicate hwirqs per chip name.
>>
>> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
>> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>>
>> Signed-off-by: Joseph Jang <jjang@nvidia.com>
>> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
> 
> Hi Tglx,
> 
> I followed your suggestion https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg16952.html to enable the IRQ debugfs and created a new script to scan for duplicated hwirqs. If you have time, would you please take another look at the new patch?
> 
> 
> https://lore.kernel.org/all/20240904014426.3404397-1-jjang@nvidia.com/T/
> 
> 
> Hi Shuah,
> 
> If you have time, could you take a look at the new patch?
> 

Once Thomas reviews this and gives me okay - I will accept the patch.

thanks,
-- Shuah
Bjorn Helgaas Oct. 18, 2024, 7:34 p.m. UTC | #3
On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> Validate that there are no duplicate hwirqs per chip name in the irq
> debug file system /sys/kernel/debug/irq/irqs/*.
> 
> The following example log shows two duplicated hwirqs in the irq debug
> file system.
> 
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler:  handle_fasteoi_irq
> device:   0019:00:00.0
>      <SNIP>
> node:     1
> affinity: 72-143
> effectiv: 76
> domain:  irqchip@0x0000100022040000-3
>  hwirq:   0xc8000000
>  chip:    ITS-MSI
>   flags:   0x20
> 
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler:  handle_fasteoi_irq
> device:   0039:00:00.0
>     <SNIP>
> node:     3
> affinity: 216-287
> effectiv: 221
> domain:  irqchip@0x0000300022040000-3
>  hwirq:   0xc8000000
>  chip:    ITS-MSI
>   flags:   0x20
> 
> The irq-check.sh can help to collect hwirq and chip name from
> /sys/kernel/debug/irq/irqs/* and print an error log when it finds
> duplicate hwirqs per chip name.
> 
> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/

I don't know enough about this issue to understand the details.  It
seems like you look for duplicate hwirqs in chips with the same name,
e.g., "ITS-MSI" in this case?  That name seems too generic to me
(might there be several instances of "ITS-MSI" in a system?)

Also, the name may come from chip->irq_print_chip(), so it apparently
relies on irqchip drivers to make the names unique if there are
multiple instances?

I would have expected looking for duplicates inside something more
specific, like "irqchip@0x0000300022040000-3".  But again, I don't
know enough about the problem to speak confidently here.

Cosmetic nits:

  - Tweak subject to match history (use "git log --oneline
    tools/testing/selftests/drivers/" to see it), e.g.,

      selftests: irq: Add check for duplicate hwirq

  - Rewrap commit log to fill 75 columns.  No point in using shorter
    lines.

  - Indent the "$ sudo cat ..." block by a couple spaces since it's
    effectively a quotation, not part of the main text body.

  - Possibly include sample output of irq-check.sh (also indented as a
    quote) when run on the system where you manually found the
    duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."

  - Reword "The irq-check.sh can help ..." to something like this:

      Add an irq-check.sh test to report errors when there are
      duplicate hwirqs per chip name.

  - Since the kernel patch has already been merged, cite it like this
    instead of using the https://lore URL:

      db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")

Joseph Jang Nov. 11, 2024, 7:21 a.m. UTC | #4
On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
>> Validate that there are no duplicate hwirqs per chip name in the irq
>> debug file system /sys/kernel/debug/irq/irqs/*.
>>
>> The following example log shows two duplicated hwirqs in the irq debug
>> file system.
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/163
>> handler:  handle_fasteoi_irq
>> device:   0019:00:00.0
>>       <SNIP>
>> node:     1
>> affinity: 72-143
>> effectiv: 76
>> domain:  irqchip@0x0000100022040000-3
>>   hwirq:   0xc8000000
>>   chip:    ITS-MSI
>>    flags:   0x20
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/174
>> handler:  handle_fasteoi_irq
>> device:   0039:00:00.0
>>      <SNIP>
>> node:     3
>> affinity: 216-287
>> effectiv: 221
>> domain:  irqchip@0x0000300022040000-3
>>   hwirq:   0xc8000000
>>   chip:    ITS-MSI
>>    flags:   0x20
>>
>> The irq-check.sh can help to collect hwirq and chip name from
>> /sys/kernel/debug/irq/irqs/* and print an error log when it finds
>> duplicate hwirqs per chip name.
>>
>> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
>> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
> 
> I don't know enough about this issue to understand the details.  It
> seems like you look for duplicate hwirqs in chips with the same name,
> e.g., "ITS-MSI" in this case?  That name seems too generic to me
> (might there be several instances of "ITS-MSI" in a system?)
> 

As far as I know, each PCIe device typically has only one ITS-MSI controller.
Having multiple ITS-MSI instances for the same device would lead to
confusion and potential conflicts in interrupt routing.

> Also, the name may come from chip->irq_print_chip(), so it apparently
> relies on irqchip drivers to make the names unique if there are
> multiple instances?
> 
> I would have expected looking for duplicates inside something more
> specific, like "irqchip@0x0000300022040000-3".  But again, I don't
> know enough about the problem to speak confidently here.
>

In our case, suppose we look for duplicates per irq domain, such as
"irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3" in the
following example:

     $ sudo cat /sys/kernel/debug/irq/irqs/163
     handler:  handle_fasteoi_irq
     device:   0019:00:00.0
          <SNIP>
     node:     1
     affinity: 72-143
     effectiv: 76
     domain:  irqchip@0x0000100022040000-3
      hwirq:   0xc8000000
      chip:    ITS-MSI
       flags:   0x20
     $ sudo cat /sys/kernel/debug/irq/irqs/174
     handler:  handle_fasteoi_irq
     device:   0039:00:00.0
         <SNIP>
     node:     3
     affinity: 216-287
     effectiv: 221
     domain:  irqchip@0x0000300022040000-3
      hwirq:   0xc8000000
      chip:    ITS-MSI
       flags:   0x20

We would not detect the duplicated hwirq number (0xc8000000) in this case.
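To make the two positions concrete, here is a minimal bash sketch, assuming only the domain, hwirq and chip lines of the two example records above matter. It applies the same sort/uniq duplicate detection as irq-check.sh twice, once keyed by chip name and once keyed by irq domain. The names rec163, rec174 and key_hwirq are hypothetical and used only for illustration.

```shell
#!/bin/bash
# Sketch: group the two example records above by chip name and by irq
# domain. Only the domain, hwirq and chip lines of each record are kept;
# rec163, rec174 and key_hwirq are illustrative names, not from the patch.

rec163='domain:  irqchip@0x0000100022040000-3
 hwirq:   0xc8000000
 chip:    ITS-MSI'
rec174='domain:  irqchip@0x0000300022040000-3
 hwirq:   0xc8000000
 chip:    ITS-MSI'

# Print "<value of field> <hwirq>" for one record, using the first match
# of each field (as the grep -m 1 calls in irq-check.sh do).
key_hwirq() {
	local field=$1 rec=$2
	awk -v f="$field" '
		$1 == f && k == "" { k = $2 }
		$1 == "hwirq:" && h == "" { h = $2 }
		END { if (k != "" && h != "") print k, h }' <<< "$rec"
}

# Same duplicate detection as irq-check.sh (sort | uniq), two different keys.
for field in chip: domain:; do
	dups=$(for rec in "$rec163" "$rec174"; do
		key_hwirq "$field" "$rec"
	done | sort | uniq -d)
	echo "by ${field%:}: ${dups:-no duplicates}"
done
```

Run as is, it prints:

    by chip: ITS-MSI 0xc8000000
    by domain: no duplicates

So the chip-name key reports 0xc8000000 twice, while a per-domain key treats the two interrupts as distinct. Whether that second result is a missed bug or correct behavior depends on whether hwirq numbers are meant to be local to an irq_domain.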


> Cosmetic nits:
> 
>    - Tweak subject to match history (use "git log --oneline
>      tools/testing/selftests/drivers/" to see it), e.g.,
> 
>        selftests: irq: Add check for duplicate hwirq
> 
>    - Rewrap commit log to fill 75 columns.  No point in using shorter
>      lines.
> 
>    - Indent the "$ sudo cat ..." block by a couple spaces since it's
>      effectively a quotation, not part of the main text body.
> 
>    - Possibly include sample output of irq-check.sh (also indented as a
>      quote) when run on the system where you manually found the
>      duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."
> 
>    - Reword "The irq-check.sh can help ..." to something like this:
> 
>        Add an irq-check.sh test to report errors when there are
>        duplicate hwirqs per chip name.
> 
>    - Since the kernel patch has already been merged, cite it like this
>      instead of using the https://lore URL:
> 
>        db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
> 

If you agree with using the irq chip name ("ITS-MSI") to scan for duplicate
hwirqs, I can send a v2 patch that addresses the above suggestions.


Thank you,
Joseph.

Bjorn Helgaas Nov. 22, 2024, 5:54 p.m. UTC | #5
On Mon, Nov 11, 2024 at 03:21:36PM +0800, Joseph Jang wrote:
> On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> > On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> > > Validate that there are no duplicate hwirqs per chip name in the irq
> > > debug file system /sys/kernel/debug/irq/irqs/*.
> > > 
> > > The following example log shows two duplicated hwirqs in the irq debug
> > > file system.
> > > 
> > > $ sudo cat /sys/kernel/debug/irq/irqs/163
> > > handler:  handle_fasteoi_irq
> > > device:   0019:00:00.0
> > >       <SNIP>
> > > node:     1
> > > affinity: 72-143
> > > effectiv: 76
> > > domain:  irqchip@0x0000100022040000-3
> > >   hwirq:   0xc8000000
> > >   chip:    ITS-MSI
> > >    flags:   0x20
> > > 
> > > $ sudo cat /sys/kernel/debug/irq/irqs/174
> > > handler:  handle_fasteoi_irq
> > > device:   0039:00:00.0
> > >      <SNIP>
> > > node:     3
> > > affinity: 216-287
> > > effectiv: 221
> > > domain:  irqchip@0x0000300022040000-3
> > >   hwirq:   0xc8000000
> > >   chip:    ITS-MSI
> > >    flags:   0x20
> > > 
> > > The irq-check.sh can help to collect hwirq and chip name from
> > > /sys/kernel/debug/irq/irqs/* and print an error log when it finds
> > > duplicate hwirqs per chip name.
> > > 
> > > Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
> > > [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
> > 
> > I don't know enough about this issue to understand the details.  It
> > seems like you look for duplicate hwirqs in chips with the same name,
> > e.g., "ITS-MSI" in this case?  That name seems too generic to me
> > (might there be several instances of "ITS-MSI" in a system?)
> 
> As far as I know, each PCIe device typically has only one ITS-MSI controller.
> Having multiple ITS-MSI instances for the same device would lead to
> confusion and potential conflicts in interrupt routing.
> 
> > Also, the name may come from chip->irq_print_chip(), so it apparently
> > relies on irqchip drivers to make the names unique if there are
> > multiple instances?
> > 
> > I would have expected looking for duplicates inside something more
> > specific, like "irqchip@0x0000300022040000-3".  But again, I don't
> > know enough about the problem to speak confidently here.
> 
> In our case, suppose we look for duplicates per irq domain, such as
> "irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3" in the
> following example:
> 
>     $ sudo cat /sys/kernel/debug/irq/irqs/163
>     handler:  handle_fasteoi_irq
>     device:   0019:00:00.0
>          <SNIP>
>     node:     1
>     affinity: 72-143
>     effectiv: 76
>     domain:  irqchip@0x0000100022040000-3
>      hwirq:   0xc8000000
>      chip:    ITS-MSI
>       flags:   0x20
>     $ sudo cat /sys/kernel/debug/irq/irqs/174
>     handler:  handle_fasteoi_irq
>     device:   0039:00:00.0
>         <SNIP>
>     node:     3
>     affinity: 216-287
>     effectiv: 221
>     domain:  irqchip@0x0000300022040000-3
>      hwirq:   0xc8000000
>      chip:    ITS-MSI
>       flags:   0x20
> 
> We would not detect the duplicated hwirq number (0xc8000000) in this
> case.

Again, this is really out of my area, but based on
Documentation/core-api/irq/irq-domain.rst, I assumed the point of
hwirq was that hwirq numbers were local to an interrupt controller,
i.e., to an irq_domain.

If that's the case, it should not be a problem if hwirq number
0xc8000000 is used in two separate irq_domains.

Bjorn
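If the check were keyed on the irq domain instead of the chip name, as suggested in the review, the collection loop of irq-check.sh might look roughly like the bash sketch below. This is an illustration under stated assumptions, not a tested replacement. IRQ_DIR is a hypothetical override for the debugfs path, and collect_domain_hwirq and check_dup_per_domain are made-up names. Like the original script, the sketch uses only the first domain: and hwirq: line in each file.

```shell
#!/bin/bash
# Sketch of a per-domain variant of the irq-check.sh collection loop.
# IRQ_DIR is a hypothetical override so the loop can be pointed at a copy
# of the debugfs files; it defaults to the real debugfs path.
IRQ_DIR=${IRQ_DIR:-/sys/kernel/debug/irq/irqs}

# Print one "<domain> <hwirq>" line per irq file that has both fields,
# taking the first "domain:" and first "hwirq:" line of each file.
collect_domain_hwirq() {
	local irq_file
	for irq_file in "$IRQ_DIR"/*; do
		[ -f "$irq_file" ] || continue
		awk '$1 == "domain:" && d == "" { d = $2 }
		     $1 == "hwirq:" && h == "" { h = $2 }
		     END { if (d != "" && h != "") print d, h }' "$irq_file"
	done
}

# Report hwirq numbers that appear more than once within the same domain.
# Returns 1 (and prints the duplicates with their counts) if any are found.
check_dup_per_domain() {
	local dups
	dups=$(collect_domain_hwirq | sort | uniq -cd)
	if [ -n "$dups" ]; then
		echo "ERROR: Found duplicate hwirq within a domain"
		echo "$dups"
		return 1
	fi
	return 0
}
```

Call check_dup_per_domain after the root and debugfs checks from irq-check.sh. It returns 1 and lists the duplicates if a single domain reports the same hwirq more than once. On the two example interrupts in this thread it reports nothing, because they belong to different domains.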

Patch

diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
new file mode 100644
index 000000000000..d6998017c861
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/Makefile
@@ -0,0 +1,5 @@ 
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := irq-check.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
new file mode 100644
index 000000000000..a53d3b713728
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/config
@@ -0,0 +1,2 @@ 
+CONFIG_GENERIC_IRQ_DEBUGFS=y
+CONFIG_GENERIC_IRQ_INJECTION=y
diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
new file mode 100755
index 000000000000..e784777043a1
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/irq-check.sh
@@ -0,0 +1,39 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This script need root permission
+uid=$(id -u)
+if [ $uid -ne 0 ]; then
+	echo "SKIP: Must be run as root"
+	exit 4
+fi
+
+# Ensure debugfs is mounted
+mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
+if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
+	echo "SKIP: irq debugfs not found"
+	exit 4
+fi
+
+# Traverse the irq debug file system directory to collect chip_name and hwirq
+hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
+	# Read chip name and hwirq from the irq_file
+	chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
+	hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
+
+	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
+		continue
+	fi
+
+	echo "$chip_name $hwirq"
+done)
+
+dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
+
+if [ -n "$dup_hwirq_list" ]; then
+	echo "ERROR: Found duplicate hwirq"
+	echo "$dup_hwirq_list"
+	exit 1
+fi
+
+exit 0