diff mbox series

[RFC,v2,27/35] ACPICA: Add new MADT GICC flags fields [code first?]

Message ID 20230913163823.7880-28-james.morse@arm.com
State New
Headers show
Series ACPI/arm64: add support for virtual cpuhotplug | expand

Commit Message

James Morse Sept. 13, 2023, 4:38 p.m. UTC
Add the new flag field to the MADT's GICC structure.

'Online Capable' indicates a disabled CPU can be enabled later.

Signed-off-by: James Morse <james.morse@arm.com>
---
This patch probably needs to go via the upstream acpica project,
but is included here so the feature can be testd.
---
 include/acpi/actbl2.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Ard Biesheuvel Sept. 14, 2023, 7:57 a.m. UTC | #1
Hello James,

On Wed, 13 Sept 2023 at 18:41, James Morse <james.morse@arm.com> wrote:
>
> Add the new flag field to the MADT's GICC structure.
>
> 'Online Capable' indicates a disabled CPU can be enabled later.
>

Why do we need a bit for this? What would be the point of describing
disabled CPUs that cannot be enabled (and are you are aware of
firmware doing this?).

So why are we not able to assume that this new bit can always be treated as '1'?


> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> This patch probably needs to go via the upstream acpica project,
> but is included here so the feature can be testd.
> ---
>  include/acpi/actbl2.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index 3751ae69432f..c433a079d8e1 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -1046,6 +1046,7 @@ struct acpi_madt_generic_interrupt {
>  /* ACPI_MADT_ENABLED                    (1)      Processor is usable if set */
>  #define ACPI_MADT_PERFORMANCE_IRQ_MODE  (1<<1) /* 01: Performance Interrupt Mode */
>  #define ACPI_MADT_VGIC_IRQ_MODE         (1<<2) /* 02: VGIC Maintenance Interrupt mode */
> +#define ACPI_MADT_GICC_CPU_CAPABLE      (1<<3) /* 03: CPU is online capable */
>
>  /* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */
>
> --
> 2.39.2
>
Jonathan Cameron Sept. 14, 2023, 2:48 p.m. UTC | #2
On Wed, 13 Sep 2023 16:38:15 +0000
James Morse <james.morse@arm.com> wrote:

> Add the new flag field to the MADT's GICC structure.
> 
> 'Online Capable' indicates a disabled CPU can be enabled later.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
Why [code first?] it's in ACPI 6.5
https://uefi.org/sites/default/files/resources/ACPI_Spec_6_5_Aug29.pdf

Spec reference would be good though. It's 6.5 Tabel 5.37: GICC CPU Interface Flags
I think

> ---
> This patch probably needs to go via the upstream acpica project,
> but is included here so the feature can be testd.
tested

> ---
>  include/acpi/actbl2.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index 3751ae69432f..c433a079d8e1 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -1046,6 +1046,7 @@ struct acpi_madt_generic_interrupt {
>  /* ACPI_MADT_ENABLED                    (1)      Processor is usable if set */
>  #define ACPI_MADT_PERFORMANCE_IRQ_MODE  (1<<1)	/* 01: Performance Interrupt Mode */
>  #define ACPI_MADT_VGIC_IRQ_MODE         (1<<2)	/* 02: VGIC Maintenance Interrupt mode */
> +#define ACPI_MADT_GICC_CPU_CAPABLE      (1<<3)	/* 03: CPU is online capable */
bikeshed colour time....

It's capable of being a CPU?

ACPI_MADT_GICC_ONLINE_CAPABLE 

GICC already tells us it's a CPU (last C) despite the table in ACPI being labeled
Table 5.37: GICC CPU Interface table



>  
>  /* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */
>
Jonathan Cameron Sept. 14, 2023, 2:54 p.m. UTC | #3
On Thu, 14 Sep 2023 09:57:44 +0200
Ard Biesheuvel <ardb@kernel.org> wrote:

> Hello James,
> 
> On Wed, 13 Sept 2023 at 18:41, James Morse <james.morse@arm.com> wrote:
> >
> > Add the new flag field to the MADT's GICC structure.
> >
> > 'Online Capable' indicates a disabled CPU can be enabled later.
> >  
> 
> Why do we need a bit for this? What would be the point of describing
> disabled CPUs that cannot be enabled (and are you are aware of
> firmware doing this?).

Enabled being not set is common at some similar ACPI tables at least.

This is available in most ACPI tables to allow firmware to use 'nearly'
static tables and just tweak the 'enabled' bit to say if the record should
be ignored or not. Also _STA not present which is for same trick.
If you are doing clever dynamic tables, then you can just not present 
the entry.

With that existing use case in mind, need another bit to say this
one might one day turn up.  Note this is copied from x86 though no
one seems to have implemented the kernel support for them yet.

Note as per my other reply - this isn't a code first proposal. It's in the
spec already (via a code first proposal last year I think).

> 
> So why are we not able to assume that this new bit can always be treated as '1'?

Given above, need the extra bit to size stuff to allow for the CPU showing up
late.


> 
> 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > ---
> > This patch probably needs to go via the upstream acpica project,
> > but is included here so the feature can be testd.
> > ---
> >  include/acpi/actbl2.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> > index 3751ae69432f..c433a079d8e1 100644
> > --- a/include/acpi/actbl2.h
> > +++ b/include/acpi/actbl2.h
> > @@ -1046,6 +1046,7 @@ struct acpi_madt_generic_interrupt {
> >  /* ACPI_MADT_ENABLED                    (1)      Processor is usable if set */
> >  #define ACPI_MADT_PERFORMANCE_IRQ_MODE  (1<<1) /* 01: Performance Interrupt Mode */
> >  #define ACPI_MADT_VGIC_IRQ_MODE         (1<<2) /* 02: VGIC Maintenance Interrupt mode */
> > +#define ACPI_MADT_GICC_CPU_CAPABLE      (1<<3) /* 03: CPU is online capable */
> >
> >  /* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */
> >
> > --
> > 2.39.2
> >  
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Ard Biesheuvel Sept. 14, 2023, 3:34 p.m. UTC | #4
On Thu, 14 Sept 2023 at 16:55, Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 14 Sep 2023 09:57:44 +0200
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > Hello James,
> >
> > On Wed, 13 Sept 2023 at 18:41, James Morse <james.morse@arm.com> wrote:
> > >
> > > Add the new flag field to the MADT's GICC structure.
> > >
> > > 'Online Capable' indicates a disabled CPU can be enabled later.
> > >
> >
> > Why do we need a bit for this? What would be the point of describing
> > disabled CPUs that cannot be enabled (and are you are aware of
> > firmware doing this?).
>
> Enabled being not set is common at some similar ACPI tables at least.
>
> This is available in most ACPI tables to allow firmware to use 'nearly'
> static tables and just tweak the 'enabled' bit to say if the record should
> be ignored or not. Also _STA not present which is for same trick.
> If you are doing clever dynamic tables, then you can just not present
> the entry.
>
> With that existing use case in mind, need another bit to say this
> one might one day turn up.  Note this is copied from x86 though no
> one seems to have implemented the kernel support for them yet.
>
> Note as per my other reply - this isn't a code first proposal. It's in the
> spec already (via a code first proposal last year I think).
>
> >
> > So why are we not able to assume that this new bit can always be treated as '1'?
>
> Given above, need the extra bit to size stuff to allow for the CPU showing up
> late.
>

So does this mean that on x86, the CPU object is instantiated only
when the hardware level hotplug occurs? And before that, the object
does not exist at all?

Because it seems to me that _STA, having both enabled and present
bits, could already describe what we need here, and arguably, a CPU
that is not both present and enabled should not be used by the OS.
This would leave room for representing off-line CPUs as present but
not enabled.

Apologies if I am missing something obvious here - the whole rationale
behind this thing is rather confusing to me.
Russell King (Oracle) Sept. 14, 2023, 3:49 p.m. UTC | #5
On Thu, Sep 14, 2023 at 05:34:25PM +0200, Ard Biesheuvel wrote:
> On Thu, 14 Sept 2023 at 16:55, Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 14 Sep 2023 09:57:44 +0200
> > Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > Hello James,
> > >
> > > On Wed, 13 Sept 2023 at 18:41, James Morse <james.morse@arm.com> wrote:
> > > >
> > > > Add the new flag field to the MADT's GICC structure.
> > > >
> > > > 'Online Capable' indicates a disabled CPU can be enabled later.
> > > >
> > >
> > > Why do we need a bit for this? What would be the point of describing
> > > disabled CPUs that cannot be enabled (and are you are aware of
> > > firmware doing this?).
> >
> > Enabled being not set is common at some similar ACPI tables at least.
> >
> > This is available in most ACPI tables to allow firmware to use 'nearly'
> > static tables and just tweak the 'enabled' bit to say if the record should
> > be ignored or not. Also _STA not present which is for same trick.
> > If you are doing clever dynamic tables, then you can just not present
> > the entry.
> >
> > With that existing use case in mind, need another bit to say this
> > one might one day turn up.  Note this is copied from x86 though no
> > one seems to have implemented the kernel support for them yet.
> >
> > Note as per my other reply - this isn't a code first proposal. It's in the
> > spec already (via a code first proposal last year I think).
> >
> > >
> > > So why are we not able to assume that this new bit can always be treated as '1'?
> >
> > Given above, need the extra bit to size stuff to allow for the CPU showing up
> > late.
> >
> 
> So does this mean that on x86, the CPU object is instantiated only
> when the hardware level hotplug occurs? And before that, the object
> does not exist at all?
> 
> Because it seems to me that _STA, having both enabled and present
> bits, could already describe what we need here, and arguably, a CPU
> that is not both present and enabled should not be used by the OS.
> This would leave room for representing off-line CPUs as present but
> not enabled.
> 
> Apologies if I am missing something obvious here - the whole rationale
> behind this thing is rather confusing to me.

Note that the bit is in the ACPI spec:

https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gicc-cpu-interface-flags

The new bit has the same description as per the local-APIC equivalent:

https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#local-apic-flags

for a popular architecture that does have hot-pluggable physical CPUs ;)
Salil Mehta Sept. 15, 2023, 2:29 a.m. UTC | #6
Hi Ard,

> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Thursday, September 14, 2023 4:34 PM
> To: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: James Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> kvmarm@lists.linux.dev; x86@kernel.org; Salil Mehta
> <salil.mehta@huawei.com>; Russell King <linux@armlinux.org.uk>; Jean-
> Philippe Brucker <jean-philippe@linaro.org>; jianyong.wu@arm.com;
> justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Thu, 14 Sept 2023 at 16:55, Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 14 Sep 2023 09:57:44 +0200
> > Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > Hello James,
> > >
> > > On Wed, 13 Sept 2023 at 18:41, James Morse <james.morse@arm.com> wrote:
> > > >
> > > > Add the new flag field to the MADT's GICC structure.
> > > >
> > > > 'Online Capable' indicates a disabled CPU can be enabled later.
> > > >
> > >
> > > Why do we need a bit for this? What would be the point of describing
> > > disabled CPUs that cannot be enabled (and are you are aware of
> > > firmware doing this?).
> >
> > Enabled being not set is common at some similar ACPI tables at least.
> >
> > This is available in most ACPI tables to allow firmware to use 'nearly'
> > static tables and just tweak the 'enabled' bit to say if the record should
> > be ignored or not. Also _STA not present which is for same trick.
> > If you are doing clever dynamic tables, then you can just not present
> > the entry.
> >
> > With that existing use case in mind, need another bit to say this
> > one might one day turn up.  Note this is copied from x86 though no
> > one seems to have implemented the kernel support for them yet.
> >
> > Note as per my other reply - this isn't a code first proposal. It's in the
> > spec already (via a code first proposal last year I think).
> >
> > >
> > > So why are we not able to assume that this new bit can always be treated as '1'?
> >
> > Given above, need the extra bit to size stuff to allow for the CPU showing up
> > late.
> >
> 
> So does this mean that on x86, the CPU object is instantiated only
> when the hardware level hotplug occurs? And before that, the object
> does not exist at all?

That is correct but I am not sure if the presence of hardware Hotplug
on x86 is even true. It all hidden behind firmware magic (I think). So
x86 is able to use same infrastructure both for virtual and physical
CPU Hotplug.

From the ACPI 6.3 > x86 have started to use online-capable bit for local
x2apic in the MADT Table

https://lore.kernel.org/lkml/168016878002.404.5262105401164408214.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/168016878085.404.6003734700616193238.tip-bot2@tip-bot2/

But there is a subtle difference in the way it is being used on x86
and on the ARM platform right now.

On x86, during init, if the MADT entry for LAPIC is found to be
online-capable and is enabled as well then possible and present
cpumask gets set and a logical cpu-id is also allocated. If the
MADT entry is online-capable but not enabled then disabled cpus
are still counted but logical cpu-id is not allocated during
init time and in fact setting present mask bits are also
deferred till Hotplug happens later.

static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
{
      [...]  
	if (!enabled) {  /* Not ACPI_MADT_ENABLED */
		++disabled_cpus;
		return -EINVAL;
	}

      [...]  

	cpu = generic_processor_info(id, ver); /* logical cupid, present mask*/

      [...]
	return cpu;
}

acpi_parse_x2apic(union acpi_subtable_headers * header, const unsigned long end)
{
	struct acpi_madt_local_x2apic *processor = NULL;

	processor = (struct acpi_madt_local_x2apic *)header;

      [...] 

      enabled = processor->lapic_flags & ACPI_MADT_ENABLED;

      [...]

	/* don't register processors that cannot be onlined */
	if (!acpi_is_processor_usable(processor->lapic_flags))
		return 0;

      [...]

	acpi_register_lapic(apic_id, processor->uid, enabled);

	return 0;
}

On ARM, we similarly identify all MADT GICC entries which are
*usable* i.e. either are *ENABLED* or *online-capable*. But
Unlike x86, all cpus corresponding to usable MADT GICC entries
gets logical cpu-ds allocated and their present bit mask set
during boot itself. Hence, present mask is always equal to
the possible cpus mask on ARM.

https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gicc-cpu-interface-flags


For online-capable but *not* enabled CPUs we defer the
registration of the logical CPU-ids with the Linux Driver Model
till the time ACPI Hotplug event occurs. This means
register_cpu() is not called for the disabled CPUs during
init time. Hence, sysfs entries for the disabled CPUs
don’t exits.

But above creates bit of confusion to a x86 accustomed users
as on ARM with our solution, present CPUs are always equal to
possible CPUs. 

$ cat /sys/devices/system/cpu/possible
0-5
$ cat /sys/devices/system/cpu/present
0-5
$ cat /sys/devices/system/cpu/online
0-1
$ cat /sys/devices/system/cpu/offline
2-5

There is no way to know which CPUs have been hotplugged
using above interface. Hence, we have also a new mask
of enabled CPUs in the

$ cat /sys/devices/system/cpu/possible
0-5
$ cat /sys/devices/system/cpu/present
0-5
$ cat /sys/devices/system/cpu/enabled
0-2
$ cat /sys/devices/system/cpu/online
0-1
$ cat /sys/devices/system/cpu/offline
2-5

Qemu parameters: -smp cpu=3 maxcpus=6
Kernel parameter: maxcpus=2 


> 
> Because it seems to me that _STA, having both enabled and present
> bits, could already describe what we need here, and arguably, a CPU
> that is not both present and enabled should not be used by the OS.
> This would leave room for representing off-line CPUs as present but
> not enabled.

That is correct understanding.

For plugged cpus:
_STA.Present=1 and _STA.Enabled=1

For unplugged cpus:
_STA.Present=1 and _STA.Enabled=0

Hot(un)plugging is only allowed if during boot the GICC entries were
discovered as *online-capable*. GICC entries which are MADT GICC
enabled during boot cannot be hot-unplugged either.  

Catch:
If hot unplugging is to be supported for all cpus except the boot
then we MUST set all CPUs except boot CPUs as *online-capable*. 
This poses compatibility problems with the legacy OS running over
latest machines/platforms supporting Hotplug feature. OS might
ignore all the online-capable bits during boot time and hence only
1 CPU i.e. boot cpus might appear.

Hence, MADT.GICC.Enabled bits and MADT.GICC.online-capable need
Not be mutually exclusive. This requires more discussions!

You might find below useful:
https://kvm-forum.qemu.org/2023/talk/9SMPDQ/



> 
> Apologies if I am missing something obvious here - the whole rationale
> behind this thing is rather confusing to me.
Russell King (Oracle) Sept. 15, 2023, 7:09 a.m. UTC | #7
On Fri, Sep 15, 2023 at 02:29:13AM +0000, Salil Mehta wrote:
> On x86, during init, if the MADT entry for LAPIC is found to be
> online-capable and is enabled as well then possible and present

Note that the ACPI spec says enabled + online-capable isn't defined.

"The information conveyed by this bit depends on the value of the
Enabled bit. If the Enabled bit is set, this bit is reserved and
must be zero."

So, if x86 is doing something with the enabled && online-capable
state (other than ignoring the online-capable) then technically it
is doing something that the spec doesn't define - and it's
completely fine if aarch64 does something else (maybe treating it
strictly as per the spec and ignoring online-capable.)
Rafael J. Wysocki Sept. 15, 2023, 8:45 a.m. UTC | #8
On Fri, Sep 15, 2023 at 9:09 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Fri, Sep 15, 2023 at 02:29:13AM +0000, Salil Mehta wrote:
> > On x86, during init, if the MADT entry for LAPIC is found to be
> > online-capable and is enabled as well then possible and present
>
> Note that the ACPI spec says enabled + online-capable isn't defined.
>
> "The information conveyed by this bit depends on the value of the
> Enabled bit. If the Enabled bit is set, this bit is reserved and
> must be zero."
>
> So, if x86 is doing something with the enabled && online-capable
> state (other than ignoring the online-capable) then technically it
> is doing something that the spec doesn't define

And so it is wrong.

> - and it's
> completely fine if aarch64 does something else (maybe treating it
> strictly as per the spec and ignoring online-capable.)

That actually is the only compliant thing that can be done.

As per the spec (quoted above), a platform firmware setting
online-capable to 1 when Enabled is set is not compliant and it is
invalid to treat this as meaningful data.

As currently defined, online-capable is only applicable to CPUs that
are not enabled to start with and its role is to make it clear whether
or not they can be enabled later AFAICS.

If there is a need to represent the case in which a CPI that is
enabled to start with can be disabled, but cannot be enabled again,
the spec needs to be updated.
Salil Mehta Sept. 15, 2023, 9:21 a.m. UTC | #9
Hi Russel,

> From: Russell King <linux@armlinux.org.uk>
> Sent: Friday, September 15, 2023 8:09 AM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; James Morse <james.morse@arm.com>; linux-
> pm@vger.kernel.org; loongarch@lists.linux.dev; linux-acpi@vger.kernel.org;
> linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Fri, Sep 15, 2023 at 02:29:13AM +0000, Salil Mehta wrote:
> > On x86, during init, if the MADT entry for LAPIC is found to be
> > online-capable and is enabled as well then possible and present
> 
> Note that the ACPI spec says enabled + online-capable isn't defined.
> 
> "The information conveyed by this bit depends on the value of the
> Enabled bit. If the Enabled bit is set, this bit is reserved and
> must be zero."
> 
> So, if x86 is doing something with the enabled && online-capable
> state (other than ignoring the online-capable) then technically it
> is doing something that the spec doesn't define - and it's
> completely fine if aarch64 does something else (maybe treating it
> strictly as per the spec and ignoring online-capable.)


I would suggest that we should concentrate on what is actually
required. The fact of the matter is there is no need to keep
ACPI MADT.GICC.Enabled and ACPI MADT.GICC.online-capable bits
mutually exclusive. (please correct my understanding here
if I am wrong here)

It is a different matter that x86 has implemented above
requirement first for their x2APIC and spec are still not
reflecting what has been implemented as part of the code.
(I would add, for whatever reasons)

On ARM we have copied something from x86 ACPI Specification
which has not been updated yet. (why it is not updated? Maybe
x86 folks can clarify more on this?). Even on ARM, mutual
exclusiveness of the bits is not required. But does it breaks
anything on ARM to *not* have mutual exclusiveness. 
AFAICS, no, but ARM Arch guys can confirm this?)

If bits are *not* required to be mutually exclusive on either
platforms x86/ARM then, I think, it makes sense to update
ACPI specification for both of the platforms.


Thanks
Salil.
Salil Mehta Sept. 15, 2023, 9:34 a.m. UTC | #10
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Friday, September 15, 2023 9:45 AM
> To: Russell King (Oracle) <linux@armlinux.org.uk>
> Cc: Salil Mehta <salil.mehta@huawei.com>; Ard Biesheuvel <ardb@kernel.org>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; James Morse
> <james.morse@arm.com>; linux-pm@vger.kernel.org; loongarch@lists.linux.dev;
> linux-acpi@vger.kernel.org; linux-arch@vger.kernel.org; linux-
> kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> riscv@lists.infradead.org; kvmarm@lists.linux.dev; x86@kernel.org; Jean-
> Philippe Brucker <jean-philippe@linaro.org>; jianyong.wu@arm.com;
> justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Fri, Sep 15, 2023 at 9:09 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Fri, Sep 15, 2023 at 02:29:13AM +0000, Salil Mehta wrote:
> > > On x86, during init, if the MADT entry for LAPIC is found to be
> > > online-capable and is enabled as well then possible and present
> >
> > Note that the ACPI spec says enabled + online-capable isn't defined.
> >
> > "The information conveyed by this bit depends on the value of the
> > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > must be zero."
> >
> > So, if x86 is doing something with the enabled && online-capable
> > state (other than ignoring the online-capable) then technically it
> > is doing something that the spec doesn't define
> 
> And so it is wrong.


Or maybe, specification has not been updated yet. code-first?


> 
> > - and it's
> > completely fine if aarch64 does something else (maybe treating it
> > strictly as per the spec and ignoring online-capable.)
> 
> That actually is the only compliant thing that can be done.

Yes, but the question is it what is required and does it solves
the problem of Hotplug. I think no. 

By complying with what is there in the spec means we have to
do the tradeoff between having not to support hot(un)plugging
of the cold-plugged CPUs Vs risk of breaking the legacy OS
attempting to use newer platforms with Hotplug support. Later
is more of a ARM problem as we are not allowed to tweak the
ACPI tables once the system has booted.


> 
> As per the spec (quoted above), a platform firmware setting
> online-capable to 1 when Enabled is set is not compliant and it is
> invalid to treat this as meaningful data.

Correct. but is it really what we need? We need both of the
Bits to be set for supporting hot(un)plugging of cold booted
CPUs.


> 
> As currently defined, online-capable is only applicable to CPUs that
> are not enabled to start with and its role is to make it clear whether
> or not they can be enabled later AFAICS.

Correct.

> 
> If there is a need to represent the case in which a CPI that is
> enabled to start with can be disabled, but cannot be enabled again,
> the spec needs to be updated.

Absolutely. And that’s what my humble suggestion is as well.


Thanks
Salil.
Rafael J. Wysocki Sept. 15, 2023, 10:21 a.m. UTC | #11
On Fri, Sep 15, 2023 at 11:34 AM Salil Mehta <salil.mehta@huawei.com> wrote:
>
>
> > From: Rafael J. Wysocki <rafael@kernel.org>
> > Sent: Friday, September 15, 2023 9:45 AM
> > To: Russell King (Oracle) <linux@armlinux.org.uk>
> > Cc: Salil Mehta <salil.mehta@huawei.com>; Ard Biesheuvel <ardb@kernel.org>;
> > Jonathan Cameron <jonathan.cameron@huawei.com>; James Morse
> > <james.morse@arm.com>; linux-pm@vger.kernel.org; loongarch@lists.linux.dev;
> > linux-acpi@vger.kernel.org; linux-arch@vger.kernel.org; linux-
> > kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> > riscv@lists.infradead.org; kvmarm@lists.linux.dev; x86@kernel.org; Jean-
> > Philippe Brucker <jean-philippe@linaro.org>; jianyong.wu@arm.com;
> > justin.he@arm.com
> > Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> > [code first?]
> >
> > On Fri, Sep 15, 2023 at 9:09 AM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > >
> > > On Fri, Sep 15, 2023 at 02:29:13AM +0000, Salil Mehta wrote:
> > > > On x86, during init, if the MADT entry for LAPIC is found to be
> > > > online-capable and is enabled as well then possible and present
> > >
> > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > >
> > > "The information conveyed by this bit depends on the value of the
> > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > must be zero."
> > >
> > > So, if x86 is doing something with the enabled && online-capable
> > > state (other than ignoring the online-capable) then technically it
> > > is doing something that the spec doesn't define
> >
> > And so it is wrong.
>
>
> Or maybe, specification has not been updated yet. code-first?

Well, if you are aware of any change requests related to this and
posted as code-first, please let me know.
Russell King (Oracle) Sept. 15, 2023, 1:43 p.m. UTC | #12
On Fri, Sep 15, 2023 at 09:34:46AM +0000, Salil Mehta wrote:
> > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > >
> > > "The information conveyed by this bit depends on the value of the
> > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > must be zero."
> > >
> > > So, if x86 is doing something with the enabled && online-capable
> > > state (other than ignoring the online-capable) then technically it
> > > is doing something that the spec doesn't define
> > 
> > And so it is wrong.
> 
> Or maybe, specification has not been updated yet. code-first?

What is the point in speculating. If you want to speculate about it,
fine, but please don't use speculation as a reason that "oh we need
to sort this out before we can merge the patches".

This is precisely why engineers are bad at producing products. They
like to continually tweak the design, and the design never gets out
the door. You need someone who is a project manager to tell engineers
when to stop. Without a project manager to do that, eventually the
project fades into insignificance because it becomes no longer relevant
or has its funding cut.

Hotplug VCPU on aarch64 feels exactly like that - it seems to be an
engineer project that is just going to for-ever rumble on and never
actually see the light of day.

So please - stop speculating and lets get vCPU hotplug *actually*
delivered and usable. Even if it's not 100% perfect.
Salil Mehta Sept. 15, 2023, 2:49 p.m. UTC | #13
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Friday, September 15, 2023 11:21 AM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Russell King (Oracle)
> <linux@armlinux.org.uk>; Ard Biesheuvel <ardb@kernel.org>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; James Morse <james.morse@arm.com>; linux-
> pm@vger.kernel.org; loongarch@lists.linux.dev; linux-acpi@vger.kernel.org;
> linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Fri, Sep 15, 2023 at 11:34 AM Salil Mehta <salil.mehta@huawei.com>
> wrote:
> >
> >
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Friday, September 15, 2023 9:45 AM
> > > To: Russell King (Oracle) <linux@armlinux.org.uk>
> > > Cc: Salil Mehta <salil.mehta@huawei.com>; Ard Biesheuvel <ardb@kernel.org>;
> > > Jonathan Cameron <jonathan.cameron@huawei.com>; James Morse 
> > > <james.morse@arm.com>; linux-pm@vger.kernel.org; loongarch@lists.linux.dev;
> > > linux-acpi@vger.kernel.org; linux-arch@vger.kernel.org; linux-
> > > kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> > > riscv@lists.infradead.org; kvmarm@lists.linux.dev; x86@kernel.org;
> Jean-
> > > Philippe Brucker <jean-philippe@linaro.org>; jianyong.wu@arm.com;
> > > justin.he@arm.com
> > > Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags
> fields
> > > [code first?]
> > >
> > > On Fri, Sep 15, 2023 at 9:09 AM Russell King (Oracle)
> > > <linux@armlinux.org.uk> wrote:
> > > >
> > > > On Fri, Sep 15, 2023 at 02:29:13AM +0000, Salil Mehta wrote:
> > > > > On x86, during init, if the MADT entry for LAPIC is found to be
> > > > > online-capable and is enabled as well then possible and present
> > > >
> > > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > > >
> > > > "The information conveyed by this bit depends on the value of the
> > > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > > must be zero."
> > > >
> > > > So, if x86 is doing something with the enabled && online-capable
> > > > state (other than ignoring the online-capable) then technically it
> > > > is doing something that the spec doesn't define
> > >
> > > And so it is wrong.
> >
> >
> > Or maybe, specification has not been updated yet. code-first?
> 
> Well, if you are aware of any change requests related to this and
> posted as code-first, please let me know.

I am not aware of any on x86. Maybe we can do it on ARM first and
let other Arch pitch-in their objection later? Afterall, there is
a legitimate use-case in case of ARM. Having mutually exclusive
bits breaks certain use-cases and we have to do the tradeoffs. 

This can be done in parallel while other patches are getting
reviewed and momentarily living with the tradeoffs till
specification is sorted. But of course it depends upon what
other stake holders and most importantly what ARM Arch people
think of it.

Thanks
Salil.
Russell King (Oracle) Sept. 15, 2023, 3:16 p.m. UTC | #14
On Fri, Sep 15, 2023 at 02:49:41PM +0000, Salil Mehta wrote:
> I am not aware of any on x86. Maybe we can do it on ARM first and
> let other Arch pitch-in their objection later? Afterall, there is
> a legitimate use-case in case of ARM. Having mutually exclusive
> bits breaks certain use-cases and we have to do the tradeoffs. 

... but let's not use that as an argument to delay the forward
progress of getting aarch64 vCPU hotplug patches merged.

If we want to later propose that Enabled=1 Online-Capable=1 means
that the CPU can be hot-unplugged, then that's something that can
be added to the spec later, and added to the kernel later. There
is no need to go through more iterations of patch sets to add this
feature before considering that aarch64 vCPU hotplug is ready to
be merged.

Like I said in my other email, it's time to stop this "well, if we
do this, then we can do that" cycle - stop playing games with what
can be done.

Delaying merging this code means not only does the maintenance
burden keep increasing (because more and more patches accumulate
which have to be constantly forward ported) but those who *want*
this feature are deprived for what, another year? two years?
decades? before it gets merged.

So please, stop dreaming up new features. Let's get aarch64 vCPU
hotplug that is compliant with the current ACPI spec, merged into
upstream. If we _then_ want to consider additional features, that's
the time to do it.

If you're not prepared to do that, do not be surprised if someone
else (such as myself) decides to fork James' work in order to get
it merged upstream - and yes, I _will_ do that if these games
carry on. I have already started to do that by proposing a patch
that is different from what James has to at least get some of
James' desired changes upstream - and I will continue doing that
all the time that (a) I see that there's a better way to address
something in James' patch and (b) I think in the longer term it
will reduce the maintenance burden of this patch set.

People are getting sick and tired of waiting for this feature.
Salil Mehta Sept. 15, 2023, 3:17 p.m. UTC | #15
Hi Russel,
Thanks for highlighting your concerns.

> From: Russell King <linux@armlinux.org.uk>
> Sent: Friday, September 15, 2023 2:43 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Ard Biesheuvel
> <ardb@kernel.org>; Jonathan Cameron <jonathan.cameron@huawei.com>; James
> Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Fri, Sep 15, 2023 at 09:34:46AM +0000, Salil Mehta wrote:
> > > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > > >
> > > > "The information conveyed by this bit depends on the value of the
> > > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > > must be zero."
> > > >
> > > > So, if x86 is doing something with the enabled && online-capable
> > > > state (other than ignoring the online-capable) then technically it
> > > > is doing something that the spec doesn't define
> > >
> > > And so it is wrong.
> >
> > Or maybe, specification has not been updated yet. code-first?
> 
> What is the point in speculating. If you want to speculate about it,
> fine, but please don't use speculation as a reason that "oh we need
> to sort this out before we can merge the patches".

[already replied in other thread but repeating it here]

Sorry, I am not aware but I was suggesting this. Can we have this
done for ARM first because there is a legitimate use-case. This
can be done in parallel while other patches are getting reviewed.
It would be great if they get accepted even in the current form.


> This is precisely why engineers are bad at producing products. They
> like to continually tweak the design, and the design never gets out
> the door. You need someone who is a project manager to tell engineers
> when to stop. Without a project manager to do that, eventually the
> project fades into insignificance because it becomes no longer relevant
> or has its funding cut.
> 
> Hotplug VCPU on aarch64 feels exactly like that - it seems to be an
> engineer project that is just going to for-ever rumble on and never
> actually see the light of day.


Sometimes things are not in single persons control. Yes, it is
frustrating, I do understand that.


> So please - stop speculating and lets get vCPU hotplug *actually*
> delivered and usable. Even if it's not 100% perfect.

We need to decide what is the criteria of acceptability and it can
vary across organizations. It depends upon internal requirements.
The issues what I pointed are,

1. Legacy OS will not boot on latest platform with hotplug support.
   - Try running older windows on ARM platform with hotplug support.
     - older windows will only see boot cpu with online-capable bit.
     - Will windows use _OSC to check compatibility?
   - We have verified this with older Linux and it only shows 1 CPU.
2. Hot(un)plug of cold-booted CPUs.  
   - Its use-case is subjective. Maybe you can throw light on this.

With current composition of bits both 1 & 2 cannot be supported
simultaneously. 

It is perfectly okay to live with them while clearly indicating
what we intend to support or are in process of supporting it.
But we do need an open discussion about how to proceed. This is
to avoid surprises later on.

BTW, I am just trying to make every one aware of the problems.

Many thanks!

Best regards
Salil.
Jonathan Cameron Sept. 15, 2023, 3:32 p.m. UTC | #16
On Fri, 15 Sep 2023 16:17:21 +0100
Salil Mehta <salil.mehta@huawei.com> wrote:

> Hi Russel,
> Thanks for highlighting your concerns.
> 
> > From: Russell King <linux@armlinux.org.uk>
> > Sent: Friday, September 15, 2023 2:43 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Ard Biesheuvel
> > <ardb@kernel.org>; Jonathan Cameron <jonathan.cameron@huawei.com>; James
> > Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> > loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> > arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> > kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-  
> > philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com  
> > Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> > [code first?]
> > 
> > On Fri, Sep 15, 2023 at 09:34:46AM +0000, Salil Mehta wrote:  
> > > > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > > > >
> > > > > "The information conveyed by this bit depends on the value of the
> > > > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > > > must be zero."
> > > > >
> > > > > So, if x86 is doing something with the enabled && online-capable
> > > > > state (other than ignoring the online-capable) then technically it
> > > > > is doing something that the spec doesn't define  
> > > >
> > > > And so it is wrong.  
> > >
> > > Or maybe, specification has not been updated yet. code-first?  
> > 
> > What is the point in speculating. If you want to speculate about it,
> > fine, but please don't use speculation as a reason that "oh we need
> > to sort this out before we can merge the patches".  
> 
> [already replied in other thread but repeating it here]
> 
> Sorry, I am not aware but I was suggesting this. Can we have this
> done for ARM first because there is a legitimate use-case. This
> can be done in parallel while other patches are getting reviewed.
> It would be great if they get accepted even in the current form.
> 
> 
> > This is precisely why engineers are bad at producing products. They
> > like to continually tweak the design, and the design never gets out
> > the door. You need someone who is a project manager to tell engineers
> > when to stop. Without a project manager to do that, eventually the
> > project fades into insignificance because it becomes no longer relevant
> > or has its funding cut.
> > 
> > Hotplug VCPU on aarch64 feels exactly like that - it seems to be an
> > engineer project that is just going to for-ever rumble on and never
> > actually see the light of day.  
> 
> 
> Sometimes things are not in single persons control. Yes, it is
> frustrating, I do understand that.
> 
> 
> > So please - stop speculating and lets get vCPU hotplug *actually*
> > delivered and usable. Even if it's not 100% perfect.  
> 
> We need to decide what is the criteria of acceptability and it can
> vary across organizations. It depends upon internal requirements.
> The issues what I pointed are,
> 
> 1. Legacy OS will not boot on latest platform with hotplug support.
>    - Try running older windows on ARM platform with hotplug support.
>      - older windows will only see boot cpu with online-capable bit.
>      - Will windows use _OSC to check compatibility?
>    - We have verified this with older Linux and it only shows 1 CPU.
> 2. Hot(un)plug of cold-booted CPUs.  
>    - Its use-case is subjective. Maybe you can throw light on this.
> 
> With current composition of bits both 1 & 2 cannot be supported
> simultaneously. 
> 
> It is perfectly okay to live with them while clearly indicating
> what we intend to support or are in process of supporting it.
> But we do need an open discussion about how to proceed. This is
> to avoid surprises later on.
> 
> BTW, I am just trying to make every one aware of the problems.

Step 1 - just allow growing (and shrinking back to initial
enabled cpus).  That is fine with current specification and legacy
OS. We only assume CPUs that are hotplugged can later be removed.
That covers most use cases.

So what effectively what Russell said. Enable what we can with
the specifications as they stand before getting distracted by
modifying them (again).

Jonathan

> 
> Many thanks!
> 
> Best regards
> Salil.
> 
> 
>
Russell King (Oracle) Sept. 15, 2023, 3:41 p.m. UTC | #17
On Fri, Sep 15, 2023 at 03:17:21PM +0000, Salil Mehta wrote:
> Hi Russel,
> Thanks for highlighting your concerns.
> 
> > From: Russell King <linux@armlinux.org.uk>
> > Sent: Friday, September 15, 2023 2:43 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Ard Biesheuvel
> > <ardb@kernel.org>; Jonathan Cameron <jonathan.cameron@huawei.com>; James
> > Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> > loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> > arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> > kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> > philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> > Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> > [code first?]
> > 
> > On Fri, Sep 15, 2023 at 09:34:46AM +0000, Salil Mehta wrote:
> > > > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > > > >
> > > > > "The information conveyed by this bit depends on the value of the
> > > > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > > > must be zero."
> > > > >
> > > > > So, if x86 is doing something with the enabled && online-capable
> > > > > state (other than ignoring the online-capable) then technically it
> > > > > is doing something that the spec doesn't define
> > > >
> > > > And so it is wrong.
> > >
> > > Or maybe, specification has not been updated yet. code-first?
> > 
> > What is the point in speculating. If you want to speculate about it,
> > fine, but please don't use speculation as a reason that "oh we need
> > to sort this out before we can merge the patches".
> 
> [already replied in other thread but repeating it here]
> 
> Sorry, I am not aware but I was suggesting this. Can we have this
> done for ARM first because there is a legitimate use-case. This
> can be done in parallel while other patches are getting reviewed.
> It would be great if they get accepted even in the current form.
> 
> 
> > This is precisely why engineers are bad at producing products. They
> > like to continually tweak the design, and the design never gets out
> > the door. You need someone who is a project manager to tell engineers
> > when to stop. Without a project manager to do that, eventually the
> > project fades into insignificance because it becomes no longer relevant
> > or has its funding cut.
> > 
> > Hotplug VCPU on aarch64 feels exactly like that - it seems to be an
> > engineer project that is just going to for-ever rumble on and never
> > actually see the light of day.
> 
> 
> Sometimes things are not in single persons control. Yes, it is
> frustrating, I do understand that.
> 
> 
> > So please - stop speculating and lets get vCPU hotplug *actually*
> > delivered and usable. Even if it's not 100% perfect.
> 
> We need to decide what is the criteria of acceptability and it can
> vary across organizations. It depends upon internal requirements.
> The issues what I pointed are,
> 
> 1. Legacy OS will not boot on latest platform with hotplug support.
>    - Try running older windows on ARM platform with hotplug support.
>      - older windows will only see boot cpu with online-capable bit.
>      - Will windows use _OSC to check compatibility?
>    - We have verified this with older Linux and it only shows 1 CPU.
> 2. Hot(un)plug of cold-booted CPUs.  
>    - Its use-case is subjective. Maybe you can throw light on this.
> 
> With current composition of bits both 1 & 2 cannot be supported
> simultaneously. 
> 
> It is perfectly okay to live with them while clearly indicating
> what we intend to support or are in process of supporting it.
> But we do need an open discussion about how to proceed. This is
> to avoid surprises later on.
> 
> BTW, I am just trying to make every one aware of the problems.

Please do it as a separate discussion then - rather than starting a
thread in response to a posting of patches which are _supposed_ to
be being reviewed.

Bringing up issues which are in effect future enhancements without
explicitly stating that they are future enhancements makes it look like
the patch set isn't ready to be merged - and is a distraction to trying
to get the series merged.
Salil Mehta Sept. 15, 2023, 4:46 p.m. UTC | #18
Hi Russel,

> From: Russell King <linux@armlinux.org.uk>
> Sent: Friday, September 15, 2023 4:16 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Ard Biesheuvel
> <ardb@kernel.org>; Jonathan Cameron <jonathan.cameron@huawei.com>; James
> Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Fri, Sep 15, 2023 at 02:49:41PM +0000, Salil Mehta wrote:
> > I am not aware of any on x86. Maybe we can do it on ARM first and
> > let other Arch pitch-in their objection later? Afterall, there is
> > a legitimate use-case in case of ARM. Having mutually exclusive
> > bits breaks certain use-cases and we have to do the tradeoffs.
> 
> ... but let's not use that as an argument to delay the forward
> progress of getting aarch64 vCPU hotplug patches merged.


Why would anybody do that? We have been working with ARM for almost
3 years to get to the current point where we have overcome most of
the architecture issues and have made this feature viable at the
first place. It is totally out of wits that anyone of us would
want to delay its acceptance.


> 
> If we want to later propose that Enabled=1 Online-Capable=1 means
> that the CPU can be hot-unplugged, then that's something that can
> be added to the spec later, and added to the kernel later. There
> is no need to go through more iterations of patch sets to add this
> feature before considering that aarch64 vCPU hotplug is ready to
> be merged.

Absolutely but again these two things can be done in parallel.
And whether patch-set is ready to get accepted is up to the
Maintainers to decide and other community members as well.
Yourself, James, I and others have been making efforts in this
direction already.

But I understand your concern that maybe current discussion might
create a bit of a distraction and can be held.

> 
> Like I said in my other email, it's time to stop this "well, if we
> do this, then we can do that" cycle - stop playing games with what
> can be done.

Don't know which cyclic games are being referred here - really!

I will leave it up to James to answer that.

> Delaying merging this code means not only does the maintenance
> burden keep increasing (because more and more patches accumulate
> which have to be constantly forward ported) but those who *want*
> this feature are deprived for what, another year? two years?
> decades? before it gets merged.

It is good to know that there are customers waiting for this
feature at your side as well. Let us hope this can get accepted
quickly.

> So please, stop dreaming up new features. Let's get aarch64 vCPU
> hotplug that is compliant with the current ACPI spec, merged into
> upstream. If we _then_ want to consider additional features, that's
> the time to do it.


That's what I suggested earlier as well but the discussions for the
problem cannot be ignored.


> If you're not prepared to do that, do not be surprised if someone
> else (such as myself) decides to fork James' work in order to get
> it merged upstream - and yes, I _will_ do that if these games
> carry on. I have already started to do that by proposing a patch
> that is different from what James has to at least get some of
> James' desired changes upstream - and I will continue doing that
> all the time that (a) I see that there's a better way to address
> something in James' patch and (b) I think in the longer term it
> will reduce the maintenance burden of this patch set.

Are you changing the approach of the kernel?


Thanks
Salil.
Salil Mehta Sept. 15, 2023, 5:07 p.m. UTC | #19
Hi Russel,

> From: Russell King <linux@armlinux.org.uk>
> Sent: Friday, September 15, 2023 4:41 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Ard Biesheuvel
> <ardb@kernel.org>; Jonathan Cameron <jonathan.cameron@huawei.com>; James
> Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Fri, Sep 15, 2023 at 03:17:21PM +0000, Salil Mehta wrote:
> > Hi Russel,
> > Thanks for highlighting your concerns.
> >
> > > From: Russell King <linux@armlinux.org.uk>
> > > Sent: Friday, September 15, 2023 2:43 PM
> > > To: Salil Mehta <salil.mehta@huawei.com>
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Ard Biesheuvel
> > > <ardb@kernel.org>; Jonathan Cameron <jonathan.cameron@huawei.com>; James
> > > Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> > > loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> > > arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > > kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> > > kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> > > philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> > > Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> > > [code first?]
> > >
> > > On Fri, Sep 15, 2023 at 09:34:46AM +0000, Salil Mehta wrote:
> > > > > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > > > > >
> > > > > > "The information conveyed by this bit depends on the value of the
> > > > > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > > > > must be zero."
> > > > > >
> > > > > > So, if x86 is doing something with the enabled && online-capable
> > > > > > state (other than ignoring the online-capable) then technically it
> > > > > > is doing something that the spec doesn't define
> > > > >
> > > > > And so it is wrong.
> > > >
> > > > Or maybe, specification has not been updated yet. code-first?
> > >
> > > What is the point in speculating. If you want to speculate about it,
> > > fine, but please don't use speculation as a reason that "oh we need
> > > to sort this out before we can merge the patches".
> >
> > [already replied in other thread but repeating it here]
> >
> > Sorry, I am not aware but I was suggesting this. Can we have this
> > done for ARM first because there is a legitimate use-case. This
> > can be done in parallel while other patches are getting reviewed.
> > It would be great if they get accepted even in the current form.
> >
> >
> > > This is precisely why engineers are bad at producing products. They
> > > like to continually tweak the design, and the design never gets out
> > > the door. You need someone who is a project manager to tell engineers
> > > when to stop. Without a project manager to do that, eventually the
> > > project fades into insignificance because it becomes no longer relevant
> > > or has its funding cut.
> > >
> > > Hotplug VCPU on aarch64 feels exactly like that - it seems to be an
> > > engineer project that is just going to for-ever rumble on and never
> > > actually see the light of day.
> >
> >
> > Sometimes things are not in single persons control. Yes, it is
> > frustrating, I do understand that.
> >
> >
> > > So please - stop speculating and lets get vCPU hotplug *actually*
> > > delivered and usable. Even if it's not 100% perfect.
> >
> > We need to decide what is the criteria of acceptability and it can
> > vary across organizations. It depends upon internal requirements.
> > The issues what I pointed are,
> >
> > 1. Legacy OS will not boot on latest platform with hotplug support.
> >    - Try running older windows on ARM platform with hotplug support.
> >      - older windows will only see boot cpu with online-capable bit.
> >      - Will windows use _OSC to check compatibility?
> >    - We have verified this with older Linux and it only shows 1 CPU.
> > 2. Hot(un)plug of cold-booted CPUs.
> >    - Its use-case is subjective. Maybe you can throw light on this.
> >
> > With current composition of bits both 1 & 2 cannot be supported
> > simultaneously.
> >
> > It is perfectly okay to live with them while clearly indicating
> > what we intend to support or are in process of supporting it.
> > But we do need an open discussion about how to proceed. This is
> > to avoid surprises later on.
> >
> > BTW, I am just trying to make every one aware of the problems.
> 
> Please do it as a separate discussion then - rather than starting a
> thread in response to a posting of patches which are _supposed_ to
> be being reviewed.

Yes, we can discuss it as part of separate thread.

> Bringing up issues which are in effect future enhancements without
> explicitly stating that they are future enhancements makes it look like
> the patch set isn't ready to be merged - and is a distraction to trying
> to get the series merged.

I beg to disagree on this as these are not enhancements/features
but problems. But yes, we can sort these out in a step wise fashion
subsequently even after patches have been accepted. Totally agree
that this can cause distraction so let us defer it for a moment.

The original purpose was to highlight them here briefly, which 
has been achieved!

Thanks
Salil.
Salil Mehta Sept. 15, 2023, 5:12 p.m. UTC | #20
Hi Jonathan,

> From: Jonathan Cameron <jonathan.cameron@huawei.com>
> Sent: Friday, September 15, 2023 4:33 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: Russell King <linux@armlinux.org.uk>; Rafael J. Wysocki
> <rafael@kernel.org>; Ard Biesheuvel <ardb@kernel.org>; James Morse
> <james.morse@arm.com>; linux-pm@vger.kernel.org; loongarch@lists.linux.dev;
> linux-acpi@vger.kernel.org; linux-arch@vger.kernel.org; linux-
> kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> riscv@lists.infradead.org; kvmarm@lists.linux.dev; x86@kernel.org; Jean-
> Philippe Brucker <jean-philippe@linaro.org>; jianyong.wu@arm.com;
> justin.he@arm.com
> Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> [code first?]
> 
> On Fri, 15 Sep 2023 16:17:21 +0100
> Salil Mehta <salil.mehta@huawei.com> wrote:
> 
> > Hi Russel,
> > Thanks for highlighting your concerns.
> >
> > > From: Russell King <linux@armlinux.org.uk>
> > > Sent: Friday, September 15, 2023 2:43 PM
> > > To: Salil Mehta <salil.mehta@huawei.com>
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Ard Biesheuvel
> > > <ardb@kernel.org>; Jonathan Cameron <jonathan.cameron@huawei.com>; James
> > > Morse <james.morse@arm.com>; linux-pm@vger.kernel.org;
> > > loongarch@lists.linux.dev; linux-acpi@vger.kernel.org; linux-
> > > arch@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > > kernel@lists.infradead.org; linux-riscv@lists.infradead.org;
> > > kvmarm@lists.linux.dev; x86@kernel.org; Jean-Philippe Brucker <jean-
> > > philippe@linaro.org>; jianyong.wu@arm.com; justin.he@arm.com
> > > Subject: Re: [RFC PATCH v2 27/35] ACPICA: Add new MADT GICC flags fields
> > > [code first?]
> > >
> > > On Fri, Sep 15, 2023 at 09:34:46AM +0000, Salil Mehta wrote:
> > > > > > Note that the ACPI spec says enabled + online-capable isn't defined.
> > > > > >
> > > > > > "The information conveyed by this bit depends on the value of the
> > > > > > Enabled bit. If the Enabled bit is set, this bit is reserved and
> > > > > > must be zero."
> > > > > >
> > > > > > So, if x86 is doing something with the enabled && online-capable
> > > > > > state (other than ignoring the online-capable) then technically it
> > > > > > is doing something that the spec doesn't define
> > > > >
> > > > > And so it is wrong.
> > > >
> > > > Or maybe, specification has not been updated yet. code-first?
> > >
> > > What is the point in speculating. If you want to speculate about it,
> > > fine, but please don't use speculation as a reason that "oh we need
> > > to sort this out before we can merge the patches".
> >
> > [already replied in other thread but repeating it here]
> >
> > Sorry, I am not aware but I was suggesting this. Can we have this
> > done for ARM first because there is a legitimate use-case. This
> > can be done in parallel while other patches are getting reviewed.
> > It would be great if they get accepted even in the current form.
> >
> >
> > > This is precisely why engineers are bad at producing products. They
> > > like to continually tweak the design, and the design never gets out
> > > the door. You need someone who is a project manager to tell engineers
> > > when to stop. Without a project manager to do that, eventually the
> > > project fades into insignificance because it becomes no longer relevant
> > > or has its funding cut.
> > >
> > > Hotplug VCPU on aarch64 feels exactly like that - it seems to be an
> > > engineer project that is just going to for-ever rumble on and never
> > > actually see the light of day.
> >
> >
> > Sometimes things are not in single persons control. Yes, it is
> > frustrating, I do understand that.
> >
> >
> > > So please - stop speculating and lets get vCPU hotplug *actually*
> > > delivered and usable. Even if it's not 100% perfect.
> >
> > We need to decide what is the criteria of acceptability and it can
> > vary across organizations. It depends upon internal requirements.
> > The issues what I pointed are,
> >
> > 1. Legacy OS will not boot on latest platform with hotplug support.
> >    - Try running older windows on ARM platform with hotplug support.
> >      - older windows will only see boot cpu with online-capable bit.
> >      - Will windows use _OSC to check compatibility?
> >    - We have verified this with older Linux and it only shows 1 CPU.
> > 2. Hot(un)plug of cold-booted CPUs.
> >    - Its use-case is subjective. Maybe you can throw light on this.
> >
> > With current composition of bits both 1 & 2 cannot be supported
> > simultaneously.
> >
> > It is perfectly okay to live with them while clearly indicating
> > what we intend to support or are in process of supporting it.
> > But we do need an open discussion about how to proceed. This is
> > to avoid surprises later on.
> >
> > BTW, I am just trying to make every one aware of the problems.
> 
> Step 1 - just allow growing (and shrinking back to initial
> enabled cpus).  That is fine with current specification and legacy
> OS. We only assume CPUs that are hotplugged can later be removed.
> That covers most use cases.


Yes, we can do that for a moment (at least in qemu) and then
not allow unplugging vCPUs which were cold plugged or allow
it as a debugging feature but splash a warning.


> So what effectively what Russell said. Enable what we can with
> the specifications as they stand before getting distracted by
> modifying them (again).


Yes, agreed. Idea was to clearly highlight them. These can be
discussed as part of separate thread in parallel - absolutely!


Thanks
Salil.
diff mbox series

Patch

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index 3751ae69432f..c433a079d8e1 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -1046,6 +1046,7 @@  struct acpi_madt_generic_interrupt {
 /* ACPI_MADT_ENABLED                    (1)      Processor is usable if set */
 #define ACPI_MADT_PERFORMANCE_IRQ_MODE  (1<<1)	/* 01: Performance Interrupt Mode */
 #define ACPI_MADT_VGIC_IRQ_MODE         (1<<2)	/* 02: VGIC Maintenance Interrupt mode */
+#define ACPI_MADT_GICC_CPU_CAPABLE      (1<<3)	/* 03: CPU is online capable */
 
 /* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */