Message ID: 20180130124609.15076-1-christoffer.dall@linaro.org
State: Superseded
Series: KVM: arm/arm64: Fix arch timers with userspace irqchips
On 30/01/18 12:46, Christoffer Dall wrote: > When introducing support for irqchip in userspace we needed a way to > mask the timer signal to prevent the guest continuously exiting due to a > screaming timer. > > We did this by disabling the corresponding percpu interrupt on the > host interrupt controller, because we cannot rely on the host system > having a GIC, and therefore cannot make any assumptions about having an > active state to hide the timer signal. > > Unfortunately, when introducing this feature, it became entirely > possible that a VCPU which belongs to a VM that has a userspace irqchip > can disable the vtimer irq on the host on some physical CPU, and then go > away without ever enabling the vimter irq on that physical CPU again. vtimer > > This means that using irqchips in userspace on a system that also > supports running VMs with an in-kernel GIC can prevent forward progress > from in-kernel GIC VMs. > > Later on, when we started taking virtual timer interrupts in the arch > timer code, we would also leave this timer state active for userspace > irqchip VMs, because we leave it up to a VGIC-enabled guest to > deactivate the hardware IRQ using the HW bit in the LR. > > Both issues are solved by only using the enable/disable trick on systems > that do not have a host GIC which supports the active state, because all > VMs on such systems must use irqchips in userspace. Systems that have a > working GIC with support for an active state use the active state to > mask the timer signal for both userspace an in-kernel irqchips. > > Cc: Alexander Graf <agraf@suse.de> > Cc: <stable@vger.kernel.org> # v4.12+ > Fixes: d9e139778376 ("KVM: arm/arm64: Support arch timers with a userspace gic") > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > --- > This conflicts horribly with everything when applied to either > kvmarm/queue or kvmarm/master. 
Therefore, this patch is written for > (and applies to) v4.15 with kvmarm/queue merged and should therefore > apply cleanly after v4.16-rc1. An example with this patch applied can > be found on kvmarm/temp-for-v4.16-rc2. I plan on sending this along > with any other potential fixes post v4.16-rc1. > > virt/kvm/arm/arch_timer.c | 77 ++++++++++++++++++++++++++--------------------- > 1 file changed, 42 insertions(+), 35 deletions(-) > > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c > index 70268c0bec79..228906ceb722 100644 > --- a/virt/kvm/arm/arch_timer.c > +++ b/virt/kvm/arm/arch_timer.c > @@ -35,6 +35,7 @@ > static struct timecounter *timecounter; > static unsigned int host_vtimer_irq; > static u32 host_vtimer_irq_flags; > +static bool has_gic_active_state; > > static const struct kvm_irq_level default_ptimer_irq = { > .irq = 30, > @@ -69,25 +70,6 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work) > cancel_work_sync(work); > } > > -static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu) > -{ > - struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > - > - /* > - * When using a userspace irqchip with the architected timers, we must > - * prevent continuously exiting from the guest, and therefore mask the > - * physical interrupt by disabling it on the host interrupt controller > - * when the virtual level is high, such that the guest can make > - * forward progress. Once we detect the output level being > - * de-asserted, we unmask the interrupt again so that we exit from the > - * guest when the timer fires. 
> - */ > - if (vtimer->irq.level) > - disable_percpu_irq(host_vtimer_irq); > - else > - enable_percpu_irq(host_vtimer_irq, 0); > -} > - > static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) > { > struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id; > @@ -107,8 +89,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) > kvm_timer_update_irq(vcpu, true, vtimer); > > if (static_branch_unlikely(&userspace_irqchip_in_use) && > - unlikely(!irqchip_in_kernel(vcpu->kvm))) > - kvm_vtimer_update_mask_user(vcpu); > + unlikely(!irqchip_in_kernel(vcpu->kvm)) && !has_gic_active_state) > + disable_percpu_irq(host_vtimer_irq); nit: this is the negated version of the fix you posted earlier (0e23cb34af26 in kvmarm/queue), plus a new term... How about rewriting this as: static inline bool userspace_irqchip(struct kvm *kvm) { return (static_branch_unlikely(&userspace_irqchip_in_use) && unlikely(!irqchip_in_kernel(kvm))); } and this becomes: if (userspace_irqchip(vcpu->kvm) && !has_gic_active_state) [...] which I find much more readable. Same for kvm_timer_update_irq(): if (!userspace_irqchip(vcpu->kvm)) [...] > > return IRQ_HANDLED; > } > @@ -460,13 +442,16 @@ static void set_cntvoff(u64 cntvoff) > kvm_call_hyp(__kvm_timer_set_cntvoff, low, high); > } > > -static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu) > +static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu) > { > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > bool phys_active; > int ret; > > - phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq); > + if (irqchip_in_kernel(vcpu->kvm)) > + phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq); > + else > + phys_active = vtimer->irq.level; You could use the above helper here too. 
> > ret = irq_set_irqchip_state(host_vtimer_irq, > IRQCHIP_STATE_ACTIVE, > @@ -474,9 +459,24 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu) > WARN_ON(ret); > } > > -static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu) > +static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu) > { > - kvm_vtimer_update_mask_user(vcpu); > + struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > + > + /* > + * When using a userspace irqchip with the architected timers and a > + * host interrupt controller that doesn't support an active state, we > + * must still we must prevent continuously exiting from the guest, and > + * therefore mask the physical interrupt by disabling it on the host > + * interrupt controller when the virtual level is high, such that the > + * guest can make forward progress. Once we detect the output level > + * being de-asserted, we unmask the interrupt again so that we exit > + * from the guest when the timer fires. > + */ > + if (vtimer->irq.level) > + disable_percpu_irq(host_vtimer_irq); > + else > + enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); > } > > void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu) > @@ -487,10 +487,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu) > if (unlikely(!timer->enabled)) > return; > > - if (unlikely(!irqchip_in_kernel(vcpu->kvm))) > - kvm_timer_vcpu_load_user(vcpu); > + if (has_gic_active_state) likely()? 
> + kvm_timer_vcpu_load_gic(vcpu); > else > - kvm_timer_vcpu_load_vgic(vcpu); > + kvm_timer_vcpu_load_nogic(vcpu); > > set_cntvoff(vtimer->cntvoff); > > @@ -555,18 +555,23 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu) > { > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > > - if (unlikely(!irqchip_in_kernel(vcpu->kvm))) { > - __timer_snapshot_state(vtimer); > - if (!kvm_timer_should_fire(vtimer)) { > - kvm_timer_update_irq(vcpu, false, vtimer); > - kvm_vtimer_update_mask_user(vcpu); > - } > + __timer_snapshot_state(vtimer); > + if (!kvm_timer_should_fire(vtimer)) { > + kvm_timer_update_irq(vcpu, false, vtimer); > + if (!has_gic_active_state) unlikely()? > + enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); > } > } > > void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu) > { > - unmask_vtimer_irq_user(vcpu); > + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; > + > + if (unlikely(!timer->enabled)) > + return; > + > + if (unlikely(!irqchip_in_kernel(vcpu->kvm))) > + unmask_vtimer_irq_user(vcpu); > } > > int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu) > @@ -753,6 +758,8 @@ int kvm_timer_hyp_init(bool has_gic) > kvm_err("kvm_arch_timer: error setting vcpu affinity\n"); > goto out_free_irq; > } > + > + has_gic_active_state = true; or even better, how about turning this into a static key? This is a one-off thing, and nobody is going to remove the GIC from the system, so we might as well optimize it in our favour. > } > > kvm_info("virtual timer IRQ%d\n", host_vtimer_irq); > These nits aside, this looks like the right thing to do. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Thanks, M. -- Jazz is not dead. It just smells funny...
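Marc's suggested `userspace_irqchip()` helper combines the static branch with the `irqchip_in_kernel()` check. The shape of the resulting predicate can be sketched as a user-space model, with the static key and `irqchip_in_kernel()` mocked as plain booleans (the `should_mask_vtimer()` wrapper is ours, added only to make the condition testable):

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of the suggested helper: the static branch
 * (userspace_irqchip_in_use) and irqchip_in_kernel() are mocked as
 * plain booleans; likely/unlikely hints are dropped. */
static bool userspace_irqchip_in_use;   /* mock of the static key */

struct kvm { bool irqchip_in_kernel; };

static inline bool userspace_irqchip(struct kvm *kvm)
{
    return userspace_irqchip_in_use && !kvm->irqchip_in_kernel;
}

/* The handler's masking condition from the review then reads:
 * userspace_irqchip(vcpu->kvm) && !has_gic_active_state */
static bool should_mask_vtimer(struct kvm *kvm, bool has_gic_active_state)
{
    return userspace_irqchip(kvm) && !has_gic_active_state;
}
```

The point of the helper is purely readability: the same two-term condition appears in both `kvm_arch_timer_handler()` and `kvm_timer_update_irq()`, and naming it makes the intent obvious at each call site.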
On Wed, Jan 31, 2018 at 09:37:31AM +0000, Marc Zyngier wrote: > On 30/01/18 12:46, Christoffer Dall wrote: > > When introducing support for irqchip in userspace we needed a way to > > mask the timer signal to prevent the guest continuously exiting due to a > > screaming timer. > > > > We did this by disabling the corresponding percpu interrupt on the > > host interrupt controller, because we cannot rely on the host system > > having a GIC, and therefore cannot make any assumptions about having an > > active state to hide the timer signal. > > > > Unfortunately, when introducing this feature, it became entirely > > possible that a VCPU which belongs to a VM that has a userspace irqchip > > can disable the vtimer irq on the host on some physical CPU, and then go > > away without ever enabling the vimter irq on that physical CPU again. > > vtimer > > > > > This means that using irqchips in userspace on a system that also > > supports running VMs with an in-kernel GIC can prevent forward progress > > from in-kernel GIC VMs. > > > > Later on, when we started taking virtual timer interrupts in the arch > > timer code, we would also leave this timer state active for userspace > > irqchip VMs, because we leave it up to a VGIC-enabled guest to > > deactivate the hardware IRQ using the HW bit in the LR. > > > > Both issues are solved by only using the enable/disable trick on systems > > that do not have a host GIC which supports the active state, because all > > VMs on such systems must use irqchips in userspace. Systems that have a > > working GIC with support for an active state use the active state to > > mask the timer signal for both userspace an in-kernel irqchips. 
this should also be "and" > > > > Cc: Alexander Graf <agraf@suse.de> > > Cc: <stable@vger.kernel.org> # v4.12+ > > Fixes: d9e139778376 ("KVM: arm/arm64: Support arch timers with a userspace gic") > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > > --- > > This conflicts horribly with everything when applied to either > > kvmarm/queue or kvmarm/master. Therefore, this patch is written for > > (and applies to) v4.15 with kvmarm/queue merged and should therefore > > apply cleanly after v4.16-rc1. An example with this patch applied can > > be found on kvmarm/temp-for-v4.16-rc2. I plan on sending this along > > with any other potential fixes post v4.16-rc1. > > > > virt/kvm/arm/arch_timer.c | 77 ++++++++++++++++++++++++++--------------------- > > 1 file changed, 42 insertions(+), 35 deletions(-) > > > > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c > > index 70268c0bec79..228906ceb722 100644 > > --- a/virt/kvm/arm/arch_timer.c > > +++ b/virt/kvm/arm/arch_timer.c > > @@ -35,6 +35,7 @@ > > static struct timecounter *timecounter; > > static unsigned int host_vtimer_irq; > > static u32 host_vtimer_irq_flags; > > +static bool has_gic_active_state; > > > > static const struct kvm_irq_level default_ptimer_irq = { > > .irq = 30, > > @@ -69,25 +70,6 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work) > > cancel_work_sync(work); > > } > > > > -static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu) > > -{ > > - struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > > - > > - /* > > - * When using a userspace irqchip with the architected timers, we must > > - * prevent continuously exiting from the guest, and therefore mask the > > - * physical interrupt by disabling it on the host interrupt controller > > - * when the virtual level is high, such that the guest can make > > - * forward progress. 
Once we detect the output level being > > - * de-asserted, we unmask the interrupt again so that we exit from the > > - * guest when the timer fires. > > - */ > > - if (vtimer->irq.level) > > - disable_percpu_irq(host_vtimer_irq); > > - else > > - enable_percpu_irq(host_vtimer_irq, 0); > > -} > > - > > static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) > > { > > struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id; > > @@ -107,8 +89,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) > > kvm_timer_update_irq(vcpu, true, vtimer); > > > > if (static_branch_unlikely(&userspace_irqchip_in_use) && > > - unlikely(!irqchip_in_kernel(vcpu->kvm))) > > - kvm_vtimer_update_mask_user(vcpu); > > + unlikely(!irqchip_in_kernel(vcpu->kvm)) && !has_gic_active_state) > > + disable_percpu_irq(host_vtimer_irq); > > nit: this is the negated version of the fix you posted earlier > (0e23cb34af26 in kvmarm/queue), plus a new term... How about rewriting > this as: > > static inline bool userspace_irqchip(struct kvm *kvm) > { > return (static_branch_unlikely(&userspace_irqchip_in_use) && > unlikely(!irqchip_in_kernel(kvm))); > } > > and this becomes: > > if (userspace_irqchip(vcpu->kvm) && !has_gic_active_state) > [...] > > which I find much more readable. > > Same for kvm_timer_update_irq(): > > if (!userspace_irqchip(vcpu->kvm)) > [...] > yes, good idea, I find it more readable too. 
> > > > return IRQ_HANDLED; > > } > > @@ -460,13 +442,16 @@ static void set_cntvoff(u64 cntvoff) > > kvm_call_hyp(__kvm_timer_set_cntvoff, low, high); > > } > > > > -static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu) > > +static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu) > > { > > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > > bool phys_active; > > int ret; > > > > - phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq); > > + if (irqchip_in_kernel(vcpu->kvm)) > > + phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq); > > + else > > + phys_active = vtimer->irq.level; > > You could use the above helper here too. > yes > > > > ret = irq_set_irqchip_state(host_vtimer_irq, > > IRQCHIP_STATE_ACTIVE, > > @@ -474,9 +459,24 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu) > > WARN_ON(ret); > > } > > > > -static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu) > > +static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu) > > { > > - kvm_vtimer_update_mask_user(vcpu); > > + struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > > + > > + /* > > + * When using a userspace irqchip with the architected timers and a > > + * host interrupt controller that doesn't support an active state, we > > + * must still we must prevent continuously exiting from the guest, and and here I should s/we must// > > + * therefore mask the physical interrupt by disabling it on the host > > + * interrupt controller when the virtual level is high, such that the > > + * guest can make forward progress. Once we detect the output level > > + * being de-asserted, we unmask the interrupt again so that we exit > > + * from the guest when the timer fires. 
> > + */ > > + if (vtimer->irq.level) > > + disable_percpu_irq(host_vtimer_irq); > > + else > > + enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); > > } > > > > void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu) > > @@ -487,10 +487,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu) > > if (unlikely(!timer->enabled)) > > return; > > > > - if (unlikely(!irqchip_in_kernel(vcpu->kvm))) > > - kvm_timer_vcpu_load_user(vcpu); > > + if (has_gic_active_state) > > likely()? > yes, will be fixes if using a static key. > > + kvm_timer_vcpu_load_gic(vcpu); > > else > > - kvm_timer_vcpu_load_vgic(vcpu); > > + kvm_timer_vcpu_load_nogic(vcpu); > > > > set_cntvoff(vtimer->cntvoff); > > > > @@ -555,18 +555,23 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu) > > { > > struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); > > > > - if (unlikely(!irqchip_in_kernel(vcpu->kvm))) { > > - __timer_snapshot_state(vtimer); > > - if (!kvm_timer_should_fire(vtimer)) { > > - kvm_timer_update_irq(vcpu, false, vtimer); > > - kvm_vtimer_update_mask_user(vcpu); > > - } > > + __timer_snapshot_state(vtimer); > > + if (!kvm_timer_should_fire(vtimer)) { > > + kvm_timer_update_irq(vcpu, false, vtimer); > > + if (!has_gic_active_state) > > unlikely()? > same > > + enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); So, reading your reply made me think. Shouldn't this actually be: if (static_key_likely(has_gic_active_state)) enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); else irq_set_irqchip_state(host_vtimer_irq, IRQCHIP_STATE_ACTIVE, false); ... which makes me want to wrap the irqchip call in a small wrapper called from here and from load(). What do you think? 
> > } > > } > > > > void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu) > > { > > - unmask_vtimer_irq_user(vcpu); > > + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; > > + > > + if (unlikely(!timer->enabled)) > > + return; > > + > > + if (unlikely(!irqchip_in_kernel(vcpu->kvm))) > > + unmask_vtimer_irq_user(vcpu); > > } > > > > int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu) > > @@ -753,6 +758,8 @@ int kvm_timer_hyp_init(bool has_gic) > > kvm_err("kvm_arch_timer: error setting vcpu affinity\n"); > > goto out_free_irq; > > } > > + > > + has_gic_active_state = true; > > or even better, how about turning this into a static key? This is a > one-off thing, and nobody is going to remove the GIC from the system, so > we might as well optimize it in our favour. yes. > > > } > > > > kvm_info("virtual timer IRQ%d\n", host_vtimer_irq); > > > > These nits aside, this looks like the right thing to do. > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> > Thanks for the review. I'll hold the tag until I hear back on the unmask_vtimer_irq_user() thing above though. Thanks, -Christoffer
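The two vcpu_load paths in the quoted diff reduce to one decision, modeled here in user space with the kernel primitives mocked as booleans (a sketch of the patch's logic, not the kernel code itself):

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of kvm_timer_vcpu_load() after the patch: with a
 * GIC active state, the host irq's active bit tracks phys_active
 * (taken from the vgic mapping for in-kernel irqchips, or from the
 * virtual line level for userspace ones); without an active state,
 * the percpu irq is masked while the virtual line is high. */
static bool has_gic_active_state;
static bool irq_active;   /* mock of IRQCHIP_STATE_ACTIVE */
static bool irq_masked;   /* mock of the percpu-irq mask */

static void timer_vcpu_load(bool irqchip_in_kernel,
                            bool vgic_map_active, bool vtimer_level)
{
    if (has_gic_active_state) {
        /* kvm_timer_vcpu_load_gic() path */
        bool phys_active = irqchip_in_kernel ? vgic_map_active
                                             : vtimer_level;
        irq_active = phys_active;  /* irq_set_irqchip_state(...) */
    } else {
        /* kvm_timer_vcpu_load_nogic() path */
        irq_masked = vtimer_level; /* disable/enable_percpu_irq(...) */
    }
}
```

This also shows why the fix works: on a host with an active state, a userspace-irqchip VM hides the screaming timer via the active bit instead of disabling the percpu irq, so it can no longer leave the vtimer irq disabled for in-kernel-GIC VMs sharing that physical CPU.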
On 31/01/18 10:05, Christoffer Dall wrote: > On Wed, Jan 31, 2018 at 09:37:31AM +0000, Marc Zyngier wrote: >> On 30/01/18 12:46, Christoffer Dall wrote: >>> When introducing support for irqchip in userspace we needed a way to >>> mask the timer signal to prevent the guest continuously exiting due to a >>> screaming timer. >>> >>> We did this by disabling the corresponding percpu interrupt on the >>> host interrupt controller, because we cannot rely on the host system >>> having a GIC, and therefore cannot make any assumptions about having an >>> active state to hide the timer signal. >>> >>> Unfortunately, when introducing this feature, it became entirely >>> possible that a VCPU which belongs to a VM that has a userspace irqchip >>> can disable the vtimer irq on the host on some physical CPU, and then go >>> away without ever enabling the vimter irq on that physical CPU again. >> >> vtimer >> >>> >>> This means that using irqchips in userspace on a system that also >>> supports running VMs with an in-kernel GIC can prevent forward progress >>> from in-kernel GIC VMs. >>> >>> Later on, when we started taking virtual timer interrupts in the arch >>> timer code, we would also leave this timer state active for userspace >>> irqchip VMs, because we leave it up to a VGIC-enabled guest to >>> deactivate the hardware IRQ using the HW bit in the LR. >>> >>> Both issues are solved by only using the enable/disable trick on systems >>> that do not have a host GIC which supports the active state, because all >>> VMs on such systems must use irqchips in userspace. Systems that have a >>> working GIC with support for an active state use the active state to >>> mask the timer signal for both userspace an in-kernel irqchips. 
> > this should also be "and" > >>> >>> Cc: Alexander Graf <agraf@suse.de> >>> Cc: <stable@vger.kernel.org> # v4.12+ >>> Fixes: d9e139778376 ("KVM: arm/arm64: Support arch timers with a userspace gic") >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> >>> --- >>> This conflicts horribly with everything when applied to either >>> kvmarm/queue or kvmarm/master. Therefore, this patch is written for >>> (and applies to) v4.15 with kvmarm/queue merged and should therefore >>> apply cleanly after v4.16-rc1. An example with this patch applied can >>> be found on kvmarm/temp-for-v4.16-rc2. I plan on sending this along >>> with any other potential fixes post v4.16-rc1. >>> >>> virt/kvm/arm/arch_timer.c | 77 ++++++++++++++++++++++++++--------------------- >>> 1 file changed, 42 insertions(+), 35 deletions(-) >>> >>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c >>> index 70268c0bec79..228906ceb722 100644 >>> --- a/virt/kvm/arm/arch_timer.c >>> +++ b/virt/kvm/arm/arch_timer.c >>> @@ -35,6 +35,7 @@ >>> static struct timecounter *timecounter; >>> static unsigned int host_vtimer_irq; >>> static u32 host_vtimer_irq_flags; >>> +static bool has_gic_active_state; >>> >>> static const struct kvm_irq_level default_ptimer_irq = { >>> .irq = 30, >>> @@ -69,25 +70,6 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work) >>> cancel_work_sync(work); >>> } >>> >>> -static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu) >>> -{ >>> - struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); >>> - >>> - /* >>> - * When using a userspace irqchip with the architected timers, we must >>> - * prevent continuously exiting from the guest, and therefore mask the >>> - * physical interrupt by disabling it on the host interrupt controller >>> - * when the virtual level is high, such that the guest can make >>> - * forward progress. 
Once we detect the output level being >>> - * de-asserted, we unmask the interrupt again so that we exit from the >>> - * guest when the timer fires. >>> - */ >>> - if (vtimer->irq.level) >>> - disable_percpu_irq(host_vtimer_irq); >>> - else >>> - enable_percpu_irq(host_vtimer_irq, 0); >>> -} >>> - >>> static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) >>> { >>> struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id; >>> @@ -107,8 +89,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) >>> kvm_timer_update_irq(vcpu, true, vtimer); >>> >>> if (static_branch_unlikely(&userspace_irqchip_in_use) && >>> - unlikely(!irqchip_in_kernel(vcpu->kvm))) >>> - kvm_vtimer_update_mask_user(vcpu); >>> + unlikely(!irqchip_in_kernel(vcpu->kvm)) && !has_gic_active_state) >>> + disable_percpu_irq(host_vtimer_irq); >> >> nit: this is the negated version of the fix you posted earlier >> (0e23cb34af26 in kvmarm/queue), plus a new term... How about rewriting >> this as: >> >> static inline bool userspace_irqchip(struct kvm *kvm) >> { >> return (static_branch_unlikely(&userspace_irqchip_in_use) && >> unlikely(!irqchip_in_kernel(kvm))); >> } >> >> and this becomes: >> >> if (userspace_irqchip(vcpu->kvm) && !has_gic_active_state) >> [...] >> >> which I find much more readable. >> >> Same for kvm_timer_update_irq(): >> >> if (!userspace_irqchip(vcpu->kvm)) >> [...] >> > > yes, good idea, I find it more readable too. 
> >>> >>> return IRQ_HANDLED; >>> } >>> @@ -460,13 +442,16 @@ static void set_cntvoff(u64 cntvoff) >>> kvm_call_hyp(__kvm_timer_set_cntvoff, low, high); >>> } >>> >>> -static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu) >>> +static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu) >>> { >>> struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); >>> bool phys_active; >>> int ret; >>> >>> - phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq); >>> + if (irqchip_in_kernel(vcpu->kvm)) >>> + phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq); >>> + else >>> + phys_active = vtimer->irq.level; >> >> You could use the above helper here too. >> > > yes > >>> >>> ret = irq_set_irqchip_state(host_vtimer_irq, >>> IRQCHIP_STATE_ACTIVE, >>> @@ -474,9 +459,24 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu) >>> WARN_ON(ret); >>> } >>> >>> -static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu) >>> +static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu) >>> { >>> - kvm_vtimer_update_mask_user(vcpu); >>> + struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); >>> + >>> + /* >>> + * When using a userspace irqchip with the architected timers and a >>> + * host interrupt controller that doesn't support an active state, we >>> + * must still we must prevent continuously exiting from the guest, and > > and here I should s/we must// > >>> + * therefore mask the physical interrupt by disabling it on the host >>> + * interrupt controller when the virtual level is high, such that the >>> + * guest can make forward progress. Once we detect the output level >>> + * being de-asserted, we unmask the interrupt again so that we exit >>> + * from the guest when the timer fires. 
>>> + */ >>> + if (vtimer->irq.level) >>> + disable_percpu_irq(host_vtimer_irq); >>> + else >>> + enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); >>> } >>> >>> void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu) >>> @@ -487,10 +487,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu) >>> if (unlikely(!timer->enabled)) >>> return; >>> >>> - if (unlikely(!irqchip_in_kernel(vcpu->kvm))) >>> - kvm_timer_vcpu_load_user(vcpu); >>> + if (has_gic_active_state) >> >> likely()? >> > > yes, will be fixes if using a static key. > >>> + kvm_timer_vcpu_load_gic(vcpu); >>> else >>> - kvm_timer_vcpu_load_vgic(vcpu); >>> + kvm_timer_vcpu_load_nogic(vcpu); >>> >>> set_cntvoff(vtimer->cntvoff); >>> >>> @@ -555,18 +555,23 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu) >>> { >>> struct arch_timer_context *vtimer = vcpu_vtimer(vcpu); >>> >>> - if (unlikely(!irqchip_in_kernel(vcpu->kvm))) { >>> - __timer_snapshot_state(vtimer); >>> - if (!kvm_timer_should_fire(vtimer)) { >>> - kvm_timer_update_irq(vcpu, false, vtimer); >>> - kvm_vtimer_update_mask_user(vcpu); >>> - } >>> + __timer_snapshot_state(vtimer); >>> + if (!kvm_timer_should_fire(vtimer)) { >>> + kvm_timer_update_irq(vcpu, false, vtimer); >>> + if (!has_gic_active_state) >> >> unlikely()? >> > > same > >>> + enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); > > So, reading your reply made me think. Shouldn't this actually be: > > if (static_key_likely(has_gic_active_state)) > enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags); > else > irq_set_irqchip_state(host_vtimer_irq, IRQCHIP_STATE_ACTIVE, false); Shouldn't it be the other way around? Use the active state when we have a GIC and mask if we don't? > > ... which makes me want to wrap the irqchip call in a small wrapper > called from here and from load(). What do you think? Indeed. Some operation that "does the right thing", no matter what what bizarre configuration we're in. 
It would definitely help understanding the flow which has become a bit unwieldy. Thanks, M. -- Jazz is not dead. It just smells funny...
On Wed, Jan 31, 2018 at 12:19 PM, Marc Zyngier <marc.zyngier@arm.com> wrote: > On 31/01/18 10:05, Christoffer Dall wrote: >> On Wed, Jan 31, 2018 at 09:37:31AM +0000, Marc Zyngier wrote: >>> On 30/01/18 12:46, Christoffer Dall wrote: >>>> When introducing support for irqchip in userspace we needed a way to >>>> mask the timer signal to prevent the guest continuously exiting due to a >>>> screaming timer. >>>> >>>> We did this by disabling the corresponding percpu interrupt on the >>>> host interrupt controller, because we cannot rely on the host system >>>> having a GIC, and therefore cannot make any assumptions about having an >>>> active state to hide the timer signal. >>>> >>>> Unfortunately, when introducing this feature, it became entirely >>>> possible that a VCPU which belongs to a VM that has a userspace irqchip >>>> can disable the vtimer irq on the host on some physical CPU, and then go >>>> away without ever enabling the vimter irq on that physical CPU again. >>> >>> vtimer >>> >>>> >>>> This means that using irqchips in userspace on a system that also >>>> supports running VMs with an in-kernel GIC can prevent forward progress >>>> from in-kernel GIC VMs. >>>> >>>> Later on, when we started taking virtual timer interrupts in the arch >>>> timer code, we would also leave this timer state active for userspace >>>> irqchip VMs, because we leave it up to a VGIC-enabled guest to >>>> deactivate the hardware IRQ using the HW bit in the LR. >>>> >>>> Both issues are solved by only using the enable/disable trick on systems >>>> that do not have a host GIC which supports the active state, because all >>>> VMs on such systems must use irqchips in userspace. Systems that have a >>>> working GIC with support for an active state use the active state to >>>> mask the timer signal for both userspace an in-kernel irqchips. 
>>
>> this should also be "and"
>>
>>>>
>>>> Cc: Alexander Graf <agraf@suse.de>
>>>> Cc: <stable@vger.kernel.org> # v4.12+
>>>> Fixes: d9e139778376 ("KVM: arm/arm64: Support arch timers with a userspace gic")
>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> ---
>>>> This conflicts horribly with everything when applied to either
>>>> kvmarm/queue or kvmarm/master. Therefore, this patch is written for
>>>> (and applies to) v4.15 with kvmarm/queue merged and should therefore
>>>> apply cleanly after v4.16-rc1. An example with this patch applied can
>>>> be found on kvmarm/temp-for-v4.16-rc2. I plan on sending this along
>>>> with any other potential fixes post v4.16-rc1.
>>>>
>>>>  virt/kvm/arm/arch_timer.c | 77 ++++++++++++++++++++++++++---------------------
>>>>  1 file changed, 42 insertions(+), 35 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>> index 70268c0bec79..228906ceb722 100644
>>>> --- a/virt/kvm/arm/arch_timer.c
>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>> @@ -35,6 +35,7 @@
>>>>  static struct timecounter *timecounter;
>>>>  static unsigned int host_vtimer_irq;
>>>>  static u32 host_vtimer_irq_flags;
>>>> +static bool has_gic_active_state;
>>>>
>>>>  static const struct kvm_irq_level default_ptimer_irq = {
>>>>  	.irq = 30,
>>>> @@ -69,25 +70,6 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
>>>>  		cancel_work_sync(work);
>>>>  }
>>>>
>>>> -static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
>>>> -{
>>>> -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>> -
>>>> -	/*
>>>> -	 * When using a userspace irqchip with the architected timers, we must
>>>> -	 * prevent continuously exiting from the guest, and therefore mask the
>>>> -	 * physical interrupt by disabling it on the host interrupt controller
>>>> -	 * when the virtual level is high, such that the guest can make
>>>> -	 * forward progress. Once we detect the output level being
>>>> -	 * de-asserted, we unmask the interrupt again so that we exit from the
>>>> -	 * guest when the timer fires.
>>>> -	 */
>>>> -	if (vtimer->irq.level)
>>>> -		disable_percpu_irq(host_vtimer_irq);
>>>> -	else
>>>> -		enable_percpu_irq(host_vtimer_irq, 0);
>>>> -}
>>>> -
>>>>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>>  {
>>>>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
>>>> @@ -107,8 +89,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>>  		kvm_timer_update_irq(vcpu, true, vtimer);
>>>>
>>>>  	if (static_branch_unlikely(&userspace_irqchip_in_use) &&
>>>> -	    unlikely(!irqchip_in_kernel(vcpu->kvm)))
>>>> -		kvm_vtimer_update_mask_user(vcpu);
>>>> +	    unlikely(!irqchip_in_kernel(vcpu->kvm)) && !has_gic_active_state)
>>>> +		disable_percpu_irq(host_vtimer_irq);
>>>
>>> nit: this is the negated version of the fix you posted earlier
>>> (0e23cb34af26 in kvmarm/queue), plus a new term... How about rewriting
>>> this as:
>>>
>>> static inline bool userspace_irqchip(struct kvm *kvm)
>>> {
>>> 	return (static_branch_unlikely(&userspace_irqchip_in_use) &&
>>> 		unlikely(!irqchip_in_kernel(kvm)));
>>> }
>>>
>>> and this becomes:
>>>
>>> 	if (userspace_irqchip(vcpu->kvm) && !has_gic_active_state)
>>> 		[...]
>>>
>>> which I find much more readable.
>>>
>>> Same for kvm_timer_update_irq():
>>>
>>> 	if (!userspace_irqchip(vcpu->kvm))
>>> 		[...]
>>>
>>
>> yes, good idea, I find it more readable too.
>>
>>>>
>>>>  	return IRQ_HANDLED;
>>>>  }
>>>> @@ -460,13 +442,16 @@ static void set_cntvoff(u64 cntvoff)
>>>>  	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
>>>>  }
>>>>
>>>> -static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
>>>> +static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu)
>>>>  {
>>>>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>  	bool phys_active;
>>>>  	int ret;
>>>>
>>>> -	phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
>>>> +	if (irqchip_in_kernel(vcpu->kvm))
>>>> +		phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
>>>> +	else
>>>> +		phys_active = vtimer->irq.level;
>>>
>>> You could use the above helper here too.
>>>
>>
>> yes
>>
>>>>
>>>>  	ret = irq_set_irqchip_state(host_vtimer_irq,
>>>>  				    IRQCHIP_STATE_ACTIVE,
>>>> @@ -474,9 +459,24 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
>>>>  	WARN_ON(ret);
>>>>  }
>>>>
>>>> -static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
>>>> +static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
>>>>  {
>>>> -	kvm_vtimer_update_mask_user(vcpu);
>>>> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>> +
>>>> +	/*
>>>> +	 * When using a userspace irqchip with the architected timers and a
>>>> +	 * host interrupt controller that doesn't support an active state, we
>>>> +	 * must still we must prevent continuously exiting from the guest, and
>>
>> and here I should s/we must//
>>
>>>> +	 * therefore mask the physical interrupt by disabling it on the host
>>>> +	 * interrupt controller when the virtual level is high, such that the
>>>> +	 * guest can make forward progress. Once we detect the output level
>>>> +	 * being de-asserted, we unmask the interrupt again so that we exit
>>>> +	 * from the guest when the timer fires.
>>>> +	 */
>>>> +	if (vtimer->irq.level)
>>>> +		disable_percpu_irq(host_vtimer_irq);
>>>> +	else
>>>> +		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>>>  }
>>>>
>>>>  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>>>> @@ -487,10 +487,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>>>>  	if (unlikely(!timer->enabled))
>>>>  		return;
>>>>
>>>> -	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
>>>> -		kvm_timer_vcpu_load_user(vcpu);
>>>> +	if (has_gic_active_state)
>>>
>>> likely()?
>>>
>>
>> yes, will be fixed if using a static key.
>>
>>>> +		kvm_timer_vcpu_load_gic(vcpu);
>>>>  	else
>>>> -		kvm_timer_vcpu_load_vgic(vcpu);
>>>> +		kvm_timer_vcpu_load_nogic(vcpu);
>>>>
>>>>  	set_cntvoff(vtimer->cntvoff);
>>>>
>>>> @@ -555,18 +555,23 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
>>>>  {
>>>>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>
>>>> -	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
>>>> -		__timer_snapshot_state(vtimer);
>>>> -		if (!kvm_timer_should_fire(vtimer)) {
>>>> -			kvm_timer_update_irq(vcpu, false, vtimer);
>>>> -			kvm_vtimer_update_mask_user(vcpu);
>>>> -		}
>>>> +	__timer_snapshot_state(vtimer);
>>>> +	if (!kvm_timer_should_fire(vtimer)) {
>>>> +		kvm_timer_update_irq(vcpu, false, vtimer);
>>>> +		if (!has_gic_active_state)
>>>
>>> unlikely()?
>>>
>>
>> same
>>
>>>> +			enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>
>> So, reading your reply made me think. Shouldn't this actually be:
>>
>> 	if (static_key_likely(has_gic_active_state))
>> 		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>> 	else
>> 		irq_set_irqchip_state(host_vtimer_irq, IRQCHIP_STATE_ACTIVE, false);
>
> Shouldn't it be the other way around? Use the active state when we have
> a GIC and mask if we don't?
>

Yes, it should. Doh.

>>
>> ... which makes me want to wrap the irqchip call in a small wrapper
>> called from here and from load(). What do you think?
>
> Indeed. Some operation that "does the right thing", no matter what
> bizarre configuration we're in. It would definitely help understanding
> the flow which has become a bit unwieldy.
>

I tried reworking the load function to first branch off
userspace_irqchip() and then call the sub-function from here, but it
was no more readable, so I've done something in between for v2. Stay
tuned...

Did I mention that I hate this feature, which keeps breaking, and
which really isn't covered by a simple kvm-unit-test script?

-Christoffer
On 31/01/18 11:23, Christoffer Dall wrote:
> On Wed, Jan 31, 2018 at 12:19 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 31/01/18 10:05, Christoffer Dall wrote:
>>> On Wed, Jan 31, 2018 at 09:37:31AM +0000, Marc Zyngier wrote:
>>>> On 30/01/18 12:46, Christoffer Dall wrote:
>>>>> When introducing support for irqchip in userspace we needed a way to
>>>>> mask the timer signal to prevent the guest continuously exiting due to a
>>>>> screaming timer.
>>>>>
>>>>> We did this by disabling the corresponding percpu interrupt on the
>>>>> host interrupt controller, because we cannot rely on the host system
>>>>> having a GIC, and therefore cannot make any assumptions about having an
>>>>> active state to hide the timer signal.
>>>>>
>>>>> Unfortunately, when introducing this feature, it became entirely
>>>>> possible that a VCPU which belongs to a VM that has a userspace irqchip
>>>>> can disable the vtimer irq on the host on some physical CPU, and then go
>>>>> away without ever enabling the vimter irq on that physical CPU again.
>>>>
>>>> vtimer
>>>>
>>>>>
>>>>> This means that using irqchips in userspace on a system that also
>>>>> supports running VMs with an in-kernel GIC can prevent forward progress
>>>>> from in-kernel GIC VMs.
>>>>>
>>>>> Later on, when we started taking virtual timer interrupts in the arch
>>>>> timer code, we would also leave this timer state active for userspace
>>>>> irqchip VMs, because we leave it up to a VGIC-enabled guest to
>>>>> deactivate the hardware IRQ using the HW bit in the LR.
>>>>>
>>>>> Both issues are solved by only using the enable/disable trick on systems
>>>>> that do not have a host GIC which supports the active state, because all
>>>>> VMs on such systems must use irqchips in userspace. Systems that have a
>>>>> working GIC with support for an active state use the active state to
>>>>> mask the timer signal for both userspace an in-kernel irqchips.
>>>
>>> this should also be "and"
>>>
>>>>>
>>>>> Cc: Alexander Graf <agraf@suse.de>
>>>>> Cc: <stable@vger.kernel.org> # v4.12+
>>>>> Fixes: d9e139778376 ("KVM: arm/arm64: Support arch timers with a userspace gic")
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> ---
>>>>> This conflicts horribly with everything when applied to either
>>>>> kvmarm/queue or kvmarm/master. Therefore, this patch is written for
>>>>> (and applies to) v4.15 with kvmarm/queue merged and should therefore
>>>>> apply cleanly after v4.16-rc1. An example with this patch applied can
>>>>> be found on kvmarm/temp-for-v4.16-rc2. I plan on sending this along
>>>>> with any other potential fixes post v4.16-rc1.
>>>>>
>>>>>  virt/kvm/arm/arch_timer.c | 77 ++++++++++++++++++++++++++---------------------
>>>>>  1 file changed, 42 insertions(+), 35 deletions(-)
>>>>>
>>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>>> index 70268c0bec79..228906ceb722 100644
>>>>> --- a/virt/kvm/arm/arch_timer.c
>>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>>> @@ -35,6 +35,7 @@
>>>>>  static struct timecounter *timecounter;
>>>>>  static unsigned int host_vtimer_irq;
>>>>>  static u32 host_vtimer_irq_flags;
>>>>> +static bool has_gic_active_state;
>>>>>
>>>>>  static const struct kvm_irq_level default_ptimer_irq = {
>>>>>  	.irq = 30,
>>>>> @@ -69,25 +70,6 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
>>>>>  		cancel_work_sync(work);
>>>>>  }
>>>>>
>>>>> -static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
>>>>> -{
>>>>> -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>> -
>>>>> -	/*
>>>>> -	 * When using a userspace irqchip with the architected timers, we must
>>>>> -	 * prevent continuously exiting from the guest, and therefore mask the
>>>>> -	 * physical interrupt by disabling it on the host interrupt controller
>>>>> -	 * when the virtual level is high, such that the guest can make
>>>>> -	 * forward progress. Once we detect the output level being
>>>>> -	 * de-asserted, we unmask the interrupt again so that we exit from the
>>>>> -	 * guest when the timer fires.
>>>>> -	 */
>>>>> -	if (vtimer->irq.level)
>>>>> -		disable_percpu_irq(host_vtimer_irq);
>>>>> -	else
>>>>> -		enable_percpu_irq(host_vtimer_irq, 0);
>>>>> -}
>>>>> -
>>>>>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>>>  {
>>>>>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
>>>>> @@ -107,8 +89,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>>>  		kvm_timer_update_irq(vcpu, true, vtimer);
>>>>>
>>>>>  	if (static_branch_unlikely(&userspace_irqchip_in_use) &&
>>>>> -	    unlikely(!irqchip_in_kernel(vcpu->kvm)))
>>>>> -		kvm_vtimer_update_mask_user(vcpu);
>>>>> +	    unlikely(!irqchip_in_kernel(vcpu->kvm)) && !has_gic_active_state)
>>>>> +		disable_percpu_irq(host_vtimer_irq);
>>>>
>>>> nit: this is the negated version of the fix you posted earlier
>>>> (0e23cb34af26 in kvmarm/queue), plus a new term... How about rewriting
>>>> this as:
>>>>
>>>> static inline bool userspace_irqchip(struct kvm *kvm)
>>>> {
>>>> 	return (static_branch_unlikely(&userspace_irqchip_in_use) &&
>>>> 		unlikely(!irqchip_in_kernel(kvm)));
>>>> }
>>>>
>>>> and this becomes:
>>>>
>>>> 	if (userspace_irqchip(vcpu->kvm) && !has_gic_active_state)
>>>> 		[...]
>>>>
>>>> which I find much more readable.
>>>>
>>>> Same for kvm_timer_update_irq():
>>>>
>>>> 	if (!userspace_irqchip(vcpu->kvm))
>>>> 		[...]
>>>>
>>>
>>> yes, good idea, I find it more readable too.
>>>
>>>>>
>>>>>  	return IRQ_HANDLED;
>>>>>  }
>>>>> @@ -460,13 +442,16 @@ static void set_cntvoff(u64 cntvoff)
>>>>>  	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
>>>>>  }
>>>>>
>>>>> -static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
>>>>> +static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>>  	bool phys_active;
>>>>>  	int ret;
>>>>>
>>>>> -	phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
>>>>> +	if (irqchip_in_kernel(vcpu->kvm))
>>>>> +		phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
>>>>> +	else
>>>>> +		phys_active = vtimer->irq.level;
>>>>
>>>> You could use the above helper here too.
>>>>
>>>
>>> yes
>>>
>>>>>
>>>>>  	ret = irq_set_irqchip_state(host_vtimer_irq,
>>>>>  				    IRQCHIP_STATE_ACTIVE,
>>>>> @@ -474,9 +459,24 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
>>>>>  	WARN_ON(ret);
>>>>>  }
>>>>>
>>>>> -static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
>>>>> +static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>> -	kvm_vtimer_update_mask_user(vcpu);
>>>>> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>> +
>>>>> +	/*
>>>>> +	 * When using a userspace irqchip with the architected timers and a
>>>>> +	 * host interrupt controller that doesn't support an active state, we
>>>>> +	 * must still we must prevent continuously exiting from the guest, and
>>>
>>> and here I should s/we must//
>>>
>>>>> +	 * therefore mask the physical interrupt by disabling it on the host
>>>>> +	 * interrupt controller when the virtual level is high, such that the
>>>>> +	 * guest can make forward progress. Once we detect the output level
>>>>> +	 * being de-asserted, we unmask the interrupt again so that we exit
>>>>> +	 * from the guest when the timer fires.
>>>>> +	 */
>>>>> +	if (vtimer->irq.level)
>>>>> +		disable_percpu_irq(host_vtimer_irq);
>>>>> +	else
>>>>> +		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>>>>  }
>>>>>
>>>>>  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>>>>> @@ -487,10 +487,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>>>>>  	if (unlikely(!timer->enabled))
>>>>>  		return;
>>>>>
>>>>> -	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
>>>>> -		kvm_timer_vcpu_load_user(vcpu);
>>>>> +	if (has_gic_active_state)
>>>>
>>>> likely()?
>>>>
>>>
>>> yes, will be fixed if using a static key.
>>>
>>>>> +		kvm_timer_vcpu_load_gic(vcpu);
>>>>>  	else
>>>>> -		kvm_timer_vcpu_load_vgic(vcpu);
>>>>> +		kvm_timer_vcpu_load_nogic(vcpu);
>>>>>
>>>>>  	set_cntvoff(vtimer->cntvoff);
>>>>>
>>>>> @@ -555,18 +555,23 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>>
>>>>> -	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
>>>>> -		__timer_snapshot_state(vtimer);
>>>>> -		if (!kvm_timer_should_fire(vtimer)) {
>>>>> -			kvm_timer_update_irq(vcpu, false, vtimer);
>>>>> -			kvm_vtimer_update_mask_user(vcpu);
>>>>> -		}
>>>>> +	__timer_snapshot_state(vtimer);
>>>>> +	if (!kvm_timer_should_fire(vtimer)) {
>>>>> +		kvm_timer_update_irq(vcpu, false, vtimer);
>>>>> +		if (!has_gic_active_state)
>>>>
>>>> unlikely()?
>>>>
>>>
>>> same
>>>
>>>>> +			enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>>
>>> So, reading your reply made me think. Shouldn't this actually be:
>>>
>>> 	if (static_key_likely(has_gic_active_state))
>>> 		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>> 	else
>>> 		irq_set_irqchip_state(host_vtimer_irq, IRQCHIP_STATE_ACTIVE, false);
>>
>> Shouldn't it be the other way around? Use the active state when we have
>> a GIC and mask if we don't?
>>
>
> Yes, it should. Doh.
>
>>>
>>> ... which makes me want to wrap the irqchip call in a small wrapper
>>> called from here and from load(). What do you think?
>>
>> Indeed. Some operation that "does the right thing", no matter what
>> bizarre configuration we're in. It would definitely help understanding
>> the flow which has become a bit unwieldy.
>>
>
> I tried reworking the load function to first branch off
> userspace_irqchip() and then call the sub-function from here, but it
> was no more readable, so I've done something in between for v2. Stay
> tuned...
>
> Did I mention that I hate this feature, which keeps breaking, and
> which really isn't covered by a simple kvm-unit-test script?

+1. I also contend that nobody is using it. Otherwise the breakages
would have been noticed on every kernel release...

	M.
-- 
Jazz is not dead. It just smells funny...
On Wed, Jan 31, 2018 at 12:23:09PM +0100, Christoffer Dall wrote:
> Did I mention that I hate this feature, which keeps breaking, and
> which really isn't covered by a simple kvm-unit-test script?
>

Doesn't running the kvm-unit-tests' timer tests with
'-machine kernel_irqchip=off' exercise at least some
of these paths? If so, then the kvm-unit-tests patch
below may help

Thanks,
drew

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 44b98cfc7afde..7c0041e44d5f3 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -116,3 +116,10 @@ file = timer.flat
 groups = timer
 timeout = 2s
 arch = arm64
+
+[timer-userspace-gic]
+file = timer.flat
+extra_params = -machine kernel_irqchip=off
+groups = timer
+timeout = 2s
+arch = arm64
On Wed, Jan 31, 2018 at 2:16 PM, Andrew Jones <drjones@redhat.com> wrote:
> On Wed, Jan 31, 2018 at 12:23:09PM +0100, Christoffer Dall wrote:
>> Did I mention that I hate this feature, which keeps breaking, and
>> which really isn't covered by a simple kvm-unit-test script?
>>
>
> Doesn't running the kvm-unit-tests' timer tests with
> '-machine kernel_irqchip=off' exercise at least some
> of these paths? If so, then the kvm-unit-tests patch
> below may help
>

I do this by hand currently (for kvm-unit-tests and for other tests),
but as you say, it only covers a tiny portion of the overall problem.

To test this properly, you have to run at least 4 VMs in parallel, some
that use irqchip in userspace and some that use an in-kernel irqchip,
and then you have to reboot some of these machines while the other
machines are running workloads (to catch the deactivate-before-ack
thing), and really hammer on the timer while migrating the VCPUs
around. Then you have to run all of this on a GICv2 machine. And on a
GICv3 machine. And on a non-GIC (or broken GIC - is there any other
kind?) machine.

So I've done that, using my https://github.com/chazy/vm-loop-test
thing, and found these issues, but I'm probably not going to do this
for every patch, so most likely we'll see breakages.

>
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 44b98cfc7afde..7c0041e44d5f3 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -116,3 +116,10 @@ file = timer.flat
>  groups = timer
>  timeout = 2s
>  arch = arm64
> +
> +[timer-userspace-gic]
> +file = timer.flat
> +extra_params = -machine kernel_irqchip=off
> +groups = timer
> +timeout = 2s
> +arch = arm64

you probably want "gic_version=2" as well, since otherwise some
versions of QEMU pretend they can support a GICv3 in userspace, which
they cannot. Also, this feature is broken around v2.9.1, so there's
another fun data point when debugging this.

It definitely doesn't hurt if we add this to kvm-unit-test. Then
perhaps someone else than me occasionally tests this feature.

Thanks,
-Christoffer
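Folding Christoffer's suggestion into drew's unittests.cfg entry might look like the following. This is an illustrative sketch only: the exact property spelling depends on the QEMU version (modern QEMU's virt machine uses `gic-version`; the thread writes it as `gic_version`), so treat the option name as an assumption to verify against your QEMU:

```ini
[timer-userspace-gic]
file = timer.flat
# kernel_irqchip=off forces the userspace-irqchip paths; gic-version=2
# avoids QEMU versions that claim (incorrectly) to emulate a GICv3 in
# userspace. Property spelling is an assumption -- check your QEMU.
extra_params = -machine kernel_irqchip=off,gic-version=2
groups = timer
timeout = 2s
arch = arm64
```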
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 70268c0bec79..228906ceb722 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -35,6 +35,7 @@
 static struct timecounter *timecounter;
 static unsigned int host_vtimer_irq;
 static u32 host_vtimer_irq_flags;
+static bool has_gic_active_state;
 
 static const struct kvm_irq_level default_ptimer_irq = {
 	.irq = 30,
@@ -69,25 +70,6 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
 		cancel_work_sync(work);
 }
 
-static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
-{
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-
-	/*
-	 * When using a userspace irqchip with the architected timers, we must
-	 * prevent continuously exiting from the guest, and therefore mask the
-	 * physical interrupt by disabling it on the host interrupt controller
-	 * when the virtual level is high, such that the guest can make
-	 * forward progress. Once we detect the output level being
-	 * de-asserted, we unmask the interrupt again so that we exit from the
-	 * guest when the timer fires.
-	 */
-	if (vtimer->irq.level)
-		disable_percpu_irq(host_vtimer_irq);
-	else
-		enable_percpu_irq(host_vtimer_irq, 0);
-}
-
 static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
 {
 	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
@@ -107,8 +89,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
 		kvm_timer_update_irq(vcpu, true, vtimer);
 
 	if (static_branch_unlikely(&userspace_irqchip_in_use) &&
-	    unlikely(!irqchip_in_kernel(vcpu->kvm)))
-		kvm_vtimer_update_mask_user(vcpu);
+	    unlikely(!irqchip_in_kernel(vcpu->kvm)) && !has_gic_active_state)
+		disable_percpu_irq(host_vtimer_irq);
 
 	return IRQ_HANDLED;
 }
@@ -460,13 +442,16 @@ static void set_cntvoff(u64 cntvoff)
 	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
 }
 
-static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
+static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	bool phys_active;
 	int ret;
 
-	phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
+	if (irqchip_in_kernel(vcpu->kvm))
+		phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
+	else
+		phys_active = vtimer->irq.level;
 
 	ret = irq_set_irqchip_state(host_vtimer_irq,
 				    IRQCHIP_STATE_ACTIVE,
@@ -474,9 +459,24 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
 	WARN_ON(ret);
 }
 
-static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
+static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
 {
-	kvm_vtimer_update_mask_user(vcpu);
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	/*
+	 * When using a userspace irqchip with the architected timers and a
+	 * host interrupt controller that doesn't support an active state, we
+	 * must still we must prevent continuously exiting from the guest, and
+	 * therefore mask the physical interrupt by disabling it on the host
+	 * interrupt controller when the virtual level is high, such that the
+	 * guest can make forward progress. Once we detect the output level
+	 * being de-asserted, we unmask the interrupt again so that we exit
+	 * from the guest when the timer fires.
+	 */
+	if (vtimer->irq.level)
+		disable_percpu_irq(host_vtimer_irq);
+	else
+		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
 }
 
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
@@ -487,10 +487,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	if (unlikely(!timer->enabled))
 		return;
 
-	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
-		kvm_timer_vcpu_load_user(vcpu);
+	if (has_gic_active_state)
+		kvm_timer_vcpu_load_gic(vcpu);
 	else
-		kvm_timer_vcpu_load_vgic(vcpu);
+		kvm_timer_vcpu_load_nogic(vcpu);
 
 	set_cntvoff(vtimer->cntvoff);
 
@@ -555,18 +555,23 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
-	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
-		__timer_snapshot_state(vtimer);
-		if (!kvm_timer_should_fire(vtimer)) {
-			kvm_timer_update_irq(vcpu, false, vtimer);
-			kvm_vtimer_update_mask_user(vcpu);
-		}
+	__timer_snapshot_state(vtimer);
+	if (!kvm_timer_should_fire(vtimer)) {
+		kvm_timer_update_irq(vcpu, false, vtimer);
+		if (!has_gic_active_state)
+			enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
 	}
 }
 
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
-	unmask_vtimer_irq_user(vcpu);
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	if (unlikely(!timer->enabled))
+		return;
+
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+		unmask_vtimer_irq_user(vcpu);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -753,6 +758,8 @@ int kvm_timer_hyp_init(bool has_gic)
 			kvm_err("kvm_arch_timer: error setting vcpu affinity\n");
 			goto out_free_irq;
 		}
+
+		has_gic_active_state = true;
 	}
 
 	kvm_info("virtual timer IRQ%d\n", host_vtimer_irq);
When introducing support for irqchip in userspace we needed a way to
mask the timer signal to prevent the guest continuously exiting due to
a screaming timer.

We did this by disabling the corresponding percpu interrupt on the
host interrupt controller, because we cannot rely on the host system
having a GIC, and therefore cannot make any assumptions about having
an active state to hide the timer signal.

Unfortunately, when introducing this feature, it became entirely
possible that a VCPU which belongs to a VM that has a userspace
irqchip can disable the vtimer irq on the host on some physical CPU,
and then go away without ever enabling the vimter irq on that physical
CPU again.

This means that using irqchips in userspace on a system that also
supports running VMs with an in-kernel GIC can prevent forward
progress from in-kernel GIC VMs.

Later on, when we started taking virtual timer interrupts in the arch
timer code, we would also leave this timer state active for userspace
irqchip VMs, because we leave it up to a VGIC-enabled guest to
deactivate the hardware IRQ using the HW bit in the LR.

Both issues are solved by only using the enable/disable trick on
systems that do not have a host GIC which supports the active state,
because all VMs on such systems must use irqchips in userspace.
Systems that have a working GIC with support for an active state use
the active state to mask the timer signal for both userspace an
in-kernel irqchips.

Cc: Alexander Graf <agraf@suse.de>
Cc: <stable@vger.kernel.org> # v4.12+
Fixes: d9e139778376 ("KVM: arm/arm64: Support arch timers with a userspace gic")
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
This conflicts horribly with everything when applied to either
kvmarm/queue or kvmarm/master. Therefore, this patch is written for
(and applies to) v4.15 with kvmarm/queue merged and should therefore
apply cleanly after v4.16-rc1. An example with this patch applied can
be found on kvmarm/temp-for-v4.16-rc2. I plan on sending this along
with any other potential fixes post v4.16-rc1.

 virt/kvm/arm/arch_timer.c | 77 ++++++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 35 deletions(-)

-- 
2.14.2