Message ID | 566E70C7.4070700@linaro.org |
---|---|
State | Superseded |
On 12/15/2015 02:33 AM, Marc Zyngier wrote:
> On 14/12/15 07:33, AKASHI Takahiro wrote:
>> Marc,
>>
>> On 12/12/2015 01:28 AM, Marc Zyngier wrote:
>>> On 11/12/15 08:06, AKASHI Takahiro wrote:
>>>> Ashwin, Marc,
>>>>
>>>> On 12/03/2015 10:58 PM, Marc Zyngier wrote:
>>>>> On 02/12/15 22:40, Ashwin Chaugule wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On 24 November 2015 at 17:25, Geoff Levand <geoff@infradead.org> wrote:
>>>>>>> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>>>>>>>
>>>>>>> The current kvm implementation on arm64 does cpu-specific initialization
>>>>>>> at system boot, and has no way to gracefully shutdown a core in terms of
>>>>>>> kvm. This prevents, especially, kexec from rebooting the system on a boot
>>>>>>> core in EL2.
>>>>>>>
>>>>>>> This patch adds a cpu tear-down function and also puts an existing cpu-init
>>>>>>> code into a separate function, kvm_arch_hardware_disable() and
>>>>>>> kvm_arch_hardware_enable() respectively.
>>>>>>> We don't need arm64-specific cpu hotplug hook any more.
>>>>>>>
>>>>>>> Since this patch modifies common part of code between arm and arm64, one
>>>>>>> stub definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
>>>>>>> compiling errors.
>>>>>>>
>>>>>>> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
>>>>>>> ---
>>>>>>>  arch/arm/include/asm/kvm_host.h   | 10 ++++-
>>>>>>>  arch/arm/include/asm/kvm_mmu.h    |  1 +
>>>>>>>  arch/arm/kvm/arm.c                | 79 ++++++++++++++++++---------------------
>>>>>>>  arch/arm/kvm/mmu.c                |  5 +++
>>>>>>>  arch/arm64/include/asm/kvm_host.h | 16 +++++++-
>>>>>>>  arch/arm64/include/asm/kvm_mmu.h  |  1 +
>>>>>>>  arch/arm64/include/asm/virt.h     |  9 +++++
>>>>>>>  arch/arm64/kvm/hyp-init.S         | 33 ++++++++++++++++
>>>>>>>  arch/arm64/kvm/hyp.S              | 32 ++++++++++++++--
>>>>>>>  9 files changed, 138 insertions(+), 48 deletions(-)
>>>>>>
>>>>>> [..]
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  static struct notifier_block hyp_init_cpu_pm_nb = {
>>>>>>> @@ -1108,11 +1119,6 @@ static int init_hyp_mode(void)
>>>>>>>  	}
>>>>>>>
>>>>>>>  	/*
>>>>>>> -	 * Execute the init code on each CPU.
>>>>>>> -	 */
>>>>>>> -	on_each_cpu(cpu_init_hyp_mode, NULL, 1);
>>>>>>> -
>>>>>>> -	/*
>>>>>>>  	 * Init HYP view of VGIC
>>>>>>>  	 */
>>>>>>>  	err = kvm_vgic_hyp_init();
>>>>>>
>>>>>> With this flow, the cpu_init_hyp_mode() is called only at VM guest
>>>>>> creation, but vgic_hyp_init() is called at bootup. On a system with
>>>>>> GICv3, it looks like we end up with bogus values from the ICH_VTR_EL2
>>>>>> (to get the number of LRs), because we're not reading it from EL2
>>>>>> anymore.
>>>>
>>>> Thank you for pointing this out.
>>>> Recently I tested my kdump code on hikey, and as hikey(hi6220) has gic-400,
>>>> I didn't notice this problem.
>>>
>>> Because GIC-400 is a GICv2 implementation, which is entirely MMIO based.
>>> GICv3 uses some system registers that are only available at EL2, and KVM
>>> needs some information contained in these registers before being able to
>>> get initialized.
>>
>> I see.
>>
>>>>> Indeed, this is completely broken (I just reproduced the issue on a
>>>>> model). I wish this kind of details had been checked earlier, but thanks
>>>>> for pointing it out.
>>>>>
>>>>>> What's the best way to fix this?
>>>>>> - Call kvm_arch_hardware_enable() before vgic_hyp_init() and disable later?
>>>>>> - Fold the VGIC init stuff back into hardware_enable()?
>>>>>
>>>>> None of that works - kvm_arch_hardware_enable() is called once per CPU,
>>>>> while vgic_hyp_init() can only be called once. Also,
>>>>> kvm_arch_hardware_enable() is called from interrupt context, and I
>>>>> wouldn't feel comfortable starting probing DT and allocating stuff from
>>>>> there.
>>>>
>>>> Do you think so?
>>>> How about the fixup! patch attached below?
>>>> The point is that, like Ashwin's first idea, we initialize cpus temporarily
>>>> before kvm_vgic_hyp_init() and then soon reset cpus again. Thus,
>>>> kvm cpu hotplug will still continue to work as before.
>>>> Now that cpu_init_hyp_mode() is revived as exactly the same as Marc's
>>>> original code, the change will not be a big jump.
>>>
>>> This seems quite complicated:
>>> - init EL2 on all CPUs
>>> - do some initialization
>>> - tear all CPUs EL2 down
>>> - let KVM drive the vectors being set or not
>>>
>>> My questions are: why do we need to do this on *all* cpus? Can't that
>>> work on a single one?
>>
>> I did initialize all the cpus partly because using preempt_enable/disable
>> looked a bit ugly and partly because we may, in the future, do additional
>> per-cpu initialization in kvm_vgic_hyp_init() and/or kvm_timer_hyp_init().
>> But if you're comfortable with preempt_*() stuff, I don't care.
>>
>>
>>> Also, the simple fact that we were able to get some junk value is a sign
>>> that something is amiss. I'd expect a splat of some sort, because we now
>>> have a possibility of doing things in the wrong context.
>>>
>>>>
>>>> If kvm_hyp_call() in vgic_v3_probe()/kvm_vgic_hyp_init() is a *problem*,
>>>> I hope this should work. Actually I confirmed that, with this fixup! patch,
>>>> we could run a kvm guest and also successfully executed kexec on model w/gic-v3.
>>>>
>>>> My only concern is the following kernel message I saw when kexec shut down
>>>> the kernel:
>>>> (Please note that I was running one kvm guest (pid=961) here.)
>>>>
>>>> ===
>>>> sh-4.3# ./kexec -d -e
>>>> kexec version: 15.11.16.11.06-g41e52e2
>>>> arch_process_options:112: command_line: (null)
>>>> arch_process_options:114: initrd: (null)
>>>> arch_process_options:115: dtb: (null)
>>>> arch_process_options:117: port: 0x0
>>>> kvm: exiting hardware virtualization
>>>> kvm [961]: Unsupported exception type: 6248304 <== this message
>>>
>>> That makes me feel very uncomfortable. It looks like we've exited a
>>> guest with some horrible value in X0. How is that even possible?
>>>
>>> This deserves to be investigated.
>>
>> I guess the problem is that cpu tear-down function is called even if a kvm guest
>> is still running in kvm_arch_vcpu_ioctl_run().
>> So adding a check whether cpu has been initialized or not in every iteration of
>> kvm_arch_vcpu_ioctl_run() will, if necessary, terminate a guest safely without entering
>> a guest mode. Since this check is done while interrupt is disabled, it won't
>> interfere with kvm_arch_hardware_disable() called via IPI.
>> See the attached fixup patch.
>>
>> Again, I verified the code on model.
>>
>> Thanks,
>> -Takahiro AKASHI
>>
>>> Thanks,
>>>
>>> M.
>>>
>>
>> ----8<----
>> From 77f273ba5e0c3dfcf75a5a8d1da8035cc390250c Mon Sep 17 00:00:00 2001
>> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> Date: Fri, 11 Dec 2015 13:43:35 +0900
>> Subject: [PATCH] fixup! arm64: kvm: allows kvm cpu hotplug
>>
>> ---
>>  arch/arm/kvm/arm.c | 45 ++++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 34 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 518c3c7..d7e86fb 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -573,7 +573,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  		/*
>>  		 * Re-check atomic conditions
>>  		 */
>> -		if (signal_pending(current)) {
>> +		if (__hyp_get_vectors() == hyp_default_vectors) {
>> +			/* cpu has been torn down */
>> +			ret = -ENOEXEC;
>> +			run->exit_reason = KVM_EXIT_SHUTDOWN;
>
>
> That feels completely overkill (and very slow). Why don't you maintain a
> per-cpu variable containing the CPU states, which will avoid calling
> __hyp_get_vectors() all the time? You should be able to reuse that
> construct everywhere.

OK. Since I have already introduced a per-cpu variable, kvm_arm_hardware_enabled,
to address the cpuidle issue, we will be able to reuse it.
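
A minimal userspace sketch of the per-cpu state idea discussed above, assuming a flat array can stand in for a real per-cpu variable such as kvm_arm_hardware_enabled. NR_CPUS, the model_* functions and the two call counters are hypothetical names used only for illustration; this is a model of the bookkeeping, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stand-in for a per-cpu flag such as kvm_arm_hardware_enabled. */
static bool hardware_enabled[NR_CPUS];

/* Count how often the costly EL2 init/reset paths would actually run. */
static int init_calls;
static int reset_calls;

static void model_hardware_enable(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (hardware_enabled[cpu])
		return;			/* already initialized */
	init_calls++;			/* models cpu_init_hyp_mode() */
	hardware_enabled[cpu] = true;
}

static void model_hardware_disable(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!hardware_enabled[cpu])
		return;			/* already torn down */
	reset_calls++;			/* models cpu_reset_hyp_mode() */
	hardware_enabled[cpu] = false;
}

/* Check the run loop could make before entering the guest. */
static bool model_may_enter_guest(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return hardware_enabled[cpu];
}
```

In this model the enable, disable and run-loop paths all consult the same flag, so repeated enable or disable calls on one CPU are no-ops and the run-loop check is a plain memory read rather than a call into EL2.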
> Also, I'm not sure about KVM_EXIT_SHUTDOWN. This looks very x86 specific
> (called on triple fault).

No, I don't think so. Looking at kvm_cpu_exec() in kvm-all.c of qemu,
KVM_EXIT_SHUTDOWN is handled in a generic way and results in a reset request.
On the other hand, KVM_EXIT_FAIL_ENTRY seems more arch-specific.
In addition, if kvm_vcpu_ioctl() returns a negative value, run->exit_reason
will never be examined.
So I think either
    ret -> 0
    run->exit_reason -> KVM_EXIT_SHUTDOWN
or just
    ret -> -ENOEXEC
is the best.

Either way, a guest will have no good chance to shut itself down gracefully
because we're kexec'ing (without waiting for threads' termination).

-Takahiro AKASHI

> KVM_EXIT_FAIL_ENTRY looks more appropriate,
> and the hardware_entry_failure_reason field should be populated (and
> documented).
>
> Thanks,
>
> M.
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
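
The point made in the reply above, that exit_reason is only looked at when the run call does not fail, can be modelled with a small sketch. Everything here is hypothetical (the model_* types, the action enum, the handler); it mirrors the behaviour described in the message and is not a copy of qemu's kvm_cpu_exec().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical exit reasons; only the shutdown case matters here. */
enum model_exit { MODEL_EXIT_NONE, MODEL_EXIT_SHUTDOWN };

struct model_run {
	enum model_exit exit_reason;
};

/* What the (hypothetical) userspace loop decides to do next. */
enum model_action { ACTION_ERROR, ACTION_RESET, ACTION_CONTINUE };

static enum model_action model_handle_run(int ret, const struct model_run *run)
{
	assert(run);

	/* A negative return short-circuits: exit_reason is never consulted. */
	if (ret < 0)
		return ACTION_ERROR;

	switch (run->exit_reason) {
	case MODEL_EXIT_SHUTDOWN:
		return ACTION_RESET;	/* generic handling: request a reset */
	default:
		return ACTION_CONTINUE;
	}
}
```

Under these assumptions, returning -ENOEXEC together with a shutdown exit reason behaves the same as returning -ENOEXEC alone, which is why the message proposes picking one of the two combinations rather than setting both.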
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 518c3c7..d7e86fb 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -573,7 +573,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		/*
 		 * Re-check atomic conditions
 		 */
-		if (signal_pending(current)) {
+		if (__hyp_get_vectors() == hyp_default_vectors) {
+			/* cpu has been torn down */
+			ret = -ENOEXEC;
+			run->exit_reason = KVM_EXIT_SHUTDOWN;
+		} else if (signal_pending(current)) {
 			ret = -EINTR;
 			run->exit_reason = KVM_EXIT_INTR;
 		}
@@ -950,7 +954,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 }
 
-int kvm_arch_hardware_enable(void)
+static void cpu_init_hyp_mode(void)
 {
 	phys_addr_t boot_pgd_ptr;
 	phys_addr_t pgd_ptr;
@@ -958,9 +962,6 @@ int kvm_arch_hardware_enable(void)
 	unsigned long stack_page;
 	unsigned long vector_ptr;
 
-	if (__hyp_get_vectors() != hyp_default_vectors)
-		return 0;
-
 	/* Switch from the HYP stub to our own HYP init vector */
 	__hyp_set_vectors(kvm_get_idmap_vector());
 
@@ -973,24 +974,35 @@ int kvm_arch_hardware_enable(void)
 	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 
 	kvm_arm_init_debug();
-
-	return 0;
 }
 
-void kvm_arch_hardware_disable(void)
+static void cpu_reset_hyp_mode(void)
 {
 	phys_addr_t boot_pgd_ptr;
 	phys_addr_t phys_idmap_start;
 
-	if (__hyp_get_vectors() == hyp_default_vectors)
-		return;
-
 	boot_pgd_ptr = kvm_mmu_get_boot_httbr();
 	phys_idmap_start = kvm_get_idmap_start();
 
 	__cpu_reset_hyp_mode(boot_pgd_ptr, phys_idmap_start);
 }
 
+int kvm_arch_hardware_enable(void)
+{
+	if (__hyp_get_vectors() == hyp_default_vectors)
+		cpu_init_hyp_mode();
+
+	return 0;
+}
+
+void kvm_arch_hardware_disable(void)
+{
+	if (__hyp_get_vectors() == hyp_default_vectors)
+		return;
+
+	cpu_reset_hyp_mode();
+}
+
 #ifdef CONFIG_CPU_PM
 static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
 				    unsigned long cmd,
@@ -1114,9 +1126,20 @@ static int init_hyp_mode(void)
 	}
 
 	/*
+	 * Init this CPU temporarily to execute kvm_hyp_call()
+	 * during kvm_vgic_hyp_init().
+	 */
+	preempt_disable();
+	cpu_init_hyp_mode();
+
+	/*
 	 * Init HYP view of VGIC
 	 */
 	err = kvm_vgic_hyp_init();
+
+	cpu_reset_hyp_mode();
+	preempt_enable();
+
 	if (err)
 		goto out_free_context;
 
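
The last hunk brackets kvm_vgic_hyp_init() with a temporary init and reset of the current CPU, and only checks err after the reset. A minimal userspace sketch of that ordering follows, assuming a single boolean can model whether this CPU currently has KVM's EL2 vectors installed. All names (el2_ready, probe_saw_el2, model_vgic_probe, model_init_hyp_mode) are hypothetical, and preemption is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Models whether this CPU currently has KVM's EL2 setup in place. */
static bool el2_ready;

/* Records what the probe observed: -1 = not run, 0 = stub, 1 = EL2 ready. */
static int probe_saw_el2 = -1;

/* Stand-in for kvm_vgic_hyp_init(); 'fail' forces an error return. */
static int model_vgic_probe(bool fail)
{
	probe_saw_el2 = el2_ready ? 1 : 0;
	return fail ? -1 : 0;
}

static int model_init_hyp_mode(bool fail)
{
	int err;

	assert(!el2_ready);		/* CPU starts out on the stub vectors */

	el2_ready = true;		/* models cpu_init_hyp_mode() */
	err = model_vgic_probe(fail);
	el2_ready = false;		/* models cpu_reset_hyp_mode() */

	/* err is checked only after the reset, so both paths leave EL2 torn down. */
	return err;
}
```

In this model the probe always runs with EL2 set up, and the CPU is back on the stub whether the probe succeeds or fails, which is the property the ordering in the hunk is meant to preserve.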