Message ID | 1417113660-23610-6-git-send-email-christoffer.dall@linaro.org |
---|---|
State | New |
On 27 November 2014 at 18:41, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
> should really be turned off for the VM adhering to the suggestions in
> the PSCI spec, and it's the sane thing to do.
>
> Also, to ensure a coherent icache/dcache/ram situation when restarting
> with the guest MMU off, flush all stage-2 page table entries so we start
> taking aborts when the guest reboots, and flush/invalidate the necessary
> cache lines.
>
> Clarify the behavior and expectations for arm/arm64 in the
> KVM_EXIT_SYSTEM_EVENT case.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  Documentation/virtual/kvm/api.txt |  4 ++++
>  arch/arm/kvm/psci.c               | 18 ++++++++++++++++++
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  3 files changed, 23 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index fc12b4f..c67e4956 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2955,6 +2955,10 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
>  the system-level event type. The 'flags' field describes architecture
>  specific flags for the system-level event.
>
> +In the case of ARM/ARM64, all vcpus will be powered off when requesting shutdown
> +or reset, and it is the responsibility of userspace to reinitialize the vcpus
> +using KVM_ARM_VCPU_INIT.

Heh, we're not even consistent within this patchseries about the capitalisation
of "vcpu" :-)

What happens if you try to KVM_RUN a CPU the kernel thinks is powered down?
Does the kernel just say "ok, doing nothing"?

Also, the clarification we want here should not I think be architecture
specific -- the handling of the exit system event in QEMU is in common
code. What you want to say is something like:

"Valid values for 'type' are:
  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
   VM. Userspace is not obliged to honour this, and if it does honour
   this does not need to destroy the VM synchronously (ie it may call
   KVM_RUN again before shutdown finally occurs).
  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
   As with SHUTDOWN, userspace is permitted to ignore the request, or
   to schedule the reset to occur in the future and may call KVM_RUN again."

The corollary is that it's the kernel's job to deal with any impedance
mismatch between this and whatever ABI like PSCI it's implementing, but
that's fairly obvious so doesn't really need mentioning in the docs.

(I'd like to claim that "the vcpus are powered off when requesting shutdown"
is an implementation detail of this, not part of the API. I think we can
get away with that...)

> +
>  		/* Fix the size of the union. */
>  		char padding[256];
>  	};
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index 09cf377..b4ab613 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -15,11 +15,13 @@
>   * along with this program. If not, see <http://www.gnu.org/licenses/>.
>   */
>
> +#include <linux/preempt.h>
>  #include <linux/kvm_host.h>
>  #include <linux/wait.h>
>
>  #include <asm/cputype.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_psci.h>
>
>  /*
> @@ -166,6 +168,22 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>
>  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
>  {
> +	int i;
> +	struct kvm_vcpu *tmp;
> +
> +	/* Stop all vcpus */
> +	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> +		tmp->arch.pause = true;
> +	preempt_disable();
> +	force_vm_exit(cpu_all_mask);
> +	preempt_enable();
> +
> +	/*
> +	 * Ensure a rebooted VM will fault in RAM pages and detect if the
> +	 * guest MMU is turned off and flush the caches as needed.
> +	 */
> +	stage2_unmap_vm(vcpu->kvm);

It seems odd to have this unmap happen on attempted system reset/powerdown,
not on cpu init/start. (I seem to remember having this conversation on
IRC, so maybe I've just forgotten why it has to be this way...)

thanks
-- PMM
On 27 November 2014 at 23:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> It seems odd to have this unmap happen on attempted system reset/powerdown,
> not on cpu init/start.

Here's a concrete case that I think requires the unmap to be
done on cpu init:
 * start a VM and run it for a bit
 * from the QEMU monitor, use "loadvm" to load a VM snapshot

This will cause QEMU to do a system reset (including calling
VCPU_INIT to reset the CPUs), load the contents of guest
RAM from the snapshot, set guest CPU registers with a pile
of SET_ONE_REG calls, and then KVM_RUN to start the VM.

If we don't unmap stage2 on vcpu init, then what in this
sequence causes the icaches to be flushed so we execute
the newly loaded ram contents rather than stale data
from the first VM run?

thanks
-- PMM
On Mon, Dec 01, 2014 at 05:57:53PM +0000, Peter Maydell wrote:
> On 27 November 2014 at 23:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> > It seems odd to have this unmap happen on attempted system reset/powerdown,
> > not on cpu init/start.
>
> Here's a concrete case that I think requires the unmap to be
> done on cpu init:
>  * start a VM and run it for a bit
>  * from the QEMU monitor, use "loadvm" to load a VM snapshot
>
> This will cause QEMU to do a system reset (including calling
> VCPU_INIT to reset the CPUs), load the contents of guest
> RAM from the snapshot, set guest CPU registers with a pile
> of SET_ONE_REG calls, and then KVM_RUN to start the VM.
>
> If we don't unmap stage2 on vcpu init, then what in this
> sequence causes the icaches to be flushed so we execute
> the newly loaded ram contents rather than stale data
> from the first VM run?
>

You're absolutely right that it makes more sense to stick it in
vcpu_init. I put it only in the shutdown event handler for debugging
and forgot that was what I was doing :)

The only down-side is that we'll be trying to free memory that was never
mapped on initial startup, but it's not in the critical path and we
could add an explicit check to early-out if the vcpu has never been run,
which may increase code readability too (we already have that flag I
believe).

-Christoffer
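
The early-out suggested above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the kernel change itself: `toy_vm`, `toy_vcpu`, `toy_vcpu_run()`, `toy_vcpu_init()` and the `has_run_once` field are hypothetical names invented for illustration, and `toy_stage2_unmap_vm()` merely stands in for the `stage2_unmap_vm()` call in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the real KVM structures; all names are hypothetical. */
struct toy_vm {
	size_t stage2_mapped_pages;	/* pages currently mapped at stage 2 */
	unsigned int unmap_calls;	/* how often the whole VM was unmapped */
};

struct toy_vcpu {
	struct toy_vm *vm;
	bool has_run_once;		/* set the first time the vcpu runs */
};

/* Stands in for stage2_unmap_vm(): drop every stage-2 mapping of the VM. */
static void toy_stage2_unmap_vm(struct toy_vm *vm)
{
	vm->stage2_mapped_pages = 0;
	vm->unmap_calls++;
}

/* Models a guest run that faults in some pages. */
static void toy_vcpu_run(struct toy_vcpu *vcpu, size_t pages_touched)
{
	vcpu->has_run_once = true;
	vcpu->vm->stage2_mapped_pages += pages_touched;
}

/*
 * Models vcpu (re)initialisation: unmap stage 2 so the next run faults
 * pages back in, but skip the work if this vcpu has never run, because
 * nothing can have been mapped on its behalf yet.
 */
static void toy_vcpu_init(struct toy_vcpu *vcpu)
{
	if (!vcpu->has_run_once)
		return;
	toy_stage2_unmap_vm(vcpu->vm);
}
```

In this model the first `toy_vcpu_init()` on a fresh vcpu does nothing, while a second init after the vcpu has run (the "loadvm" case described earlier in the thread) drops all stage-2 mappings.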
On Thu, Nov 27, 2014 at 11:10:14PM +0000, Peter Maydell wrote:
> On 27 November 2014 at 18:41, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
> > should really be turned off for the VM adhering to the suggestions in
> > the PSCI spec, and it's the sane thing to do.
> >
> > Also, to ensure a coherent icache/dcache/ram situation when restarting
> > with the guest MMU off, flush all stage-2 page table entries so we start
> > taking aborts when the guest reboots, and flush/invalidate the necessary
> > cache lines.
> >
> > Clarify the behavior and expectations for arm/arm64 in the
> > KVM_EXIT_SYSTEM_EVENT case.
> >
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/api.txt |  4 ++++
> >  arch/arm/kvm/psci.c               | 18 ++++++++++++++++++
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  3 files changed, 23 insertions(+)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index fc12b4f..c67e4956 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2955,6 +2955,10 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
> >  the system-level event type. The 'flags' field describes architecture
> >  specific flags for the system-level event.
> >
> > +In the case of ARM/ARM64, all vcpus will be powered off when requesting shutdown
> > +or reset, and it is the responsibility of userspace to reinitialize the vcpus
> > +using KVM_ARM_VCPU_INIT.
>
> Heh, we're not even consistent within this patchseries about the capitalisation
> of "vcpu" :-)
>
> What happens if you try to KVM_RUN a CPU the kernel thinks is powered down?
> Does the kernel just say "ok, doing nothing"?

yes, it blocks the vcpu execution by putting the thread on a wait-queue.
That's exactly what happens for the secondary vcpus in an SMP guest
using PSCI.

>
> Also, the clarification we want here should not I think be architecture
> specific -- the handling of the exit system event in QEMU is in common
> code. What you want to say is something like:
>
> "Valid values for 'type' are:
>   KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
>    VM. Userspace is not obliged to honour this, and if it does honour
>    this does not need to destroy the VM synchronously (ie it may call
>    KVM_RUN again before shutdown finally occurs).
>   KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
>    As with SHUTDOWN, userspace is permitted to ignore the request, or
>    to schedule the reset to occur in the future and may call KVM_RUN again."

ok, this is pretty good, but do we need to say that userspace is
permitted to do this or that? The kernel never relies on user space for
correct functionality, so do you mean 'for the run a vm semantics to
still otherwise be functional'?

>
> The corollary is that it's the kernel's job to deal with any impedance
> mismatch between this and whatever ABI like PSCI it's implementing, but
> that's fairly obvious so doesn't really need mentioning in the docs.

I didn't find it obvious (which is why I thought we'd spell it out), but
I agree that not mentioning it makes this arch-generic and we can put
the other stuff into a comment in arch/arm/kvm/psci.c.

>
> (I'd like to claim that "the vcpus are powered off when requesting shutdown"
> is an implementation detail of this, not part of the API. I think we can
> get away with that...)
>

ok

> > +
> >  		/* Fix the size of the union. */
> >  		char padding[256];
> >  	};
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index 09cf377..b4ab613 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -15,11 +15,13 @@
> >   * along with this program. If not, see <http://www.gnu.org/licenses/>.
> >   */
> >
> > +#include <linux/preempt.h>
> >  #include <linux/kvm_host.h>
> >  #include <linux/wait.h>
> >
> >  #include <asm/cputype.h>
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_mmu.h>
> >  #include <asm/kvm_psci.h>
> >
> >  /*
> > @@ -166,6 +168,22 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
> >
> >  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> >  {
> > +	int i;
> > +	struct kvm_vcpu *tmp;
> > +
> > +	/* Stop all vcpus */
> > +	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> > +		tmp->arch.pause = true;
> > +	preempt_disable();
> > +	force_vm_exit(cpu_all_mask);
> > +	preempt_enable();
> > +
> > +	/*
> > +	 * Ensure a rebooted VM will fault in RAM pages and detect if the
> > +	 * guest MMU is turned off and flush the caches as needed.
> > +	 */
> > +	stage2_unmap_vm(vcpu->kvm);
>
> It seems odd to have this unmap happen on attempted system reset/powerdown,
> not on cpu init/start. (I seem to remember having this conversation on
> IRC, so maybe I've just forgotten why it has to be this way...)
>

no, as I said in the other mail, I forgot I was submitting a hack to the
list. Nice job on my side. I'll test an implementation that does this
at init time for the next revision.

Thanks!
-Christoffer
On 2 December 2014 at 15:01, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Thu, Nov 27, 2014 at 11:10:14PM +0000, Peter Maydell wrote:
>> Also, the clarification we want here should not I think be architecture
>> specific -- the handling of the exit system event in QEMU is in common
>> code. What you want to say is something like:
>>
>> "Valid values for 'type' are:
>>   KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
>>    VM. Userspace is not obliged to honour this, and if it does honour
>>    this does not need to destroy the VM synchronously (ie it may call
>>    KVM_RUN again before shutdown finally occurs).
>>   KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
>>    As with SHUTDOWN, userspace is permitted to ignore the request, or
>>    to schedule the reset to occur in the future and may call KVM_RUN again."
>
> ok, this is pretty good, but do we need to say that userspace is
> permitted to do this or that? The kernel never relies on user space for
> correct functionality, so do you mean 'for the run a vm semantics to
> still otherwise be functional'?

I meant "permitted" in the sense of "the kernel won't kill the VM,
return errnos to subsequent KVM_RUN requests or otherwise treat
this userspace behaviour as buggy". If you want to rephrase it
somehow I don't object, as long as the docs make it clear that
it's a valid implementation strategy for userspace to do that.

-- PMM
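
To make the "permitted" semantics concrete, here is a rough sketch of how a userspace VMM might record a system event and act on it later. Everything in it is an assumption made for illustration: `toy_vmm`, `toy_note_system_event()`, `toy_main_loop_tick()` and the `TOY_EVENT_*` values are hypothetical, and a real VMM would instead inspect the `KVM_SYSTEM_EVENT_*` type reported with a `KVM_EXIT_SYSTEM_EVENT` exit.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the event types; a real VMM would use the
 * KVM_SYSTEM_EVENT_* constants from the KVM headers instead.
 */
enum toy_system_event { TOY_EVENT_SHUTDOWN, TOY_EVENT_RESET };

/* Requests that userspace has noted but not yet acted on. */
struct toy_vmm {
	bool shutdown_pending;
	bool reset_pending;
	bool honour_requests;	/* policy: userspace may ignore the guest */
};

/*
 * Called when a vcpu exits with a system event. It only records the
 * request, so the vcpu thread is free to enter the guest again before
 * the shutdown or reset actually happens.
 */
static void toy_note_system_event(struct toy_vmm *vmm,
				  enum toy_system_event type)
{
	if (!vmm->honour_requests)
		return;		/* ignoring the request is valid too */
	if (type == TOY_EVENT_SHUTDOWN)
		vmm->shutdown_pending = true;
	else
		vmm->reset_pending = true;
}

/*
 * Called from the main loop at a convenient point. Returns true if the
 * vcpu threads should keep running the guest.
 */
static bool toy_main_loop_tick(struct toy_vmm *vmm)
{
	if (vmm->shutdown_pending)
		return false;	/* tear the VM down now */
	if (vmm->reset_pending)
		vmm->reset_pending = false;	/* reset state, keep running */
	return true;
}
```

The point of splitting "note" from "tick" is that any number of guest entries may happen between the two calls, which is the deferred handling the proposed documentation text is meant to allow.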
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index fc12b4f..c67e4956 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2955,6 +2955,10 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
 the system-level event type. The 'flags' field describes architecture
 specific flags for the system-level event.
 
+In the case of ARM/ARM64, all vcpus will be powered off when requesting shutdown
+or reset, and it is the responsibility of userspace to reinitialize the vcpus
+using KVM_ARM_VCPU_INIT.
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 09cf377..b4ab613 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -15,11 +15,13 @@
  * along with this program. If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/preempt.h>
 #include <linux/kvm_host.h>
 #include <linux/wait.h>
 
 #include <asm/cputype.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_psci.h>
 
 /*
@@ -166,6 +168,22 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 
 static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 {
+	int i;
+	struct kvm_vcpu *tmp;
+
+	/* Stop all vcpus */
+	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+		tmp->arch.pause = true;
+	preempt_disable();
+	force_vm_exit(cpu_all_mask);
+	preempt_enable();
+
+	/*
+	 * Ensure a rebooted VM will fault in RAM pages and detect if the
+	 * guest MMU is turned off and flush the caches as needed.
+	 */
+	stage2_unmap_vm(vcpu->kvm);
+
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
 	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2012c4b..dbd3212 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -200,6 +200,7 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 
 u64 kvm_call_hyp(void *hypfn, ...);
+void force_vm_exit(const cpumask_t *mask);
 
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
should really be turned off for the VM adhering to the suggestions in
the PSCI spec, and it's the sane thing to do.

Also, to ensure a coherent icache/dcache/ram situation when restarting
with the guest MMU off, flush all stage-2 page table entries so we start
taking aborts when the guest reboots, and flush/invalidate the necessary
cache lines.

Clarify the behavior and expectations for arm/arm64 in the
KVM_EXIT_SYSTEM_EVENT case.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 Documentation/virtual/kvm/api.txt |  4 ++++
 arch/arm/kvm/psci.c               | 18 ++++++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  1 +
 3 files changed, 23 insertions(+)