From patchwork Wed Dec 11 19:37:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 849487 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2FB851C4612 for ; Wed, 11 Dec 2024 19:37:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945837; cv=none; b=f7uFcgZeB3n0u7uiP/sBkfNJE16mLaGG753LBiNjNJ27CnJ5SB+xtb4/k/8OesffyucrCDdkmGQrcyE0JAoWxEc6gtuf+vHZcZazH9YIDQSPVRTIOMpatyh3NZs4sHId+oui8ESz9wCACgfdEUIwY9j6a0nYkETLa8mvoE2DTcc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945837; c=relaxed/simple; bh=gsaF2mp6+IFXG0g59HGoFlRenIZvGwM/zs6wchiu/aQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Yljj1Y5j03tEq6pnaEAHa1nKzhRQ2sXLPhpTaudyI1wMY0ZgAvb6wgEJ0LnhSC0jb0MXLrf9mttwuNVHe8ixHfB73AkMpJ4rQZdrqmpp83HvQcn6HbfM9kYYIsPTPDcMLdckM5mCyp9WKocUcdd1dT16cXsXgGGvv6eFJPis6Hs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=SYopwj31; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="SYopwj31" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733945834; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jCr0ZS2R3dDRLHGaXzPRHv9yEgb4A4mVvp8VJCXI81M=; b=SYopwj31A8g/Sd4vXFPiSIF8Wm/meH+pnavXJfbr/rhoewH5Qr/cQ8DNVOmPbNB9epNFyv jwxgZYGnmiUzGBkKJrA2l5YYYFJ0Laoy6qHEsWWmMjxD01eBq9DpayzpAuZG1cwOUMIlcM BCUu3TVlyhN5WcBx9yaF3L/0+kFZVgU= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-197-Q2fhomCGNlaIt0r6kdwOlA-1; Wed, 11 Dec 2024 14:37:12 -0500 X-MC-Unique: Q2fhomCGNlaIt0r6kdwOlA-1 X-Mimecast-MFC-AGG-ID: Q2fhomCGNlaIt0r6kdwOlA Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 452311944B17; Wed, 11 Dec 2024 19:37:11 +0000 (UTC) Received: from starship.lan (unknown [10.22.82.46]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C43281956048; Wed, 11 Dec 2024 19:37:09 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Cc: linux-kselftest@vger.kernel.org, Sean Christopherson , Paolo Bonzini , linux-kernel@vger.kernel.org, x86@kernel.org, Maxim Levitsky Subject: [PATCH 1/4] KVM: VMX: read the PML log in the same order as it was written Date: Wed, 11 Dec 2024 14:37:03 -0500 Message-Id: <20241211193706.469817-2-mlevitsk@redhat.com> In-Reply-To: <20241211193706.469817-1-mlevitsk@redhat.com> References: <20241211193706.469817-1-mlevitsk@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X86 spec specifies that the CPU writes to the PML log 'backwards' or in other words, it first writes entry 511, then entry 510 and so on, until it writes entry 0, after which the 'PML log full' VM exit happens. I also confirmed on the bare metal that the CPU indeed writes the entries in this order. KVM on the other hand, reads the entries in the opposite order, from the last written entry and towards entry 511 and dumps them in this order to the dirty ring. Usually this doesn't matter, except for one complex nesting case: KVM reties the instructions that cause MMU faults. This might cause an emulated PML log entry to be visible to L1's hypervisor before the actual memory write was committed. This happens when the L0 MMU fault is followed directly by the VM exit to L1, for example due to a pending L1 interrupt or due to the L1's 'PML log full' event. This problem doesn't have a noticeable real-world impact because this write retry is not much different from the guest writing to the same page multiple times, which is also not reflected in the dirty log. The users of the dirty logging only rely on correct reporting of the clean pages, or in other words they assume that if a page is clean, then no writes were committed to it since the moment it was marked clean. However KVM has a kvm_dirty_log_test selftest, a test that tests both the clean and the dirty pages vs the memory contents, and can fail if it detects a dirty page which has an old value at the offset 0 which the test writes. To avoid failure, the test has a workaround for this specific problem: The test skips checking memory that belongs to the last dirty ring entry, which it has seen, relying on the fact that as long as memory writes are committed in-order, only the last entry can belong to a not yet committed memory write. However, since L1's KVM is reading the PML log in the opposite direction that L0 wrote it, the last dirty ring entry often will be not the last entry written by the L0. To fix this, switch the order in which KVM reads the PML log. Note that this issue is not present on the bare metal, because on the bare metal, an update of the A/D bits of a present entry, PML logging and the actual memory write are all done by the CPU without any hypervisor intervention and pending interrupt evaluation, thus once a PML log and/or vCPU kick happens, all memory writes that are in the PML log are committed to memory. The only exception to this rule is when the guest hits a not present EPT entry, in which case KVM first reads (backward) the PML log, dumps it to the dirty ring, and *then* sets up a SPTE entry with A/D bits set, and logs this to the dirty ring, thus making the entry be the last one in the dirty ring. Signed-off-by: Maxim Levitsky --- arch/x86/kvm/vmx/vmx.c | 32 +++++++++++++++++++++----------- arch/x86/kvm/vmx/vmx.h | 1 + 2 files changed, 22 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0f008f5ef6f0..6fb946b58a75 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6211,31 +6211,41 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); u64 *pml_buf; - u16 pml_idx; + u16 pml_idx, pml_last_written_entry; + int i; pml_idx = vmcs_read16(GUEST_PML_INDEX); /* Do nothing if PML buffer is empty */ - if (pml_idx == (PML_ENTITY_NUM - 1)) + if (pml_idx == PML_LAST_ENTRY) return; + /* + * PML index always points to the next available PML buffer entity + * unless PML log has just overflowed, in which case PML index will be + * 0xFFFF. + */ + pml_last_written_entry = (pml_idx >= PML_ENTITY_NUM) ? 0 : pml_idx + 1; - /* PML index always points to next available PML buffer entity */ - if (pml_idx >= PML_ENTITY_NUM) - pml_idx = 0; - else - pml_idx++; - + /* + * PML log is written backwards: the CPU first writes the entity 511 + * then the entity 510, and so on, until it writes the entity 0 at which + * point the PML log full VM exit happens and the logging stops. + * + * Read the entries in the same order they were written, to ensure that + * the dirty ring is filled in the same order the CPU wrote them. + */ pml_buf = page_address(vmx->pml_pg); - for (; pml_idx < PML_ENTITY_NUM; pml_idx++) { + + for (i = PML_LAST_ENTRY; i >= pml_last_written_entry; i--) { u64 gpa; - gpa = pml_buf[pml_idx]; + gpa = pml_buf[i]; WARN_ON(gpa & (PAGE_SIZE - 1)); kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT); } /* reset PML index */ - vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1); + vmcs_write16(GUEST_PML_INDEX, PML_LAST_ENTRY); } static void vmx_dump_sel(char *name, uint32_t sel) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 43f573f6ca46..d14401c8e499 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -331,6 +331,7 @@ struct vcpu_vmx { /* Support for PML */ #define PML_ENTITY_NUM 512 +#define PML_LAST_ENTRY (PML_ENTITY_NUM - 1) struct page *pml_pg; /* apic deadline value in host tsc */ From patchwork Wed Dec 11 19:37:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 849902 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C9FC20B81C for ; Wed, 11 Dec 2024 19:37:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945840; cv=none; b=U0uDyHxAG1vw3gPXBBtCpu9qx4RilPvyoAp9DxOVQHiBgJEtWaCZmHkb/PEhCgeQNknb0dqYyGoT0J2qZ0y/uXM2VWlW2oitvFlVKs+qKWH5hUlBadfcWa7XMtXATkPmVqwtc8un3NyqOu3x+YYMXx9dnZdh3ERCgRXJCyVzg3M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945840; c=relaxed/simple; bh=VhDVTu2AuXYfkTvekSpZLqV8NzNnawdQ4JQwCa4nmSY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Ao9wuA/q+nGIwq7ezsqspoXiQL+dQ+EDpU2vqIbvod/6vFpD9VlEi7M2u+oUf3flecloIMHXhyJqE7EhVpKsyK0qr2I+sNO7EHqSVNMZ2pmCXUAaUBjtGL0bRfZVmzqRvihKfaMPJVXRjKGXqGwdMpm3ZUFIJYgk2FW0IYmer5Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dde18hsI; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dde18hsI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733945837; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UEKlIRfFGj9gsIIV+NU+wX1aKTv8UmyhOOg+xZ2JnWc=; b=dde18hsIYbyHOI+LLbjC9QMvn2nn3HaZZWTvcs7SsJ3JdLx+KPXlJh+0p2LmpT764aJTUb WGz3XD9v6bT3nZb+C7D65FlUkQzOdTOtbSXmSLj4HhUoyMLz3UywmjD/nw2XU6OnH0SXQA W8ulCZYPGXWjZ9zdFCskcvFap4MByk8= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-284-61TFV3KWNzGv4b68XKPhjw-1; Wed, 11 Dec 2024 14:37:14 -0500 X-MC-Unique: 61TFV3KWNzGv4b68XKPhjw-1 X-Mimecast-MFC-AGG-ID: 61TFV3KWNzGv4b68XKPhjw Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id F0A0D1958B13; Wed, 11 Dec 2024 19:37:12 +0000 (UTC) Received: from starship.lan (unknown [10.22.82.46]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 8A7501956048; Wed, 11 Dec 2024 19:37:11 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Cc: linux-kselftest@vger.kernel.org, Sean Christopherson , Paolo Bonzini , linux-kernel@vger.kernel.org, x86@kernel.org, Maxim Levitsky Subject: [PATCH 2/4] KVM: selftests: dirty_log_test: Limit s390x workaround to s390x Date: Wed, 11 Dec 2024 14:37:04 -0500 Message-Id: <20241211193706.469817-3-mlevitsk@redhat.com> In-Reply-To: <20241211193706.469817-1-mlevitsk@redhat.com> References: <20241211193706.469817-1-mlevitsk@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 s390 specific workaround causes the dirty-log mode of the test to dirty all the guest memory on the first iteration which is very slow when run nested. Limit this workaround to s390x. Signed-off-by: Maxim Levitsky --- tools/testing/selftests/kvm/dirty_log_test.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index aacf80f57439..f60e2aceeae0 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -98,6 +98,7 @@ static void guest_code(void) uint64_t addr; int i; +#ifdef __s390x__ /* * On s390x, all pages of a 1M segment are initially marked as dirty * when a page of the segment is written to for the very first time. @@ -108,6 +109,7 @@ static void guest_code(void) addr = guest_test_virt_mem + i * guest_page_size; vcpu_arch_put_guest(*(uint64_t *)addr, READ_ONCE(iteration)); } +#endif while (true) { for (i = 0; i < TEST_PAGES_PER_LOOP; i++) { From patchwork Wed Dec 11 19:37:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 849486 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE2C2211A0B for ; Wed, 11 Dec 2024 19:37:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945841; cv=none; b=FZlbmic8tI9M8R/+0IOpVsJ81ScQt1x8xsh4kupMWMoHbUuCaGLhFPnksqZFPvi6gnoBU4EzMFaNmhWqJgCnkKPVOxhAtHtYMyWpsVsM5xn17pfLZl0F0nmR7a2tRtYLkLj0JuNqdl1Gtzem+QmoBkQZqkMaTD48SH1nd4OxXIQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945841; c=relaxed/simple; bh=SS7Zhi5P3a/TDcOuudOGTzMCPtZ3Yq/HAFx0V6V1fTM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=mEnbBLzIWCiz7xZWOSknQojhTJ1SbfcSXm1EOK/ofSweEMyEGdNkZN/5S6qAn+FJAS5qc9diyfVz/ti37+EiyudEz2QEYx21n9hgivcFpah/bQjJVOi/LpabdUq6O5FoABacBSx3p6NO1N2RAWoFjI2xRt9+0ZTBpdJIFS4/NcQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ADyMEhaH; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ADyMEhaH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733945838; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eHOQXpYW6L1pN5tyT+YAhgYu/OtoJUuGc/mmebEs6pw=; b=ADyMEhaHosvMs5NxnY3HG/tpV1j5/aVMLoMauiKL0hzAfjltntcZKhPBwVZlwgwCFUiRly FgC4iFdDud8bQtlaHkzYhoA70lKgzgDh62KzwvOaecNd/tq0Yn5rEi2P0mGDZheWwpklSc NCxGUXHS2xP8QDDQbWRNgndOzkx4N+s= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-517-ovgfeWSdNi--CCja2vZx1g-1; Wed, 11 Dec 2024 14:37:15 -0500 X-MC-Unique: ovgfeWSdNi--CCja2vZx1g-1 X-Mimecast-MFC-AGG-ID: ovgfeWSdNi--CCja2vZx1g Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A354D195606B; Wed, 11 Dec 2024 19:37:14 +0000 (UTC) Received: from starship.lan (unknown [10.22.82.46]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 414EC1956048; Wed, 11 Dec 2024 19:37:13 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Cc: linux-kselftest@vger.kernel.org, Sean Christopherson , Paolo Bonzini , linux-kernel@vger.kernel.org, x86@kernel.org, Maxim Levitsky Subject: [PATCH 3/4] KVM: selftests: dirty_log_test: run the guest until some dirty ring entries were harvested Date: Wed, 11 Dec 2024 14:37:05 -0500 Message-Id: <20241211193706.469817-4-mlevitsk@redhat.com> In-Reply-To: <20241211193706.469817-1-mlevitsk@redhat.com> References: <20241211193706.469817-1-mlevitsk@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 When the dirty_log_test is run nested, due to very slow nature of the environment, it is possible to reach a situation in which no entries were harvested from the dirty ring and that breaks the test logic. Detect this case and just let the guest run a bit more until test obtains some entries from the dirty ring. Signed-off-by: Maxim Levitsky --- tools/testing/selftests/kvm/dirty_log_test.c | 25 +++++++++++++------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index f60e2aceeae0..a9428076a681 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -227,19 +227,21 @@ static void clear_log_create_vm_done(struct kvm_vm *vm) vm_enable_cap(vm, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, manual_caps); } -static void dirty_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, +static bool dirty_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, void *bitmap, uint32_t num_pages, uint32_t *unused) { kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap); + return true; } -static void clear_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, +static bool clear_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, void *bitmap, uint32_t num_pages, uint32_t *unused) { kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap); kvm_vm_clear_dirty_log(vcpu->vm, slot, bitmap, 0, num_pages); + return true; } /* Should only be called after a GUEST_SYNC */ @@ -350,7 +352,7 @@ static void dirty_ring_continue_vcpu(void) sem_post(&sem_vcpu_cont); } -static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, +static bool dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, void *bitmap, uint32_t num_pages, uint32_t *ring_buf_idx) { @@ -373,6 +375,10 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, slot, bitmap, num_pages, ring_buf_idx); + /* Retry if no entries were collected */ + if (count == 0) + return false; + cleared = kvm_vm_reset_dirty_ring(vcpu->vm); /* @@ -389,6 +395,7 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, } pr_info("Iteration %ld collected %u pages\n", iteration, count); + return true; } static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err) @@ -424,7 +431,7 @@ struct log_mode { /* Hook when the vm creation is done (before vcpu creation) */ void (*create_vm_done)(struct kvm_vm *vm); /* Hook to collect the dirty pages into the bitmap provided */ - void (*collect_dirty_pages) (struct kvm_vcpu *vcpu, int slot, + bool (*collect_dirty_pages)(struct kvm_vcpu *vcpu, int slot, void *bitmap, uint32_t num_pages, uint32_t *ring_buf_idx); /* Hook to call when after each vcpu run */ @@ -488,7 +495,7 @@ static void log_mode_create_vm_done(struct kvm_vm *vm) mode->create_vm_done(vm); } -static void log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, +static bool log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, void *bitmap, uint32_t num_pages, uint32_t *ring_buf_idx) { @@ -496,7 +503,7 @@ static void log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot, TEST_ASSERT(mode->collect_dirty_pages != NULL, "collect_dirty_pages() is required for any log mode!"); - mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages, ring_buf_idx); + return mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages, ring_buf_idx); } static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err) @@ -783,9 +790,11 @@ static void run_test(enum vm_guest_mode mode, void *arg) while (iteration < p->iterations) { /* Give the vcpu thread some time to dirty some pages */ usleep(p->interval * 1000); - log_mode_collect_dirty_pages(vcpu, TEST_MEM_SLOT_INDEX, + + if (!log_mode_collect_dirty_pages(vcpu, TEST_MEM_SLOT_INDEX, bmap, host_num_pages, - &ring_buf_idx); + &ring_buf_idx)) + continue; /* * See vcpu_sync_stop_requested definition for details on why From patchwork Wed Dec 11 19:37:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Levitsky X-Patchwork-Id: 849901 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 418272288EE for ; Wed, 11 Dec 2024 19:37:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945843; cv=none; b=WnALYwMSyzJ80da7yYf/ReOcy5ZcGQeBH3dpKP3Npr08YKyUytmKfxUoKplvVl98t75vpfVQdGJp6zgI502Onx5PQepxYnmHfxLClCsz3bVpYRzq5CrVhLohVa4OrvKf3G7lopXhm3zlI8eooIwkzSorQcPmgS5PJFdVaHvHQ2c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733945843; c=relaxed/simple; bh=SKU4owpgrLuU+O36UTv52dBkluhmqyl+EdfAMYBpPRs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Yho+wBFWAO3Rk8mAhkpSKurkmtxzUpDsD7GvLn2V+nZV/suYG/j0ftaqrQW7v6NIw1XZUQohxsYpvdvnRRHORUNrWHyNvHERK4WQgvXM/zoG7LhcDbllbhefJ61pzUZts5z2zp78sOjEt3HCJKRNuK0zGOwWyJ8yxO0hqXIDzX8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Ac4mpF7I; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Ac4mpF7I" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733945841; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oRbzOlXXgzk56NiVC7qJmmj47BtRpf1glV0jPA4JBGg=; b=Ac4mpF7I5a9fAHkc0VUdKNc05Xt+V3q0349/PdvOPeQO7JjqXEvvnZv9INKX61YRMtkx+V xbfgs6c53+3ahCDMw6+oDOQ/uk5PFpK4XBeOwn638WCgAf09f4v5XaBykJbTVHt8upqNQh k5m8VhBAFOrqczsuW5R2+5deyoNTvmA= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-588-1xFFN5XpMXe2ml3pKzlnUg-1; Wed, 11 Dec 2024 14:37:17 -0500 X-MC-Unique: 1xFFN5XpMXe2ml3pKzlnUg-1 X-Mimecast-MFC-AGG-ID: 1xFFN5XpMXe2ml3pKzlnUg Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 501C11944B11; Wed, 11 Dec 2024 19:37:16 +0000 (UTC) Received: from starship.lan (unknown [10.22.82.46]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id EB1FC1956052; Wed, 11 Dec 2024 19:37:14 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Cc: linux-kselftest@vger.kernel.org, Sean Christopherson , Paolo Bonzini , linux-kernel@vger.kernel.org, x86@kernel.org, Maxim Levitsky Subject: [PATCH 4/4] KVM: selftests: dirty_log_test: support multiple write retires Date: Wed, 11 Dec 2024 14:37:06 -0500 Message-Id: <20241211193706.469817-5-mlevitsk@redhat.com> In-Reply-To: <20241211193706.469817-1-mlevitsk@redhat.com> References: <20241211193706.469817-1-mlevitsk@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 If dirty_log_test is run nested, it is possible for entries in the emulated PML log to appear before the actual memory write is committed to the RAM, due to the way KVM retries memory writes as a response to a MMU fault. In addition to that in some very rare cases retry can happen more than once, which will lead to the test failure because once the write is finally committed it may have a very outdated iteration value. Detect and avoid this case. Signed-off-by: Maxim Levitsky --- tools/testing/selftests/kvm/dirty_log_test.c | 52 +++++++++++++++++++- 1 file changed, 50 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index a9428076a681..f07126b0205d 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -154,6 +154,7 @@ static atomic_t vcpu_sync_stop_requested; * sem_vcpu_stop and before vcpu continues to run. */ static bool dirty_ring_vcpu_ring_full; + /* * This is only used for verifying the dirty pages. Dirty ring has a very * tricky case when the ring just got full, kvm will do userspace exit due to @@ -168,7 +169,51 @@ static bool dirty_ring_vcpu_ring_full; * dirty gfn we've collected, so that if a mismatch of data found later in the * verifying process, we let it pass. */ -static uint64_t dirty_ring_last_page; +static uint64_t dirty_ring_last_page = -1ULL; + +/* + * In addition to the above, it is possible (especially if this + * test is run nested) for the above scenario to repeat multiple times: + * + * The following can happen: + * + * - L1 vCPU: Memory write is logged to PML but not committed. + * + * - L1 test thread: Ignores the write because its last dirty ring entry + * Resets the dirty ring which: + * - Resets the A/D bits in EPT + * - Issues tlb flush (invept), which is intercepted by L0 + * + * - L0: frees the whole nested ept mmu root as the response to invept, + * and thus ensures that when memory write is retried, it will fault again + * + * - L1 vCPU: Same memory write is logged to the PML but not committed again. + * + * - L1 test thread: Ignores the write because its last dirty ring entry (again) + * Resets the dirty ring which: + * - Resets the A/D bits in EPT (again) + * - Issues tlb flush (again) which is intercepted by L0 + * + * ... + * + * N times + * + * - L1 vCPU: Memory write is logged in the PML and then committed. + * Lots of other memory writes are logged and committed. + * ... + * + * - L1 test thread: Sees the memory write along with other memory writes + * in the dirty ring, and since the write is usually not + * the last entry in the dirty-ring and has a very outdated + * iteration, the test fails. + * + * + * Note that this is only possible when the write was the last log entry + * write during iteration N-1, thus remember last iteration last log entry + * and also don't fail when it is reported in the next iteration, together with + * an outdated iteration count. + */ +static uint64_t dirty_ring_prev_iteration_last_page; enum log_mode_t { /* Only use KVM_GET_DIRTY_LOG for logging */ @@ -320,6 +365,8 @@ static uint32_t dirty_ring_collect_one(struct kvm_dirty_gfn *dirty_gfns, struct kvm_dirty_gfn *cur; uint32_t count = 0; + dirty_ring_prev_iteration_last_page = dirty_ring_last_page; + while (true) { cur = &dirty_gfns[*fetch_index % test_dirty_ring_count]; if (!dirty_gfn_is_dirtied(cur)) @@ -622,7 +669,8 @@ static void vm_dirty_log_verify(enum vm_guest_mode mode, unsigned long *bmap) */ min_iter = iteration - 1; continue; - } else if (page == dirty_ring_last_page) { + } else if (page == dirty_ring_last_page || + page == dirty_ring_prev_iteration_last_page) { /* * Please refer to comments in * dirty_ring_last_page.