[v2,5/6] KVM: x86: selftests: Test core events

Message ID	20240918205319.3517569-6-coltonlewis@google.com
State	New
Headers	Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E1E11CBE9D for <linux-kselftest@vger.kernel.org>; Wed, 18 Sep 2024 20:54:04 +0000 (UTC)
	Date: Wed, 18 Sep 2024 20:53:18 +0000
	In-Reply-To: <20240918205319.3517569-1-coltonlewis@google.com>
	Precedence: bulk
	Mime-Version: 1.0
	References: <20240918205319.3517569-1-coltonlewis@google.com>
	Message-ID: <20240918205319.3517569-6-coltonlewis@google.com>
	Subject: [PATCH v2 5/6] KVM: x86: selftests: Test core events
	From: Colton Lewis <coltonlewis@google.com>
	To: kvm@vger.kernel.org
	Cc: Mingwei Zhang <mizhang@google.com>, Jinrong Liang <ljr.kernel@gmail.com>,
		Jim Mattson <jmattson@google.com>, Aaron Lewis <aaronlewis@google.com>,
		Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>,
		Shuah Khan <shuah@kernel.org>,
		linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
		Colton Lewis <coltonlewis@google.com>
	Content-Type: text/plain; charset="UTF-8"
Series	Extend pmu_counters_test to AMD CPUs
	[v2,0/6] Extend pmu_counters_test to AMD CPUs
	[v2,1/6] KVM: x86: selftests: Fix typos in macro variable use
	[v2,2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves
	[v2,3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test
	[v2,4/6] KVM: x86: selftests: Test read/write core counters
	[v2,5/6] KVM: x86: selftests: Test core events
	[v2,6/6] KVM: x86: selftests: Test PerfMonV2

Commit Message

Colton Lewis Sept. 18, 2024, 8:53 p.m. UTC

Test events on core counters by iterating through every combination of
events in amd_pmu_zen_events with every core counter.

For each combination, calculate the appropriate register addresses for
the event selection/control register and the counter register. The
base addresses and layout schemes change depending on whether we have
the CoreExt feature.

To do the testing, reuse GUEST_TEST_EVENT to run a standard known
workload. Decouple it from guest_assert_event_count (now
guest_assert_intel_event_count) to generalize to AMD.

Then assert the most specific detail that can be reasonably known
about the counter result. Exact count is defined and known for some
events and for other events merely asserted to be nonzero.

Note on exact counts: AMD counts one more branch than Intel for the
same workload. Though I can't confirm a reason, the only thing it
could be is the boundary of the loop instruction being counted
differently. Presumably, when the counter reaches 0 and execution
continues to the next instruction, AMD counts this as a branch and
Intel doesn't.

Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 .../selftests/kvm/x86_64/pmu_counters_test.c  | 87 ++++++++++++++++---
 1 file changed, 77 insertions(+), 10 deletions(-)

Comments

Sean Christopherson Jan. 8, 2025, 7:31 p.m. UTC | #1

On Wed, Sep 18, 2024, Colton Lewis wrote:
> Test events on core counters by iterating through every combination of
> events in amd_pmu_zen_events with every core counter.
> 
> For each combination, calculate the appropriate register addresses for
> the event selection/control register and the counter register. The
> base addresses and layout schemes change depending on whether we have
> the CoreExt feature.
> 
> To do the testing, reuse GUEST_TEST_EVENT to run a standard known
> workload. Decouple it from guest_assert_event_count (now
> guest_assert_intel_event_count) to generalize to AMD.
> 
> Then assert the most specific detail that can be reasonably known
> about the counter result. Exact count is defined and known for some
> events and for other events merely asserted to be nonzero.
> 
> Note on exact counts: AMD counts one more branch than Intel for the
> same workload. Though I can't confirm a reason, the only thing it
> could be is the boundary of the loop instruction being counted
> differently. Presumably, when the counter reaches 0 and execution
> continues to the next instruction, AMD counts this as a branch and
> Intel doesn't.

IIRC, VMRUN is counted as a branch instruction for the guest.  Assuming my memory
is correct, this test is going to be flaky: an asynchronous exit during the
measurement loop, e.g. due to a host IRQ, will inflate the count.  I'm not
entirely sure what to do about that :-(
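
To make the concern concrete, here is a minimal sketch of the arithmetic, not
code from the patch.  It assumes the recollection above is right (each VMRUN
back into the guest adds one retired branch) and uses a hypothetical NUM_LOOPS
of 10; the helper names are invented for illustration.  Under those assumptions
an exact-match assertion fails as soon as one asynchronous exit lands in the
measured window, whereas a lower-bound check does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value for illustration; the real one is in pmu_counters_test.c. */
#define NUM_LOOPS			10
/* From the patch: AMD reports one more branch than the loop count. */
#define NUM_BRANCH_INSNS_RETIRED_AMD	(NUM_LOOPS + 1)

/*
 * Model of the observed count, ASSUMING every VMRUN that re-enters the guest
 * inside the measured window is counted as one extra guest branch.
 */
static uint64_t modeled_branch_count(unsigned int nr_async_exits)
{
	return NUM_BRANCH_INSNS_RETIRED_AMD + nr_async_exits;
}

/* What the patch asserts: an exact match. */
static bool exact_check_passes(uint64_t count)
{
	return count == NUM_BRANCH_INSNS_RETIRED_AMD;
}

/* One possible relaxation: only require the known minimum. */
static bool lower_bound_check_passes(uint64_t count)
{
	return count >= NUM_BRANCH_INSNS_RETIRED_AMD;
}
```

A lower bound would stop the flakiness in this model, but it would also stop
catching over-counting, so it is a trade-off rather than a fix.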

> +static void __guest_test_core_event(uint8_t event_idx, uint8_t counter_idx)
> +{
> +	/* One fortunate area of actual compatibility! This register

	/*
	 * This is the proper format for multi-line comments.  We are not the
	 * crazy net/ folks.
	 */

> +	 * layout is the same for both AMD and Intel.

It's not, actually.  There are differences in the layout; it just so happens that
the differences don't throw a wrench in things.

The comments in tools/testing/selftests/kvm/include/x86_64/pmu.h document this
fairly well, so I don't see any reason to have a comment here.
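
As a rough illustration of why the differences are harmless here (a sketch, not
the contents of pmu.h): the OS and enable bits sit at bits 17 and 22 on both
vendors, while AMD additionally uses bits 35:32 for the upper nibble of a 12-bit
event select.  The helper names below are made up, and 0xc2 as the select for
retired branches is an assumption about what amd_pmu_zen_events contains.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions shared by Intel and AMD event-select registers. */
#define EVENTSEL_OS	(1ULL << 17)
#define EVENTSEL_ENABLE	(1ULL << 22)

/*
 * Hypothetical helper: encode a 12-bit event select and an 8-bit unit mask.
 * Bits 11:8 of the select land in register bits 35:32, which only AMD uses
 * for this purpose; for 8-bit selects those bits stay zero.
 */
static uint64_t encode_raw_event(uint64_t select, uint64_t umask)
{
	return ((select & 0xf00ULL) << 24) | (select & 0xffULL) |
	       ((umask & 0xffULL) << 8);
}

/* Hypothetical helper: the value the test would write to the control MSR. */
static uint64_t make_eventsel(uint64_t select, uint64_t umask)
{
	return EVENTSEL_OS | EVENTSEL_ENABLE | encode_raw_event(select, umask);
}
```

With an 8-bit select such as 0xc2 the result is 0x4200c2, which is valid in
either layout; only a select above 0xff would touch the AMD-specific bits.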

> +	 */
> +	uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
> +		ARCH_PERFMON_EVENTSEL_ENABLE |
> +		amd_pmu_zen_events[event_idx];

Align the indentation.

	uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
			    ARCH_PERFMON_EVENTSEL_ENABLE |
			    amd_pmu_zen_events[event_idx];

> +	bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> +	uint64_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
> +	uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
> +	uint64_t msr_step = core_ext ? 2 : 1;
> +	uint64_t esel_msr = esel_msr_base + msr_step * counter_idx;
> +	uint64_t cnt_msr = cnt_msr_base + msr_step * counter_idx;

This pattern of code is copy+pasted in three functions.  Please add macros and/or
helpers to consolidate things.  The MSR variables should also be uint32_t, not
uint64_t.

It's a bit evil, but one approach would be to add a macro to iterate over all
PMU counters.  Eating the VM-Exit for the CPUID to get X86_FEATURE_PERF_CTR_EXT_CORE
each time is unfortunate, but I doubt it's problematic in practice (or at least I
hope it's not).  If the cost is meaningful, we could figure out a way to cache the
info, e.g. something awful like this might work:

	/* Note, this relies on guest state being recreated between each test. */
	static int has_perfctr_core = -1;

	if (has_perfctr_core == -1)
		has_perfctr_core = this_cpu_has(X86_FEATURE_PERFCTR_CORE);

	if (has_perfctr_core) {

static bool get_pmu_counter_msrs(int idx, uint32_t *eventsel, uint32_t *pmc)
{
	if (this_cpu_has(X86_FEATURE_PERFCTR_CORE)) {
		*eventsel = MSR_F15H_PERF_CTL + idx * 2;
		*pmc = MSR_F15H_PERF_CTR + idx * 2;
	} else {
		*eventsel = MSR_K7_EVNTSEL0 + idx;
		*pmc = MSR_K7_PERFCTR0 + idx;
	}
	return true;
}

#define for_each_pmu_counter(_i, _nr_counters, _eventsel, _pmc)		\
	for (_i = 0; _i < _nr_counters; _i++)				\
		if (get_pmu_counter_msrs(_i, &_eventsel, &_pmc))

static void guest_test_core_events(void)
{
	uint8_t nr_counters = guest_nr_core_counters();
	uint32_t eventsel_msr, pmc_msr;
	int i, j;

	for (i = 0; i < NR_AMD_ZEN_EVENTS; i++) {
		for_each_pmu_counter(j, nr_counters, eventsel_msr, pmc_msr) {
			uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
					    ARCH_PERFMON_EVENTSEL_ENABLE |
					    amd_pmu_zen_events[i];

			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, "");
			guest_assert_amd_event_count(i, j, pmc_msr);

			if (!is_forced_emulation_enabled)
				continue;

			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, KVM_FEP);
			guest_assert_amd_event_count(i, j, pmc_msr);
		}
	}
}
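
The address arithmetic in the helper can be checked outside a guest.  The
following is a standalone sketch under two assumptions: the MSR numbers are
copied from the kernel's msr-index.h, and the CPUID check is replaced by a plain
bool parameter.  model_pmu_counter_msrs() and struct pmu_counter_msrs are
hypothetical names, not selftest APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR numbers as defined in the kernel's msr-index.h. */
#define MSR_K7_EVNTSEL0		0xc0010000u
#define MSR_K7_PERFCTR0		0xc0010004u
#define MSR_F15H_PERF_CTL	0xc0010200u
#define MSR_F15H_PERF_CTR	0xc0010201u

struct pmu_counter_msrs {
	uint32_t eventsel;
	uint32_t pmc;
};

/* Userspace model of the lookup; core_ext stands in for the CPUID check. */
static struct pmu_counter_msrs model_pmu_counter_msrs(bool core_ext,
						       unsigned int idx)
{
	struct pmu_counter_msrs msrs;

	if (core_ext) {
		/* Extended core MSRs interleave CTL and CTR, so the stride is 2. */
		msrs.eventsel = MSR_F15H_PERF_CTL + idx * 2;
		msrs.pmc = MSR_F15H_PERF_CTR + idx * 2;
	} else {
		/* Legacy K7 MSRs are two contiguous banks, so the stride is 1. */
		msrs.eventsel = MSR_K7_EVNTSEL0 + idx;
		msrs.pmc = MSR_K7_PERFCTR0 + idx;
	}
	return msrs;
}
```

For counter 3 this gives 0xc0010206/0xc0010207 with the extended core MSRs and
0xc0010003/0xc0010007 with the legacy K7 MSRs.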

Patch

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 79ca7d608e00..cf2941cc7c4c 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -29,6 +29,9 @@ 
 /* Total number of instructions retired within the measured section. */
 #define NUM_INSNS_RETIRED		(NUM_LOOPS * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)
 
+/* AMD counting one extra branch. Probably at loop boundary condition. */
+#define NUM_BRANCH_INSNS_RETIRED_AMD	(NUM_LOOPS+1)
+#define NUM_INSNS_RETIRED_AMD		(NUM_INSNS_RETIRED+1)
 
 /*
  * Limit testing to MSRs that are actually defined by Intel (in the SDM).  MSRs
@@ -109,7 +112,7 @@  static uint8_t guest_get_pmu_version(void)
  * Sanity check that in all cases, the event doesn't count when it's disabled,
  * and that KVM correctly emulates the write of an arbitrary value.
  */
-static void guest_assert_event_count(uint8_t idx,
+static void guest_assert_intel_event_count(uint8_t idx,
 				     struct kvm_x86_pmu_feature event,
 				     uint32_t pmc, uint32_t pmc_msr)
 {
@@ -151,6 +154,33 @@  static void guest_assert_event_count(uint8_t idx,
 	GUEST_ASSERT_EQ(_rdpmc(pmc), 0xdead);
 }
 
+static void guest_assert_amd_event_count(uint8_t evt_idx, uint8_t cnt_idx, uint32_t pmc_msr)
+{
+	uint64_t count;
+	uint64_t count_pmc;
+
+	count = rdmsr(pmc_msr);
+	count_pmc = _rdpmc(cnt_idx);
+	GUEST_ASSERT_EQ(count, count_pmc);
+
+	switch (evt_idx) {
+	case AMD_ZEN_CORE_CYCLES_INDEX:
+		GUEST_ASSERT_NE(count, 0);
+		break;
+	case AMD_ZEN_INSTRUCTIONS_INDEX:
+		GUEST_ASSERT_EQ(count, NUM_INSNS_RETIRED_AMD);
+		break;
+	case AMD_ZEN_BRANCHES_INDEX:
+		GUEST_ASSERT_EQ(count, NUM_BRANCH_INSNS_RETIRED_AMD);
+		break;
+	case AMD_ZEN_BRANCH_MISSES_INDEX:
+		GUEST_ASSERT_NE(count, 0);
+		break;
+	default:
+		break;
+	}
+
+}
 /*
  * Enable and disable the PMC in a monolithic asm blob to ensure that the
  * compiler can't insert _any_ code into the measured sequence.  Note, ECX
@@ -183,28 +213,29 @@  do {										\
 	);									\
 } while (0)
 
-#define GUEST_TEST_EVENT(_idx, _event, _pmc, _pmc_msr, _ctrl_msr, _value, FEP)	\
+#define GUEST_TEST_EVENT(_pmc_msr, _ctrl_msr, _ctrl_value, FEP)			\
 do {										\
 	wrmsr(_pmc_msr, 0);							\
 										\
 	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))				\
-		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt .", FEP);	\
+		GUEST_MEASURE_EVENT(_ctrl_msr, _ctrl_value, "clflushopt .", FEP);	\
 	else if (this_cpu_has(X86_FEATURE_CLFLUSH))				\
-		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush .", FEP);	\
+		GUEST_MEASURE_EVENT(_ctrl_msr, _ctrl_value, "clflush .", FEP);	\
 	else									\
-		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);		\
-										\
-	guest_assert_event_count(_idx, _event, _pmc, _pmc_msr);			\
+		GUEST_MEASURE_EVENT(_ctrl_msr, _ctrl_value, "nop", FEP);		\
 } while (0)
 
 static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
 				    uint32_t pmc, uint32_t pmc_msr,
 				    uint32_t ctrl_msr, uint64_t ctrl_msr_value)
 {
-	GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");
+	GUEST_TEST_EVENT(pmc_msr, ctrl_msr, ctrl_msr_value, "");
+	guest_assert_intel_event_count(idx, event, pmc, pmc_msr);
 
-	if (is_forced_emulation_enabled)
-		GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
+	if (is_forced_emulation_enabled) {
+		GUEST_TEST_EVENT(pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
+		guest_assert_intel_event_count(idx, event, pmc, pmc_msr);
+	}
 }
 
 #define X86_PMU_FEATURE_NULL						\
@@ -697,9 +728,45 @@  static void guest_test_rdwr_core_counters(void)
 	}
 }
 
+static void __guest_test_core_event(uint8_t event_idx, uint8_t counter_idx)
+{
+	/* One fortunate area of actual compatibility! This register
+	 * layout is the same for both AMD and Intel.
+	 */
+	uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
+		ARCH_PERFMON_EVENTSEL_ENABLE |
+		amd_pmu_zen_events[event_idx];
+	bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
+	uint64_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
+	uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
+	uint64_t msr_step = core_ext ? 2 : 1;
+	uint64_t esel_msr = esel_msr_base + msr_step * counter_idx;
+	uint64_t cnt_msr = cnt_msr_base + msr_step * counter_idx;
+
+	GUEST_TEST_EVENT(cnt_msr, esel_msr, eventsel, "");
+	guest_assert_amd_event_count(event_idx, counter_idx, cnt_msr);
+
+	if (is_forced_emulation_enabled) {
+		GUEST_TEST_EVENT(cnt_msr, esel_msr, eventsel, KVM_FEP);
+		guest_assert_amd_event_count(event_idx, counter_idx, cnt_msr);
+	}
+
+}
+
+static void guest_test_core_events(void)
+{
+	uint8_t nr_counters = guest_nr_core_counters();
+
+	for (uint8_t i = 0; i < NR_AMD_ZEN_EVENTS; i++) {
+		for (uint8_t j = 0; j < nr_counters; j++)
+			__guest_test_core_event(i, j);
+	}
+}
+
 static void guest_test_core_counters(void)
 {
 	guest_test_rdwr_core_counters();
+	guest_test_core_events();
 	GUEST_DONE();
 }