Message ID: 20250520035641.533706-1-leobras@redhat.com
State: New
Series: [RFC] cyclictest: Add skip_sec option
On Tue, May 20, 2025 at 4:35 AM John Ogness <john.ogness@linutronix.de> wrote:
>
> On 2025-05-20, Leonardo Bras <leobras@redhat.com> wrote:
> > When running cyclictest with break trace (-b) option, wait for skip_sec
> > seconds before issuing the break.
> >
> > Cyclictest may present high latency on the initial cycles, which can be
> > caused by initial resource allocations, and may not represent the
> > expected latency for the running workload.
>
> If cyclictest is programmed correctly, this will not happen. Can you
> provide a technical explanation for high latencies on initial cycles?

Hi John,

We are currently investigating the source of this latency, but I heard
from other team members that it is also happening on Intel Secure VMs.

Scenario:
Host: ARM64, kernel-rt, 120 out of 128 cpus isolated, hugepages for the guest
KVM guest: kernel-rt, 120 vcpus pinned on the above isolated cpus, 116
vcpus isolated inside the guest

Cyclictest runs with trace-cmd attaching to a guest agent:

trace-cmd record --poll -m 1000 \
    -e csd -e sched/sched_switch -e timer -e irq_handler_entry \
    -e irq_handler_exit -e tick_stop -e ipi_send_cpumask -e ipi_send_cpu \
    -A 3 \
    -e csd -e sched/sched_switch -e timer -e irq_handler_entry \
    -e irq_handler_exit -e tick_stop -e ipi_send_cpumask -e ipi_send_cpu \
    bash -c "ssh $my_guest 'cyclictest -m -q -p95 --policy=fifo -D 2h \
        -i 200 -h60 -t 116 -a 4-119 -b 50 --mainaffinity 0,1,2,3 --tracemark'"

What we see is a peak of >50us in the first couple of cycles, and then a
max peak of 15us in our tests.

We also tested adding a sleep before the above cyclictest command, in
order to rule out any trace-cmd setup that might have been in progress,
but it did not help.

If you have any idea why this might happen, or can suggest a debugging
setup, it would be appreciated.

Thanks!
Leo
Hi Leo,

On 2025-05-20, Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> We are currently investigating the source of this latency, but I heard
> from other team members that it is also happening on Intel Secure VMs.
>
> [...]
>
> What we see is a peak of >50us in the first couple of cycles, and then a
> max peak of 15us in our tests.

I wonder if this is related to cache misses and/or memory bandwidth. If
you keep this patch but increase the interval, do you see >50us during
your tests? For example:

    -i 10000

If so, your "max peak" of 15us is only valid when cache hot.

IMHO whatever situation is happening for the "first couple cycles" could
happen later. In that case, the patch is just sweeping some of the bad
numbers under the rug. It is really important to understand exactly what
the problem is before changing cyclictest to ignore such problems.

John
On Wed, May 21, 2025 at 5:40 AM John Ogness <john.ogness@linutronix.de> wrote:
> I wonder if this is related to cache misses and/or memory bandwidth. If
> you keep this patch but increase the interval, do you see >50us during
> your tests? For example:
>
>     -i 10000
>
> If so, your "max peak" of 15us is only valid when cache hot.

That's a smart move, we will run the test as suggested and bring back
the results.

> IMHO whatever situation is happening for the "first couple cycles" could
> happen later. In that case, the patch is just sweeping some of the bad
> numbers under the rug. It is really important to understand exactly what
> the problem is before changing cyclictest to ignore such problems.

That's a valid point. We are currently investigating it, and I will
bring the results when we have some.

I have a v2 ready, but I think it makes sense to send it only after the
above investigation tells us whether proceeding with this suggestion
makes sense.

Thanks!
Leo
diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
index 890da5d..26e181a 100644
--- a/src/cyclictest/cyclictest.c
+++ b/src/cyclictest/cyclictest.c
@@ -183,20 +183,21 @@ static int use_nsecs = 0;
 static int refresh_on_max;
 static int force_sched_other;
 static int priospread = 0;
 static int check_clock_resolution;
 static int ct_debug;
 static int use_fifo = 0;
 static pthread_t fifo_threadid;
 static int laptop = 0;
 static int power_management = 0;
 static int use_histfile = 0;
+static int skip_sec = 0;
 
 #ifdef ARCH_HAS_SMI_COUNTER
 static int smi = 0;
 #else
 #define smi 0
 #endif
 
 static pthread_cond_t refresh_on_max_cond = PTHREAD_COND_INITIALIZER;
 static pthread_mutex_t refresh_on_max_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -637,20 +638,21 @@ static void *timerthread(void *param)
 	sigset_t sigset;
 	timer_t timer;
 	struct timespec now, next, interval, stop = { 0 };
 	struct itimerval itimer;
 	struct itimerspec tspec;
 	struct thread_stat *stat = par->stats;
 	int stopped = 0;
 	cpu_set_t mask;
 	pthread_t thread;
 	unsigned long smi_now, smi_old = 0;
+	unsigned long skip_cycles = skip_sec * USEC_PER_SEC / par->interval;
 
 	/* if we're running in numa mode, set our memory node */
 	if (par->node != -1)
 		rt_numa_set_numa_run_on_node(par->node, par->cpu);
 
 	if (par->cpu != -1) {
 		CPU_ZERO(&mask);
 		CPU_SET(par->cpu, &mask);
 		thread = pthread_self();
 		if (pthread_setaffinity_np(thread, sizeof(mask), &mask) != 0)
@@ -837,21 +839,21 @@ static void *timerthread(void *param)
 			pthread_cond_signal(&refresh_on_max_cond);
 		}
 
 		stat->avg += (double) diff;
 
 		if (trigger && (diff > trigger))
 			trigger_update(par, diff, calctime(now));
 
 		if (duration && (calcdiff(now, stop) >= 0))
 			shutdown++;
 
-		if (!stopped && tracelimit && (diff > tracelimit)) {
+		if (!stopped && tracelimit && (diff > tracelimit) && stat->cycles > skip_cycles) {
 			stopped++;
 			shutdown++;
 			pthread_mutex_lock(&break_thread_id_lock);
 			if (break_thread_id == 0) {
 				break_thread_id = stat->tid;
 				tracemark("hit latency threshold (%llu > %d)",
 					  (unsigned long long) diff, tracelimit);
 				break_thread_value = diff;
 			}
 			pthread_mutex_unlock(&break_thread_id_lock);
@@ -1081,20 +1083,21 @@ enum option_values {
 	OPT_DEFAULT_SYSTEM, OPT_DISTANCE, OPT_DURATION, OPT_LATENCY,
 	OPT_FIFO, OPT_HISTOGRAM, OPT_HISTOFALL, OPT_HISTFILE,
 	OPT_INTERVAL, OPT_JSON, OPT_MAINAFFINITY, OPT_LOOPS,
 	OPT_MLOCKALL, OPT_REFRESH, OPT_NANOSLEEP, OPT_NSECS,
 	OPT_OSCOPE, OPT_PRIORITY, OPT_QUIET, OPT_PRIOSPREAD,
 	OPT_RELATIVE, OPT_RESOLUTION, OPT_SYSTEM, OPT_SMP,
 	OPT_THREADS, OPT_TRIGGER, OPT_TRIGGER_NODES, OPT_UNBUFFERED,
 	OPT_NUMA, OPT_VERBOSE, OPT_DBGCYCLIC, OPT_POLICY, OPT_HELP,
 	OPT_NUMOPTS, OPT_ALIGNED, OPT_SECALIGNED, OPT_LAPTOP, OPT_SMI,
 	OPT_TRACEMARK, OPT_POSIX_TIMERS, OPT_DEEPEST_IDLE_STATE,
+	OPT_SKIP_SEC
 };
 
 /* Process commandline options */
 static void process_options(int argc, char *argv[], int max_cpus)
 {
 	int error = 0;
 	int option_affinity = 0;
 
 	for (;;) {
 		int option_index = 0;
@@ -1137,20 +1140,21 @@ static void process_options(int argc, char *argv[], int max_cpus)
 			{"spike-nodes",      required_argument, NULL, OPT_TRIGGER_NODES },
 			{"threads",          optional_argument, NULL, OPT_THREADS },
 			{"tracemark",        no_argument,       NULL, OPT_TRACEMARK },
 			{"unbuffered",       no_argument,       NULL, OPT_UNBUFFERED },
 			{"verbose",          no_argument,       NULL, OPT_VERBOSE },
 			{"dbg_cyclictest",   no_argument,       NULL, OPT_DBGCYCLIC },
 			{"policy",           required_argument, NULL, OPT_POLICY },
 			{"help",             no_argument,       NULL, OPT_HELP },
 			{"posix_timers",     no_argument,       NULL, OPT_POSIX_TIMERS },
 			{"deepest-idle-state", required_argument, NULL, OPT_DEEPEST_IDLE_STATE },
+			{"skip_sec",         required_argument, NULL, OPT_SKIP_SEC },
 			{NULL, 0, NULL, 0 },
 		};
 		int c = getopt_long(argc, argv, "a::A::b:c:d:D:F:h:H:i:l:MNo:p:mqrRsSt::uvD:x",
 				    long_options, &option_index);
 		if (c == -1)
 			break;
 		switch (c) {
 		case 'a':
 		case OPT_AFFINITY:
 			option_affinity = 1;
@@ -1340,20 +1344,25 @@ static void process_options(int argc, char *argv[], int max_cpus)
 			smi = 1;
 #else
 			fatal("--smi is not available on your arch\n");
 #endif
 			break;
 		case OPT_TRACEMARK:
 			trace_marker = 1;
 			break;
 		case OPT_DEEPEST_IDLE_STATE:
 			deepest_idle_state = atoi(optarg);
 			break;
+		case OPT_SKIP_SEC:
+			skip_sec = atoi(optarg);
+			if (skip_sec < 0)
+				skip_sec = 0;
+			break;
 		}
 	}
 
 	if ((use_system == MODE_SYS_OFFSET) && (use_nanosleep == MODE_CYCLIC)) {
 		warn("The system option requires clock_nanosleep\n");
 		warn("and is not compatible with posix_timers\n");
 		warn("Using clock_nanosleep\n");
 		use_nanosleep = MODE_CLOCK_NANOSLEEP;
 	}
diff --git a/src/cyclictest/cyclictest.8 b/src/cyclictest/cyclictest.8
index 6becd19..a0eaf7e 100644
--- a/src/cyclictest/cyclictest.8
+++ b/src/cyclictest/cyclictest.8
@@ -174,20 +174,22 @@ Output values on stdout for statistics. This option is used to gather statistica
 n:c:v where n=task number c=count v=latency value in us.
 .TP
 .B \-\-dbg_cyclictest
 Print info userful for debugging cyclictest
 .TP
 .B \-x, \-\-posix_timers
 Use POSIX timers instead of clock_nanosleep.
+.B \-\-skip_sec=SEC
+When using -b, skip sending break trace command for the first SEC seconds.
 .SH SEE ALSO
 .BR numa (3),
 .BR numactl (8),
 .\" .br
 .\" The programs are documented fully by
 .\" .IR "The Rise and Fall of a Fooish Bar" ,
 .\" available via the Info system.
 .SH AUTHOR
 cyclictest was written by Thomas Gleixner <tglx@linuxtronix.de>.
When running cyclictest with the break trace (-b) option, wait for
skip_sec seconds before issuing the break.

Cyclictest may present high latency on the initial cycles, which can be
caused by initial resource allocations, and may not represent the
expected latency for the running workload.

So add skip_sec, so the user can make cyclictest ignore any break that
may happen in those initial seconds.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
This behavior was found by Sana and me while testing cyclictest on an
ARM KVM VM, running trace-cmd on both the host and the guest (with the
trace-cmd agent).

 src/cyclictest/cyclictest.c | 11 ++++++++++-
 src/cyclictest/cyclictest.8 |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)