From patchwork Fri Nov 1 20:19:16 2024
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 841008
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Eric Dumazet, Pablo Neira Ayuso
Subject: [PATCH RT 1/2] netfilter: nft_counter: Use u64_stats_t for statistic.
Date: Fri, 1 Nov 2024 16:19:16 -0400
Message-ID: <20241101201917.2010908-2-joseph.salisbury@oracle.com>
In-Reply-To: <20241101201917.2010908-1-joseph.salisbury@oracle.com>
References: <20241101201917.2010908-1-joseph.salisbury@oracle.com>

From: Sebastian Andrzej Siewior

v5.15.167-rt80-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

The nft_counter expression uses two s64 counters for its statistics. The two
are protected by a seqcount so that the 64bit values are always seen
consistently during updates, even on 32bit architectures where the store is
performed as two writes. A side effect is that the two counters (bytes and
packets) are written and read together in the same window.

This can be replaced with u64_stats_t. write_seqcount_begin()/end() is
replaced with u64_stats_update_begin()/end(), which behaves the same way as
seqcount_t on 32bit architectures; additionally, preemption is disabled on
PREEMPT_RT to ensure that a reader does not preempt a writer. On 64bit
architectures the macros compile away and the reads happen without any
retries. This also means that a reader can observe one counter (bytes) from
before an update and the other (packets) from after it, which is fine because
there is no requirement for both counters to come from the same update window.

Convert the statistics to u64_stats_t. There is one optimisation:
nft_counter_do_init() and nft_counter_clone() allocate a new per-CPU counter
and assign a value to it. During this assignment preemption used to be
disabled, which is not needed because the counter is not yet exposed to the
system, so there cannot be another writer or reader. Therefore disabling
preemption is omitted and raw_cpu_ptr() is used to obtain a pointer to the
counter for the assignment.
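For review purposes, the u64_stats_t pattern the patch converts to looks
roughly like the sketch below. This is only an illustration: the demo_* names
are made up here and are not part of the patch; the actual conversion of
nft_counter is in the diff that follows.

#include <linux/bottom_half.h>
#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/percpu.h>
#include <linux/u64_stats_sync.h>

/* Illustrative per-CPU counter with the same shape as struct nft_counter. */
struct demo_counter {
        u64_stats_t     bytes;
        u64_stats_t     packets;
};

static DEFINE_PER_CPU(struct u64_stats_sync, demo_sync);
static struct demo_counter __percpu *demo_counters;

static int demo_init(void)
{
        struct demo_counter *c;
        int cpu;

        demo_counters = alloc_percpu(struct demo_counter);
        if (!demo_counters)
                return -ENOMEM;

        for_each_possible_cpu(cpu)
                u64_stats_init(per_cpu_ptr(&demo_sync, cpu));

        /* Not visible to any reader or writer yet, so raw_cpu_ptr() plus a
         * plain u64_stats_set() is enough; no preemption disabling needed. */
        c = raw_cpu_ptr(demo_counters);
        u64_stats_set(&c->bytes, 0);
        u64_stats_set(&c->packets, 0);
        return 0;
}

/* Writer: on 64bit these are plain per-CPU adds; on 32bit the u64_stats_sync
 * seqcount (and preempt_disable on 32bit PREEMPT_RT) keeps readers out. */
static void demo_update(unsigned int len)
{
        struct u64_stats_sync *sync;
        struct demo_counter *c;

        local_bh_disable();
        c = this_cpu_ptr(demo_counters);
        sync = this_cpu_ptr(&demo_sync);

        u64_stats_update_begin(sync);
        u64_stats_add(&c->bytes, len);
        u64_stats_inc(&c->packets);
        u64_stats_update_end(sync);
        local_bh_enable();
}

/* Reader: the fetch/retry loop only ever retries on 32bit; on 64bit each
 * counter is read once and may come from a different update window. */
static void demo_fetch(u64 *bytes, u64 *packets)
{
        unsigned int seq;
        int cpu;

        *bytes = 0;
        *packets = 0;
        for_each_possible_cpu(cpu) {
                struct u64_stats_sync *sync = per_cpu_ptr(&demo_sync, cpu);
                struct demo_counter *c = per_cpu_ptr(demo_counters, cpu);
                u64 b, p;

                do {
                        seq = u64_stats_fetch_begin(sync);
                        b = u64_stats_read(&c->bytes);
                        p = u64_stats_read(&c->packets);
                } while (u64_stats_fetch_retry(sync, seq));

                *bytes += b;
                *packets += p;
        }
}

On 64bit builds the update_begin/end and fetch/retry helpers compile to
nothing, so the writer is just two per-CPU additions and the reader a pair of
loads; only the 32bit implementation carries the seqcount retry (plus the
preempt_disable on PREEMPT_RT).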
Cc: Eric Dumazet
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Pablo Neira Ayuso
(cherry picked from commit 4a1d3acd6ea86075e77fcc1188c3fc372833ba73)
[jsalisbury: changed seqcount_init to u64_stats_init in nft_counter_module_init()]
Signed-off-by: Joseph Salisbury
---
 net/netfilter/nft_counter.c | 91 +++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 44 deletions(-)

diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index 30b24d002c3d..e387b6610aaf 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -8,7 +8,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/module.h>
-#include <linux/seqlock.h>
+#include <linux/u64_stats_sync.h>
 #include <linux/netlink.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
@@ -16,6 +16,11 @@
 #include <net/netfilter/nf_tables_offload.h>
 
 struct nft_counter {
+        u64_stats_t     bytes;
+        u64_stats_t     packets;
+};
+
+struct nft_counter_tot {
         s64             bytes;
         s64             packets;
 };
@@ -24,25 +29,24 @@ struct nft_counter_percpu_priv {
         struct nft_counter __percpu *counter;
 };
 
-static DEFINE_PER_CPU(seqcount_t, nft_counter_seq);
+static DEFINE_PER_CPU(struct u64_stats_sync, nft_counter_sync);
 
 static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
                                        struct nft_regs *regs,
                                        const struct nft_pktinfo *pkt)
 {
+        struct u64_stats_sync *nft_sync;
         struct nft_counter *this_cpu;
-        seqcount_t *myseq;
 
         local_bh_disable();
         this_cpu = this_cpu_ptr(priv->counter);
-        myseq = this_cpu_ptr(&nft_counter_seq);
+        nft_sync = this_cpu_ptr(&nft_counter_sync);
 
-        write_seqcount_begin(myseq);
+        u64_stats_update_begin(nft_sync);
+        u64_stats_add(&this_cpu->bytes, pkt->skb->len);
+        u64_stats_inc(&this_cpu->packets);
+        u64_stats_update_end(nft_sync);
 
-        this_cpu->bytes += pkt->skb->len;
-        this_cpu->packets++;
-
-        write_seqcount_end(myseq);
         local_bh_enable();
 }
 
@@ -65,17 +69,16 @@ static int nft_counter_do_init(const struct nlattr * const tb[],
         if (cpu_stats == NULL)
                 return -ENOMEM;
 
-        preempt_disable();
-        this_cpu = this_cpu_ptr(cpu_stats);
+        this_cpu = raw_cpu_ptr(cpu_stats);
         if (tb[NFTA_COUNTER_PACKETS]) {
-                this_cpu->packets =
-                        be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS]));
+                u64_stats_set(&this_cpu->packets,
+                              be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS])));
         }
         if (tb[NFTA_COUNTER_BYTES]) {
-                this_cpu->bytes =
-                        be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES]));
+                u64_stats_set(&this_cpu->bytes,
+                              be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES])));
         }
-        preempt_enable();
+
         priv->counter = cpu_stats;
         return 0;
 }
@@ -103,40 +106,41 @@ static void nft_counter_obj_destroy(const struct nft_ctx *ctx,
 }
 
 static void nft_counter_reset(struct nft_counter_percpu_priv *priv,
-                              struct nft_counter *total)
+                              struct nft_counter_tot *total)
 {
+        struct u64_stats_sync *nft_sync;
         struct nft_counter *this_cpu;
-        seqcount_t *myseq;
 
         local_bh_disable();
         this_cpu = this_cpu_ptr(priv->counter);
-        myseq = this_cpu_ptr(&nft_counter_seq);
+        nft_sync = this_cpu_ptr(&nft_counter_sync);
+
+        u64_stats_update_begin(nft_sync);
+        u64_stats_add(&this_cpu->packets, -total->packets);
+        u64_stats_add(&this_cpu->bytes, -total->bytes);
+        u64_stats_update_end(nft_sync);
 
-        write_seqcount_begin(myseq);
-        this_cpu->packets -= total->packets;
-        this_cpu->bytes -= total->bytes;
-        write_seqcount_end(myseq);
         local_bh_enable();
 }
 
 static void nft_counter_fetch(struct nft_counter_percpu_priv *priv,
-                              struct nft_counter *total)
+                              struct nft_counter_tot *total)
 {
         struct nft_counter *this_cpu;
-        const seqcount_t *myseq;
         u64 bytes, packets;
         unsigned int seq;
         int cpu;
 
         memset(total, 0, sizeof(*total));
         for_each_possible_cpu(cpu) {
-                myseq = per_cpu_ptr(&nft_counter_seq, cpu);
+                struct u64_stats_sync *nft_sync = per_cpu_ptr(&nft_counter_sync, cpu);
+
                 this_cpu = per_cpu_ptr(priv->counter, cpu);
                 do {
-                        seq     = read_seqcount_begin(myseq);
-                        bytes   = this_cpu->bytes;
-                        packets = this_cpu->packets;
-                } while (read_seqcount_retry(myseq, seq));
+                        seq     = u64_stats_fetch_begin(nft_sync);
+                        bytes   = u64_stats_read(&this_cpu->bytes);
+                        packets = u64_stats_read(&this_cpu->packets);
+                } while (u64_stats_fetch_retry(nft_sync, seq));
 
                 total->bytes    += bytes;
                 total->packets  += packets;
@@ -147,7 +151,7 @@ static int nft_counter_do_dump(struct sk_buff *skb,
                                struct nft_counter_percpu_priv *priv,
                                bool reset)
 {
-        struct nft_counter total;
+        struct nft_counter_tot total;
 
         nft_counter_fetch(priv, &total);
 
@@ -236,7 +240,7 @@ static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src, g
         struct nft_counter_percpu_priv *priv_clone = nft_expr_priv(dst);
         struct nft_counter __percpu *cpu_stats;
         struct nft_counter *this_cpu;
-        struct nft_counter total;
+        struct nft_counter_tot total;
 
         nft_counter_fetch(priv, &total);
 
@@ -244,11 +248,9 @@ static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src, g
         if (cpu_stats == NULL)
                 return -ENOMEM;
 
-        preempt_disable();
-        this_cpu = this_cpu_ptr(cpu_stats);
-        this_cpu->packets = total.packets;
-        this_cpu->bytes = total.bytes;
-        preempt_enable();
+        this_cpu = raw_cpu_ptr(cpu_stats);
+        u64_stats_set(&this_cpu->packets, total.packets);
+        u64_stats_set(&this_cpu->bytes, total.bytes);
 
         priv_clone->counter = cpu_stats;
         return 0;
@@ -266,17 +268,17 @@ static void nft_counter_offload_stats(struct nft_expr *expr,
                                       const struct flow_stats *stats)
 {
         struct nft_counter_percpu_priv *priv = nft_expr_priv(expr);
+        struct u64_stats_sync *nft_sync;
         struct nft_counter *this_cpu;
-        seqcount_t *myseq;
 
         local_bh_disable();
         this_cpu = this_cpu_ptr(priv->counter);
-        myseq = this_cpu_ptr(&nft_counter_seq);
+        nft_sync = this_cpu_ptr(&nft_counter_sync);
 
-        write_seqcount_begin(myseq);
-        this_cpu->packets += stats->pkts;
-        this_cpu->bytes += stats->bytes;
-        write_seqcount_end(myseq);
+        u64_stats_update_begin(nft_sync);
+        u64_stats_add(&this_cpu->packets, stats->pkts);
+        u64_stats_add(&this_cpu->bytes, stats->bytes);
+        u64_stats_update_end(nft_sync);
         local_bh_enable();
 }
 
@@ -308,7 +310,8 @@ static int __init nft_counter_module_init(void)
         int cpu, err;
 
         for_each_possible_cpu(cpu)
-                seqcount_init(per_cpu_ptr(&nft_counter_seq, cpu));
+                u64_stats_init(per_cpu_ptr(&nft_counter_sync, cpu));
+
         err = nft_register_obj(&nft_counter_obj_type);
         if (err < 0)