Message ID | 20190625211520.43181-4-honnappa.nagarahalli@arm.com
State | New
Series | lib/hash: perf improvements for lock-free
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
>tbl_chng_cnt is one of the first elements of the structure used in
>the lookup. Move it to the beginning of the cache line to gain
>performance.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>index fb19bb27d..af6451b5c 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.h
>+++ b/lib/librte_hash/rte_cuckoo_hash.h
>@@ -170,7 +170,9 @@ struct rte_hash {
>
> 	/* Fields used in lookup */
>
>-	uint32_t key_len __rte_cache_aligned;
>+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
>+	/**< Indicates if the hash table changed from last read. */
>+	uint32_t key_len;
> 	/**< Length of hash key. */
> 	uint8_t hw_trans_mem_support;
> 	/**< If hardware transactional memory is used. */
>@@ -218,8 +220,6 @@ struct rte_hash {
> 	 * is piggy-backed to freeing of the key index.
> 	 */
> 	uint32_t *ext_bkt_to_free;
>-	uint32_t *tbl_chng_cnt;
>-	/**< Indicates if the hash table changed from last read. */
> } __rte_cache_aligned;
>
> struct queue_node {
>--
>2.17.1

[Wang, Yipeng]
I am not sure about this change. By moving the counter to the front, I think you push key_store out of the cache line, and key_store is also used in lookup (and more commonly). My tests also show a perf drop in many cases.
<snip>
> >
> >tbl_chng_cnt is one of the first elements of the structure used in the
> >lookup. Move it to the beginning of the cache line to gain performance.
> >
> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >---
> > lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> >diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> >b/lib/librte_hash/rte_cuckoo_hash.h
> >index fb19bb27d..af6451b5c 100644
> >--- a/lib/librte_hash/rte_cuckoo_hash.h
> >+++ b/lib/librte_hash/rte_cuckoo_hash.h
> >@@ -170,7 +170,9 @@ struct rte_hash {
> >
> > 	/* Fields used in lookup */
> >
> >-	uint32_t key_len __rte_cache_aligned;
> >+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
> >+	/**< Indicates if the hash table changed from last read. */
> >+	uint32_t key_len;
> > 	/**< Length of hash key. */
> > 	uint8_t hw_trans_mem_support;
> > 	/**< If hardware transactional memory is used. */
> >@@ -218,8 +220,6 @@ struct rte_hash {
> > 	 * is piggy-backed to freeing of the key index.
> > 	 */
> > 	uint32_t *ext_bkt_to_free;
> >-	uint32_t *tbl_chng_cnt;
> >-	/**< Indicates if the hash table changed from last read. */
> > } __rte_cache_aligned;
> >
> > struct queue_node {
> >--
> >2.17.1
>
> [Wang, Yipeng]
> I am not sure about this change. By moving the counter to the front, I
> think you push key_store out of the cache line, and key_store is also
> used in lookup (and more commonly).
> My tests also show a perf drop in many cases.
I ran the hash_readwrite_lf tests and the L3 fwd application. Both showed improvements, for both lock-free and using locks, on Arm platforms (L3 fwd was not run on x86). Which tests are showing performance drops for you?

But I do agree that this work is not complete. We can drop this patch and take it up separately.
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Tuesday, July 2, 2019 10:23 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>
>Subject: RE: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
><snip>
>
>> >
>> >tbl_chng_cnt is one of the first elements of the structure used in the
>> >lookup. Move it to the beginning of the cache line to gain performance.
>> >
>> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>> >Cc: stable@dpdk.org
>> >
>> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >---
>> > lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
>> > 1 file changed, 3 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/lib/librte_hash/rte_cuckoo_hash.h
>> >b/lib/librte_hash/rte_cuckoo_hash.h
>> >index fb19bb27d..af6451b5c 100644
>> >--- a/lib/librte_hash/rte_cuckoo_hash.h
>> >+++ b/lib/librte_hash/rte_cuckoo_hash.h
>> >@@ -170,7 +170,9 @@ struct rte_hash {
>> >
>> > 	/* Fields used in lookup */
>> >
>> >-	uint32_t key_len __rte_cache_aligned;
>> >+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
>> >+	/**< Indicates if the hash table changed from last read. */
>> >+	uint32_t key_len;
>> > 	/**< Length of hash key. */
>> > 	uint8_t hw_trans_mem_support;
>> > 	/**< If hardware transactional memory is used. */
>> >@@ -218,8 +220,6 @@ struct rte_hash {
>> > 	 * is piggy-backed to freeing of the key index.
>> > 	 */
>> > 	uint32_t *ext_bkt_to_free;
>> >-	uint32_t *tbl_chng_cnt;
>> >-	/**< Indicates if the hash table changed from last read. */
>> > } __rte_cache_aligned;
>> >
>> > struct queue_node {
>> >--
>> >2.17.1
>>
>> [Wang, Yipeng]
>> I am not sure about this change. By moving the counter to the front, I
>> think you push key_store out of the cache line, and key_store is also
>> used in lookup (and more commonly).
>> My tests also show a perf drop in many cases.
>I ran the hash_readwrite_lf tests and the L3 fwd application. Both showed improvements, for both lock-free and using locks, on Arm
>platforms (L3 fwd was not run on x86). Which tests are showing performance drops for you?
>
>But I do agree that this work is not complete. We can drop this patch and take it up separately.

[Wang, Yipeng]
I ran the LF test on x86. For the 1-reader case it seems a clear improvement for lookup without key shifts. Other results are a mixed bag, even for lock-free. For most HTM cases it is a drop. My opinion is to delay this kind of fine-tuning effort tied to architecture specifics.
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..af6451b5c 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -170,7 +170,9 @@ struct rte_hash {

 	/* Fields used in lookup */

-	uint32_t key_len __rte_cache_aligned;
+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
+	/**< Indicates if the hash table changed from last read. */
+	uint32_t key_len;
 	/**< Length of hash key. */
 	uint8_t hw_trans_mem_support;
 	/**< If hardware transactional memory is used. */
@@ -218,8 +220,6 @@ struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
-	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;

 struct queue_node {