Message ID | 20230719-mcrc-upstream-v2-0-4152b987e4c2@ti.com |
---|---|
Headers | show |
Series | Add support for Texas Instruments MCRC64 engine | expand |
On 00:58-20230811, Kamlesh Gurudasani wrote: > K3 devices include MCRC64 engine for crc64 calculation. > Enable module to be built for K3 devices. > > Also enable algif_hash module, which is needed to access MCRC64 module > from userspace. > > Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> > --- There are few things to improve in this series, but we can discuss this as part of defconfig merge See thread: https://lore.kernel.org/linux-arm-kernel/ae2ad056-96de-41b7-8df4-1d9c0f5c469b@app.fastmail.com/ for additional info. K3 devices is too broad, you want to specify specific boards that will benefit out of this. I suggest to keep this as "DONOTMERGE" to indicate this should'nt go via subsystem maintainer tree (most maintainers are aware of it, but explicitly calling it out helps keep things sane) > arch/arm64/configs/defconfig | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig > index bf13d5c46578..4d555a125315 100644 > --- a/arch/arm64/configs/defconfig > +++ b/arch/arm64/configs/defconfig > @@ -1535,6 +1535,7 @@ CONFIG_CRYPTO_TEST=m > CONFIG_CRYPTO_ECHAINIV=y > CONFIG_CRYPTO_MICHAEL_MIC=m > CONFIG_CRYPTO_ANSI_CPRNG=y > +CONFIG_CRYPTO_USER_API_HASH=m > CONFIG_CRYPTO_USER_API_RNG=m > CONFIG_CRYPTO_CHACHA20_NEON=m > CONFIG_CRYPTO_GHASH_ARM64_CE=y > @@ -1558,6 +1559,7 @@ CONFIG_CRYPTO_DEV_HISI_ZIP=m > CONFIG_CRYPTO_DEV_HISI_HPRE=m > CONFIG_CRYPTO_DEV_HISI_TRNG=m > CONFIG_CRYPTO_DEV_SA2UL=m > +CONFIG_CRYPTO_DEV_TI_MCRC64=m > CONFIG_DMA_RESTRICTED_POOL=y > CONFIG_CMA_SIZE_MBYTES=32 > CONFIG_PRINTK_TIME=y > > -- > 2.34.1 >
On Fri, Aug 11, 2023 at 12:58:50AM +0530, Kamlesh Gurudasani wrote: > Add binding for Texas Instruments MCRC64 > > MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) > according to the ISO 3309 standard. > > The ISO 3309 64-bit CRC model parameters are as follows: > Generator Polynomial: x^64 + x^4 + x^3 + x + 1 > Polynomial Value: 0x000000000000001B > Initial value: 0x0000000000000000 > Reflected Input: False > Reflected Output: False > Xor Final: 0x0000000000000000 > > Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> > --- > Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > MAINTAINERS | 5 +++++ > 2 files changed, 52 insertions(+) > > diff --git a/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml > new file mode 100644 > index 000000000000..38bc7efebd68 > --- /dev/null > +++ b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml > @@ -0,0 +1,47 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/crypto/ti,mcrc64.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Texas Instruments MCRC64 > + > +description: The MCRC64 engine calculates 64-bit cyclic redundancy checks A newline after "description" please. > + (CRC) according to the ISO 3309 standard. > + > +maintainers: > + - Kamlesh Gurudasani <kamlesh@ti.com> > + > +properties: > + compatible: > + const: ti,am62-mcrc64 Is the am62 an SoC or a family of SoCs? I googled a wee bit for am62 & there seems to be an am625 and an am623? Otherwise, this looks good to me. > + > + reg: > + maxItems: 1 > + > + clocks: > + maxItems: 1 > + > + power-domains: > + maxItems: 1 > + > +required: > + - compatible > + - reg > + - clocks > + - power-domains > + > +additionalProperties: false > + > +examples: > + - | > + #include <dt-bindings/soc/ti,sci_pm_domain.h> > + > + crc@30300000 { > + compatible = "ti,am62-mcrc64"; > + reg = <0x30300000 0x1000>; > + clocks = <&k3_clks 116 0>; > + power-domains = <&k3_pds 116 TI_SCI_PD_EXCLUSIVE>; > + }; > + > +... > diff --git a/MAINTAINERS b/MAINTAINERS > index 02a3192195af..66b51f43d196 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -21481,6 +21481,11 @@ S: Maintained > F: Documentation/devicetree/bindings/iio/adc/ti,lmp92064.yaml > F: drivers/iio/adc/ti-lmp92064.c > > +TI MEMORY CYCLIC REDUNDANCY CHECK (MCRC64) DRIVER > +M: Kamlesh Gurudasani <kamlesh@ti.com> > +S: Maintained > +F: Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml > + > TI PCM3060 ASoC CODEC DRIVER > M: Kirill Marinushkin <kmarinushkin@birdec.com> > L: alsa-devel@alsa-project.org (moderated for non-subscribers) > > -- > 2.34.1 >
On Fri, Aug 11, 2023 at 04:34:33PM +0100, Conor Dooley wrote: > On Fri, Aug 11, 2023 at 12:58:50AM +0530, Kamlesh Gurudasani wrote: > > Add binding for Texas Instruments MCRC64 > > > > MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) > > according to the ISO 3309 standard. > > > > The ISO 3309 64-bit CRC model parameters are as follows: > > Generator Polynomial: x^64 + x^4 + x^3 + x + 1 > > Polynomial Value: 0x000000000000001B > > Initial value: 0x0000000000000000 > > Reflected Input: False > > Reflected Output: False > > Xor Final: 0x0000000000000000 > > > > Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> > > --- > > Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > > MAINTAINERS | 5 +++++ > > 2 files changed, 52 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml > > new file mode 100644 > > index 000000000000..38bc7efebd68 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml > > @@ -0,0 +1,47 @@ > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > > +%YAML 1.2 > > +--- > > +$id: http://devicetree.org/schemas/crypto/ti,mcrc64.yaml# > > +$schema: http://devicetree.org/meta-schemas/core.yaml# > > + > > +title: Texas Instruments MCRC64 > > + > > +description: The MCRC64 engine calculates 64-bit cyclic redundancy checks > > A newline after "description" please. > > > + (CRC) according to the ISO 3309 standard. > > + > > +maintainers: > > + - Kamlesh Gurudasani <kamlesh@ti.com> > > + > > +properties: > > + compatible: > > + const: ti,am62-mcrc64 > > Is the am62 an SoC or a family of SoCs? I googled a wee bit for am62 & > there seems to be an am625 and an am623? Or is it an am62p5, in which case the compatible should contain ti,am62p5 I suppose. Sorry for my confusion here, its not really clear me too since I've been seeing many different-but-similar product names the last few days. Thanks, Conor. > > Otherwise, this looks good to me. > > > + > > + reg: > > + maxItems: 1 > > + > > + clocks: > > + maxItems: 1 > > + > > + power-domains: > > + maxItems: 1 > > + > > +required: > > + - compatible > > + - reg > > + - clocks > > + - power-domains > > + > > +additionalProperties: false > > + > > +examples: > > + - | > > + #include <dt-bindings/soc/ti,sci_pm_domain.h> > > + > > + crc@30300000 { > > + compatible = "ti,am62-mcrc64"; > > + reg = <0x30300000 0x1000>; > > + clocks = <&k3_clks 116 0>; > > + power-domains = <&k3_pds 116 TI_SCI_PD_EXCLUSIVE>; > > + }; > > + > > +... > > diff --git a/MAINTAINERS b/MAINTAINERS > > index 02a3192195af..66b51f43d196 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -21481,6 +21481,11 @@ S: Maintained > > F: Documentation/devicetree/bindings/iio/adc/ti,lmp92064.yaml > > F: drivers/iio/adc/ti-lmp92064.c > > > > +TI MEMORY CYCLIC REDUNDANCY CHECK (MCRC64) DRIVER > > +M: Kamlesh Gurudasani <kamlesh@ti.com> > > +S: Maintained > > +F: Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml > > + > > TI PCM3060 ASoC CODEC DRIVER > > M: Kirill Marinushkin <kmarinushkin@birdec.com> > > L: alsa-devel@alsa-project.org (moderated for non-subscribers) > > > > -- > > 2.34.1 > >
On Fri, Aug 11, 2023 at 12:58:47AM +0530, Kamlesh Gurudasani wrote: > Add support for MCRC64 engine to calculate 64-bit CRC in Full-CPU mode > > MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) > according to the ISO 3309 standard. > > The ISO 3309 64-bit CRC model parameters are as follows: > Generator Polynomial: x^64 + x^4 + x^3 + x + 1 > Polynomial Value: 0x000000000000001B > Initial value: 0x0000000000000000 > Reflected Input: False > Reflected Output: False > Xor Final: 0x0000000000000000 > > Tested with > CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set > CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y > > and tcrypt, > sudo modprobe tcrypt mode=329 sec=1 > > User space application implemented using algif_hash, > https://gist.github.com/ti-kamlesh/73abfcc1a33318bb3b199d36b6209e59 > > Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> I do not see any in-kernel user of this CRC variant being introduced, which leaves algif_hash as the only use case. Can you elaborate on the benefit this brings to your application? Yes, it allows you to use your hardware CRC engine. But, that comes with all the overhead from the syscalls, algif_hash, and the driver. How does performance compare to a properly optimized software CRC implementation on your platform, i.e. an implementation using carryless multiplication instructions (e.g. ARMv8 CE) if available on your platform, otherwise an implementation using the slice-by-8 or slice-by-16 method? - Eric
Eric Biggers <ebiggers@kernel.org> writes: > On Fri, Aug 11, 2023 at 12:58:47AM +0530, Kamlesh Gurudasani wrote: >> Add support for MCRC64 engine to calculate 64-bit CRC in Full-CPU mode >> >> MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) >> according to the ISO 3309 standard. >> >> The ISO 3309 64-bit CRC model parameters are as follows: >> Generator Polynomial: x^64 + x^4 + x^3 + x + 1 >> Polynomial Value: 0x000000000000001B >> Initial value: 0x0000000000000000 >> Reflected Input: False >> Reflected Output: False >> Xor Final: 0x0000000000000000 >> >> Tested with >> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set >> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y >> >> and tcrypt, >> sudo modprobe tcrypt mode=329 sec=1 >> >> User space application implemented using algif_hash, >> https://gist.github.com/ti-kamlesh/73abfcc1a33318bb3b199d36b6209e59 >> >> Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> > > I do not see any in-kernel user of this CRC variant being introduced, which > leaves algif_hash as the only use case. > > Can you elaborate on the benefit this brings to your application? Yes, it > allows you to use your hardware CRC engine. But, that comes with all the > overhead from the syscalls, algif_hash, and the driver. How does performance > compare to a properly optimized software CRC implementation on your platform, > i.e. an implementation using carryless multiplication instructions (e.g. ARMv8 > CE) if available on your platform, otherwise an implementation using the > slice-by-8 or slice-by-16 method? > > - Eric Hi Eric, We are more interested in offload than performance, with splice system call and DMA mode in driver(will be implemented after this series gets merged), good amount of cpu cycles will be saved. There is one more mode(auto mode) in mcrc64 which helps to verify crc64 values against pre calculated crc64, saving the efforts of comparing in userspace. Current generic implementation of crc64-iso(part of this series) gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812 Mb/s when tested with tcrypt. Regard, Kamlesh
On Fri, Aug 18, 2023 at 02:36:34PM +0530, Kamlesh Gurudasani wrote: > Hi Eric, > > We are more interested in offload than performance, with splice system > call and DMA mode in driver(will be implemented after this series gets > merged), good amount of cpu cycles will be saved. So it's for power usage, then? Or freeing up CPU for other tasks? > There is one more mode(auto mode) in mcrc64 which helps to verify crc64 > values against pre calculated crc64, saving the efforts of comparing in > userspace. Is there any path forward to actually support this? > > Current generic implementation of crc64-iso(part of this series) > gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812 > Mb/s when tested with tcrypt. This doesn't answer my question, which to reiterate was: How does performance compare to a properly optimized software CRC implementation on your platform, i.e. an implementation using carryless multiplication instructions (e.g. ARMv8 CE) if available on your platform, otherwise an implementation using the slice-by-8 or slice-by-16 method? The implementation you tested was slice-by-1. Compared to that, it's common for slice-by-8 to speed up CRCs by about 4 times and for folding with carryless multiplication to speed up CRCs by 10-30 times, sometimes limited only by memory bandwidth. I don't know what specific results you would get on your specific CPU and for this specific CRC, and you could certainly see something different if you e.g. have some low-end embedded CPU. But those are the typical results I've seen for other CRCs on different CPUs. So, a software implementation may be more attractive than you realize. It could very well be the case that a PMULL based CRC implementation actually ends up with less CPU load than your "hardware offload", when taking into syscall, algif_hash, and driver overhead... - Eric
Eric Biggers <ebiggers@kernel.org> writes: > On Fri, Aug 18, 2023 at 02:36:34PM +0530, Kamlesh Gurudasani wrote: >> Hi Eric, >> >> We are more interested in offload than performance, with splice system >> call and DMA mode in driver(will be implemented after this series gets >> merged), good amount of cpu cycles will be saved. > > So it's for power usage, then? Or freeing up CPU for other tasks? It is for freeing CPU for other tasks > >> There is one more mode(auto mode) in mcrc64 which helps to verify crc64 >> values against pre calculated crc64, saving the efforts of comparing in >> userspace. > > Is there any path forward to actually support this? > >> >> Current generic implementation of crc64-iso(part of this series) >> gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812 >> Mb/s when tested with tcrypt. > > This doesn't answer my question, which to reiterate was: > > How does performance compare to a properly optimized software CRC > implementation on your platform, i.e. an implementation using carryless > multiplication instructions (e.g. ARMv8 CE) if available on your platform, > otherwise an implementation using the slice-by-8 or slice-by-16 method? > > The implementation you tested was slice-by-1. Compared to that, it's common for > slice-by-8 to speed up CRCs by about 4 times and for folding with carryless > multiplication to speed up CRCs by 10-30 times, sometimes limited only by memory > bandwidth. I don't know what specific results you would get on your specific > CPU and for this specific CRC, and you could certainly see something different > if you e.g. have some low-end embedded CPU. But those are the typical results > I've seen for other CRCs on different CPUs. So, a software implementation may > be more attractive than you realize. It could very well be the case that a > PMULL based CRC implementation actually ends up with less CPU load than your > "hardware offload", when taking into syscall, algif_hash, and driver overhead... > > - Eric Hi Eric, thanks for your detailed and valuable inputs. As per your suggestion, we did some profiling. Use case is to calculate crc32/crc64 for file input from user space. Instead of directly implementing PMULL based CRC64, we made first comparison between Case 1. CRC32 (splice() + kernel space SW driver) https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21 Case 2. CRC32(mmap() + userspace armv8 crc32 instruction implementation) (tried read() as well to get contents of file, but that lost to mmap() so not mentioning number here) https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d Case 3. CRC64 (splice() + MCRC64 HW) https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0 Overall, overhead of userspace + af_alg + driver in (Case 1) and ( Case 3) is ~0.025s, which is constant for any file size. This is calculated using real time to calculate crc - driver time (time spend inside init() + update() +final()) = overhead ~0.025s +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ | | | | | | | File size | 120mb(ideal size for us) | 20mb | 15mb | 5mb | +===================+=============================+=======================+========================+========================+ | | | | | | | CRC32 (Case 1) | Driver time 0.155s | Driver time 0.0325s | Driver time 0.019s | Driver time 0.0062s | | | real time 0.18s | real time 0.06s | real time 0.04s | real time 0.03s | | | overhead 0.025s | overhead 0.025s | overhead 0.021s | overhead ~0.023s | +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ | | | | | | | CRC32 (Case 2) | Real time 0.30s | Real time 0.05s | Real time 0.04s | Real time 0.02s | +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ | | | | | | | CRC64 (Case 3) | Driver time 0.385s | Driver time 0.0665s | Driver time 0.0515s | Driver time 0.019s | | | real time 0.41s | real time 0.09s | real time 0.08s | real time 0.04s | | | overhead 0.025s | overhead 0.025s | overhead ~0.025s | overhead ~0.021s | +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ Here, if we consider similar numbers for crc64 PMULL implementation as crc32 (case 2) , we save good number of cpu cycles using mcrc64 in case of files bigger than 5-10mb as most of the time is being spent in HW offload. Regards, Kamlesh
Eric Biggers <ebiggers@kernel.org> writes: > On Fri, Aug 18, 2023 at 02:36:34PM +0530, Kamlesh Gurudasani wrote: >> Hi Eric, >> >> We are more interested in offload than performance, with splice system >> call and DMA mode in driver(will be implemented after this series gets >> merged), good amount of cpu cycles will be saved. > > So it's for power usage, then? Or freeing up CPU for other tasks? > It's for freeing up CPU for other tasks >> There is one more mode(auto mode) in mcrc64 which helps to verify crc64 >> values against pre calculated crc64, saving the efforts of comparing in >> userspace. > > Is there any path forward to actually support this? > >> >> Current generic implementation of crc64-iso(part of this series) >> gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812 >> Mb/s when tested with tcrypt. > > This doesn't answer my question, which to reiterate was: > > How does performance compare to a properly optimized software CRC > implementation on your platform, i.e. an implementation using carryless > multiplication instructions (e.g. ARMv8 CE) if available on your platform, > otherwise an implementation using the slice-by-8 or slice-by-16 method? > > The implementation you tested was slice-by-1. Compared to that, it's common for > slice-by-8 to speed up CRCs by about 4 times and for folding with carryless > multiplication to speed up CRCs by 10-30 times, sometimes limited only by memory > bandwidth. I don't know what specific results you would get on your specific > CPU and for this specific CRC, and you could certainly see something different > if you e.g. have some low-end embedded CPU. But those are the typical results > I've seen for other CRCs on different CPUs. So, a software implementation may > be more attractive than you realize. It could very well be the case that a > PMULL based CRC implementation actually ends up with less CPU load than your > "hardware offload", when taking into syscall, algif_hash, and driver overhead... > > - Eric Hi Eric, thanks for your detailed and valuable inputs. As per your suggestion, we did some profiling. Use case is to calculate crc32/crc64 for file input from user space. Instead of directly implementing PMULL based CRC64, we made first comparison between Case 1. CRC32 (splice() + kernel space SW driver) https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21 Case 2. CRC32(mmap() + userspace armv8 crc32 instruction implementation) (tried read() as well to get contents of file, but that lost to mmap() so not mentioning number here) https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d Case 3. CRC64 (splice() + MCRC64 HW) https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0 Overall, overhead of userspace + af_alg + driver in (Case 1) and ( Case 3) is ~0.025s, which is constant for any file size. This is calculated using real time to calculate crc - driver time (time spend inside init() + update() +final()) = overhead ~0.025s Here, if we consider similar numbers for crc64 PMULL implementation as crc32 (case 2) , we save good number of cpu cycles using mcrc64 in case of files bigger than 5-10mb as most of the time is being spent in HW offload. +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ | | | | | | | File size | 120mb(ideal size for us) | 20mb | 15mb | 5mb | +===================+=============================+=======================+========================+========================+ | | | | | | | CRC32 (Case 1) | Driver time 0.155s | Driver time 0.0325s | Driver time 0.019s | Driver time 0.0062s | | | real time 0.18s | real time 0.06s | real time 0.04s | real time 0.03s | | | overhead 0.025s | overhead 0.025s | overhead 0.021s | overhead ~0.023s | +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ | | | | | | | CRC32 (Case 2) | Real time 0.30s | Real time 0.05s | Real time 0.04s | Real time 0.02s | +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ | | | | | | | CRC64 (Case 3) | Driver time 0.385s | Driver time 0.0665s | Driver time 0.0515s | Driver time 0.019s | | | real time 0.41s | real time 0.09s | real time 0.08s | real time 0.04s | | | overhead 0.025s | overhead 0.025s | overhead ~0.025s | overhead ~0.021s | +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
Eric Biggers <ebiggers@kernel.org> writes: > On Fri, Aug 18, 2023 at 02:36:34PM +0530, Kamlesh Gurudasani wrote: >> Hi Eric, >> >> We are more interested in offload than performance, with splice system >> call and DMA mode in driver(will be implemented after this series gets >> merged), good amount of cpu cycles will be saved. > > So it's for power usage, then? Or freeing up CPU for other tasks? > It's for freeing CPU fpr other tasks >> There is one more mode(auto mode) in mcrc64 which helps to verify crc64 >> values against pre calculated crc64, saving the efforts of comparing in >> userspace. > > Is there any path forward to actually support this? > >> >> Current generic implementation of crc64-iso(part of this series) >> gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812 >> Mb/s when tested with tcrypt. > > This doesn't answer my question, which to reiterate was: > > How does performance compare to a properly optimized software CRC > implementation on your platform, i.e. an implementation using carryless > multiplication instructions (e.g. ARMv8 CE) if available on your platform, > otherwise an implementation using the slice-by-8 or slice-by-16 method? > > The implementation you tested was slice-by-1. Compared to that, it's common for > slice-by-8 to speed up CRCs by about 4 times and for folding with carryless > multiplication to speed up CRCs by 10-30 times, sometimes limited only by memory > bandwidth. I don't know what specific results you would get on your specific > CPU and for this specific CRC, and you could certainly see something different > if you e.g. have some low-end embedded CPU. But those are the typical results > I've seen for other CRCs on different CPUs. So, a software implementation may > be more attractive than you realize. It could very well be the case that a > PMULL based CRC implementation actually ends up with less CPU load than your > "hardware offload", when taking into syscall, algif_hash, and driver overhead... > > - Eric Hi Eric, thanks for your detailed and valuable inputs. As per your suggestion, we did some profiling. Use case is to calculate crc32/crc64 for file input from user space. Instead of directly implementing PMULL based CRC64, we made first comparison between Case 1. CRC32 (splice() + kernel space SW driver) https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21 Case 2. CRC32(mmap() + userspace armv8 crc32 instruction implementation) (tried read() as well to get contents of file, but that lost to mmap() so not mentioning number here) https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d Case 3. CRC64 (splice() + MCRC64 HW) https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0 Overall, overhead of userspace + af_alg + driver in (Case 1) and ( Case 3) is ~0.025s, which is constant for any file size. This is calculated using real time to calculate crc - driver time (time spend inside init() + update() +final()) = overhead ~0.025s Here, if we consider similar numbers for crc64 PMULL implementation as crc32 (case 2) , we save good number of cpu cycles using mcrc64 in case of files bigger than 5-10mb as most of the time is being spent in HW offload. âââââââââââââââââââââ¦ââââââââââââââââââââââââââââââ¦ââââââââââââââââââââââââ¦âââââââââââââââââââââââââ¦âââââââââââââââââââââââââ â â â â â â â File size â 120mb(ideal size for us) â 20mb â 15mb â 5mb â â ââââââââââââââââââââ¬ââââââââââââââââââââââââââââââ¬ââââââââââââââââââââââââ¬âââââââââââââââââââââââââ¬ââââââââââââââââââââââââ⣠â â â â â â â CRC32 (Case 1) â Driver time 0.155s â Driver time 0.0325s â Driver time 0.019s â Driver time 0.0062s â â â real time 0.18s â real time 0.06s â real time 0.04s â real time 0.03s â â â overhead 0.025s â overhead 0.025s â overhead 0.021s â overhead ~0.023s â â ââââââââââââââââââââ¬ââââââââââââââââââââââââââââââ¬ââââââââââââââââââââââââ¬âââââââââââââââââââââââââ¬ââââââââââââââââââââââââ⣠â â â â â â â CRC32 (Case 2) â Real time 0.30s â Real time 0.05s â Real time 0.04s â Real time 0.02s â â ââââââââââââââââââââ¬ââââââââââââââââââââââââââââââ¬ââââââââââââââââââââââââ¬âââââââââââââââââââââââââ¬ââââââââââââââââââââââââ⣠â â â â â â â CRC64 (Case 3) â Driver time 0.385s â Driver time 0.0665s â Driver time 0.0515s â Driver time 0.019s â â â real time 0.41s â real time 0.09s â real time 0.08s â real time 0.04s â â â overhead 0.025s â overhead 0.025s â overhead ~0.025s â overhead ~0.021s â âââââââââââââââââââââ©ââââââââââââââââââââââââââââââ©ââââââââââââââââââââââââ©âââââââââââââââââââââââââ©âââââââââââââââââââââââââ
Eric Biggers <ebiggers@kernel.org> writes: Somehow couple of my earlier mails got blocked mailing list because of table formatting, I guess. Resending. Accept my apologies for spamming. > On Fri, Aug 18, 2023 at 02:36:34PM +0530, Kamlesh Gurudasani wrote: >> Hi Eric, >> >> We are more interested in offload than performance, with splice system >> call and DMA mode in driver(will be implemented after this series gets >> merged), good amount of cpu cycles will be saved. > > So it's for power usage, then? Or freeing up CPU for other tasks? It's for freeing the CPU for other tasks > >> There is one more mode(auto mode) in mcrc64 which helps to verify crc64 >> values against pre calculated crc64, saving the efforts of comparing in >> userspace. > > Is there any path forward to actually support this? > >> >> Current generic implementation of crc64-iso(part of this series) >> gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812 >> Mb/s when tested with tcrypt. > > This doesn't answer my question, which to reiterate was: > > How does performance compare to a properly optimized software CRC > implementation on your platform, i.e. an implementation using carryless > multiplication instructions (e.g. ARMv8 CE) if available on your platform, > otherwise an implementation using the slice-by-8 or slice-by-16 method? > > The implementation you tested was slice-by-1. Compared to that, it's common for > slice-by-8 to speed up CRCs by about 4 times and for folding with carryless > multiplication to speed up CRCs by 10-30 times, sometimes limited only by memory > bandwidth. I don't know what specific results you would get on your specific > CPU and for this specific CRC, and you could certainly see something different > if you e.g. have some low-end embedded CPU. But those are the typical results > I've seen for other CRCs on different CPUs. So, a software implementation may > be more attractive than you realize. It could very well be the case that a > PMULL based CRC implementation actually ends up with less CPU load than your > "hardware offload", when taking into syscall, algif_hash, and driver overhead... > > - Eric Hi Eric, thanks for your detailed and valuable inputs. As per your suggestion, we did some profiling. Use case is to calculate crc32/crc64 for file input from user space. Instead of directly implementing PMULL based CRC64, we made first comparison between Case 1. CRC32 (splice() + kernel space SW driver) https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21 Case 2. CRC32(mmap() + userspace armv8 crc32 instruction implementation) (tried read() as well to get contents of file, but that lost to mmap() so not mentioning number here) https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d Case 3. CRC64 (splice() + MCRC64 HW) https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0 Overall, overhead of userspace + af_alg + driver in (Case 1) and ( Case 3) is ~0.025s, which is constant for any file size. This is calculated using real time to calculate crc - driver time (time spend inside init() + update() +final()) = overhead ~0.025s Here, if we consider similar numbers for crc64 PMULL implementation as crc32 (case 2) , we save good number of cpu cycles using mcrc64 in case of files bigger than 5-10mb as most of the time is being spent in HW offload. Comparison table: https://gist.github.com/ti-kamlesh/8117b6f7120960a71541ab67c671602a
Kamlesh Gurudasani <kamlesh@ti.com> writes: ... > Hi Eric, thanks for your detailed and valuable inputs. > > As per your suggestion, we did some profiling. > > Use case is to calculate crc32/crc64 for file input from user space. > > Instead of directly implementing PMULL based CRC64, we made first comparison between > Case 1. > CRC32 (splice() + kernel space SW driver) > https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21 > > Case 2. > CRC32(mmap() + userspace armv8 crc32 instruction implementation) > (tried read() as well to get contents of file, but that lost to mmap() so not mentioning number here) > https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d > > Case 3. > CRC64 (splice() + MCRC64 HW) > https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0 > > > Overall, overhead of userspace + af_alg + driver in (Case 1) and > ( Case 3) is ~0.025s, which is constant for any file size. > This is calculated using real time to calculate crc - > driver time (time spend inside init() + update() +final()) = overhead ~0.025s > > > > +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ > | | | | | | > | File size | 120mb(ideal size for us) | 20mb | 15mb | 5mb | > +===================+=============================+=======================+========================+========================+ > | | | | | | > | CRC32 (Case 1) | Driver time 0.155s | Driver time 0.0325s | Driver time 0.019s | Driver time 0.0062s | > | | real time 0.18s | real time 0.06s | real time 0.04s | real time 0.03s | > | | overhead 0.025s | overhead 0.025s | overhead 0.021s | overhead ~0.023s | > +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ > | | | | | | > | CRC32 (Case 2) | Real time 0.30s | Real time 0.05s | Real time 0.04s | Real time 0.02s | > +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ > | | | | | | > | CRC64 (Case 3) | Driver time 0.385s | Driver time 0.0665s | Driver time 0.0515s | Driver time 0.019s | > | | real time 0.41s | real time 0.09s | real time 0.08s | real time 0.04s | > | | overhead 0.025s | overhead 0.025s | overhead ~0.025s | overhead ~0.021s | > +-------------------+-----------------------------+-----------------------+------------------------+------------------------+ > > Here, if we consider similar numbers for crc64 PMULL implementation as > crc32 (case 2) , we save good number of cpu cycles using mcrc64 > in case of files bigger than 5-10mb as most of the time is being spent in HW offload. > > Regards, > Kamlesh Hi Eric, Please let me know if above numbers make sense to you and I should send next revision. Regards, Kamlesh
Conor Dooley <conor@kernel.org> writes: > On Fri, Aug 11, 2023 at 04:34:33PM +0100, Conor Dooley wrote: >> On Fri, Aug 11, 2023 at 12:58:50AM +0530, Kamlesh Gurudasani wrote: >> > Add binding for Texas Instruments MCRC64 >> > >> > MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) >> > according to the ISO 3309 standard. >> > >> > The ISO 3309 64-bit CRC model parameters are as follows: >> > Generator Polynomial: x^64 + x^4 + x^3 + x + 1 >> > Polynomial Value: 0x000000000000001B >> > Initial value: 0x0000000000000000 >> > Reflected Input: False >> > Reflected Output: False >> > Xor Final: 0x0000000000000000 >> > >> > Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> >> > --- >> > Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >> > MAINTAINERS | 5 +++++ >> > 2 files changed, 52 insertions(+) >> > >> > diff --git a/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >> > new file mode 100644 >> > index 000000000000..38bc7efebd68 >> > --- /dev/null >> > +++ b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >> > @@ -0,0 +1,47 @@ >> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >> > +%YAML 1.2 >> > +--- >> > +$id: http://devicetree.org/schemas/crypto/ti,mcrc64.yaml# >> > +$schema: http://devicetree.org/meta-schemas/core.yaml# >> > + >> > +title: Texas Instruments MCRC64 >> > + >> > +description: The MCRC64 engine calculates 64-bit cyclic redundancy checks >> >> A newline after "description" please. >> >> > + (CRC) according to the ISO 3309 standard. >> > + >> > +maintainers: >> > + - Kamlesh Gurudasani <kamlesh@ti.com> >> > + >> > +properties: >> > + compatible: >> > + const: ti,am62-mcrc64 >> >> Is the am62 an SoC or a family of SoCs? I googled a wee bit for am62 & >> there seems to be an am625 and an am623? > > Or is it an am62p5, in which case the compatible should contain > ti,am62p5 I suppose. Sorry for my confusion here, its not really clear > me too since I've been seeing many different-but-similar product names > the last few days. > > Thanks, > Conor. > Hi Conor, Thanks for the review. am62 is family of SOCs. All devices under this family, like am623/5/p5 and etc, have MCRC64. I have kept the naming convention similar to SA2UL/SA3UL[0]. [0] https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml#L18 Kamlesh >> >> Otherwise, this looks good to me. >> >> > + >> > + reg: >> > + maxItems: 1 >> > + >> > + clocks: >> > + maxItems: 1 >> > + >> > + power-domains: >> > + maxItems: 1 >> > + >> > +required: >> > + - compatible >> > + - reg >> > + - clocks >> > + - power-domains >> > + >> > +additionalProperties: false >> > + >> > +examples: >> > + - | >> > + #include <dt-bindings/soc/ti,sci_pm_domain.h> >> > + >> > + crc@30300000 { >> > + compatible = "ti,am62-mcrc64"; >> > + reg = <0x30300000 0x1000>; >> > + clocks = <&k3_clks 116 0>; >> > + power-domains = <&k3_pds 116 TI_SCI_PD_EXCLUSIVE>; >> > + }; >> > + >> > +... >> > diff --git a/MAINTAINERS b/MAINTAINERS >> > index 02a3192195af..66b51f43d196 100644 >> > --- a/MAINTAINERS >> > +++ b/MAINTAINERS >> > @@ -21481,6 +21481,11 @@ S: Maintained >> > F: Documentation/devicetree/bindings/iio/adc/ti,lmp92064.yaml >> > F: drivers/iio/adc/ti-lmp92064.c >> > >> > +TI MEMORY CYCLIC REDUNDANCY CHECK (MCRC64) DRIVER >> > +M: Kamlesh Gurudasani <kamlesh@ti.com> >> > +S: Maintained >> > +F: Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >> > + >> > TI PCM3060 ASoC CODEC DRIVER >> > M: Kirill Marinushkin <kmarinushkin@birdec.com> >> > L: alsa-devel@alsa-project.org (moderated for non-subscribers) >> > >> > -- >> > 2.34.1 >> >
On 27/05/2024 10:25, Kamlesh Gurudasani wrote: > Conor Dooley <conor@kernel.org> writes: > >> On Fri, Aug 11, 2023 at 04:34:33PM +0100, Conor Dooley wrote: >>> On Fri, Aug 11, 2023 at 12:58:50AM +0530, Kamlesh Gurudasani wrote: >>>> Add binding for Texas Instruments MCRC64 >>>> >>>> MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) >>>> according to the ISO 3309 standard. >>>> >>>> The ISO 3309 64-bit CRC model parameters are as follows: >>>> Generator Polynomial: x^64 + x^4 + x^3 + x + 1 >>>> Polynomial Value: 0x000000000000001B >>>> Initial value: 0x0000000000000000 >>>> Reflected Input: False >>>> Reflected Output: False >>>> Xor Final: 0x0000000000000000 >>>> >>>> Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> >>>> --- >>>> Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >>>> MAINTAINERS | 5 +++++ >>>> 2 files changed, 52 insertions(+) >>>> >>>> diff --git a/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >>>> new file mode 100644 >>>> index 000000000000..38bc7efebd68 >>>> --- /dev/null >>>> +++ b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >>>> @@ -0,0 +1,47 @@ >>>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >>>> +%YAML 1.2 >>>> +--- >>>> +$id: http://devicetree.org/schemas/crypto/ti,mcrc64.yaml# >>>> +$schema: http://devicetree.org/meta-schemas/core.yaml# >>>> + >>>> +title: Texas Instruments MCRC64 >>>> + >>>> +description: The MCRC64 engine calculates 64-bit cyclic redundancy checks >>> >>> A newline after "description" please. >>> >>>> + (CRC) according to the ISO 3309 standard. >>>> + >>>> +maintainers: >>>> + - Kamlesh Gurudasani <kamlesh@ti.com> >>>> + >>>> +properties: >>>> + compatible: >>>> + const: ti,am62-mcrc64 >>> >>> Is the am62 an SoC or a family of SoCs? I googled a wee bit for am62 & >>> there seems to be an am625 and an am623? >> >> Or is it an am62p5, in which case the compatible should contain >> ti,am62p5 I suppose. Sorry for my confusion here, its not really clear >> me too since I've been seeing many different-but-similar product names >> the last few days. >> >> Thanks, >> Conor. >> > Hi Conor, > > Thanks for the review. > > am62 is family of SOCs. > > All devices under this family, like am623/5/p5 and etc, have MCRC64. > > I have kept the naming convention similar to SA2UL/SA3UL[0]. > > [0] https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml#L18 Usual answer is: no families. There are exceptions, though, so is this case on the exception list? https://elixir.bootlin.com/linux/v6.10-rc1/source/Documentation/devicetree/bindings/writing-bindings.rst#L42 P.S. Your email client added some weird subject prefix - please fix it. Best regards, Krzysztof
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> writes: > This message was sent from outside of Texas Instruments. > Do not click links or open attachments unless you recognize the source of this email and know the content is safe. If you wish > to report this message to IT Security, please forward the message as an attachment to phishing@list.ti.com > > On 27/05/2024 10:25, Kamlesh Gurudasani wrote: >> Conor Dooley <conor@kernel.org> writes: >> >>> On Fri, Aug 11, 2023 at 04:34:33PM +0100, Conor Dooley wrote: >>>> On Fri, Aug 11, 2023 at 12:58:50AM +0530, Kamlesh Gurudasani wrote: >>>>> Add binding for Texas Instruments MCRC64 >>>>> >>>>> MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) >>>>> according to the ISO 3309 standard. >>>>> >>>>> The ISO 3309 64-bit CRC model parameters are as follows: >>>>> Generator Polynomial: x^64 + x^4 + x^3 + x + 1 >>>>> Polynomial Value: 0x000000000000001B >>>>> Initial value: 0x0000000000000000 >>>>> Reflected Input: False >>>>> Reflected Output: False >>>>> Xor Final: 0x0000000000000000 >>>>> >>>>> Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> >>>>> --- >>>>> Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >>>>> MAINTAINERS | 5 +++++ >>>>> 2 files changed, 52 insertions(+) >>>>> >>>>> diff --git a/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >>>>> new file mode 100644 >>>>> index 000000000000..38bc7efebd68 >>>>> --- /dev/null >>>>> +++ b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >>>>> @@ -0,0 +1,47 @@ >>>>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >>>>> +%YAML 1.2 >>>>> +--- >>>>> +$id: https://urldefense.com/v3/__http://devicetree.org/schemas/crypto/ti,mcrc64.yaml*__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6La54vZo$ >>>>> +$schema: https://urldefense.com/v3/__http://devicetree.org/meta-schemas/core.yaml*__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6P2LNJCQ$ >>>>> + >>>>> +title: Texas Instruments MCRC64 >>>>> + >>>>> +description: The MCRC64 engine calculates 64-bit cyclic redundancy checks >>>> >>>> A newline after "description" please. >>>> >>>>> + (CRC) according to the ISO 3309 standard. >>>>> + >>>>> +maintainers: >>>>> + - Kamlesh Gurudasani <kamlesh@ti.com> >>>>> + >>>>> +properties: >>>>> + compatible: >>>>> + const: ti,am62-mcrc64 >>>> >>>> Is the am62 an SoC or a family of SoCs? I googled a wee bit for am62 & >>>> there seems to be an am625 and an am623? >>> >>> Or is it an am62p5, in which case the compatible should contain >>> ti,am62p5 I suppose. Sorry for my confusion here, its not really clear >>> me too since I've been seeing many different-but-similar product names >>> the last few days. >>> >>> Thanks, >>> Conor. >>> >> Hi Conor, >> >> Thanks for the review. >> >> am62 is family of SOCs. >> >> All devices under this family, like am623/5/p5 and etc, have MCRC64. >> >> I have kept the naming convention similar to SA2UL/SA3UL[0]. >> >> [0] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml*L18__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6afCEd8s$ > > Usual answer is: no families. There are exceptions, though, so is this > case on the exception list? Okay, will use ti,am625-mcrc64 as compatible and as fallback compatible for other devices. I hope that is right. Thanks. Kamlesh > > https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.10-rc1/source/Documentation/devicetree/bindings/writing-bindings.rst*L42__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6WaRq1VM$ > > P.S. Your email client added some weird subject prefix - please fix it. Thanks for bringing this to my notice, Will fix it. > > > > Best regards, > Krzysztof
Hi Conor/Krzysztof On 27/05/24 15:41, Kamlesh Gurudasani wrote: > Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> writes: > >> This message was sent from outside of Texas Instruments. >> Do not click links or open attachments unless you recognize the source of this email and know the content is safe. If you wish >> to report this message to IT Security, please forward the message as an attachment to phishing@list.ti.com >> >> On 27/05/2024 10:25, Kamlesh Gurudasani wrote: >>> Conor Dooley <conor@kernel.org> writes: >>> >>>> On Fri, Aug 11, 2023 at 04:34:33PM +0100, Conor Dooley wrote: >>>>> On Fri, Aug 11, 2023 at 12:58:50AM +0530, Kamlesh Gurudasani wrote: >>>>>> Add binding for Texas Instruments MCRC64 >>>>>> >>>>>> MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) >>>>>> according to the ISO 3309 standard. >>>>>> >>>>>> The ISO 3309 64-bit CRC model parameters are as follows: >>>>>> Generator Polynomial: x^64 + x^4 + x^3 + x + 1 >>>>>> Polynomial Value: 0x000000000000001B >>>>>> Initial value: 0x0000000000000000 >>>>>> Reflected Input: False >>>>>> Reflected Output: False >>>>>> Xor Final: 0x0000000000000000 >>>>>> >>>>>> Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> >>>>>> --- >>>>>> Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> MAINTAINERS | 5 +++++ >>>>>> 2 files changed, 52 insertions(+) >>>>>> >>>>>> diff --git a/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >>>>>> new file mode 100644 >>>>>> index 000000000000..38bc7efebd68 >>>>>> --- /dev/null >>>>>> +++ b/Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml >>>>>> @@ -0,0 +1,47 @@ >>>>>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >>>>>> +%YAML 1.2 >>>>>> +--- >>>>>> +$id: https://urldefense.com/v3/__http://devicetree.org/schemas/crypto/ti,mcrc64.yaml*__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6La54vZo$ >>>>>> +$schema: https://urldefense.com/v3/__http://devicetree.org/meta-schemas/core.yaml*__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6P2LNJCQ$ >>>>>> + >>>>>> +title: Texas Instruments MCRC64 >>>>>> + >>>>>> +description: The MCRC64 engine calculates 64-bit cyclic redundancy checks >>>>> >>>>> A newline after "description" please. >>>>> >>>>>> + (CRC) according to the ISO 3309 standard. >>>>>> + >>>>>> +maintainers: >>>>>> + - Kamlesh Gurudasani <kamlesh@ti.com> >>>>>> + >>>>>> +properties: >>>>>> + compatible: >>>>>> + const: ti,am62-mcrc64 >>>>> >>>>> Is the am62 an SoC or a family of SoCs? I googled a wee bit for am62 & >>>>> there seems to be an am625 and an am623? >>>> Let me put this confusion to rest and clarify the ratioanle of namings so far. AM62 is a the name of the SoC variant. AM625/AM623/AM621/AM620 are the Orderable Part Number (OPNs - full OPN is several digits long), the only difference b/w them is number of CPU/GPU/PRU cores. At SoC level they are all same with AM625 being superset. Similarly AM62A is another SoC variant, with AM62A7 and AM62A3 are OPNs. AM62P is yet another SoC variant with AM62P5 as OPN Linux DT is written to support superset part numbers (AM625, AM62A7, AM62P5) as TI EVMs always have superset parts. Board dts files are named accordingly (eg.: k3-am625-sk.dts) Bootloader does the appropriate fixups to disable components for subset devices based on eFUSE indications when needed. >>>> Or is it an am62p5, in which case the compatible should contain >>>> ti,am62p5 I suppose. Sorry for my confusion here, its not really clear >>>> me too since I've been seeing many different-but-similar product names >>>> the last few days. >>>> MCRC64 is on all K3 platforms. We have been using ti,am62-xxxx for all modules that were verified (or first supported) on any of the AM625/3/1/0 SoCs. Last digit really is represents CPU/GPU numbers, thus not relevant for peripherals and there should be no change in HW If peripheral is specific to AM62A (eg.: DSP) or AM62P, then ti,am62a-xxxx and ti,am62p-xxxx is being used correspondingly. >>>> Thanks, >>>> Conor. >>>> >>> Hi Conor, >>> >>> Thanks for the review. >>> >>> am62 is family of SOCs. >>> >>> All devices under this family, like am623/5/p5 and etc, have MCRC64. >>> >>> I have kept the naming convention similar to SA2UL/SA3UL[0]. >>> >>> [0] https://urldefense.com/v3/__https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/crypto/ti,sa2ul.yaml*L18__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6afCEd8s$ >> >> Usual answer is: no families. There are exceptions, though, so is this >> case on the exception list? > Okay, will use ti,am625-mcrc64 as compatible and as fallback compatible for > other devices. I hope that is right. As mentioned above ti,am62-mcrc64 would be better option here and would be consistent with rest of the peripherals being supported on the SoC. But if we really want be accurate to exact part number on which Kamlesh has tested then, perhaps ti,am625-mcrc64 is fine with me. > > Thanks. > > Kamlesh >> >> https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.10-rc1/source/Documentation/devicetree/bindings/writing-bindings.rst*L42__;Iw!!G3vK!Qw75749h2ysFlROkyfLIUT9MGWlHfBEvPAbLVjScJXCPJ7vbwgxH-8hNWlJGBXGwz9Ny47eQi2mPS5R6WaRq1VM$ >> >> P.S. Your email client added some weird subject prefix - please fix it. > Thanks for bringing this to my notice, Will fix it. >> >> >> >> Best regards, >> Krzysztof
Add support for MCRC64 engine to calculate 64-bit CRC in Full-CPU mode MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC) according to the ISO 3309 standard. The ISO 3309 64-bit CRC model parameters are as follows: Generator Polynomial: x^64 + x^4 + x^3 + x + 1 Polynomial Value: 0x000000000000001B Initial value: 0x0000000000000000 Reflected Input: False Reflected Output: False Xor Final: 0x0000000000000000 Tested with CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y and tcrypt, sudo modprobe tcrypt mode=329 sec=1 User space application implemented using algif_hash, https://gist.github.com/ti-kamlesh/73abfcc1a33318bb3b199d36b6209e59 Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com> --- Changes in v2: - Add generic implementation of crc64-iso - Fixes according to review comments - Link to v1: https://lore.kernel.org/r/20230719-mcrc-upstream-v1-0-dc8798a24c47@ti.com --- Kamlesh Gurudasani (6): lib: add ISO 3309 model crc64 crypto: crc64 - add crc64-iso framework dt-bindings: crypto: Add Texas Instruments MCRC64 crypto: ti - add driver for MCRC64 engine arm64: dts: ti: k3-am62: Add dt node, cbass_main ranges for MCRC64 arm64: defconfig: enable TI MCRC64 module Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 ++++++++ MAINTAINERS | 7 ++ arch/arm64/boot/dts/ti/k3-am62-main.dtsi | 7 ++ arch/arm64/boot/dts/ti/k3-am62.dtsi | 1 + arch/arm64/configs/defconfig | 2 + crypto/Kconfig | 11 ++ crypto/Makefile | 1 + crypto/crc64_iso_generic.c | 119 ++++++++++++++++++ crypto/tcrypt.c | 5 + crypto/testmgr.c | 7 ++ crypto/testmgr.h | 404 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ drivers/crypto/Kconfig | 1 + drivers/crypto/Makefile | 1 + drivers/crypto/ti/Kconfig | 10 ++ drivers/crypto/ti/Makefile | 2 + drivers/crypto/ti/mcrc64.c | 442 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/crc64.h | 5 + lib/crc64-iso.c | 126 +++++++++++++++++++ lib/crc64.c | 27 +++++ lib/gen_crc64table.c | 6 + 20 files changed, 1231 insertions(+) --- base-commit: 21ef7b1e17d039053edaeaf41142423810572741 change-id: 20230719-mcrc-upstream-7ae9a75cab37 Best regards,