[v6,RESEND,0/5] crypto: LEA block cipher implementation

Message ID 20240112022859.2384-1-letrhee@nsr.re.kr

Dongsoo Lee Jan. 12, 2024, 2:28 a.m. UTC
This submission contains a generic C implementation of the LEA cipher and test vectors for it. It also includes modifications to use the LEA in fscrypt.

The LEA algorithm is a lightweight block cipher that processes 128-bit data blocks and supports three key lengths, each with a different number of rounds:

- LEA-128: 128-bit key, 24 rounds,
- LEA-192: 192-bit key, 28 rounds, and
- LEA-256: 256-bit key, 32 rounds.
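
As a quick illustration of the parameter sets listed above, here is a minimal C sketch that maps a key length to its round count. The helper names are hypothetical and are not taken from the patch. The figure of six 32-bit words per round key comes from the public LEA specification (a 192-bit round key per round), not from this submission.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): number of LEA rounds for a
 * key length given in bytes. Returns 0 for an unsupported key length.
 */
static unsigned int lea_rounds_for_key(size_t key_bytes)
{
	switch (key_bytes) {
	case 16:	/* LEA-128 */
		return 24;
	case 24:	/* LEA-192 */
		return 28;
	case 32:	/* LEA-256 */
		return 32;
	default:
		return 0;
	}
}

/*
 * Hypothetical helper: number of 32-bit words needed to store all round
 * keys, assuming six 32-bit words (192 bits) per round.
 */
static size_t lea_round_key_words(size_t key_bytes)
{
	unsigned int rounds = lea_rounds_for_key(key_bytes);

	assert(rounds != 0);	/* caller must pass a supported key length */
	return (size_t)rounds * 6;
}
```

Under these assumptions, LEA-256 would need 32 * 6 = 192 round-key words.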

The round function of LEA consists of 32-bit ARX (modular Addition, bitwise Rotation, and bitwise XOR) operations. See [2, 5, 7] for details.
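
To make the ARX structure concrete, the following is a minimal sketch of one encryption round and its inverse, following the round description in the public specification as I understand it. It is not the code from the patch: the function names are hypothetical, and the round keys used in the self-check are arbitrary placeholders rather than output of the real key schedule.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rol32(uint32_t v, unsigned int n)
{
	assert(n > 0 && n < 32);	/* avoid undefined shift by 32 */
	return (v << n) | (v >> (32 - n));
}

static uint32_t ror32(uint32_t v, unsigned int n)
{
	assert(n > 0 && n < 32);
	return (v >> n) | (v << (32 - n));
}

/* One encryption round: XOR with round-key words, add mod 2^32, rotate. */
static void lea_enc_round(uint32_t x[4], const uint32_t rk[6])
{
	uint32_t x0 = x[0];

	x[0] = rol32((x[0] ^ rk[0]) + (x[1] ^ rk[1]), 9);
	x[1] = ror32((x[1] ^ rk[2]) + (x[2] ^ rk[3]), 5);
	x[2] = ror32((x[2] ^ rk[4]) + (x[3] ^ rk[5]), 3);
	x[3] = x0;
}

/* Inverse of lea_enc_round() for the same round key. */
static void lea_dec_round(uint32_t x[4], const uint32_t rk[6])
{
	uint32_t p0 = x[3];
	uint32_t p1 = (ror32(x[0], 9) - (p0 ^ rk[0])) ^ rk[1];
	uint32_t p2 = (rol32(x[1], 5) - (p1 ^ rk[2])) ^ rk[3];
	uint32_t p3 = (rol32(x[2], 3) - (p2 ^ rk[4])) ^ rk[5];

	x[0] = p0;
	x[1] = p1;
	x[2] = p2;
	x[3] = p3;
}

/*
 * Runs 24 rounds forward and 24 rounds backward with one placeholder
 * round key (NOT a real key schedule) and returns 1 if the input is
 * recovered, 0 otherwise.
 */
static int lea_round_selfcheck(void)
{
	static const uint32_t rk[6] = {
		0x01234567, 0x89abcdef, 0xdeadbeef,
		0x0badf00d, 0xa5a5a5a5, 0x5a5a5a5a,
	};
	const uint32_t pt[4] = { 0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c };
	uint32_t x[4];
	int i;

	for (i = 0; i < 4; i++)
		x[i] = pt[i];
	for (i = 0; i < 24; i++)
		lea_enc_round(x, rk);
	for (i = 0; i < 24; i++)
		lea_dec_round(x, rk);
	for (i = 0; i < 4; i++)
		if (x[i] != pt[i])
			return 0;
	return 1;
}
```

Because every operation is a 32-bit addition, rotation, or XOR, the same round can be applied to several blocks at once by placing one word of each block in each lane of a SIMD register.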

LEA is a Korean national standard block cipher, described in "KS X 3246" [1], and is also included in the international standard "ISO/IEC 29192-2:2019" [2].

It is one of the approved block ciphers for the current Korean Cryptographic Module Validation Program (KCMVP).

At the time of submission, no successful attack on full-round LEA is known. As is typical for iterated block ciphers, reduced-round variants have been attacked. The best published attacks on LEA in the standard attack model (CPA/CCA with unknown key) are boomerang attacks and differential-linear attacks. The ratio of the security margin to the total number of rounds is greater than 29% against various existing cryptanalytic techniques for block ciphers [3].

We expect that the first application of the patch would be disk encryption on the Gooroom platform ('Gooroom' is a Korean word meaning 'cloud') [4]. The Gooroom platform is a government-driven Debian-based Linux distribution in South Korea. In Korea, there are many crypto companies that want to bundle Linux into their products and sell them. They create their own Gooroom platforms by modifying the original Gooroom platform for their services. (Of course, the Gooroom platform is not mandatory, and companies wishing to use Linux are free to choose an appropriate distribution.) Many of these companies also want to use LEA, because LEA is one of the block ciphers approved by the KCMVP, a validation program for commercial cryptographic software to be delivered to the Korean government.

Currently, the Gooroom platform uses AES-XTS for disk encryption. The main reason for submitting this patch is to make disk encryption with LEA (e.g. LEA-XTS) available there. If this submission is accepted, LEA can be used without any additional modifications in dm-crypt, a module that provides disk encryption functionality within the kernel.

This patch also includes a modification to enable LEA for use in fscrypt, another data-at-rest encryption method available within the kernel, and a modification to blk-crypto-fallback to enable the "inlinecrypt" mount option in fscrypt.

The Linux Crypto API already has another Korean block cipher, ARIA, which is also one of the block ciphers of the KCMVP. However, LEA is more widely used than ARIA in industry nowadays, because LEA is one of the lightweight cryptography standards of ISO/IEC [2] and performs well on low-end devices that support 32-bit operations. So we think they are complementary to each other.

In general, it's obvious that the hardware-accelerated AES is the best performer. However, there exist not only environments where the hardware-accelerated AES is not supported, but also situations where AES is not preferred for various reasons. In these cases, if someone wants to encrypt using a block cipher, LEA could be an alternative.

This submission includes a SIMD implementation for the x86-64 platform. The LEA cipher consists of 32-bit integer addition, rotation, and XOR operations, allowing for 4 blocks (XMM), 8 blocks (YMM), and 16 blocks (ZMM) of parallelism depending on the size of the registers. In addition, AVX2 and AVX-512F provide additional instructions that improve parallel encryption performance, so the implementations can differ even when they operate on registers of the same size. Therefore, lea-x86_64 uses a single piece of glue code that selects the appropriate implementation at module initialization. If further SIMD extensions such as AVX10 are added in the future, they can be handled in the same way.
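
The selection logic can be pictured with the following userspace sketch. It is only an illustration under stated assumptions: `struct cpu_caps`, `enum lea_simd_impl`, and both functions are hypothetical names, and the real glue code in the patch may be organized differently (a kernel implementation would query CPU features with helpers such as boot_cpu_has() rather than a plain struct).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flags; stands in for the kernel's CPU feature checks. */
struct cpu_caps {
	bool sse2;
	bool avx2;
	bool avx512f;
};

/* Hypothetical implementation identifiers, ordered from slowest to fastest. */
enum lea_simd_impl {
	LEA_IMPL_GENERIC,
	LEA_IMPL_SSE2,
	LEA_IMPL_AVX2,
	LEA_IMPL_AVX512F,
};

/* Pick the widest implementation the CPU supports, once, at init time. */
static enum lea_simd_impl lea_select_impl(const struct cpu_caps *caps)
{
	assert(caps != NULL);

	if (caps->avx512f)
		return LEA_IMPL_AVX512F;
	if (caps->avx2)
		return LEA_IMPL_AVX2;
	if (caps->sse2)
		return LEA_IMPL_SSE2;
	return LEA_IMPL_GENERIC;
}

/*
 * Maximum number of blocks processed in parallel: each 32-bit lane of a
 * register holds one word of a different block, so the count is the
 * register width divided by 32.
 */
static unsigned int lea_max_parallel_blocks(enum lea_simd_impl impl)
{
	switch (impl) {
	case LEA_IMPL_AVX512F:
		return 512 / 32;	/* ZMM: 16 blocks */
	case LEA_IMPL_AVX2:
		return 256 / 32;	/* YMM: 8 blocks */
	case LEA_IMPL_SSE2:
		return 128 / 32;	/* XMM: 4 blocks */
	default:
		return 1;		/* generic C: one block at a time */
	}
}
```

Deciding once at initialization keeps the per-request path free of feature checks, which is the usual reason for placing the choice in glue code.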

Below are the results of speed tests performed with the tcrypt module for AES, LEA, ARIA, and Adiantum on three different platforms (AMD Ryzen 9 5950X, Intel(R) Core(TM) i5-12600K, and Intel(R) Xeon(R) Gold 6254).

(Encryption / decryption results for 4,096-byte blocks in the tcrypt speed test. Unit: cycles)

- AMD Ryzen 9 5950X (Virtual Machine)
  - aesni        ecb 128-bit key:  1,956 /   1,892
  - aesni        ecb 256-bit key:  2,086 /   2,098
  - lea-x86_64   ecb 128-bit key:  5,647 /   6,133
  - lea-x86_64   ecb 256-bit key:  6,702 /   7,444
  - aria-avx2    ecb 128-bit key:  8,316 /   8,153
  - aria-avx2    ecb 256-bit key: 10,539 /  10,550

  - aesni        cbc 128-bit key:  7,758 /   1,830
  - aesni        cbc 256-bit key: 10,660 /   2,071
  - lea-x86_64   cbc 128-bit key: 22,501 /   6,283
  - lea-x86_64   cbc 256-bit key: 28,125 /   7,592

  - aesni        ctr 128-bit key:  1,514 /   1,505
  - aesni        ctr 256-bit key:  1,884 /   1,867
  - lea-x86_64   ctr 128-bit key:  5,804 /   5,792
  - lea-x86_64   ctr 256-bit key:  6,958 /   6,951
  - aria-avx2    ctr 128-bit key:  8,819 /   8,736
  - aria-avx2    ctr 256-bit key: 11,101 /  10,636

  - adiantum(xchacha12-simd,...):  8,390 /   8,427
  - adiantum(xchacha20-simd,...):  9,698 /   9,732

  - aesni        xts 256-bit key:  2,177 /   2,165
  - aesni        xts 512-bit key:  2,589 /   2,527
  - lea-x86_64   xts 256-bit key:  6,488 /   6,745
  - lea-x86_64   xts 512-bit key:  7,484 /   8,083

  - aes-generic  ecb 128-bit key: 35,768 /  36,329
  - aes-generic  ecb 256-bit key: 35,785 /  35,237
  - lea-generic  ecb 128-bit key: 30,719 /  38,092
  - lea-generic  ecb 256-bit key: 35,373 /  46,941
  - aria-generic ecb 128-bit key:186,660 / 188,674
  - aria-generic ecb 256-bit key:247,919 / 245,527

- Intel(R) Core(TM) i5-12600K (microcode 0x15, AVX-512F Enabled)
  - aesni        ecb 128-bit key:  1,436 /   1,441
  - aesni        ecb 256-bit key:  1,984 /   1,987
  - lea-x86_64   ecb 128-bit key:  5,318 /   5,916
  - lea-x86_64   ecb 256-bit key:  6,209 /   7,071
  - aria-avx512  ecb 128-bit key:  4,786 /   4,799
  - aria-avx512  ecb 256-bit key:  5,988 /   5,989

  - aesni        cbc 128-bit key:  8,741 /   1,467
  - aesni        cbc 256-bit key: 11,803 /   1,995
  - lea-x86_64   cbc 128-bit key: 31,070 /   6,063
  - lea-x86_64   cbc 256-bit key: 39,117 /   7,173

  - aesni        ctr 128-bit key:  2,120 /   2,112
  - aesni        ctr 256-bit key:  2,588 /   2,595
  - lea-x86_64   ctr 128-bit key:  4,438 /   4,397
  - lea-x86_64   ctr 256-bit key:  5,217 /   5,196
  - aria-avx512  ctr 128-bit key:  6,270 /   6,272
  - aria-avx512  ctr 256-bit key:  7,469 /   7,473

  - adiantum(xchacha12-simd,...):  7,526 /   7,453
  - adiantum(xchacha20-simd,...):  8,983 /   8,892

  - aesni        xts 256-bit key:  2,234 /   2,241
  - aesni        xts 512-bit key:  2,525 /   2,538
  - lea-x86_64   xts 256-bit key:  6,687 /   7,333
  - lea-x86_64   xts 512-bit key:  7,626 /   8,457

  - aes-generic  ecb 128-bit key: 34,399 /  34,765
  - aes-generic  ecb 256-bit key: 48,568 /  49,245
  - lea-generic  ecb 128-bit key: 23,576 /  36,230
  - lea-generic  ecb 256-bit key: 31,715 /  50,461
  - aria-generic ecb 128-bit key:108,227 / 108,135
  - aria-generic ecb 256-bit key:146,669 / 145,993

- Intel(R) Xeon(R) Gold 6254 (Virtual Machine)
  - aesni        ecb 128-bit key:  3,390 /   3,396
  - aesni        ecb 256-bit key:  4,533 /   4,549
  - lea-x86_64   ecb 128-bit key:  5,500 /   6,594
  - lea-x86_64   ecb 256-bit key:  6,506 /   7,467
  - aria-avx2    ecb 128-bit key: 14,109 /  13,573
  - aria-avx2    ecb 256-bit key: 17,605 /  16,955

  - aesni        cbc 128-bit key: 12,559 /   3,544
  - aesni        cbc 256-bit key: 17,150 /   4,681
  - lea-x86_64   cbc 128-bit key: 33,471 /   5,900
  - lea-x86_64   cbc 256-bit key: 41,024 /   6,948

  - aesni        ctr 128-bit key:  3,099 /   3,095
  - aesni        ctr 256-bit key:  4,126 /   4,124
  - lea-x86_64   ctr 128-bit key:  5,054 /   4,909
  - lea-x86_64   ctr 256-bit key:  5,795 /   5,797
  - aria-avx2    ctr 128-bit key: 13,439 /  13,017
  - aria-avx2    ctr 256-bit key: 17,325 /  16,731

  - adiantum(xchacha12-simd,...):  9,064 /   9,006
  - adiantum(xchacha20-simd,...): 10,702 /  10,628

  - aesni        xts 256-bit key:  3,886 /   3,857
  - aesni        xts 512-bit key:  4,949 /   5,008
  - lea-x86_64   xts 256-bit key:  6,457 /   7,409
  - lea-x86_64   xts 512-bit key:  7,438 /   8,510

  - aes-generic  ecb 128-bit key: 49,438 /  48,803
  - aes-generic  ecb 256-bit key: 72,348 /  73,804
  - lea-generic  ecb 128-bit key: 30,300 /  45,072
  - lea-generic  ecb 256-bit key: 39,054 /  60,472
  - aria-generic ecb 128-bit key:189,850 / 175,073
  - aria-generic ecb 256-bit key:243,704 / 228,347
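
For easier comparison, the cycle counts above can be converted to cycles per byte by dividing by the 4,096-byte block size. The sketch below does this; the helper names are hypothetical and the input values are copied from the table. For example, on the Ryzen 9 5950X, lea-x86_64 XTS encryption with a 256-bit key (6,488 cycles) works out to about 1.58 cycles per byte, roughly 2.98 times the aesni figure (2,177 cycles).

```c
#include <assert.h>

/* Block size used in the tcrypt speed test results. */
#define TCRYPT_BLOCK_BYTES 4096.0

/* Convert a cycle count for one 4,096-byte block to cycles per byte. */
static double cycles_per_byte(double cycles)
{
	assert(cycles >= 0.0);
	return cycles / TCRYPT_BLOCK_BYTES;
}

/* Ratio of a measurement to a baseline measurement (e.g. LEA vs. AES-NI). */
static double slowdown_vs(double cycles, double baseline_cycles)
{
	assert(baseline_cycles > 0.0);
	return cycles / baseline_cycles;
}
```

The same helpers apply to any row of the table, for instance comparing lea-generic with aes-generic on a given platform.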

If this submission is accepted, future submissions may include an LEA implementation for aarch64 and an implementation with masks for AVX-512F.

Although the designers of LEA did not provide test vectors in their paper [5], the ISO/IEC standard [2] and the KS standard [1] do. Furthermore, the Block Cipher LEA Specification("블록암호 LEA 규격서", written in Korean) document on the LEA introduction page [6] and the Wikipedia article on LEA [7] show the same test vectors as in the standards.

The test vectors for ECB, CBC, CTR, and GCM modes included in the testmgr module are taken from the KCMVP Cryptographic Algorithm Verification Criteria V3.0 ("KCMVP 검증대상 암호알고리즘 검증기준 V3.0", written in Korean) [8]. We generated the test vectors for the XTS mode ourselves and cross-checked them using Crypto++ [9] and testmgr on Linux.

The implementation was tested with the kernel module tcrypt.ko and passed the selftest using the above-mentioned test vectors. It has also been tested with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS. The fscrypt patch was tested using a modified fork of the fscrypt tool (https://github.com/google/fscrypt).

The AVX2 and AVX-512F implementations were tested on the devices used for the speed tests, while the SSE2 implementation was tested using QEMU's x86-64 binary emulation.

[1] KS X 3246, 128-bit block cipher LEA.
[2] ISO/IEC 29192-2:2019, Information security — Lightweight cryptography — Part 2: Block ciphers.
[3] Yi, Chen, et al. "Differential-Linear Approximation Semi-Unconstrained Searching and Partition Tree: Application to LEA and Speck", Asiacrypt 2023. (eprint 2023/1414)
[4] https://github.com/gooroom https://www.gooroom.kr/
[5] Hong, Deukjo, et al. "LEA: A 128-bit block cipher for fast encryption on common processors.", WISA 2013.
[6] https://seed.kisa.or.kr/kisa/algorithm/EgovLeaInfo.do
[7] https://en.wikipedia.org/wiki/LEA_(cipher)
[8] https://seed.kisa.or.kr/kisa/kcmvp/EgovVerification.do
[9] https://www.cryptopp.com/

Changelog:
v6:
- Resent due to a missing subsystem and an incorrect title
  - The patch is unchanged from v5.
v5:
- Added SSE2/AVX2/AVX-512F implementation
  - Single glue code to determine proper SIMD acceleration
- Adjusted ordering within structures to align with 16-byte boundaries.
- Added more test vectors.
  - Increased the maximum test-vector length to evaluate 16-block parallelism.
  - Added the CBC-CTS test vector.
v4:
- Removed the documentation describing LEA in fscrypt.
v3:
- Added implementations to enable LEA in fscrypt and blk-crypto.
v2:
- Reimplemented the generic C implementation as a loop version.
  - The decryption code was adapted from an optimized implementation by Eric Biggers.
    https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/commit/?h=old/wip-lea&id=1d1cbba14380f8a1abc76baf939b9e51de047fb6
- Removed AVX2 SIMD implementation.
- Added comments for functions.
- Improved the description in Kconfig.
- Added test vectors from the standard documentation.

Dongsoo Lee (5):
  crypto: LEA block cipher implementation
  crypto: add LEA testmgr tests
  blk-crypto: Add LEA-256-XTS blk-crypto support
  fscrypt: Add LEA-256-XTS, LEA-256-CTS support
  crypto: LEA block cipher x86_64 optimization

 arch/x86/crypto/Kconfig            |   29 +
 arch/x86/crypto/Makefile           |    3 +
 arch/x86/crypto/lea-x86_64-asm.S   | 2272 +++++++++++++++++++++
 arch/x86/crypto/lea-x86_64-glue.c  |  820 ++++++++
 block/blk-crypto.c                 |    6 +
 crypto/Kconfig                     |   18 +
 crypto/Makefile                    |    1 +
 crypto/lea_generic.c               |  410 ++++
 crypto/tcrypt.c                    |   97 +
 crypto/testmgr.c                   |   38 +
 crypto/testmgr.h                   | 3022 ++++++++++++++++++++++++++++
 fs/crypto/fscrypt_private.h        |    2 +-
 fs/crypto/keysetup.c               |   15 +
 fs/crypto/policy.c                 |    4 +
 include/crypto/lea.h               |   44 +
 include/linux/blk-crypto.h         |    1 +
 include/uapi/linux/fscrypt.h       |    4 +-
 tools/include/uapi/linux/fscrypt.h |    4 +-
 18 files changed, 6787 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/crypto/lea-x86_64-asm.S
 create mode 100644 arch/x86/crypto/lea-x86_64-glue.c
 create mode 100644 crypto/lea_generic.c
 create mode 100644 include/crypto/lea.h

Comments

Dongsoo Lee Jan. 15, 2024, 5:17 a.m. UTC | #1
On  Fri, 12 Jan 2024 12:59:56 +0000, David Laight <David.Laight@ACULAB.COM> wrote:
> From: Dongsoo Lee
>> Sent: 12 January 2024 02:29
>>
>> For the x86_64 environment, we use AVX-512F/AVX2/SSE2 instructions.
>> Since LEA uses 128-bit blocks of four 32-bit integers, for optimization,
>> SSE2 encrypts 4 blocks, AVX2 encrypts 4/8 blocks, and AVX-512F encrypts
>> 4/8/16 blocks at a time.
>>
>> Our submission provides a optimized implementation of ECB, CBC
>> decryption, CTR, and XTS cipher operation modes on x86_64 CPUs
>> supporting.
> 
> Given you say in 0/0:
> 
> The LEA algorithm is a lightweight block cipher that processes data blocks of 128-bits and has three different key lengths, each with a different number of rounds:
> 
> Just how big is it ?
> Doesn't look 'lightweight' to me.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

Firstly, it's worth mentioning that LEA is an encryption algorithm designed to ensure 128-bit security.

The LEA cipher provides a balance between code size and required memory, allowing for trade-offs with performance. The implementation of LEA that we have submitted is oriented towards achieving optimal performance.

Although the test is out of date, which makes comparison with the most recent implementations of cryptographic algorithms difficult, you can see the results of a previous FELICS test [1] that implemented LEA for a smaller code size.

A detailed example of skipping the key schedule to reduce memory usage and minimize code size can be found in [2].

Thank you for your interest.

    Dongsoo Lee

[1] https://www.cryptolux.org/index.php/FELICS#Results
[2] https://github.com/cryptolu/FELICS/blob/master/block_ciphers/source/ciphers/LEA_128_128_v03/source/encrypt.c