Message ID | 20220210232812.798387-1-nhuck@google.com |
---|---|
Series | crypto: HCTR2 support |
On Thu, Feb 10, 2022 at 11:28:06PM +0000, Nathan Huckleberry wrote:
> Add a generic implementation of XCTR mode as a template. XCTR is a
> blockcipher mode similar to CTR mode. XCTR uses XORs and little-endian
> addition rather than big-endian arithmetic which has two advantages: It
> is slightly faster on little-endian CPUs and it is less likely to be
> implemented incorrect since integer overflows are not possible on
> practical input sizes. XCTR is used as a component to implement HCTR2.
>
> More information on XCTR mode can be found in the HCTR2 paper:
> https://eprint.iacr.org/2021/1441.pdf
>
> Signed-off-by: Nathan Huckleberry <nhuck@google.com>
>
> Changes since v1:
>  * Restricted blocksize to 16-bytes
>  * Removed xctr.h and u32_to_le_block
>  * Use single crypto_template instead of array
> ---

Changelog text conventionally goes in the cover letter, not in the individual
patches. Having the changelog be above the scissors line ("---") is especially
problematic, as it will show up in the git commit message.

> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index fa1741bb568f..8543f34fa200 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -452,6 +452,15 @@ config CRYPTO_PCBC
>  	  PCBC: Propagating Cipher Block Chaining mode
>  	  This block cipher algorithm is required for RxRPC.
>
> +config CRYPTO_XCTR
> +	tristate
> +	select CRYPTO_SKCIPHER
> +	select CRYPTO_MANAGER
> +	help
> +	  XCTR: XOR Counter mode. This blockcipher mode is a variant of CTR mode
> +	  using XORs and little-endian addition rather than big-endian arithmetic.
> +	  XCTR mode is used to implement HCTR2.

Now that this option isn't user-selectable, no one will see this help text. I
think it would be best to remove it, and make sure that the comment in
crypto/xctr.c fully explains what XCTR is (currently it's a bit inadequate).

> +/*
> + * Test vectors generated using https://github.com/google/hctr2
> + */
> +static const struct cipher_testvec aes_xctr_tv_template[] = {
[...]
> +		.klen = 16,
> +		.len = 255,
[...]
> +		.klen = 16,
> +		.len = 255,

I commented on the test vectors before in the context of the HCTR2 ones, but the
same comments apply here: the actual test coverage for the number of test
vectors included is not great, due to lengths being repeated. It would be
better to vary the lengths a bit more, especially the message lengths. What you
have here isn't bad, but I think there's some room for improvement.

> +/*
> + * XCTR mode is a blockcipher mode of operation used to implement HCTR2. XCTR is
> + * closely related to the CTR mode of operation; the main difference is that CTR
> + * generates the keystream using E(CTR + IV) whereas XCTR generates the
> + * keystream using E(CTR ^ IV).
> + *
> + * See the HCTR2 paper for more details:
> + *	Length-preserving encryption with HCTR2
> + *      (https://eprint.iacr.org/2021/1441.pdf)
> + */

The above comment could use a bit more detail, e.g. mentioning endianness as
well as the fact that XCTR avoids having to deal with multi-limb integers.

> +static void crypto_xctr_crypt_final(struct skcipher_walk *walk,
> +				   struct crypto_cipher *tfm, u32 byte_ctr)
> +{
> +	unsigned long alignmask = crypto_cipher_alignmask(tfm);
> +	u8 tmp[XCTR_BLOCKSIZE + MAX_CIPHER_ALIGNMASK];
> +	u8 *keystream = PTR_ALIGN(tmp + 0, alignmask + 1);
> +	u8 *src = walk->src.virt.addr;
> +	u8 *dst = walk->dst.virt.addr;
> +	unsigned int nbytes = walk->nbytes;
> +	__le32 ctr32 = cpu_to_le32(byte_ctr / XCTR_BLOCKSIZE + 1);
> +
> +	crypto_xor(walk->iv, (u8 *)&ctr32, sizeof(ctr32));
> +	crypto_cipher_encrypt_one(tfm, keystream, walk->iv);
> +	crypto_xor_cpy(dst, keystream, src, nbytes);
> +	crypto_xor(walk->iv, (u8 *)&ctr32, sizeof(ctr32));
> +}

When crypto_cipher_encrypt_one() is used instead of ->cia_encrypt directly, the
caller doesn't need to align the buffers, since crypto_cipher_encrypt_one()
handles that.
> +static int crypto_xctr_crypt_segment(struct skcipher_walk *walk,
> +				    struct crypto_cipher *tfm, u32 byte_ctr)
> +{
> +	void (*fn)(struct crypto_tfm *, u8 *, const u8 *) =
> +		   crypto_cipher_alg(tfm)->cia_encrypt;
> +	u8 *src = walk->src.virt.addr;
> +	u8 *dst = walk->dst.virt.addr;
> +	unsigned int nbytes = walk->nbytes;
> +	__le32 ctr32 = cpu_to_le32(byte_ctr / XCTR_BLOCKSIZE + 1);
> +
> +	do {
> +		/* create keystream */
> +		crypto_xor(walk->iv, (u8 *)&ctr32, sizeof(ctr32));
> +		fn(crypto_cipher_tfm(tfm), dst, walk->iv);
> +		crypto_xor(dst, src, XCTR_BLOCKSIZE);
> +		crypto_xor(walk->iv, (u8 *)&ctr32, sizeof(ctr32));
> +
> +		ctr32++;
> +
> +		src += XCTR_BLOCKSIZE;
> +		dst += XCTR_BLOCKSIZE;
> +	} while ((nbytes -= XCTR_BLOCKSIZE) >= XCTR_BLOCKSIZE);
> +
> +	return nbytes;
> +}

This won't work on big endian systems due to the 'ctr32++' on a __le32 variable.
I recommend installing 'sparse' and passing C=2 to make, as it will warn about
endianness bugs like this.

Either endianness needs to be converted for the increment, or it needs to be
converted when doing the XOR. (The latter would work if the XOR is done
manually instead of with crypto_xor, which could be a nice optimization
separately from fixing the endianness bug.)

> +	/* Block size must be >= 4 bytes. */
> +	err = -EINVAL;
> +	if (alg->cra_blocksize != XCTR_BLOCKSIZE)
> +		goto out_free_inst;

The comment above is outdated.

- Eric
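
For reference, the second option mentioned for the endianness fix (keep the counter in CPU byte order and convert only when doing the XOR) can be sketched roughly as below. This is a userspace illustration, not kernel code: the helper name xctr_block_input() is hypothetical, and the only facts taken from the patch are that the block counter starts at 1 and is XORed, as a little-endian 32-bit value, into the first four bytes of the IV.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XCTR_BLOCKSIZE 16

/*
 * Hypothetical helper: build the block cipher input for block number 'ctr'.
 *
 * 'ctr' is an ordinary CPU-order integer, so the caller can increment it
 * with plain arithmetic. It is serialized to little-endian bytes here, one
 * byte at a time, so the result is the same on little-endian and big-endian
 * hosts. No carry ever propagates into the rest of the IV, which is why
 * XCTR avoids multi-limb integer handling.
 */
void xctr_block_input(uint8_t out[XCTR_BLOCKSIZE],
		      const uint8_t iv[XCTR_BLOCKSIZE], uint32_t ctr)
{
	size_t i;

	memcpy(out, iv, XCTR_BLOCKSIZE);
	for (i = 0; i < sizeof(ctr); i++)
		out[i] ^= (uint8_t)(ctr >> (8 * i));
}
```

A caller would invoke this once per block with ctr = 1, 2, 3, ..., encrypt 'out' with the block cipher to get the keystream block, and leave the IV itself unmodified, which also removes the second crypto_xor() that restores walk->iv in the quoted code.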
On Thu, Feb 10, 2022 at 11:28:08PM +0000, Nathan Huckleberry wrote:
> Changes since v1:
>  * Rename streamcipher -> xctr
>  * Rename hash -> polyval
>  * Use __le64 instead of u64 for little-endian length
>  * memzero_explicit in set_key
>  * Use crypto request length instead of scatterlist length for polyval
>  * Add comments referencing the paper's pseudocode
>  * Derive blockcipher name from xctr name
>  * Pass IV through request context
>  * Use .generic_driver
>  * Make tests more comprehensive

This is looking much better than v1. A few comments below.

> +static int hctr2_hash_tweak(struct skcipher_request *req)
> +{
> +	__le64 tweak_length_block[2];
> +	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +	const struct hctr2_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
> +	struct hctr2_request_ctx *rctx = skcipher_request_ctx(req);
> +	struct shash_desc *hash_desc = &rctx->u.hash_desc;
> +	int err;
> +
> +	memset(tweak_length_block, 0, sizeof(tweak_length_block));
> +	if (req->cryptlen % POLYVAL_BLOCK_SIZE == 0)
> +		tweak_length_block[0] = cpu_to_le64(TWEAK_SIZE * 8 * 2 + 2);
> +	else
> +		tweak_length_block[0] = cpu_to_le64(TWEAK_SIZE * 8 * 2 + 3);
> +
> +	hash_desc->tfm = tctx->polyval;
> +	err = crypto_shash_init(hash_desc);
> +	if (err)
> +		return err;
> +
> +	err = crypto_shash_update(hash_desc, (u8 *)tweak_length_block,
> +				  sizeof(tweak_length_block));
> +	if (err)
> +		return err;
> +	return crypto_shash_update(hash_desc, req->iv, TWEAK_SIZE);
> +}

Have you considered taking advantage of the hash function precomputation that is
possible? That should improve the performance a bit, especially on short
inputs.

Per-key, a hash state could be precomputed containing the tweak_length_block,
since it will always have the same contents, due to this implementation only
supporting a single tweak length. hctr2_setkey() could compute that and export
it using crypto_shash_export() into the hctr2_tfm_ctx. hctr2_crypt() could
import it using crypto_shash_import().
Similarly, the tweak only needs to be hashed once per request, as the state
could be exported after hashing it the first time, then imported for the second
hash step. The saved state would need to be part of the hctr2_request_ctx.

> +static int hctr2_finish(struct skcipher_request *req)
> +{
> +	struct hctr2_request_ctx *rctx = skcipher_request_ctx(req);
> +	u8 digest[POLYVAL_DIGEST_SIZE];
> +	int err;
> +
> +	// U = UU ^ H(T || V)
> +	err = hctr2_hash_tweak(req);
> +	if (err)
> +		return err;
> +	err = hctr2_hash_message(req, rctx->bulk_part_dst, digest);
> +	if (err)
> +		return err;
> +	crypto_xor(rctx->first_block, digest, BLOCKCIPHER_BLOCK_SIZE);
> +
> +	// Copy U into dst scatterlist
> +	scatterwalk_map_and_copy(rctx->first_block, req->dst,
> +				 0, BLOCKCIPHER_BLOCK_SIZE, 1);
> +	return 0;
> +}

The comments here and in hctr2_crypt() are assuming encryption, but the code
also handles decryption. It would be helpful to give the pseudocode for both,
e.g.:

	//    U = UU ^ H(T || V)
	// or M = MM ^ H(T || N)

> +static int hctr2_create_base(struct crypto_template *tmpl, struct rtattr **tb)
> +{
> +	const char *xctr_name;
> +	const char *polyval_name;
> +	char blockcipher_name[CRYPTO_MAX_ALG_NAME];
> +	int len;
> +
> +	xctr_name = crypto_attr_alg_name(tb[1]);
> +	if (IS_ERR(xctr_name))
> +		return PTR_ERR(xctr_name);
> +
> +	if (!strncmp(xctr_name, "xctr(", 5)) {
> +		len = strscpy(blockcipher_name, xctr_name + 5,
> +			    sizeof(blockcipher_name));
> +
> +		if (len < 1)
> +			return -EINVAL;
> +
> +		if (blockcipher_name[len - 1] != ')')
> +			return -EINVAL;
> +
> +		blockcipher_name[len - 1] = 0;
> +	} else
> +		return -EINVAL;

I don't think this is exactly what Herbert and I had in mind. It's close, but
the problem with grabbing the block cipher name from the raw string the user
passes in is that the string could be an implementation name like "xctr-aes-ni",
rather than an algorithm name like "xctr(aes)".
The correct way to do this is to wait to determine the block cipher algorithm
until after calling crypto_grab_skcipher(). Then it will be available in the
->cra_name of the skcipher algorithm (that should be xctr).

> diff --git a/crypto/testmgr.h b/crypto/testmgr.h
> index da3736e51982..a16b631730e9 100644
> --- a/crypto/testmgr.h
> +++ b/crypto/testmgr.h
> @@ -33630,4 +33630,674 @@ static const struct hash_testvec polyval_tv_template[] = {
>  	},
>  };
>
> +/*
> + * Test vectors generated using https://github.com/google/hctr2
> + */
> +static const struct cipher_testvec aes_hctr2_tv_template[] = {
> +		.klen = 16,
> +		.len = 16,
> +	},
> +	{
[...]
> +		.klen = 16,
> +		.len = 16,
> +	},
> +	{
[...]
> +		.klen = 16,
> +		.len = 17,
> +	},
> +	{
[...]
> +		.klen = 16,
> +		.len = 17,
> +	},
> +	{
[...]
> +		.klen = 24,
> +		.len = 31,
> +	},
> +	{
> +		.klen = 24,
> +		.len = 31,
> +	},
[...]
> +	{
> +		.klen = 24,
> +		.len = 48,
> +	},
[...]
> +		.klen = 24,
> +		.len = 48,
[...]
> +		.klen = 32,
> +		.len = 128,
[...]
> +		.klen = 32,
> +		.len = 128,
[...]
> +		.klen = 32,
> +		.len = 255,
[...]
> +		.klen = 32,
> +		.len = 255,
[...]
> +		.klen = 32,
> +		.len = 512,
[...]
> +		.klen = 32,
> +		.len = 512,

These are better but still could use some improvement. There are two test
vectors for each message length, but they also have the same key length. That
provides little additional test coverage over having just half the test vectors.
It would be better to use different key lengths within each pair. Maybe always
use klen=32 for one test vector, and randomly choose klen=16 or klen=24 for the
other test vector.

The overall point is: we can't test a zillion test vectors, so we should make
sure to get as much test coverage as possible with the smaller set that will
actually be included.

- Eric
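
To make the block cipher name point concrete: once the name is taken from the ->cra_name of the grabbed xctr algorithm rather than from the user's string, the remaining work is just string handling. A rough userspace sketch of that parsing step follows. The helper name xctr_inner_cipher_name() and its return convention are hypothetical; in the kernel the input would be the ->cra_name obtained after crypto_grab_skcipher(), and the error value would presumably be -EINVAL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extract "aes" from an algorithm name of the form
 * "xctr(aes)". Returns 0 on success, or -1 if the name does not have that
 * form, the inner name is empty, or 'out' is too small to hold it.
 */
int xctr_inner_cipher_name(const char *cra_name, char *out, size_t out_size)
{
	static const char prefix[] = "xctr(";
	const size_t prefix_len = sizeof(prefix) - 1;
	size_t len = strlen(cra_name);
	size_t inner_len;

	/* Need the prefix, at least one inner character, and a closing ')'. */
	if (len <= prefix_len + 1 ||
	    strncmp(cra_name, prefix, prefix_len) != 0)
		return -1;
	if (cra_name[len - 1] != ')')
		return -1;

	inner_len = len - prefix_len - 1;
	if (inner_len >= out_size)
		return -1;

	memcpy(out, cra_name + prefix_len, inner_len);
	out[inner_len] = '\0';
	return 0;
}
```

With this shape, a driver name such as "xctr-aes-ni" is rejected by the parser, which is fine as long as the caller passes the resolved algorithm name and never the raw user-supplied string.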