Message ID | c16e9ab9-13e0-b911-e33a-c9ae81e93a8d@I-love.SAKURA.ne.jp |
---|---|
State | New |
Headers | show |
Series | net: tls: enable __GFP_ZERO upon tls_init() | expand |
On 2023/06/29 6:03, Jakub Kicinski wrote: > On Wed, 28 Jun 2023 22:48:01 +0900 Tetsuo Handa wrote: >> syzbot is reporting uninit-value at aes_encrypt(), for block cipher assumes >> that bytes to encrypt/decrypt is multiple of block size for that cipher but >> tls_alloc_encrypted_msg() is not initializing padding bytes when >> required_size is not multiple of block cipher's block size. > > Sounds odd, so crypto layer reads beyond what we submitted as > the buffer? I don't think the buffer needs to be aligned, so > the missing bits may well fall into a different (unmapped?) page. Since passing __GFP_ZERO to skb_page_frag_refill() hides this problem, I think that crypto layer is reading up to block size when requested size is not multiple of block size. > > This needs more careful investigation. Always zeroing the input > is just covering up the real issue. Since block cipher needs to read up to block size, someone has to initialize padding bytes. I guess that crypto API caller is responsible for allocating and initializing padding bytes, otherwise such crypto API caller will fail to encrypt/decrypt last partial bytes which are not multiple of cipher's block size. Which function in this report is responsible for initializing padding bytes?
On Thu, Jun 29, 2023 at 07:15:21AM +0900, Tetsuo Handa wrote: > On 2023/06/29 6:03, Jakub Kicinski wrote: > > On Wed, 28 Jun 2023 22:48:01 +0900 Tetsuo Handa wrote: > >> syzbot is reporting uninit-value at aes_encrypt(), for block cipher assumes > >> that bytes to encrypt/decrypt is multiple of block size for that cipher but > >> tls_alloc_encrypted_msg() is not initializing padding bytes when > >> required_size is not multiple of block cipher's block size. > > > > Sounds odd, so crypto layer reads beyond what we submitted as > > the buffer? I don't think the buffer needs to be aligned, so > > the missing bits may well fall into a different (unmapped?) page. > > Since passing __GFP_ZERO to skb_page_frag_refill() hides this problem, > I think that crypto layer is reading up to block size when requested > size is not multiple of block size. > > > > > This needs more careful investigation. Always zeroing the input > > is just covering up the real issue. > > Since block cipher needs to read up to block size, someone has to initialize > padding bytes. I guess that crypto API caller is responsible for allocating > and initializing padding bytes, otherwise such crypto API caller will fail to > encrypt/decrypt last partial bytes which are not multiple of cipher's block > size. > > Which function in this report is responsible for initializing padding bytes? According to the sample crash report from https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, the uninitialized memory access happens while the TLS layer is doing an AES-CCM encryption operation. CCM supports arbitrarily-aligned additional authenticated data and plaintext/ciphertext. Also, an encryption with crypto_aead_encrypt() reads exactly 'assoclen + cryptlen' bytes from the 'src' scatterlist; it's not supposed to ever go past that, even if the data isn't "block aligned". The "aead" API (include/crypto/aead.h) is still confusing and hard to use correctly, though, mainly because of the weird scatterlist layout it expects with the AAD and plaintext/ciphertext concatenated to each other. I wouldn't be surprised if the TLS layer is making some error. What is the exact sequence of crypto_aead_* calls that results in this issue? - Eric
On Wed, 28 Jun 2023 at 15:49, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > syzbot is reporting uninit-value at aes_encrypt(), for block cipher assumes > that bytes to encrypt/decrypt is multiple of block size for that cipher but > tls_alloc_encrypted_msg() is not initializing padding bytes when > required_size is not multiple of block cipher's block size. > > In order to make sure that padding bytes are automatically initialized, > enable __GFP_ZERO flag when setsockopt(SOL_TCP, TCP_ULP, "tls") is called. > > Reported-by: syzbot <syzbot+828dfc12440b4f6f305d@syzkaller.appspotmail.com> > Closes: https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d Maybe I am missing something, but this syzkaller report appears to b a huge collection of uninit-value reports, of which only one is related to bpx_tx_verdict(), and that one has nor reproducer. So even if this would be the right fix, I don't think closing that issue makes sense, so? The issue refers to a couple of other occcurences of uninit-value in aesti_encrypt, which do seem related. > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > --- > According to C reproducer, this problem happens when bpf_exec_tx_verdict() is > called with lower 4 bits of required_size being 0001 and does not happen when > being 0100. Thus, I assumed that this problem is caused by lack of initializing > padding bytes. > But I couldn't figure out why KMSAN reports this problem when bpf_exec_tx_verdict() > is called with lower 4 bits of required_size being 0001 for the second time and > does not report this problem when bpf_exec_tx_verdict() is called with lower > 4 bits of required_size being 0001 for the first time. More deeper problem exists? > KMSAN reporting this problem when accessing u64 relevant? > As Eric pointed out as well, zeroing the allocation papers over the problem. Why are you sending this now? Do you have a reproducer for this issue? > net/tls/tls_main.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c > index f2e7302a4d96..cd5366966864 100644 > --- a/net/tls/tls_main.c > +++ b/net/tls/tls_main.c > @@ -1025,6 +1025,7 @@ static int tls_init(struct sock *sk) > struct tls_context *ctx; > int rc = 0; > > + sk->sk_allocation |= __GFP_ZERO; > tls_build_proto(sk); > > #ifdef CONFIG_TLS_TOE > -- > 2.34.1 >
On 2023/06/30 18:36, Ard Biesheuvel wrote: > Why are you sending this now? Just because this is currently top crasher and I can reproduce locally. > Do you have a reproducer for this issue? Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works.
On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2023/06/30 18:36, Ard Biesheuvel wrote: > > Why are you sending this now? > > Just because this is currently top crasher and I can reproduce locally. > > > Do you have a reproducer for this issue? > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > Could you please share your kernel config and the resulting kernel log when running the reproducer? I'll try to reproduce locally as well, and see if I can figure out what is going on in the crypto layer
On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > On 2023/06/30 18:36, Ard Biesheuvel wrote: > > > Why are you sending this now? > > > > Just because this is currently top crasher and I can reproduce locally. > > > > > Do you have a reproducer for this issue? > > > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > > > > Could you please share your kernel config and the resulting kernel log > when running the reproducer? I'll try to reproduce locally as well, > and see if I can figure out what is going on in the crypto layer The config together with the repro is available at https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the latest row of the "Crashes" table that contains a C repro. Config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=ee5f7a0b2e48ed66 Report: https://syzkaller.appspot.com/text?tag=CrashReport&x=1325260d900000 Syz repro: https://syzkaller.appspot.com/text?tag=ReproSyz&x=11af973e900000 C repro: https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000 The bug is reproducible for me locally as well (and Tetsuo's patch makes it disappear, although I have no opinion on its correctness).
On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: > > On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > On 2023/06/30 18:36, Ard Biesheuvel wrote: > > > > Why are you sending this now? > > > > > > Just because this is currently top crasher and I can reproduce locally. > > > > > > > Do you have a reproducer for this issue? > > > > > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > > > > > > > Could you please share your kernel config and the resulting kernel log > > when running the reproducer? I'll try to reproduce locally as well, > > and see if I can figure out what is going on in the crypto layer > > The config together with the repro is available at > https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > latest row of the "Crashes" table that contains a C repro. Could you explain why that bug contains ~50 reports that seem entirely unrelated? AIUI, this actual issue has not been reproduced since 2020?? > Config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=ee5f7a0b2e48ed66 > Report: https://syzkaller.appspot.com/text?tag=CrashReport&x=1325260d900000 > Syz repro: https://syzkaller.appspot.com/text?tag=ReproSyz&x=11af973e900000 > C repro: https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000 > > The bug is reproducible for me locally as well (and Tetsuo's patch > makes it disappear, although I have no opinion on its correctness). What I'd like to do is run a kernel plus initrd locally in OVMF and reproduce the issue - can I do that without all the syzkaller machinery?
On 2023/06/30 19:18, Ard Biesheuvel wrote: > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: >> >> On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: >>> >>> On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa >>> <penguin-kernel@i-love.sakura.ne.jp> wrote: >>>> >>>> On 2023/06/30 18:36, Ard Biesheuvel wrote: >>>>> Why are you sending this now? >>>> >>>> Just because this is currently top crasher and I can reproduce locally. >>>> >>>>> Do you have a reproducer for this issue? >>>> >>>> Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. >>>> >>> >>> Could you please share your kernel config and the resulting kernel log >>> when running the reproducer? I'll try to reproduce locally as well, >>> and see if I can figure out what is going on in the crypto layer >> >> The config together with the repro is available at >> https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the >> latest row of the "Crashes" table that contains a C repro. Kernel is commit e6bc8833d80f of https://github.com/google/kmsan/commits/master . Config is available in the dashboard page, but a smaller one is available at https://I-love.SAKURA.ne.jp/tmp/config-6.4.0-rc7-kmsan . I'm using a debug printk() patch shown below. ---------------------------------------- diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 1a53c8f481e9..b32bb015995c 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1210,7 +1210,8 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page, if (!sk_stream_memory_free(sk)) goto wait_for_sndbuf; alloc_payload: - ret = tls_alloc_encrypted_msg(sk, required_size); + ret = tls_alloc_encrypted_msg(sk, required_size); ///// + pr_info("required_size=%d ret=%d\n", required_size, ret); if (ret) { if (ret != -ENOSPC) goto wait_for_memory; @@ -1232,7 +1233,9 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page, tls_ctx->pending_open_record_frags = true; if (full_record || eor || sk_msg_full(msg_pl)) { - ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, + pr_info("full_record=%d eor=%d sk_msg_full(msg_pl)=%d copied=%d\n", + full_record, eor, sk_msg_full(msg_pl), copied); + ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, ///// record_type, &copied, flags); if (ret) { if (ret == -EINPROGRESS) ---------------------------------------- Output (on Ubuntu 22.04 on Oracle VM VirtualBox) is shown below. Please check tendency of the sum of required_size= values up to the full_record= line. It seems that the value of required_size= might vary depending on the timings, but the sum of the values seems to have some rule. 4125+8221+12317+16413=41076 (the lower 4 bits are 0100) 2461+6557+10653+14749+16413=50833 (the lower 4 bits are 0001) 2461+6573+10669+14765+16413=50881 (the lower 4 bits are 0001) KMSAN reports this problem when the lower 4 bits became 0001 for the second time. Unless KMSAN's reporting is asynchronous, maybe the reason of "for the second time" part is that the previous state is relevant... ---------------------------------------- [ 157.471712][ T3414] required_size=4125 ret=0 [ 157.475879][ T3414] required_size=8221 ret=0 [ 157.480471][ T3414] required_size=12317 ret=0 [ 157.484604][ T3414] required_size=16413 ret=0 [ 157.490499][ T3414] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=4096 [ 157.513772][ T3414] required_size=4125 ret=0 [ 157.523782][ T3414] required_size=8221 ret=0 [ 157.533658][ T3414] required_size=12317 ret=0 [ 157.539579][ T3414] required_size=16413 ret=0 [ 157.543785][ T3414] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=4096 [ 157.572869][ T3414] required_size=4125 ret=0 [ 157.579350][ T3414] required_size=8221 ret=0 [ 157.584699][ T3414] required_size=12317 ret=0 [ 157.591756][ T3414] required_size=16413 ret=0 [ 157.595891][ T3414] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=4096 [ 157.790734][ T3424] required_size=2461 ret=0 [ 157.800725][ T3424] required_size=6557 ret=0 [ 157.804560][ T3424] required_size=10653 ret=0 [ 157.808433][ T3424] required_size=14749 ret=0 [ 157.810125][ T3424] required_size=16413 ret=0 [ 157.829564][ T3424] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1664 [ 157.848397][ T3424] required_size=2461 ret=0 [ 157.854875][ T3424] required_size=6573 ret=0 [ 157.860883][ T3424] required_size=10669 ret=0 [ 157.865463][ T3424] required_size=14765 ret=0 [ 157.871794][ T3424] required_size=16413 ret=0 [ 157.877333][ T3424] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1648 [ 157.885187][ T3424] ===================================================== [ 157.887262][ T3424] BUG: KMSAN: uninit-value in aes_encrypt+0x1692/0x1fa0 [ 157.887262][ T3424] aes_encrypt+0x1692/0x1fa0 [ 157.887262][ T3424] aesti_encrypt+0xe1/0x160 [ 157.887262][ T3424] crypto_cipher_encrypt_one+0x1d1/0x2e0 [ 157.887262][ T3424] crypto_cbcmac_digest_update+0x3ff/0x5a0 [ 157.887262][ T3424] shash_ahash_finup+0x79d/0xd00 [ 157.887262][ T3424] shash_async_finup+0xbf/0x110 [ 157.887262][ T3424] crypto_ahash_finup+0x244/0x500 [ 157.887262][ T3424] crypto_ccm_auth+0x14df/0x15a0 [ 157.887262][ T3424] crypto_ccm_encrypt+0x2ad/0x8b0 [ 157.887262][ T3424] crypto_aead_encrypt+0x116/0x1a0 [ 157.887262][ T3424] tls_push_record+0x2bbe/0x3bf0 [ 157.887262][ T3424] bpf_exec_tx_verdict+0x5ba/0x2530 [ 157.887262][ T3424] tls_sw_do_sendpage+0x1779/0x21f0 [ 157.887262][ T3424] tls_sw_sendpage+0x247/0x2b0 [ 157.887262][ T3424] inet_sendpage+0x1de/0x2f0 [ 157.887262][ T3424] kernel_sendpage+0x4cc/0x940 [ 158.004827][ T3424] sock_sendpage+0x162/0x220 [ 158.004827][ T3424] pipe_to_sendpage+0x3df/0x4f0 [ 158.004827][ T3424] __splice_from_pipe+0x5c7/0x1010 [ 158.004827][ T3424] generic_splice_sendpage+0x1c6/0x2a0 [ 158.004827][ T3424] do_splice+0x26d8/0x32f0 [ 158.004827][ T3424] __se_sys_splice+0x81f/0xba0 [ 158.004827][ T3424] __x64_sys_splice+0x1a1/0x200 [ 158.004827][ T3424] do_syscall_64+0x41/0x90 [ 158.004827][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 158.004827][ T3424] [ 158.004827][ T3424] Uninit was stored to memory at: [ 158.004827][ T3424] __crypto_xor+0x285/0x1700 [ 158.004827][ T3424] crypto_cbcmac_digest_update+0x2ba/0x5a0 [ 158.004827][ T3424] shash_ahash_finup+0x79d/0xd00 [ 158.004827][ T3424] shash_async_finup+0xbf/0x110 [ 158.004827][ T3424] crypto_ahash_finup+0x244/0x500 [ 158.004827][ T3424] crypto_ccm_auth+0x14df/0x15a0 [ 158.004827][ T3424] crypto_ccm_encrypt+0x2ad/0x8b0 [ 158.004827][ T3424] crypto_aead_encrypt+0x116/0x1a0 [ 158.004827][ T3424] tls_push_record+0x2bbe/0x3bf0 [ 158.004827][ T3424] bpf_exec_tx_verdict+0x5ba/0x2530 [ 158.004827][ T3424] tls_sw_do_sendpage+0x1779/0x21f0 [ 158.004827][ T3424] tls_sw_sendpage+0x247/0x2b0 [ 158.004827][ T3424] inet_sendpage+0x1de/0x2f0 [ 158.004827][ T3424] kernel_sendpage+0x4cc/0x940 [ 158.004827][ T3424] sock_sendpage+0x162/0x220 [ 158.004827][ T3424] pipe_to_sendpage+0x3df/0x4f0 [ 158.004827][ T3424] __splice_from_pipe+0x5c7/0x1010 [ 158.004827][ T3424] generic_splice_sendpage+0x1c6/0x2a0 [ 158.004827][ T3424] do_splice+0x26d8/0x32f0 [ 158.004827][ T3424] __se_sys_splice+0x81f/0xba0 [ 158.004827][ T3424] __x64_sys_splice+0x1a1/0x200 [ 158.004827][ T3424] do_syscall_64+0x41/0x90 [ 158.004827][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 158.004827][ T3424] [ 158.004827][ T3424] Uninit was created at: [ 158.004827][ T3424] __alloc_pages+0x925/0x1050 [ 158.004827][ T3424] alloc_pages+0xe30/0x11b0 [ 158.004827][ T3424] skb_page_frag_refill+0x362/0x910 [ 158.004827][ T3424] sk_page_frag_refill+0xa2/0x1c0 [ 158.004827][ T3424] sk_msg_alloc+0x278/0x1560 [ 158.004827][ T3424] tls_sw_do_sendpage+0xbec/0x21f0 [ 158.004827][ T3424] tls_sw_sendpage+0x247/0x2b0 [ 158.004827][ T3424] inet_sendpage+0x1de/0x2f0 [ 158.004827][ T3424] kernel_sendpage+0x4cc/0x940 [ 158.004827][ T3424] sock_sendpage+0x162/0x220 [ 158.004827][ T3424] pipe_to_sendpage+0x3df/0x4f0 [ 158.004827][ T3424] __splice_from_pipe+0x5c7/0x1010 [ 158.004827][ T3424] generic_splice_sendpage+0x1c6/0x2a0 [ 158.260226][ T3424] do_splice+0x26d8/0x32f0 [ 158.260226][ T3424] __se_sys_splice+0x81f/0xba0 [ 158.260226][ T3424] __x64_sys_splice+0x1a1/0x200 [ 158.260226][ T3424] do_syscall_64+0x41/0x90 [ 158.260226][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 158.260226][ T3424] [ 158.260226][ T3424] CPU: 7 PID: 3424 Comm: a.out Not tainted 6.4.0-rc7-ge6bc8833d80f-dirty #26 [ 158.260226][ T3424] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 158.260226][ T3424] ===================================================== [ 158.260226][ T3424] Disabling lock debugging due to kernel taint [ 158.260226][ T3424] Kernel panic - not syncing: kmsan.panic set ... [ 158.260226][ T3424] CPU: 7 PID: 3424 Comm: a.out Tainted: G B 6.4.0-rc7-ge6bc8833d80f-dirty #26 [ 158.320898][ T3424] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 158.334186][ T3424] Call Trace: [ 158.334186][ T3424] <TASK> [ 158.334186][ T3424] dump_stack_lvl+0x1f6/0x280 [ 158.334186][ T3424] dump_stack+0x29/0x30 [ 158.334186][ T3424] panic+0x4e7/0xc60 [ 158.334186][ T3424] ? add_taint+0x185/0x210 [ 158.334186][ T3424] kmsan_report+0x2d1/0x2e0 [ 158.334186][ T3424] ? __msan_warning+0x98/0x120 [ 158.334186][ T3424] ? aes_encrypt+0x1692/0x1fa0 [ 158.334186][ T3424] ? aesti_encrypt+0xe1/0x160 [ 158.334186][ T3424] ? crypto_cipher_encrypt_one+0x1d1/0x2e0 [ 158.334186][ T3424] ? crypto_cbcmac_digest_update+0x3ff/0x5a0 [ 158.334186][ T3424] ? shash_ahash_finup+0x79d/0xd00 [ 158.334186][ T3424] ? shash_async_finup+0xbf/0x110 [ 158.334186][ T3424] ? crypto_ahash_finup+0x244/0x500 [ 158.334186][ T3424] ? crypto_ccm_auth+0x14df/0x15a0 [ 158.334186][ T3424] ? crypto_ccm_encrypt+0x2ad/0x8b0 [ 158.334186][ T3424] ? crypto_aead_encrypt+0x116/0x1a0 [ 158.334186][ T3424] ? tls_push_record+0x2bbe/0x3bf0 [ 158.334186][ T3424] ? bpf_exec_tx_verdict+0x5ba/0x2530 [ 158.334186][ T3424] ? tls_sw_do_sendpage+0x1779/0x21f0 [ 158.334186][ T3424] ? tls_sw_sendpage+0x247/0x2b0 [ 158.334186][ T3424] ? inet_sendpage+0x1de/0x2f0 [ 158.334186][ T3424] ? kernel_sendpage+0x4cc/0x940 [ 158.334186][ T3424] ? sock_sendpage+0x162/0x220 [ 158.334186][ T3424] ? pipe_to_sendpage+0x3df/0x4f0 [ 158.334186][ T3424] ? __splice_from_pipe+0x5c7/0x1010 [ 158.334186][ T3424] ? generic_splice_sendpage+0x1c6/0x2a0 [ 158.334186][ T3424] ? do_splice+0x26d8/0x32f0 [ 158.334186][ T3424] ? __se_sys_splice+0x81f/0xba0 [ 158.334186][ T3424] ? __x64_sys_splice+0x1a1/0x200 [ 158.334186][ T3424] ? do_syscall_64+0x41/0x90 [ 158.334186][ T3424] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 158.334186][ T3424] ? filter_irq_stacks+0xb9/0x230 [ 158.334186][ T3424] ? __stack_depot_save+0x22/0x490 [ 158.334186][ T3424] ? kmsan_internal_set_shadow_origin+0x66/0xe0 [ 158.334186][ T3424] ? kmsan_internal_chain_origin+0x110/0x120 [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 [ 158.334186][ T3424] __msan_warning+0x98/0x120 [ 158.334186][ T3424] aes_encrypt+0x1692/0x1fa0 [ 158.334186][ T3424] aesti_encrypt+0xe1/0x160 [ 158.334186][ T3424] crypto_cipher_encrypt_one+0x1d1/0x2e0 [ 158.334186][ T3424] ? aesti_set_key+0xb0/0xb0 [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 [ 158.334186][ T3424] crypto_cbcmac_digest_update+0x3ff/0x5a0 [ 158.334186][ T3424] ? crypto_cbcmac_digest_init+0x140/0x140 [ 158.334186][ T3424] shash_ahash_finup+0x79d/0xd00 [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 [ 158.334186][ T3424] shash_async_finup+0xbf/0x110 [ 158.334186][ T3424] crypto_ahash_finup+0x244/0x500 [ 158.334186][ T3424] ? shash_async_final+0x3d0/0x3d0 [ 158.334186][ T3424] crypto_ccm_auth+0x14df/0x15a0 [ 158.334186][ T3424] crypto_ccm_encrypt+0x2ad/0x8b0 [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 [ 158.334186][ T3424] ? crypto_ccm_setauthsize+0x100/0x100 [ 158.334186][ T3424] crypto_aead_encrypt+0x116/0x1a0 [ 158.653332][ T3424] tls_push_record+0x2bbe/0x3bf0 [ 158.653332][ T3424] bpf_exec_tx_verdict+0x5ba/0x2530 [ 158.653332][ T3424] ? _printk+0x181/0x1b0 [ 158.653332][ T3424] ? tls_sw_do_sendpage+0xc81/0x21f0 [ 158.653332][ T3424] tls_sw_do_sendpage+0x1779/0x21f0 [ 158.653332][ T3424] tls_sw_sendpage+0x247/0x2b0 [ 158.653332][ T3424] ? tls_sw_do_sendpage+0x21f0/0x21f0 [ 158.653332][ T3424] inet_sendpage+0x1de/0x2f0 [ 158.653332][ T3424] ? inet_sendmsg+0x1d0/0x1d0 [ 158.653332][ T3424] kernel_sendpage+0x4cc/0x940 [ 158.653332][ T3424] sock_sendpage+0x162/0x220 [ 158.653332][ T3424] pipe_to_sendpage+0x3df/0x4f0 [ 158.653332][ T3424] ? sock_fasync+0x240/0x240 [ 158.653332][ T3424] __splice_from_pipe+0x5c7/0x1010 [ 158.653332][ T3424] ? generic_splice_sendpage+0x2a0/0x2a0 [ 158.653332][ T3424] generic_splice_sendpage+0x1c6/0x2a0 [ 158.653332][ T3424] ? iter_file_splice_write+0x1a30/0x1a30 [ 158.653332][ T3424] do_splice+0x26d8/0x32f0 [ 158.653332][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 [ 158.653332][ T3424] ? __se_sys_splice+0x292/0xba0 [ 158.653332][ T3424] ? __msan_metadata_ptr_for_load_8+0x24/0x40 [ 158.653332][ T3424] ? filter_irq_stacks+0xb9/0x230 [ 158.653332][ T3424] __se_sys_splice+0x81f/0xba0 [ 158.870673][ T3424] __x64_sys_splice+0x1a1/0x200 [ 158.870673][ T3424] do_syscall_64+0x41/0x90 [ 158.870673][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 158.870673][ T3424] RIP: 0033:0x7f6bbd51ea3d [ 158.895223][ T3424] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48 [ 158.895223][ T3424] RSP: 002b:00007f6bbd731e08 EFLAGS: 00000246 ORIG_RAX: 0000000000000113 [ 158.895223][ T3424] RAX: ffffffffffffffda RBX: 000055ccd9ea6080 RCX: 00007f6bbd51ea3d [ 158.895223][ T3424] RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000000000003 [ 158.895223][ T3424] RBP: 000055ccd9ea41f4 R08: 00080000fffffffc R09: 0000000000000000 [ 158.895223][ T3424] R10: 0000000000000000 R11: 0000000000000246 R12: 0100000000000000 [ 158.895223][ T3424] R13: e65b75b4ec4292eb R14: f2300cdb85a45425 R15: 000055ccd9ea6088 [ 159.041467][ T3424] </TASK> [ 159.041467][ T3424] Kernel Offset: disabled [ 159.041467][ T3424] Rebooting in 10 seconds.. ---------------------------------------- > > Could you explain why that bug contains ~50 reports that seem entirely > unrelated? AIUI, this actual issue has not been reproduced since > 2020?? Multiple different bugs are reported as the same problem. Reproducer is available for only bpf_exec_tx_verdict() one, and the reproducer still works. > > >> Config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=ee5f7a0b2e48ed66 >> Report: https://syzkaller.appspot.com/text?tag=CrashReport&x=1325260d900000 >> Syz repro: https://syzkaller.appspot.com/text?tag=ReproSyz&x=11af973e900000 >> C repro: https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000 >> >> The bug is reproducible for me locally as well (and Tetsuo's patch >> makes it disappear, although I have no opinion on its correctness). > > What I'd like to do is run a kernel plus initrd locally in OVMF and > reproduce the issue - can I do that without all the syzkaller > machinery? I'm using Ubuntu 22.04 on Oracle VM VirtualBox. I don't know if this can be reproduced with kernel plus initrd only. But since the C reproducer is standalone, syzkaller machinery is not involved.
On Fri, Jun 30, 2023 at 1:11 PM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2023/06/30 19:18, Ard Biesheuvel wrote: > > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: > >> > >> On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > >>> > >>> On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > >>> <penguin-kernel@i-love.sakura.ne.jp> wrote: > >>>> > >>>> On 2023/06/30 18:36, Ard Biesheuvel wrote: > >>>>> Why are you sending this now? > >>>> > >>>> Just because this is currently top crasher and I can reproduce locally. > >>>> > >>>>> Do you have a reproducer for this issue? > >>>> > >>>> Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > >>>> > >>> > >>> Could you please share your kernel config and the resulting kernel log > >>> when running the reproducer? I'll try to reproduce locally as well, > >>> and see if I can figure out what is going on in the crypto layer > >> > >> The config together with the repro is available at > >> https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > >> latest row of the "Crashes" table that contains a C repro. > > Kernel is commit e6bc8833d80f of https://github.com/google/kmsan/commits/master . > Config is available in the dashboard page, but a smaller one is available at > https://I-love.SAKURA.ne.jp/tmp/config-6.4.0-rc7-kmsan . > > I'm using a debug printk() patch shown below. > Please note that your patch is not correct, unless I am missing something. sk_page_frag() will use a per-thread frag allocator (current->task_frag), which might be fed with pages allocated from other sockets (standard TCP ones), without __GFP_ZERO. If we must (and I am not saying we should) take this route, we also need to force "sk->sk_use_task_frag = false" for TLS sockets.
On Fri, 30 Jun 2023 at 13:12, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2023/06/30 19:18, Ard Biesheuvel wrote: > > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: > >> > >> On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > >>> > >>> On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > >>> <penguin-kernel@i-love.sakura.ne.jp> wrote: > >>>> > >>>> On 2023/06/30 18:36, Ard Biesheuvel wrote: > >>>>> Why are you sending this now? > >>>> > >>>> Just because this is currently top crasher and I can reproduce locally. > >>>> > >>>>> Do you have a reproducer for this issue? > >>>> > >>>> Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > >>>> > >>> > >>> Could you please share your kernel config and the resulting kernel log > >>> when running the reproducer? I'll try to reproduce locally as well, > >>> and see if I can figure out what is going on in the crypto layer > >> > >> The config together with the repro is available at > >> https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > >> latest row of the "Crashes" table that contains a C repro. > > Kernel is commit e6bc8833d80f of https://github.com/google/kmsan/commits/master . That commit does not exist in that repo. Does it matter? > Config is available in the dashboard page, but a smaller one is available at > https://I-love.SAKURA.ne.jp/tmp/config-6.4.0-rc7-kmsan . > Thanks - I'll try to rebuild v6.4-rc7 with that. > I'm using a debug printk() patch shown below. > > ---------------------------------------- > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c > index 1a53c8f481e9..b32bb015995c 100644 > --- a/net/tls/tls_sw.c > +++ b/net/tls/tls_sw.c > @@ -1210,7 +1210,8 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page, > if (!sk_stream_memory_free(sk)) > goto wait_for_sndbuf; > alloc_payload: > - ret = tls_alloc_encrypted_msg(sk, required_size); > + ret = tls_alloc_encrypted_msg(sk, required_size); ///// > + pr_info("required_size=%d ret=%d\n", required_size, ret); > if (ret) { > if (ret != -ENOSPC) > goto wait_for_memory; > @@ -1232,7 +1233,9 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page, > > tls_ctx->pending_open_record_frags = true; > if (full_record || eor || sk_msg_full(msg_pl)) { > - ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, > + pr_info("full_record=%d eor=%d sk_msg_full(msg_pl)=%d copied=%d\n", > + full_record, eor, sk_msg_full(msg_pl), copied); > + ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, ///// > record_type, &copied, flags); > if (ret) { > if (ret == -EINPROGRESS) > ---------------------------------------- > > Output (on Ubuntu 22.04 on Oracle VM VirtualBox) is shown below. > Please check tendency of the sum of required_size= values up to the full_record= line. > It seems that the value of required_size= might vary depending on the timings, but > the sum of the values seems to have some rule. > > 4125+8221+12317+16413=41076 (the lower 4 bits are 0100) > 2461+6557+10653+14749+16413=50833 (the lower 4 bits are 0001) > 2461+6573+10669+14765+16413=50881 (the lower 4 bits are 0001) > > KMSAN reports this problem when the lower 4 bits became 0001 for the second time. > Unless KMSAN's reporting is asynchronous, maybe the reason of "for the second time" > part is that the previous state is relevant... > > ---------------------------------------- > [ 157.471712][ T3414] required_size=4125 ret=0 > [ 157.475879][ T3414] required_size=8221 ret=0 > [ 157.480471][ T3414] required_size=12317 ret=0 > [ 157.484604][ T3414] required_size=16413 ret=0 > [ 157.490499][ T3414] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=4096 > [ 157.513772][ T3414] required_size=4125 ret=0 > [ 157.523782][ T3414] required_size=8221 ret=0 > [ 157.533658][ T3414] required_size=12317 ret=0 > [ 157.539579][ T3414] required_size=16413 ret=0 > [ 157.543785][ T3414] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=4096 > [ 157.572869][ T3414] required_size=4125 ret=0 > [ 157.579350][ T3414] required_size=8221 ret=0 > [ 157.584699][ T3414] required_size=12317 ret=0 > [ 157.591756][ T3414] required_size=16413 ret=0 > [ 157.595891][ T3414] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=4096 > [ 157.790734][ T3424] required_size=2461 ret=0 > [ 157.800725][ T3424] required_size=6557 ret=0 > [ 157.804560][ T3424] required_size=10653 ret=0 > [ 157.808433][ T3424] required_size=14749 ret=0 > [ 157.810125][ T3424] required_size=16413 ret=0 > [ 157.829564][ T3424] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1664 > [ 157.848397][ T3424] required_size=2461 ret=0 > [ 157.854875][ T3424] required_size=6573 ret=0 > [ 157.860883][ T3424] required_size=10669 ret=0 > [ 157.865463][ T3424] required_size=14765 ret=0 > [ 157.871794][ T3424] required_size=16413 ret=0 > [ 157.877333][ T3424] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1648 > [ 157.885187][ T3424] ===================================================== > [ 157.887262][ T3424] BUG: KMSAN: uninit-value in aes_encrypt+0x1692/0x1fa0 > [ 157.887262][ T3424] aes_encrypt+0x1692/0x1fa0 > [ 157.887262][ T3424] aesti_encrypt+0xe1/0x160 > [ 157.887262][ T3424] crypto_cipher_encrypt_one+0x1d1/0x2e0 > [ 157.887262][ T3424] crypto_cbcmac_digest_update+0x3ff/0x5a0 > [ 157.887262][ T3424] shash_ahash_finup+0x79d/0xd00 > [ 157.887262][ T3424] shash_async_finup+0xbf/0x110 > [ 157.887262][ T3424] crypto_ahash_finup+0x244/0x500 > [ 157.887262][ T3424] crypto_ccm_auth+0x14df/0x15a0 > [ 157.887262][ T3424] crypto_ccm_encrypt+0x2ad/0x8b0 > [ 157.887262][ T3424] crypto_aead_encrypt+0x116/0x1a0 > [ 157.887262][ T3424] tls_push_record+0x2bbe/0x3bf0 > [ 157.887262][ T3424] bpf_exec_tx_verdict+0x5ba/0x2530 > [ 157.887262][ T3424] tls_sw_do_sendpage+0x1779/0x21f0 > [ 157.887262][ T3424] tls_sw_sendpage+0x247/0x2b0 > [ 157.887262][ T3424] inet_sendpage+0x1de/0x2f0 > [ 157.887262][ T3424] kernel_sendpage+0x4cc/0x940 > [ 158.004827][ T3424] sock_sendpage+0x162/0x220 > [ 158.004827][ T3424] pipe_to_sendpage+0x3df/0x4f0 > [ 158.004827][ T3424] __splice_from_pipe+0x5c7/0x1010 > [ 158.004827][ T3424] generic_splice_sendpage+0x1c6/0x2a0 > [ 158.004827][ T3424] do_splice+0x26d8/0x32f0 > [ 158.004827][ T3424] __se_sys_splice+0x81f/0xba0 > [ 158.004827][ T3424] __x64_sys_splice+0x1a1/0x200 > [ 158.004827][ T3424] do_syscall_64+0x41/0x90 > [ 158.004827][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [ 158.004827][ T3424] > [ 158.004827][ T3424] Uninit was stored to memory at: > [ 158.004827][ T3424] __crypto_xor+0x285/0x1700 > [ 158.004827][ T3424] crypto_cbcmac_digest_update+0x2ba/0x5a0 > [ 158.004827][ T3424] shash_ahash_finup+0x79d/0xd00 > [ 158.004827][ T3424] shash_async_finup+0xbf/0x110 > [ 158.004827][ T3424] crypto_ahash_finup+0x244/0x500 > [ 158.004827][ T3424] crypto_ccm_auth+0x14df/0x15a0 > [ 158.004827][ T3424] crypto_ccm_encrypt+0x2ad/0x8b0 > [ 158.004827][ T3424] crypto_aead_encrypt+0x116/0x1a0 > [ 158.004827][ T3424] tls_push_record+0x2bbe/0x3bf0 > [ 158.004827][ T3424] bpf_exec_tx_verdict+0x5ba/0x2530 > [ 158.004827][ T3424] tls_sw_do_sendpage+0x1779/0x21f0 > [ 158.004827][ T3424] tls_sw_sendpage+0x247/0x2b0 > [ 158.004827][ T3424] inet_sendpage+0x1de/0x2f0 > [ 158.004827][ T3424] kernel_sendpage+0x4cc/0x940 > [ 158.004827][ T3424] sock_sendpage+0x162/0x220 > [ 158.004827][ T3424] pipe_to_sendpage+0x3df/0x4f0 > [ 158.004827][ T3424] __splice_from_pipe+0x5c7/0x1010 > [ 158.004827][ T3424] generic_splice_sendpage+0x1c6/0x2a0 > [ 158.004827][ T3424] do_splice+0x26d8/0x32f0 > [ 158.004827][ T3424] __se_sys_splice+0x81f/0xba0 > [ 158.004827][ T3424] __x64_sys_splice+0x1a1/0x200 > [ 158.004827][ T3424] do_syscall_64+0x41/0x90 > [ 158.004827][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [ 158.004827][ T3424] > [ 158.004827][ T3424] Uninit was created at: > [ 158.004827][ T3424] __alloc_pages+0x925/0x1050 > [ 158.004827][ T3424] alloc_pages+0xe30/0x11b0 > [ 158.004827][ T3424] skb_page_frag_refill+0x362/0x910 > [ 158.004827][ T3424] sk_page_frag_refill+0xa2/0x1c0 > [ 158.004827][ T3424] sk_msg_alloc+0x278/0x1560 > [ 158.004827][ T3424] tls_sw_do_sendpage+0xbec/0x21f0 > [ 158.004827][ T3424] tls_sw_sendpage+0x247/0x2b0 > [ 158.004827][ T3424] inet_sendpage+0x1de/0x2f0 > [ 158.004827][ T3424] kernel_sendpage+0x4cc/0x940 > [ 158.004827][ T3424] sock_sendpage+0x162/0x220 > [ 158.004827][ T3424] pipe_to_sendpage+0x3df/0x4f0 > [ 158.004827][ T3424] __splice_from_pipe+0x5c7/0x1010 > [ 158.004827][ T3424] generic_splice_sendpage+0x1c6/0x2a0 > [ 158.260226][ T3424] do_splice+0x26d8/0x32f0 > [ 158.260226][ T3424] __se_sys_splice+0x81f/0xba0 > [ 158.260226][ T3424] __x64_sys_splice+0x1a1/0x200 > [ 158.260226][ T3424] do_syscall_64+0x41/0x90 > [ 158.260226][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [ 158.260226][ T3424] > [ 158.260226][ T3424] CPU: 7 PID: 3424 Comm: a.out Not tainted 6.4.0-rc7-ge6bc8833d80f-dirty #26 > [ 158.260226][ T3424] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 > [ 158.260226][ T3424] ===================================================== > [ 158.260226][ T3424] Disabling lock debugging due to kernel taint > [ 158.260226][ T3424] Kernel panic - not syncing: kmsan.panic set ... > [ 158.260226][ T3424] CPU: 7 PID: 3424 Comm: a.out Tainted: G B 6.4.0-rc7-ge6bc8833d80f-dirty #26 > [ 158.320898][ T3424] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 > [ 158.334186][ T3424] Call Trace: > [ 158.334186][ T3424] <TASK> > [ 158.334186][ T3424] dump_stack_lvl+0x1f6/0x280 > [ 158.334186][ T3424] dump_stack+0x29/0x30 > [ 158.334186][ T3424] panic+0x4e7/0xc60 > [ 158.334186][ T3424] ? add_taint+0x185/0x210 > [ 158.334186][ T3424] kmsan_report+0x2d1/0x2e0 > [ 158.334186][ T3424] ? __msan_warning+0x98/0x120 > [ 158.334186][ T3424] ? aes_encrypt+0x1692/0x1fa0 > [ 158.334186][ T3424] ? aesti_encrypt+0xe1/0x160 > [ 158.334186][ T3424] ? crypto_cipher_encrypt_one+0x1d1/0x2e0 > [ 158.334186][ T3424] ? crypto_cbcmac_digest_update+0x3ff/0x5a0 > [ 158.334186][ T3424] ? shash_ahash_finup+0x79d/0xd00 > [ 158.334186][ T3424] ? shash_async_finup+0xbf/0x110 > [ 158.334186][ T3424] ? crypto_ahash_finup+0x244/0x500 > [ 158.334186][ T3424] ? crypto_ccm_auth+0x14df/0x15a0 > [ 158.334186][ T3424] ? crypto_ccm_encrypt+0x2ad/0x8b0 > [ 158.334186][ T3424] ? crypto_aead_encrypt+0x116/0x1a0 > [ 158.334186][ T3424] ? tls_push_record+0x2bbe/0x3bf0 > [ 158.334186][ T3424] ? bpf_exec_tx_verdict+0x5ba/0x2530 > [ 158.334186][ T3424] ? tls_sw_do_sendpage+0x1779/0x21f0 > [ 158.334186][ T3424] ? tls_sw_sendpage+0x247/0x2b0 > [ 158.334186][ T3424] ? inet_sendpage+0x1de/0x2f0 > [ 158.334186][ T3424] ? kernel_sendpage+0x4cc/0x940 > [ 158.334186][ T3424] ? sock_sendpage+0x162/0x220 > [ 158.334186][ T3424] ? pipe_to_sendpage+0x3df/0x4f0 > [ 158.334186][ T3424] ? __splice_from_pipe+0x5c7/0x1010 > [ 158.334186][ T3424] ? generic_splice_sendpage+0x1c6/0x2a0 > [ 158.334186][ T3424] ? do_splice+0x26d8/0x32f0 > [ 158.334186][ T3424] ? __se_sys_splice+0x81f/0xba0 > [ 158.334186][ T3424] ? __x64_sys_splice+0x1a1/0x200 > [ 158.334186][ T3424] ? do_syscall_64+0x41/0x90 > [ 158.334186][ T3424] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd > [ 158.334186][ T3424] ? filter_irq_stacks+0xb9/0x230 > [ 158.334186][ T3424] ? __stack_depot_save+0x22/0x490 > [ 158.334186][ T3424] ? kmsan_internal_set_shadow_origin+0x66/0xe0 > [ 158.334186][ T3424] ? kmsan_internal_chain_origin+0x110/0x120 > [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 > [ 158.334186][ T3424] __msan_warning+0x98/0x120 > [ 158.334186][ T3424] aes_encrypt+0x1692/0x1fa0 > [ 158.334186][ T3424] aesti_encrypt+0xe1/0x160 > [ 158.334186][ T3424] crypto_cipher_encrypt_one+0x1d1/0x2e0 > [ 158.334186][ T3424] ? aesti_set_key+0xb0/0xb0 > [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 > [ 158.334186][ T3424] crypto_cbcmac_digest_update+0x3ff/0x5a0 > [ 158.334186][ T3424] ? crypto_cbcmac_digest_init+0x140/0x140 > [ 158.334186][ T3424] shash_ahash_finup+0x79d/0xd00 > [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 > [ 158.334186][ T3424] shash_async_finup+0xbf/0x110 > [ 158.334186][ T3424] crypto_ahash_finup+0x244/0x500 > [ 158.334186][ T3424] ? shash_async_final+0x3d0/0x3d0 > [ 158.334186][ T3424] crypto_ccm_auth+0x14df/0x15a0 > [ 158.334186][ T3424] crypto_ccm_encrypt+0x2ad/0x8b0 > [ 158.334186][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 > [ 158.334186][ T3424] ? crypto_ccm_setauthsize+0x100/0x100 > [ 158.334186][ T3424] crypto_aead_encrypt+0x116/0x1a0 > [ 158.653332][ T3424] tls_push_record+0x2bbe/0x3bf0 > [ 158.653332][ T3424] bpf_exec_tx_verdict+0x5ba/0x2530 > [ 158.653332][ T3424] ? _printk+0x181/0x1b0 > [ 158.653332][ T3424] ? tls_sw_do_sendpage+0xc81/0x21f0 > [ 158.653332][ T3424] tls_sw_do_sendpage+0x1779/0x21f0 > [ 158.653332][ T3424] tls_sw_sendpage+0x247/0x2b0 > [ 158.653332][ T3424] ? tls_sw_do_sendpage+0x21f0/0x21f0 > [ 158.653332][ T3424] inet_sendpage+0x1de/0x2f0 > [ 158.653332][ T3424] ? inet_sendmsg+0x1d0/0x1d0 > [ 158.653332][ T3424] kernel_sendpage+0x4cc/0x940 > [ 158.653332][ T3424] sock_sendpage+0x162/0x220 > [ 158.653332][ T3424] pipe_to_sendpage+0x3df/0x4f0 > [ 158.653332][ T3424] ? sock_fasync+0x240/0x240 > [ 158.653332][ T3424] __splice_from_pipe+0x5c7/0x1010 > [ 158.653332][ T3424] ? generic_splice_sendpage+0x2a0/0x2a0 > [ 158.653332][ T3424] generic_splice_sendpage+0x1c6/0x2a0 > [ 158.653332][ T3424] ? iter_file_splice_write+0x1a30/0x1a30 > [ 158.653332][ T3424] do_splice+0x26d8/0x32f0 > [ 158.653332][ T3424] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 > [ 158.653332][ T3424] ? __se_sys_splice+0x292/0xba0 > [ 158.653332][ T3424] ? __msan_metadata_ptr_for_load_8+0x24/0x40 > [ 158.653332][ T3424] ? filter_irq_stacks+0xb9/0x230 > [ 158.653332][ T3424] __se_sys_splice+0x81f/0xba0 > [ 158.870673][ T3424] __x64_sys_splice+0x1a1/0x200 > [ 158.870673][ T3424] do_syscall_64+0x41/0x90 > [ 158.870673][ T3424] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [ 158.870673][ T3424] RIP: 0033:0x7f6bbd51ea3d > [ 158.895223][ T3424] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48 > [ 158.895223][ T3424] RSP: 002b:00007f6bbd731e08 EFLAGS: 00000246 ORIG_RAX: 0000000000000113 > [ 158.895223][ T3424] RAX: ffffffffffffffda RBX: 000055ccd9ea6080 RCX: 00007f6bbd51ea3d > [ 158.895223][ T3424] RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000000000003 > [ 158.895223][ T3424] RBP: 000055ccd9ea41f4 R08: 00080000fffffffc R09: 0000000000000000 > [ 158.895223][ T3424] R10: 0000000000000000 R11: 0000000000000246 R12: 0100000000000000 > [ 158.895223][ T3424] R13: e65b75b4ec4292eb R14: f2300cdb85a45425 R15: 000055ccd9ea6088 > [ 159.041467][ T3424] </TASK> > [ 159.041467][ T3424] Kernel Offset: disabled > [ 159.041467][ T3424] Rebooting in 10 seconds.. > ---------------------------------------- > > > > > Could you explain why that bug contains ~50 reports that seem entirely > > unrelated? AIUI, this actual issue has not been reproduced since > > 2020?? > > Multiple different bugs are reported as the same problem. > Reproducer is available for only bpf_exec_tx_verdict() one, and the reproducer still works. > > > > > > >> Config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=ee5f7a0b2e48ed66 > >> Report: https://syzkaller.appspot.com/text?tag=CrashReport&x=1325260d900000 > >> Syz repro: https://syzkaller.appspot.com/text?tag=ReproSyz&x=11af973e900000 > >> C repro: https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000 > >> > >> The bug is reproducible for me locally as well (and Tetsuo's patch > >> makes it disappear, although I have no opinion on its correctness). > > > > What I'd like to do is run a kernel plus initrd locally in OVMF and > > reproduce the issue - can I do that without all the syzkaller > > machinery? > > I'm using Ubuntu 22.04 on Oracle VM VirtualBox. > I don't know if this can be reproduced with kernel plus initrd only. But > since the C reproducer is standalone, syzkaller machinery is not involved. >
On Fri, Jun 30, 2023 at 12:18 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: > > > > On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > On 2023/06/30 18:36, Ard Biesheuvel wrote: > > > > > Why are you sending this now? > > > > > > > > Just because this is currently top crasher and I can reproduce locally. > > > > > > > > > Do you have a reproducer for this issue? > > > > > > > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > > > > > > > > > > Could you please share your kernel config and the resulting kernel log > > > when running the reproducer? I'll try to reproduce locally as well, > > > and see if I can figure out what is going on in the crypto layer > > > > The config together with the repro is available at > > https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > > latest row of the "Crashes" table that contains a C repro. > > Could you explain why that bug contains ~50 reports that seem entirely > unrelated? These are some unfortunate effects of syzbot trying to deduplicate bugs. There's a tradeoff between reporting every single crash separately and grouping together those that have e.g. the same origin. Applying this algorithm transitively results in bigger clusters containing unwanted reports. We'll look closer. > AIUI, this actual issue has not been reproduced since > 2020?? Oh, sorry, I misread the table and misinformed you. The topmost row of the table is indeed the _oldest_ one. Another manifestation of the bug was on 2023/05/23 (https://syzkaller.appspot.com/text?tag=CrashReport&x=146f66b1280000) > > > Config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=ee5f7a0b2e48ed66 > > Report: https://syzkaller.appspot.com/text?tag=CrashReport&x=1325260d900000 > > Syz repro: https://syzkaller.appspot.com/text?tag=ReproSyz&x=11af973e900000 > > C repro: https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000 > > > > The bug is reproducible for me locally as well (and Tetsuo's patch > > makes it disappear, although I have no opinion on its correctness). > > What I'd like to do is run a kernel plus initrd locally in OVMF and > reproduce the issue - can I do that without all the syzkaller > machinery? You can build the kernel from the config linked above, that's what I did to reproduce it locally. As for initrd, there are disk images attached to the reports, will that help? E.g. $ wget https://storage.googleapis.com/syzbot-assets/79bb4ff7cc58/disk-f93f2fed.raw.xz $ unxz disk-f93f2fed.raw.xz $ qemu-system-x86_64 -smp 2,sockets=2,cores=1 -m 4G -drive file=disk-f93f2fed.raw -snapshot -nographic -enable-kvm lets me boot syzkaller with the disk/kernel from that report of 2023/05/23. Adding "-net user,hostfwd=tcp::10022-:22 -net nic,model=e1000" I am also able to SSH into the machine (there's no password): $ ssh -o "StrictHostKeyChecking no" -p 10022 root@localhost Then the repro can be downloaded and executed: $ wget "https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000" -O t.c $ gcc t.c -static -o t $ scp -o "StrictHostKeyChecking no" -P 10022 t root@localhost: $ ssh -o "StrictHostKeyChecking no" -p 10022 root@localhost ./t Within a couple minutes the kernel crashes with the report: [ 151.522472][ T5865] ===================================================== [ 151.523843][ T5865] BUG: KMSAN: uninit-value in aes_encrypt+0x15cc/0x1db0 [ 151.525120][ T5865] aes_encrypt+0x15cc/0x1db0 [ 151.526113][ T5865] aesti_encrypt+0x7d/0xf0 [ 151.527057][ T5865] crypto_cipher_encrypt_one+0x112/0x200 [ 151.528224][ T5865] crypto_cbcmac_digest_update+0x301/0x4b0 The vmlinux binary (also available at the syzbot page) can be used to symbolize this report, but perhaps at this point you'd love to switch to your own kernel. -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg
> > >>> > > >>> Could you please share your kernel config and the resulting kernel log > > >>> when running the reproducer? I'll try to reproduce locally as well, > > >>> and see if I can figure out what is going on in the crypto layer > > >> > > >> The config together with the repro is available at > > >> https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > > >> latest row of the "Crashes" table that contains a C repro. > > > > Kernel is commit e6bc8833d80f of https://github.com/google/kmsan/commits/master . > > That commit does not exist in that repo. Does it matter? Apologies for this mess. https://github.com/google/kmsan/commits/master is force-updated once a week to point to the latest release candidate with KMSAN-specific patches. Older releases are called e.g. kmsan-v6.4-rc7. Right now there's only one patch required to run torvalds/master with KMSAN (https://github.com/google/kmsan/commit/e6bc8833d80f). That patch will hit upstream in v6.5-rc1, after which I am going to switch syzbot to test upstream. For such a long-standing bug the exact version of KMSAN shouldn't matter.
On Fri, 30 Jun 2023 at 13:38, Alexander Potapenko <glider@google.com> wrote: > > On Fri, Jun 30, 2023 at 12:18 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: > > > > > > On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > > > On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > > > On 2023/06/30 18:36, Ard Biesheuvel wrote: > > > > > > Why are you sending this now? > > > > > > > > > > Just because this is currently top crasher and I can reproduce locally. > > > > > > > > > > > Do you have a reproducer for this issue? > > > > > > > > > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > > > > > > > > > > > > > Could you please share your kernel config and the resulting kernel log > > > > when running the reproducer? I'll try to reproduce locally as well, > > > > and see if I can figure out what is going on in the crypto layer > > > > > > The config together with the repro is available at > > > https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > > > latest row of the "Crashes" table that contains a C repro. > > > > Could you explain why that bug contains ~50 reports that seem entirely > > unrelated? > > These are some unfortunate effects of syzbot trying to deduplicate > bugs. There's a tradeoff between reporting every single crash > separately and grouping together those that have e.g. the same origin. > Applying this algorithm transitively results in bigger clusters > containing unwanted reports. > We'll look closer. > > > AIUI, this actual issue has not been reproduced since > > 2020?? > > Oh, sorry, I misread the table and misinformed you. The topmost row of > the table is indeed the _oldest_ one. > Another manifestation of the bug was on 2023/05/23 > (https://syzkaller.appspot.com/text?tag=CrashReport&x=146f66b1280000) > That one has nothing to do with networking, so I don't see how this patch would affect it. > > > > > > Config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=ee5f7a0b2e48ed66 > > > Report: https://syzkaller.appspot.com/text?tag=CrashReport&x=1325260d900000 > > > Syz repro: https://syzkaller.appspot.com/text?tag=ReproSyz&x=11af973e900000 > > > C repro: https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000 > > > > > > The bug is reproducible for me locally as well (and Tetsuo's patch > > > makes it disappear, although I have no opinion on its correctness). > > > > What I'd like to do is run a kernel plus initrd locally in OVMF and > > reproduce the issue - can I do that without all the syzkaller > > machinery? > > You can build the kernel from the config linked above, that's what I > did to reproduce it locally. > As for initrd, there are disk images attached to the reports, will that help? > > E.g. > $ wget https://storage.googleapis.com/syzbot-assets/79bb4ff7cc58/disk-f93f2fed.raw.xz > $ unxz disk-f93f2fed.raw.xz > $ qemu-system-x86_64 -smp 2,sockets=2,cores=1 -m 4G -drive > file=disk-f93f2fed.raw -snapshot -nographic -enable-kvm > > lets me boot syzkaller with the disk/kernel from that report of 2023/05/23. > Adding "-net user,hostfwd=tcp::10022-:22 -net nic,model=e1000" I am > also able to SSH into the machine (there's no password): > > $ ssh -o "StrictHostKeyChecking no" -p 10022 root@localhost > > Then the repro can be downloaded and executed: > > $ wget "https://syzkaller.appspot.com/text?tag=ReproC&x=163a1e45900000" -O t.c > $ gcc t.c -static -o t > $ scp -o "StrictHostKeyChecking no" -P 10022 t root@localhost: > $ ssh -o "StrictHostKeyChecking no" -p 10022 root@localhost ./t > > Within a couple minutes the kernel crashes with the report: > > [ 151.522472][ T5865] ===================================================== > [ 151.523843][ T5865] BUG: KMSAN: uninit-value in aes_encrypt+0x15cc/0x1db0 > [ 151.525120][ T5865] aes_encrypt+0x15cc/0x1db0 > [ 151.526113][ T5865] aesti_encrypt+0x7d/0xf0 > [ 151.527057][ T5865] crypto_cipher_encrypt_one+0x112/0x200 > [ 151.528224][ T5865] crypto_cbcmac_digest_update+0x301/0x4b0 > OK, thanks for the instructions. Out of curiosity - does the stack trace you cut off here include the BPF routine mentioned in the report?
On Fri, Jun 30, 2023 at 1:49 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Fri, 30 Jun 2023 at 13:38, Alexander Potapenko <glider@google.com> wrote: > > > > On Fri, Jun 30, 2023 at 12:18 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: > > > > > > > > On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > > > > > On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > > > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > > > > > On 2023/06/30 18:36, Ard Biesheuvel wrote: > > > > > > > Why are you sending this now? > > > > > > > > > > > > Just because this is currently top crasher and I can reproduce locally. > > > > > > > > > > > > > Do you have a reproducer for this issue? > > > > > > > > > > > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > > > > > > > > > > > > > > > > Could you please share your kernel config and the resulting kernel log > > > > > when running the reproducer? I'll try to reproduce locally as well, > > > > > and see if I can figure out what is going on in the crypto layer > > > > > > > > The config together with the repro is available at > > > > https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > > > > latest row of the "Crashes" table that contains a C repro. > > > > > > Could you explain why that bug contains ~50 reports that seem entirely > > > unrelated? > > > > These are some unfortunate effects of syzbot trying to deduplicate > > bugs. There's a tradeoff between reporting every single crash > > separately and grouping together those that have e.g. the same origin. > > Applying this algorithm transitively results in bigger clusters > > containing unwanted reports. > > We'll look closer. > > > > > AIUI, this actual issue has not been reproduced since > > > 2020?? > > > > Oh, sorry, I misread the table and misinformed you. The topmost row of > > the table is indeed the _oldest_ one. > > Another manifestation of the bug was on 2023/05/23 > > (https://syzkaller.appspot.com/text?tag=CrashReport&x=146f66b1280000) > > > > That one has nothing to do with networking, so I don't see how this > patch would affect it. I definitely have to be more attentive. You are right that this bug report is also unrelated. Yet it is still fine to use the build artifacts corresponding to it (which is what I did). I'll investigate why so many reports got clustered into this one. > OK, thanks for the instructions. > > Out of curiosity - does the stack trace you cut off here include the > BPF routine mentioned in the report? It does: [ 151.522472][ T5865] ===================================================== [ 151.523843][ T5865] BUG: KMSAN: uninit-value in aes_encrypt+0x15cc/0x1db0 [ 151.525120][ T5865] aes_encrypt+0x15cc/0x1db0 [ 151.526113][ T5865] aesti_encrypt+0x7d/0xf0 [ 151.527057][ T5865] crypto_cipher_encrypt_one+0x112/0x200 [ 151.528224][ T5865] crypto_cbcmac_digest_update+0x301/0x4b0 [ 151.529459][ T5865] shash_ahash_finup+0x66e/0xc00 [ 151.530541][ T5865] shash_async_finup+0x7f/0xc0 [ 151.531542][ T5865] crypto_ahash_finup+0x1b8/0x3e0 [ 151.532583][ T5865] crypto_ccm_auth+0x1269/0x1350 [ 151.533606][ T5865] crypto_ccm_encrypt+0x1c9/0x7a0 [ 151.534650][ T5865] crypto_aead_encrypt+0xe0/0x150 [ 151.535695][ T5865] tls_push_record+0x3bf3/0x4ec0 [ 151.539491][ T5865] bpf_exec_tx_verdict+0x46e/0x21d0 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [ 151.540597][ T5865] tls_sw_do_sendpage+0x1150/0x1ad0 [ 151.541594][ T5865] tls_sw_sendpage+0x15b/0x1b0 [ 151.542500][ T5865] inet_sendpage+0x138/0x210 [ 151.543365][ T5865] kernel_sendpage+0x34c/0x6d0 [ 151.544269][ T5865] sock_sendpage+0xb0/0x160 [ 151.545117][ T5865] pipe_to_sendpage+0x304/0x3f0 [ 151.546051][ T5865] __splice_from_pipe+0x438/0xc20 [ 151.547013][ T5865] generic_splice_sendpage+0x100/0x160 [ 151.548068][ T5865] do_splice+0x213b/0x2d10 [ 151.548933][ T5865] __se_sys_splice+0x5ad/0x8f0 [ 151.549851][ T5865] __x64_sys_splice+0x11b/0x1a0 [ 151.550790][ T5865] do_syscall_64+0x41/0xc0 [ 151.551646][ T5865] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 151.552773][ T5865] [ 151.553220][ T5865] Uninit was stored to memory at: [ 151.554212][ T5865] __crypto_xor+0x171/0x1310 [ 151.555062][ T5865] crypto_cbcmac_digest_update+0x208/0x4b0 [ 151.556132][ T5865] shash_ahash_finup+0x66e/0xc00 [ 151.557084][ T5865] shash_async_finup+0x7f/0xc0 [ 151.557989][ T5865] crypto_ahash_finup+0x1b8/0x3e0 [ 151.558941][ T5865] crypto_ccm_auth+0x1269/0x1350 [ 151.559874][ T5865] crypto_ccm_encrypt+0x1c9/0x7a0 [ 151.560812][ T5865] crypto_aead_encrypt+0xe0/0x150 [ 151.561749][ T5865] tls_push_record+0x3bf3/0x4ec0 [ 151.562835][ T5865] bpf_exec_tx_verdict+0x46e/0x21d0 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [ 151.563967][ T5865] tls_sw_do_sendpage+0x1150/0x1ad0 [ 151.565075][ T5865] tls_sw_sendpage+0x15b/0x1b0 [ 151.566107][ T5865] inet_sendpage+0x138/0x210 [ 151.567078][ T5865] kernel_sendpage+0x34c/0x6d0 [ 151.568087][ T5865] sock_sendpage+0xb0/0x160 [ 151.568960][ T5865] pipe_to_sendpage+0x304/0x3f0 [ 151.569909][ T5865] __splice_from_pipe+0x438/0xc20 [ 151.570886][ T5865] generic_splice_sendpage+0x100/0x160 [ 151.571946][ T5865] do_splice+0x213b/0x2d10 [ 151.572810][ T5865] __se_sys_splice+0x5ad/0x8f0 [ 151.573732][ T5865] __x64_sys_splice+0x11b/0x1a0 [ 151.574664][ T5865] do_syscall_64+0x41/0xc0 [ 151.575513][ T5865] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 151.576640][ T5865] [ 151.577084][ T5865] Uninit was created at: [ 151.577949][ T5865] __alloc_pages+0x9a4/0xe00 [ 151.578849][ T5865] alloc_pages+0xd01/0x1040 [ 151.579729][ T5865] skb_page_frag_refill+0x2bf/0x7c0 [ 151.580752][ T5865] sk_page_frag_refill+0x59/0x130 [ 151.581720][ T5865] sk_msg_alloc+0x198/0x10d0 [ 151.582611][ T5865] tls_sw_do_sendpage+0x98a/0x1ad0 [ 151.583599][ T5865] tls_sw_sendpage+0x15b/0x1b0 [ 151.584535][ T5865] inet_sendpage+0x138/0x210 [ 151.585404][ T5865] kernel_sendpage+0x34c/0x6d0 [ 151.586275][ T5865] sock_sendpage+0xb0/0x160 [ 151.587099][ T5865] pipe_to_sendpage+0x304/0x3f0 [ 151.588023][ T5865] __splice_from_pipe+0x438/0xc20 [ 151.588981][ T5865] generic_splice_sendpage+0x100/0x160 [ 151.590032][ T5865] do_splice+0x213b/0x2d10 [ 151.590910][ T5865] __se_sys_splice+0x5ad/0x8f0 [ 151.591840][ T5865] __x64_sys_splice+0x11b/0x1a0 [ 151.592780][ T5865] do_syscall_64+0x41/0xc0 [ 151.593748][ T5865] entry_SYSCALL_64_after_hwframe+0x63/0xcd
On Fri, 30 Jun 2023 at 13:55, Alexander Potapenko <glider@google.com> wrote: > > On Fri, Jun 30, 2023 at 1:49 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Fri, 30 Jun 2023 at 13:38, Alexander Potapenko <glider@google.com> wrote: > > > > > > On Fri, Jun 30, 2023 at 12:18 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > > > On Fri, 30 Jun 2023 at 12:11, Alexander Potapenko <glider@google.com> wrote: > > > > > > > > > > On Fri, Jun 30, 2023 at 12:02 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > > > > > > > On Fri, 30 Jun 2023 at 11:53, Tetsuo Handa > > > > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > > > > > > > On 2023/06/30 18:36, Ard Biesheuvel wrote: > > > > > > > > Why are you sending this now? > > > > > > > > > > > > > > Just because this is currently top crasher and I can reproduce locally. > > > > > > > > > > > > > > > Do you have a reproducer for this issue? > > > > > > > > > > > > > > Yes. https://syzkaller.appspot.com/text?tag=ReproC&x=12931621900000 works. > > > > > > > > > > > > > > > > > > > Could you please share your kernel config and the resulting kernel log > > > > > > when running the reproducer? I'll try to reproduce locally as well, > > > > > > and see if I can figure out what is going on in the crypto layer > > > > > > > > > > The config together with the repro is available at > > > > > https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d, see the > > > > > latest row of the "Crashes" table that contains a C repro. > > > > > > > > Could you explain why that bug contains ~50 reports that seem entirely > > > > unrelated? > > > > > > These are some unfortunate effects of syzbot trying to deduplicate > > > bugs. There's a tradeoff between reporting every single crash > > > separately and grouping together those that have e.g. the same origin. > > > Applying this algorithm transitively results in bigger clusters > > > containing unwanted reports. > > > We'll look closer. > > > > > > > AIUI, this actual issue has not been reproduced since > > > > 2020?? > > > > > > Oh, sorry, I misread the table and misinformed you. The topmost row of > > > the table is indeed the _oldest_ one. > > > Another manifestation of the bug was on 2023/05/23 > > > (https://syzkaller.appspot.com/text?tag=CrashReport&x=146f66b1280000) > > > > > > > That one has nothing to do with networking, so I don't see how this > > patch would affect it. > > I definitely have to be more attentive. > You are right that this bug report is also unrelated. Yet it is still > fine to use the build artifacts corresponding to it (which is what I > did). > I'll investigate why so many reports got clustered into this one. > > > > > OK, thanks for the instructions. > > > > Out of curiosity - does the stack trace you cut off here include the > > BPF routine mentioned in the report? > > It does: > > [ 151.522472][ T5865] ===================================================== > [ 151.523843][ T5865] BUG: KMSAN: uninit-value in aes_encrypt+0x15cc/0x1db0 > [ 151.525120][ T5865] aes_encrypt+0x15cc/0x1db0 > [ 151.526113][ T5865] aesti_encrypt+0x7d/0xf0 > [ 151.527057][ T5865] crypto_cipher_encrypt_one+0x112/0x200 > [ 151.528224][ T5865] crypto_cbcmac_digest_update+0x301/0x4b0 > [ 151.529459][ T5865] shash_ahash_finup+0x66e/0xc00 > [ 151.530541][ T5865] shash_async_finup+0x7f/0xc0 > [ 151.531542][ T5865] crypto_ahash_finup+0x1b8/0x3e0 > [ 151.532583][ T5865] crypto_ccm_auth+0x1269/0x1350 > [ 151.533606][ T5865] crypto_ccm_encrypt+0x1c9/0x7a0 > [ 151.534650][ T5865] crypto_aead_encrypt+0xe0/0x150 > [ 151.535695][ T5865] tls_push_record+0x3bf3/0x4ec0 > [ 151.539491][ T5865] bpf_exec_tx_verdict+0x46e/0x21d0 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > [ 151.540597][ T5865] tls_sw_do_sendpage+0x1150/0x1ad0 OK, so after poking around a little bit, I have managed to confirm that the problem is in the TLS layer, and I am a bit out of my depth debugging that. With the debugging code below applied, KMSAN triggers on an uninit-memory in the input scatterlist provided by the TLS layer into the CCM code. [ 148.375852][ T2424] ===================================================== [ 148.377269][ T2424] BUG: KMSAN: uninit-value in tls_push_record+0x2d9f/0x3eb0 [ 148.378623][ T2424] tls_push_record+0x2d9f/0x3eb0 [ 148.379559][ T2424] bpf_exec_tx_verdict+0x5ba/0x2530 [ 148.380534][ T2424] tls_sw_do_sendpage+0x169c/0x1f80 [ 148.381519][ T2424] tls_sw_sendpage+0x247/0x2b0 ... [ 148.411559][ T2424] [ 148.412108][ T2424] Bytes 0-15 of 16 are uninitialized [ 148.413379][ T2424] Memory access of size 16 starts at ffff8880157889c7 Note that this is the *input* scatterlist containing the AAD (additional authenticated data) and the crypto input, and so there is definitely a bug here that shouldn't be papered over by zero'ing the allocation. --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -543,6 +543,21 @@ static int tls_do_encryption(struct sock *sk, list_add_tail((struct list_head *)&rec->list, &ctx->tx_list); atomic_inc(&ctx->encrypt_pending); + { + int len = aead_req->assoclen + aead_req->cryptlen; + struct sg_mapping_iter miter; + + sg_miter_start(&miter, rec->sg_aead_in, + sg_nents(rec->sg_aead_in), + SG_MITER_TO_SG | SG_MITER_ATOMIC); + + while (len > 0 && sg_miter_next(&miter)) { + kmsan_check_memory(miter.addr, min(len, (int)miter.length)); + len -= miter.length; + } + sg_miter_stop(&miter); + } + The reason that this cascades all the way down to the AES cipher code appears to be that sbox substitution involves array indexing, which is one of the actions KMSAN qualifies as 'use' of uninit data.
On Fri, 30 Jun 2023 17:16:06 +0200 Ard Biesheuvel wrote: > Note that this is the *input* scatterlist containing the AAD > (additional authenticated data) and the crypto input, and so there is > definitely a bug here that shouldn't be papered over by zero'ing the > allocation. Noob question, it's not the tag / AAD, right? We definitely don't init that..
On Fri, 30 Jun 2023 at 17:27, Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 30 Jun 2023 17:16:06 +0200 Ard Biesheuvel wrote: > > Note that this is the *input* scatterlist containing the AAD > > (additional authenticated data) and the crypto input, and so there is > > definitely a bug here that shouldn't be papered over by zero'ing the > > allocation. > > Noob question, it's not the tag / AAD, right? We definitely don't init > that.. assoclen + cryptlen does not cover the tag on encryption, so it would be omitted here regardless of whether it was covered in the input scatterlist or not.
I'm trying to write a simplified reproducer that reproduces ---------------------------------------- [ 162.905919][ T3399] required_size=2461 ret=0 [ 162.916796][ T3399] required_size=6557 ret=0 [ 162.919587][ T3399] required_size=10653 ret=0 [ 162.923090][ T3399] required_size=14749 ret=0 [ 162.928686][ T3399] required_size=16413 ret=0 [ 162.936894][ T3399] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1664 [ 162.962097][ T3399] required_size=2461 ret=0 [ 162.967270][ T3399] required_size=6557 ret=0 [ 162.992866][ T3399] required_size=10653 ret=0 [ 162.999962][ T3399] required_size=14765 ret=0 [ 163.006420][ T3399] required_size=16413 ret=0 [ 163.012163][ T3399] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1648 ---------------------------------------- part, and came to wonder if serialization between splice() and sendmsg() is correct. A program where splice() and sendmsg() cannot run in parallel due to single-threaded is shown below. ---------------------------------------- #define _GNU_SOURCE #include <sys/types.h> #include <fcntl.h> #include <unistd.h> #include <sys/socket.h> #include <arpa/inet.h> #define SOL_TCP 6 #define TCP_REPAIR 19 #define TCP_ULP 31 #define TLS_TX 1 int main(int argc, char *argv[]) { struct iovec iov = { .iov_base = "@@@@@@@@@@@@@@@@", .iov_len = 16 }; struct msghdr hdr = { .msg_iov = &iov, .msg_iovlen = 1, .msg_flags = MSG_FASTOPEN }; const struct sockaddr_in6 addr = { .sin6_family = AF_INET6, .sin6_addr = in6addr_loopback }; const int one = 1; int ret_ignored = 0; const int fd = socket(PF_INET6, SOCK_STREAM, IPPROTO_IP); int pipe_fds[2] = { -1, -1 }; static char buf[32768] = { }; ret_ignored += pipe(pipe_fds); setsockopt(fd, SOL_TCP, TCP_REPAIR, &one, sizeof(one)); connect(fd, (struct sockaddr *) &addr, sizeof(addr)); setsockopt(fd, SOL_TCP, TCP_ULP, "tls", 4); setsockopt(fd, SOL_TLS, TLS_TX,"\3\0035\0%T\244\205\333\f0\362B\221\243\234\206\216\220\243u\347\342P|1\24}Q@\377\227\353\222B\354\264u[\346", 40); ret_ignored += write(pipe_fds[1], buf, 2432); ret_ignored += write(pipe_fds[1], buf, 10676); ret_ignored += write(pipe_fds[1], buf, 17996); if (argc == 1) { ret_ignored += splice(pipe_fds[0], NULL, fd, NULL, 1048576, 0); ret_ignored += sendmsg(fd, &hdr, MSG_DONTWAIT | MSG_MORE); } else { ret_ignored += sendmsg(fd, &hdr, MSG_DONTWAIT | MSG_MORE); ret_ignored += splice(pipe_fds[0], NULL, fd, NULL, 1048576, 0); } return ret_ignored * 0; } ---------------------------------------- If you run this program with argc == 1, you will get below output. ---------------------------------------- [ 4030.292376] [ T3896] required_size=2461 ret=0 [ 4030.292376] [ T3896] required_size=6557 ret=0 [ 4030.292376] [ T3896] required_size=10653 ret=0 [ 4030.292376] [ T3896] required_size=14749 ret=0 [ 4030.292376] [ T3896] required_size=16413 ret=0 [ 4030.292376] [ T3896] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1664 [ 4030.330185] [ T3896] required_size=2461 ret=0 [ 4030.330185] [ T3896] required_size=6557 ret=0 [ 4030.330185] [ T3896] required_size=10653 ret=0 [ 4030.335443] [ T3896] required_size=14749 ret=0 [ 4030.335443] [ T3896] full_record=0 eor=1 sk_msg_full(msg_pl)=0 copied=4096 ---------------------------------------- If you run this program with argc != 1, you will get below output. ---------------------------------------- [ 4044.312696] [ T3898] required_size=2477 ret=0 [ 4044.312696] [ T3898] required_size=6573 ret=0 [ 4044.312696] [ T3898] required_size=10669 ret=0 [ 4044.312696] [ T3898] required_size=14765 ret=0 [ 4044.312696] [ T3898] required_size=16413 ret=0 [ 4044.312696] [ T3898] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=1648 [ 4044.340425] [ T3898] required_size=2477 ret=0 [ 4044.350515] [ T3898] required_size=6573 ret=0 [ 4044.350515] [ T3898] required_size=10669 ret=0 [ 4044.350515] [ T3898] required_size=14765 ret=0 [ 4044.350515] [ T3898] full_record=0 eor=1 sk_msg_full(msg_pl)=0 copied=4096 ---------------------------------------- The difference is that the required_size= value differs by "struct iovec"->iov_len bytes which is the value passed to sendmsg(). If splice() happens before sendmsg() happens, the output does not include bytes passed to sendmsg(). If sendmsg() happens before splice() happens, the output includes bytes passed to sendmsg(). Then, where does the difference between ---------------------------------------- [ 162.919587][ T3399] required_size=10653 ret=0 [ 162.923090][ T3399] required_size=14749 ret=0 [ 162.928686][ T3399] required_size=16413 ret=0 ---------------------------------------- and ---------------------------------------- [ 162.992866][ T3399] required_size=10653 ret=0 [ 162.999962][ T3399] required_size=14765 ret=0 [ 163.006420][ T3399] required_size=16413 ret=0 ---------------------------------------- come from? Both output had the same values until 10653, but the next value differs by 16. This might suggest that a race between splice() and sendmsg() caused unexpected required_size= value... This could explain "for the second time" part below. > 4125+8221+12317+16413=41076 (the lower 4 bits are 0100) > 2461+6557+10653+14749+16413=50833 (the lower 4 bits are 0001) > 2461+6573+10669+14765+16413=50881 (the lower 4 bits are 0001) > > KMSAN reports this problem when the lower 4 bits became 0001 for the second time. > Unless KMSAN's reporting is asynchronous, maybe the reason of "for the second time" > part is that the previous state is relevant...
On 2023/07/01 13:12, Tetsuo Handa wrote: > Then, where does the difference between > > ---------------------------------------- > [ 162.919587][ T3399] required_size=10653 ret=0 > [ 162.923090][ T3399] required_size=14749 ret=0 > [ 162.928686][ T3399] required_size=16413 ret=0 > ---------------------------------------- > > and > > ---------------------------------------- > [ 162.992866][ T3399] required_size=10653 ret=0 > [ 162.999962][ T3399] required_size=14765 ret=0 > [ 163.006420][ T3399] required_size=16413 ret=0 > ---------------------------------------- > > come from? Both output had the same values until 10653, but > the next value differs by 16. This might suggest that a race between > splice() and sendmsg() caused unexpected required_size= value... > I found a simplified reproducer. This problem happens when splice() and sendmsg() run in parallel. ---------------------------------------- #define _GNU_SOURCE #include <sys/types.h> #include <fcntl.h> #include <unistd.h> #include <sys/socket.h> #include <arpa/inet.h> #include <sys/poll.h> #define SOL_TCP 6 #define TCP_REPAIR 19 #define TCP_ULP 31 #define TLS_TX 1 int main(int argc, char *argv[]) { struct iovec iov = { .iov_base = "@@@@@@@@@@@@@@@@", .iov_len = 16 }; struct msghdr hdr = { .msg_iov = &iov, .msg_iovlen = 1, .msg_flags = MSG_FASTOPEN }; const struct sockaddr_in6 addr = { .sin6_family = AF_INET6, .sin6_addr = in6addr_loopback }; const int one = 1; int ret_ignored = 0; const int fd = socket(PF_INET6, SOCK_STREAM, IPPROTO_IP); int pipe_fds[2] = { -1, -1 }; static char buf[32768] = { }; ret_ignored += pipe(pipe_fds); setsockopt(fd, SOL_TCP, TCP_REPAIR, &one, sizeof(one)); connect(fd, (struct sockaddr *) &addr, sizeof(addr)); setsockopt(fd, SOL_TCP, TCP_ULP, "tls", 4); setsockopt(fd, SOL_TLS, TLS_TX,"\3\0035\0%T\244\205\333\f0\362B\221\243\234\206\216\220\243u\347\342P|1\24}Q@\377\227\353\222B\354\264u[\346", 40); if (fork() == 0) { ret_ignored += splice(pipe_fds[0], NULL, fd, NULL, 1048576, SPLICE_F_MORE); _exit(0); } close(pipe_fds[0]); ret_ignored += write(pipe_fds[1], buf, sizeof(buf)); poll(NULL, 0, 1); ret_ignored += sendmsg(fd, &hdr, MSG_DONTWAIT | MSG_MORE); return ret_ignored * 0; } ---------------------------------------- Debug printk() patch and output are shown below. ---------------------------------------- diff --git a/net/core/sock.c b/net/core/sock.c index 24f2761bdb1d..879c1d54deed 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2825,6 +2825,14 @@ static void sk_leave_memory_pressure(struct sock *sk) DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); +static inline char *kaddr(struct page *page) +{ + char *kaddr = kmap_atomic(page); + + kunmap_atomic(kaddr); + return kaddr; +} + /** * skb_page_frag_refill - check that a page_frag contains enough room * @sz: minimum size of the fragment we want to get @@ -2840,10 +2848,17 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) if (pfrag->page) { if (page_ref_count(pfrag->page) == 1) { pfrag->offset = 0; + if (current->comm[0] && !strcmp(current->comm + 1, ".out")) { + pr_info("assigned %px\n", kaddr(pfrag->page)); + } return true; } - if (pfrag->offset + sz <= pfrag->size) + if (pfrag->offset + sz <= pfrag->size) { + if (current->comm[0] && !strcmp(current->comm + 1, ".out")) { + pr_info("assigned %d from %px\n", sz, kaddr(pfrag->page) + pfrag->offset); + } return true; + } put_page(pfrag->page); } @@ -2856,12 +2871,18 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) __GFP_NORETRY, SKB_FRAG_PAGE_ORDER); if (likely(pfrag->page)) { + if (current->comm[0] && !strcmp(current->comm + 1, ".out")) { + pr_info("allocated %px\n", kaddr(pfrag->page)); + } pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; return true; } } pfrag->page = alloc_page(gfp); if (likely(pfrag->page)) { + if (current->comm[0] && !strcmp(current->comm + 1, ".out")) { + pr_info("allocated %px\n", kaddr(pfrag->page)); + } pfrag->size = PAGE_SIZE; return true; } diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 1a53c8f481e9..dcf00d1f239f 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -543,6 +543,22 @@ static int tls_do_encryption(struct sock *sk, list_add_tail((struct list_head *)&rec->list, &ctx->tx_list); atomic_inc(&ctx->encrypt_pending); + { + int len = aead_req->assoclen + aead_req->cryptlen; + struct sg_mapping_iter miter; + + sg_miter_start(&miter, rec->sg_aead_in, + sg_nents(rec->sg_aead_in), + SG_MITER_TO_SG | SG_MITER_ATOMIC); + + while (len > 0 && sg_miter_next(&miter)) { + pr_info("addr=%px len=%d\n", miter.addr, min(len, (int)miter.length)); + kmsan_check_memory(miter.addr, min(len, (int)miter.length)); + len -= miter.length; + } + sg_miter_stop(&miter); + } + rc = crypto_aead_encrypt(aead_req); if (!rc || rc != -EINPROGRESS) { atomic_dec(&ctx->encrypt_pending); @@ -1211,6 +1227,7 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page, goto wait_for_sndbuf; alloc_payload: ret = tls_alloc_encrypted_msg(sk, required_size); + pr_info("required_size=%d ret=%d\n", required_size, ret); if (ret) { if (ret != -ENOSPC) goto wait_for_memory; @@ -1232,6 +1249,8 @@ static int tls_sw_do_sendpage(struct sock *sk, struct page *page, tls_ctx->pending_open_record_frags = true; if (full_record || eor || sk_msg_full(msg_pl)) { + pr_info("full_record=%d eor=%d sk_msg_full(msg_pl)=%d copied=%d\n", + full_record, eor, sk_msg_full(msg_pl), copied); ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, record_type, &copied, flags); if (ret) { ---------------------------------------- skb_page_frag_refill() assigned 32 bytes from ffff88810d40901d but tls_do_encryption() accesses 16 bytes from ffff88810d40900d !? ---------------------------------------- [ 159.777428][ T3405] sock: allocated ffff88810d408000 [ 159.792685][ T3405] required_size=4125 ret=0 [ 159.802047][ T3395] sock: allocated ffff88810b000000 [ 159.799656][ T3405] sock: assigned 32 from ffff88810d40901d [ 159.828229][ T3405] required_size=8237 ret=0 [ 159.848161][ T3405] sock: assigned 32 from ffff88810d40a01d [ 159.849298][ T3405] required_size=12333 ret=0 [ 159.849382][ T3405] sock: assigned 32 from ffff88810d40b01d [ 159.849416][ T3405] required_size=16413 ret=0 [ 159.849434][ T3405] full_record=1 eor=0 sk_msg_full(msg_pl)=0 copied=4080 [ 159.849472][ T3405] addr=ffff888115aa4658 len=13 [ 159.849492][ T3405] addr=ffff88811da48000 len=4096 [ 159.849513][ T3405] addr=ffff88810d40900d len=16 [ 159.849530][ T3405] ===================================================== [ 159.849618][ T3405] BUG: KMSAN: uninit-value in tls_push_record+0x2cff/0x4040 [ 159.849654][ T3405] tls_push_record+0x2cff/0x4040 [ 159.849686][ T3405] bpf_exec_tx_verdict+0x5ba/0x2530 [ 159.849715][ T3405] tls_sw_do_sendpage+0x1779/0x21f0 [ 159.849744][ T3405] tls_sw_sendpage+0x247/0x2b0 [ 159.849772][ T3405] inet_sendpage+0x1de/0x2f0 [ 159.849807][ T3405] kernel_sendpage+0x4cc/0x940 [ 159.849837][ T3405] sock_sendpage+0x162/0x220 [ 159.849864][ T3405] pipe_to_sendpage+0x3df/0x4f0 [ 159.849891][ T3405] __splice_from_pipe+0x5c7/0x1010 [ 159.849916][ T3405] generic_splice_sendpage+0x1c6/0x2a0 [ 159.849943][ T3405] do_splice+0x26d8/0x32f0 [ 159.849966][ T3405] __se_sys_splice+0x81f/0xba0 [ 159.849991][ T3405] __x64_sys_splice+0x1a1/0x200 [ 159.850016][ T3405] do_syscall_64+0x41/0x90 [ 159.850044][ T3405] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 159.850082][ T3405] [ 159.850088][ T3405] Uninit was created at: [ 159.850136][ T3405] __alloc_pages+0x925/0x1050 [ 159.850171][ T3405] alloc_pages+0xe30/0x11b0 [ 159.850204][ T3405] skb_page_frag_refill+0x539/0x1030 [ 159.850238][ T3405] sk_page_frag_refill+0xa2/0x1c0 [ 159.850271][ T3405] sk_msg_alloc+0x278/0x1560 [ 159.850307][ T3405] tls_sw_do_sendpage+0xbec/0x21f0 [ 159.850336][ T3405] tls_sw_sendpage+0x247/0x2b0 [ 159.850363][ T3405] inet_sendpage+0x1de/0x2f0 [ 159.850392][ T3405] kernel_sendpage+0x4cc/0x940 [ 159.850418][ T3405] sock_sendpage+0x162/0x220 [ 159.850446][ T3405] pipe_to_sendpage+0x3df/0x4f0 [ 159.850470][ T3405] __splice_from_pipe+0x5c7/0x1010 [ 159.850495][ T3405] generic_splice_sendpage+0x1c6/0x2a0 [ 159.850522][ T3405] do_splice+0x26d8/0x32f0 [ 159.850545][ T3405] __se_sys_splice+0x81f/0xba0 [ 159.850570][ T3405] __x64_sys_splice+0x1a1/0x200 [ 159.850595][ T3405] do_syscall_64+0x41/0x90 [ 159.850619][ T3405] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 159.850654][ T3405] [ 159.850659][ T3405] Bytes 0-15 of 16 are uninitialized [ 159.850672][ T3405] Memory access of size 16 starts at ffff88810d40900d [ 159.850685][ T3405] [ 159.850694][ T3405] CPU: 2 PID: 3405 Comm: a.out Not tainted 6.4.0-rc7-ge6bc8833d80f-dirty #32 [ 159.850725][ T3405] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 159.850741][ T3405] ===================================================== [ 159.850749][ T3405] Disabling lock debugging due to kernel taint [ 159.850761][ T3405] Kernel panic - not syncing: kmsan.panic set ... [ 159.850774][ T3405] CPU: 2 PID: 3405 Comm: a.out Tainted: G B 6.4.0-rc7-ge6bc8833d80f-dirty #32 [ 159.850801][ T3405] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 159.850814][ T3405] Call Trace: [ 159.850824][ T3405] <TASK> [ 159.850833][ T3405] dump_stack_lvl+0x1f6/0x280 [ 159.850863][ T3405] dump_stack+0x29/0x30 [ 159.850888][ T3405] panic+0x4e7/0xc60 [ 159.850928][ T3405] ? add_taint+0x185/0x210 [ 159.850969][ T3405] kmsan_report+0x2d1/0x2e0 [ 159.851008][ T3405] ? kmsan_internal_unpoison_memory+0x14/0x20 [ 159.851048][ T3405] ? __msan_metadata_ptr_for_load_4+0x24/0x40 [ 159.851089][ T3405] ? kmsan_internal_check_memory+0x47d/0x540 [ 159.851128][ T3405] ? kmsan_check_memory+0x1a/0x20 [ 159.851166][ T3405] ? tls_push_record+0x2cff/0x4040 [ 159.851197][ T3405] ? bpf_exec_tx_verdict+0x5ba/0x2530 [ 159.851228][ T3405] ? tls_sw_do_sendpage+0x1779/0x21f0 [ 159.851258][ T3405] ? tls_sw_sendpage+0x247/0x2b0 [ 159.851288][ T3405] ? inet_sendpage+0x1de/0x2f0 [ 159.851318][ T3405] ? kernel_sendpage+0x4cc/0x940 [ 159.851346][ T3405] ? sock_sendpage+0x162/0x220 [ 159.851374][ T3405] ? pipe_to_sendpage+0x3df/0x4f0 [ 159.851399][ T3405] ? __splice_from_pipe+0x5c7/0x1010 [ 159.851427][ T3405] ? generic_splice_sendpage+0x1c6/0x2a0 [ 159.851455][ T3405] ? do_splice+0x26d8/0x32f0 [ 159.851479][ T3405] ? __se_sys_splice+0x81f/0xba0 [ 159.851506][ T3405] ? __x64_sys_splice+0x1a1/0x200 [ 159.851532][ T3405] ? do_syscall_64+0x41/0x90 [ 159.851557][ T3405] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 159.851598][ T3405] ? vprintk_default+0x8a/0xa0 [ 159.851634][ T3405] ? vprintk+0x163/0x180 [ 159.851669][ T3405] ? _printk+0x181/0x1b0 [ 159.851706][ T3405] kmsan_internal_check_memory+0x47d/0x540 [ 159.851748][ T3405] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 [ 159.851790][ T3405] kmsan_check_memory+0x1a/0x20 [ 159.851828][ T3405] tls_push_record+0x2cff/0x4040 [ 159.851873][ T3405] bpf_exec_tx_verdict+0x5ba/0x2530 [ 159.851926][ T3405] ? _printk+0x181/0x1b0 [ 159.851965][ T3405] ? tls_sw_do_sendpage+0xbd1/0x21f0 [ 159.851998][ T3405] tls_sw_do_sendpage+0x1779/0x21f0 [ 159.852045][ T3405] tls_sw_sendpage+0x247/0x2b0 [ 159.852078][ T3405] ? tls_sw_do_sendpage+0x21f0/0x21f0 [ 159.852110][ T3405] inet_sendpage+0x1de/0x2f0 [ 159.852143][ T3405] ? inet_sendmsg+0x1d0/0x1d0 [ 159.852173][ T3405] kernel_sendpage+0x4cc/0x940 [ 159.852206][ T3405] sock_sendpage+0x162/0x220 [ 159.852238][ T3405] pipe_to_sendpage+0x3df/0x4f0 [ 159.852264][ T3405] ? sock_fasync+0x240/0x240 [ 159.852296][ T3405] __splice_from_pipe+0x5c7/0x1010 [ 159.852324][ T3405] ? generic_splice_sendpage+0x2a0/0x2a0 [ 159.852360][ T3405] generic_splice_sendpage+0x1c6/0x2a0 [ 159.852393][ T3405] ? iter_file_splice_write+0x1a30/0x1a30 [ 159.852421][ T3405] do_splice+0x26d8/0x32f0 [ 159.852446][ T3405] ? kmsan_internal_set_shadow_origin+0x66/0xe0 [ 159.852487][ T3405] ? _raw_spin_lock_irqsave+0x91/0x110 [ 159.852527][ T3405] ? kmsan_get_shadow_origin_ptr+0x4d/0xa0 [ 159.852567][ T3405] ? _raw_spin_lock_irqsave+0x91/0x110 [ 159.852606][ T3405] ? _raw_spin_unlock_irqrestore+0x76/0xb0 [ 159.852652][ T3405] __se_sys_splice+0x81f/0xba0 [ 159.852696][ T3405] __x64_sys_splice+0x1a1/0x200 [ 159.852735][ T3405] do_syscall_64+0x41/0x90 [ 159.852767][ T3405] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 159.852813][ T3405] RIP: 0033:0x7f5a347261aa [ 159.852840][ T3405] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 49 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 13 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 [ 159.852873][ T3405] RSP: 002b:00007ffe1b674588 EFLAGS: 00000246 ORIG_RAX: 0000000000000113 [ 159.852905][ T3405] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5a347261aa [ 159.852926][ T3405] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004 [ 159.852942][ T3405] RBP: 0000000000000001 R08: 0000000000100000 R09: 0000000000000004 [ 159.852959][ T3405] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 [ 159.852975][ T3405] R13: 000055675f9631c0 R14: 000055675f965d58 R15: 00007f5a348e8040 [ 159.853001][ T3405] </TASK> [ 159.858725][ T3405] Kernel Offset: disabled [ 159.858725][ T3405] Rebooting in 10 seconds.. ----------------------------------------
On Tue, 4 Jul 2023 22:32:00 +0900 Tetsuo Handa wrote: > I found a simplified reproducer. > This problem happens when splice() and sendmsg() run in parallel. Could you retry with the upstream (tip of Linus's tree) and see if it still repros? I tried to get a KMSAN kernel to boot on QEMU but it the kernel doesn't want to start, no idea what's going on :(
On 2023/07/07 5:53, Jakub Kicinski wrote: > On Tue, 4 Jul 2023 22:32:00 +0900 Tetsuo Handa wrote: >> I found a simplified reproducer. >> This problem happens when splice() and sendmsg() run in parallel. > > Could you retry with the upstream (tip of Linus's tree) and see if it > still repros? I tried to get a KMSAN kernel to boot on QEMU but it > the kernel doesn't want to start, no idea what's going on :( I can't reproduce this problem as of commit a452483508d7 of linux.git tree, for the simplified reproducer is failing with EBADMSG error. Unless what the simplified reproducer is doing has become illegal, I need to bisect between commit 219d92056ba3 ("splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()") which fails with EBADMSG error and commit 8a0d57df8938 ("tls: improve lockless access safety of tls_err_abort()") which shows this problem, with commit e6bc8833d80f ("string: use __builtin_memcpy() in strlcpy/strlcat") backported... ---------------------------------------- root@fuzz:~# strace -f ./a.out execve("./a.out", ["./a.out"], 0x7ffedb58a368 /* 26 vars */) = 0 brk(NULL) = 0x564cb3f30000 arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd9c8980a0) = -1 EINVAL (Invalid argument) mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f463d4e6000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=37735, ...}, AT_EMPTY_PATH) = 0 mmap(NULL, 37735, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f463d4dc000 close(3) = 0 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\237\2\0\0\0\0\0"..., 832) = 832 pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784 pread64(3, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48 pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0i8\235HZ\227\223\333\350s\360\352,\223\340."..., 68, 896) = 68 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=2216304, ...}, AT_EMPTY_PATH) = 0 pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784 mmap(NULL, 2260560, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f463d200000 mmap(0x7f463d228000, 1658880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f463d228000 mmap(0x7f463d3bd000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f463d3bd000 mmap(0x7f463d415000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x214000) = 0x7f463d415000 mmap(0x7f463d41b000, 52816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f463d41b000 close(3) = 0 mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f463d4d9000 arch_prctl(ARCH_SET_FS, 0x7f463d4d9740) = 0 set_tid_address(0x7f463d4d9a10) = 3444 set_robust_list(0x7f463d4d9a20, 24) = 0 rseq(0x7f463d4da0e0, 0x20, 0, 0x53053053) = 0 mprotect(0x7f463d415000, 16384, PROT_READ) = 0 mprotect(0x564cb38ac000, 4096, PROT_READ) = 0 mprotect(0x7f463d520000, 8192, PROT_READ) = 0 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 munmap(0x7f463d4dc000, 37735) = 0 socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 pipe2([4, 5], 0) = 0 setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0 setsockopt(3, SOL_TCP, TCP_ULP, [7564404], 4) = 0 setsockopt(3, SOL_TLS, TLS_TX, "\3\0035\0%T\244\205\333\f0\362B\221\243\234\206\216\220\243u\347\342P|1\24}Q@\377\227"..., 40) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 3445 attached , child_tidptr=0x7f463d4d9a10) = 3445 [pid 3444] close(4 <unfinished ...> [pid 3445] set_robust_list(0x7f463d4d9a20, 24 <unfinished ...> [pid 3444] <... close resumed>) = 0 [pid 3445] <... set_robust_list resumed>) = 0 [pid 3444] write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768 <unfinished ...> [pid 3445] splice(4, NULL, 3, NULL, 1048576, SPLICE_F_MORE <unfinished ...> [pid 3444] <... write resumed>) = 32768 [pid 3444] poll(NULL, 0, 1) = 0 (Timeout) [pid 3444] sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="@@@@@@@@@@@@@@@@", iov_len=16}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_FASTOPEN}, MSG_DONTWAIT|MSG_MORE) = -1 EBADMSG (Bad message) [pid 3445] <... splice resumed>) = -1 EBADMSG (Bad message) [pid 3444] exit_group(0 <unfinished ...> [pid 3445] exit_group(0 <unfinished ...> [pid 3444] <... exit_group resumed>) = ? [pid 3445] <... exit_group resumed>) = ? [pid 3444] +++ exited with 0 +++ +++ exited with 0 +++ ----------------------------------------
On 2023/07/07 18:41, Tetsuo Handa wrote: > On 2023/07/07 5:53, Jakub Kicinski wrote: >> On Tue, 4 Jul 2023 22:32:00 +0900 Tetsuo Handa wrote: >>> I found a simplified reproducer. >>> This problem happens when splice() and sendmsg() run in parallel. >> >> Could you retry with the upstream (tip of Linus's tree) and see if it >> still repros? I tried to get a KMSAN kernel to boot on QEMU but it >> the kernel doesn't want to start, no idea what's going on :( (Returned from https://lkml.kernel.org/r/20230707180901.34c17465@kernel.org .) It seems that a series of changes that was merged in this merge window made this problem no longer reproducible. ;-) Which commit should we mark as a fix for this problem?
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index f2e7302a4d96..cd5366966864 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -1025,6 +1025,7 @@ static int tls_init(struct sock *sk) struct tls_context *ctx; int rc = 0; + sk->sk_allocation |= __GFP_ZERO; tls_build_proto(sk); #ifdef CONFIG_TLS_TOE
syzbot is reporting uninit-value at aes_encrypt(), for block cipher assumes that bytes to encrypt/decrypt is multiple of block size for that cipher but tls_alloc_encrypted_msg() is not initializing padding bytes when required_size is not multiple of block cipher's block size. In order to make sure that padding bytes are automatically initialized, enable __GFP_ZERO flag when setsockopt(SOL_TCP, TCP_ULP, "tls") is called. Reported-by: syzbot <syzbot+828dfc12440b4f6f305d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=828dfc12440b4f6f305d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> --- According to C reproducer, this problem happens when bpf_exec_tx_verdict() is called with lower 4 bits of required_size being 0001 and does not happen when being 0100. Thus, I assumed that this problem is caused by lack of initializing padding bytes. But I couldn't figure out why KMSAN reports this problem when bpf_exec_tx_verdict() is called with lower 4 bits of required_size being 0001 for the second time and does not report this problem when bpf_exec_tx_verdict() is called with lower 4 bits of required_size being 0001 for the first time. More deeper problem exists? KMSAN reporting this problem when accessing u64 relevant? net/tls/tls_main.c | 1 + 1 file changed, 1 insertion(+)