Message ID | 20240202101311.it.893-kees@kernel.org |
---|---|
Headers | show |
Series | ubsan: Introduce wrap-around sanitizers | expand |
On Fri, Feb 02, 2024 at 12:01:55PM +0100, Marco Elver wrote: > On Fri, 2 Feb 2024 at 11:16, Kees Cook <keescook@chromium.org> wrote: > > [...] > > +config UBSAN_UNSIGNED_WRAP > > + bool "Perform checking for unsigned arithmetic wrap-around" > > + depends on $(cc-option,-fsanitize=unsigned-integer-overflow) > > + depends on !X86_32 # avoid excessive stack usage on x86-32/clang > > + depends on !COMPILE_TEST > > + help > > + This option enables -fsanitize=unsigned-integer-overflow which checks > > + for wrap-around of any arithmetic operations with unsigned integers. This > > + currently causes x86 to fail to boot. > > My hypothesis is that these options will quickly be enabled by various > test and fuzzing setups, to the detriment of kernel developers. While > the commit message states that these are for experimentation, I do not > think it is at all clear from the Kconfig options. I can certainly rephrase it more strongly. I would hope that anyone enabling the unsigned sanitizer would quickly realize how extremely noisy it is. > Unsigned integer wrap-around is relatively common (it is _not_ UB > after all). While I can appreciate that in some cases wrap around is a > genuine semantic bug, and that's what we want to find with these > changes, ultimately marking all semantically valid wrap arounds to > catch the unmarked ones. Given these patterns are so common, and C > programmers are used to them, it will take a lot of effort to mark all > the intentional cases. But I fear that even if we get to that place, > _unmarked_ but semantically valid unsigned wrap around will keep > popping up again and again. I agree -- it's going to be quite a challenge. My short-term goal is to see how far the sanitizer itself can get with identifying intentional uses. For example, I found two more extremely common code patterns that trip it now: unsigned int i = ...; ... while (i--) { ... } This trips the sanitizer at loop exit. :P It seems like churn to refactor all of these into "for (; i; i--)". The compiler should be able to identify this by looking for later uses of "i", etc. The other is negative constants: -1UL, -3ULL, etc. These are all over the place and very very obviously intentional and should be ignored by the compiler. > What is the long-term vision to minimize the additional churn this may > introduce? My hope is that we can evolve the coverage over time. Solving it all at once won't be possible, but I think we can get pretty far with the signed overflow sanitizer, which runs relatively cleanly already. If we can't make meaningful progress in unsigned annotations, I think we'll have to work on gaining type-based operator overloading so we can grow type-aware arithmetic. That will serve as a much cleaner annotation. E.g. introduce jiffie_t, which wraps. > I think the problem reminds me a little of the data race problem, > although I suspect unsigned integer wraparound is much more common > than data races (which unlike unsigned wrap around is actually UB) - > so chasing all intentional unsigned integer wrap arounds and marking > will take even more effort than marking all intentional data races > (which we're still slowly, but steadily, making progress towards). > > At the very least, these options should 'depends on EXPERT' or even > 'depends on BROKEN' while the story is still being worked out. Perhaps I should hold off on bringing the unsigned sanitizer back? I was hoping to work in parallel with the signed sanitizer, but maybe this isn't the right approach?
On Fri, Feb 2, 2024 at 1:17 PM Kees Cook <keescook@chromium.org> wrote: > > Perhaps I should hold off on bringing the unsigned sanitizer back? I was > hoping to work in parallel with the signed sanitizer, but maybe this > isn't the right approach? If you can do anything to keep it in-tree, I think it would be nice so that others can easily use it to test the tooling and to start to clean up cases. A per-subsystem opt-in like Marco says could be a way, and you could perhaps do one very small subsystem or similar to see how it would look like. Something that could also help would be to split the cases even further (say, only overflows and not underflows), but is that a possibility with the current tooling? Thanks for working on this, Kees! Cheers, Miguel