Message ID | 20240202101311.it.893-kees@kernel.org |
---|---|
Headers | show |
Series | ubsan: Introduce wrap-around sanitizers | expand |
On Fri, Feb 02, 2024 at 12:01:55PM +0100, Marco Elver wrote: > On Fri, 2 Feb 2024 at 11:16, Kees Cook <keescook@chromium.org> wrote: > > [...] > > +config UBSAN_UNSIGNED_WRAP > > + bool "Perform checking for unsigned arithmetic wrap-around" > > + depends on $(cc-option,-fsanitize=unsigned-integer-overflow) > > + depends on !X86_32 # avoid excessive stack usage on x86-32/clang > > + depends on !COMPILE_TEST > > + help > > + This option enables -fsanitize=unsigned-integer-overflow which checks > > + for wrap-around of any arithmetic operations with unsigned integers. This > > + currently causes x86 to fail to boot. > > My hypothesis is that these options will quickly be enabled by various > test and fuzzing setups, to the detriment of kernel developers. While > the commit message states that these are for experimentation, I do not > think it is at all clear from the Kconfig options. I can certainly rephrase it more strongly. I would hope that anyone enabling the unsigned sanitizer would quickly realize how extremely noisy it is. > Unsigned integer wrap-around is relatively common (it is _not_ UB > after all). While I can appreciate that in some cases wrap around is a > genuine semantic bug, and that's what we want to find with these > changes, ultimately marking all semantically valid wrap arounds to > catch the unmarked ones. Given these patterns are so common, and C > programmers are used to them, it will take a lot of effort to mark all > the intentional cases. But I fear that even if we get to that place, > _unmarked_ but semantically valid unsigned wrap around will keep > popping up again and again. I agree -- it's going to be quite a challenge. My short-term goal is to see how far the sanitizer itself can get with identifying intentional uses. For example, I found two more extremely common code patterns that trip it now: unsigned int i = ...; ... while (i--) { ... } This trips the sanitizer at loop exit. :P It seems like churn to refactor all of these into "for (; i; i--)". The compiler should be able to identify this by looking for later uses of "i", etc. The other is negative constants: -1UL, -3ULL, etc. These are all over the place and very very obviously intentional and should be ignored by the compiler. > What is the long-term vision to minimize the additional churn this may > introduce? My hope is that we can evolve the coverage over time. Solving it all at once won't be possible, but I think we can get pretty far with the signed overflow sanitizer, which runs relatively cleanly already. If we can't make meaningful progress in unsigned annotations, I think we'll have to work on gaining type-based operator overloading so we can grow type-aware arithmetic. That will serve as a much cleaner annotation. E.g. introduce jiffie_t, which wraps. > I think the problem reminds me a little of the data race problem, > although I suspect unsigned integer wraparound is much more common > than data races (which unlike unsigned wrap around is actually UB) - > so chasing all intentional unsigned integer wrap arounds and marking > will take even more effort than marking all intentional data races > (which we're still slowly, but steadily, making progress towards). > > At the very least, these options should 'depends on EXPERT' or even > 'depends on BROKEN' while the story is still being worked out. Perhaps I should hold off on bringing the unsigned sanitizer back? I was hoping to work in parallel with the signed sanitizer, but maybe this isn't the right approach?