Message ID | cover.1605896059.git.gustavoars@kernel.org |
---|---|
Headers | show |
Series | Fix fall-through warnings for Clang | expand |
On Fri, Nov 20, 2020 at 11:51:42AM -0800, Jakub Kicinski wrote: > On Fri, 20 Nov 2020 11:30:40 -0800 Kees Cook wrote: > > On Fri, Nov 20, 2020 at 10:53:44AM -0800, Jakub Kicinski wrote: > > > On Fri, 20 Nov 2020 12:21:39 -0600 Gustavo A. R. Silva wrote: > > > > This series aims to fix almost all remaining fall-through warnings in > > > > order to enable -Wimplicit-fallthrough for Clang. > > > > > > > > In preparation to enable -Wimplicit-fallthrough for Clang, explicitly > > > > add multiple break/goto/return/fallthrough statements instead of just > > > > letting the code fall through to the next case. > > > > > > > > Notice that in order to enable -Wimplicit-fallthrough for Clang, this > > > > change[1] is meant to be reverted at some point. So, this patch helps > > > > to move in that direction. > > > > > > > > Something important to mention is that there is currently a discrepancy > > > > between GCC and Clang when dealing with switch fall-through to empty case > > > > statements or to cases that only contain a break/continue/return > > > > statement[2][3][4]. > > > > > > Are we sure we want to make this change? Was it discussed before? > > > > > > Are there any bugs Clangs puritanical definition of fallthrough helped > > > find? > > > > > > IMVHO compiler warnings are supposed to warn about issues that could > > > be bugs. Falling through to default: break; can hardly be a bug?! > > > > It's certainly a place where the intent is not always clear. I think > > this makes all the cases unambiguous, and doesn't impact the machine > > code, since the compiler will happily optimize away any behavioral > > redundancy. > > If none of the 140 patches here fix a real bug, and there is no change > to machine code then it sounds to me like a W=2 kind of a warning. FWIW, this series has found at least one bug so far: https://lore.kernel.org/lkml/CAFCwf11izHF=g1mGry1fE5kvFFFrxzhPSM6qKAO8gxSp=Kr_CQ@mail.gmail.com/
On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote: > Please tell me > our reward for all this effort isn't a single missing error print. There were quite literally dozens of logical defects found by the fallthrough additions. Very few were logging only.
On Sun, 2020-11-22 at 11:12 -0800, James Bottomley wrote: > On Sun, 2020-11-22 at 10:25 -0800, Joe Perches wrote: > > On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote: > > > Please tell me our reward for all this effort isn't a single > > > missing error print. > > > > There were quite literally dozens of logical defects found > > by the fallthrough additions. Very few were logging only. > > So can you give us the best examples (or indeed all of them if someone > is keeping score)? hopefully this isn't a US election situation ... Gustavo? Are you running for congress now? https://lwn.net/Articles/794944/
On Sun, Nov 22, 2020 at 7:22 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > Well, it's a problem in an error leg, sure, but it's not a really > compelling reason for a 141 patch series, is it? All that fixing this > error will do is get the driver to print "oh dear there's a problem" > under four more conditions than it previously did. > > We've been at this for three years now with nearly a thousand patches, > firstly marking all the fall throughs with /* fall through */ and later > changing it to fallthrough. At some point we do have to ask if the > effort is commensurate with the protection afforded. Please tell me > our reward for all this effort isn't a single missing error print. It isn't that much effort, isn't it? Plus we need to take into account the future mistakes that it might prevent, too. So even if there were zero problems found so far, it is still a positive change. I would agree if these changes were high risk, though; but they are almost trivial. Cheers, Miguel
On Sun, 2020-11-22 at 21:35 +0100, Miguel Ojeda wrote: > On Sun, Nov 22, 2020 at 7:22 PM James Bottomley > <James.Bottomley@hansenpartnership.com> wrote: > > Well, it's a problem in an error leg, sure, but it's not a really > > compelling reason for a 141 patch series, is it? All that fixing > > this error will do is get the driver to print "oh dear there's a > > problem" under four more conditions than it previously did. > > > > We've been at this for three years now with nearly a thousand > > patches, firstly marking all the fall throughs with /* fall through > > */ and later changing it to fallthrough. At some point we do have > > to ask if the effort is commensurate with the protection > > afforded. Please tell me our reward for all this effort isn't a > > single missing error print. > > It isn't that much effort, isn't it? Well, it seems to be three years of someone's time plus the maintainer review time and series disruption of nearly a thousand patches. Let's be conservative and assume the producer worked about 30% on the series and it takes about 5-10 minutes per patch to review, merge and for others to rework existing series. So let's say it's cost a person year of a relatively junior engineer producing the patches and say 100h of review and application time. The latter is likely the big ticket item because it's what we have in least supply in the kernel (even though it's 20x vs the producer time). > Plus we need to take into account the future mistakes that it might > prevent, too. So even if there were zero problems found so far, it is > still a positive change. Well, the question I was asking is if it's worth the cost which I've tried to outline above. > I would agree if these changes were high risk, though; but they are > almost trivial. It's not about the risk of the changes it's about the cost of implementing them. Even if you discount the producer time (which someone gets to pay for, and if I were the engineering manager, I'd be unhappy about), the review/merge/rework time is pretty significant in exchange for six minor bug fixes. Fine, when a new compiler warning comes along it's certainly reasonable to see if we can benefit from it and the fact that the compiler people think it's worthwhile is enough evidence to assume this initially. But at some point you have to ask whether that assumption is supported by the evidence we've accumulated over the time we've been using it. And if the evidence doesn't support it perhaps it is time to stop the experiment. James
On Mon, 2020-11-23 at 09:54 +1100, Finn Thain wrote: > But is anyone keeping score of the regressions? If unreported bugs > count, what about unreported regressions? Well, I was curious about the former (obviously no tool will tell me about the latter), so I asked git what patches had a fall-through series named in a fixes tag and these three popped out: 9cf51446e686 bpf, powerpc: Fix misuse of fallthrough in bpf_jit_comp() 6a9dc5fd6170 lib: Revert use of fallthrough pseudo-keyword in lib/ 91dbd73a1739 mips/oprofile: Fix fallthrough placement I don't think any of these is fixing a significant problem, but they did cause someone time and trouble to investigate. James
On Sun, Nov 22, 2020 at 11:53:55AM -0800, James Bottomley wrote: > On Sun, 2020-11-22 at 11:22 -0800, Joe Perches wrote: > > On Sun, 2020-11-22 at 11:12 -0800, James Bottomley wrote: > > > On Sun, 2020-11-22 at 10:25 -0800, Joe Perches wrote: > > > > On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote: > > > > > Please tell me our reward for all this effort isn't a single > > > > > missing error print. > > > > > > > > There were quite literally dozens of logical defects found > > > > by the fallthrough additions. Very few were logging only. > > > > > > So can you give us the best examples (or indeed all of them if > > > someone is keeping score)? hopefully this isn't a US election > > > situation ... > > > > Gustavo? Are you running for congress now? > > > > https://lwn.net/Articles/794944/ > > That's 21 reported fixes of which about 50% seem to produce no change > in code behaviour at all, a quarter seem to have no user visible effect > with the remaining quarter producing unexpected errors on obscure > configuration parameters, which is why no-one really noticed them > before. The really important point here is the number of bugs this has prevented and will prevent in the future. See an example of this, below: https://lore.kernel.org/linux-iio/20190813135802.GB27392@kroah.com/ This work is still relevant, even if the total number of issues/bugs we find in the process is zero (which is not the case). "The sucky thing about doing hard work to deploy hardening is that the result is totally invisible by definition (things not happening) [..]" - Dmitry Vyukov Thanks -- Gustavo
On Sun, Nov 22, 2020 at 11:36 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > Well, it seems to be three years of someone's time plus the maintainer > review time and series disruption of nearly a thousand patches. Let's > be conservative and assume the producer worked about 30% on the series > and it takes about 5-10 minutes per patch to review, merge and for > others to rework existing series. So let's say it's cost a person year > of a relatively junior engineer producing the patches and say 100h of > review and application time. The latter is likely the big ticket item > because it's what we have in least supply in the kernel (even though > it's 20x vs the producer time). How are you arriving at such numbers? It is a total of ~200 trivial lines. > It's not about the risk of the changes it's about the cost of > implementing them. Even if you discount the producer time (which > someone gets to pay for, and if I were the engineering manager, I'd be > unhappy about), the review/merge/rework time is pretty significant in > exchange for six minor bug fixes. Fine, when a new compiler warning > comes along it's certainly reasonable to see if we can benefit from it > and the fact that the compiler people think it's worthwhile is enough > evidence to assume this initially. But at some point you have to ask > whether that assumption is supported by the evidence we've accumulated > over the time we've been using it. And if the evidence doesn't support > it perhaps it is time to stop the experiment. Maintainers routinely review 1-line trivial patches, not to mention internal API changes, etc. If some company does not want to pay for that, that's fine, but they don't get to be maintainers and claim `Supported`. Cheers, Miguel
On Mon, Nov 23, 2020 at 4:58 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > On Mon, 2020-11-23 at 15:19 +0100, Miguel Ojeda wrote: > > On Sun, Nov 22, 2020 at 11:36 PM James Bottomley > > <James.Bottomley@hansenpartnership.com> wrote: [cut] > > > > Maintainers routinely review 1-line trivial patches, not to mention > > internal API changes, etc. > > We're also complaining about the inability to recruit maintainers: > > https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/ > > And burn out: > > http://antirez.com/news/129 Right. > The whole crux of your argument seems to be maintainers' time isn't > important so we should accept all trivial patches ... I'm pushing back > on that assumption in two places, firstly the valulessness of the time > and secondly that all trivial patches are valuable. > > > If some company does not want to pay for that, that's fine, but they > > don't get to be maintainers and claim `Supported`. > > What I'm actually trying to articulate is a way of measuring value of > the patch vs cost ... it has nothing really to do with who foots the > actual bill. > > One thesis I'm actually starting to formulate is that this continual > devaluing of maintainers is why we have so much difficulty keeping and > recruiting them. Absolutely. This is just one of the factors involved, but a significant one IMV.
On Mon, 2020-11-23 at 07:03 -0600, Gustavo A. R. Silva wrote: > On Sun, Nov 22, 2020 at 11:53:55AM -0800, James Bottomley wrote: > > On Sun, 2020-11-22 at 11:22 -0800, Joe Perches wrote: > > > On Sun, 2020-11-22 at 11:12 -0800, James Bottomley wrote: > > > > On Sun, 2020-11-22 at 10:25 -0800, Joe Perches wrote: > > > > > On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote: > > > > > > Please tell me our reward for all this effort isn't a > > > > > > single missing error print. > > > > > > > > > > There were quite literally dozens of logical defects found > > > > > by the fallthrough additions. Very few were logging only. > > > > > > > > So can you give us the best examples (or indeed all of them if > > > > someone is keeping score)? hopefully this isn't a US election > > > > situation ... > > > > > > Gustavo? Are you running for congress now? > > > > > > https://lwn.net/Articles/794944/ > > > > That's 21 reported fixes of which about 50% seem to produce no > > change in code behaviour at all, a quarter seem to have no user > > visible effect with the remaining quarter producing unexpected > > errors on obscure configuration parameters, which is why no-one > > really noticed them before. > > The really important point here is the number of bugs this has > prevented and will prevent in the future. See an example of this, > below: > > https://lore.kernel.org/linux-iio/20190813135802.GB27392@kroah.com/ I think this falls into the same category as the other six bugs: it changes the output/input for parameters but no-one has really noticed, usually because the command is obscure or the bias effect is minor. > This work is still relevant, even if the total number of issues/bugs > we find in the process is zero (which is not the case). Really, no ... something which produces no improvement has no value at all ... we really shouldn't be wasting maintainer time with it because it has a cost to merge. I'm not sure we understand where the balance lies in value vs cost to merge but I am confident in the zero value case. > "The sucky thing about doing hard work to deploy hardening is that > the result is totally invisible by definition (things not happening) > [..]" > - Dmitry Vyukov Really, no. Something that can't be measured at all doesn't exist. And actually hardening is one of those things you can measure (which I do have to admit isn't true for everything in the security space) ... it's number of exploitable bugs found before you did it vs number of exploitable bugs found after you did it. Usually hardening eliminates a class of bug, so the way I've measured hardening before is to go through the CVE list for the last couple of years for product X, find all the bugs that are of the class we're looking to eliminate and say if we had hardened X against this class of bug we'd have eliminated Y% of the exploits. It can be quite impressive if Y is a suitably big number. James
On Mon, Nov 23, 2020 at 4:58 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > Well, I used git. It says that as of today in Linus' tree we have 889 > patches related to fall throughs and the first series went in in > october 2017 ... ignoring a couple of outliers back to February. I can see ~10k insertions over ~1k commits and 15 years that mention a fallthrough in the entire repo. That is including some commits (like the biggest one, 960 insertions) that have nothing to do with C fallthrough. A single kernel release has an order of magnitude more changes than this... But if we do the math, for an author, at even 1 minute per line change and assuming nothing can be automated at all, it would take 1 month of work. For maintainers, a couple of trivial lines is noise compared to many other patches. In fact, this discussion probably took more time than the time it would take to review the 200 lines. :-) > We're also complaining about the inability to recruit maintainers: > > https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/ > > And burn out: > > http://antirez.com/news/129 Accepting trivial and useful 1-line patches is not what makes a voluntary maintainer quit... Thankless work with demanding deadlines is. > The whole crux of your argument seems to be maintainers' time isn't > important so we should accept all trivial patches I have not said that, at all. In fact, I am a voluntary one and I welcome patches like this. It takes very little effort on my side to review and it helps the kernel overall. Paid maintainers are the ones that can take care of big features/reviews. > What I'm actually trying to articulate is a way of measuring value of > the patch vs cost ... it has nothing really to do with who foots the > actual bill. I understand your point, but you were the one putting it in terms of a junior FTE. In my view, 1 month-work (worst case) is very much worth removing a class of errors from a critical codebase. > One thesis I'm actually starting to formulate is that this continual > devaluing of maintainers is why we have so much difficulty keeping and > recruiting them. That may very well be true, but I don't feel anybody has devalued maintainers in this discussion. Cheers, Miguel
On Fri, Nov 20, 2020 at 12:21:39PM -0600, Gustavo A. R. Silva wrote: > IB/hfi1: Fix fall-through warnings for Clang > IB/mlx4: Fix fall-through warnings for Clang > IB/qedr: Fix fall-through warnings for Clang > RDMA/mlx5: Fix fall-through warnings for Clang I picked these four to the rdma tree, thanks Jason
On Mon, 23 Nov 2020, Miguel Ojeda wrote: > On Mon, 23 Nov 2020, Finn Thain wrote: > > > On Sun, 22 Nov 2020, Miguel Ojeda wrote: > > > > > > > > It isn't that much effort, isn't it? Plus we need to take into > > > account the future mistakes that it might prevent, too. > > > > We should also take into account optimisim about future improvements > > in tooling. > > > Not sure what you mean here. There is no reliable way to guess what the > intention was with a missing fallthrough, even if you parsed whitespace > and indentation. > What I meant was that you've used pessimism as if it was fact. For example, "There is no way to guess what the effect would be if the compiler trained programmers to add a knee-jerk 'break' statement to avoid a warning". Moreover, what I meant was that preventing programmer mistakes is a problem to be solved by development tools. The idea that retro-fitting new language constructs onto mature code is somehow necessary to "prevent future mistakes" is entirely questionable. > > > So even if there were zero problems found so far, it is still a > > > positive change. > > > > > > > It is if you want to spin it that way. > > > How is that a "spin"? It is a fact that we won't get *implicit* > fallthrough mistakes anymore (in particular if we make it a hard error). > Perhaps "handwaving" is a better term? > > > I would agree if these changes were high risk, though; but they are > > > almost trivial. > > > > > > > This is trivial: > > > > case 1: > > this(); > > + fallthrough; > > case 2: > > that(); > > > > But what we inevitably get is changes like this: > > > > case 3: > > this(); > > + break; > > case 4: > > hmmm(); > > > > Why? Mainly to silence the compiler. Also because the patch author > > argued successfully that they had found a theoretical bug, often in > > mature code. > > > If someone changes control flow, that is on them. Every kernel developer > knows what `break` does. > Sure. And if you put -Wimplicit-fallthrough into the Makefile and if that leads to well-intentioned patches that cause regressions, it is partly on you. Have you ever considered the overall cost of the countless -Wpresume-incompetence flags? Perhaps you pay the power bill for a build farm that produces logs that no-one reads? Perhaps you've run git bisect, knowing that the compiler messages are not interesting? Or compiled software in using a language that generates impenetrable messages? If so, here's a tip: # grep CFLAGS /etc/portage/make.conf CFLAGS="... -Wno-all -Wno-extra ..." CXXFLAGS="${CFLAGS}" Now allow me some pessimism: the hardware upgrades, gigawatt hours and wait time attributable to obligatory static analyses are a net loss. > > But is anyone keeping score of the regressions? If unreported bugs > > count, what about unreported regressions? > > > Introducing `fallthrough` does not change semantics. If you are really > keen, you can always compare the objects because the generated code > shouldn't change. > No, it's not for me to prove that such patches don't affect code generation. That's for the patch author and (unfortunately) for reviewers. > Cheers, > Miguel >
On Tue, 2020-11-24 at 11:58 +1100, Finn Thain wrote: > it's not for me to prove that such patches don't affect code > generation. That's for the patch author and (unfortunately) for reviewers. Ideally, that proof would be provided by the compilation system itself and not patch authors nor reviewers nor maintainers. Unfortunately gcc does not guarantee repeatability or deterministic output. To my knowledge, neither does clang.
On Sun, Nov 22, 2020 at 8:17 AM Kees Cook <keescook@chromium.org> wrote: > > On Fri, Nov 20, 2020 at 11:51:42AM -0800, Jakub Kicinski wrote: > > If none of the 140 patches here fix a real bug, and there is no change > > to machine code then it sounds to me like a W=2 kind of a warning. > > FWIW, this series has found at least one bug so far: > https://lore.kernel.org/lkml/CAFCwf11izHF=g1mGry1fE5kvFFFrxzhPSM6qKAO8gxSp=Kr_CQ@mail.gmail.com/ So looks like the bulk of these are: switch (x) { case 0: ++x; default: break; } I have a patch that fixes those up for clang: https://reviews.llvm.org/D91895 There's 3 other cases that don't quite match between GCC and Clang I observe in the kernel: switch (x) { case 0: ++x; default: goto y; } y:; switch (x) { case 0: ++x; default: return; } switch (x) { case 0: ++x; default: ; } Based on your link, and Nathan's comment on my patch, maybe Clang should continue to warn for the above (at least the `default: return;` case) and GCC should change? While the last case looks harmless, there were only 1 or 2 across the tree in my limited configuration testing; I really think we should just add `break`s for those.
On Mon, 23 Nov 2020, Joe Perches wrote: > On Tue, 2020-11-24 at 11:58 +1100, Finn Thain wrote: > > it's not for me to prove that such patches don't affect code > > generation. That's for the patch author and (unfortunately) for > > reviewers. > > Ideally, that proof would be provided by the compilation system itself > and not patch authors nor reviewers nor maintainers. > > Unfortunately gcc does not guarantee repeatability or deterministic > output. To my knowledge, neither does clang. > Yes, I've said the same thing myself. But having attempted it, I now think this is a hard problem. YMMV. https://lore.kernel.org/linux-scsi/alpine.LNX.2.22.394.2004281017310.12@nippy.intranet/ https://lore.kernel.org/linux-scsi/alpine.LNX.2.22.394.2005211358460.8@nippy.intranet/
On Mon, Nov 23, 2020 at 04:03:45PM -0400, Jason Gunthorpe wrote: > On Fri, Nov 20, 2020 at 12:21:39PM -0600, Gustavo A. R. Silva wrote: > > > IB/hfi1: Fix fall-through warnings for Clang > > IB/mlx4: Fix fall-through warnings for Clang > > IB/qedr: Fix fall-through warnings for Clang > > RDMA/mlx5: Fix fall-through warnings for Clang > > I picked these four to the rdma tree, thanks Awesome. :) Thank you, Jason. -- Gustavo
On Mon, Nov 23, 2020 at 08:38:46PM +0000, Mark Brown wrote: > On Fri, 20 Nov 2020 12:21:39 -0600, Gustavo A. R. Silva wrote: > > This series aims to fix almost all remaining fall-through warnings in > > order to enable -Wimplicit-fallthrough for Clang. > > > > In preparation to enable -Wimplicit-fallthrough for Clang, explicitly > > add multiple break/goto/return/fallthrough statements instead of just > > letting the code fall through to the next case. > > > > [...] > > Applied to > > https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next > > Thanks! > > [1/1] regulator: as3722: Fix fall-through warnings for Clang > commit: b52b417ccac4fae5b1f2ec4f1d46eb91e4493dc5 Thank you, Mark. -- Gustavo
On Mon, Nov 23, 2020 at 05:32:51PM -0800, Nick Desaulniers wrote: > On Sun, Nov 22, 2020 at 8:17 AM Kees Cook <keescook@chromium.org> wrote: > > > > On Fri, Nov 20, 2020 at 11:51:42AM -0800, Jakub Kicinski wrote: > > > If none of the 140 patches here fix a real bug, and there is no change > > > to machine code then it sounds to me like a W=2 kind of a warning. > > > > FWIW, this series has found at least one bug so far: > > https://lore.kernel.org/lkml/CAFCwf11izHF=g1mGry1fE5kvFFFrxzhPSM6qKAO8gxSp=Kr_CQ@mail.gmail.com/ > > So looks like the bulk of these are: > switch (x) { > case 0: > ++x; > default: > break; > } > > I have a patch that fixes those up for clang: > https://reviews.llvm.org/D91895 I still think this isn't right -- it's a case statement that runs off the end without an explicit flow control determination. I think Clang is right to warn for these, and GCC should also warn.
On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote: > Really, no ... something which produces no improvement has no value at > all ... we really shouldn't be wasting maintainer time with it because > it has a cost to merge. I'm not sure we understand where the balance > lies in value vs cost to merge but I am confident in the zero value > case. What? We can't measure how many future bugs aren't introduced because the kernel requires explicit case flow-control statements for all new code. We already enable -Wimplicit-fallthrough globally, so that's not the discussion. The issue is that Clang is (correctly) even more strict than GCC for this, so these are the remaining ones to fix for full Clang coverage too. People have spent more time debating this already than it would have taken to apply the patches. :) This is about robustness and language wrangling. It's a big code-base, and this is the price of our managing technical debt for permanent robustness improvements. (The numbers I ran from Gustavo's earlier patches were that about 10% of the places adjusted were identified as legitimate bugs being fixed. This final series may be lower, but there are still bugs being found from it -- we need to finish this and shut the door on it for good.)
On Tue, 24 Nov 2020, Kees Cook wrote: > On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote: > > Really, no ... something which produces no improvement has no value at > > all ... we really shouldn't be wasting maintainer time with it because > > it has a cost to merge. I'm not sure we understand where the balance > > lies in value vs cost to merge but I am confident in the zero value > > case. > > What? We can't measure how many future bugs aren't introduced because > the kernel requires explicit case flow-control statements for all new > code. > These statements are not "missing" unless you presume that code written before the latest de facto language spec was written should somehow be held to that spec. If the 'fallthrough' statement is not part of the latest draft spec then we should ask why not before we embrace it. Being that the kernel still prefers -std=gnu89 you might want to consider what has prevented -std=gnu99 or -std=gnu2x etc. > We already enable -Wimplicit-fallthrough globally, so that's not the > discussion. The issue is that Clang is (correctly) even more strict than > GCC for this, so these are the remaining ones to fix for full Clang > coverage too. > Seems to me you should be patching the compiler. When you have consensus among the language lawyers you'll have more credibility with those being subjected to enforcement.
On Tue, Nov 24, 2020 at 11:24 PM Finn Thain <fthain@telegraphics.com.au> wrote: > > These statements are not "missing" unless you presume that code written > before the latest de facto language spec was written should somehow be > held to that spec. There is no "language spec" the kernel adheres to. Even if it did, kernel code is not frozen. If an improvement is found, it should be applied. > If the 'fallthrough' statement is not part of the latest draft spec then > we should ask why not before we embrace it. Being that the kernel still > prefers -std=gnu89 you might want to consider what has prevented > -std=gnu99 or -std=gnu2x etc. The C standard has nothing to do with this. We use compiler extensions of several kinds, for many years. Even discounting those extensions, the kernel is not even conforming to C due to e.g. strict aliasing. I am not sure what you are trying to argue here. But, since you insist: yes, the `fallthrough` attribute is in the current C2x draft. Cheers, Miguel
On Tue, Nov 24, 2020 at 1:58 AM Finn Thain <fthain@telegraphics.com.au> wrote: > > What I meant was that you've used pessimism as if it was fact. "future mistakes that it might prevent" is neither pessimism nor states a fact. > For example, "There is no way to guess what the effect would be if the > compiler trained programmers to add a knee-jerk 'break' statement to avoid > a warning". It is only knee-jerk if you think you are infallible. > Moreover, what I meant was that preventing programmer mistakes is a > problem to be solved by development tools This warning comes from a development tool -- the compiler. > The idea that retro-fitting new > language constructs onto mature code is somehow necessary to "prevent > future mistakes" is entirely questionable. The kernel is not a frozen codebase. Further, "mature code vs. risk of change" arguments don't apply here because the semantics of the program and binary output isn't changing. > Sure. And if you put -Wimplicit-fallthrough into the Makefile and if that > leads to well-intentioned patches that cause regressions, it is partly on > you. Again: adding a `fallthrough` does not change the program semantics. If you are a maintainer and want to cross-check, compare the codegen. > Have you ever considered the overall cost of the countless > -Wpresume-incompetence flags? Yeah: negative. On the other hand, the overall cost of the countless -fI-am-infallible flags is very noticeable. > Perhaps you pay the power bill for a build farm that produces logs that > no-one reads? Perhaps you've run git bisect, knowing that the compiler > messages are not interesting? Or compiled software in using a language > that generates impenetrable messages? If so, here's a tip: > > # grep CFLAGS /etc/portage/make.conf > CFLAGS="... -Wno-all -Wno-extra ..." > CXXFLAGS="${CFLAGS}" > > Now allow me some pessimism: the hardware upgrades, gigawatt hours and > wait time attributable to obligatory static analyses are a net loss. If you really believe compiler warnings and static analysis are useless and costly, I think there is not much point in continuing the discussion. > No, it's not for me to prove that such patches don't affect code > generation. That's for the patch author and (unfortunately) for reviewers. I was not asking you to prove it. I am stating that proving it is very easy. Cheers, Miguel
On Wed, 25 Nov 2020, Miguel Ojeda wrote: > > The C standard has nothing to do with this. We use compiler extensions > of several kinds, for many years. Even discounting those extensions, the > kernel is not even conforming to C due to e.g. strict aliasing. I am not > sure what you are trying to argue here. > I'm saying that supporting the official language spec makes more sense than attempting to support a multitude of divergent interpretations of the spec (i.e. gcc, clang, coverity etc.) I'm also saying that the reason why we use -std=gnu89 is that existing code was written in that language, not in ad hoc languages comprised of collections of extensions that change with every release. > But, since you insist: yes, the `fallthrough` attribute is in the > current C2x draft. > Thank you for checking. I found a free version that's only 6 weeks old: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2583.pdf It will be interesting to see whether 6.7.11.5 changes once the various implementations reach agreement.
On Mon, Nov 23, 2020 at 9:38 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > So you think a one line patch should take one minute to produce ... I > really don't think that's grounded in reality. No, I have not said that. Please don't put words in my mouth (again). I have said *authoring* lines of *this* kind takes a minute per line. Specifically: lines fixing the fallthrough warning mechanically and repeatedly where the compiler tells you to, and doing so full-time for a month. For instance, take the following one from Gustavo. Are you really saying it takes 12 minutes (your number) to write that `break;`? diff --git a/drivers/gpu/drm/via/via_irq.c b/drivers/gpu/drm/via/via_irq.c index 24cc445169e2..a3e0fb5b8671 100644 --- a/drivers/gpu/drm/via/via_irq.c +++ b/drivers/gpu/drm/via/via_irq.c @@ -364,6 +364,7 @@ int via_wait_irq(struct drm_device *dev, void *data, struct drm_file *file_priv) irqwait->request.sequence += atomic_read(&cur_irq->irq_received); irqwait->request.type &= ~_DRM_VBLANK_RELATIVE; + break; case VIA_IRQ_ABSOLUTE: break; default: > I suppose a one line > patch only takes a minute to merge with b4 if no-one reviews or tests > it, but that's not really desirable. I have not said that either. I said reviewing and merging those are noise compared to any complex patch. Testing should be done by the author comparing codegen. > Part of what I'm trying to measure is the "and useful" bit because > that's not a given. It is useful since it makes intent clear. It also catches actual bugs, which is even more valuable. > Well, you know, subsystems are very different in terms of the amount of > patches a maintainer has to process per release cycle of the kernel. > If a maintainer is close to capacity, additional patches, however > trivial, become a problem. If a maintainer has spare cycles, trivial > patches may look easy. First of all, voluntary maintainers choose their own workload. Furthermore, we already measure capacity in the `MAINTAINERS` file: maintainers can state they can only handle a few patches. Finally, if someone does not have time for a trivial patch, they are very unlikely to have any time to review big ones. > You seem to be saying that because you find it easy to merge trivial > patches, everyone should. Again, I have not said anything of the sort. Cheers, Miguel
On Wed, Nov 25, 2020 at 12:53 AM Finn Thain <fthain@telegraphics.com.au> wrote: > > I'm saying that supporting the official language spec makes more sense > than attempting to support a multitude of divergent interpretations of the > spec (i.e. gcc, clang, coverity etc.) Making the kernel strictly conforming is a ship that sailed long ago, for several reasons. Anyway, supporting several compilers and other tools, regardless of extensions, is valuable. > I'm also saying that the reason why we use -std=gnu89 is that existing > code was written in that language, not in ad hoc languages comprised of > collections of extensions that change with every release. No, we aren't particularly tied to `gnu89` or anything like that. We could actually go for `gnu11` already, since the minimum GCC and Clang support it. Even if a bit of code needs fixing, that shouldn't be a problem if someone puts the work. In other words, the kernel code is not frozen, nor are the features it uses from compilers. They do, in fact, change from time to time. > Thank you for checking. I found a free version that's only 6 weeks old: You're welcome! There are quite a few new attributes coming, mostly following C++ ones. > It will be interesting to see whether 6.7.11.5 changes once the various > implementations reach agreement. Not sure what you mean. The standard does not evolve through implementations' agreement (although standardizing existing practice is one of the best arguments to back a change). Cheers, Miguel
On Tue, 2020-11-24 at 13:32 -0800, Kees Cook wrote: > On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote: > > Really, no ... something which produces no improvement has no value > > at all ... we really shouldn't be wasting maintainer time with it > > because it has a cost to merge. I'm not sure we understand where > > the balance lies in value vs cost to merge but I am confident in > > the zero value case. > > What? We can't measure how many future bugs aren't introduced because > the kernel requires explicit case flow-control statements for all new > code. No but we can measure how vulnerable our current coding habits are to the mistake this warning would potentially prevent. I don't think it's wrong to extrapolate that if we had no instances at all of prior coding problems we likely wouldn't have any in future either making adopting the changes needed to enable the warning valueless ... that's the zero value case I was referring to above. Now, what we have seems to be about 6 cases (at least what's been shown in this thread) where a missing break would cause potentially user visible issues. That means the value of this isn't zero, but it's not a no-brainer massive win either. That's why I think asking what we've invested vs the return isn't a useless exercise. > We already enable -Wimplicit-fallthrough globally, so that's not the > discussion. The issue is that Clang is (correctly) even more strict > than GCC for this, so these are the remaining ones to fix for full > Clang coverage too. > > People have spent more time debating this already than it would have > taken to apply the patches. :) You mean we've already spent 90% of the effort to come this far so we might as well go the remaining 10% because then at least we get some return? It's certainly a clinching argument in defence procurement ... > This is about robustness and language wrangling. It's a big code- > base, and this is the price of our managing technical debt for > permanent robustness improvements. (The numbers I ran from Gustavo's > earlier patches were that about 10% of the places adjusted were > identified as legitimate bugs being fixed. This final series may be > lower, but there are still bugs being found from it -- we need to > finish this and shut the door on it for good.) I got my six patches by analyzing the lwn.net report of the fixes that was cited which had 21 of which 50% didn't actually change the emitted code, and 25% didn't have a user visible effect. But the broader point I'm making is just because the compiler people come up with a shiny new warning doesn't necessarily mean the problem it's detecting is one that causes us actual problems in the code base. I'd really be happier if we had a theory about what classes of CVE or bug we could eliminate before we embrace the next new warning. James
On Mon, Nov 23, 2020 at 07:58:06AM -0800, James Bottomley wrote: > On Mon, 2020-11-23 at 15:19 +0100, Miguel Ojeda wrote: > > On Sun, Nov 22, 2020 at 11:36 PM James Bottomley > > <James.Bottomley@hansenpartnership.com> wrote: > > > It's not about the risk of the changes it's about the cost of > > > implementing them. Even if you discount the producer time (which > > > someone gets to pay for, and if I were the engineering manager, I'd > > > be unhappy about), the review/merge/rework time is pretty > > > significant in exchange for six minor bug fixes. Fine, when a new > > > compiler warning comes along it's certainly reasonable to see if we > > > can benefit from it and the fact that the compiler people think > > > it's worthwhile is enough evidence to assume this initially. But > > > at some point you have to ask whether that assumption is supported > > > by the evidence we've accumulated over the time we've been using > > > it. And if the evidence doesn't support it perhaps it is time to > > > stop the experiment. > > > > Maintainers routinely review 1-line trivial patches, not to mention > > internal API changes, etc. > > We're also complaining about the inability to recruit maintainers: > > https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/ > > And burn out: > > http://antirez.com/news/129 > > The whole crux of your argument seems to be maintainers' time isn't > important so we should accept all trivial patches ... I'm pushing back > on that assumption in two places, firstly the valulessness of the time > and secondly that all trivial patches are valuable. You're assuming burn out or recruitment problems is due to patch workload or too many "trivial" patches. In my experience, "other maintainers" is by far the biggest cause of burn out for my kernel maintenance work. Certainly arguing with a maintainer about some obviously-correct patch series must be a good example of this. Sean
On Mon, Nov 23, 2020 at 10:39 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Mon, 2020-11-23 at 19:56 +0100, Miguel Ojeda wrote: > > On Mon, Nov 23, 2020 at 4:58 PM James Bottomley > > <James.Bottomley@hansenpartnership.com> wrote: ... > > But if we do the math, for an author, at even 1 minute per line > > change and assuming nothing can be automated at all, it would take 1 > > month of work. For maintainers, a couple of trivial lines is noise > > compared to many other patches. > > So you think a one line patch should take one minute to produce ... I > really don't think that's grounded in reality. I suppose a one line > patch only takes a minute to merge with b4 if no-one reviews or tests > it, but that's not really desirable. In my practice most of the one line patches were either to fix or to introduce quite interesting issues. 1 minute is 2-3 orders less than usually needed for such patches. That's why I don't like churn produced by people who often even didn't compile their useful contributions.
On Tue, Nov 24, 2020 at 11:05 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > On Tue, 2020-11-24 at 13:32 -0800, Kees Cook wrote: > > We already enable -Wimplicit-fallthrough globally, so that's not the > > discussion. The issue is that Clang is (correctly) even more strict > > than GCC for this, so these are the remaining ones to fix for full > > Clang coverage too. > > > > People have spent more time debating this already than it would have > > taken to apply the patches. :) > > You mean we've already spent 90% of the effort to come this far so we > might as well go the remaining 10% because then at least we get some > return? It's certainly a clinching argument in defence procurement ... So developers and distributions using Clang can't have -Wimplicit-fallthrough enabled because GCC is less strict (which has been shown in this thread to lead to bugs)? We'd like to have nice things too, you know. I even agree that most of the churn comes from case 0: ++x; default: break; which I have a patch for: https://reviews.llvm.org/D91895. I agree that can never lead to bugs. But that's not the sole case of this series, just most of them. Though, note how the reviewer (C++ spec editor and clang front end owner) in https://reviews.llvm.org/D91895 even asks in that review how maybe a new flag would be more appropriate for a watered down/stylistic variant of the existing behavior. And if the current wording of Documentation/process/deprecated.rst around "fallthrough" is a straightforward rule of thumb, I kind of agree with him. > > > This is about robustness and language wrangling. It's a big code- > > base, and this is the price of our managing technical debt for > > permanent robustness improvements. (The numbers I ran from Gustavo's > > earlier patches were that about 10% of the places adjusted were > > identified as legitimate bugs being fixed. This final series may be > > lower, but there are still bugs being found from it -- we need to > > finish this and shut the door on it for good.) > > I got my six patches by analyzing the lwn.net report of the fixes that > was cited which had 21 of which 50% didn't actually change the emitted > code, and 25% didn't have a user visible effect. > > But the broader point I'm making is just because the compiler people > come up with a shiny new warning doesn't necessarily mean the problem That's not what this is though; you're attacking a strawman. I'd encourage you to bring that up when that actually occurs, unlike this case since it's actively hindering getting -Wimplicit-fallthrough enabled for Clang. This is not a shiny new warning; it's already on for GCC and has existed in both compilers for multiple releases. And I'll also note that warnings are warnings and not errors because they cannot be proven to be bugs in 100% of cases, but they have led to bugs in the past. They require a human to review their intent and remove ambiguities. If 97% of cases would end in a break ("Expert C Programming: Deep C Secrets" - Peter van der Linden), then it starts to look to me like a language defect; certainly an incorrectly chosen default. But the compiler can't know those 3% were intentional, unless you're explicit for those exceptional cases. > it's detecting is one that causes us actual problems in the code base. > I'd really be happier if we had a theory about what classes of CVE or > bug we could eliminate before we embrace the next new warning. We don't generally file CVEs and waiting for them to occur might be too reactive, but I agree that pointing to some additional documentation in commit messages about how a warning could lead to a bug would make it clearer to reviewers why being able to enable it treewide, even if there's no bug in their particular subsystem, is in the general interest of the commons. On Mon, Nov 23, 2020 at 7:58 AM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > We're also complaining about the inability to recruit maintainers: > > https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/ > > And burn out: > > http://antirez.com/news/129 > > The whole crux of your argument seems to be maintainers' time isn't > important so we should accept all trivial patches ... I'm pushing back > on that assumption in two places, firstly the valulessness of the time > and secondly that all trivial patches are valuable. It's critical to the longevity of any open source project that there are not single points of failure. If someone is not expendable or replaceable (or claims to be) then that's a risk to the project and a bottleneck. Not having a replacement in training or some form of redundancy is short sighted. If trivial patches are adding too much to your workload, consider training a co-maintainer or asking for help from one of your reviewers whom you trust. I don't doubt it's hard to find maintainers, but existing maintainers should go out of their way to entrust co-maintainers especially when they find their workload becomes too high. And reviewing/picking up trivial patches is probably a great way to get started. If we allow too much knowledge of any one subsystem to collect with one maintainer, what happens when that maintainer leaves the community (which, given a finite lifespan, is an inevitability)?
On Wed, Nov 25, 2020 at 5:24 PM Jakub Kicinski <kuba@kernel.org> wrote: > > And just to spell it out, > > case ENUM_VALUE1: > bla(); > break; > case ENUM_VALUE2: > bla(); > default: > break; > > is a fairly idiomatic way of indicating that not all values of the enum > are expected to be handled by the switch statement. It looks like a benign typo to me -- `ENUM_VALUE2` does not follow the same pattern like `ENUM_VALUE1`. To me, the presence of the `default` is what indicates (explicitly) that not everything is handled. > Applying a real patch set and then getting a few follow ups the next day > for trivial coding things like fallthrough missing or static missing, > just because I didn't have the full range of compilers to check with > before applying makes me feel pretty shitty, like I'm not doing a good > job. YMMV. The number of compilers, checkers, static analyzers, tests, etc. we use keeps going up. That, indeed, means maintainers will miss more things (unless maintainers do more work than before). But catching bugs before they happen is *not* a bad thing. Perhaps we could encourage more rebasing in -next (while still giving credit to bots and testers) to avoid having many fixing commits afterwards, but that is orthogonal. I really don't think we should encourage the feeling that a maintainer is doing a bad job if they don't catch everything on their reviews. Any review is worth it. Maintainers, in the end, are just the "guaranteed" reviewers that decide when the code looks reasonable enough. They should definitely not feel pressured to be perfect. Cheers, Miguel
On Tue, Nov 24, 2020 at 11:05:35PM -0800, James Bottomley wrote: > Now, what we have seems to be about 6 cases (at least what's been shown > in this thread) where a missing break would cause potentially user > visible issues. That means the value of this isn't zero, but it's not > a no-brainer massive win either. That's why I think asking what we've > invested vs the return isn't a useless exercise. The number is much higher[1]. If it were 6 in the entire history of the kernel, I would agree with you. :) Some were fixed _before_ Gustavo's effort too, which I also count towards the idea of "this is a dangerous weakness in C, and now we have stopped it forever." > But the broader point I'm making is just because the compiler people > come up with a shiny new warning doesn't necessarily mean the problem > it's detecting is one that causes us actual problems in the code base. > I'd really be happier if we had a theory about what classes of CVE or > bug we could eliminate before we embrace the next new warning. But we did! It was long ago justified and documented[2], and even links to the CWE[3] for it. This wasn't random joy over discovering a new warning we could turn on, this was turning on a warning that the compiler folks finally gave us to handle an entire class of flaws. If we need to update the code-base to address it not a useful debate -- that was settled already, even if you're only discovering it now. :P. This last patch set is about finishing that work for Clang, which is correctly even more strict than GCC. -Kees [1] https://outflux.net/slides/2019/lss/kspp.pdf calls out specific numbers (about 6.5% of the patches fixed missing breaks): v4.19: 3 of 129 v4.20: 2 of 59 v5.0: 3 of 56 v5.1: 10 of 100 v5.2: 6 of 71 v5.3: 7 of 69 And in the history of the kernel, it's been an ongoing source of flaws: $ l --no-merges | grep -i 'missing break' | wc -l 185 The frequency of such errors being "naturally" found was pretty steady until the static checkers started warning, and then it was on the rise, but the full effort flushed the rest out, and now it's dropped to almost zero: 1 v2.6.12 3 v2.6.16.28 1 v2.6.17 1 v2.6.19 2 v2.6.21 1 v2.6.22 3 v2.6.24 3 v2.6.29 1 v2.6.32 1 v2.6.33 1 v2.6.35 4 v2.6.36 3 v2.6.38 2 v2.6.39 7 v3.0 2 v3.1 2 v3.2 2 v3.3 3 v3.4 1 v3.5 8 v3.6 7 v3.7 3 v3.8 6 v3.9 3 v3.10 2 v3.11 5 v3.12 5 v3.13 2 v3.14 4 v3.15 2 v3.16 3 v3.17 2 v3.18 2 v3.19 1 v4.0 2 v4.1 5 v4.2 4 v4.5 5 v4.7 6 v4.8 1 v4.9 3 v4.10 2 v4.11 6 v4.12 3 v4.13 2 v4.14 5 v4.15 2 v4.16 7 v4.18 2 v4.19 6 v4.20 3 v5.0 12 v5.1 3 v5.2 4 v5.3 2 v5.4 1 v5.8 And the reason it's fully zero, is because we still have the cases we're cleaning up right now. Even this last one from v5.8 is specifically of the same type this series addresses: case 4: color_index = TrueCModeIndex; + break; default: return; } [2] https://www.kernel.org/doc/html/latest/process/deprecated.html#implicit-switch-case-fall-through All switch/case blocks must end in one of: break; fallthrough; continue; goto <label>; return [expression]; [3] https://cwe.mitre.org/data/definitions/484.html
On Wed, 25 Nov 2020, Nick Desaulniers wrote: > So developers and distributions using Clang can't have > -Wimplicit-fallthrough enabled because GCC is less strict (which has > been shown in this thread to lead to bugs)? We'd like to have nice > things too, you know. > Apparently the GCC developers don't want you to have "nice things" either. Do you think that the kernel should drop gcc in favour of clang? Or do you think that a codebase can somehow satisfy multiple checkers and their divergent interpretations of the language spec? > This is not a shiny new warning; it's already on for GCC and has existed > in both compilers for multiple releases. > Perhaps you're referring to the compiler feature that lead to the ill-fated, tree-wide /* fallthrough */ patch series. When the ink dries on the C23 language spec and the implementations figure out how to interpret it then sure, enforce the warning for new code -- the cost/benefit analysis is straight forward. However, the case for patching existing mature code is another story.
On Wed, Nov 25, 2020 at 8:24 AM Jakub Kicinski <kuba@kernel.org> wrote: > > Applying a real patch set and then getting a few follow ups the next day > for trivial coding things like fallthrough missing or static missing, > just because I didn't have the full range of compilers to check with > before applying makes me feel pretty shitty, like I'm not doing a good > job. YMMV. I understand. Everyone feels that way, except maybe Bond villains and robots. My advice in that case is don't take it personally. We're working with a language that's more error prone relative to others. While one would like to believe they are flawless, over time they can't beat the aggregate statistics. A balance between Imposter Syndrome and Dunning Kruger is walked by all software developers, and the fear of making mistakes in public is one of the number one reasons folks don't take the plunge contributing to open source software or even the kernel. My advice to them is "don't sweat the small stuff."
On Wed, Nov 25, 2020 at 1:33 PM Finn Thain <fthain@telegraphics.com.au> wrote: > > Or do you think that a codebase can somehow satisfy multiple checkers and > their divergent interpretations of the language spec? Have we found any cases yet that are divergent? I don't think so. It sounds to me like GCC's cases it warns for is a subset of Clang's. Having additional coverage with Clang then should ensure coverage for both. > > This is not a shiny new warning; it's already on for GCC and has existed > > in both compilers for multiple releases. > > > > Perhaps you're referring to the compiler feature that lead to the > ill-fated, tree-wide /* fallthrough */ patch series. > > When the ink dries on the C23 language spec and the implementations figure > out how to interpret it then sure, enforce the warning for new code -- the > cost/benefit analysis is straight forward. However, the case for patching > existing mature code is another story. I don't think we need to wait for the ink to dry on the C23 language spec to understand that implicit fallthrough is an obvious defect of the C language. While the kernel is a mature codebase, it's not immune to bugs. And its maturity has yet to slow its rapid pace of development.
On 25/11/2020 00:32, Miguel Ojeda wrote: > I have said *authoring* lines of *this* kind takes a minute per line. > Specifically: lines fixing the fallthrough warning mechanically and > repeatedly where the compiler tells you to, and doing so full-time for > a month. <snip> > It is useful since it makes intent clear. To make the intent clear, you have to first be certain that you understand the intent; otherwise by adding either a break or a fallthrough to suppress the warning you are just destroying the information that "the intent of this code is unknown". Figuring out the intent of a piece of unfamiliar code takes more than 1 minute; just because case foo: thing; case bar: break; produces identical code to case foo: thing; break; case bar: break; doesn't mean that *either* is correct — maybe the author meant to write case foo: return thing; case bar: break; and by inserting that break you've destroyed the marker that would direct someone who knew what the code was about to look at that point in the code and spot the problem. Thus, you *always* have to look at more than just the immediate mechanical context of the code, to make a proper judgement that yes, this was the intent. If you think that that sort of thing can be done in an *average* time of one minute, then I hope you stay away from code I'm responsible for! One minute would be an optimistic target for code that, as the maintainer, one is already somewhat familiar with. For code that you're seeing for the first time, as is usually the case with the people doing these mechanical fix-a-warning patches, it's completely unrealistic. A warning is only useful because it makes you *think* about the code. If you suppress the warning without doing that thinking, then you made the warning useless; and if the warning made you think about code that didn't *need* it, then the warning was useless from the start. So make your mind up: does Clang's stricter -Wimplicit-fallthrough flag up code that needs thought (in which case the fixes take effort both to author and to review) or does it flag up code that can be mindlessly "fixed" (in which case the warning is worthless)? Proponents in this thread seem to be trying to have it both ways. -ed
On 24/11/2020 21:25, Kees Cook wrote: > I still think this isn't right -- it's a case statement that runs off > the end without an explicit flow control determination. Proves too much — for instance case foo: case bar: thing; break; doesn't require a fallthrough; after case foo:, and afaik no-one is suggesting it should. Yet it, too, is "a case statement that runs off the end without an explicit flow control determination". -ed
On Wed, 25 Nov 2020, Nick Desaulniers wrote: > On Wed, Nov 25, 2020 at 1:33 PM Finn Thain <fthain@telegraphics.com.au> > wrote: > > > > Or do you think that a codebase can somehow satisfy multiple checkers > > and their divergent interpretations of the language spec? > > Have we found any cases yet that are divergent? I don't think so. There are many implementations, so I think you are guaranteed to find more divergence if you look. That's because the spec is full of language like this: "implementations are encouraged not to emit a diagnostic" and "implementations are encouraged to issue a diagnostic". Some implementations will decide to not emit (under the premise that vast amounts of existing code would have to get patched until the compiler goes quiet) whereas other implementations will decide to emit (under the premise that the author is doing the checking and not the janitor or the packager). > It sounds to me like GCC's cases it warns for is a subset of Clang's. > Having additional coverage with Clang then should ensure coverage for > both. > If that claim were true, the solution would be simple. (It's not.) For the benefit of projects that enable -Werror and projects that nominated gcc as their preferred compiler, clang would simply need a flag to enable conformance with gcc by suppressing those additional warnings that clang would normally produce. This simple solution is, of course, completely unworkable, since it would force clang to copy some portion of gcc's logic (rewritten under LLVM's unique license) and then to track future changes to that portion of gcc indefinitely.
On Wed, 25 Nov 2020, Nick Desaulniers wrote: > On Wed, Nov 25, 2020 at 1:33 PM Finn Thain <fthain@telegraphics.com.au> wrote: > > > > Or do you think that a codebase can somehow satisfy multiple checkers > > and their divergent interpretations of the language spec? > > Have we found any cases yet that are divergent? I don't think so. You mean, aside from -Wimplicit-fallthrough? I'm glad you asked. How about -Wincompatible-pointer-types and -Wframe-larger-than? All of the following files have been affected by divergent diagnostics produced by clang and gcc. arch/arm64/include/asm/neon-intrinsics.h arch/powerpc/xmon/Makefile drivers/gpu/drm/i915/Makefile drivers/gpu/drm/i915/i915_utils.h drivers/staging/media/atomisp/pci/atomisp_subdev.c fs/ext4/super.c include/trace/events/qla.h net/mac80211/rate.c tools/lib/string.c tools/perf/util/setup.py tools/scripts/Makefile.include And if I searched for 'smatch' or 'coverity' instead of 'clang' I'd probably find more divergence. Here are some of the relevant commits. 0738c8b5915c7eaf1e6007b441008e8f3b460443 9c87156cce5a63735d1218f0096a65c50a7a32aa babaab2f473817f173a2d08e410c25abf5ed0f6b 065e5e559555e2f100bc95792a8ef1b609bbe130 93f56de259376d7e4fff2b2d104082e1fa66e237 6c4798d3f08b81c2c52936b10e0fa872590c96ae b7a313d84e853049062011d78cb04b6decd12f5c 093b75ef5995ea35d7f6bdb6c7b32a42a1999813 And before you object, "but -Wconstant-logical-operand is a clang-only warning! it can't be divergent with gcc!", consider that the special cases added to deal with clang-only warnings have to be removed when gcc catches up, which is more churn. Now multiply that by the number of checkers you care about.
Hi Miguel, On Thu, Nov 26, 2020 at 3:54 PM Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote: > On Wed, Nov 25, 2020 at 11:44 PM Edward Cree <ecree.xilinx@gmail.com> wrote: > > To make the intent clear, you have to first be certain that you > > understand the intent; otherwise by adding either a break or a > > fallthrough to suppress the warning you are just destroying the > > information that "the intent of this code is unknown". > > If you don't know what the intent of your own code is, then you > *already* have a problem in your hands. The maintainer is not necessarily the owner/author of the code, and thus may not know the intent of the code. > > or does it flag up code > > that can be mindlessly "fixed" (in which case the warning is > > worthless)? Proponents in this thread seem to be trying to > > have it both ways. > > A warning is not worthless just because you can mindlessly fix it. > There are many counterexamples, e.g. many > checkpatch/lint/lang-format/indentation warnings, functional ones like > the `if (a = b)` warning... BTW, you cannot mindlessly fix the latter, as you cannot know if "(a == b)" or "((a = b))" was intended, without understanding the code (and the (possibly unavailable) data sheet, and the hardware, ...). P.S. So far I've stayed out of this thread, as I like it if the compiler flags possible mistakes. After all I was the one fixing new "may be used uninitialized" warnings thrown up by gcc-4.1, until (a bit later than) support for that compiler was removed... Gr{oetje,eeting}s, Geert
On Thu, Nov 26, 2020 at 4:28 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > The maintainer is not necessarily the owner/author of the code, and > thus may not know the intent of the code. Agreed, I was not blaming maintainers -- just trying to point out that the problem is there :-) In those cases, it is still very useful: we add the `fallthrough` and a comment saying `FIXME: fallthrough intended? Figure this out...`. Thus a previous unknown unknown is now a known unknown. And no new unknown unknowns will be introduced since we enabled the warning globally. > BTW, you cannot mindlessly fix the latter, as you cannot know if > "(a == b)" or "((a = b))" was intended, without understanding the code > (and the (possibly unavailable) data sheet, and the hardware, ...). That's right, I was referring to the cases where the compiler saves someone time from a typo they just made. Cheers, Miguel
Gustavo, > This series aims to fix almost all remaining fall-through warnings in > order to enable -Wimplicit-fallthrough for Clang. Applied 20-22,54,120-124 to 5.11/scsi-staging, thanks.
On Tue, Dec 01, 2020 at 12:52:27AM -0500, Martin K. Petersen wrote: > > Gustavo, > > > This series aims to fix almost all remaining fall-through warnings in > > order to enable -Wimplicit-fallthrough for Clang. > > Applied 20-22,54,120-124 to 5.11/scsi-staging, thanks. Awesome! :) Thanks, Martin. -- Gustavo
On Sun, Nov 22, 2020 at 08:17:03AM -0800, Kees Cook wrote: > On Fri, Nov 20, 2020 at 11:51:42AM -0800, Jakub Kicinski wrote: > > On Fri, 20 Nov 2020 11:30:40 -0800 Kees Cook wrote: > > > On Fri, Nov 20, 2020 at 10:53:44AM -0800, Jakub Kicinski wrote: > > > > On Fri, 20 Nov 2020 12:21:39 -0600 Gustavo A. R. Silva wrote: > > > > > This series aims to fix almost all remaining fall-through warnings in > > > > > order to enable -Wimplicit-fallthrough for Clang. > > > > > > > > > > In preparation to enable -Wimplicit-fallthrough for Clang, explicitly > > > > > add multiple break/goto/return/fallthrough statements instead of just > > > > > letting the code fall through to the next case. > > > > > > > > > > Notice that in order to enable -Wimplicit-fallthrough for Clang, this > > > > > change[1] is meant to be reverted at some point. So, this patch helps > > > > > to move in that direction. > > > > > > > > > > Something important to mention is that there is currently a discrepancy > > > > > between GCC and Clang when dealing with switch fall-through to empty case > > > > > statements or to cases that only contain a break/continue/return > > > > > statement[2][3][4]. > > > > > > > > Are we sure we want to make this change? Was it discussed before? > > > > > > > > Are there any bugs Clangs puritanical definition of fallthrough helped > > > > find? > > > > > > > > IMVHO compiler warnings are supposed to warn about issues that could > > > > be bugs. Falling through to default: break; can hardly be a bug?! > > > > > > It's certainly a place where the intent is not always clear. I think > > > this makes all the cases unambiguous, and doesn't impact the machine > > > code, since the compiler will happily optimize away any behavioral > > > redundancy. > > > > If none of the 140 patches here fix a real bug, and there is no change > > to machine code then it sounds to me like a W=2 kind of a warning. > > FWIW, this series has found at least one bug so far: > https://lore.kernel.org/lkml/CAFCwf11izHF=g1mGry1fE5kvFFFrxzhPSM6qKAO8gxSp=Kr_CQ@mail.gmail.com/ This is a fallthrough to a return and not to a break. That should trigger a warning. The fallthrough to a break should not generate a warning. The bug we're trying to fix is "missing break statement" but if the result of the bug is "we hit a break statement" then now we're just talking about style. GCC should limit itself to warning about potentially buggy code. regards, dan carpenter
On Fri, 20 Nov 2020 12:21:39 -0600, Gustavo A. R. Silva wrote: > This series aims to fix almost all remaining fall-through warnings in > order to enable -Wimplicit-fallthrough for Clang. > > In preparation to enable -Wimplicit-fallthrough for Clang, explicitly > add multiple break/goto/return/fallthrough statements instead of just > letting the code fall through to the next case. > > [...] Applied to 5.11/scsi-queue, thanks! [054/141] target: Fix fall-through warnings for Clang https://git.kernel.org/mkp/scsi/c/492096ecfa39