Message ID | 1519790211-16582-1-git-send-email-alex.shi@linaro.org |
---|---|
Series | arm meltdown fix backporting review for lts 4.9 |


On 02/28/2018 11:56 AM, Alex Shi wrote:
> The patchset also on repository:
> git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2

Sorry, the correct branch address is here:
https://git.linaro.org/kernel/speculation-fixes-staging.git v4.9-meltdown

Thanks
Alex

On Fri, 02 Mar 2018 09:14:50 +0000, Alex Shi wrote:
>
> On 03/01/2018 11:24 PM, Greg KH wrote:
> > On Wed, Feb 28, 2018 at 11:56:22AM +0800, Alex Shi wrote:
> >> Hi All,
> >>
> >> This backport patchset fixed the meltdown issue, its original branch:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
> >> A few dependency or fixing patches are also picked up, if they are necessary
> >> and no functional changes.
> >>
> >> The patchset also on repository:
> >> git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2
> >>
> >> No bug found yet from kernelci.org and lkft testing.
> >
> > No bugs is good, but does it actually fix the meltdown problem? What
> > did you test it on?
>
> Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug.

Cortex-A73 is not affected by Meltdown. Only A75 is. Please don't
spread misinformation. They are both affected by Spectre though.

M.

--
Jazz is not dead, it just smell funny.
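
The distinction drawn in the reply above (Cortex-A75 affected by Meltdown, Cortex-A73 not) can be checked against the cores a board actually has. The following is only a rough sketch, not part of the patch series: it assumes the arm64 `/proc/cpuinfo` layout with `CPU implementer` and `CPU part` fields, and uses the MIDR part numbers 0xd09 (Cortex-A73) and 0xd0a (Cortex-A75). The names `parse_cores`, `meltdown_status` and `SAMPLE` are hypothetical, and the classification encodes nothing beyond the two cores named in this thread.

```python
# Illustrative sketch only; not part of the patch series under review.
# Assumption: part numbers match the arm64 kernel's MIDR definitions.
ARM_IMPLEMENTER = 0x41   # Arm Ltd.
CORTEX_A73_PART = 0xD09
CORTEX_A75_PART = 0xD0A


def parse_cores(cpuinfo_text):
    """Return one (implementer, part) tuple per CPU entry in cpuinfo text."""
    cores = []
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines carry no field
        key, value = key.strip(), value.strip()
        if key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part" and implementer is not None:
            cores.append((implementer, int(value, 16)))
            implementer = None
    return cores


def meltdown_status(implementer, part):
    """Classify a core using only the two facts stated in this thread."""
    if implementer == ARM_IMPLEMENTER and part == CORTEX_A75_PART:
        return "affected"
    if implementer == ARM_IMPLEMENTER and part == CORTEX_A73_PART:
        return "not affected"
    return "unknown"  # the thread makes no claim about other cores


# Hypothetical excerpt of /proc/cpuinfo from a mixed A73/A75 system.
SAMPLE = """\
processor : 0
CPU implementer : 0x41
CPU part : 0xd09

processor : 1
CPU implementer : 0x41
CPU part : 0xd0a
"""

if __name__ == "__main__":
    for implementer, part in parse_cores(SAMPLE):
        print(hex(part), meltdown_status(implementer, part))
```

On a real system the text would come from reading `/proc/cpuinfo` rather than from `SAMPLE`. Identifying the core only says whether the mitigation is expected to matter; it does not show that a backport works.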

On Tue, Mar 06, 2018 at 02:26:34PM +0000, Mark Brown wrote:
> On Mon, Mar 05, 2018 at 02:08:59PM +0100, Greg KH wrote:
>
> > I know there is lots more than Android to ARM, but the huge majority by
> > quantity is Android.
>
> > What I'm saying here is look at all of the backports that were required
> > to get this working in the android tree. It was non-trivial by a long
> > shot, and based on that work, this series feels really "small" and I'm
> > really worried that it's not really working or solving the problem here.
>
> Unfortunately what's been coming over was just the bit about using
> android-common, not the bit about why you're worried about the code. :(

Sorry, it's been a long few months, my ability to communicate well about
this topic is tough at times without assuming everyone else has been
dealing with it for as long as some of us have.

> > There are major features that were backported to the android trees for
> > ARM that the upstream features for Spectre and Meltdown built on top of
> > to get their solution. To not backport all of that is a huge risk,
> > right?
>
> I'm not far enough into the details to comment on the specifics here;
> there's other people in the CCs who are. Let's let people look at the
> code and see if they think some of the fixes are useful in LTS. The
> Android tree does have things beyond what's in LTS and there's been more
> time for analysis since the changes were made there.

I suggest looking at the backports in the android-common tree that are
needed for this "feature" to work properly, and pull them out and test
them if you really want it in your Linaro trees.

If you think some of them should be added to the LTS kernels, I'll be
glad to consider them, but don't do a hack to try to work around the
lack of these features, otherwise you will not be happy in the long-run.

Again, look at the mess we have for x86 in 4.4.y and 4.9.y. You do not
want that for ARM for the simple reason that ARM systems usually last
"longer" with those old kernels than the x86 systems do.

> > So that's why I keep pointing people at the android trees. Look at what
> > they did there. There's nothing stopping anyone who is really insistent
> > on staying on these old kernel versions from pulling from those branches
> > to get these bugfixes in a known stable, and tested, implementation.
>
> I think there's enough stuff going on in the Android tree to make that
> unpalatable for a good segment of users.

Really? Like what? Last I looked it's only about 300 or so patches.
Something like less than .5% of the normal SoC backport size for any ARM
system recently. There were some numbers published a few months ago
about the real count, I can dig them up if you are curious.

> > Or just move to 4.14.y. Seriously, that's probably the safest thing in
> > the long run for anyone here. And when you realize you can't do that,
> > go yell at your SoC for forcing you into the nightmare that they conned
> > you into by their 3+ million lines added to their kernel tree. You were
> > always living on borrowed time, and it looks like that time is finally
> > up...
>
> Yes, there are some people who are stuck with enormous out of tree patch
> sets on most architectures (just look at the enterprise distros!) - but
> there are also people who are at or very close to vanilla and just
> trying to control their validation costs by not changing too much when
> they don't need to.

Great, then move to 4.14.y :)

And before someone says "but it takes more to validate a new kernel
version than it does to just validate a core backport for the
architecture code", well...

> There's a good discussion to be had about it being sensible for people
> to accept more change in that segment of the market but equally those
> same attitudes have been an important part of the pressure that's been
> placed on vendors long term to get things in mainline.
>
> > [1] It's also why I keep doing the LTS merges into the android-common
> > trees within days of the upstream LTS release (today being an
> > exception). That way once you do a pull/merge, you can just keep
> > always merging to keep a secure device that is always up to date
> > with the latest LTS releases in a simple way. How much easier can I
> > make it for the ARM ecosystem here, really?
>
> That's great for the Android ecosystem, it's fantastic work and is doing
> a lot to overcome resistances people had there to merging up the LTS
> which is going to help many people. While that's a very large part of
> ARM ecosystem it's not all of it, there are also chip vendors and system
> integrators who have made deliberate choices to minimize out of tree
> code just as we've been encouraging them to.

Again great, go use 4.14.y for those systems please. It's better in the
long run.

thanks,

greg k-h

On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
> On Fri, Mar 02, 2018 at 05:14:50PM +0800, Alex Shi wrote:
>>
>> On 03/01/2018 11:24 PM, Greg KH wrote:
>> > On Wed, Feb 28, 2018 at 11:56:22AM +0800, Alex Shi wrote:
>> >> Hi All,
>> >>
>> >> This backport patchset fixed the meltdown issue, its original branch:
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
>> >> A few dependency or fixing patches are also picked up, if they are necessary
>> >> and no functional changes.
>> >>
>> >> The patchset also on repository:
>> >> git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2
>> >>
>> >> No bug found yet from kernelci.org and lkft testing.
>> >
>> > No bugs is good, but does it actually fix the meltdown problem? What
>> > did you test it on?
>>
>> Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug.
>
> Then why should I trust this backport at all?
>
> Please test on the hardware that is affected, otherwise you do not know
> if your patches do anything or not.
>

I don't think it is feasible to test these backports by confirming
that they make the fundamental issue go away. We simply don't have the
code to reproduce all the variants, and we have to rely on the
information provided by ARM Ltd. regarding which cores are affected
and which aren't.

What we can do (and what I did for the v4.14 backport) is ensure that
the mitigations take effect when they are expected to, i.e., confirm
that the trampoline vector table and page tables are being used (which
can be done using the exploit code for variant 3a btw), and to check
that the branch predictor maintenance code is called as expected.

For variant 1, we just have to have faith ...

Note that I haven't done so for *this* backport, and I currently don't
have any time to spend on this.

On Tue, Mar 13, 2018 at 10:13:26AM +0000, Ard Biesheuvel wrote:
> On 13 March 2018 at 10:04, Greg KH <greg@kroah.com> wrote:
> > On Wed, Mar 07, 2018 at 06:24:09PM +0000, Ard Biesheuvel wrote:
> >> On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
> >> > On Fri, Mar 02, 2018 at 05:14:50PM +0800, Alex Shi wrote:
> >> >>
> >> >> On 03/01/2018 11:24 PM, Greg KH wrote:
> >> >> > On Wed, Feb 28, 2018 at 11:56:22AM +0800, Alex Shi wrote:
> >> >> >> Hi All,
> >> >> >>
> >> >> >> This backport patchset fixed the meltdown issue, its original branch:
> >> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
> >> >> >> A few dependency or fixing patches are also picked up, if they are necessary
> >> >> >> and no functional changes.
> >> >> >>
> >> >> >> The patchset also on repository:
> >> >> >> git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2
> >> >> >>
> >> >> >> No bug found yet from kernelci.org and lkft testing.
> >> >> >
> >> >> > No bugs is good, but does it actually fix the meltdown problem? What
> >> >> > did you test it on?
> >> >>
> >> >> Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug.
> >> >
> >> > Then why should I trust this backport at all?
> >> >
> >> > Please test on the hardware that is affected, otherwise you do not know
> >> > if your patches do anything or not.
> >> >
> >>
> >> I don't think it is feasible to test these backports by confirming
> >> that they make the fundamental issue go away. We simply don't have the
> >> code to reproduce all the variants, and we have to rely on the
> >> information provided by ARM Ltd. regarding which cores are affected
> >> and which aren't.
> >
> > You really don't have the reproducers? Please work with ARM to resolve
> > that, this should not be a non-tested set of patches. That's really
> > worse than no patches at all, as if they were applied, that would
> > provide a false-sense of "all is fixed".
> >
>
> I know that on x86, the line between architecture and platform is
> blurry. That is not the case on ARM, though.
>
> Unlike platform firmware, the OS is built on top of an abstracted
> platform which is described by ARM's Architecture Reference Manual. If
> ARM Ltd. issues recommendations regarding what firmware PSCI methods
> to call when doing a context switch, or which barrier instruction to
> issue in certain circumstances, they do so because a certain class of
> hardware may require it in some cases. It is really not up to me to go
> find some exploit code on GitHub, run it before and after applying the
> patch and conclude that the problem is fixed. Instead, what I should
> do is confirm that the changes result in the recommended actions to be
> taken at the appropriate times.

To _not_ take that exploit code and run it to _verify_ that your patches
work, would be foolish, right?

I can't believe we are having the argument of "Test that your patches
actually work"...

Ugh, these series are all now dropped from my patch queue until you all
get your act together and get someone to verify the changes actually
work.

greg k-h

On 13 March 2018 at 10:38, Greg KH <greg@kroah.com> wrote:
> On Tue, Mar 13, 2018 at 10:13:26AM +0000, Ard Biesheuvel wrote:
>> On 13 March 2018 at 10:04, Greg KH <greg@kroah.com> wrote:
>> > On Wed, Mar 07, 2018 at 06:24:09PM +0000, Ard Biesheuvel wrote:
>> >> On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
...
>> >> > Please test on the hardware that is affected, otherwise you do not know
>> >> > if your patches do anything or not.
>> >> >
>> >>
>> >> I don't think it is feasible to test these backports by confirming
>> >> that they make the fundamental issue go away. We simply don't have the
>> >> code to reproduce all the variants, and we have to rely on the
>> >> information provided by ARM Ltd. regarding which cores are affected
>> >> and which aren't.
>> >
>> > You really don't have the reproducers? Please work with ARM to resolve
>> > that, this should not be a non-tested set of patches. That's really
>> > worse than no patches at all, as if they were applied, that would
>> > provide a false-sense of "all is fixed".
>> >
>>
>> I know that on x86, the line between architecture and platform is
>> blurry. That is not the case on ARM, though.
>>
>> Unlike platform firmware, the OS is built on top of an abstracted
>> platform which is described by ARM's Architecture Reference Manual. If
>> ARM Ltd. issues recommendations regarding what firmware PSCI methods
>> to call when doing a context switch, or which barrier instruction to
>> issue in certain circumstances, they do so because a certain class of
>> hardware may require it in some cases. It is really not up to me to go
>> find some exploit code on GitHub, run it before and after applying the
>> patch and conclude that the problem is fixed. Instead, what I should
>> do is confirm that the changes result in the recommended actions to be
>> taken at the appropriate times.
>
> To _not_ take that exploit code and run it to _verify_ that your patches
> work, would be foolish, right?
>

Oh, absolutely. But that presupposes access to both the affected
hardware and the exploit code.

> I can't believe we are having the argument of "Test that your patches
> actually work"...
>
> Ugh, these series are all now dropped from my patch queue until you all
> get your act together and get someone to verify the changes actually
> work.
>

Fair enough.

If anyone needs these patches for their systems, they can respond with
a Tested-by:

On Tue, Mar 13, 2018 at 01:01:43PM +0000, Ard Biesheuvel wrote:
> On 13 March 2018 at 10:38, Greg KH <greg@kroah.com> wrote:
> > On Tue, Mar 13, 2018 at 10:13:26AM +0000, Ard Biesheuvel wrote:
> >> On 13 March 2018 at 10:04, Greg KH <greg@kroah.com> wrote:
> >> > On Wed, Mar 07, 2018 at 06:24:09PM +0000, Ard Biesheuvel wrote:
> >> >> On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
> ...
> >> >> > Please test on the hardware that is affected, otherwise you do not know
> >> >> > if your patches do anything or not.
> >> >> >
> >> >>
> >> >> I don't think it is feasible to test these backports by confirming
> >> >> that they make the fundamental issue go away. We simply don't have the
> >> >> code to reproduce all the variants, and we have to rely on the
> >> >> information provided by ARM Ltd. regarding which cores are affected
> >> >> and which aren't.
> >> >
> >> > You really don't have the reproducers? Please work with ARM to resolve
> >> > that, this should not be a non-tested set of patches. That's really
> >> > worse than no patches at all, as if they were applied, that would
> >> > provide a false-sense of "all is fixed".
> >> >
> >>
> >> I know that on x86, the line between architecture and platform is
> >> blurry. That is not the case on ARM, though.
> >>
> >> Unlike platform firmware, the OS is built on top of an abstracted
> >> platform which is described by ARM's Architecture Reference Manual. If
> >> ARM Ltd. issues recommendations regarding what firmware PSCI methods
> >> to call when doing a context switch, or which barrier instruction to
> >> issue in certain circumstances, they do so because a certain class of
> >> hardware may require it in some cases. It is really not up to me to go
> >> find some exploit code on GitHub, run it before and after applying the
> >> patch and conclude that the problem is fixed. Instead, what I should
> >> do is confirm that the changes result in the recommended actions to be
> >> taken at the appropriate times.
> >
> > To _not_ take that exploit code and run it to _verify_ that your patches
> > work, would be foolish, right?
> >
>
> Oh, absolutely. But that presupposes access to both the affected
> hardware and the exploit code.

If you all don't have access to both, then someone is doing something
seriously wrong. Go complain to ARM please, we all know they have both.

I just got done yelling at a whole bunch of vendors last week about this
whole mess at a very large meeting of a lot of different Linux-based
companies. It's crazy that the dysfunction is still happening.

greg k-h