Message ID | 1478647728-30357-1-git-send-email-john.stultz@linaro.org |
---|---|
State | New |
Headers | show |
On Tue, Nov 8, 2016 at 3:28 PM, John Stultz <john.stultz@linaro.org> wrote: > This patch adds logic to allows a process to migrate other tasks > between cgroups if they have CAP_SYS_RESOURCE. > > In Android (where this feature originated), the ActivityManager tracks > various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, > etc), and then as applications change states, the SchedPolicy logic > will migrate the application tasks between different cgroups used > to control the different application states (for example, there is a > background cpuset cgroup which can limit background tasks to stay > on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can > then further limit those background tasks to a small percentage of > that one cpu's cpu time). > > However, for security reasons, Android doesn't want to make the > system_server (the process that runs the ActivityManager and > SchedPolicy logic), run as root. So in the Android common.git > kernel, they have some logic to allow cgroups to loosen their > permissions so CAP_SYS_NICE tasks can migrate other tasks between > cgroups. > > I feel the approach taken there overloads CAP_SYS_NICE a bit much > for non-android environments. > > So this patch, as suggested by Michael Kerrisk, simply adds a > check for CAP_SYS_RESOURCE. > > I've tested this with AOSP master, and this seems to work well > as Zygote and system_server already use CAP_SYS_RESOURCE. I've > also submitted patches against the android-4.4 kernel to change > it to use CAP_SYS_RESOURCE, and the Android developers just merged > it. > > Cc: Tejun Heo <tj@kernel.org> > Cc: Li Zefan <lizefan@huawei.com> > Cc: Jonathan Corbet <corbet@lwn.net> > Cc: cgroups@vger.kernel.org > Cc: Android Kernel Team <kernel-team@android.com> > Cc: Rom Lemarchand <romlem@android.com> > Cc: Colin Cross <ccross@android.com> > Cc: Dmitry Shmidt <dimitrysh@google.com> > Cc: Todd Kjos <tkjos@google.com> > Cc: Christian Poetzsch <christian.potzsch@imgtec.com> > Cc: Amit Pundir <amit.pundir@linaro.org> > Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> > Cc: Kees Cook <keescook@chromium.org> > Cc: Serge E. Hallyn <serge@hallyn.com> > Cc: linux-api@vger.kernel.org > Acked-by: Serge Hallyn <serge@hallyn.com> > Signed-off-by: John Stultz <john.stultz@linaro.org> > --- > v2: Renamed to just CAP_CGROUP_MIGRATE as recommended by Tejun > v3: Switched to just using CAP_SYS_RESOURCE as suggested by Michael > v4: Send out properly folded down version of the patch. :P > --- > kernel/cgroup.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 85bc9be..866059a 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task, > */ > if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) && > !uid_eq(cred->euid, tcred->uid) && > - !uid_eq(cred->euid, tcred->suid)) > + !uid_eq(cred->euid, tcred->suid) && > + !ns_capable(tcred->user_ns, CAP_SYS_RESOURCE)) > ret = -EACCES; > > if (!ret && cgroup_on_dfl(dst_cgrp)) { > -- > 2.7.4 > Reviewed-by: Kees Cook <keescook@chromium.org> -Kees -- Kees Cook Nexus Security
On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote: > On Tue, Nov 8, 2016 at 3:28 PM, John Stultz <john.stultz@linaro.org> wrote: > > This patch adds logic to allows a process to migrate other tasks > > between cgroups if they have CAP_SYS_RESOURCE. > > > > In Android (where this feature originated), the ActivityManager tracks > > various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, > > etc), and then as applications change states, the SchedPolicy logic > > will migrate the application tasks between different cgroups used > > to control the different application states (for example, there is a > > background cpuset cgroup which can limit background tasks to stay > > on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can > > then further limit those background tasks to a small percentage of > > that one cpu's cpu time). > > > > However, for security reasons, Android doesn't want to make the > > system_server (the process that runs the ActivityManager and > > SchedPolicy logic), run as root. So in the Android common.git > > kernel, they have some logic to allow cgroups to loosen their > > permissions so CAP_SYS_NICE tasks can migrate other tasks between > > cgroups. > > > > I feel the approach taken there overloads CAP_SYS_NICE a bit much > > for non-android environments. > > > > So this patch, as suggested by Michael Kerrisk, simply adds a > > check for CAP_SYS_RESOURCE. > > > > I've tested this with AOSP master, and this seems to work well > > as Zygote and system_server already use CAP_SYS_RESOURCE. I've > > also submitted patches against the android-4.4 kernel to change > > it to use CAP_SYS_RESOURCE, and the Android developers just merged > > it. > > > > I hate to say it, but I think I may see a problem. Current > developments are afoot to make cgroups do more than resource control. > For example, there's Landlock and there's Daniel's ingress/egress > filter thing. Current cgroup controllers can mostly just DoS their > controlled processes. These new controllers (or controller-like > things) can exfiltrate data and change semantics. > > Does anyone have a security model in mind for these controllers and > the cgroups that they're attached to? I'm reasonably confident that > CAP_SYS_RESOURCE is not the answer... and specifically the answer is... ? Also would be great if you start with specifying the question first and the problem you're trying to solve.
On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote: >> On Tue, Nov 8, 2016 at 3:28 PM, John Stultz <john.stultz@linaro.org> wrote: >> > This patch adds logic to allows a process to migrate other tasks >> > between cgroups if they have CAP_SYS_RESOURCE. >> > >> > In Android (where this feature originated), the ActivityManager tracks >> > various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, >> > etc), and then as applications change states, the SchedPolicy logic >> > will migrate the application tasks between different cgroups used >> > to control the different application states (for example, there is a >> > background cpuset cgroup which can limit background tasks to stay >> > on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can >> > then further limit those background tasks to a small percentage of >> > that one cpu's cpu time). >> > >> > However, for security reasons, Android doesn't want to make the >> > system_server (the process that runs the ActivityManager and >> > SchedPolicy logic), run as root. So in the Android common.git >> > kernel, they have some logic to allow cgroups to loosen their >> > permissions so CAP_SYS_NICE tasks can migrate other tasks between >> > cgroups. >> > >> > I feel the approach taken there overloads CAP_SYS_NICE a bit much >> > for non-android environments. >> > >> > So this patch, as suggested by Michael Kerrisk, simply adds a >> > check for CAP_SYS_RESOURCE. >> > >> > I've tested this with AOSP master, and this seems to work well >> > as Zygote and system_server already use CAP_SYS_RESOURCE. I've >> > also submitted patches against the android-4.4 kernel to change >> > it to use CAP_SYS_RESOURCE, and the Android developers just merged >> > it. >> > >> >> I hate to say it, but I think I may see a problem. Current >> developments are afoot to make cgroups do more than resource control. >> For example, there's Landlock and there's Daniel's ingress/egress >> filter thing. Current cgroup controllers can mostly just DoS their >> controlled processes. These new controllers (or controller-like >> things) can exfiltrate data and change semantics. >> >> Does anyone have a security model in mind for these controllers and >> the cgroups that they're attached to? I'm reasonably confident that >> CAP_SYS_RESOURCE is not the answer... > > and specifically the answer is... ? > Also would be great if you start with specifying the question first > and the problem you're trying to solve. > I don't have a good answer right now. Here are some constraints, though: 1. An insufficiently privileged process should not be able to move a victim into a dangerous cgroup. 2. An insufficiently privileged process should not be able to move itself into a dangerous cgroup and then use execve to gain privilege such that the execve'd program can be compromised. 3. An insufficiently privileged process should not be able to make an existing cgroup dangerous in a way that could compromise a victim in that cgroup. 4. An insufficiently privileged process should not be able to make a cgroup dangerous in a way that bypasses protections that would otherwise protect execve() as used by itself or some other process in that cgroup. Keep in mind that "dangerous" may apply to a cgroup's descendents in addition to the cgroup being controlled.
On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <luto@amacapital.net> wrote: > On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: >> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote: >>> >>> I hate to say it, but I think I may see a problem. Current >>> developments are afoot to make cgroups do more than resource control. >>> For example, there's Landlock and there's Daniel's ingress/egress >>> filter thing. Current cgroup controllers can mostly just DoS their >>> controlled processes. These new controllers (or controller-like >>> things) can exfiltrate data and change semantics. >>> >>> Does anyone have a security model in mind for these controllers and >>> the cgroups that they're attached to? I'm reasonably confident that >>> CAP_SYS_RESOURCE is not the answer... >> >> and specifically the answer is... ? >> Also would be great if you start with specifying the question first >> and the problem you're trying to solve. >> > > I don't have a good answer right now. Here are some constraints, though: > > 1. An insufficiently privileged process should not be able to move a > victim into a dangerous cgroup. > > 2. An insufficiently privileged process should not be able to move > itself into a dangerous cgroup and then use execve to gain privilege > such that the execve'd program can be compromised. > > 3. An insufficiently privileged process should not be able to make an > existing cgroup dangerous in a way that could compromise a victim in > that cgroup. > > 4. An insufficiently privileged process should not be able to make a > cgroup dangerous in a way that bypasses protections that would > otherwise protect execve() as used by itself or some other process in > that cgroup. > > Keep in mind that "dangerous" may apply to a cgroup's descendents in > addition to the cgroup being controlled. Sorry for taking awhile to get back to you here. I'm a little befuddled as to what next steps I should consider (and honestly, I'm not totally sure I really grok your concern here, particularly what you mean with "dangrous cgroups"). So is going back to the CAP_CGROUP_MIGRATE approach (to properly separate "sufficiently" from "insufficiently privileged") better? Or something closer to the original method Android used of each cgroup having an allow_attach() check which could determine what is sufficiently privledged for the respective level of danger the cgroup might poise? Or just stepping back, what method would you imagine to be reasonable to allow a specified task to migrate other tasks between cgroups without it having to be root/suid? thanks -john
On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <john.stultz@linaro.org> wrote: > On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov >> <alexei.starovoitov@gmail.com> wrote: >>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote: >>>> >>>> I hate to say it, but I think I may see a problem. Current >>>> developments are afoot to make cgroups do more than resource control. >>>> For example, there's Landlock and there's Daniel's ingress/egress >>>> filter thing. Current cgroup controllers can mostly just DoS their >>>> controlled processes. These new controllers (or controller-like >>>> things) can exfiltrate data and change semantics. >>>> >>>> Does anyone have a security model in mind for these controllers and >>>> the cgroups that they're attached to? I'm reasonably confident that >>>> CAP_SYS_RESOURCE is not the answer... >>> >>> and specifically the answer is... ? >>> Also would be great if you start with specifying the question first >>> and the problem you're trying to solve. >>> >> >> I don't have a good answer right now. Here are some constraints, though: >> >> 1. An insufficiently privileged process should not be able to move a >> victim into a dangerous cgroup. >> >> 2. An insufficiently privileged process should not be able to move >> itself into a dangerous cgroup and then use execve to gain privilege >> such that the execve'd program can be compromised. >> >> 3. An insufficiently privileged process should not be able to make an >> existing cgroup dangerous in a way that could compromise a victim in >> that cgroup. >> >> 4. An insufficiently privileged process should not be able to make a >> cgroup dangerous in a way that bypasses protections that would >> otherwise protect execve() as used by itself or some other process in >> that cgroup. >> >> Keep in mind that "dangerous" may apply to a cgroup's descendents in >> addition to the cgroup being controlled. > > Sorry for taking awhile to get back to you here. I'm a little > befuddled as to what next steps I should consider (and honestly, I'm > not totally sure I really grok your concern here, particularly what > you mean with "dangrous cgroups"). > > So is going back to the CAP_CGROUP_MIGRATE approach (to properly > separate "sufficiently" from "insufficiently privileged") better? > > Or something closer to the original method Android used of each cgroup > having an allow_attach() check which could determine what is > sufficiently privledged for the respective level of danger the cgroup > might poise? > > Or just stepping back, what method would you imagine to be reasonable > to allow a specified task to migrate other tasks between cgroups > without it having to be root/suid? Any suggested feedback here? thanks -john
On Mon, Dec 5, 2016 at 4:28 PM, John Stultz <john.stultz@linaro.org> wrote: > On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <john.stultz@linaro.org> wrote: >> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <luto@amacapital.net> wrote: >>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov >>> <alexei.starovoitov@gmail.com> wrote: >>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote: >>>>> >>>>> I hate to say it, but I think I may see a problem. Current >>>>> developments are afoot to make cgroups do more than resource control. >>>>> For example, there's Landlock and there's Daniel's ingress/egress >>>>> filter thing. Current cgroup controllers can mostly just DoS their >>>>> controlled processes. These new controllers (or controller-like >>>>> things) can exfiltrate data and change semantics. >>>>> >>>>> Does anyone have a security model in mind for these controllers and >>>>> the cgroups that they're attached to? I'm reasonably confident that >>>>> CAP_SYS_RESOURCE is not the answer... >>>> >>>> and specifically the answer is... ? >>>> Also would be great if you start with specifying the question first >>>> and the problem you're trying to solve. >>>> >>> >>> I don't have a good answer right now. Here are some constraints, though: >>> >>> 1. An insufficiently privileged process should not be able to move a >>> victim into a dangerous cgroup. >>> >>> 2. An insufficiently privileged process should not be able to move >>> itself into a dangerous cgroup and then use execve to gain privilege >>> such that the execve'd program can be compromised. >>> >>> 3. An insufficiently privileged process should not be able to make an >>> existing cgroup dangerous in a way that could compromise a victim in >>> that cgroup. >>> >>> 4. An insufficiently privileged process should not be able to make a >>> cgroup dangerous in a way that bypasses protections that would >>> otherwise protect execve() as used by itself or some other process in >>> that cgroup. >>> >>> Keep in mind that "dangerous" may apply to a cgroup's descendents in >>> addition to the cgroup being controlled. >> >> Sorry for taking awhile to get back to you here. I'm a little >> befuddled as to what next steps I should consider (and honestly, I'm >> not totally sure I really grok your concern here, particularly what >> you mean with "dangrous cgroups"). >> >> So is going back to the CAP_CGROUP_MIGRATE approach (to properly >> separate "sufficiently" from "insufficiently privileged") better? >> >> Or something closer to the original method Android used of each cgroup >> having an allow_attach() check which could determine what is >> sufficiently privledged for the respective level of danger the cgroup >> might poise? >> >> Or just stepping back, what method would you imagine to be reasonable >> to allow a specified task to migrate other tasks between cgroups >> without it having to be root/suid? > > Any suggested feedback here? I really don't know. The cgroupfs interface is a bit unfortunate in that it doesn't really express the constraints. To safely migrate a task, ISTM you ought to have some form of privilege over the task *and* some form of privilege over the cgroup. cgroupfs only handles the latter. CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain a concept of "dangerous" cgroups and further restrict them and CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I favor the latter, but it might be nice to hear from Tejun first. --Andy
On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote: > On Mon, Dec 5, 2016 at 4:28 PM, John Stultz <john.stultz@linaro.org> wrote: > > On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <john.stultz@linaro.org> wrote: > >> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <luto@amacapital.net> wrote: > >>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov > >>> <alexei.starovoitov@gmail.com> wrote: > >>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote: > >>>>> > >>>>> I hate to say it, but I think I may see a problem. Current > >>>>> developments are afoot to make cgroups do more than resource control. > >>>>> For example, there's Landlock and there's Daniel's ingress/egress > >>>>> filter thing. Current cgroup controllers can mostly just DoS their > >>>>> controlled processes. These new controllers (or controller-like > >>>>> things) can exfiltrate data and change semantics. > >>>>> > >>>>> Does anyone have a security model in mind for these controllers and > >>>>> the cgroups that they're attached to? I'm reasonably confident that > >>>>> CAP_SYS_RESOURCE is not the answer... > >>>> > >>>> and specifically the answer is... ? > >>>> Also would be great if you start with specifying the question first > >>>> and the problem you're trying to solve. > >>>> > >>> > >>> I don't have a good answer right now. Here are some constraints, though: > >>> > >>> 1. An insufficiently privileged process should not be able to move a > >>> victim into a dangerous cgroup. > >>> > >>> 2. An insufficiently privileged process should not be able to move > >>> itself into a dangerous cgroup and then use execve to gain privilege > >>> such that the execve'd program can be compromised. > >>> > >>> 3. An insufficiently privileged process should not be able to make an > >>> existing cgroup dangerous in a way that could compromise a victim in > >>> that cgroup. > >>> > >>> 4. An insufficiently privileged process should not be able to make a > >>> cgroup dangerous in a way that bypasses protections that would > >>> otherwise protect execve() as used by itself or some other process in > >>> that cgroup. > >>> > >>> Keep in mind that "dangerous" may apply to a cgroup's descendents in > >>> addition to the cgroup being controlled. > >> > >> Sorry for taking awhile to get back to you here. I'm a little > >> befuddled as to what next steps I should consider (and honestly, I'm > >> not totally sure I really grok your concern here, particularly what > >> you mean with "dangrous cgroups"). > >> > >> So is going back to the CAP_CGROUP_MIGRATE approach (to properly > >> separate "sufficiently" from "insufficiently privileged") better? > >> > >> Or something closer to the original method Android used of each cgroup > >> having an allow_attach() check which could determine what is > >> sufficiently privledged for the respective level of danger the cgroup > >> might poise? > >> > >> Or just stepping back, what method would you imagine to be reasonable > >> to allow a specified task to migrate other tasks between cgroups > >> without it having to be root/suid? > > > > Any suggested feedback here? > > I really don't know. The cgroupfs interface is a bit unfortunate in > that it doesn't really express the constraints. To safely migrate a > task, ISTM you ought to have some form of privilege over the task > *and* some form of privilege over the cgroup. Agreed. The problem is that the privilege required should depend on the controller (I guess). For memory and cpuset, CAP_SYS_NICE seems right. Perhaps CAP_SYS_RESOURCE would be needed for some.. but then, as I look through the lists (capabilities(7) and the list of controllers), it seems like CAP_SYS_NICE works for everything. What else would we need? Maybe CAP_NET_ADMIN for net_cls and net_prio? CAP_SYS_RESOURCE|CAP_SYS_ADMIN for pids? > cgroupfs only handles > the latter. If we need different checks for different controllers, we can add checks to cgroupfs. > CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain > a concept of "dangerous" cgroups and further restrict them and > CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I > favor the latter, but it might be nice to hear from Tejun first. > > --Andy
Hello, On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote: > I really don't know. The cgroupfs interface is a bit unfortunate in > that it doesn't really express the constraints. To safely migrate a > task, ISTM you ought to have some form of privilege over the task > *and* some form of privilege over the cgroup. cgroupfs only handles > the latter. > > CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain > a concept of "dangerous" cgroups and further restrict them and > CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I > favor the latter, but it might be nice to hear from Tejun first. If we can't do CAP_SYS_RESOURCE due to overlaps, let's go with a separate CAP. While for android and cgroup v1, it's nice to have a finer grained CAP for security control, privilege isolation in cgroup should also primarily done through hierarchical delegation. It doesn't make sense to have another system in parallel. We can't do it properly on v1 because some controllers aren't properly hierarchical and delegation model isn't well defined. e.g. nothing prevents a process from being pulled across different subtrees with the same delegation, but v2 can do it properly. All that's necessary is to make the CAP test OR'd to other perm checks instead of AND'ing so that the CAP just allows overriding restrictions expressed through delegation but it's normally possible to move processes around in one's own delegated subtree. Thanks. -- tejun
Hello, Serge. On Mon, Dec 05, 2016 at 08:00:11PM -0600, Serge E. Hallyn wrote: > > I really don't know. The cgroupfs interface is a bit unfortunate in > > that it doesn't really express the constraints. To safely migrate a > > task, ISTM you ought to have some form of privilege over the task > > *and* some form of privilege over the cgroup. > > Agreed. The problem is that the privilege required should depend on > the controller (I guess). For memory and cpuset, CAP_SYS_NICE seems > right. Perhaps CAP_SYS_RESOURCE would be needed for some.. but then, > as I look through the lists (capabilities(7) and the list of controllers), > it seems like CAP_SYS_NICE works for everything. What else would we need? > Maybe CAP_NET_ADMIN for net_cls and net_prio? CAP_SYS_RESOURCE|CAP_SYS_ADMIN > for pids? Please see my other reply but I don't think it's a good idea to have these extra checks on the side when there already is hierarchical delegation mechanism which should be able to handle both resource control and process management delegation. Thanks. -- tejun
On Tue, Dec 6, 2016 at 8:55 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote: >> I really don't know. The cgroupfs interface is a bit unfortunate in >> that it doesn't really express the constraints. To safely migrate a >> task, ISTM you ought to have some form of privilege over the task >> *and* some form of privilege over the cgroup. cgroupfs only handles >> the latter. >> >> CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain >> a concept of "dangerous" cgroups and further restrict them and >> CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I >> favor the latter, but it might be nice to hear from Tejun first. > > If we can't do CAP_SYS_RESOURCE due to overlaps, let's go with a > separate CAP. While for android and cgroup v1, it's nice to have a > finer grained CAP for security control, privilege isolation in cgroup > should also primarily done through hierarchical delegation. It > doesn't make sense to have another system in parallel. > > We can't do it properly on v1 because some controllers aren't properly > hierarchical and delegation model isn't well defined. e.g. nothing > prevents a process from being pulled across different subtrees with > the same delegation, but v2 can do it properly. All that's necessary > is to make the CAP test OR'd to other perm checks instead of AND'ing > so that the CAP just allows overriding restrictions expressed through > delegation but it's normally possible to move processes around in > one's own delegated subtree. How would one be granted the right to move processes around in one's own subtree? Are you imagining that, if you're in /a/b and you want to move a process that's currently in /a/b/c to /a/b/d then you're allowed to because the target process is in your tree? If so, I doubt this has the security properties you want -- namely, if you can cooperate with anyone in /, even if they're unprivileged, you can break it.
Hello, On Tue, Dec 06, 2016 at 09:01:17AM -0800, Andy Lutomirski wrote: > How would one be granted the right to move processes around in one's > own subtree? Through expicit delegation - chowning of the directory and cgroup.procs file. > Are you imagining that, if you're in /a/b and you want to move a > process that's currently in /a/b/c to /a/b/d then you're allowed to > because the target process is in your tree? If so, I doubt this has > the security properties you want -- namely, if you can cooperate with > anyone in /, even if they're unprivileged, you can break it. Delegation is an explicit operation and reflected in the ownership of the subdirectories and cgroup interface files in them. The subhierarchy containment is achieved by requiring the user who's trying to migrate a process to have write perm on cgroup.procs on the common ancestor of the source and target in addition to the target. So, a user who has a subhierarchy delegated to it can move processes around inside but not out of or into it. Also, if there are multiple delegated disjoint subhierarchies, processes can't be moved across them by the delegatee either. Thanks. -- tejun
On Tue, Dec 6, 2016 at 10:12 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On Tue, Dec 06, 2016 at 09:01:17AM -0800, Andy Lutomirski wrote: >> How would one be granted the right to move processes around in one's >> own subtree? > > Through expicit delegation - chowning of the directory and > cgroup.procs file. > >> Are you imagining that, if you're in /a/b and you want to move a >> process that's currently in /a/b/c to /a/b/d then you're allowed to >> because the target process is in your tree? If so, I doubt this has >> the security properties you want -- namely, if you can cooperate with >> anyone in /, even if they're unprivileged, you can break it. > > Delegation is an explicit operation and reflected in the ownership of > the subdirectories and cgroup interface files in them. The > subhierarchy containment is achieved by requiring the user who's > trying to migrate a process to have write perm on cgroup.procs on the > common ancestor of the source and target in addition to the target. OK, I see what you're doing. That's interesting.
On Tue, Dec 6, 2016 at 10:23 AM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On Tue, Dec 06, 2016 at 10:13:53AM -0800, Andy Lutomirski wrote: >> > Delegation is an explicit operation and reflected in the ownership of >> > the subdirectories and cgroup interface files in them. The >> > subhierarchy containment is achieved by requiring the user who's >> > trying to migrate a process to have write perm on cgroup.procs on the >> > common ancestor of the source and target in addition to the target. >> >> OK, I see what you're doing. That's interesting. > > It's something born out of usages of cgroup v1. People used it that > way (chowning files and directories) and combined with the uid checksn > it yielded something which is useful sometimes, but it always had > issues with hierarchical behaviors, which files to chmod and the weird > combination of uid checks. cgroup v2 has a clear delegation model but > the uid checks are still left in as not changing was the default. > > It's not necessary and I'm thinking about queueing something like the > following in the next cycle. > > As for the android CAP discussion, I think it'd be nice to share an > existing CAP but if we can't find a good one to share, let's create a > new one. So just to clarify the discussion for my purposes and make sure I understood, per-cgroup CAP rules was not desired, and instead we should either utilize an existing cap (are there still objections to CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie, bring back the older CAP_CGROUP_MIGRATE patch). Tejun: Do you have a more finished version of your patch that I should add my changes on top of? thanks -john
Hello, John. On Thu, Dec 08, 2016 at 09:39:38PM -0800, John Stultz wrote: > So just to clarify the discussion for my purposes and make sure I > understood, per-cgroup CAP rules was not desired, and instead we > should either utilize an existing cap (are there still objections to > CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie, > bring back the older CAP_CGROUP_MIGRATE patch). Let's create a new one. It looks to be a bit too different to share with an existing one. > Tejun: Do you have a more finished version of your patch that I should > add my changes on top of? Oh, just submit the patch on top of the current for-next. I can queue mine on top of yours. They are mostly orthogonal. Thanks. -- tejun
diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 85bc9be..866059a 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task, */ if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) && !uid_eq(cred->euid, tcred->uid) && - !uid_eq(cred->euid, tcred->suid)) + !uid_eq(cred->euid, tcred->suid) && + !ns_capable(tcred->user_ns, CAP_SYS_RESOURCE)) ret = -EACCES; if (!ret && cgroup_on_dfl(dst_cgrp)) {