Message ID | 20250513163438.3942405-9-tabba@google.com |
---|---|
State | New |
Headers | show |
Series | KVM: Mapping guest_memfd backed memory at the host for software protected VMs | expand |
On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote: > > From: Ackerley Tng <ackerleytng@google.com> > > On binding of a guest_memfd with a memslot, check that the slot's > userspace_addr and the requested fd and offset refer to the same memory > range. > > This check is best-effort: nothing prevents userspace from later mapping > other memory to the same provided in slot->userspace_addr and breaking > guest operation. > > Suggested-by: David Hildenbrand <david@redhat.com> > Suggested-by: Sean Christopherson <seanjc@google.com> > Suggested-by: Yan Zhao <yan.y.zhao@intel.com> > Signed-off-by: Ackerley Tng <ackerleytng@google.com> > Signed-off-by: Fuad Tabba <tabba@google.com> > --- > virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++++++++++++++++--- > 1 file changed, 34 insertions(+), 3 deletions(-) > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > index 8e6d1866b55e..2f499021df66 100644 > --- a/virt/kvm/guest_memfd.c > +++ b/virt/kvm/guest_memfd.c > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) > return __kvm_gmem_create(kvm, size, flags); > } > > +static bool kvm_gmem_is_same_range(struct kvm *kvm, > + struct kvm_memory_slot *slot, > + struct file *file, loff_t offset) > +{ > + struct mm_struct *mm = kvm->mm; > + loff_t userspace_addr_offset; > + struct vm_area_struct *vma; > + bool ret = false; > + > + mmap_read_lock(mm); > + > + vma = vma_lookup(mm, slot->userspace_addr); > + if (!vma) > + goto out; > + > + if (vma->vm_file != file) > + goto out; > + > + userspace_addr_offset = slot->userspace_addr - vma->vm_start; > + ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset; > +out: > + mmap_read_unlock(mm); > + > + return ret; > +} > + > int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, > unsigned int fd, loff_t offset) > { > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, > offset + size > i_size_read(inode)) > goto err; > > - if (kvm_gmem_supports_shared(inode) && > - !kvm_arch_vm_supports_gmem_shared_mem(kvm)) > - goto err; > + if (kvm_gmem_supports_shared(inode)) { > + if (!kvm_arch_vm_supports_gmem_shared_mem(kvm)) > + goto err; > + > + if (slot->userspace_addr && > + !kvm_gmem_is_same_range(kvm, slot, file, offset)) > + goto err; This is very nit-picky, but I would rather this not be -EINVAL, maybe -EIO instead? Or maybe a pr_warn_once() and let the call proceed? The userspace_addr we got isn't invalid per se, we're just trying to give a hint to the user that their VMAs (or the userspace address they gave us) are messed up. I don't really like lumping this in with truly invalid arguments. > + } > > filemap_invalidate_lock(inode->i_mapping); > > -- > 2.49.0.1045.g170613ef41-goog >
Hi James, On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote: > > On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote: > > > > From: Ackerley Tng <ackerleytng@google.com> > > > > On binding of a guest_memfd with a memslot, check that the slot's > > userspace_addr and the requested fd and offset refer to the same memory > > range. > > > > This check is best-effort: nothing prevents userspace from later mapping > > other memory to the same provided in slot->userspace_addr and breaking > > guest operation. > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > Suggested-by: Sean Christopherson <seanjc@google.com> > > Suggested-by: Yan Zhao <yan.y.zhao@intel.com> > > Signed-off-by: Ackerley Tng <ackerleytng@google.com> > > Signed-off-by: Fuad Tabba <tabba@google.com> > > --- > > virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++++++++++++++++--- > > 1 file changed, 34 insertions(+), 3 deletions(-) > > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > index 8e6d1866b55e..2f499021df66 100644 > > --- a/virt/kvm/guest_memfd.c > > +++ b/virt/kvm/guest_memfd.c > > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) > > return __kvm_gmem_create(kvm, size, flags); > > } > > > > +static bool kvm_gmem_is_same_range(struct kvm *kvm, > > + struct kvm_memory_slot *slot, > > + struct file *file, loff_t offset) > > +{ > > + struct mm_struct *mm = kvm->mm; > > + loff_t userspace_addr_offset; > > + struct vm_area_struct *vma; > > + bool ret = false; > > + > > + mmap_read_lock(mm); > > + > > + vma = vma_lookup(mm, slot->userspace_addr); > > + if (!vma) > > + goto out; > > + > > + if (vma->vm_file != file) > > + goto out; > > + > > + userspace_addr_offset = slot->userspace_addr - vma->vm_start; > > + ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset; > > +out: > > + mmap_read_unlock(mm); > > + > > + return ret; > > +} > > + > > int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, > > unsigned int fd, loff_t offset) > > { > > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, > > offset + size > i_size_read(inode)) > > goto err; > > > > - if (kvm_gmem_supports_shared(inode) && > > - !kvm_arch_vm_supports_gmem_shared_mem(kvm)) > > - goto err; > > + if (kvm_gmem_supports_shared(inode)) { > > + if (!kvm_arch_vm_supports_gmem_shared_mem(kvm)) > > + goto err; > > + > > + if (slot->userspace_addr && > > + !kvm_gmem_is_same_range(kvm, slot, file, offset)) > > + goto err; > > This is very nit-picky, but I would rather this not be -EINVAL, maybe > -EIO instead? Or maybe a pr_warn_once() and let the call proceed? > > The userspace_addr we got isn't invalid per se, we're just trying to > give a hint to the user that their VMAs (or the userspace address they > gave us) are messed up. I don't really like lumping this in with truly > invalid arguments. I don't mind changing the return error, but I don't think that we should have a kernel warning (pr_warn_once) for something userspace can trigger. It's not an IO error either. I think that this is an invalid argument (EINVAL). That said, other than opposing the idea of pr_warn, I am happy to change it. Cheers, /fuad > > + } > > > > filemap_invalidate_lock(inode->i_mapping); > > > > -- > > 2.49.0.1045.g170613ef41-goog > >
On Wed, May 14, 2025, Fuad Tabba wrote: > On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote: > > > > On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote: > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > > index 8e6d1866b55e..2f499021df66 100644 > > > --- a/virt/kvm/guest_memfd.c > > > +++ b/virt/kvm/guest_memfd.c > > > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) > > > return __kvm_gmem_create(kvm, size, flags); > > > } > > > > > > +static bool kvm_gmem_is_same_range(struct kvm *kvm, > > > + struct kvm_memory_slot *slot, > > > + struct file *file, loff_t offset) > > > +{ > > > + struct mm_struct *mm = kvm->mm; > > > + loff_t userspace_addr_offset; > > > + struct vm_area_struct *vma; > > > + bool ret = false; > > > + > > > + mmap_read_lock(mm); > > > + > > > + vma = vma_lookup(mm, slot->userspace_addr); > > > + if (!vma) > > > + goto out; > > > + > > > + if (vma->vm_file != file) > > > + goto out; > > > + > > > + userspace_addr_offset = slot->userspace_addr - vma->vm_start; > > > + ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset; > > > +out: > > > + mmap_read_unlock(mm); > > > + > > > + return ret; > > > +} > > > + > > > int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, > > > unsigned int fd, loff_t offset) > > > { > > > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, > > > offset + size > i_size_read(inode)) > > > goto err; > > > > > > - if (kvm_gmem_supports_shared(inode) && > > > - !kvm_arch_vm_supports_gmem_shared_mem(kvm)) > > > - goto err; > > > + if (kvm_gmem_supports_shared(inode)) { > > > + if (!kvm_arch_vm_supports_gmem_shared_mem(kvm)) > > > + goto err; > > > + > > > + if (slot->userspace_addr && > > > + !kvm_gmem_is_same_range(kvm, slot, file, offset)) > > > + goto err; > > > > This is very nit-picky, but I would rather this not be -EINVAL, maybe > > -EIO instead? Or maybe a pr_warn_once() and let the call proceed? Or just omit the check entirely. The check isn't binding (ba-dump, ching!), because the mapping/VMA can change the instant mmap_read_unlock() is called. > > The userspace_addr we got isn't invalid per se, we're just trying to > > give a hint to the user that their VMAs (or the userspace address they > > gave us) are messed up. I don't really like lumping this in with truly > > invalid arguments. > > I don't mind changing the return error, but I don't think that we > should have a kernel warning (pr_warn_once) for something userspace > can trigger. This isn't a WARN, e.g. won't trip panic_on_warn. In practice, it's not meaningfully different than pr_info(). That said, I agree that printing anything is a bad approach. > It's not an IO error either. I think that this is an invalid argument > (EINVAL). I agree with James, this isn't an invalid argument. Having the validity of an input hinge on the ordering between a KVM ioctl() and mmap() is quite odd. I know KVM arm64 does exactly this for KVM_SET_USER_MEMORY_REGION{,2}, but I don't love the semantics. And unlike that scenario, where e.g. MTE tags are verified again at fault-time, KVM won't re-check the VMA when accessing guest memory via the userspace mapping, e.g. through uaccess. Unless I'm forgetting something, I'm leaning toward omitting the check entirely. > That said, other than opposing the idea of pr_warn, I am happy to change it.
Sean Christopherson <seanjc@google.com> writes: > On Wed, May 14, 2025, Fuad Tabba wrote: >> On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote: >> > >> > On Tue, May 13, 2025 at 9:34 AM Fuad Tabba <tabba@google.com> wrote: >> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c >> > > index 8e6d1866b55e..2f499021df66 100644 >> > > --- a/virt/kvm/guest_memfd.c >> > > +++ b/virt/kvm/guest_memfd.c >> > > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) >> > > return __kvm_gmem_create(kvm, size, flags); >> > > } >> > > >> > > +static bool kvm_gmem_is_same_range(struct kvm *kvm, >> > > + struct kvm_memory_slot *slot, >> > > + struct file *file, loff_t offset) >> > > +{ >> > > + struct mm_struct *mm = kvm->mm; >> > > + loff_t userspace_addr_offset; >> > > + struct vm_area_struct *vma; >> > > + bool ret = false; >> > > + >> > > + mmap_read_lock(mm); >> > > + >> > > + vma = vma_lookup(mm, slot->userspace_addr); >> > > + if (!vma) >> > > + goto out; >> > > + >> > > + if (vma->vm_file != file) >> > > + goto out; >> > > + >> > > + userspace_addr_offset = slot->userspace_addr - vma->vm_start; >> > > + ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset; >> > > +out: >> > > + mmap_read_unlock(mm); >> > > + >> > > + return ret; >> > > +} >> > > + >> > > int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, >> > > unsigned int fd, loff_t offset) >> > > { >> > > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, >> > > offset + size > i_size_read(inode)) >> > > goto err; >> > > >> > > - if (kvm_gmem_supports_shared(inode) && >> > > - !kvm_arch_vm_supports_gmem_shared_mem(kvm)) >> > > - goto err; >> > > + if (kvm_gmem_supports_shared(inode)) { >> > > + if (!kvm_arch_vm_supports_gmem_shared_mem(kvm)) >> > > + goto err; >> > > + >> > > + if (slot->userspace_addr && >> > > + !kvm_gmem_is_same_range(kvm, slot, file, offset)) >> > > + goto err; >> > >> > This is very nit-picky, but I would rather this not be -EINVAL, maybe >> > -EIO instead? Or maybe a pr_warn_once() and let the call proceed? > > Or just omit the check entirely. The check isn't binding (ba-dump, ching!), > because the mapping/VMA can change the instant mmap_read_unlock() is called. > >> > The userspace_addr we got isn't invalid per se, we're just trying to >> > give a hint to the user that their VMAs (or the userspace address they >> > gave us) are messed up. I don't really like lumping this in with truly >> > invalid arguments. >> >> I don't mind changing the return error, but I don't think that we >> should have a kernel warning (pr_warn_once) for something userspace >> can trigger. > > This isn't a WARN, e.g. won't trip panic_on_warn. In practice, it's not > meaningfully different than pr_info(). That said, I agree that printing anything > is a bad approach. > >> It's not an IO error either. I think that this is an invalid argument >> (EINVAL). > > I agree with James, this isn't an invalid argument. Having the validity of an > input hinge on the ordering between a KVM ioctl() and mmap() is quite odd. I > know KVM arm64 does exactly this for KVM_SET_USER_MEMORY_REGION{,2}, but I don't > love the semantics. And unlike that scenario, where e.g. MTE tags are verified > again at fault-time, KVM won't re-check the VMA when accessing guest memory via > the userspace mapping, e.g. through uaccess. > > Unless I'm forgetting something, I'm leaning toward omitting the check entirely. > I'm good with dropping this patch. I might have misunderstood the conclusion of the guest_memfd call. >> That said, other than opposing the idea of pr_warn, I am happy to change it.
On Wed, May 14, 2025, Ackerley Tng wrote: > Sean Christopherson <seanjc@google.com> writes: > > On Wed, May 14, 2025, Fuad Tabba wrote: > >> On Tue, 13 May 2025 at 21:31, James Houghton <jthoughton@google.com> wrote: > >> > > @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, > >> > > offset + size > i_size_read(inode)) > >> > > goto err; > >> > > > >> > > - if (kvm_gmem_supports_shared(inode) && > >> > > - !kvm_arch_vm_supports_gmem_shared_mem(kvm)) > >> > > - goto err; > >> > > + if (kvm_gmem_supports_shared(inode)) { > >> > > + if (!kvm_arch_vm_supports_gmem_shared_mem(kvm)) > >> > > + goto err; > >> > > + > >> > > + if (slot->userspace_addr && > >> > > + !kvm_gmem_is_same_range(kvm, slot, file, offset)) > >> > > + goto err; > >> > > >> > This is very nit-picky, but I would rather this not be -EINVAL, maybe > >> > -EIO instead? Or maybe a pr_warn_once() and let the call proceed? > > > > Or just omit the check entirely. The check isn't binding (ba-dump, ching!), > > because the mapping/VMA can change the instant mmap_read_unlock() is called. > > > >> > The userspace_addr we got isn't invalid per se, we're just trying to > >> > give a hint to the user that their VMAs (or the userspace address they > >> > gave us) are messed up. I don't really like lumping this in with truly > >> > invalid arguments. > >> > >> I don't mind changing the return error, but I don't think that we > >> should have a kernel warning (pr_warn_once) for something userspace > >> can trigger. > > > > This isn't a WARN, e.g. won't trip panic_on_warn. In practice, it's not > > meaningfully different than pr_info(). That said, I agree that printing anything > > is a bad approach. > > > >> It's not an IO error either. I think that this is an invalid argument > >> (EINVAL). > > > > I agree with James, this isn't an invalid argument. Having the validity of an > > input hinge on the ordering between a KVM ioctl() and mmap() is quite odd. I > > know KVM arm64 does exactly this for KVM_SET_USER_MEMORY_REGION{,2}, but I don't > > love the semantics. And unlike that scenario, where e.g. MTE tags are verified > > again at fault-time, KVM won't re-check the VMA when accessing guest memory via > > the userspace mapping, e.g. through uaccess. > > > > Unless I'm forgetting something, I'm leaning toward omitting the check entirely. > > > > I'm good with dropping this patch. I might have misunderstood the conclusion > of the guest_memfd call. No, I don't think you misunderstood anything. It's just that sometimes opinions different when there's actual code, versus a verbal discussion. I.e. this sounds like a good idea, but when seeing the code and thinking through the effects, it's less appealing.
On 13.05.25 18:34, Fuad Tabba wrote: > From: Ackerley Tng <ackerleytng@google.com> > > On binding of a guest_memfd with a memslot, check that the slot's > userspace_addr and the requested fd and offset refer to the same memory > range. > > This check is best-effort: nothing prevents userspace from later mapping > other memory to the same provided in slot->userspace_addr and breaking > guest operation. > > Suggested-by: David Hildenbrand <david@redhat.com> > Suggested-by: Sean Christopherson <seanjc@google.com> > Suggested-by: Yan Zhao <yan.y.zhao@intel.com> > Signed-off-by: Ackerley Tng <ackerleytng@google.com> > Signed-off-by: Fuad Tabba <tabba@google.com> > --- > virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++++++++++++++++--- > 1 file changed, 34 insertions(+), 3 deletions(-) > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > index 8e6d1866b55e..2f499021df66 100644 > --- a/virt/kvm/guest_memfd.c > +++ b/virt/kvm/guest_memfd.c > @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) > return __kvm_gmem_create(kvm, size, flags); > } > > +static bool kvm_gmem_is_same_range(struct kvm *kvm, > + struct kvm_memory_slot *slot, > + struct file *file, loff_t offset) > +{ > + struct mm_struct *mm = kvm->mm; > + loff_t userspace_addr_offset; > + struct vm_area_struct *vma; > + bool ret = false; > + > + mmap_read_lock(mm); > + > + vma = vma_lookup(mm, slot->userspace_addr); > + if (!vma) > + goto out; > + > + if (vma->vm_file != file) > + goto out; > + > + userspace_addr_offset = slot->userspace_addr - vma->vm_start; > + ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset; You'd probably have to iterate over the whole range (which might span multiple VMAs), but reading the discussion, I'm fine with dropping this patch for now. I think it's more important to document that thoroughly: what does it mean when we use GUEST_MEMFD_FLAG_SUPPORT_SHARED and then pass that fd in a memslot. Skimming over patch #15, I assume this is properly documented in there.
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 8e6d1866b55e..2f499021df66 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -556,6 +556,32 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) return __kvm_gmem_create(kvm, size, flags); } +static bool kvm_gmem_is_same_range(struct kvm *kvm, + struct kvm_memory_slot *slot, + struct file *file, loff_t offset) +{ + struct mm_struct *mm = kvm->mm; + loff_t userspace_addr_offset; + struct vm_area_struct *vma; + bool ret = false; + + mmap_read_lock(mm); + + vma = vma_lookup(mm, slot->userspace_addr); + if (!vma) + goto out; + + if (vma->vm_file != file) + goto out; + + userspace_addr_offset = slot->userspace_addr - vma->vm_start; + ret = userspace_addr_offset + (vma->vm_pgoff << PAGE_SHIFT) == offset; +out: + mmap_read_unlock(mm); + + return ret; +} + int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned int fd, loff_t offset) { @@ -585,9 +611,14 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, offset + size > i_size_read(inode)) goto err; - if (kvm_gmem_supports_shared(inode) && - !kvm_arch_vm_supports_gmem_shared_mem(kvm)) - goto err; + if (kvm_gmem_supports_shared(inode)) { + if (!kvm_arch_vm_supports_gmem_shared_mem(kvm)) + goto err; + + if (slot->userspace_addr && + !kvm_gmem_is_same_range(kvm, slot, file, offset)) + goto err; + } filemap_invalidate_lock(inode->i_mapping);