From patchwork Wed Aug 17 21:47:24 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 598003
Date: Wed, 17 Aug 2022 14:47:24 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-2-axelrasmussen@google.com>
References: <20220817214728.489904-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.37.1.595.g718a3a8f04-goog
Subject: [PATCH v6 1/5] selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan,
 Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org, Shuah Khan
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org

This not being included was just a simple oversight.
There are certain features (like minor fault support) which are only
enabled on shared mappings, so without including hugetlb_shared we
actually lose a significant amount of test coverage.

Reviewed-by: Shuah Khan
Reviewed-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/run_vmtests.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index de86983b8a0f..b8e7f6f38d64 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,9 +121,11 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
 run_test ./userfaultfd anon 20 16
-# Test requires source and destination huge pages. Size of source
-# (half_ufd_size_MB) is passed as argument to test.
+# Hugetlb tests require source and destination huge pages. Pass in half the
+# size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
 
 #cleanup

From patchwork Wed Aug 17 21:47:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 598002
Date: Wed, 17 Aug 2022 14:47:26 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-4-axelrasmussen@google.com>
References: <20220817214728.489904-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.37.1.595.g718a3a8f04-goog
Subject: [PATCH v6 3/5] userfaultfd: selftests: modify selftest to use /dev/userfaultfd
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan,
 Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org, Mike Rapoport
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org

We want to ensure both userfaultfd(2) and /dev/userfaultfd keep working
into the future. Instead of always running the test twice, once with each
interface, let the user choose which one to test.

As with other test features, change the behavior based on a new command
line flag. Introduce the idea of "test mods", which are generic (not
specific to a test type) modifications to the behavior of the test. This
is loosely borrowed from this RFC patch series [1], but simplified a bit.

The benefit is that, in "typical" configurations, this test is somewhat
slow (roughly 30 seconds). Testing both interfaces doubles that, which may
not always be desirable, since users are likely to use one or the other,
but never both, in the "real world".
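The "test mod" syntax described above (a test type, optionally followed by ':'-separated mods) can be sketched as a small standalone parser. This is only an illustration under stated assumptions, not the selftest code: `struct parsed_arg` and `parse_type_arg()` are hypothetical names, the sketch does not check the type against the list of supported test types, and it returns an error code where the selftest would print a message and exit.

```c
#define _DEFAULT_SOURCE /* strsep() and strdup() on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch; the selftest sets globals. */
struct parsed_arg {
	char type[32];	/* first token, e.g. "anon" or "hugetlb_shared" */
	bool use_dev;	/* true when "dev" was the last interface mod given */
};

/*
 * Split "<type>[:<mod>...]" on ':'. The first token is the test type and
 * every later token must be a known mod. Returns 0 on success, -1 on error.
 */
int parse_type_arg(const char *raw, struct parsed_arg *out)
{
	char *copy, *cursor, *token;
	int ret = 0;

	assert(raw && out);
	out->type[0] = '\0';
	out->use_dev = false;

	copy = strdup(raw);
	if (!copy)
		return -1;
	cursor = copy;	/* strsep() advances cursor; copy is kept for free() */

	while ((token = strsep(&cursor, ":")) != NULL) {
		if (out->type[0] == '\0') {
			/* Reject an empty or oversized type token. */
			if (*token == '\0' || strlen(token) >= sizeof(out->type)) {
				ret = -1;
				break;
			}
			strcpy(out->type, token);
		} else if (!strcmp(token, "dev")) {
			out->use_dev = true;
		} else if (!strcmp(token, "syscall")) {
			out->use_dev = false;
		} else {
			ret = -1;	/* unrecognized test mod */
			break;
		}
	}

	free(copy);
	return ret;
}
```

With this sketch, "anon:dev" yields type "anon" with use_dev set, a bare "shmem" leaves use_dev false (the syscall default), and "anon:bogus" is rejected.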
[1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-14-namit@vmware.com/

Acked-by: Mike Rapoport
Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/userfaultfd.c | 76 ++++++++++++++++++++----
 1 file changed, 66 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 7c3f1b0ab468..7be709d9eed0 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -77,6 +77,11 @@ static int bounces;
 #define TEST_SHMEM	3
 static int test_type;
 
+#define UFFD_FLAGS	(O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY)
+
+/* test using /dev/userfaultfd, instead of userfaultfd(2) */
+static bool test_dev_userfaultfd;
+
 /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
 #define ALARM_INTERVAL_SECS 10
 static volatile bool test_uffdio_copy_eexist = true;
@@ -125,6 +130,8 @@ struct uffd_stats {
 const char *examples =
     "# Run anonymous memory test on 100MiB region with 99999 bounces:\n"
     "./userfaultfd anon 100 99999\n\n"
+    "# Run the same anonymous memory test, but using /dev/userfaultfd:\n"
+    "./userfaultfd anon:dev 100 99999\n\n"
     "# Run share memory test on 1GiB region with 99 bounces:\n"
     "./userfaultfd shmem 1000 99\n\n"
     "# Run hugetlb memory test on 256MiB region with 50 bounces:\n"
@@ -141,6 +148,14 @@ static void usage(void)
 		"[hugetlbfs_file]\n\n");
 	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
 		"hugetlb_shared, shmem\n\n");
+	fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
+		"Supported mods:\n");
+	fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n");
+	fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n");
+	fprintf(stderr, "\nExample test mod usage:\n");
+	fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n");
+	fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n");
+
 	fprintf(stderr, "Examples:\n\n");
 	fprintf(stderr, "%s", examples);
 	exit(1);
@@ -154,12 +169,14 @@ static void usage(void)
 			ret, __LINE__);				\
 	} while (0)
 
-#define err(fmt, ...)				\
+#define errexit(exitcode, fmt, ...)		\
 	do {					\
 		_err(fmt, ##__VA_ARGS__);	\
-		exit(1);			\
+		exit(exitcode);			\
 	} while (0)
 
+#define err(fmt, ...) errexit(1, fmt, ##__VA_ARGS__)
+
 static void uffd_stats_reset(struct uffd_stats *uffd_stats,
 			     unsigned long n_cpus)
 {
@@ -383,13 +400,34 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls)
 	}
 }
 
+static int __userfaultfd_open_dev(void)
+{
+	int fd, _uffd;
+
+	fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+	if (fd < 0)
+		errexit(KSFT_SKIP, "opening /dev/userfaultfd failed");
+
+	_uffd = ioctl(fd, USERFAULTFD_IOC_NEW, UFFD_FLAGS);
+	if (_uffd < 0)
+		errexit(errno == ENOTTY ? KSFT_SKIP : 1,
+			"creating userfaultfd failed");
+	close(fd);
+	return _uffd;
+}
+
 static void userfaultfd_open(uint64_t *features)
 {
 	struct uffdio_api uffdio_api;
 
-	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
-	if (uffd < 0)
-		err("userfaultfd syscall not available in this kernel");
+	if (test_dev_userfaultfd)
+		uffd = __userfaultfd_open_dev();
+	else {
+		uffd = syscall(__NR_userfaultfd, UFFD_FLAGS);
+		if (uffd < 0)
+			errexit(errno == ENOSYS ? KSFT_SKIP : 1,
+				"creating userfaultfd failed");
+	}
 	uffd_flags = fcntl(uffd, F_GETFD, NULL);
 
 	uffdio_api.api = UFFD_API;
@@ -1584,8 +1622,6 @@ unsigned long default_huge_page_size(void)
 
 static void set_test_type(const char *type)
 {
-	uint64_t features = UFFD_API_FEATURES;
-
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
@@ -1603,9 +1639,29 @@ static void set_test_type(const char *type)
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
 		test_uffdio_minor = true;
-	} else {
-		err("Unknown test type: %s", type);
 	}
+}
+
+static void parse_test_type_arg(const char *raw_type)
+{
+	char *buf = strdup(raw_type);
+	uint64_t features = UFFD_API_FEATURES;
+
+	while (buf) {
+		const char *token = strsep(&buf, ":");
+
+		if (!test_type)
+			set_test_type(token);
+		else if (!strcmp(token, "dev"))
+			test_dev_userfaultfd = true;
+		else if (!strcmp(token, "syscall"))
+			test_dev_userfaultfd = false;
+		else
+			err("unrecognized test mod '%s'", token);
+	}
+
+	if (!test_type)
+		err("failed to parse test type argument: '%s'", raw_type);
 
 	if (test_type == TEST_HUGETLB)
 		page_size = default_huge_page_size();
@@ -1653,7 +1709,7 @@ int main(int argc, char **argv)
 		err("failed to arm SIGALRM");
 	alarm(ALARM_INTERVAL_SECS);
 
-	set_test_type(argv[1]);
+	parse_test_type_arg(argv[1]);
 
 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /

From patchwork Wed Aug 17 21:47:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 598001
Date: Wed, 17 Aug 2022 14:47:27 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-5-axelrasmussen@google.com>
References: <20220817214728.489904-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.37.1.595.g718a3a8f04-goog
Subject: [PATCH v6 4/5] userfaultfd: update documentation to describe /dev/userfaultfd
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan,
 Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org

Explain the different ways to create a new userfaultfd, and how access
control works for each way.

Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 Documentation/admin-guide/mm/userfaultfd.rst | 41 ++++++++++++++++++--
 Documentation/admin-guide/sysctl/vm.rst      |  3 ++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 6528036093e1..83f31919ebb3 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
 Design
 ======
 
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -34,12 +37,11 @@ The real advantage of userfaults if compared to regular virtual memory
 management of mremap/mprotect is that the userfaults in all their
 operations never involve heavyweight structures like vmas (in fact the
 ``userfaultfd`` runtime load never takes the mmap_lock for writing).
-
 Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be
 passed using unix domain sockets to a manager process, so the same
 manager process could handle the userfaults of a multitude of
 different processes without them being aware about what is going on
@@ -50,6 +52,39 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===
 
+Creating a userfaultfd
+----------------------
+
+There are two ways to create a new userfaultfd, each of which provides ways to
+restrict access to this functionality (since historically userfaultfds which
+handle kernel page faults have been a useful tool for exploiting the kernel).
+
+The first way, supported since userfaultfd was introduced, is the
+userfaultfd(2) syscall. Access to this is controlled in several ways:
+
+- Any user can always create a userfaultfd which traps userspace page faults
+  only. Such a userfaultfd can be created using the userfaultfd(2) syscall
+  with the flag UFFD_USER_MODE_ONLY.
+
+- In order to also trap kernel page faults for the address space, either the
+  process needs the CAP_SYS_PTRACE capability, or the system must have
+  vm.unprivileged_userfaultfd set to 1. By default, vm.unprivileged_userfaultfd
+  is set to 0.
+
+The second way, added to the kernel more recently, is by opening
+/dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
+yields equivalent userfaultfds to the userfaultfd(2) syscall.
+
+Unlike userfaultfd(2), access to /dev/userfaultfd is controlled via normal
+filesystem permissions (user/group/mode), which gives fine-grained access to
+userfaultfd specifically, without also granting other unrelated privileges at
+the same time (as e.g. granting CAP_SYS_PTRACE would do). Users who have access
+to /dev/userfaultfd can always create userfaultfds that trap kernel page faults;
+vm.unprivileged_userfaultfd is not considered.
+
+Initializing a userfaultfd
+--------------------------
+
 When first opened the ``userfaultfd`` must be enabled invoking the
 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
 a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 9b833e439f09..988f6a4c8084 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -926,6 +926,9 @@ calls without any restrictions.
 
 The default value is 0.
 
+Another way to control permissions for userfaultfd is to use
+/dev/userfaultfd instead of userfaultfd(2). See
+Documentation/admin-guide/mm/userfaultfd.rst.
 
 user_reserve_kbytes
 ===================
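
The two creation paths described in the documentation change above can be sketched side by side. This is a minimal illustration, not code from the series: `uffd_via_syscall()`, `uffd_via_dev()` and `SKETCH_UFFD_FLAGS` are hypothetical names, and the fallback definitions are an assumption that older UAPI headers lack these macros and that the values match the upstream header. Both helpers request a user-mode-only userfaultfd and return a file descriptor, or -1 with errno set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

/* Fallbacks for older UAPI headers; values assumed to match upstream. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC 0xAA
#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
#endif

/* Hypothetical name; same flag combination the selftest passes. */
#define SKETCH_UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY)

/* First way: the userfaultfd(2) syscall (no libc wrapper is assumed). */
int uffd_via_syscall(void)
{
	return (int)syscall(__NR_userfaultfd, SKETCH_UFFD_FLAGS);
}

/* Second way: open /dev/userfaultfd, then ask it for a new userfaultfd. */
int uffd_via_dev(void)
{
	int dev_fd, uffd, saved_errno;

	/* Access here is governed by the device node's file permissions. */
	dev_fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
	if (dev_fd < 0)
		return -1;

	uffd = ioctl(dev_fd, USERFAULTFD_IOC_NEW, SKETCH_UFFD_FLAGS);

	/* The device fd is only needed to create the userfaultfd. */
	saved_errno = errno;
	close(dev_fd);
	errno = saved_errno;

	assert(uffd >= -1);
	return uffd;
}
```

Either descriptor would then go through the same UFFDIO_API handshake before any ranges are registered. A caller that wants to skip rather than fail on older kernels could check errno for ENOSYS (syscall path) or ENOENT/ENOTTY (device path), as the selftest does with KSFT_SKIP.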