Message ID | 1486414193-11241-1-git-send-email-adhemerval.zanella@linaro.org |
---|---|
State | Accepted |
Commit | 0edbf1230131dfeb03d843d2859e2104456fad80 |
Ping.

On 06/02/2017 18:49, Adhemerval Zanella wrote:
> This is an update of my previous patch [1]. The changes from the previous
> version are:
>
> - Create inline functions for guard page address calculation and
>   segments protection setup;
> - Fix an issue for downwards stack allocation.
>
> --
>
> The current allocate_stack logic for creating stacks is to first mmap all
> the required memory with the desired protection and then mprotect the
> guard area with PROT_NONE if required. Although it works as expected,
> it pessimizes the allocation because it requires the kernel to actually
> increase the commit charge (it counts against the physical/swap
> memory available to the system).
>
> The only issue is how to actually check this change, since the side
> effects are Linux specific and accounting for them would require
> kernel-specific tests that parse system-wide information. On the kernel
> I checked, /proc/self/statm does not show any meaningful difference for
> vmm and/or rss before and after thread creation. I could only see
> really meaningful information by checking the system-wide /proc/meminfo
> between thread creations: MemFree, MemAvailable, and Committed_AS show
> a large difference without the patch. I think trying to use this
> kind of information in a testcase is fragile.
>
> The BZ#18988 report shows that the committed pages are easily seen with
> mlockall (MCL_FUTURE) (which locks all pages that become mapped in the
> process); however, a more straightforward testcase shows that
> pthread_create could be faster using this patch:
>
> --
> #include <pthread.h>
> #include <stddef.h>
>
> static const int inner_count = 256;
> static const int outer_count = 128;
>
> /* Attributes shared by every thread created below.  */
> static pthread_attr_t a;
>
> static
> void *thread1(void *arg)
> {
>   return NULL;
> }
>
> static
> void *sleeper(void *arg)
> {
>   pthread_t ts[inner_count];
>   for (int i = 0; i < inner_count; i++)
>     pthread_create (&ts[i], &a, thread1, NULL);
>   for (int i = 0; i < inner_count; i++)
>     pthread_join (ts[i], NULL);
>
>   return NULL;
> }
>
> int main(void)
> {
>   pthread_attr_init(&a);
>   pthread_attr_setguardsize(&a, 1<<20);
>   pthread_attr_setstacksize(&a, 1134592);
>
>   pthread_t ts[outer_count];
>   for (int i = 0; i < outer_count; i++)
>     pthread_create(&ts[i], &a, sleeper, NULL);
>   for (int i = 0; i < outer_count; i++)
>     pthread_join(ts[i], NULL);
>
>   return 0;
> }
>
> --
>
> On x86_64 (4.4.0-45-generic, gcc 5.4.0) running this small testcase
> I see:
>
> $ time ./test
>
> real 0m3.647s
> user 0m0.080s
> sys 0m11.836s
>
> While with the patch I see:
>
> $ time ./test
>
> real 0m0.696s
> user 0m0.040s
> sys 0m1.152s
>
> So I added a pthread_create benchtest (thread_create) which checks
> the thread creation latency. As with the simple testcase, I saw
> improvements in thread creation on all architectures on which I tested
> the change.
>
> Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
> arm-linux-gnueabihf, and powerpc64le-linux-gnu.
>
>         [BZ #18988]
>         * benchtests/thread_create-inputs: New file.
>         * benchtests/thread_create-source.c: Likewise.
>         * support/xpthread_attr_setguardsize.c: Likewise.
>         * support/Makefile (libsupport-routines): Add
>         xpthread_attr_setguardsize object.
>         * support/xthread.h: Add xpthread_attr_setguardsize prototype.
>         * benchtests/Makefile (bench-pthread): Add thread_create.
>         * nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
>         then mprotect the required area.
>
> [1] https://sourceware.org/ml/libc-alpha/2017-02/msg00033.html
>
> ---
>  ChangeLog                            | 13 +++++++
>  benchtests/Makefile                  |  2 +-
>  benchtests/thread_create-inputs      | 14 ++++++++
>  benchtests/thread_create-source.c    | 58 +++++++++++++++++++++++++++++++
>  nptl/allocatestack.c                 | 66 +++++++++++++++++++++++++++++++-----
>  support/Makefile                     |  1 +
>  support/xpthread_attr_setguardsize.c | 26 ++++++++++++++
>  support/xthread.h                    |  2 ++
>  8 files changed, 173 insertions(+), 9 deletions(-)
>  create mode 100644 benchtests/thread_create-inputs
>  create mode 100644 benchtests/thread_create-source.c
>  create mode 100644 support/xpthread_attr_setguardsize.c
>
> diff --git a/ChangeLog b/ChangeLog
> index 710f9b4..8a81549 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,18 @@
>  2016-02-06  Adhemerval Zanella  <adhemerval.zanella@linaro.org>
>
> +        [BZ #18988]
> +        * benchtests/thread_create-inputs: New file.
> +        * benchtests/thread_create-source.c: Likewise.
> +        * support/xpthread_attr_setguardsize.c: Likewise.
> +        * support/Makefile (libsupport-routines): Add
> +        xpthread_attr_setguardsize object.
> +        * support/xthread.h: Add xpthread_attr_setguardsize prototype.
> +        * benchtests/Makefile (bench-pthread): Add thread_create.
> +        * nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
> +        then mprotect the required area.
> +        (guard_position): New function.
> +        (setup_stack_prot): Likewise.
> +
>          * nptl/allocatestack.c [COLORING_INCREMENT] (nptl_ncreated): Remove.
>          (allocate_stack): Remove COLORING_INCREMENT usage.
>          * nptl/stack-aliasing.h (COLORING_INCREMENT). Likewise.
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 81edf8a..6535373 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -25,7 +25,7 @@ bench-math := acos acosh asin asinh atan atanh cos cosh exp exp2 log log2 \
>                modf pow rint sin sincos sinh sqrt tan tanh fmin fmax fminf \
>                fmaxf
>
> -bench-pthread := pthread_once
> +bench-pthread := pthread_once thread_create
>
>  bench-string := ffs ffsll
>
> diff --git a/benchtests/thread_create-inputs b/benchtests/thread_create-inputs
> new file mode 100644
> index 0000000..e3ca03b
> --- /dev/null
> +++ b/benchtests/thread_create-inputs
> @@ -0,0 +1,14 @@
> +## args: int:size_t:size_t
> +## init: thread_create_init
> +## includes: pthread.h
> +## include-sources: thread_create-source.c
> +
> +## name: stack=1024,guard=1
> +32, 1024, 1
> +## name: stack=1024,guard=2
> +32, 1024, 2
> +
> +## name: stack=2048,guard=1
> +32, 2048, 1
> +## name: stack=2048,guard=2
> +32, 2048, 2
> diff --git a/benchtests/thread_create-source.c b/benchtests/thread_create-source.c
> new file mode 100644
> index 0000000..74e7777
> --- /dev/null
> +++ b/benchtests/thread_create-source.c
> @@ -0,0 +1,58 @@
> +/* Measure pthread_create thread creation with different stack
> +   and guard sizes.
> +
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <support/xthread.h>
> +
> +static size_t pgsize;
> +
> +static void
> +thread_create_init (void)
> +{
> +  pgsize = sysconf (_SC_PAGESIZE);
> +}
> +
> +static void *
> +thread_dummy (void *arg)
> +{
> +  return NULL;
> +}
> +
> +static void
> +thread_create (int nthreads, size_t stacksize, size_t guardsize)
> +{
> +  pthread_attr_t attr;
> +  xpthread_attr_init (&attr);
> +
> +  stacksize = stacksize * pgsize;
> +  guardsize = guardsize * pgsize;
> +
> +  xpthread_attr_setstacksize (&attr, stacksize);
> +  xpthread_attr_setguardsize (&attr, guardsize);
> +
> +  pthread_t ts[nthreads];
> +
> +  for (int i = 0; i < nthreads; i++)
> +    ts[i] = xpthread_create (&attr, thread_dummy, NULL);
> +
> +  for (int i = 0; i < nthreads; i++)
> +    xpthread_join (ts[i]);
> +}
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index e5c5f79..8364406 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -334,6 +334,43 @@ change_stack_perm (struct pthread *pd
>    return 0;
>  }
>
> +/* Return the guard page position on allocated stack.  */
> +static inline char *
> +__attribute ((always_inline))
> +guard_position (void *mem, size_t size, size_t guardsize, struct pthread *pd,
> +                size_t pagesize_m1)
> +{
> +#ifdef NEED_SEPARATE_REGISTER_STACK
> +  return mem + (((size - guardsize) / 2) & ~pagesize_m1);
> +#elif _STACK_GROWS_DOWN
> +  return mem;
> +#elif _STACK_GROWS_UP
> +  return (char *) (((uintptr_t) pd - guardsize) & ~pagesize_m1);
> +#endif
> +}
> +
> +/* Based on stack allocated with PROT_NONE, setup the required portions with
> +   'prot' flags based on the guard page position.  */
> +static inline int
> +setup_stack_prot (char *mem, size_t size, char *guard, size_t guardsize,
> +                  const int prot)
> +{
> +  char *guardend = guard + guardsize;
> +#if _STACK_GROWS_DOWN
> +  /* As defined at guard_position, for architectures with downward stack
> +     the guard page is always at start of the allocated area.  */
> +  if (mprotect (guardend, size - guardsize, prot) != 0)
> +    return errno;
> +#else
> +  size_t mprots1 = (uintptr_t) guard - (uintptr_t) mem;
> +  if (mprotect (mem, mprots1, prot) != 0)
> +    return errno;
> +  size_t mprots2 = ((uintptr_t) mem + size) - (uintptr_t) guardend;
> +  if (mprotect (guardend, mprots2, prot) != 0)
> +    return errno;
> +#endif
> +  return 0;
> +}
>
>  /* Returns a usable stack for a new thread either by allocating a
>     new stack or reusing a cached stack of sufficient size.
> @@ -490,7 +527,10 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>              size += pagesize_m1 + 1;
>  #endif
>
> -          mem = mmap (NULL, size, prot,
> +          /* If a guard page is required, avoid committing memory by first
> +             allocate with PROT_NONE and then reserve with required permission
> +             excluding the guard page.  */
> +          mem = mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
>                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>
>            if (__glibc_unlikely (mem == MAP_FAILED))
> @@ -510,9 +550,24 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>                                     - TLS_PRE_TCB_SIZE);
>  #endif
>
> +          /* Now mprotect the required region excluding the guard area.  */
> +          if (__glibc_likely (guardsize > 0))
> +            {
> +              char *guard = guard_position (mem, size, guardsize, pd,
> +                                            pagesize_m1);
> +              if (setup_stack_prot (mem, size, guard, guardsize, prot) != 0)
> +                {
> +                  munmap (mem, size);
> +                  return errno;
> +                }
> +            }
> +
>            /* Remember the stack-related values.  */
>            pd->stackblock = mem;
>            pd->stackblock_size = size;
> +          /* Update guardsize for newly allocated guardsize to avoid
> +             an mprotect in guard resize below.  */
> +          pd->guardsize = guardsize;
>
>            /* We allocated the first block thread-specific data array.
>               This address will not change for the lifetime of this
> @@ -593,13 +648,8 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>            /* Create or resize the guard area if necessary.  */
>            if (__glibc_unlikely (guardsize > pd->guardsize))
>              {
> -#ifdef NEED_SEPARATE_REGISTER_STACK
> -              char *guard = mem + (((size - guardsize) / 2) & ~pagesize_m1);
> -#elif _STACK_GROWS_DOWN
> -              char *guard = mem;
> -#elif _STACK_GROWS_UP
> -              char *guard = (char *) (((uintptr_t) pd - guardsize) & ~pagesize_m1);
> -#endif
> +              char *guard = guard_position (mem, size, guardsize, pd,
> +                                            pagesize_m1);
>                if (mprotect (guard, guardsize, PROT_NONE) != 0)
>                  {
>                  mprot_error:
> diff --git a/support/Makefile b/support/Makefile
> index 2ace559..c0a443f 100644
> --- a/support/Makefile
> +++ b/support/Makefile
> @@ -68,6 +68,7 @@ libsupport-routines = \
>    xpthread_attr_init \
>    xpthread_attr_setdetachstate \
>    xpthread_attr_setstacksize \
> +  xpthread_attr_setguardsize \
>    xpthread_barrier_destroy \
>    xpthread_barrier_init \
>    xpthread_barrier_wait \
> diff --git a/support/xpthread_attr_setguardsize.c b/support/xpthread_attr_setguardsize.c
> new file mode 100644
> index 0000000..35fed5d
> --- /dev/null
> +++ b/support/xpthread_attr_setguardsize.c
> @@ -0,0 +1,26 @@
> +/* pthread_attr_setguardsize with error checking.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <support/xthread.h>
> +
> +void
> +xpthread_attr_setguardsize (pthread_attr_t *attr, size_t guardsize)
> +{
> +  xpthread_check_return ("pthread_attr_setguardize",
> +                         pthread_attr_setguardsize (attr, guardsize));
> +}
> diff --git a/support/xthread.h b/support/xthread.h
> index 6dd7e70..3552a73 100644
> --- a/support/xthread.h
> +++ b/support/xthread.h
> @@ -67,6 +67,8 @@ void xpthread_attr_setdetachstate (pthread_attr_t *attr,
>                                     int detachstate);
>  void xpthread_attr_setstacksize (pthread_attr_t *attr,
>                                   size_t stacksize);
> +void xpthread_attr_setguardsize (pthread_attr_t *attr,
> +                                 size_t guardsize);
>
>  /* This function returns non-zero if pthread_barrier_wait returned
>     PTHREAD_BARRIER_SERIAL_THREAD.  */
>
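
The message above names Committed_AS in /proc/meminfo as one of the few places where the effect of the change is visible. For anyone who wants to watch that counter by hand, a minimal sketch follows. It is not part of the patch: `read_committed_as_kb` is a hypothetical helper, and it assumes a Linux /proc/meminfo that contains a line of the form `Committed_AS: <n> kB`.

```c
#include <stdio.h>

/* Hypothetical helper, not glibc code: return the system-wide Committed_AS
   value in kB as reported by /proc/meminfo, or -1 if the file or the field
   cannot be read.  */
static long
read_committed_as_kb (void)
{
  FILE *f = fopen ("/proc/meminfo", "r");
  if (f == NULL)
    return -1;

  char line[256];
  long kb = -1;
  /* sscanf leaves KB untouched on lines that do not start with the
     literal field name, so it stays -1 until the field is found.  */
  while (fgets (line, sizeof line, f) != NULL)
    if (sscanf (line, "Committed_AS: %ld kB", &kb) == 1)
      break;

  fclose (f);
  return kb;
}
```

Sampling the value before and after a batch of pthread_create calls gives a rough delta. The counter is system wide, so other processes perturb it, which is presumably why the message calls such a check fragile for a testcase.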
Ping x2.

On 14/02/2017 17:36, Adhemerval Zanella wrote:
> Ping.
On 02/06/2017 09:49 PM, Adhemerval Zanella wrote:
>         [BZ #18988]
>         * benchtests/thread_create-inputs: New file.
>         * benchtests/thread_create-source.c: Likewise.
>         * support/xpthread_attr_setguardsize.c: Likewise.
>         * support/Makefile (libsupport-routines): Add
>         xpthread_attr_setguardsize object.
>         * support/xthread.h: Add xpthread_attr_setguardsize prototype.
>         * benchtests/Makefile (bench-pthread): Add thread_create.
>         * nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
>         then mprotect the required area.

Looks good to me.

Thanks,
Florian
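
The last ChangeLog entry quoted above ("Call mmap with PROT_NONE and then mprotect the required area") can be illustrated outside glibc with a small sketch. It covers only the downward-growing case, where the guard sits at the start of the block. `reserve_stack_sketch` is a hypothetical name, the sizes are given in pages for simplicity, and MAP_STACK is left out; none of this is the code from nptl/allocatestack.c.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not glibc code.  Reserve PAGES pages with PROT_NONE,
   then make everything except the lowest GUARD_PAGES pages readable and
   writable, as for a stack that grows down.  Returns the base of the
   mapping, or NULL on failure.  */
static char *
reserve_stack_sketch (size_t pages, size_t guard_pages)
{
  assert (guard_pages < pages);

  size_t pagesize = (size_t) sysconf (_SC_PAGESIZE);
  size_t size = pages * pagesize;
  size_t guardsize = guard_pages * pagesize;

  /* The whole block starts out inaccessible.  */
  char *mem = mmap (NULL, size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;

  /* Open up only the usable part; the guard stays PROT_NONE.  */
  if (mprotect (mem + guardsize, size - guardsize,
                PROT_READ | PROT_WRITE) != 0)
    {
      munmap (mem, size);
      return NULL;
    }
  return mem;
}
```

A caller would use the memory starting at the returned base plus the guard size and release the block with munmap over the full size. Any access inside the guard pages faults, which is the point of the guard.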
diff --git a/ChangeLog b/ChangeLog
index 710f9b4..8a81549 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,18 @@
 2016-02-06  Adhemerval Zanella  <adhemerval.zanella@linaro.org>
 
+        [BZ #18988]
+        * benchtests/thread_create-inputs: New file.
+        * benchtests/thread_create-source.c: Likewise.
+        * support/xpthread_attr_setguardsize.c: Likewise.
+        * support/Makefile (libsupport-routines): Add
+        xpthread_attr_setguardsize object.
+        * support/xthread.h: Add xpthread_attr_setguardsize prototype.
+        * benchtests/Makefile (bench-pthread): Add thread_create.
+        * nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
+        then mprotect the required area.
+        (guard_position): New function.
+        (setup_stack_prot): Likewise.
+
         * nptl/allocatestack.c [COLORING_INCREMENT] (nptl_ncreated): Remove.
         (allocate_stack): Remove COLORING_INCREMENT usage.
         * nptl/stack-aliasing.h (COLORING_INCREMENT). Likewise.
diff --git a/benchtests/Makefile b/benchtests/Makefile
index 81edf8a..6535373 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -25,7 +25,7 @@ bench-math := acos acosh asin asinh atan atanh cos cosh exp exp2 log log2 \
               modf pow rint sin sincos sinh sqrt tan tanh fmin fmax fminf \
               fmaxf
 
-bench-pthread := pthread_once
+bench-pthread := pthread_once thread_create
 
 bench-string := ffs ffsll
 
diff --git a/benchtests/thread_create-inputs b/benchtests/thread_create-inputs
new file mode 100644
index 0000000..e3ca03b
--- /dev/null
+++ b/benchtests/thread_create-inputs
@@ -0,0 +1,14 @@
+## args: int:size_t:size_t
+## init: thread_create_init
+## includes: pthread.h
+## include-sources: thread_create-source.c
+
+## name: stack=1024,guard=1
+32, 1024, 1
+## name: stack=1024,guard=2
+32, 1024, 2
+
+## name: stack=2048,guard=1
+32, 2048, 1
+## name: stack=2048,guard=2
+32, 2048, 2
diff --git a/benchtests/thread_create-source.c b/benchtests/thread_create-source.c
new file mode 100644
index 0000000..74e7777
--- /dev/null
+++ b/benchtests/thread_create-source.c
@@ -0,0 +1,58 @@
+/* Measure pthread_create thread creation with different stack
+   and guard sizes.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <support/xthread.h>
+
+static size_t pgsize;
+
+static void
+thread_create_init (void)
+{
+  pgsize = sysconf (_SC_PAGESIZE);
+}
+
+static void *
+thread_dummy (void *arg)
+{
+  return NULL;
+}
+
+static void
+thread_create (int nthreads, size_t stacksize, size_t guardsize)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+
+  stacksize = stacksize * pgsize;
+  guardsize = guardsize * pgsize;
+
+  xpthread_attr_setstacksize (&attr, stacksize);
+  xpthread_attr_setguardsize (&attr, guardsize);
+
+  pthread_t ts[nthreads];
+
+  for (int i = 0; i < nthreads; i++)
+    ts[i] = xpthread_create (&attr, thread_dummy, NULL);
+
+  for (int i = 0; i < nthreads; i++)
+    xpthread_join (ts[i]);
+}
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index e5c5f79..8364406 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -334,6 +334,43 @@ change_stack_perm (struct pthread *pd
   return 0;
 }
 
+/* Return the guard page position on allocated stack.  */
+static inline char *
+__attribute ((always_inline))
+guard_position (void *mem, size_t size, size_t guardsize, struct pthread *pd,
+                size_t pagesize_m1)
+{
+#ifdef NEED_SEPARATE_REGISTER_STACK
+  return mem + (((size - guardsize) / 2) & ~pagesize_m1);
+#elif _STACK_GROWS_DOWN
+  return mem;
+#elif _STACK_GROWS_UP
+  return (char *) (((uintptr_t) pd - guardsize) & ~pagesize_m1);
+#endif
+}
+
+/* Based on stack allocated with PROT_NONE, setup the required portions with
+   'prot' flags based on the guard page position.  */
+static inline int
+setup_stack_prot (char *mem, size_t size, char *guard, size_t guardsize,
+                  const int prot)
+{
+  char *guardend = guard + guardsize;
+#if _STACK_GROWS_DOWN
+  /* As defined at guard_position, for architectures with downward stack
+     the guard page is always at start of the allocated area.  */
+  if (mprotect (guardend, size - guardsize, prot) != 0)
+    return errno;
+#else
+  size_t mprots1 = (uintptr_t) guard - (uintptr_t) mem;
+  if (mprotect (mem, mprots1, prot) != 0)
+    return errno;
+  size_t mprots2 = ((uintptr_t) mem + size) - (uintptr_t) guardend;
+  if (mprotect (guardend, mprots2, prot) != 0)
+    return errno;
+#endif
+  return 0;
+}
 
 /* Returns a usable stack for a new thread either by allocating a
    new stack or reusing a cached stack of sufficient size.
@@ -490,7 +527,10 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
             size += pagesize_m1 + 1;
 #endif
 
-          mem = mmap (NULL, size, prot,
+          /* If a guard page is required, avoid committing memory by first
+             allocate with PROT_NONE and then reserve with required permission
+             excluding the guard page.  */
+          mem = mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
 
           if (__glibc_unlikely (mem == MAP_FAILED))
@@ -510,9 +550,24 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
                                    - TLS_PRE_TCB_SIZE);
 #endif
 
+          /* Now mprotect the required region excluding the guard area.  */
+          if (__glibc_likely (guardsize > 0))
+            {
+              char *guard = guard_position (mem, size, guardsize, pd,
+                                            pagesize_m1);
+              if (setup_stack_prot (mem, size, guard, guardsize, prot) != 0)
+                {
+                  munmap (mem, size);
+                  return errno;
+                }
+            }
+
           /* Remember the stack-related values.  */
           pd->stackblock = mem;
           pd->stackblock_size = size;
+          /* Update guardsize for newly allocated guardsize to avoid
+             an mprotect in guard resize below.  */
+          pd->guardsize = guardsize;
 
           /* We allocated the first block thread-specific data array.
              This address will not change for the lifetime of this
@@ -593,13 +648,8 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
           /* Create or resize the guard area if necessary.  */
           if (__glibc_unlikely (guardsize > pd->guardsize))
             {
-#ifdef NEED_SEPARATE_REGISTER_STACK
-              char *guard = mem + (((size - guardsize) / 2) & ~pagesize_m1);
-#elif _STACK_GROWS_DOWN
-              char *guard = mem;
-#elif _STACK_GROWS_UP
-              char *guard = (char *) (((uintptr_t) pd - guardsize) & ~pagesize_m1);
-#endif
+              char *guard = guard_position (mem, size, guardsize, pd,
+                                            pagesize_m1);
               if (mprotect (guard, guardsize, PROT_NONE) != 0)
                 {
                 mprot_error:
diff --git a/support/Makefile b/support/Makefile
index 2ace559..c0a443f 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -68,6 +68,7 @@ libsupport-routines = \
   xpthread_attr_init \
   xpthread_attr_setdetachstate \
   xpthread_attr_setstacksize \
+  xpthread_attr_setguardsize \
   xpthread_barrier_destroy \
   xpthread_barrier_init \
   xpthread_barrier_wait \
diff --git a/support/xpthread_attr_setguardsize.c b/support/xpthread_attr_setguardsize.c
new file mode 100644
index 0000000..35fed5d
--- /dev/null
+++ b/support/xpthread_attr_setguardsize.c
@@ -0,0 +1,26 @@
+/* pthread_attr_setguardsize with error checking.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/xthread.h>
+
+void
+xpthread_attr_setguardsize (pthread_attr_t *attr, size_t guardsize)
+{
+  xpthread_check_return ("pthread_attr_setguardsize",
+                         pthread_attr_setguardsize (attr, guardsize));
+}
diff --git a/support/xthread.h b/support/xthread.h
index 6dd7e70..3552a73 100644
--- a/support/xthread.h
+++ b/support/xthread.h
@@ -67,6 +67,8 @@ void xpthread_attr_setdetachstate (pthread_attr_t *attr,
                                    int detachstate);
 void xpthread_attr_setstacksize (pthread_attr_t *attr,
                                  size_t stacksize);
+void xpthread_attr_setguardsize (pthread_attr_t *attr,
+                                 size_t guardsize);
 
 /* This function returns non-zero if pthread_barrier_wait returned
    PTHREAD_BARRIER_SERIAL_THREAD.  */
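
For readers who want to try the idea outside of glibc, the allocation order used by the allocate_stack change can be sketched as a standalone program fragment. This is only an illustration, not the patch's code: reserve_stack_with_guard is a hypothetical helper name, it assumes a downward-growing stack (so the guard sits at the lowest address, as in the _STACK_GROWS_DOWN branch of guard_position), it hardcodes PROT_READ | PROT_WRITE instead of the computed prot, and it omits MAP_STACK.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not part of glibc.  Map SIZE bytes and leave the
   lowest GUARDSIZE bytes inaccessible.  Assumes a downward-growing stack
   and that both sizes are multiples of the page size.  Returns the base
   of the mapping, or NULL with errno set on failure.  */
static void *
reserve_stack_with_guard (size_t size, size_t guardsize)
{
  const int prot = PROT_READ | PROT_WRITE;

  assert (guardsize < size);

  /* With a guard requested, map everything as PROT_NONE first, which is
     the ordering the patch switches to.  */
  void *mem = mmap (NULL, size, guardsize == 0 ? prot : PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;

  /* Then open up only the usable region above the guard.  */
  if (guardsize > 0
      && mprotect ((char *) mem + guardsize, size - guardsize, prot) != 0)
    {
      /* Save errno so the cleanup call cannot overwrite it.  */
      int saved_errno = errno;
      munmap (mem, size);
      errno = saved_errno;
      return NULL;
    }

  return mem;
}
```

A caller would pass page-multiple sizes, for example 16 pages of stack with a 1-page guard, and release the block with munmap (mem, size). The sketch saves errno before the cleanup munmap; that is a choice made for this example, not something the patch does.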