Message ID | 1515588482-15744-6-git-send-email-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | Improve generic string routines | expand |
On 01/10/2018 04:47 AM, Adhemerval Zanella wrote: > + /* Align pointer to sizeof op_t. */ > + const uintptr_t s_int = (uintptr_t) str; > + const op_t *word_ptr = (const op_t*) (s_int & -sizeof (op_t)); I see this sort of code used in multiple places (often with different implementations), and suggest packaging it up into an inline function. Also, you might consider implementing it this way, which is a bit simpler (it's the method used in your generic strcmp): op_t * word_containing (char const *p) { return (op_t *) (p - (uintptr_t) p % sizeof (op_t)); } This generates the same code with gcc -O2, and minimizes the usage of pointers as integers. Similarly, in other code I suggest using char * instead of uintptr_t, as much as possible. This works just as well with ordinary addition, and will simplify GCC's job if we ever build with -fcheck-pointer-bounds. > + while (1) > { > - /* 64-bit version of the magic. */ > - /* Do the shift in two steps to avoid a warning if long has 32 bits. */ > - himagic = ((himagic << 16) << 16) | himagic; > - lomagic = ((lomagic << 16) << 16) | lomagic; > + if (has_zero (word)) > + break; > + word = *++word_ptr; > } This would be a bit simpler: while (! has_zero (word)) word = *++word_ptr;
On 11/01/2018 15:21, Paul Eggert wrote: > On 01/10/2018 04:47 AM, Adhemerval Zanella wrote: >> + /* Align pointer to sizeof op_t. */ >> + const uintptr_t s_int = (uintptr_t) str; >> + const op_t *word_ptr = (const op_t*) (s_int & -sizeof (op_t)); > > I see this sort of code used in multiple places (often with different implementations), and suggest packaging it up into an inline function. Also, you might consider implementing it this way, which is a bit simpler (it's the method used in your generic strcmp): > > op_t * > word_containing (char const *p) > { > return (op_t *) (p - (uintptr_t) p % sizeof (op_t)); > } > > This generates the same code with gcc -O2, and minimizes the usage of pointers as integers. Thanks, I have added this function suggestion to string-maskoff.h and used on the generic implementations. > > Similarly, in other code I suggest using char * instead of uintptr_t, as much as possible. This works just as well with ordinary addition, and will simplify GCC's job if we ever build with -fcheck-pointer-bounds. > >> + while (1) >> { >> - /* 64-bit version of the magic. */ >> - /* Do the shift in two steps to avoid a warning if long has 32 bits. */ >> - himagic = ((himagic << 16) << 16) | himagic; >> - lomagic = ((lomagic << 16) << 16) | lomagic; >> + if (has_zero (word)) >> + break; >> + word = *++word_ptr; >> } > > This would be a bit simpler: > > while (! has_zero (word)) > word = *++word_ptr; > Thanks, I have applied it locally as well.
diff --git a/string/strlen.c b/string/strlen.c index 8ce1318..6bd0ed9 100644 --- a/string/strlen.c +++ b/string/strlen.c @@ -20,6 +20,11 @@ #include <string.h> #include <stdlib.h> +#include <stdint.h> +#include <string-fza.h> +#include <string-fzb.h> +#include <string-fzi.h> +#include <string-maskoff.h> #undef strlen @@ -32,78 +37,20 @@ size_t STRLEN (const char *str) { - const char *char_ptr; - const unsigned long int *longword_ptr; - unsigned long int longword, himagic, lomagic; + /* Align pointer to sizeof op_t. */ + const uintptr_t s_int = (uintptr_t) str; + const op_t *word_ptr = (const op_t*) (s_int & -sizeof (op_t)); - /* Handle the first few characters by reading one character at a time. - Do this until CHAR_PTR is aligned on a longword boundary. */ - for (char_ptr = str; ((unsigned long int) char_ptr - & (sizeof (longword) - 1)) != 0; - ++char_ptr) - if (*char_ptr == '\0') - return char_ptr - str; + /* Read and MASK the first word. */ + op_t word = *word_ptr | create_mask (s_int); - /* All these elucidatory comments refer to 4-byte longwords, - but the theory applies equally well to 8-byte longwords. */ - - longword_ptr = (unsigned long int *) char_ptr; - - /* Bits 31, 24, 16, and 8 of this number are zero. Call these bits - the "holes." Note that there is a hole just to the left of - each byte, with an extra at the end: - - bits: 01111110 11111110 11111110 11111111 - bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD - - The 1-bits make sure that carries propagate to the next 0-bit. - The 0-bits provide holes for carries to fall into. */ - himagic = 0x80808080L; - lomagic = 0x01010101L; - if (sizeof (longword) > 4) + while (1) { - /* 64-bit version of the magic. */ - /* Do the shift in two steps to avoid a warning if long has 32 bits. */ - himagic = ((himagic << 16) << 16) | himagic; - lomagic = ((lomagic << 16) << 16) | lomagic; + if (has_zero (word)) + break; + word = *++word_ptr; } - if (sizeof (longword) > 8) - abort (); - /* Instead of the traditional loop which tests each character, - we will test a longword at a time. The tricky part is testing - if *any of the four* bytes in the longword in question are zero. */ - for (;;) - { - longword = *longword_ptr++; - - if (((longword - lomagic) & ~longword & himagic) != 0) - { - /* Which of the bytes was the zero? If none of them were, it was - a misfire; continue the search. */ - - const char *cp = (const char *) (longword_ptr - 1); - - if (cp[0] == 0) - return cp - str; - if (cp[1] == 0) - return cp - str + 1; - if (cp[2] == 0) - return cp - str + 2; - if (cp[3] == 0) - return cp - str + 3; - if (sizeof (longword) > 4) - { - if (cp[4] == 0) - return cp - str + 4; - if (cp[5] == 0) - return cp - str + 5; - if (cp[6] == 0) - return cp - str + 6; - if (cp[7] == 0) - return cp - str + 7; - } - } - } + return ((const char *) word_ptr) + index_first_zero (word) - str; } libc_hidden_builtin_def (strlen)
From: Richard Henderson <rth@twiddle.net> New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff functions to remove unwanted data. This strategy follow assemble optimized ones for powerpc, sparc, and SH. - Use of has_zero and index_first_zero parametrized functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strlen.c: Use them. --- string/strlen.c | 83 +++++++++++---------------------------------------------- 1 file changed, 15 insertions(+), 68 deletions(-) -- 2.7.4