diff mbox series

[08/11] locale: Fix UB in elem_hash

Message ID 20250507142110.3452012-9-adhemerval.zanella@linaro.org
State New
Headers show
Series Add initial support for --enable-ubsan | expand

Commit Message

Adhemerval Zanella Netto May 7, 2025, 2:17 p.m. UTC
The ubsan triggers:

UBSAN: Undefined behaviour in ./elem-hash.h:27:14 left shift of 360447856 by 3 cannot be represented in type 'int'

Using unsigned shift here zero fill like signed.
---
 locale/elem-hash.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Florian Weimer May 20, 2025, 12:57 p.m. UTC | #1
* Adhemerval Zanella:

> The ubsan triggers:
>
> UBSAN: Undefined behaviour in ./elem-hash.h:27:14 left shift of 360447856 by 3 cannot be represented in type 'int'
>
> Using unsigned shift here zero fill like signed.
> ---
>  locale/elem-hash.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/locale/elem-hash.h b/locale/elem-hash.h
> index 5d34ec43f3..8052b0fe6d 100644
> --- a/locale/elem-hash.h
> +++ b/locale/elem-hash.h
> @@ -24,7 +24,7 @@ elem_hash (const char *str, int32_t n)
>  
>    while (n-- > 0)
>      {
> -      result <<= 3;
> +      result = (uint32_t)result << 3;
>        result += *str++;
>      }

Wow, this hash function is really bad with signed chars.

Is this part of the ABI?  Can we chang its definition without changing
localedef output files?

Thanks,
Florian
Andreas Schwab May 20, 2025, 1:26 p.m. UTC | #2
On Mai 07 2025, Adhemerval Zanella wrote:

> diff --git a/locale/elem-hash.h b/locale/elem-hash.h
> index 5d34ec43f3..8052b0fe6d 100644
> --- a/locale/elem-hash.h
> +++ b/locale/elem-hash.h
> @@ -24,7 +24,7 @@ elem_hash (const char *str, int32_t n)
>  
>    while (n-- > 0)
>      {
> -      result <<= 3;
> +      result = (uint32_t)result << 3;
>        result += *str++;
>      }

The only use of elem_hash assigns the value to a uint32_t, so it would
be better to adjust the return type of the function.
Andreas Schwab May 20, 2025, 1:40 p.m. UTC | #3
On Mai 20 2025, Florian Weimer wrote:

> Is this part of the ABI?  Can we chang its definition without changing
> localedef output files?

There are only two uses of the _NL_COLLATE_SYMB_TABLEMB element of
LC_COLLATE (posix/fnmatch_loop.c, and seek_collating_symbol_entry in
posix/regcomp.c), and both just look for a non-zero hash, but ignore it
otherwise.  I don't think the particular definition of the hash function
is part of the ABI.
diff mbox series

Patch

diff --git a/locale/elem-hash.h b/locale/elem-hash.h
index 5d34ec43f3..8052b0fe6d 100644
--- a/locale/elem-hash.h
+++ b/locale/elem-hash.h
@@ -24,7 +24,7 @@  elem_hash (const char *str, int32_t n)
 
   while (n-- > 0)
     {
-      result <<= 3;
+      result = (uint32_t)result << 3;
       result += *str++;
     }