diff mbox series

[07/17] malloc: Add specialized dynarray for C strings

Message ID 1496956411-25594-8-git-send-email-adhemerval.zanella@linaro.org
State New
Headers show
Series posix: glob fixes and refactor | expand

Commit Message

Adhemerval Zanella June 8, 2017, 9:13 p.m. UTC
This patch adds an specialized dynarray to manage C strings using the
dynarray internal implementation.  It uses some private fields from
dynarray and thus it provided specific files to access and manage
the internal string buffer.

   For instance:

   struct char_array str;
   // str == "testing"
   char_array_init_str (&str, "testing");
   // c == 's'
   char c = char_array_pos (&str, 2);
   // str = "testing2"
   char_array_set_str (&str, "testing2");
   // str = "testi"
   char_array_erase (&str, 5);
   // str = "123testi"
   char_array_prepend_str (&str, "123");
   // len = 8
   size_t len = char_array_length (&str);
   // str = "123testi456"
   char_array_append_str (&str, "456");
   // str = "123testi789"
   char_array_replace_str_pos (&str, 7, "789", 3);

The provided function are not extensive and meant mainly to be use in
subsequent glob implementation cleanup.  For internal object consistency
only the function provided by char_array.c should be used, including
internal object manipulation.

To check for possible overflows in internal size manipulation a new
function, check_add_wrapv_size_t, is added on malloc-internal.  It basically
return whether the addition of two size_t overflows.

Checked on x86_64-linux-gnu.

	* malloc/Makefile (test-internal): Add tst-char_array.
	* malloc/malloc-internal (check_add_wrapv_size_t): New function.
	* malloc/char_array.c: New file.
	* malloc/tst-char-array.c: Likewise.
---
 malloc/Makefile          |   1 +
 malloc/char_array.c      | 256 +++++++++++++++++++++++++++++++++++++++++++++++
 malloc/malloc-internal.h |  14 +++
 malloc/tst-char_array.c  | 107 ++++++++++++++++++++
 4 files changed, 378 insertions(+)
 create mode 100644 malloc/char_array.c
 create mode 100644 malloc/tst-char_array.c

-- 
2.7.4

Comments

Florian Weimer June 13, 2017, 9:46 a.m. UTC | #1
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> The provided function are not extensive and meant mainly to be use in

> subsequent glob implementation cleanup.  For internal object

> consistency only the function provided by char_array.c should be used,

> including internal object manipulation.


It's surprising that glob needs these many functions.

I'm not sure if the automatic NUL termination makes sense if we ever
want to generalize this interface.  How ingrained is that to the glob
use case?

> +++ b/malloc/char_array.c


Should this be malloc/char_array-skeleton.c, to indicate that this file
is parameterized and intended for inclusion into another .c file?

> +#define DYNARRAY_ELEMENT_INIT(__e) (*__e = '\0')


Why is this needed?  I don't think the code cares for the contents of
freshly added bytes.  For the trailing NUL byte, it seems better to add
that separately from the initialization operations.

> +static bool __attribute_used__

> +char_array_replace_str_pos (struct char_array *array, size_t pos,

> +                            const char *str, size_t len)

> +{

> +  if (pos > array->dynarray_header.used)

> +    return false;

> +

> +  size_t newsize;

> +  if (check_add_wrapv_size_t (pos, len, &newsize)

> +      || check_add_wrapv_size_t (newsize, 1, &newsize)

> +      || !char_array_resize (array, newsize))

> +    return false;


I don't think it is a good idea to mix the reporting of usage errors
(index out of bounds) with environmental conditions (memory allocation
failure).  Have you considered calling __libc_fatal if pos is out of
range, similar to the existing *_at function?

> +++ b/malloc/malloc-internal.h


> +/* Set *R = A + B.  Return true if the answer is mathematically incorrect due

> +   to overflow; in this case, *R is the low order bits of the correct

> +   answer.  */

> +static inline bool

> +check_add_wrapv_size_t (size_t a, size_t b, size_t *r)


Why not use check_add_overflow_size_t, for consistency with
check_mul_overflow_size_t?
Adhemerval Zanella June 13, 2017, 12:17 p.m. UTC | #2
On 13/06/2017 06:46, Florian Weimer wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> 

>> The provided function are not extensive and meant mainly to be use in

>> subsequent glob implementation cleanup.  For internal object

>> consistency only the function provided by char_array.c should be used,

>> including internal object manipulation.

> 

> It's surprising that glob needs these many functions.

> 

> I'm not sure if the automatic NUL termination makes sense if we ever

> want to generalize this interface.  How ingrained is that to the glob

> use case?


glob internally prepends the search directory by replacing the tilde with
a external home directory (GLOB_TILDE, either by environment variable or
through getXXX functions) or append new data by calling new patterns
recursively (GLOB_BRACE).

It is not an specific usage to glob, but rather a generic one which with 
more higher level languages all the memory buffer bikeshedding will be 
avoided by a string object. The idea is to keep a consistent C string buffer
stack-allocated (for general use) or heap based (to not bound due the
internal expansion) which can be used as constant argument for other
function call (fnmatch or recursively in case of GLOB_BRACE). 

> 

>> +++ b/malloc/char_array.c

> 

> Should this be malloc/char_array-skeleton.c, to indicate that this file

> is parameterized and intended for inclusion into another .c file?


I am not sure, the idea of char_array is to be self-contained, there is
no need to define anything in the file to include it.  Also, if we see
other usage of dynamic C string inside glibc it would be a good idea to
add internal symbols for common symbols.

> 

>> +#define DYNARRAY_ELEMENT_INIT(__e) (*__e = '\0')

> 

> Why is this needed?  I don't think the code cares for the contents of

> freshly added bytes.  For the trailing NUL byte, it seems better to add

> that separately from the initialization operations.


It does not, it is an artefact I added and it should be removed.

> 

>> +static bool __attribute_used__

>> +char_array_replace_str_pos (struct char_array *array, size_t pos,

>> +                            const char *str, size_t len)

>> +{

>> +  if (pos > array->dynarray_header.used)

>> +    return false;

>> +

>> +  size_t newsize;

>> +  if (check_add_wrapv_size_t (pos, len, &newsize)

>> +      || check_add_wrapv_size_t (newsize, 1, &newsize)

>> +      || !char_array_resize (array, newsize))

>> +    return false;

> 

> I don't think it is a good idea to mix the reporting of usage errors

> (index out of bounds) with environmental conditions (memory allocation

> failure).  Have you considered calling __libc_fatal if pos is out of

> range, similar to the existing *_at function?


Indeed, I will change to call __libc_fatal in such cases.

> 

>> +++ b/malloc/malloc-internal.h

> 

>> +/* Set *R = A + B.  Return true if the answer is mathematically incorrect due

>> +   to overflow; in this case, *R is the low order bits of the correct

>> +   answer.  */

>> +static inline bool

>> +check_add_wrapv_size_t (size_t a, size_t b, size_t *r)

> 

> Why not use check_add_overflow_size_t, for consistency with

> check_mul_overflow_size_t?

> 


I will change it.
Adhemerval Zanella June 13, 2017, 2:24 p.m. UTC | #3
On 13/06/2017 09:17, Adhemerval Zanella wrote:
> 

> 

> On 13/06/2017 06:46, Florian Weimer wrote:

>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

>>

>>> The provided function are not extensive and meant mainly to be use in

>>> subsequent glob implementation cleanup.  For internal object

>>> consistency only the function provided by char_array.c should be used,

>>> including internal object manipulation.

>>

>> It's surprising that glob needs these many functions.

>>

>> I'm not sure if the automatic NUL termination makes sense if we ever

>> want to generalize this interface.  How ingrained is that to the glob

>> use case?

> 

> glob internally prepends the search directory by replacing the tilde with

> a external home directory (GLOB_TILDE, either by environment variable or

> through getXXX functions) or append new data by calling new patterns

> recursively (GLOB_BRACE).

> 

> It is not an specific usage to glob, but rather a generic one which with 

> more higher level languages all the memory buffer bikeshedding will be 

> avoided by a string object. The idea is to keep a consistent C string buffer

> stack-allocated (for general use) or heap based (to not bound due the

> internal expansion) which can be used as constant argument for other

> function call (fnmatch or recursively in case of GLOB_BRACE). 

> 

>>

>>> +++ b/malloc/char_array.c

>>

>> Should this be malloc/char_array-skeleton.c, to indicate that this file

>> is parameterized and intended for inclusion into another .c file?

> 

> I am not sure, the idea of char_array is to be self-contained, there is

> no need to define anything in the file to include it.  Also, if we see

> other usage of dynamic C string inside glibc it would be a good idea to

> add internal symbols for common symbols.

> 

>>

>>> +#define DYNARRAY_ELEMENT_INIT(__e) (*__e = '\0')

>>

>> Why is this needed?  I don't think the code cares for the contents of

>> freshly added bytes.  For the trailing NUL byte, it seems better to add

>> that separately from the initialization operations.

> 

> It does not, it is an artefact I added and it should be removed.

> 

>>

>>> +static bool __attribute_used__

>>> +char_array_replace_str_pos (struct char_array *array, size_t pos,

>>> +                            const char *str, size_t len)

>>> +{

>>> +  if (pos > array->dynarray_header.used)

>>> +    return false;

>>> +

>>> +  size_t newsize;

>>> +  if (check_add_wrapv_size_t (pos, len, &newsize)

>>> +      || check_add_wrapv_size_t (newsize, 1, &newsize)

>>> +      || !char_array_resize (array, newsize))

>>> +    return false;

>>

>> I don't think it is a good idea to mix the reporting of usage errors

>> (index out of bounds) with environmental conditions (memory allocation

>> failure).  Have you considered calling __libc_fatal if pos is out of

>> range, similar to the existing *_at function?

> 

> Indeed, I will change to call __libc_fatal in such cases.

> 

>>

>>> +++ b/malloc/malloc-internal.h

>>

>>> +/* Set *R = A + B.  Return true if the answer is mathematically incorrect due

>>> +   to overflow; in this case, *R is the low order bits of the correct

>>> +   answer.  */

>>> +static inline bool

>>> +check_add_wrapv_size_t (size_t a, size_t b, size_t *r)

>>

>> Why not use check_add_overflow_size_t, for consistency with

>> check_mul_overflow_size_t?

>>

> 

> I will change it.

> 


Here is an updated patch for the specialized dynarray.  I have incorporated all
your suggestion but the skeleton name change (which I am not sure if it should
follow the idea since it should be a complete 'api').  I also added some nonull
attribute as for dynarray implementation.

----diff --git a/malloc/Makefile b/malloc/Makefile
index 14c13f1..323b3fb 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -46,6 +46,7 @@ tests-internal += \
 	 tst-dynarray \
 	 tst-dynarray-fail \
 	 tst-dynarray-at-fail \
+	 tst-char_array
 
 ifneq (no,$(have-tunables))
 tests += tst-malloc-usable-tunables
@@ -58,7 +59,7 @@ test-srcs = tst-mtrace
 routines = malloc morecore mcheck mtrace obstack reallocarray \
   scratch_buffer_grow scratch_buffer_grow_preserve \
   scratch_buffer_set_array_size \
-  dynarray_at_failure \
+  dynarray_at_failure dynarray_overflow_failure \
   dynarray_emplace_enlarge \
   dynarray_finalize \
   dynarray_resize \
diff --git a/malloc/char_array.c b/malloc/char_array.c
new file mode 100644
index 0000000..4d73203
--- /dev/null
+++ b/malloc/char_array.c
@@ -0,0 +1,281 @@
+/* Specialized dynarray for C strings.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file provides a dynamic C string with an initial stack allocated
+   buffer.  Since it is based on dynarray, it provided dynamic size
+   expansion and heap usage for large strings.
+
+   The following parameters are optional:
+
+   CHAR_ARRAY_INITIAL_SIZE
+      The size of the statically allocated array (default is 256).  It will
+      be used to define DYNARRAY_INITIAL_SIZE.
+
+   The following functions are provided:
+
+   bool char_array_init_empty (struct char_array *);
+   bool char_array_init_str (struct char_array *, const char *);
+   bool char_array_init_str_size (struct char_array *, const char *, size_t);
+   bool char_array_is_empty (struct char_array *);
+   const char *char_array_str (struct char_array *);
+   char char_array_pos (struct char_array *, size_t);
+   size_t char_array_length (struct char_array *);
+   bool char_array_set_str (struct char_array *, const char *);
+   bool char_array_set_str_size (struct char_array *, const char *, size_t);
+   void char_array_erase (struct char_array *, size_t);
+   bool char_array_resize_str (struct char_array *, size_t);
+   bool char_array_prepend_str (struct char_array *, const char *);
+   bool char_array_append_str (struct char_array *, const char *);
+   bool char_array_replace_str_pos (struct char_array *, size_t, const char *,
+				    size_t);
+
+   For instance:
+
+   struct char_array str;
+   // str == "testing";
+   char_array_init_str (&str, "testing");
+   // c == 's'
+   char c = char_array_pos (&str, 2);
+   // str = "testing2";
+   char_array_set_str (&str, "testing2");
+   // str = "testi";
+   char_array_erase (&str, 5);
+   // str = "123testi";
+   char_array_prepend_str (&str, "123");
+   // len = 8;
+   size_t len = char_array_length (&str);
+   // str = "123testi456";
+   char_array_append_str (&str, "456");
+   // str = "123testi789";
+   char_array_replace_str_pos (&str, 7, "789", 3);
+ */
+
+#define DYNARRAY_STRUCT            char_array
+#define DYNARRAY_ELEMENT           char
+#define DYNARRAY_PREFIX            char_array_
+#ifndef CHAR_ARRAY_INITIAL_SIZE
+# define CHAR_ARRAY_INITIAL_SIZE 256
+#endif
+#define DYNARRAY_INITIAL_SIZE  CHAR_ARRAY_INITIAL_SIZE
+#include <malloc/dynarray-skeleton.c>
+#include <malloc/malloc-internal.h>
+
+/* Return a const char for the internal C string handled by 'array'.  */
+__attribute__ ((nonnull (1)))
+static const char *
+char_array_str (struct char_array *array)
+{
+  return char_array_at (array, 0);
+}
+
+/* Return the character at position 'pos' from the char_array 'array'.  */
+__attribute__ ((nonnull (1)))
+static char
+char_array_pos (struct char_array *array, size_t pos)
+{
+  return *char_array_at (array, pos);
+}
+
+/* Calculate the length of the string, excluding the terminating null.  */
+__attribute__ ((unused, nonnull (1)))
+static size_t
+char_array_length (struct char_array *array)
+{
+  /* Exclude the final '\0'.  */
+  return array->dynarray_header.used - 1;
+}
+
+/* Copy the contents of string 'str' to char_array 'array', including the
+   final '\0'.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_set_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str) + 1;
+  if (!char_array_resize (array, size))
+    return false;
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = size;
+  return true;
+}
+
+/* Copy up 'size' bytes from string 'str' to char_array 'array'.  A final
+   '\0' is appended in the char_array.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_set_str_size (struct char_array *array, const char *str,
+			 size_t size)
+{
+  size_t newsize;
+  if (check_add_wrapv_size_t (size, 1, &newsize))
+    __libc_dynarray_overflow_failure (size, 1);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  *((char *) mempcpy (array->dynarray_header.array, str, size)) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Initialize the char_array 'array' and sets it to an empty string ("").  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_init_empty (struct char_array *array)
+{
+  char_array_init (array);
+  return char_array_set_str (array, "");
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_init_str (struct char_array *array, const char *str)
+{
+  char_array_init (array);
+  return char_array_set_str (array, str);
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'
+   up to 'size' characteres.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_init_str_size (struct char_array *array, const char *str,
+			  size_t size)
+{
+  char_array_init (array);
+  return char_array_set_str_size (array, str, size);
+}
+
+/* Return if the char_array contain any characteres.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_is_empty (struct char_array *array)
+{
+  return *char_array_at (array, 0) == '\0';
+}
+
+/* Remove the byte at position 'pos' from char_array 'array'.  The contents
+   are moved internally if the position is not at the end of the internal
+   buffer.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_erase (struct char_array *array, size_t pos)
+{
+  if (pos >= array->dynarray_header.used - 1)
+    return false;
+
+  char *ppos = char_array_at (array, pos);
+  char *lpos = array->dynarray_header.array + array->dynarray_header.used;
+  ptrdiff_t size = lpos - ppos;
+  memmove (ppos, ppos + 1, size);
+  array->dynarray_header.used--;
+  return true;
+}
+
+/* Resize the char_array 'array' to size 'count' maintaining the ending
+   '\0' byte.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_crop (struct char_array *array, size_t size)
+{
+  if (size >= (array->dynarray_header.used - 1)
+      || !char_array_resize (array, size + 1))
+    return false;
+
+  array->dynarray_header.array[array->dynarray_header.used] = '\0';
+  return true;
+}
+
+/* Prepend the contents of string 'str' to char_array 'array', including the
+   final '\0' byte.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_prepend_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and we need below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (used, size, &newsize))
+    __libc_dynarray_overflow_failure (used, size);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  /* Make room for the string and copy it.  */
+  memmove (array->dynarray_header.array + size, array->dynarray_header.array,
+           used);
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Append the contents of string 'str' to char_array 'array, including the
+   final '\0' byte.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_append_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and it used it below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used - 1;
+
+  /* 'used' does account for final '\0', so there is no need to add
+     an extra element to calculate the final required size.  */
+  size_t newsize;
+  if (check_add_wrapv_size_t (used + 1, size, &newsize))
+    __libc_dynarray_overflow_failure (used + 1, size);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  /* Start to append at '\0' up to string length and add a final '\0'.  */
+  *(char*) mempcpy (array->dynarray_header.array + used, str, size) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Replace the contents starting of position 'pos' of char_array 'array'
+   with the contents of string 'str' up to 'len' bytes.  A final '\0'
+   is appended in the string.  */
+__attribute__ ((nonnull (1, 3)))
+static bool
+char_array_replace_str_pos (struct char_array *array, size_t pos,
+                            const char *str, size_t len)
+{
+  if (pos > array->dynarray_header.used)
+    return false;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (pos, len, &newsize))
+    __libc_dynarray_overflow_failure (pos, len);
+  if (check_add_wrapv_size_t (newsize, 1, &newsize))
+    __libc_dynarray_overflow_failure (newsize, 1);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  char *start = char_array_at (array, pos);
+  *(char *) mempcpy (start, str, len) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
diff --git a/malloc/dynarray.h b/malloc/dynarray.h
index c73e08b..5a491b4 100644
--- a/malloc/dynarray.h
+++ b/malloc/dynarray.h
@@ -173,4 +173,12 @@ void __libc_dynarray_at_failure (size_t size, size_t index)
   __attribute__ ((noreturn));
 libc_hidden_proto (__libc_dynarray_at_failure)
 
+/* Internal function.  TErminate the process after an overflow in
+   new size allocation.  SIZE is the current number of elements in
+   dynamic array and INCR is the new elements to add on current
+   size.  */
+void __libc_dynarray_overflow_failure (size_t size, size_t incr)
+  __attribute__ ((noreturn));
+libc_hidden_proto (__libc_dynarray_overflow_failure)
+
 #endif /* _DYNARRAY_H */
diff --git a/malloc/dynarray_overflow_failure.c b/malloc/dynarray_overflow_failure.c
new file mode 100644
index 0000000..14936b0
--- /dev/null
+++ b/malloc/dynarray_overflow_failure.c
@@ -0,0 +1,31 @@
+/* Report an dynamic array size overflow condition.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dynarray.h>
+#include <stdio.h>
+
+void
+__libc_dynarray_overflow_failure (size_t size, size_t incr)
+{
+  char buf[200];
+  __snprintf (buf, sizeof (buf), "Fatal glibc error: "
+              "new size overflows (old %zu and increment %zu)\n",
+              size, incr);
+ __libc_fatal (buf);
+}
+libc_hidden_def (__libc_dynarray_overflow_failure)
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index dbd801a..3066cd3 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -101,4 +101,18 @@ check_mul_overflow_size_t (size_t left, size_t right, size_t *result)
 #endif
 }
 
+/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
+   to overflow; in this case, *R is the low order bits of the correct
+   answer.  */
+static inline bool
+check_add_wrapv_size_t (size_t a, size_t b, size_t *r)
+{
+#if 5 <= __GNUC__
+  return __builtin_add_overflow (a, b, r);
+#else
+  *r = a + b;
+  return *r < a;
+#endif
+}
+
 #endif /* _MALLOC_INTERNAL_H */
diff --git a/malloc/tst-char_array.c b/malloc/tst-char_array.c
new file mode 100644
index 0000000..53f9482
--- /dev/null
+++ b/malloc/tst-char_array.c
@@ -0,0 +1,107 @@
+/* Test for char_array.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+#include <malloc/char_array.c>
+
+#include <malloc.h>
+#include <mcheck.h>
+#include <stdint.h>
+#include <support/check.h>
+#include <support/support.h>
+
+static int
+do_test (void)
+{
+  mtrace ();
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_empty (&str) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == 0);
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == true);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing"));
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testing") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str_size (&str, "testing", 4));
+    TEST_VERIFY_EXIT (char_array_length (&str) == 4);
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "test") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_set_str (&str, "abcdef"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcdef") == 0);
+    TEST_VERIFY_EXIT (char_array_set_str_size (&str, "abcdef", 4));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcd") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_erase (&str, 4) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testng") == 0);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str))
+		      == false);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str) - 1)
+		      == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 2);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testn") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "test"));
+    TEST_VERIFY_EXIT (char_array_prepend_str (&str, "123"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test"));
+    TEST_VERIFY_EXIT (char_array_append_str (&str, "456"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test456") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test456"));
+    TEST_VERIFY_EXIT (char_array_replace_str_pos (&str, 7, "789", 3));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test789") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test789"));
+    char_array_free (&str);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>

Adhemerval Zanella June 13, 2017, 8:51 p.m. UTC | #4
On 13/06/2017 11:24, Adhemerval Zanella wrote:

> Here is an updated patch for the specialized dynarray.  I have incorporated all

> your suggestion but the skeleton name change (which I am not sure if it should

> follow the idea since it should be a complete 'api').  I also added some nonull

> attribute as for dynarray implementation.


I rebase against the new begin/end additions and fixed some issues regarding
the make check (which I forgot to actually run before patch submission...).

---diff --git a/malloc/Makefile b/malloc/Makefile
index 14c13f1..323b3fb 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -46,6 +46,7 @@ tests-internal += \
 	 tst-dynarray \
 	 tst-dynarray-fail \
 	 tst-dynarray-at-fail \
+	 tst-char_array
 
 ifneq (no,$(have-tunables))
 tests += tst-malloc-usable-tunables
@@ -58,7 +59,7 @@ test-srcs = tst-mtrace
 routines = malloc morecore mcheck mtrace obstack reallocarray \
   scratch_buffer_grow scratch_buffer_grow_preserve \
   scratch_buffer_set_array_size \
-  dynarray_at_failure \
+  dynarray_at_failure dynarray_overflow_failure \
   dynarray_emplace_enlarge \
   dynarray_finalize \
   dynarray_resize \
diff --git a/malloc/Versions b/malloc/Versions
index 5b54306..86b52f7 100644
--- a/malloc/Versions
+++ b/malloc/Versions
@@ -82,6 +82,7 @@ libc {
 
     # dynarray support
     __libc_dynarray_at_failure;
+    __libc_dynarray_overflow_failure;
     __libc_dynarray_emplace_enlarge;
     __libc_dynarray_finalize;
     __libc_dynarray_resize;
diff --git a/malloc/char_array.c b/malloc/char_array.c
new file mode 100644
index 0000000..e37df81
--- /dev/null
+++ b/malloc/char_array.c
@@ -0,0 +1,281 @@
+/* Specialized dynarray for C strings.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file provides a dynamic C string with an initial stack allocated
+   buffer.  Since it is based on dynarray, it provided dynamic size
+   expansion and heap usage for large strings.
+
+   The following parameters are optional:
+
+   CHAR_ARRAY_INITIAL_SIZE
+      The size of the statically allocated array (default is 256).  It will
+      be used to define DYNARRAY_INITIAL_SIZE.
+
+   The following functions are provided:
+
+   bool char_array_init_empty (struct char_array *);
+   bool char_array_init_str (struct char_array *, const char *);
+   bool char_array_init_str_size (struct char_array *, const char *, size_t);
+   bool char_array_is_empty (struct char_array *);
+   const char *char_array_str (struct char_array *);
+   char char_array_pos (struct char_array *, size_t);
+   size_t char_array_length (struct char_array *);
+   bool char_array_set_str (struct char_array *, const char *);
+   bool char_array_set_str_size (struct char_array *, const char *, size_t);
+   void char_array_erase (struct char_array *, size_t);
+   bool char_array_crop (struct char_array *, size_t);
+   bool char_array_prepend_str (struct char_array *, const char *);
+   bool char_array_append_str (struct char_array *, const char *);
+   bool char_array_replace_str_pos (struct char_array *, size_t, const char *,
+				    size_t);
+
+   For instance:
+
+   struct char_array str;
+   // str == "testing";
+   char_array_init_str (&str, "testing");
+   // c == 's'
+   char c = char_array_pos (&str, 2);
+   // str = "testing2";
+   char_array_set_str (&str, "testing2");
+   // str = "testi";
+   char_array_erase (&str, 5);
+   // str = "123testi";
+   char_array_prepend_str (&str, "123");
+   // len = 8;
+   size_t len = char_array_length (&str);
+   // str = "123testi456";
+   char_array_append_str (&str, "456");
+   // str = "123testi789";
+   char_array_replace_str_pos (&str, 7, "789", 3);
+ */
+
+#define DYNARRAY_STRUCT            char_array
+#define DYNARRAY_ELEMENT           char
+#define DYNARRAY_PREFIX            char_array_
+#ifndef CHAR_ARRAY_INITIAL_SIZE
+# define CHAR_ARRAY_INITIAL_SIZE 256
+#endif
+#define DYNARRAY_INITIAL_SIZE  CHAR_ARRAY_INITIAL_SIZE
+#include <malloc/dynarray-skeleton.c>
+#include <malloc/malloc-internal.h>
+
+/* Return a const char for the internal C string handled by 'array'.  */
+__attribute__ ((nonnull (1)))
+static const char *
+char_array_str (struct char_array *array)
+{
+  return char_array_at (array, 0);
+}
+
+/* Return the character at position 'pos' from the char_array 'array'.  */
+__attribute__ ((nonnull (1)))
+static char
+char_array_pos (struct char_array *array, size_t pos)
+{
+  return *char_array_at (array, pos);
+}
+
+/* Calculate the length of the string, excluding the terminating null.  */
+__attribute__ ((unused, nonnull (1)))
+static size_t
+char_array_length (struct char_array *array)
+{
+  /* Exclude the final '\0'.  */
+  return array->dynarray_header.used - 1;
+}
+
+/* Copy the contents of string 'str' to char_array 'array', including the
+   final '\0'.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_set_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str) + 1;
+  if (!char_array_resize (array, size))
+    return false;
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = size;
+  return true;
+}
+
+/* Copy up 'size' bytes from string 'str' to char_array 'array'.  A final
+   '\0' is appended in the char_array.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_set_str_size (struct char_array *array, const char *str,
+			 size_t size)
+{
+  size_t newsize;
+  if (check_add_wrapv_size_t (size, 1, &newsize))
+    __libc_dynarray_overflow_failure (size, 1);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  *((char *) mempcpy (array->dynarray_header.array, str, size)) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Initialize the char_array 'array' and sets it to an empty string ("").  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_init_empty (struct char_array *array)
+{
+  char_array_init (array);
+  return char_array_set_str (array, "");
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_init_str (struct char_array *array, const char *str)
+{
+  char_array_init (array);
+  return char_array_set_str (array, str);
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'
+   up to 'size' characteres.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_init_str_size (struct char_array *array, const char *str,
+			  size_t size)
+{
+  char_array_init (array);
+  return char_array_set_str_size (array, str, size);
+}
+
+/* Return if the char_array contain any characteres.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_is_empty (struct char_array *array)
+{
+  return *char_array_at (array, 0) == '\0';
+}
+
+/* Remove the byte at position 'pos' from char_array 'array'.  The contents
+   are moved internally if the position is not at the end of the internal
+   buffer.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_erase (struct char_array *array, size_t pos)
+{
+  if (pos >= array->dynarray_header.used - 1)
+    return false;
+
+  char *ppos = char_array_at (array, pos);
+  char *lpos = array->dynarray_header.array + array->dynarray_header.used;
+  ptrdiff_t size = lpos - ppos;
+  memmove (ppos, ppos + 1, size);
+  array->dynarray_header.used--;
+  return true;
+}
+
+/* Resize the char_array 'array' to size 'count' maintaining the ending
+   '\0' byte.  */
+__attribute__ ((nonnull (1)))
+static bool
+char_array_crop (struct char_array *array, size_t size)
+{
+  if (size >= (array->dynarray_header.used - 1)
+      || !char_array_resize (array, size + 1))
+    return false;
+
+  array->dynarray_header.array[size] = '\0';
+  return true;
+}
+
+/* Prepend the contents of string 'str' to char_array 'array', including the
+   final '\0' byte.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_prepend_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and we need below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (used, size, &newsize))
+    __libc_dynarray_overflow_failure (used, size);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  /* Make room for the string and copy it.  */
+  memmove (array->dynarray_header.array + size, array->dynarray_header.array,
+           used);
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Append the contents of string 'str' to char_array 'array, including the
+   final '\0' byte.  */
+__attribute__ ((nonnull (1, 2)))
+static bool
+char_array_append_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and it used it below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used - 1;
+
+  /* 'used' does account for final '\0', so there is no need to add
+     an extra element to calculate the final required size.  */
+  size_t newsize;
+  if (check_add_wrapv_size_t (used + 1, size, &newsize))
+    __libc_dynarray_overflow_failure (used + 1, size);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  /* Start to append at '\0' up to string length and add a final '\0'.  */
+  *(char*) mempcpy (array->dynarray_header.array + used, str, size) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Replace the contents starting of position 'pos' of char_array 'array'
+   with the contents of string 'str' up to 'len' bytes.  A final '\0'
+   is appended in the string.  */
+__attribute__ ((nonnull (1, 3)))
+static bool
+char_array_replace_str_pos (struct char_array *array, size_t pos,
+                            const char *str, size_t len)
+{
+  if (pos > array->dynarray_header.used)
+    return false;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (pos, len, &newsize))
+    __libc_dynarray_overflow_failure (pos, len);
+  if (check_add_wrapv_size_t (newsize, 1, &newsize))
+    __libc_dynarray_overflow_failure (newsize, 1);
+
+  if (!char_array_resize (array, newsize))
+    return false;
+
+  char *start = char_array_at (array, pos);
+  *(char *) mempcpy (start, str, len) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
diff --git a/malloc/dynarray.h b/malloc/dynarray.h
index c73e08b..5a491b4 100644
--- a/malloc/dynarray.h
+++ b/malloc/dynarray.h
@@ -173,4 +173,12 @@ void __libc_dynarray_at_failure (size_t size, size_t index)
   __attribute__ ((noreturn));
 libc_hidden_proto (__libc_dynarray_at_failure)
 
+/* Internal function.  TErminate the process after an overflow in
+   new size allocation.  SIZE is the current number of elements in
+   dynamic array and INCR is the new elements to add on current
+   size.  */
+void __libc_dynarray_overflow_failure (size_t size, size_t incr)
+  __attribute__ ((noreturn));
+libc_hidden_proto (__libc_dynarray_overflow_failure)
+
 #endif /* _DYNARRAY_H */
diff --git a/malloc/dynarray_overflow_failure.c b/malloc/dynarray_overflow_failure.c
new file mode 100644
index 0000000..14936b0
--- /dev/null
+++ b/malloc/dynarray_overflow_failure.c
@@ -0,0 +1,31 @@
+/* Report an dynamic array size overflow condition.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dynarray.h>
+#include <stdio.h>
+
+void
+__libc_dynarray_overflow_failure (size_t size, size_t incr)
+{
+  char buf[200];
+  __snprintf (buf, sizeof (buf), "Fatal glibc error: "
+              "new size overflows (old %zu and increment %zu)\n",
+              size, incr);
+ __libc_fatal (buf);
+}
+libc_hidden_def (__libc_dynarray_overflow_failure)
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index dbd801a..3066cd3 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -101,4 +101,18 @@ check_mul_overflow_size_t (size_t left, size_t right, size_t *result)
 #endif
 }
 
+/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
+   to overflow; in this case, *R is the low order bits of the correct
+   answer.  */
+static inline bool
+check_add_wrapv_size_t (size_t a, size_t b, size_t *r)
+{
+#if 5 <= __GNUC__
+  return __builtin_add_overflow (a, b, r);
+#else
+  *r = a + b;
+  return *r < a;
+#endif
+}
+
 #endif /* _MALLOC_INTERNAL_H */
diff --git a/malloc/tst-char_array.c b/malloc/tst-char_array.c
new file mode 100644
index 0000000..cb06b9c
--- /dev/null
+++ b/malloc/tst-char_array.c
@@ -0,0 +1,110 @@
+/* Test for char_array.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+#include <malloc/char_array.c>
+
+#include <malloc.h>
+#include <mcheck.h>
+#include <stdint.h>
+#include <support/check.h>
+#include <support/support.h>
+
+static int
+do_test (void)
+{
+  mtrace ();
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_empty (&str) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == 0);
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == true);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing"));
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testing") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str_size (&str, "testing", 4));
+    TEST_VERIFY_EXIT (char_array_length (&str) == 4);
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "test") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_set_str (&str, "abcdef"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcdef") == 0);
+    TEST_VERIFY_EXIT (char_array_set_str_size (&str, "abcdef", 4));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcd") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_erase (&str, 4) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testng") == 0);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str))
+		      == false);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str) - 1)
+		      == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 2);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testn") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "test"));
+    TEST_VERIFY_EXIT (char_array_prepend_str (&str, "123"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test"));
+    TEST_VERIFY_EXIT (char_array_append_str (&str, "456"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test456") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test456"));
+    TEST_VERIFY_EXIT (char_array_replace_str_pos (&str, 7, "789", 3));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test789") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test789"));
+    TEST_VERIFY_EXIT (char_array_crop (&str, 7));
+    TEST_VERIFY_EXIT (char_array_length (&str) == 7);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test") == 0);
+    char_array_free (&str);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>

Florian Weimer June 14, 2017, 1:35 p.m. UTC | #5
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

>>> +++ b/malloc/char_array.c

>> 

>> Should this be malloc/char_array-skeleton.c, to indicate that this file

>> is parameterized and intended for inclusion into another .c file?

>

> I am not sure, the idea of char_array is to be self-contained, there is

> no need to define anything in the file to include it.  Also, if we see

> other usage of dynamic C string inside glibc it would be a good idea to

> add internal symbols for common symbols.


But it's still necessary to #include the file because it is what the C++
people call a header-only library.  And there is a hook for customizing
the size of the stack allocation.  This is why I asked.  I don't have a
strong opinion on file naming, though.

Florian
Adhemerval Zanella June 14, 2017, 2:04 p.m. UTC | #6
On 14/06/2017 10:35, Florian Weimer wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> 

>>>> +++ b/malloc/char_array.c

>>>

>>> Should this be malloc/char_array-skeleton.c, to indicate that this file

>>> is parameterized and intended for inclusion into another .c file?

>>

>> I am not sure, the idea of char_array is to be self-contained, there is

>> no need to define anything in the file to include it.  Also, if we see

>> other usage of dynamic C string inside glibc it would be a good idea to

>> add internal symbols for common symbols.

> 

> But it's still necessary to #include the file because it is what the C++

> people call a header-only library.  And there is a hook for customizing

> the size of the stack allocation.  This is why I asked.  I don't have a

> strong opinion on file naming, though.


Fair enough then, although if this kind of usage pattern for dynarray
become common it would be interesting to move to internal symbol. I will
change to char_array-skeleton.c.
Florian Weimer June 14, 2017, 2:05 p.m. UTC | #7
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> On 13/06/2017 11:24, Adhemerval Zanella wrote:

>

>> Here is an updated patch for the specialized dynarray.  I have incorporated all

>> your suggestion but the skeleton name change (which I am not sure if it should

>> follow the idea since it should be a complete 'api').  I also added some nonull

>> attribute as for dynarray implementation.

>

> I rebase against the new begin/end additions and fixed some issues

> regarding the make check (which I forgot to actually run before patch

> submission...).


I don't see any begin/end references in the attached patch.

> +/* Replace the contents starting of position 'pos' of char_array 'array'

> +   with the contents of string 'str' up to 'len' bytes.  A final '\0'

> +   is appended in the string.  */

> +__attribute__ ((nonnull (1, 3)))

> +static bool

> +char_array_replace_str_pos (struct char_array *array, size_t pos,

> +                            const char *str, size_t len)

> +{

> +  if (pos > array->dynarray_header.used)

> +    return false;

> +

> +  size_t newsize;

> +  if (check_add_wrapv_size_t (pos, len, &newsize))

> +    __libc_dynarray_overflow_failure (pos, len);

> +  if (check_add_wrapv_size_t (newsize, 1, &newsize))

> +    __libc_dynarray_overflow_failure (newsize, 1);


This is the opposite of what I expect: pos > array->dynarray_header.used
appears to be a usage error, so this could result in __libc_fatal.
Integer overflow while computing sizes for memory allocation is usually
treated as a memory allocation failure, so it would expect a false
return (and no __libc_fatal) for that.

If you want to prevent access to the underlying char_array_* functions
generated by dynarray, you could use #pragma GCC poison.

Florian
Florian Weimer June 14, 2017, 2:06 p.m. UTC | #8
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> On 14/06/2017 10:35, Florian Weimer wrote:

>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

>> 

>>>>> +++ b/malloc/char_array.c

>>>>

>>>> Should this be malloc/char_array-skeleton.c, to indicate that this file

>>>> is parameterized and intended for inclusion into another .c file?

>>>

>>> I am not sure, the idea of char_array is to be self-contained, there is

>>> no need to define anything in the file to include it.  Also, if we see

>>> other usage of dynamic C string inside glibc it would be a good idea to

>>> add internal symbols for common symbols.

>> 

>> But it's still necessary to #include the file because it is what the C++

>> people call a header-only library.  And there is a hook for customizing

>> the size of the stack allocation.  This is why I asked.  I don't have a

>> strong opinion on file naming, though.

>

> Fair enough then, although if this kind of usage pattern for dynarray

> become common it would be interesting to move to internal symbol. I will

> change to char_array-skeleton.c.


Agreed.  In this case, we'd switch away from static functions, too.

Thanks,
Florian
Adhemerval Zanella June 14, 2017, 6:43 p.m. UTC | #9
On 14/06/2017 11:05, Florian Weimer wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> 

>> On 13/06/2017 11:24, Adhemerval Zanella wrote:

>>

>>> Here is an updated patch for the specialized dynarray.  I have incorporated all

>>> your suggestion but the skeleton name change (which I am not sure if it should

>>> follow the idea since it should be a complete 'api').  I also added some nonull

>>> attribute as for dynarray implementation.

>>

>> I rebase against the new begin/end additions and fixed some issues

>> regarding the make check (which I forgot to actually run before patch

>> submission...).

> 

> I don't see any begin/end references in the attached patch.


My mistake, I did not commit the two smalls changes that replace char_array_at (0)
with char_array_begin.

> 

>> +/* Replace the contents starting of position 'pos' of char_array 'array'

>> +   with the contents of string 'str' up to 'len' bytes.  A final '\0'

>> +   is appended in the string.  */

>> +__attribute__ ((nonnull (1, 3)))

>> +static bool

>> +char_array_replace_str_pos (struct char_array *array, size_t pos,

>> +                            const char *str, size_t len)

>> +{

>> +  if (pos > array->dynarray_header.used)

>> +    return false;

>> +

>> +  size_t newsize;

>> +  if (check_add_wrapv_size_t (pos, len, &newsize))

>> +    __libc_dynarray_overflow_failure (pos, len);

>> +  if (check_add_wrapv_size_t (newsize, 1, &newsize))

>> +    __libc_dynarray_overflow_failure (newsize, 1);

> 

> This is the opposite of what I expect: pos > array->dynarray_header.used

> appears to be a usage error, so this could result in __libc_fatal.

> Integer overflow while computing sizes for memory allocation is usually

> treated as a memory allocation failure, so it would expect a false

> return (and no __libc_fatal) for that.


So I think a better approach would just to use:

  if (pos > array->dynarray_header.used)
    __libc_dynarray_at_failure (char_array_size (array), pos);

  if (check_add_wrapv_size_t (pos, len, &newsize))
      ||check_add_wrapv_size_t (newsize, 1, &newsize))
    return false;

> 

> If you want to prevent access to the underlying char_array_* functions

> generated by dynarray, you could use #pragma GCC poison.

It could be a nice idea, although there is no direct usage on the glob
patchset.  I will check this out for a future enhancement.
Florian Weimer June 14, 2017, 7:46 p.m. UTC | #10
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

>> This is the opposite of what I expect: pos > array->dynarray_header.used

>> appears to be a usage error, so this could result in __libc_fatal.

>> Integer overflow while computing sizes for memory allocation is usually

>> treated as a memory allocation failure, so it would expect a false

>> return (and no __libc_fatal) for that.

>

> So I think a better approach would just to use:

>

>   if (pos > array->dynarray_header.used)

>     __libc_dynarray_at_failure (char_array_size (array), pos);

>

>   if (check_add_wrapv_size_t (pos, len, &newsize))

>       ||check_add_wrapv_size_t (newsize, 1, &newsize))

>     return false;


Right, this is what I would expect from such an interface.  This assumes
that glob doesn't expect to feed bad existing indexes to this function,
of course.

>> If you want to prevent access to the underlying char_array_* functions

>> generated by dynarray, you could use #pragma GCC poison.


> It could be a nice idea, although there is no direct usage on the glob

> patchset.  I will check this out for a future enhancement. 


Understood.

Thanks,
Florian
diff mbox series

Patch

diff --git a/malloc/Makefile b/malloc/Makefile
index af025cb..098e3c6 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -46,6 +46,7 @@  tests-internal += \
 	 tst-dynarray \
 	 tst-dynarray-fail \
 	 tst-dynarray-at-fail \
+	 tst-char_array
 
 ifneq (no,$(have-tunables))
 tests += tst-malloc-usable-tunables
diff --git a/malloc/char_array.c b/malloc/char_array.c
new file mode 100644
index 0000000..cce9360
--- /dev/null
+++ b/malloc/char_array.c
@@ -0,0 +1,256 @@ 
+/* Specialized dynarray for C strings.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file provides a dynamic C string with an initial stack allocated
+   buffer.  Since it is based on dynarray, it provided dynamic size
+   expansion and heap usage for large strings.
+
+   The following parameters are optional:
+
+   CHAR_ARRAY_INITIAL_SIZE
+      The size of the statically allocated array (default is 256).  It will
+      be used to define DYNARRAY_INITIAL_SIZE.
+
+   The following functions are provided:
+
+   bool char_array_init_empty (struct char_array *);
+   bool char_array_init_str (struct char_array *, const char *);
+   bool char_array_init_str_size (struct char_array *, const char *, size_t);
+   bool char_array_is_empty (struct char_array *);
+   const char *char_array_str (struct char_array *);
+   char char_array_pos (struct char_array *, size_t);
+   size_t char_array_length (struct char_array *);
+   bool char_array_set_str (struct char_array *, const char *);
+   bool char_array_set_str_size (struct char_array *, const char *, size_t);
+   void char_array_erase (struct char_array *, size_t);
+   bool char_array_resize_str (struct char_array *, size_t);
+   bool char_array_prepend_str (struct char_array *, const char *);
+   bool char_array_append_str (struct char_array *, const char *);
+   bool char_array_replace_str_pos (struct char_array *, size_t, const char *,
+				    size_t);
+
+   For instance:
+
+   struct char_array str;
+   // str == "testing";
+   char_array_init_str (&str, "testing");
+   // c == 's'
+   char c = char_array_pos (&str, 2);
+   // str = "testing2";
+   char_array_set_str (&str, "testing2");
+   // str = "testi";
+   char_array_erase (&str, 5);
+   // str = "123testi";
+   char_array_prepend_str (&str, "123");
+   // len = 8;
+   size_t len = char_array_length (&str);
+   // str = "123testi456";
+   char_array_append_str (&str, "456");
+   // str = "123testi789";
+   char_array_replace_str_pos (&str, 7, "789", 3);
+ */
+
+#define DYNARRAY_STRUCT            char_array
+#define DYNARRAY_ELEMENT           char
+#define DYNARRAY_PREFIX            char_array_
+#define DYNARRAY_ELEMENT_INIT(__e) (*__e = '\0')
+#ifndef CHAR_ARRAY_INITIAL_SIZE
+# define CHAR_ARRAY_INITIAL_SIZE 256
+#endif
+#define DYNARRAY_INITIAL_SIZE  CHAR_ARRAY_INITIAL_SIZE
+#include <malloc/dynarray-skeleton.c>
+#include <malloc/malloc-internal.h>
+
+/* Return a const char for the internal C string handled by 'array'.  */
+static const char * __attribute_used__
+char_array_str (struct char_array *array)
+{
+  return char_array_at (array, 0);
+}
+
+/* Return the character at position 'pos' from the char_array 'array'.  */
+static char __attribute_used__
+char_array_pos (struct char_array *array, size_t pos)
+{
+  return *char_array_at (array, pos);
+}
+
+static size_t __attribute_used__
+char_array_length (struct char_array *array)
+{
+  /* Exclude the final '\0'.  */
+  return array->dynarray_header.used - 1;
+}
+
+/* Copy the contents of string 'str' to char_array 'array', including the
+   final '\0'.  */
+static bool __attribute_used__
+char_array_set_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str) + 1;
+  if (!char_array_resize (array, size))
+    return false;
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = size;
+  return true;
+}
+
+/* Copy up 'size' bytes from string 'str' to char_array 'array'.  A final
+   '\0' is appended in the char_array.  */
+static bool __attribute_used__
+char_array_set_str_size (struct char_array *array, const char *str,
+			 size_t size)
+{
+  size_t newsize;
+  if (check_add_wrapv_size_t (size, 1, &newsize)
+      || !char_array_resize (array, newsize))
+    return false;
+  *((char *) mempcpy (array->dynarray_header.array, str, size)) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Initialize the char_array 'array' and sets it to an empty string ("").  */
+static bool __attribute_used__
+char_array_init_empty (struct char_array *array)
+{
+  char_array_init (array);
+  return char_array_set_str (array, "");
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'.  */
+static bool __attribute_used__
+char_array_init_str (struct char_array *array, const char *str)
+{
+  char_array_init (array);
+  return char_array_set_str (array, str);
+}
+
+/* Initialize the char_array 'array' and copy the content of string 'str'
+   up to 'size' characteres.  */
+static bool __attribute_used__
+char_array_init_str_size (struct char_array *array, const char *str,
+			  size_t size)
+{
+  char_array_init (array);
+  return char_array_set_str_size (array, str, size);
+}
+
+static bool __attribute_used__
+char_array_is_empty (struct char_array *array)
+{
+  return *char_array_at (array, 0) == '\0';
+}
+
+/* Remove the byte at position 'pos' from char_array 'array'.  The contents
+   are moved internally if the position is not at the end of the internal
+   buffer.  */
+static bool __attribute_used__
+char_array_erase (struct char_array *array, size_t pos)
+{
+  if (pos >= array->dynarray_header.used - 1)
+    return false;
+
+  char *ppos = char_array_at (array, pos);
+  char *lpos = array->dynarray_header.array + array->dynarray_header.used;
+  ptrdiff_t size = lpos - ppos;
+  memmove (ppos, ppos + 1, size);
+  array->dynarray_header.used--;
+  return true;
+}
+
+/* Resize the char_array 'array' to size 'count' maintaining the ending
+   '\0' byte.  */
+static bool __attribute_used__
+char_array_crop (struct char_array *array, size_t size)
+{
+  if (size >= (array->dynarray_header.used - 1)
+      || !char_array_resize (array, size + 1))
+    return false;
+
+  array->dynarray_header.array[array->dynarray_header.used] = '\0';
+  return true;
+}
+
+/* Prepend the contents of string 'str' to char_array 'array', including the
+   final '\0' byte.  */
+static bool __attribute_used__
+char_array_prepend_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and we need below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (used, size, &newsize)
+      || !char_array_resize (array, newsize))
+    return false;
+
+  /* Make room for the string and copy it.  */
+  memmove (array->dynarray_header.array + size, array->dynarray_header.array,
+           used);
+  memcpy (array->dynarray_header.array, str, size);
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Append the contents of string 'str' to char_array 'array, including the
+   final '\0' byte.  */
+static bool __attribute_used__
+char_array_append_str (struct char_array *array, const char *str)
+{
+  size_t size = strlen (str);
+  /* Resizing the array might change its used elements and it used it below
+     to correct copy the elements.  */
+  size_t used = array->dynarray_header.used - 1;
+
+  /* array 'used' does account for final '\0', so there is no need to add
+     an extra element to calculate the final required size.  */
+  size_t newsize;
+  if (check_add_wrapv_size_t (used + 1, size, &newsize)
+      || !char_array_resize (array, newsize))
+    return false;
+
+  /* Start to append at '\0' up to string length and add a final '\0'.  */
+  *(char*) mempcpy (array->dynarray_header.array + used, str, size) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
+
+/* Replace the contents starting of position 'pos' of char_array 'array'
+   with the contents of string 'str' up to 'len' bytes.  A final '\0'
+   is appended in the string.  */
+static bool __attribute_used__
+char_array_replace_str_pos (struct char_array *array, size_t pos,
+                            const char *str, size_t len)
+{
+  if (pos > array->dynarray_header.used)
+    return false;
+
+  size_t newsize;
+  if (check_add_wrapv_size_t (pos, len, &newsize)
+      || check_add_wrapv_size_t (newsize, 1, &newsize)
+      || !char_array_resize (array, newsize))
+    return false;
+
+  char *start = char_array_at (array, pos);
+  *(char *) mempcpy (start, str, len) = '\0';
+  array->dynarray_header.used = newsize;
+  return true;
+}
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index dbd801a..3066cd3 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -101,4 +101,18 @@  check_mul_overflow_size_t (size_t left, size_t right, size_t *result)
 #endif
 }
 
+/* Set *R = A + B.  Return true if the answer is mathematically incorrect due
+   to overflow; in this case, *R is the low order bits of the correct
+   answer.  */
+static inline bool
+check_add_wrapv_size_t (size_t a, size_t b, size_t *r)
+{
+#if 5 <= __GNUC__
+  return __builtin_add_overflow (a, b, r);
+#else
+  *r = a + b;
+  return *r < a;
+#endif
+}
+
 #endif /* _MALLOC_INTERNAL_H */
diff --git a/malloc/tst-char_array.c b/malloc/tst-char_array.c
new file mode 100644
index 0000000..53f9482
--- /dev/null
+++ b/malloc/tst-char_array.c
@@ -0,0 +1,107 @@ 
+/* Test for char_array.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+#include <malloc/char_array.c>
+
+#include <malloc.h>
+#include <mcheck.h>
+#include <stdint.h>
+#include <support/check.h>
+#include <support/support.h>
+
+static int
+do_test (void)
+{
+  mtrace ();
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_empty (&str) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == 0);
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == true);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing"));
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testing") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str_size (&str, "testing", 4));
+    TEST_VERIFY_EXIT (char_array_length (&str) == 4);
+    TEST_VERIFY_EXIT (char_array_pos (&str, 2) == 's');
+    TEST_VERIFY_EXIT (char_array_is_empty (&str) == false);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "test") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_set_str (&str, "abcdef"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcdef") == 0);
+    TEST_VERIFY_EXIT (char_array_set_str_size (&str, "abcdef", 4));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "abcd") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "testing"));
+    TEST_VERIFY_EXIT (char_array_erase (&str, 4) == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testng") == 0);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str))
+		      == false);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 1);
+    TEST_VERIFY_EXIT (char_array_erase (&str, char_array_length (&str) - 1)
+		      == true);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("testing") - 2);
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "testn") == 0);
+    char_array_free (&str);
+  }
+
+  {
+    struct char_array str;
+    TEST_VERIFY_EXIT (char_array_init_str (&str, "test"));
+    TEST_VERIFY_EXIT (char_array_prepend_str (&str, "123"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test"));
+    TEST_VERIFY_EXIT (char_array_append_str (&str, "456"));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test456") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test456"));
+    TEST_VERIFY_EXIT (char_array_replace_str_pos (&str, 7, "789", 3));
+    TEST_VERIFY_EXIT (strcmp (char_array_str (&str), "123test789") == 0);
+    TEST_VERIFY_EXIT (char_array_length (&str) == strlen ("123test789"));
+    char_array_free (&str);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>