Re: on memory usage with the 64bit patch, and interpretation of various numbers

From: Nikita Popov Date: Wed, 14 May 2014 18:48:34 +0000

Subject: Re: on memory usage with the 64bit patch, and interpretation of various numbers

References: 1 Groups: php.internals

Request: Send a blank email to internals+get-74209@lists.php.net to get a copy of this message

On Wed, May 14, 2014 at 6:39 PM, Pierre Joye <pierre.php@gmail.com> wrote:

> ## Rationale for size_t (string lengths):
>
> This has significant advantages. There are some costs to doing it, but
> they are not as significant as they may appear on the surface. Let's
> dive into it:
>
> ### It's The Correct Data Type
>
> The C89 spec indicates in 3.3.3.4 (
> http://port70.net/~nsz/c/c89/rationale/c3.html#size-95t-3-3-3-4
> ) that
> the size_t type was created specifically for usage in this context. It
> is always, 100% guaranteed to be able to hold the bounds of every
> possible array element. Strings in C are simply char arrays.
> Therefore, the correct data type to use for string sizes (which really
> are just an offset qualifier) is size_t.
>

I don't have a copy of the C standard handy, so I'll quote what C++ has to
say on size_t:

> The type size_t is an implementation-defined unsigned integer type that
is large enough to contain the size in bytes of any object

So a size_t type can contain the size of *any* object. If we want to
support *any* object, then we must use the size_t type, that's correct.
However, we may reasonably choose to disallow creating objects of certain
types (e.g. strings) with sizes that are larger than a uint32_t, to save
memory. If we chose to implement such a restriction, the uint32_t type
would provide us with the *exact* same guarantees as size_t does - it is
large enough to contain the size of any object that we allow to be created.

### It's The Secure Data Type
>
> size_t (and ptrdiff_t) are the only C89 types that are 100% guaranteed
> to be able to hold the size of any possible object that the compiler
> will support. Other types will vary depending on the data model that
> the compiler supports, as the spec only defines minimum widths.
>
> This is so important that CERT issued a coding standard for it:
> INT01-C (
> https://www.securecoding.cert.org/confluence/display/seccode/INT01-C.+Use+rsize_t+or+size_t+for+all+integer+values+representing+the+size+of+an+object
> ).
>
> One of the reasons is that it's difficult to do overflow checks in a
> portable way. See VU#162289: https://www.kb.cert.org/vuls/id/162289 .
> In there, they recommend using the C99 uintptr_t type, but suggest
> using size_t for platforms that don't have uintptr_t support (and
> since we target C89 for the engine, that's out).
>

Overflow checks are an issue that is independent of the chosen size storage
type. They always you need to occur. The addition of two size_t values can
overflow in just the same way as the addition of two uint32_t values. There
is no difference in security between using size_t and uint32_t. The
security issue lies in absence of proper overflow checks (or in the
presence of incorrect overflow checks).

If anybody can come up with an example where usage of uint32_t over size_t
(in a context where we ensured that allocations do not surpass that size)
causes security or other safety issues, please speak up. I wasn't able to
find anything in that direction.

### ZPP Changes
>
> The ZPP changes are critical. The reason is that varargs is casting an
> arbitrary block of memory to a type, and then writing to it. So
> existing code that does zpp("s", str, &int_len) would wind up with a
> buffer overflow. Because zpp would be trying to write a 64 bit value
> to a 32 bit container. The other 32 bits would fall off the end, into
> who knows what. At BEST this can result in a segfault. At worst,
> memory corruption and MASSIVE security vulnerabilities.
>
> Also note that the compiler *can't* and actively doesn't catch these
> types of errors. That means that it's largely luck and testing that
> will lead to it.
>
> So, I chose to break BC and rename the ZPP symbols. Because that WILL
> error, and provide the developer with a meaningful indication that an
> improper data type was provided. As I considered a fatal error that an
> invalid type was supplied was a better way of identifying to the
> developer that "HEY, THIS NEEDS TO BE CHANGED ASAP" than just letting
> them hit random segfaults at runtime.
>
> If there is a way to get around this by giving the compiler more
> information, then do it. But to just leave the types there, and leave
> it to chance if a buffer overflow occurs, is dangerous. Which is why I
> made the call that the ZPP types **needed** to be changed.
>

Afaik Johannes has written a Clang-based static analyzer that detects type
mismatches between the zpp format string and the passed arguments (gcov
also seems to have some kind of analyzer for that). That seems like a good
way to find incorrect declarations without having to rename everything -
and this also covers checks for all types that weren't changed :)

Nikita

Thread (10 messages)

Pierre JoyeWed, 14 May 2014 16:39:32 +0000
Nikita PopovWed, 14 May 2014 18:48:34 +0000
Stas MalyshevThu, 15 May 2014 22:03:24 +0000
Pierre JoyeFri, 16 May 2014 03:31:05 +0000
Ulf WendelFri, 16 May 2014 05:25:46 +0000
Pierre JoyeFri, 16 May 2014 05:30:15 +0000
Ulf WendelFri, 16 May 2014 07:40:43 +0000
Dmitry StogovFri, 16 May 2014 07:50:16 +0000
Pierre JoyeFri, 16 May 2014 08:59:08 +0000
Yasuo OhgakiSat, 05 Jul 2014 22:11:27 +0000

« previous	php.internals (#74209)	next »

From:	Nikita Popov	Date:	Wed, 14 May 2014 18:48:34 +0000
Subject:	Re: on memory usage with the 64bit patch, and interpretation of various numbers
References:	1	Groups:	php.internals
Request:	Send a blank email to internals+get-74209@lists.php.net to get a copy of this message