Re: [VOTE] [RFC] 64 bit platform improvements for string length and integer

From: Andrea Faulds Date: Wed, 14 May 2014 17:13:34 +0000

Subject: Re: [VOTE] [RFC] 64 bit platform improvements for string length and integer

References: 1 2 3 4 5 6 7 8 9 Groups: php.internals

Request: Send a blank email to internals+get-74196@lists.php.net to get a copy of this message

On 14 May 2014, at 18:10, Andrey Hristov <php@hristov.com> wrote:

> This is purely academic. The standard library has to support everything; it's the
> standard library. PHP is on its own, and if an addition is of little use to most
> developers/scripts, why the heck should it be in, or be the default?
> A good solution is to typedef a php_size_t, leave it as uint32_t, and those who need more
> than 4GB in strings and elements can just build with size_t as the definition. Offer the choice,
> don't force.
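
Andrey's build-time option could look roughly like the sketch below. It is only an illustration: `PHP_WIDE_LENGTHS` is a hypothetical configure flag, and `php_size_max()` and `php_size_fits()` are hypothetical helpers built on the `php_size_t` typedef he proposes, not existing PHP APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch: defined only by builds that need > 4GB strings. */
#ifdef PHP_WIDE_LENGTHS
typedef size_t php_size_t;
#else
typedef uint32_t php_size_t;
#endif

/* Largest length representable by the chosen type. */
static php_size_t php_size_max(void)
{
    return (php_size_t)-1;
}

/* Returns 1 if an object of n bytes can be described by php_size_t. */
static int php_size_fits(size_t n)
{
    return n <= (size_t)php_size_max();
}
```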

It is not just “purely academic”. Here, let me quote Pierre (in ‘Re: [PHP-DEV] [VOTE] [RFC]
64 bit platform improvements for string length and integer’, just now) quoting Anthony:

> This thread has been pointed out to me by a few people. As the
> originator of this patch and concept I feel that I should clarify a
> few points.
> 
> # Rationale
> 
> The reason that I originally started this patch was to clean up and
> standardize the underlying types. This is to introduce predictability,
> portability and type sanity into the engine and the entire PHP
> implementation.
> 
> ## Rationale for Int64:
> 
> Without this patch, the size of integers (longs) varies based on which
> compiler you use. This means that even for identical target
> architectures, behavior can change with respect to userland code.
> Refactoring this allows for consistent sizes that can be relied upon
> by the programmer. This is an effort to make it a bit easier to rely
> on integer width as a developer.
> 
> And ideally this comes at no cost for most implementations, since
> integers are already 64 bits wide on most 64-bit platforms, so there is
> no memory overhead. And performance stays the same as well.
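
To make the data-model point concrete, here is a small C sketch, assuming a compiler that provides `<stdint.h>`. `long` is 64 bits under LP64 (64-bit Linux and macOS) but 32 bits under LLP64 (64-bit Windows), whereas `int64_t` is 64 bits wherever it exists. The `fixed_long` typedef and both helpers are hypothetical names for illustration, not the RFC's actual type names.

```c
#include <limits.h>
#include <stdint.h>

/* Fixed-width integer in the spirit of the RFC: 64 bits regardless of
 * whether the compiler uses the LP64 or LLP64 data model. */
typedef int64_t fixed_long;

/* Width of the native long: 64 on LP64, 32 on LLP64 and 32-bit targets. */
static int native_long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Width of the fixed type: always 64. */
static int fixed_long_bits(void)
{
    return (int)(sizeof(fixed_long) * CHAR_BIT);
}
```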
> 
> ## Rationale for size_t (string lengths):
> 
> This has significant advantages. There are some costs to doing it, but
> they are not as significant as they may appear on the surface. Let's
> dive into it:
> 
> ### It's The Correct Data Type
> 
> The C89 Rationale, section 3.3.3.4
> (http://port70.net/~nsz/c/c89/rationale/c3.html#size-95t-3-3-3-4),
> indicates that the size_t type was created specifically for usage in
> this context. It is always, 100% guaranteed to be able to hold the
> size of any object, and therefore the index of every possible array
> element. Strings in C are simply char arrays. Therefore, the correct
> data type to use for string sizes (which really are just an offset
> qualifier) is size_t.
> 
> Additionally, calloc, malloc, etc. all expect parameters of type size_t
> for exactly this reason.
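
As a minimal illustration of keeping a length in size_t end to end (a sketch; `dup_string` is a hypothetical helper, not an engine function): `strlen()` returns `size_t`, and `malloc()` and `memcpy()` take `size_t`, so nothing is narrowed along the way.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate a C string; the length stays size_t from strlen() to malloc(). */
static char *dup_string(const char *src, size_t *out_len)
{
    size_t len = strlen(src);      /* size_t, as returned by strlen() */
    char *copy = malloc(len + 1);  /* size_t parameter, no conversion */

    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len + 1);    /* includes the terminating NUL */
    *out_len = len;
    return copy;
}
```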
> 
> Another good reference on it: http://www.viva64.com/en/a/0050/
> 
> ### It's The Secure Data Type
> 
> size_t (and ptrdiff_t) are the only C89 types that are 100% guaranteed
> to be able to hold the size of any possible object that the compiler
> will support. Other types will vary depending on the data model that
> the compiler supports, as the spec only defines minimum widths.
> 
> This is so important that CERT issued a coding standard for it:
> INT01-C ( https://www.securecoding.cert.org/confluence/display/seccode/INT01-C.+Use+rsize_t+or+size_t+for+all+integer+values+representing+the+size+of+an+object
> ).
> 
> One of the reasons is that it's difficult to do overflow checks in a
> portable way. See VU#162289: https://www.kb.cert.org/vuls/id/162289 .
> In there, they recommend using the C99 uintptr_t type, but suggest
> using size_t on platforms that don't support uintptr_t (and since we
> target C89 for the engine, uintptr_t is out).
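
For illustration, a portable multiplication overflow check written only with C89 facilities might look like this. It is a sketch, not code from the patch: `safe_mul_size` is a hypothetical name, and `(size_t)-1` stands in for C99's `SIZE_MAX`.

```c
#include <stddef.h>

/* Computes count * size into *result; returns 0 instead if it would wrap.
 * (size_t)-1 is the largest size_t value and is valid C89. */
static int safe_mul_size(size_t count, size_t size, size_t *result)
{
    if (size != 0 && count > (size_t)-1 / size) {
        return 0;  /* the product does not fit in size_t */
    }
    *result = count * size;
    return 1;
}
```

A typical use is computing the byte count for something like `malloc(count * size)` through this check first, and refusing the allocation when it returns 0.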
> 
> Apple's Secure Coding Guide's section on Avoiding Integer Overflows
> and Underflows says the same thing:
> https://developer.apple.com/library/mac/documentation/security/conceptual/securecodingguide/Articles/BufferOverflows.html
> 
> ### About Long Strings
> 
> The fact that changing to size_t allows strings (and arrays) to be
> larger than 4GB is a side effect. A welcome one, but a side effect
> nonetheless. The primary reason to use it is that it's the correct data
> type, and it gives you the most safety and security.
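
A quick way to see why a 32-bit length caps strings at 4GB: converting a larger length to `uint32_t` reduces it modulo 2^32. This sketch uses `uint64_t` for the real length (an assumption, so it behaves the same on 32-bit builds), and `store_len_32` is a hypothetical name.

```c
#include <stdint.h>

/* Stores a length in a 32-bit field; anything >= 2^32 silently wraps. */
static uint32_t store_len_32(uint64_t real_len)
{
    return (uint32_t)real_len;
}
```

For example, a 5,000,000,000-byte length is recorded as 705,032,704, since 5,000,000,000 - 4,294,967,296 = 705,032,704.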
> 
> # Response To Concerns Mentioned
> 
> I'll respond here to some of the concerns mentioned in this thread:
> 
> ## size_t uses more memory and will result in more CPU cache misses, which will result in worse performance
> 
> Well, size_t will use more memory. No doubt about that.
> 
> But the performance side is more nuanced. And as several benchmarks in
> this thread indicate, there isn't a practical difference. Heck, the
> benchmarks on Windows show an improvement in some cases.
> 
> And there is a reason for that. Since a pointer is a 64 bit data type
> and an int is a 32 bit data type, any time you combine the two, extra
> CPU cycles are needed for the conversion. This can be clearly seen by
> analyzing a simple malloc call with an int vs a size_t param. Here's
> the diff:
> 
>     < movl $5, -12(%rbp)
>     < movl -12(%rbp), %eax
>     < cltq
>     ---
>     > movq $5, -16(%rbp)
>     > movq -16(%rbp), %rax
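
The diff above is roughly what you get from compiling something like the following for x86-64 without optimization (for example `gcc -O0 -S`). This is a reconstruction, not the original test case: the function names are hypothetical, and the exact stack offsets depend on the compiler. With optimization enabled, a constant like this is usually folded away entirely.

```c
#include <stdlib.h>

/* Length held in a 32-bit int: it must be sign-extended to 64 bits
 * (the cltq instruction) before being passed to malloc(). */
static void *alloc_with_int(void)
{
    int len = 5;
    return malloc(len);
}

/* Length already held in a 64-bit size_t: passed through unchanged. */
static void *alloc_with_size_t(void)
{
    size_t len = 5;
    return malloc(len);
}
```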
> 
> Now, a cache miss is much more expensive than a cast, but we don't
> have proof that cache misses will actually occur.
> 
> In fact, in the benchmarks, the worst difference is 2%, which is
> hardly significant (as indicated by several people here). But also
> notice that in both sets of benchmarks (those done by Microsoft, and
> those done by Dmitry), some specific tests actually executed **faster**
> with the size_t transforms (namely Hello World, Wordpress, etc.). So
> even that 2% is not really the full story.
> 
> We'll come back to the memory thing in a bit.
> 
> ## Macro Renames and ZPP changes
> 
> This was my idea, and I don't think it's been properly justified.
> 
> ### ZPP Changes
> 
> The ZPP changes are critical. The reason is that varargs casts an
> arbitrary block of memory to a type, and then writes to it. So
> existing code that does zpp("s", str, &int_len) would wind up with a
> buffer overflow, because zpp would be trying to write a 64 bit value
> to a 32 bit container. The other 32 bits would fall off the end, into
> who knows what. At BEST this can result in a segfault. At worst,
> memory corruption and MASSIVE security vulnerabilities.
> 
> Also note that the compiler *can't*, and in practice doesn't, catch
> these types of errors. That means that finding them is largely down
> to luck and testing.
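
The hazard can be reproduced with a toy variadic parser. This is a sketch, not zend_parse_parameters() itself; `toy_parse` and its single `s` specifier are hypothetical. The callee trusts the format string and uses `va_arg()` to fetch a `size_t *`, so nothing checks what the caller actually passed.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Toy zpp: for each 's' it writes a string pointer and its length.
 * va_arg() cannot see the caller's real types; it assumes size_t *. */
static int toy_parse(const char *fmt, ...)
{
    va_list ap;
    const char *input = "example";  /* stands in for a userland argument */

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt == 's') {
            const char **str = va_arg(ap, const char **);
            size_t *len = va_arg(ap, size_t *);
            *str = input;
            *len = strlen(input);  /* writes sizeof(size_t) bytes */
        } else {
            va_end(ap);
            return 0;  /* unknown specifier */
        }
    }
    va_end(ap);
    return 1;
}
```

Called as `toy_parse("s", &s, &n)` with a `size_t n`, this works. Passing the address of an `int` instead compiles just as quietly, yet on a typical 64-bit target the store through `len` then writes 8 bytes into a 4-byte `int`, which is exactly the overflow described above.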
> 
> So, I chose to break BC and rename the ZPP symbols, because that WILL
> error, and give the developer a meaningful indication that an
> improper data type was provided. I considered a fatal error saying that
> an invalid type was supplied a better way of telling the developer
> "HEY, THIS NEEDS TO BE CHANGED ASAP" than just letting them hit random
> segfaults at runtime.
> 
> If there is a way to get around this by giving the compiler more
> information, then do it. But to just leave the types there, and leave
> it to chance whether a buffer overflow occurs, is dangerous. Which is
> why I made the call that the ZPP types **needed** to be changed.
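
One possible way of "giving the compiler more information" is to route the length pointer through a typed identity function before it enters the variadic call. This is a sketch of the idea only, not what the patch does (the patch renames the ZPP specifiers instead); `parse_s`, `as_size_t_ptr` and `PARSE_S` are hypothetical names.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variadic parser that blindly treats its second
 * variadic argument as a size_t *. */
static void parse_s(const char *input, ...)
{
    va_list ap;
    const char **str;
    size_t *len;

    va_start(ap, input);
    str = va_arg(ap, const char **);
    len = va_arg(ap, size_t *);
    *str = input;
    *len = strlen(input);
    va_end(ap);
}

/* Identity function with a typed parameter: the compiler checks the
 * pointer type here, which it cannot do for the variadic call itself. */
static size_t *as_size_t_ptr(size_t *p)
{
    return p;
}

/* Typed front end: passing an int * as len_out now draws an
 * incompatible-pointer-type diagnostic at compile time. */
#define PARSE_S(input, str_out, len_out) \
    parse_s((input), (str_out), as_size_t_ptr(len_out))
```

It only protects call sites that go through the wrapper; direct calls to `parse_s` remain unchecked.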
> 
> ### Macro Renames
> 
> The reason for the rename is largely the same as with the ZPP changes.
> The severity of not changing is lower (since the compiler will warn and
> do an implicit cast for you), but the risk is still there, which is why
> I chose to change it. This is less critical, but was done to better
> indicate to the developer what needs to change to properly support the
> new system.
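
For the macro case, the narrowing is an ordinary implicit conversion: assigning a size_t length to an `int` compiles, with at most a warning (for example under GCC's `-Wconversion`), and silently produces a wrong value for lengths above `INT_MAX`. A checked conversion makes the limit explicit; `size_to_int` is a hypothetical helper sketched under that assumption.

```c
#include <limits.h>
#include <stddef.h>

/* Narrow a size_t length to int only when it fits; returns 0 otherwise. */
static int size_to_int(size_t len, int *out)
{
    if (len > (size_t)INT_MAX) {
        return 0;  /* would not fit in an int */
    }
    *out = (int)len;
    return 1;
}
```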
> 
> ## Memory Overhead
> 
> This is definitely a concern. There is a potential to double the
> amount of memory that PHP takes, which on the surface looks enormous.
> And if we stop at the surface, we definitely shouldn't do it!
> 
> But as we look deeper, we see that in actuality, the difference is not
> double. In fact, most data structures, as identified by Dmitry
> himself, only increase by between 6% (zend_op_array) and 50%
> (zend_string's size). So that "double" figure quickly drops.
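
Per-structure figures like these come down to `sizeof`. The two structs below are simplified, hypothetical string headers (not the real zend_string layout) that differ only in the type of the length field. On a typical x86-64 ABI the first is 24 bytes and the second 32: the length field grows by 4 bytes, and alignment padding adds another 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header with a 32-bit length. */
struct str_hdr32 {
    uint32_t refcount;
    uint32_t flags;
    uint64_t h;        /* cached hash */
    uint32_t len;      /* 32-bit length: strings capped at 4GB */
    char val[1];       /* start of the character data */
};

/* Same header with a size_t length. */
struct str_hdr64 {
    uint32_t refcount;
    uint32_t flags;
    uint64_t h;
    size_t len;        /* 8 bytes on 64-bit targets */
    char val[1];
};

/* Bytes added per header by the wider length field. */
static size_t hdr_growth_bytes(void)
{
    return sizeof(struct str_hdr64) - sizeof(struct str_hdr32);
}
```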
> 
> But that's at the structure level. Let's look at what actually happens
> in practice. Dmitry himself also provides these answers. The average
> memory increase is 8% for Wordpress, and 6% for ZF1.
> 
> Let's put that 8% in context. Wordpress used 12MB, and now it uses
> 13MB. 1MB more. That's not overly significant. ZF used 29MB. Now it
> uses 31MB. Still not overly significant.
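
As a quick check, applying the quoted average increases to the baseline figures reproduces the rounded numbers above:

```latex
12\,\mathrm{MB} \times 1.08 = 12.96\,\mathrm{MB} \approx 13\,\mathrm{MB},
\qquad
29\,\mathrm{MB} \times 1.06 = 30.74\,\mathrm{MB} \approx 31\,\mathrm{MB}
```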
> 
> Don't get me wrong, it's still more. And more is bad. But it's not
> nearly as bad as it's being played out to be.
> 
> To put this into context, 5.4 saved up to 50% memory compared to 5.3
> (depending on benchmark). 8% << 50%.
> 
> Now, I'm not saying that memory should be thrown around willy-nilly.
> But given the rationale that I gave above, I think the benefits of
> sanity, portability and security clearly are significant enough for
> the relatively small cost in memory.

--
Andrea Faulds
http://ajf.me/
