RE: [PHP-DEV] Rethinking 64bit sizes and PHP-NG

From: Date: Sat, 17 May 2014 18:46:21 +0000
Subject: RE: [PHP-DEV] Rethinking 64bit sizes and PHP-NG
References: 1  Groups: php.internals 
Request: Send a blank email to internals+get-74287@lists.php.net to get a copy of this message
Nikita,

I think your email is spot on, except for one thing.  Thankfully, there
haven't any accusations or name calling - at least none that I've seen.  And
that's a good thing, even when we have a heated debate.
I'll vote Yes on an RFC-translated version of this email in a heartbeat.

Zeev

> -----Original Message-----
> From: Nikita Popov [mailto:nikita.ppv@gmail.com]
> Sent: Saturday, May 17, 2014 9:27 PM
> To: PHP internals
> Subject: [PHP-DEV] Rethinking 64bit sizes and PHP-NG
>
> Hi internals!
>
> The discussions around the patch for improved 64bit support have
> deteriorated
> from a semi-reasonable discussions to pointless accusations and
> name-calling.
> I'd like to get back to discussing this issue from a technical point of
> view and see
> which parts of the patch can be salvaged and which can not.
>
> As the current voting results show, the 64bit changes, if accepted, will
> be
> integrated into phpng rather than into master. As such I will argue purely
> from a
> phpng perspective.
>
> Before going any further, it should be established that two aspects of the
> 64 bit patch are, as far as I can see, acceptable to everyone: The
> introduction of
> 64bit integers on Win64 and other LLP64 architectures and the use of an
> unsigned integer type for lengths and sizes.
>
> The disputed aspect of the patch is the use of 64 bit lengths and sizes.
> It has
> been argued that the use of 64bit sizes will significantly increase memory
> consumption of the phpng branch and may also negatively impact
> performance.
>
> If you take a look at an early patch for porting the 64bit changes for
> phpng [1]
> you'll find that these are very real concerns, as the changes increase the
> sizes of
> many commonly used data structures.
>
> However, this is only the case if 64bit sizes are blindly used in every
> possible
> place. In the following I'll argue how a more careful choice of the used
> size type
> in key places can make this patch a lot more realistic.
>
> First of all it should be noted that the patch introduces size_t usage in
> several
> places where, as far as I can see, it is not necessary. For example it
> changes the
> storage type of line numbers from zend_uint to zend_size_t:
>
> From zend_opline:
>     - uint lineno;
>     + zend_size_t lineno;
>
> From zend_class_entry:
>     - zend_uint line_start;
>     - zend_uint line_end;
>     + zend_size_t line_start;
>     + zend_size_t line_end;
>
> I don't think we need to concern ourselves about files with more than 4
> billion
> lines.
>
> Similarly the patch also changes the length of argument names and class
> typehints to 64bit:
>
> From zend_arg_info:
>     - zend_uint name_len;
>     + zend_size_t name_len;
>        const char *class_name;
>     - zend_uint class_name_len;
>     + zend_size_t class_name_len;
>
> This once again seem rather unnecessary and wasteful. Another case are the
> number of arguments of a function:
>
> From zend_op_array:
>     - zend_uint num_args;
>     - zend_uint required_num_args;
>     + zend_size_t num_args;
>     + zend_size_t required_num_args;
>
> Again it seems doubtful whether functions with more than 4 billion
> arguments
> are quite relevant.
>
> While there are more examples than these, I'll not go into this point any
> further,
> as the pattern should be clear. Instead we'll be moving on to the
> zend_string and
> HashTable structures.
>
> For the HashTable structure, the diff currently looks as follows:
>
>     - zend_uint         nTableSize;
>     - zend_uint         nTableMask;
>     - zend_uint         nNumUsed;
>     - zend_uint         nNumOfElements;
>     - long              nNextFreeElement;
>     + zend_uint_t       nTableSize;
>     + zend_uint_t       nTableMask;
>     + zend_uint_t       nNumUsed;
>     + zend_uint_t       nNumOfElements;
>     + zend_int_t        nNextFreeElement;
>      Bucket           *arData;
>     - zend_uint        *arHash;
>     + zend_uint_t      *arHash;
>       dtor_func_t       pDestructor;
>       zend_uint         nInternalPointer;
>
> This actually misses changing nInternalPointer. If we change that one as
> well
> and account for differences in alignment, the total increase for the
> HashTable
> structure is 6*4 = 24 bytes. For a heavily used structure, that's a lot.
> However,
> even worse are the per-bucket memory increases:
>
> From the diff above you can already see that the arHash array will be
> twice as
> large, i.e. you'll get an additional 4 bytes per bucket. However - and
> this is
> currently not represented in the patch stub - changing hashtables to 64bit
> lengths will make the use of Z_NEXT(bucket->val) to store the next
> collision
> resolution index impossible. As such another 8 bytes per bucket will be
> needed.
>
> This brings us to a total increase of 24 bytes per hashtable and 12 bytes
> per
> bucket. Especially the latter seems unacceptable to me.
>
> As such, I would suggest to *not* change sizes for the hashtable structure
> and
> limit these changes to other parts of the engine (strings in particular).
>
> To put this into context, consider what it actually means to have an array
> with
> more than 4 billion elements (which is the point where the different size
> type
> becomes relevant). With hashtables as implemented in master with
> 144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If
> we
> introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of
> memory for
> to create an array that can make use of those sizes.
>
> I hope this visualizes that supporting 64bit sizes for hashtables is not
> necessary.
> We can easily implement a size restriction in zend_hash_do_resize. As
> hashtables go through a centralized API controlling the size limitation is
> a lot
> more straightforward than for strings.
>
> Now that we have covered the HashTable changes, we're only left with one
> significant change. The diff for the zend_string structure is as follows:
>
>       zend_refcounted   gc;
>     - zend_ulong        h;                /* hash value */
>     - int               len;
>     + zend_uint_t       h;                /* hash value */
>     + zend_size_t       len;
>       char              val[1];
>
> This means that a zend_string will on average be approximately 4 bytes
> longer.
> (Approximately because the exact difference is subject to details of the
> allocation process and depends on the string length distribution.)
>
> I think that this difference might be acceptable, if this is the cost of
> uniform
> string size usage in the codebase. I tested the impact that adding
> 4 bytes on the string structure has on one application and the result was
> a
> change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That
> difference is acceptable to me.
>
> However I don't know how representative that result is and I don't
> currently
> have time to set up a testing environment for something like wordpress on
> a
> 64bit vm. I would appreciate it if somebody who already has such a setup
> could
> run a test for this, to confirm that there is indeed no large difference
> in memory
> consumption. Just add a "int unused" after the len member and compare
> memory usage.
>
> So, what I'm suggesting here is the following:
>
> We implement 64bit integers on LLP64, we implement unsigned sizes, we
> implement size_t for strings lengths. I honestly still don't fully
> understand the
> necessity for the latter, but if the impact in phpng is indeed low (in the
> 1%
> range), then I think I can compromise on that.
>
> What we don't implement is indiscriminate usage of size_t or other 64bit
> types
> all over the place (like for linenos, argument counts, argument lengths
> etc) and
> usage of 64bit sizes for hashtables.
>
> This would be acceptable for me and I hope that it would also be
> acceptable for
> others.
>
> While we're at it, I would also like to quickly touch on the topic of
> naming and
> renaming. I hope it is clear now that we need both 64bit types for some
> things
> and 32bit types for others (like HT sizes, linenos, etc).
> As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit
> platforms of course) and zend_(u)int_t as the 32bit type. This more or
> less
> matches our current naming and hopefully avoids confusion.
>
> With that naming convention, we should also continue using IS_LONG and
> Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
> Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially
> as
> they conflict with other zpp changes in phpng. I understand why these
> changes
> were beneficial in the original 64bit patch, however considering that the
> usage
> of many APIs changed in phpng and extensions need to be carefully reviewed
> anyway, I don't think these renames make a lot of sense anymore.
>
> So, that's my opinion on how we should proceed with the 64bit patch. I
> very
> much hope that we can reach some consensus about this.
>
> Credits: This mail is the result of a discussion between Anthony, Bob and
> myself.
>
> Nikita
>
>   [1]: https://gist.github.com/weltling/a941d8cf6c731640b51f


Thread (34 messages)

« previous php.internals (#74287) next »