Hi Nikita,
Thanks for this deep analyzes.
I hope, this will lead us to agreement.
I see no problems with accepting IS_LONG part of the patch.
Then we may think about string length.
As you pointed, it makes no sense to use size_t everywhere, and may be
using it only in "right" places won't make serious harm.
Thanks. Dmitry.
On Sat, May 17, 2014 at 10:27 PM, Nikita Popov <nikita.ppv@gmail.com> wrote:
> Hi internals!
>
> The discussions around the patch for improved 64bit support have
> deteriorated from a semi-reasonable discussions to pointless accusations
> and name-calling. I'd like to get back to discussing this issue from a
> technical point of view and see which parts of the patch can be salvaged
> and which can not.
>
> As the current voting results show, the 64bit changes, if accepted, will be
> integrated into phpng rather than into master. As such I will argue purely
> from a phpng perspective.
>
> Before going any further, it should be established that two aspects of the
> 64 bit patch are, as far as I can see, acceptable to everyone: The
> introduction of 64bit integers on Win64 and other LLP64 architectures and
> the use of an unsigned integer type for lengths and sizes.
>
> The disputed aspect of the patch is the use of 64 bit lengths and sizes. It
> has been argued that the use of 64bit sizes will significantly increase
> memory consumption of the phpng branch and may also negatively impact
> performance.
>
> If you take a look at an early patch for porting the 64bit changes for
> phpng [1] you'll find that these are very real concerns, as the changes
> increase the sizes of many commonly used data structures.
>
> However, this is only the case if 64bit sizes are blindly used in every
> possible place. In the following I'll argue how a more careful choice of
> the used size type in key places can make this patch a lot more realistic.
>
> First of all it should be noted that the patch introduces size_t usage in
> several places where, as far as I can see, it is not necessary. For example
> it changes the storage type of line numbers from zend_uint to zend_size_t:
>
> From zend_opline:
> - uint lineno;
> + zend_size_t lineno;
>
> From zend_class_entry:
> - zend_uint line_start;
> - zend_uint line_end;
> + zend_size_t line_start;
> + zend_size_t line_end;
>
> I don't think we need to concern ourselves about files with more than 4
> billion lines.
>
> Similarly the patch also changes the length of argument names and class
> typehints to 64bit:
>
> From zend_arg_info:
> - zend_uint name_len;
> + zend_size_t name_len;
> const char *class_name;
> - zend_uint class_name_len;
> + zend_size_t class_name_len;
>
> This once again seem rather unnecessary and wasteful. Another case are the
> number of arguments of a function:
>
> From zend_op_array:
> - zend_uint num_args;
> - zend_uint required_num_args;
> + zend_size_t num_args;
> + zend_size_t required_num_args;
>
> Again it seems doubtful whether functions with more than 4 billion
> arguments are quite relevant.
>
> While there are more examples than these, I'll not go into this point any
> further, as the pattern should be clear. Instead we'll be moving on to the
> zend_string and HashTable structures.
>
> For the HashTable structure, the diff currently looks as follows:
>
> - zend_uint nTableSize;
> - zend_uint nTableMask;
> - zend_uint nNumUsed;
> - zend_uint nNumOfElements;
> - long nNextFreeElement;
> + zend_uint_t nTableSize;
> + zend_uint_t nTableMask;
> + zend_uint_t nNumUsed;
> + zend_uint_t nNumOfElements;
> + zend_int_t nNextFreeElement;
> Bucket *arData;
> - zend_uint *arHash;
> + zend_uint_t *arHash;
> dtor_func_t pDestructor;
> zend_uint nInternalPointer;
>
> This actually misses changing nInternalPointer. If we change that one as
> well and account for differences in alignment, the total increase for the
> HashTable structure is 6*4 = 24 bytes. For a heavily used structure, that's
> a lot. However, even worse are the per-bucket memory increases:
>
> From the diff above you can already see that the arHash array will be twice
> as large, i.e. you'll get an additional 4 bytes per bucket. However - and
> this is currently not represented in the patch stub - changing hashtables
> to 64bit lengths will make the use of Z_NEXT(bucket->val) to store the next
> collision resolution index impossible. As such another 8 bytes per bucket
> will be needed.
>
> This brings us to a total increase of 24 bytes per hashtable and 12 bytes
> per bucket. Especially the latter seems unacceptable to me.
>
> As such, I would suggest to *not* change sizes for the hashtable structure
> and limit these changes to other parts of the engine (strings in
> particular).
>
> To put this into context, consider what it actually means to have an array
> with more than 4 billion elements (which is the point where the different
> size type becomes relevant). With hashtables as implemented in master with
> 144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If
> we introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of
> memory for to create an array that can make use of those sizes.
>
> I hope this visualizes that supporting 64bit sizes for hashtables is not
> necessary. We can easily implement a size restriction in
> zend_hash_do_resize. As hashtables go through a centralized API controlling
> the size limitation is a lot more straightforward than for strings.
>
> Now that we have covered the HashTable changes, we're only left with one
> significant change. The diff for the zend_string structure is as follows:
>
> zend_refcounted gc;
> - zend_ulong h; /* hash value */
> - int len;
> + zend_uint_t h; /* hash value */
> + zend_size_t len;
> char val[1];
>
> This means that a zend_string will on average be approximately 4 bytes
> longer. (Approximately because the exact difference is subject to details
> of the allocation process and depends on the string length distribution.)
>
> I think that this difference might be acceptable, if this is the cost of
> uniform string size usage in the codebase. I tested the impact that adding
> 4 bytes on the string structure has on one application and the result was a
> change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That
> difference is acceptable to me.
>
> However I don't know how representative that result is and I don't
> currently have time to set up a testing environment for something like
> wordpress on a 64bit vm. I would appreciate it if somebody who already has
> such a setup could run a test for this, to confirm that there is indeed no
> large difference in memory consumption. Just add a "int unused" after the
> len member and compare memory usage.
>
> So, what I'm suggesting here is the following:
>
> We implement 64bit integers on LLP64, we implement unsigned sizes, we
> implement size_t for strings lengths. I honestly still don't fully
> understand the necessity for the latter, but if the impact in phpng is
> indeed low (in the 1% range), then I think I can compromise on that.
>
> What we don't implement is indiscriminate usage of size_t or other 64bit
> types all over the place (like for linenos, argument counts, argument
> lengths etc) and usage of 64bit sizes for hashtables.
>
> This would be acceptable for me and I hope that it would also be acceptable
> for others.
>
> While we're at it, I would also like to quickly touch on the topic of
> naming and renaming. I hope it is clear now that we need both 64bit types
> for some things and 32bit types for others (like HT sizes, linenos, etc).
> As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit
> platforms of course) and zend_(u)int_t as the 32bit type. This more or less
> matches our current naming and hopefully avoids confusion.
>
> With that naming convention, we should also continue using IS_LONG and
> Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
> Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially as
> they conflict with other zpp changes in phpng. I understand why these
> changes were beneficial in the original 64bit patch, however considering
> that the usage of many APIs changed in phpng and extensions need to be
> carefully reviewed anyway, I don't think these renames make a lot of sense
> anymore.
>
> So, that's my opinion on how we should proceed with the 64bit patch. I very
> much hope that we can reach some consensus about this.
>
> Credits: This mail is the result of a discussion between Anthony, Bob and
> myself.
>
> Nikita
>
> [1]: https://gist.github.com/weltling/a941d8cf6c731640b51f
>