Hi,
On Sat, May 17, 2014 22:12, Dmitry Stogov wrote:
> Hi Nikita,
>
>
> Thanks for this deep analyzes.
> I hope, this will lead us to agreement.
>
>
> I see no problems with accepting IS_LONG part of the patch.
> Then we may think about string length.
> As you pointed, it makes no sense to use size_t everywhere, and may be
> using it only in "right" places won't make serious harm.
>
> Thanks. Dmitry.
>
>
>
>
> On Sat, May 17, 2014 at 10:27 PM, Nikita Popov <nikita.ppv@gmail.com>
> wrote:
>
>
>> Hi internals!
>>
>>
>> The discussions around the patch for improved 64bit support have
>> deteriorated from a semi-reasonable discussions to pointless accusations
>> and name-calling. I'd like to get back to discussing this issue from a
>> technical point of view and see which parts of the patch can be
>> salvaged and which can not.
>>
>> As the current voting results show, the 64bit changes, if accepted,
>> will be integrated into phpng rather than into master. As such I will
>> argue purely from a phpng perspective.
>>
>> Before going any further, it should be established that two aspects of
>> the 64 bit patch are, as far as I can see, acceptable to everyone: The
>> introduction of 64bit integers on Win64 and other LLP64 architectures
>> and the use of an unsigned integer type for lengths and sizes.
>>
>> The disputed aspect of the patch is the use of 64 bit lengths and
>> sizes. It has been argued that the use of 64bit sizes will significantly
>> increase memory consumption of the phpng branch and may also negatively
>> impact performance.
>>
>> If you take a look at an early patch for porting the 64bit changes for
>> phpng [1] you'll find that these are very real concerns, as the changes
>> increase the sizes of many commonly used data structures.
>>
>> However, this is only the case if 64bit sizes are blindly used in every
>> possible place. In the following I'll argue how a more careful choice
>> of the used size type in key places can make this patch a lot more
>> realistic.
>>
>> First of all it should be noted that the patch introduces size_t usage
>> in several places where, as far as I can see, it is not necessary. For
>> example it changes the storage type of line numbers from zend_uint to
>> zend_size_t:
>>
>>
>> From zend_opline:
>> - uint lineno;
>> + zend_size_t lineno;
>>
>>
>> From zend_class_entry:
>> - zend_uint line_start;
>> - zend_uint line_end;
>> + zend_size_t line_start;
>> + zend_size_t line_end;
>>
>>
>> I don't think we need to concern ourselves about files with more than 4
>> billion lines.
>>
>> Similarly the patch also changes the length of argument names and class
>> typehints to 64bit:
>>
>> From zend_arg_info:
>> - zend_uint name_len;
>> + zend_size_t name_len;
>> const char *class_name; - zend_uint class_name_len;
>> + zend_size_t class_name_len;
>>
>>
>> This once again seem rather unnecessary and wasteful. Another case are
>> the number of arguments of a function:
>>
>> From zend_op_array:
>> - zend_uint num_args;
>> - zend_uint required_num_args;
>> + zend_size_t num_args;
>> + zend_size_t required_num_args;
>>
>>
>> Again it seems doubtful whether functions with more than 4 billion
>> arguments are quite relevant.
>>
>> While there are more examples than these, I'll not go into this point
>> any further, as the pattern should be clear. Instead we'll be moving on
>> to the zend_string and HashTable structures.
>>
>> For the HashTable structure, the diff currently looks as follows:
>>
>>
>> - zend_uint nTableSize;
>> - zend_uint nTableMask;
>> - zend_uint nNumUsed;
>> - zend_uint nNumOfElements;
>> - long nNextFreeElement;
>> + zend_uint_t nTableSize;
>> + zend_uint_t nTableMask;
>> + zend_uint_t nNumUsed;
>> + zend_uint_t nNumOfElements;
>> + zend_int_t nNextFreeElement;
>> Bucket *arData;
>> - zend_uint *arHash;
>> + zend_uint_t *arHash;
>> dtor_func_t pDestructor; zend_uint nInternalPointer;
>>
>> This actually misses changing nInternalPointer. If we change that one
>> as well and account for differences in alignment, the total increase for
>> the HashTable structure is 6*4 = 24 bytes. For a heavily used structure,
>> that's a lot. However, even worse are the per-bucket memory increases:
>>
>> From the diff above you can already see that the arHash array will be
>> twice as large, i.e. you'll get an additional 4 bytes per bucket.
>> However - and
>> this is currently not represented in the patch stub - changing
>> hashtables to 64bit lengths will make the use of Z_NEXT(bucket->val) to
>> store the next collision resolution index impossible. As such another 8
>> bytes per bucket will be needed.
>>
>> This brings us to a total increase of 24 bytes per hashtable and 12
>> bytes per bucket. Especially the latter seems unacceptable to me.
>>
>> As such, I would suggest to *not* change sizes for the hashtable
>> structure and limit these changes to other parts of the engine (strings
>> in particular).
>>
>> To put this into context, consider what it actually means to have an
>> array with more than 4 billion elements (which is the point where the
>> different size type becomes relevant). With hashtables as implemented in
>> master with 144 bytes per bucket, this would require 2^32 * 144 = 576 GB
>> of memory. If we introduced 64bit hashtable sizes in phpng, you'd still
>> need 206 GB of memory for to create an array that can make use of those
>> sizes.
>>
>> I hope this visualizes that supporting 64bit sizes for hashtables is
>> not necessary. We can easily implement a size restriction in
>> zend_hash_do_resize. As hashtables go through a centralized API
>> controlling the size limitation is a lot more straightforward than for
>> strings.
>>
>> Now that we have covered the HashTable changes, we're only left with
>> one significant change. The diff for the zend_string structure is as
>> follows:
>>
>>
>> zend_refcounted gc; - zend_ulong h; /* hash
>> value */ - int len;
>> + zend_uint_t h; /* hash value */
>> + zend_size_t len;
>> char val[1];
>>
>> This means that a zend_string will on average be approximately 4 bytes
>> longer. (Approximately because the exact difference is subject to
>> details of the allocation process and depends on the string length
>> distribution.)
>>
>> I think that this difference might be acceptable, if this is the cost
>> of uniform string size usage in the codebase. I tested the impact that
>> adding 4 bytes on the string structure has on one application and the
>> result was a change from 38.2 to 38.5 MiB, which translates to a 0.7%
>> difference. That difference is acceptable to me.
>>
>> However I don't know how representative that result is and I don't
>> currently have time to set up a testing environment for something like
>> wordpress on a 64bit vm. I would appreciate it if somebody who already
>> has such a setup could run a test for this, to confirm that there is
>> indeed no large difference in memory consumption. Just add a "int
>> unused" after the len member and compare memory usage.
>>
>> So, what I'm suggesting here is the following:
>>
>>
>> We implement 64bit integers on LLP64, we implement unsigned sizes, we
>> implement size_t for strings lengths. I honestly still don't fully
>> understand the necessity for the latter, but if the impact in phpng is
>> indeed low (in the 1% range), then I think I can compromise on that.
>>
>> What we don't implement is indiscriminate usage of size_t or other
>> 64bit
>> types all over the place (like for linenos, argument counts, argument
>> lengths etc) and usage of 64bit sizes for hashtables.
>>
>> This would be acceptable for me and I hope that it would also be
>> acceptable for others.
>>
>> While we're at it, I would also like to quickly touch on the topic of
>> naming and renaming. I hope it is clear now that we need both 64bit
>> types for some things and 32bit types for others (like HT sizes,
>> linenos, etc). As such I would suggest to use zend_(u)long_t as the
>> 64bit type (on 64bit
>> platforms of course) and zend_(u)int_t as the 32bit type. This more or
>> less matches our current naming and hopefully avoids confusion.
>>
>> With that naming convention, we should also continue using IS_LONG and
>> Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
>> Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially
>> as they conflict with other zpp changes in phpng. I understand why these
>> changes were beneficial in the original 64bit patch, however
>> considering that the usage of many APIs changed in phpng and extensions
>> need to be carefully reviewed anyway, I don't think these renames make a
>> lot of sense anymore.
>>
>> So, that's my opinion on how we should proceed with the 64bit patch. I
>> very much hope that we can reach some consensus about this.
>>
>> Credits: This mail is the result of a discussion between Anthony, Bob
>> and myself.
>>
>> Nikita
>>
I really appreciate you guys have looked inside the str_size_and_int64
patch. Dmitry even has compiled and tested it.
While the Nikitas notes sound pretty meaningful, let me ask you a one
simple question before proceeding with any details - how much time you
personally would spend on the integration?
Thanks
Anatol