I may take a part of the work. It's not a problem.
Thanks. Dmitry.
On Sun, May 18, 2014 at 12:37 AM, Anatol Belski <anatol.php@belski.net>wrote:
> Hi,
>
> On Sat, May 17, 2014 22:12, Dmitry Stogov wrote:
> > Hi Nikita,
> >
> >
> > Thanks for this deep analyzes.
> > I hope, this will lead us to agreement.
> >
> >
> > I see no problems with accepting IS_LONG part of the patch.
> > Then we may think about string length.
> > As you pointed, it makes no sense to use size_t everywhere, and may be
> > using it only in "right" places won't make serious harm.
> >
> > Thanks. Dmitry.
> >
> >
> >
> >
> > On Sat, May 17, 2014 at 10:27 PM, Nikita Popov <nikita.ppv@gmail.com>
> > wrote:
> >
> >
> >> Hi internals!
> >>
> >>
> >> The discussions around the patch for improved 64bit support have
> >> deteriorated from a semi-reasonable discussions to pointless accusations
> >> and name-calling. I'd like to get back to discussing this issue from a
> >> technical point of view and see which parts of the patch can be
> >> salvaged and which can not.
> >>
> >> As the current voting results show, the 64bit changes, if accepted,
> >> will be integrated into phpng rather than into master. As such I will
> >> argue purely from a phpng perspective.
> >>
> >> Before going any further, it should be established that two aspects of
> >> the 64 bit patch are, as far as I can see, acceptable to everyone: The
> >> introduction of 64bit integers on Win64 and other LLP64 architectures
> >> and the use of an unsigned integer type for lengths and sizes.
> >>
> >> The disputed aspect of the patch is the use of 64 bit lengths and
> >> sizes. It has been argued that the use of 64bit sizes will significantly
> >> increase memory consumption of the phpng branch and may also negatively
> >> impact performance.
> >>
> >> If you take a look at an early patch for porting the 64bit changes for
> >> phpng [1] you'll find that these are very real concerns, as the changes
> >> increase the sizes of many commonly used data structures.
> >>
> >> However, this is only the case if 64bit sizes are blindly used in every
> >> possible place. In the following I'll argue how a more careful choice
> >> of the used size type in key places can make this patch a lot more
> >> realistic.
> >>
> >> First of all it should be noted that the patch introduces size_t usage
> >> in several places where, as far as I can see, it is not necessary. For
> >> example it changes the storage type of line numbers from zend_uint to
> >> zend_size_t:
> >>
> >>
> >> From zend_opline:
> >> - uint lineno;
> >> + zend_size_t lineno;
> >>
> >>
> >> From zend_class_entry:
> >> - zend_uint line_start;
> >> - zend_uint line_end;
> >> + zend_size_t line_start;
> >> + zend_size_t line_end;
> >>
> >>
> >> I don't think we need to concern ourselves about files with more than 4
> >> billion lines.
> >>
> >> Similarly the patch also changes the length of argument names and class
> >> typehints to 64bit:
> >>
> >> From zend_arg_info:
> >> - zend_uint name_len;
> >> + zend_size_t name_len;
> >> const char *class_name; - zend_uint class_name_len;
> >> + zend_size_t class_name_len;
> >>
> >>
> >> This once again seem rather unnecessary and wasteful. Another case are
> >> the number of arguments of a function:
> >>
> >> From zend_op_array:
> >> - zend_uint num_args;
> >> - zend_uint required_num_args;
> >> + zend_size_t num_args;
> >> + zend_size_t required_num_args;
> >>
> >>
> >> Again it seems doubtful whether functions with more than 4 billion
> >> arguments are quite relevant.
> >>
> >> While there are more examples than these, I'll not go into this point
> >> any further, as the pattern should be clear. Instead we'll be moving on
> >> to the zend_string and HashTable structures.
> >>
> >> For the HashTable structure, the diff currently looks as follows:
> >>
> >>
> >> - zend_uint nTableSize;
> >> - zend_uint nTableMask;
> >> - zend_uint nNumUsed;
> >> - zend_uint nNumOfElements;
> >> - long nNextFreeElement;
> >> + zend_uint_t nTableSize;
> >> + zend_uint_t nTableMask;
> >> + zend_uint_t nNumUsed;
> >> + zend_uint_t nNumOfElements;
> >> + zend_int_t nNextFreeElement;
> >> Bucket *arData;
> >> - zend_uint *arHash;
> >> + zend_uint_t *arHash;
> >> dtor_func_t pDestructor; zend_uint nInternalPointer;
> >>
> >> This actually misses changing nInternalPointer. If we change that one
> >> as well and account for differences in alignment, the total increase for
> >> the HashTable structure is 6*4 = 24 bytes. For a heavily used structure,
> >> that's a lot. However, even worse are the per-bucket memory increases:
> >>
> >> From the diff above you can already see that the arHash array will be
> >> twice as large, i.e. you'll get an additional 4 bytes per bucket.
> >> However - and
> >> this is currently not represented in the patch stub - changing
> >> hashtables to 64bit lengths will make the use of Z_NEXT(bucket->val) to
> >> store the next collision resolution index impossible. As such another 8
> >> bytes per bucket will be needed.
> >>
> >> This brings us to a total increase of 24 bytes per hashtable and 12
> >> bytes per bucket. Especially the latter seems unacceptable to me.
> >>
> >> As such, I would suggest to *not* change sizes for the hashtable
> >> structure and limit these changes to other parts of the engine (strings
> >> in particular).
> >>
> >> To put this into context, consider what it actually means to have an
> >> array with more than 4 billion elements (which is the point where the
> >> different size type becomes relevant). With hashtables as implemented in
> >> master with 144 bytes per bucket, this would require 2^32 * 144 = 576 GB
> >> of memory. If we introduced 64bit hashtable sizes in phpng, you'd still
> >> need 206 GB of memory for to create an array that can make use of those
> >> sizes.
> >>
> >> I hope this visualizes that supporting 64bit sizes for hashtables is
> >> not necessary. We can easily implement a size restriction in
> >> zend_hash_do_resize. As hashtables go through a centralized API
> >> controlling the size limitation is a lot more straightforward than for
> >> strings.
> >>
> >> Now that we have covered the HashTable changes, we're only left with
> >> one significant change. The diff for the zend_string structure is as
> >> follows:
> >>
> >>
> >> zend_refcounted gc; - zend_ulong h; /* hash
> >> value */ - int len;
> >> + zend_uint_t h; /* hash value */
> >> + zend_size_t len;
> >> char val[1];
> >>
> >> This means that a zend_string will on average be approximately 4 bytes
> >> longer. (Approximately because the exact difference is subject to
> >> details of the allocation process and depends on the string length
> >> distribution.)
> >>
> >> I think that this difference might be acceptable, if this is the cost
> >> of uniform string size usage in the codebase. I tested the impact that
> >> adding 4 bytes on the string structure has on one application and the
> >> result was a change from 38.2 to 38.5 MiB, which translates to a 0.7%
> >> difference. That difference is acceptable to me.
> >>
> >> However I don't know how representative that result is and I don't
> >> currently have time to set up a testing environment for something like
> >> wordpress on a 64bit vm. I would appreciate it if somebody who already
> >> has such a setup could run a test for this, to confirm that there is
> >> indeed no large difference in memory consumption. Just add a "int
> >> unused" after the len member and compare memory usage.
> >>
> >> So, what I'm suggesting here is the following:
> >>
> >>
> >> We implement 64bit integers on LLP64, we implement unsigned sizes, we
> >> implement size_t for strings lengths. I honestly still don't fully
> >> understand the necessity for the latter, but if the impact in phpng is
> >> indeed low (in the 1% range), then I think I can compromise on that.
> >>
> >> What we don't implement is indiscriminate usage of size_t or other
> >> 64bit
> >> types all over the place (like for linenos, argument counts, argument
> >> lengths etc) and usage of 64bit sizes for hashtables.
> >>
> >> This would be acceptable for me and I hope that it would also be
> >> acceptable for others.
> >>
> >> While we're at it, I would also like to quickly touch on the topic of
> >> naming and renaming. I hope it is clear now that we need both 64bit
> >> types for some things and 32bit types for others (like HT sizes,
> >> linenos, etc). As such I would suggest to use zend_(u)long_t as the
> >> 64bit type (on 64bit
> >> platforms of course) and zend_(u)int_t as the 32bit type. This more or
> >> less matches our current naming and hopefully avoids confusion.
> >>
> >> With that naming convention, we should also continue using IS_LONG and
> >> Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
> >> Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially
> >> as they conflict with other zpp changes in phpng. I understand why these
> >> changes were beneficial in the original 64bit patch, however
> >> considering that the usage of many APIs changed in phpng and extensions
> >> need to be carefully reviewed anyway, I don't think these renames make a
> >> lot of sense anymore.
> >>
> >> So, that's my opinion on how we should proceed with the 64bit patch. I
> >> very much hope that we can reach some consensus about this.
> >>
> >> Credits: This mail is the result of a discussion between Anthony, Bob
> >> and myself.
> >>
> >> Nikita
> >>
> I really appreciate you guys have looked inside the str_size_and_int64
> patch. Dmitry even has compiled and tested it.
>
> While the Nikitas notes sound pretty meaningful, let me ask you a one
> simple question before proceeding with any details - how much time you
> personally would spend on the integration?
>
> Thanks
>
> Anatol
>
>