Re: signed long hash index for PHP7?

Date: Wed, 30 Jul 2014 07:46:37 +0000
Groups: php.internals
On 30 Jul 2014, at 07:50, Tjerk Meesters <tjerk.meesters@gmail.com> wrote:

>> That would make sense, but doesn't solve all edge cases as your maximum array
>> index is still more than 2 times the largest positive integer on 32-bit.
> 
> Is that by design, a bug or something else entirely? Could you explain this edge case with some
> code?

On a 32-bit platform, the maximum signed long is 0x7FFFFFFF (2147483647), but the maximum unsigned
long is 0xFFFFFFFF (4294967295), slightly more than twice as big.
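For reference, the arithmetic can be checked with a small C sketch. It uses the exact-width types
int32_t and uint32_t as stand-ins for a 32-bit long (an assumption for illustration); the constant
and function names are made up.

```c
#include <stdint.h>

/* Stand-ins for a 32-bit long and unsigned long (illustrative assumption;
   a real 32-bit build would use long and unsigned long directly). */
static const int32_t  MAX_SIGNED_32   = INT32_MAX;   /* 0x7FFFFFFF = 2147483647 */
static const uint32_t MAX_UNSIGNED_32 = UINT32_MAX;  /* 0xFFFFFFFF = 4294967295 */

/* Returns 2 * MAX_SIGNED_32 + 1, computed in unsigned 32-bit arithmetic.
   It equals MAX_UNSIGNED_32, i.e. just over twice the signed maximum. */
static uint32_t twice_signed_max_plus_one(void) {
    return (uint32_t)(2u * (uint32_t)MAX_SIGNED_32 + 1u);
}
```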

For example, this does what you’d expect on my machine (OS X 64-bit Intel Core i5):

andreas-air:~ ajf$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [4294967295]=>
  int(1)
  [4294967296]=>
  int(2)
}

On my 32-bit Ubuntu VM (which I use precisely to test this kind of issue when working on bigints),
however, it wraps around:

ajf@andrea-VirtualBox:~$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [-1]=>
  int(1)
  [0]=>
  int(2)
}
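To make the wraparound concrete, here is a hypothetical C model, not PHP's actual source. It assumes
that on the 32-bit build the key 0xFFFFFFFF (a float there, since it exceeds PHP_INT_MAX) ends up
stored with all 32 bits set, that keys are kept as unsigned 32-bit values but shown to users as
signed ones, and that the next free index is simply the last key plus one. key_as_signed() and
next_free_index() are invented names.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit integer hash key: stored as an unsigned
   32-bit value, displayed as a signed one. Not PHP's actual code. */

/* Reinterprets an unsigned key as signed without relying on the
   implementation-defined unsigned-to-signed conversion in C. */
static int32_t key_as_signed(uint32_t key) {
    if (key <= (uint32_t)INT32_MAX) {
        return (int32_t)key;
    }
    /* Values above INT32_MAX map to the negative range: 0xFFFFFFFF -> -1. */
    return -(int32_t)(UINT32_MAX - key) - 1;
}

/* Next free index after inserting `key`: key + 1, wrapping modulo 2^32. */
static uint32_t next_free_index(uint32_t key) {
    return key + 1u;
}
```

Under those assumptions, key_as_signed(0xFFFFFFFF) is -1 and next_free_index(0xFFFFFFFF) is 0, which
matches the [-1] and [0] keys in the 32-bit output above.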

I think we should probably use an unsigned long internally, but prevent negative values.

> Forbidding negative indices is a bit harsh and imho quite unnecessary;

Actually, I missed the bit of your email suggesting treating them as strings the first time I read
it. I’d be fine with that.

> turning “out of range” indices into strings should work just fine afaict. Is there a reason
> why it shouldn’t?

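Well… there is one issue: some array functions treat integer and string keys completely
differently. array_merge(), for example, renumbers integer keys but preserves string keys, so an
out-of-range index that had become a string would behave differently from an in-range one.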

> A compromise could be to allow string keys that would otherwise have converted into a negative
> integer, but disallow negative int/float explicitly.

It’d be a complete BC break, but we could make negative indices work like they do in Python and
grab the (length + index)th item (e.g. in a list of 5, -1 returns the last item, at index 4; -2
returns the item at index 3; and so on). However, because our arrays are weird semi-indexed
semi-hashmap things, this probably isn’t good, as it’d prevent you from using strings like “-1” as
keys. Alas, I can dream.
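For illustration only, Python-style resolution of a negative index might look like the C sketch
below. It assumes a packed, list-like array of len items; resolve_index() is a hypothetical helper,
and PHP does not behave this way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical Python-style index resolution for a list of `len` items.
   PHP does NOT do this. Returns false if the index is out of range. */
static bool resolve_index(long idx, size_t len, size_t *out) {
    if (idx < 0) {
        /* Distance from the end: -1 -> 1, -2 -> 2, ... computed so that
           LONG_MIN does not overflow when negated. */
        unsigned long back = (unsigned long)(-(idx + 1)) + 1ul;
        if (back > len) {
            return false;
        }
        *out = len - back;          /* -1 -> len - 1 (the last item) */
    } else {
        if ((unsigned long)idx >= len) {
            return false;
        }
        *out = (size_t)idx;
    }
    return true;
}
```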

To actually respond to your suggestion, I don’t like the idea of blocking -1 but allowing
“-1”. In PHP, numeric strings, integers and floats are supposed to be equivalent, and I’m
already unhappy that large integer indexes and large numeric string indexes work differently.
Whatever we do, I’d like PHP 7’s arrays to treat integer, float and numeric string indexes
consistently.


Thinking about it a little more, if we use a signed long for indexes, we don’t even need to turn
negative ones into strings. It would fit the principle of least astonishment IMO if every valid PHP
int were a valid index and never became a string. I was going to say that negative indexes don’t
work right internally, but then I realised they would work fine for indexing into the buckets: we
just cast them to unsigned longs internally for indexing and hashing (in C that conversion is
defined modulo 2^N, which gives the same bits as the 2’s complement representation modern CPUs
use), while exposing only signed longs to the outside world, including through the API.
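A minimal C sketch of that idea, under assumptions: keys are stored and compared as signed longs,
the table size is a power of two so the bucket is hash & mask, and the hash of an integer key is the
key itself. bucket_for() is a hypothetical name, not a Zend API.

```c
#include <stddef.h>

/* Hypothetical sketch: keys stay signed longs everywhere they are visible;
   only the bucket computation reinterprets them as unsigned. Assumes a
   power-of-two table size, so `mask` is size - 1. */
static size_t bucket_for(long key, size_t mask) {
    /* Signed-to-unsigned conversion is defined modulo 2^N in C, so -1
       becomes ULONG_MAX (all bits set) on every conforming compiler. */
    unsigned long h = (unsigned long)key;
    return (size_t)(h & mask);
}
```

With an 8-bucket table (mask 7), -1 lands in bucket 7 and -8 in bucket 0, while the stored key
itself stays -1 or -8, so the unsigned view never leaks out.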

So, in summary, I think we should use signed longs for indexes (or at least whatever type PHP’s
basic int is), and anything outside that range should be treated as a string. This would make
numeric strings and ints consistent, would solve all the weird overflow issues, and is the most
intuitive approach IMO.
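The proposed rule for string keys could be sketched in C as follows. This is a simplified,
hypothetical helper (key_to_long() is an invented name): it accepts only a canonical decimal
integer (optional minus sign, no leading zeros, no "-0", no whitespace) and relies on strtol()
reporting ERANGE for values outside the range of long. PHP's real numeric-string handling covers
more cases than this.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified key rule: a string spelling a canonical decimal
   integer within the range of long becomes an integer key; anything else,
   including out-of-range values, stays a string key. */
static bool key_to_long(const char *s, long *out) {
    const char *digits = (*s == '-') ? s + 1 : s;

    /* Canonical form only: starts with a digit, no leading zeros, no "-0". */
    if (*digits < '0' || *digits > '9') {
        return false;
    }
    if (digits[0] == '0' && (digits[1] != '\0' || digits != s)) {
        return false;
    }

    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);
    if (*end != '\0') {
        return false;               /* trailing junk, e.g. "12abc" */
    }
    if (errno == ERANGE) {
        return false;               /* outside long: keep as a string key */
    }
    *out = value;
    return true;
}
```

Under this rule, "-1" and -1 name the same key, while "99999999999999999999" stays a string on both
32-bit and 64-bit builds.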

--
Andrea Faulds
http://ajf.me/





