Hi,
On Tue, Feb 11, 2014 at 3:50 AM, Yasuo Ohgaki <yohgaki@ohgaki.net> wrote:
> Hi Tjerk,
>
> On Mon, Feb 10, 2014 at 7:36 PM, Tjerk Meesters <tjerk.meesters@gmail.com>wrote:
>
>> I've given this problem some more thought and the following looked
>> promising; it would be great to get your feedback on it.
>>
>> Basically, to remove the modulo and MAX(), I've used a combined buffer
>> that's used for the comparison, in two ways:
>>
>> 1. PAD := known (padded with \0 based on given_len)
>> 2. PAD := known CONCAT given
>>
>> The comparison is straightforward because we already know the buffer is at
>> least given_len.
>>
>> Implementations of above can be found here:
>> 1. https://github.com/datibbaw/php-src/compare/master...const-cmp
>> 2. https://github.com/datibbaw/php-src/compare/master...const-cmp-2
>>
>> Bottom line is that this attempts to compromise memory against time
>> stability.
>>
>
> I think length leak does not matter much.
> Therefore, I may ignore length leak as it could know it from other codes
> using strlen(), etc.
>
> Our main concern would be performance.
>
This problem isn't about performance, not necessarily anyway; the primary
problem that this rfc is attempting to solve is a string comparison
function with the tightest possible Theta(given-length) runtime.
If this can be accomplished with reading double words first, followed by a
byte wise suffix comparison, then great. If not, the first requirement
should be honoured.
>
> https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare
>
> and the bench mark result of
>
> https://gist.github.com/yohgaki/ede544f290c6cf9fa90d
>
> is
>
> [yohgaki@dev github-php-src]$ ./php-bin t.php
> str_siphash_compare Elapsed: 9.450211 Iterations: 1000000 DataSize: 1024
> str_xxhash32_compare Elapsed: 3.849933 Iterations: 1000000 DataSize: 1024
> str_md5_compare Elapsed: 27.174614 Iterations: 1000000 DataSize:
> 1024
> str_byte_compare Elapsed: 5.874092 Iterations: 1000000 DataSize: 1024
> str_byte_compare2 Elapsed: 8.761152 Iterations: 1000000 DataSize: 1024
> str_word_compare Elapsed: 1.867914 Iterations: 1000000 DataSize: 1024
> str_compare Elapsed: 1.178738 Iterations: 1000000 DataSize: 1024
>
> I had the same idea, but I didn't try as it requires additional memory
> allocation.
>
> str_byte_compare Elapsed: 5.874092 Iterations: 1000000 DataSize: 1024
> str_byte_compare2 Elapsed: 8.761152 Iterations: 1000000 DataSize: 1024
>
> 1st one is simple byte by byte XOR, 2nd one is using modulo as original
> patch
> proposed. There is obvious difference even with additional modulo.
>
> str_compare Elapsed: 1.178738 Iterations: 1000000 DataSize: 1024
>
> This is the result of simple strncmp() as a reference.
> How well your version perform compare to simple strncmp?
>
> Regards,
>
> --
> Yasuo Ohgaki
> yohgaki@ohgaki.net
>
--
--
Tjerk