Hi all,
On Tue, Feb 11, 2014 at 4:43 PM, Solar Designer <solar@openwall.com> wrote:
> On Tue, Feb 11, 2014 at 07:55:41AM +0800, Tjerk Meesters wrote:
> > This problem isn't about performance, not necessarily anyway; the primary
> > problem that this rfc is attempting to solve is a string comparison
> > function with the tightest possible Theta(given-length) runtime.
>
> Not exactly. An actual goal is not leaking info via timings, and it
> may be achieved in a variety of ways. (A secondary goal is not leaking
> info via memory access pattern, but it's trickier.)
I think memory access pattern cannot be hidden. Code is known,
compiler/system can be guessed nicely. Therefore, it's better to
have code that would not depends on memory access pattern.
> > If this can be accomplished with reading double words first, followed
> by a
> > byte wise suffix comparison, then great.
>
> It can be, but this will add code complexity and thus potential bugs.
> I think this is not worth it in a function focused on security. Let's
> not waste time (and increase risk) trying to implement this.
>
I was curious about performance difference. Especially, because someone
told that Python 3.4 uses SipHash for all ==/=== comparison. I thought
we may have the same.
> > If not, the first requirement should be honoured.
>
> This should be the case regardless (and the actual first requirement is
> different from what you wrote).
>
> I think we should focus mostly on implementation correctness, not speed.
>
Correctness is mandatory. I agree.
That's the reason why OpenBSD people use simple timingsafe_bcmp(),
probably. It cannot be wrong.
This is a more general problem with your codebase, but things like use
> of signed ints for string lengths are not great. Ditto about use of
> potentially signed chars with implicit promotion to potentially signed
> int (depending on whether char is signed on a given platform) and then
> assignment to definitely signed int. Now recall that signed integer
> overflow has undefined behavior in C. I think you got away with it this
> time (although one has to consider both possibilities for char's
> signedness when reviewing code like this), but chances are that you have
> quite some instances of signed integer overflow UB in the PHP tree.
> I think you should be using size_t and other explicitly unsigned types
> where appropriate.
>
The cast is not forbidden by spec AFAIK. I could be wrong, though.
I agree that cast to other data type is tricky. IIRC, MySQL's admin password
was cracked due to the difference of "signed char" and "unsigned char".
IIRC, bcrypt had similar issue also. str_word_compare() uses cast and I
suppose it does what we need. We have to verify assembler code on
various platforms if it is okay or not.
Much of my own older code has similar drawbacks/risks, some of it
> coming from pre-C99 portability concerns. These days, I think we should
> focus on correctness on modern systems, especially when we're talking
> not about reusable code snippets, but about the PHP codebase, where you
> can reasonably stipulate C99 as a requirement (perhaps you already do?)
I agree.
I think str_byte_compare() (Simple fixed length byte by byte compare)
is the total winner unless str_word_compare() is proven to be a correct
implementation.
I'll vote for str_byte_compare() implementation. (OpenBSD one)
Could we consider to add timing safe string comparison to ==/===
operator? It assures comparison safety of any PHP scripts, including
existing one.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net