Re: AW: [PHP-DEV] FILTER_VALIDATE_INT and +0/-0

Date: Fri, 08 Feb 2013 21:09:22 +0000
Groups: php.internals


----- Original Message -----
> From: Gustavo Lopes <glopes@nebm.ist.utl.pt>
> To: 'Patrick Schaaf' <php@bof.de>; "internals@lists.php.net"
> <internals@lists.php.net>; Frank Liepert <Frank.Liepert@gmx.de>; hakre
> <hanskrentel@yahoo.de>
> CC: 'Derick Rethans' <derick@php.net>; 'Martin Jansen'
> <martin@divbyzero.net>
> Sent: 21:19 Friday, 8 February 2013
> Subject: Re: AW: [PHP-DEV] FILTER_VALIDATE_INT and +0/-0
>

>> A special case still left is "±0". It is with the 'PLUS-MINUS SIGN'
>> (U+00B1).
>
> By special case, I meant a deviation to the general rule on how the code
> handles the input. The code handles the characters 0-9 prefixed by an
> optional sign.

The general rule is to allow either + 'PLUS SIGN' (U+002B) or - 'HYPHEN-MINUS' (U+002D)
in front of all natural numbers excluding zero.

The discussion is about allowing those for zero as well.
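
A minimal sketch of the two rules as regular expressions, assuming plain ASCII digits and
ignoring the range checks, options and whitespace handling of the real filter. The variable
names `$current` and `$proposed` are made up for illustration; this is not the ext/filter code.

```php
<?php
// Rule as described above: a sign is allowed only in front of non-zero numbers.
$current  = '/^(?:0|[+-]?[1-9][0-9]*)$/';

// Rule under discussion: the same sign is allowed in front of zero as well.
$proposed = '/^[+-]?(?:0|[1-9][0-9]*)$/';

foreach (array('0', '+0', '-0', '-12', '007') as $input) {
    printf(
        "%-4s current=%d proposed=%d\n",
        $input,
        preg_match($current, $input),
        preg_match($proposed, $input)
    );
}
```

The only inputs on which the two patterns disagree are "+0" and "-0".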

The 'PLUS-MINUS SIGN' (U+00B1) is a relevant sign for the number zero in this context, but
it has gone unnoticed so far in the discussion.

So as not to deviate, for zero, from the general rule that allows a sign in front of all
natural numbers excluding zero, every valid plus and minus sign, including *both at once* as
is possible for zero, should be properly filtered as a valid integer.

If you aim for UTF-8 compatibility with the input, you should also consider 'MINUS
SIGN' (U+2212). I didn't mention it so far because PHP by default targets ISO-8859-1 (at
least commonly, historically and by popularity), so I only covered the sign in Latin-1.
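
To see how a given PHP build treats these inputs, something like the following probe could be
run. It is only a sketch: the labels and the `$candidates` / `$results` names are made up here,
the non-ASCII signs are written as explicit byte sequences, and no result is claimed because
the outcome for "+0" and "-0" depends on the PHP version.

```php
<?php
// Candidate inputs, keyed by a human-readable label.
$candidates = array(
    'PLUS SIGN, ASCII'        => "+0",
    'HYPHEN-MINUS, ASCII'     => "-0",
    'PLUS-MINUS, ISO-8859-1'  => "\xB1" . "0",          // U+00B1 as one byte
    'PLUS-MINUS, UTF-8'       => "\xC2\xB1" . "0",      // U+00B1 as two bytes
    'MINUS SIGN, UTF-8'       => "\xE2\x88\x92" . "0",  // U+2212 as three bytes
);

$results = array();
foreach ($candidates as $label => $input) {
    // filter_var() returns the integer on success and false on failure.
    $results[$label] = filter_var($input, FILTER_VALIDATE_INT);
}

var_dump($results);
```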

> The PLUS-MINUS SIGN -- or, for that matter, all the other numeric
> characters in the Unicode repertoire -- are irrelevant.

Unicode is never irrelevant; it's used to communicate clearly and specifically about which
signs I'm concerned with.

Unicode does not classify "numeric characters". You probably meant 'Number, Decimal
Digit' [Nd], 'Symbol, Math' [Sm], 'Punctuation, Dash' [Pd] or 'Number, Other' [No],
but it remains unspecified in your email. Would you please elaborate?

>
>> It's an equally incorrect sign for the number 0 as "-" or "+" is
>> incorrect. Available in internet standards ISO-8859-1 and more as "\xB1"
>> (UTF-8 as "\xC2\xB1"), FILTER_VALIDATE_INT should reflect hidden
>> dependency of input encoding here.
>
> I'm not sure what you're arguing for here..

To make the feature complete, the input encoding needs to be hinted for those signs; otherwise
FILTER_VALIDATE_INT won't work properly with strings in an unexpected encoding (UTF-8 since
PHP 5.4 (?!); ISO-8859-1 in the past).

Otherwise I'd say it's important to add a note to the documentation that the function is
(only?) US-ASCII / ISO-8859-1 safe, since this is string input validation.
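
What an encoding hint could look like, as a rough userland sketch only. The function
`validate_int_hinted()` and its `$encoding` parameter are hypothetical and not part of
ext/filter; the sketch assumes just the two encodings named above and accepts "±" for zero
only.

```php
<?php
// Hypothetical wrapper: normalises encoding-dependent signs before validating.
function validate_int_hinted($input, $encoding)
{
    // Byte sequence of PLUS-MINUS SIGN (U+00B1) per supported encoding.
    $plusMinus = array('ISO-8859-1' => "\xB1", 'UTF-8' => "\xC2\xB1");
    if (!isset($plusMinus[$encoding])) {
        return false; // unknown encoding hint
    }

    // "±" is accepted in front of zero only.
    if ($input === $plusMinus[$encoding] . '0') {
        return 0;
    }

    // MINUS SIGN (U+2212) exists in UTF-8 but has no ISO-8859-1 representation.
    if ($encoding === 'UTF-8' && strncmp($input, "\xE2\x88\x92", 3) === 0) {
        $input = '-' . substr($input, 3);
    }

    // Everything else is left to the existing filter.
    return filter_var($input, FILTER_VALIDATE_INT);
}

var_dump(validate_int_hinted("\xB1" . "0", 'ISO-8859-1'));
var_dump(validate_int_hinted("\xE2\x88\x92" . "42", 'UTF-8'));
```

With the wrong hint the same bytes are rejected, which is the hidden dependency on the input
encoding made explicit.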

-- hakre

