Re: Where are we ACTUALLY on Unicode?

Date: Tue, 16 Mar 2010 11:48:00 +0000
Subject: Re: Where are we ACTUALLY on Unicode?
Groups: php.internals
On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine <lester@lsces.co.uk> wrote:
> '3' is not a very processor friendly number, so working with 4 even though
> wasteful on memory, does make perfect sense. How long is it since we had a
> 640k limit on working memory? SERVERS should have a good amount of memory
> for caching information anyway. SO is UTF-16 the right approach for
> processing wide strings? It needs special code to handle everything wider
> than 16 bits, but at what gain really? If all core functionality is handled
> as 32 bit characters is there that much of an overhead over the additional
> processing to get around strings of dissimilar sizes in UTF-16 ?

Just to reinforce some of Lester's points above.

4 bytes per character is never slower than 2 bytes per character... it's
faster if anything. Bear in mind that 4 bytes has been the de facto size
of CPU registers on 32-bit microarchitectures since... like... forever.
Give a C compiler 4 bytes of data and it'll say: thank you very much,
and more of the same please! It keeps 'em happy ;)
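
To make the "special code for everything wider than 16 bits" point
concrete, here is a rough Python sketch (nothing to do with the PHP
source; `code_units` is a helper name I made up for illustration). It
counts how many code units each encoding needs per character:

```python
def code_units(text: str, encoding: str, unit_bytes: int) -> int:
    """Count the fixed-size code units needed to encode `text`."""
    return len(text.encode(encoding)) // unit_bytes


# One character from the ASCII range, one from the rest of the BMP,
# and one outside the BMP (above U+FFFF).
samples = {
    "LATIN CAPITAL LETTER A": "A",
    "EURO SIGN": "\u20ac",
    "MUSICAL SYMBOL G CLEF": "\U0001D11E",
}

for name, ch in samples.items():
    u16 = code_units(ch, "utf-16-le", 2)  # 16-bit units, no BOM
    u32 = code_units(ch, "utf-32-le", 4)  # 32-bit units, no BOM
    print(f"U+{ord(ch):04X} {name}: UTF-16 units={u16}, UTF-32 units={u32}")
```

The character above U+FFFF takes two UTF-16 units (a surrogate pair)
but still one UTF-32 unit, so only UTF-32 lets you index the Nth
character directly.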

Sure, UTF-16 can make sense, but only if your external representations
are also in UTF-16. So what are the default Unicode settings for MySQL,
PostgreSQL, etc.? Are they set to UTF-8, or UTF-16?

Just do the same as them.
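
For a feel of what each choice costs in storage, and of the transcoding
step you take on whenever the internal and external encodings differ,
here is a small Python sketch (the `encoded_sizes` and `transcode`
names are only illustrative, not any real API):

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in each candidate encoding (no BOM)."""
    return {enc: len(text.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes between encodings: the extra step paid at every
    I/O boundary when internal and external encodings differ."""
    return data.decode(src).encode(dst)


text = "h\u00e9llo \u20ac"  # 7 characters, mostly ASCII
print(encoded_sizes(text))

# Bytes arriving from a UTF-8 source, converted to a UTF-16 internal form.
external = text.encode("utf-8")
internal = transcode(external, "utf-8", "utf-16-le")
print(len(external), "->", len(internal))
```

If the internal form matched the external one, the `transcode` call
would simply drop out.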

