Re: Where are we ACTUALLY on Unicode?

From: dreamcat four Date: Tue, 16 Mar 2010 11:48:00 +0000

Subject: Re: Where are we ACTUALLY on Unicode?

References: 1 2 3 Groups: php.internals

Request: Send a blank email to internals+get-47300@lists.php.net to get a copy of this message

On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine <lester@lsces.co.uk> wrote:
> '3' is not a very processor friendly number, so working with 4 even though
> wasteful on memory, does make perfect sense. How long is it since we had a
> 640k limit on working memory? SERVERS should have a good amount of memory
> for caching information anyway. SO is UTF-16 the right approach for
> processing wide strings? It needs special code to handle everything wider
> than 16 bits, but at what gain really? If all core functionality is handled
> as 32 bit characters is there that much of an overhead over the additional
> processing to get around strings of dissimilar sizes in UTF-16 ?

Just to reinforce some of Lester's points above.

Using 4 bytes per character is never slower than 2 bytes per character...
it's faster if anything. Bear in mind that 4 bytes has been the de facto
register size on 32-bit CPU microarchitectures since... like... forever.
Give a C compiler 4 bytes of data... it'll say: thank you very much, and
more of the same please! It keeps 'em happy ;)
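To make the "special code" point concrete, here is a minimal sketch in Python, purely for illustration. The function names and the sample string are made up for this example and have nothing to do with PHP's internals. It shows that with 4-byte units every code point is exactly one unit, while with 2-byte units anything outside the 16-bit range arrives as a surrogate pair that has to be stitched back together.

```python
import struct

def utf32_code_units(text):
    """Fixed width: one 4-byte unit per code point, no special cases."""
    data = text.encode("utf-32-le")
    return list(struct.unpack("<%dI" % (len(data) // 4), data))

def utf16_code_units(text):
    """Raw 2-byte units; code points above U+FFFF take two of them."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def utf16_code_points(units):
    """The extra work UTF-16 needs: combine surrogate pairs while walking."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out

# Hypothetical sample: 'a', the euro sign, and U+1D11E (outside 16 bits).
sample = "a\u20ac\U0001d11e"
```

With this sample, the UTF-32 list has three entries and can be indexed directly, whereas the UTF-16 list has four units and only yields three code points after the pair-combining walk.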

Sure, UTF-16 can make sense. But only if your external representations
are also in UTF-16. So what are the default Unicode settings for MySQL,
PostgreSQL, etc.? Are they usually set to UTF-8, or to UTF-16?

Just do the same as them.
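A rough sketch of why matching the external encoding matters, again in Python and again with hypothetical helper names. It assumes nothing about what any particular database actually defaults to. The only point is that when the internal and external encodings match, data passes through untouched, and otherwise every string pays for a decode and a re-encode at the boundary.

```python
def encoded_sizes(text):
    """Byte length of the same text in each candidate encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}

def to_internal(raw, external, internal):
    """Bring bytes from an external source into the internal encoding.

    If both sides agree, the bytes are returned as-is; otherwise they
    are transcoded, which costs time on every read and write.
    """
    if external.lower() == internal.lower():
        return raw
    return raw.decode(external).encode(internal)
```

For example, `encoded_sizes("h\u00e9llo")` gives 6 bytes in UTF-8, 10 in UTF-16 and 20 in UTF-32, and `to_internal(b"abc", "utf-8", "utf-8")` hands back the same bytes without any conversion.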

Thread (27 messages)

Lester Caine (Sun, 14 Mar 2010 07:28:07 +0000)
William A. Rowe Jr. (Sun, 14 Mar 2010 07:35:51 +0000)
Stan Vassilev (Sun, 14 Mar 2010 11:03:47 +0000)
Pierre Joye (Sun, 14 Mar 2010 14:00:59 +0000)
Jordi Boggiano (Sun, 14 Mar 2010 14:23:26 +0000)
Pierre Joye (Sun, 14 Mar 2010 14:33:19 +0000)
Moriyoshi Koizumi (Sun, 14 Mar 2010 14:34:24 +0000)
dreamcat four (Sun, 14 Mar 2010 14:43:00 +0000)
Alexey Zakhlestin (Mon, 15 Mar 2010 06:20:15 +0000)
Stanislav Malyshev (Mon, 15 Mar 2010 23:33:06 +0000)
Lester Caine (Tue, 16 Mar 2010 08:30:14 +0000)
dreamcat four (Tue, 16 Mar 2010 11:48:00 +0000)
Andrey Hristov (Tue, 16 Mar 2010 12:15:14 +0000)
dreamcat four (Tue, 16 Mar 2010 17:40:37 +0000)
Andrey Hristov (Tue, 16 Mar 2010 18:25:18 +0000)
Rasmus Lerdorf (Tue, 16 Mar 2010 18:32:07 +0000)
Lester Caine (Tue, 16 Mar 2010 19:03:38 +0000)
dreamcat four (Tue, 16 Mar 2010 19:05:39 +0000)
Rasmus Lerdorf (Tue, 16 Mar 2010 19:34:56 +0000)
Lester Caine (Tue, 16 Mar 2010 20:39:18 +0000)
Pierre Joye (Tue, 16 Mar 2010 19:10:56 +0000)
William A. Rowe Jr. (Tue, 16 Mar 2010 20:42:35 +0000)
Stanislav Malyshev (Tue, 16 Mar 2010 19:05:47 +0000)
Ferenc Kovacs (Tue, 16 Mar 2010 20:04:24 +0000)
dreamcat four (Tue, 16 Mar 2010 20:43:29 +0000)
Ferenc Kovacs (Tue, 16 Mar 2010 21:50:51 +0000)
Lukas Kahwe Smith (Wed, 17 Mar 2010 15:29:43 +0000)
