On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev <sv_forums@fmethod.com> wrote:
> UTF-8 also takes 4 bytes for representing characters in the higher bit
> planes, as quite a lot of bits are lost for every char in order to describe
> how long the code point is, and when it ends and so on. This means
> memory-wise it may not be of big benefit to Asian countries.
I remember Brian Aker saying that they chose to work internally with
UTF-8 for Drizzle. His explanation was that Asian countries have so
much English content mixed in that, on average, UTF-8 still had a
lower footprint than UTF-16/32 even for them. I do not know where the
stats came from, but if they hold any truth it is worth considering.
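
For anyone who wants to check this against their own data, here is a
minimal Python sketch. It is my own illustration, not something from
Brian or from Drizzle; the helper name encoded_sizes and the sample
strings are made up. It only compares the encoded byte length of a
string in UTF-8, UTF-16 and UTF-32 (no BOM).

```python
def encoded_sizes(text):
    """Return the encoded byte length of text in each encoding (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Hypothetical samples: pure CJK text, CJK text wrapped in ASCII markup,
# and a single character outside the Basic Multilingual Plane.
samples = {
    "cjk_only": "日本語のテキスト",
    "mixed": '<p class="note">日本語</p>',
    "supplementary": "\U00020000",
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

With these samples the pure CJK string is smaller in UTF-16 (16 bytes
against 24 in UTF-8), while the string with ASCII markup is smaller in
UTF-8 (29 bytes against 46). The 4-byte case from the quoted message
costs 4 bytes in all three encodings. Which effect wins on average
depends entirely on the real content mix, so it is worth running
something like this over actual data.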
Cheers,
Jordi