Makoto,
OK, thanks. Now I see. You are saying that the multibyte extension took the opposite approach: it kept the existing str*() functions byte-oriented and made the mb_str*() functions character-oriented.
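To make the byte-versus-character distinction concrete, here is a minimal sketch. It is written in Python purely as a stand-in, not PHP: a Python bytes object plays the role of a byte-oriented str*() call and a Python str plays the role of a character-oriented mb_str*() call. The sample text and variable names are my own assumptions for illustration.

```python
# Illustration only: Python stands in for PHP here (an assumption of this sketch).
# bytes ~ byte-oriented str*() semantics; str ~ character-oriented mb_str*() semantics.

text = "日本語"                   # three characters
encoded = text.encode("utf-8")   # the same text as UTF-8 bytes

byte_length = len(encoded)       # length in byte units
char_length = len(text)          # length in character units

# Taking a prefix by byte count can split a multibyte character;
# taking a prefix by character count cannot.
byte_prefix = encoded[:4]        # one whole character plus one stray byte
char_prefix = text[:2]           # exactly two whole characters

print(byte_length, char_length)
print(byte_prefix.decode("utf-8", errors="replace"), char_prefix)
```

Code that computes offsets or lengths in one unit and passes them to a function that expects the other is what has to be found and changed in each of the migration cases.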
So we have:
s2m) Migration from single-byte to multibyte: change code that needs character units to use mb*().
s2u) Migration from single-byte to (new) Unicode: change code that needs byte units to use new functions.
m2u) Migration from multibyte to Unicode: change str*() code that needs byte units to use new functions and (perhaps) change mb*() calls back to str*().
ugh!
Also, some folks are using the multibyte extension as their current Unicode solution, so case m2u additionally represents a Unicode-to-Unicode migration when the PHP version changes.
That would bear some additional consideration.
But, looking at the mb* doc, I see that the str*() functions can be overloaded to use the mb*() character semantics. If a good number of users do that, then it isn't of much consequence either way (i.e. it is no-win), and that puts us back to the original proposal.
Is that right?
Tex Texin
Internationalization Architect, Yahoo! Inc.
-----Original Message-----
From: Makoto Tozawa [mailto:makoto.tozawa@oracle.com]
Sent: Thursday, August 25, 2005 5:30 PM
To: Tex Texin
Cc: 'Andrei Zmievski'; christopher.jones@oracle.com; 'PHP Developers Mailing List'
Subject: Re: [PHP-DEV] Re: PHP Unicode support design document-keeping existing functionality
If we don't make the functions provide reasonable behavior for unicode, then every program needs to be rewritten to change function names.
I agree. I asked because the Backwards Compatibility section states the following:
"... the upgrade to Unicode-enabled PHP has to be transparent. This means that the existing data types and functions must work as they have always done."
For those functions written for single-byte encodings, the upgrade to Unicode-enabled PHP will be transparent because the character semantics remain the same. For those functions written for multibyte encodings using the mb_str*() functions, it will also be transparent.
It is okay if there is no way to save those functions written for multibyte encodings that abuse the str*() functions.
Makoto
Tex Texin wrote:
1) sorry I am compelled to change the subject so all threads
don’t look