Re: character/byte semantics was Re: PHP Unicode support design document-keeping existing functionality

From: Makoto Tozawa
Date: Fri, 26 Aug 2005 18:02:02 +0000

Subject: Re: character/byte semantics was Re: PHP Unicode support design document-keeping existing functionality

References: 1
Groups: php.internals

Request: Send a blank email to internals+get-18478@lists.php.net to get a copy of this message

Yes.

Makoto

Tex Texin wrote:

Makoto,

ok, thanks. Now I see. You are saying that the multi-byte extension took the opposite approach and made existing str*() byte oriented and the mb_str*() character oriented.
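The byte/character distinction described above can be sketched in a few lines. This is only an illustration in Python, not PHP: counting the bytes of the UTF-8 encoding stands in for a byte-oriented str*() function, and counting code points stands in for a character-oriented mb_str*() function. The sample string is a made-up example.

```python
# Hypothetical sample string: three Japanese characters.
text = "日本語"

# Character semantics: count code points, which is what a
# character-oriented length function reports for UTF-8 input.
char_count = len(text)

# Byte semantics: count the bytes of the UTF-8 encoding, which is
# what a byte-oriented length function reports for the same data.
byte_count = len(text.encode("utf-8"))

print(char_count, byte_count)
```

Code that relies on one of the two counts is the code that would need to change function names in the migrations discussed in this message.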

So we have:
s2m) Migration from singlebyte to multibyte: change code that needs character units to use mb*()

s2u) Migration from singlebyte to (new) Unicode: change code that needs byte units to use new functions.

m2u) Migration from multibyte to Unicode: change str*() code that needs byte units to use new functions and (perhaps) change mb*() functions back to str*().

Ugh! And some folks are using the multibyte extension as their current Unicode solution, so case m2u additionally represents migration from Unicode to Unicode when the PHP version changes.

That would bear some additional consideration.

But, looking at the mb* doc, the str*() functions can be overloaded to use the mb*() character semantics. If a good number of users do that, then it isn't of much consequence either way (i.e., it is no-win), and that puts us back to the original proposal.

Is that right?

Tex Texin
Internationalization Architect,   Yahoo! Inc.
  
-----Original Message-----
From: Makoto Tozawa [mailto:makoto.tozawa@oracle.com]
Sent: Thursday, August 25, 2005 5:30 PM
To: Tex Texin
Cc: 'Andrei Zmievski'; christopher.jones@oracle.com; 'PHP Developers Mailing List'
Subject: Re: [PHP-DEV] Re: PHP Unicode support design document-keeping existing functionality


If we don't make the functions provide reasonable behavior for unicode, then every program needs to be rewritten to change function names.

I agree. I asked it because the Backwards Compatibility section states the following:

"... the upgrade to Unicode-enabled PHP has to be transparent. This means that the existing data types and functions must work as they have always done."

For those functions written for single-byte encoding, the upgrade to Unicode-enabled PHP will be transparent because the character semantics remain the same. For those functions written for multi-byte encoding using mb_str*() functions, it will also be transparent.

It is okay if there is no way to save those functions written for multi-byte encoding that abuse the str*() functions.

Makoto


Tex Texin wrote:

   
1) sorry I am compelled to change the subject so all threads don't look the same.

2) It's a no-win situation. If we don't make the functions provide reasonable behavior for unicode, then every program needs to be rewritten to change function names. The number of places where hard coded constants (6) are used is probably much smaller. At least this way most code does the right thing as-is. Also, if you don't want functions to show unicode behavior, leave unicode off, and just convert the data to utf-8. We do need to have functions that provide the raw byte length, so it will be available.

Tex Texin
Internationalization Architect, Yahoo! Inc.
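The concern about hard-coded constants can be made concrete with a small sketch. It is written in Python rather than PHP and uses a made-up string; the limit of 6 mirrors the constant mentioned above. Slicing the text takes 6 characters, while slicing the UTF-8 bytes takes 6 bytes and, in this example, cuts a two-byte character in half.

```python
# Hypothetical sample: "caf" followed by three U+00E9 characters.
# That is 6 characters, but 9 bytes once encoded as UTF-8.
text = "caf\u00e9\u00e9\u00e9"
raw = text.encode("utf-8")

LIMIT = 6  # the kind of hard-coded constant the message refers to

# Character semantics: the first 6 characters (here, the whole string).
by_chars = text[:LIMIT]

# Byte semantics: the first 6 bytes, which end in the middle of a character.
by_bytes = raw[:LIMIT]

try:
    by_bytes.decode("utf-8")
    split_character = False
except UnicodeDecodeError:
    # The last byte is the lead byte of an incomplete two-byte sequence.
    split_character = True

print(len(by_chars), len(by_bytes), split_character)
```

Under either semantics a raw byte length (here `len(raw)`) remains useful, for example when sizing a buffer or a database column.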

Thread (44 messages)

Andrei Zmievski | Wed, 10 Aug 2005 07:31:30 +0000
Ron Korving | Wed, 10 Aug 2005 10:45:27 +0000 | Re: PHP Unicode support design document
Antony Dovgal | Wed, 10 Aug 2005 10:54:07 +0000 | Re: Re: PHP Unicode support design document
Andrei Zmievski | Wed, 10 Aug 2005 20:15:28 +0000
Derick Rethans | Wed, 10 Aug 2005 11:01:23 +0000 | Re: Re: PHP Unicode support design document
Christian Schneider | Wed, 10 Aug 2005 13:26:28 +0000
Andi Gutmans | Wed, 10 Aug 2005 14:26:33 +0000 | Re: Re: PHP Unicode support design document
Rasmus Lerdorf | Wed, 10 Aug 2005 14:30:38 +0000
George Schlossnagle | Wed, 10 Aug 2005 14:36:15 +0000
Ron Korving | Wed, 10 Aug 2005 14:50:17 +0000
Marcus Boerger | Wed, 10 Aug 2005 19:26:21 +0000
Ron Korving | Wed, 10 Aug 2005 19:29:36 +0000
Adam Maccabee Trachtenberg | Wed, 10 Aug 2005 20:04:20 +0000
Antony Dovgal | Wed, 10 Aug 2005 15:11:15 +0000
Andrei Zmievski | Wed, 10 Aug 2005 15:35:22 +0000
Rasmus Lerdorf | Wed, 10 Aug 2005 16:06:51 +0000
Ron Korving | Wed, 10 Aug 2005 16:57:26 +0000
Andrei Zmievski | Wed, 10 Aug 2005 15:34:38 +0000
Andrei Zmievski | Wed, 10 Aug 2005 19:37:45 +0000
Andi Gutmans | Mon, 15 Aug 2005 22:03:41 +0000
Rasmus Lerdorf | Mon, 15 Aug 2005 22:09:01 +0000
Andi Gutmans | Mon, 15 Aug 2005 22:13:01 +0000
Andrei Zmievski | Tue, 16 Aug 2005 17:28:29 +0000
Andrei Zmievski | Tue, 16 Aug 2005 17:19:05 +0000
Andrei Zmievski | Wed, 10 Aug 2005 20:10:29 +0000 | Re: Re: PHP Unicode support design document
Ondrej Ivanič | Tue, 16 Aug 2005 08:16:22 +0000
cshmoove@bellsouth.net | Tue, 16 Aug 2005 01:56:11 +0000
Andrey Hristov | Tue, 16 Aug 2005 20:07:54 +0000
l0t3k | Tue, 16 Aug 2005 20:31:43 +0000
Peter Brodersen | Tue, 16 Aug 2005 21:57:27 +0000 | Re: PHP Unicode support design document
Andrei Zmievski | Tue, 16 Aug 2005 22:14:46 +0000
Makoto Tozawa | Wed, 24 Aug 2005 02:30:04 +0000 | Re: PHP Unicode support design document
Andrei Zmievski | Wed, 24 Aug 2005 23:23:19 +0000 | Re: Re: PHP Unicode support design document
Makoto Tozawa | Thu, 25 Aug 2005 02:41:14 +0000
Tex Texin | Thu, 25 Aug 2005 08:20:26 +0000 | RE: [PHP-DEV] Re: PHP Unicode support design document-keeping existing functionality
Makoto Tozawa | Fri, 26 Aug 2005 00:30:23 +0000 | Re: Re: PHP Unicode support design document-keeping existing functionality
Tex Texin | Fri, 26 Aug 2005 10:27:22 +0000 | RE: [PHP-DEV] character/byte semantics was Re: PHP Unicode support design document-keeping existing functionality
Makoto Tozawa | Fri, 26 Aug 2005 18:02:02 +0000 | Re: character/byte semantics was Re: PHP Unicode support design document-keeping existing functionality
Tex Texin | Thu, 25 Aug 2005 08:07:48 +0000 | RE: [PHP-DEV] Re: PHP Unicode support design document
Makoto Tozawa | Thu, 25 Aug 2005 23:46:58 +0000 | Re: Re: PHP Unicode support design document
Tex Texin | Fri, 26 Aug 2005 10:27:22 +0000 | FORM accept-charset was: PHP Unicode support design document
James Aylett | Fri, 26 Aug 2005 11:26:20 +0000
Adam Maccabee Trachtenberg | Thu, 25 Aug 2005 15:06:08 +0000
Tex Texin | Thu, 25 Aug 2005 23:37:21 +0000 | RE: [PHP-DEV] Re: PHP Unicode support design document- encoding negotiation
