Re: Re: PHP Unicode support design document

From: Andi Gutmans
Date: Mon, 15 Aug 2005 22:13:01 +0000

Subject: Re: Re: PHP Unicode support design document

Groups: php.internals

Request: Send a blank email to internals+get-18149@lists.php.net to get a copy of this message

If you want to optimize, then I guess "remembering" the script_encoding is the only way to do it. We could do it in much the same way we "cache" script file names.
Another option is to just optimize for UTF-8 and use BOMs for UTF-8/UTF-16...
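To illustrate the BOM idea, here is a minimal sketch of sniffing the start of a script buffer. The names bom_kind and detect_bom are hypothetical and not part of the Zend engine; the byte sequences themselves are the standard UTF-8 and UTF-16 byte order marks. How the engine would then record the result per script is left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; nothing like this is claimed to exist in Zend. */
typedef enum {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16_BE,
    BOM_UTF16_LE
} bom_kind;

/*
 * Inspect the first bytes of a script buffer and report which BOM, if any,
 * it starts with. *bom_len receives the number of bytes to skip.
 * Note: FF FE 00 00 is also the UTF-32LE BOM; this sketch does not
 * try to tell that case apart from UTF-16LE.
 */
static bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(buf != NULL || len == 0);
    assert(bom_len != NULL);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16_LE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```

A caller would run this once when the script is read and skip bom_len bytes before handing the rest to the scanner. A script without a BOM would still need some other way to declare its encoding.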

Andi

At 03:09 PM 8/15/2005 -0700, Rasmus Lerdorf wrote:
I think the main issue here is that if your script encoding is set to
UTF-8 and you do everything in UTF-8 then these large blocks of UTF-8
are going to make a UTF-8 -> UTF-16 -> UTF-8 conversion roundtrip on
every request.  It would be nice if we could somehow avoid that.

-Rasmus

Andi Gutmans wrote:
Wouldn't it be easiest to have inline HTML become IS_UNICODE and then
not deal with the problem of remembering what the script encoding was? I
thought that's what we already do today.

Andi

At 12:37 PM 8/10/2005 -0700, Andrei Zmievski wrote:

I did not have time to write the full reply earlier so here goes.

Even if we modify the output layer to be aware of various types of
strings coming down the pipe, it would still need to know the encoding
of IS_STRING's in order to convert them to the output encoding. This
presents a particular problem for inline HTML blocks, as they are
supposed to be in the script encoding, but by the time the HTML is
sent to the output layer, we don't know what the source script
encoding was for these HTML blocks. This problem exists in the current
implementation also, because the ZEND_ECHO opcode does not keep track
of what the script encoding was. This needs to be fixed, obviously.

One approach could be to implement a separate opcode for inline HTML
blocks and store the name of the script encoding it came from in the
opcode. Then when the output layer (or whatever else) gets to it, we
can check the encoding name in the opcode vs. the output encoding and
perform transcoding if necessary. This does mean that we may need to
dynamically open and close converters on each output (if there were
different script encodings floating around), but that can be alleviated
by keeping some sort of converter cache around.
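A rough sketch of that approach, under stated assumptions: inline_html_op stands in for the proposed opcode that carries its script encoding, and conv_cache is a tiny least-recently-used cache keyed by the (from, to) encoding pair. All names here are hypothetical. A real version would hold an actual converter handle in each slot and close it on eviction; this sketch only counts how often a converter would have to be opened. Encoding names are compared case-insensitively, and aliases such as "UTF8" versus "UTF-8" are not resolved.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define CONV_CACHE_SIZE 4
#define ENC_NAME_MAX    32

/* Hypothetical stand-in for an inline-HTML opcode that remembers its source encoding. */
typedef struct {
    const char *html;
    size_t      len;
    const char *script_encoding;
} inline_html_op;

typedef struct {
    char          from[ENC_NAME_MAX];
    char          to[ENC_NAME_MAX];
    int           in_use;
    unsigned long last_used;   /* a real slot would also hold the converter handle */
} conv_entry;

typedef struct {
    conv_entry    slots[CONV_CACHE_SIZE];
    unsigned long clock;
    unsigned      opens;       /* number of times a converter had to be "opened" */
} conv_cache;

static void conv_cache_init(conv_cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* ASCII case-insensitive comparison of encoding names. */
static int enc_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the cached converter for (from, to), opening one if needed. */
static conv_entry *conv_cache_get(conv_cache *c, const char *from, const char *to)
{
    size_t i;
    conv_entry *victim;

    assert(c != NULL && from != NULL && to != NULL);
    assert(strlen(from) < ENC_NAME_MAX && strlen(to) < ENC_NAME_MAX);

    victim = &c->slots[0];
    for (i = 0; i < CONV_CACHE_SIZE; i++) {
        conv_entry *e = &c->slots[i];
        if (e->in_use && enc_equal(e->from, from) && enc_equal(e->to, to)) {
            e->last_used = ++c->clock;      /* cache hit */
            return e;
        }
        /* Prefer the first free slot; otherwise track the least recently used one. */
        if (!e->in_use) {
            if (victim->in_use)
                victim = e;
        } else if (victim->in_use && e->last_used < victim->last_used) {
            victim = e;
        }
    }

    /* Cache miss: a real implementation would close the victim's converter here. */
    strcpy(victim->from, from);
    strcpy(victim->to, to);
    victim->in_use = 1;
    victim->last_used = ++c->clock;
    c->opens++;
    return victim;
}

/* NULL means the block can be written as-is; otherwise the converter to use. */
static conv_entry *converter_for_op(conv_cache *c, const inline_html_op *op,
                                    const char *output_encoding)
{
    assert(op != NULL && op->script_encoding != NULL && output_encoding != NULL);
    if (enc_equal(op->script_encoding, output_encoding))
        return NULL;
    return conv_cache_get(c, op->script_encoding, output_encoding);
}
```

When the script encoding and the output encoding match, no converter is touched at all, which would also sidestep the conversion roundtrip for inline blocks in the all-UTF-8 case.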

I am open to other ideas.

-Andrei

On Aug 10, 2005, at 8:34 AM, Andrei Zmievski wrote:

That's not true, actually. 'echo' and 'print' resolve to ZEND_ECHO
opcode which calls zend_print_variable(), which in turn calls
zend_make_printable_zval(). Now, this last function is supposed to
take a zval and turn it into a printable string, of course, which is
then output using utility_functions->write_function aka
php_body_write(). All that function cares about is how to output a
binary string. So, if we want to bubble the conversion down to the
output layer, we probably need to change the write function so that
it takes a void* and a type and knows how to deal with them
appropriately.
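As a sketch of what such a typed write function might look like: out_type, byte_sink, typed_write and mem_sink below are all hypothetical names, with mem_sink standing in for whatever php_body_write() would ultimately write to. The sketch assumes Unicode strings arrive as native-endian UTF-16 code units and that the output encoding is UTF-8; a real version would use the configured output encoding and a proper converter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tag passed alongside the void* payload. */
typedef enum { OUT_BYTES, OUT_UTF16 } out_type;

/* Hypothetical byte-level sink, playing the role of the existing write function. */
typedef void (*byte_sink)(void *ctx, const unsigned char *buf, size_t len);

/* Simple in-memory sink used to observe what gets written. */
typedef struct {
    unsigned char buf[64];
    size_t        len;
} mem_sink;

static void mem_sink_write(void *ctx, const unsigned char *p, size_t n)
{
    mem_sink *s = ctx;
    assert(s != NULL && s->len + n <= sizeof s->buf);
    if (n > 0)
        memcpy(s->buf + s->len, p, n);
    s->len += n;
}

/* Encode one code point as UTF-8; returns the number of bytes produced. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Write `len` units of `data` to the sink. For OUT_BYTES the units are bytes
 * and pass through untouched; for OUT_UTF16 they are 16-bit code units that
 * are converted to UTF-8 on the way out.
 */
static void typed_write(const void *data, size_t len, out_type type,
                        byte_sink sink, void *ctx)
{
    assert(sink != NULL);
    assert(data != NULL || len == 0);

    if (type == OUT_BYTES) {
        sink(ctx, data, len);
        return;
    }

    const uint16_t *u = data;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = u[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            u[i + 1] >= 0xDC00 && u[i + 1] <= 0xDFFF) {
            /* Combine a surrogate pair into one supplementary code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (u[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate becomes U+FFFD */
        }
        unsigned char buf[4];
        size_t n = utf8_encode(cp, buf);
        sink(ctx, buf, n);
    }
}
```

The point of the design is that callers such as the echo path no longer flatten everything to a binary string first; the decision about how to convert is deferred to the one place that knows the output encoding.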

