I did not have time to write the full reply earlier, so here goes.
Even if we modify the output layer to be aware of various types of
strings coming down the pipe, it would still need to know the encoding
of IS_STRING values in order to convert them to the output encoding. This
presents a particular problem for inline HTML blocks, as they are
supposed to be in the script encoding, but by the time the HTML is
sent to the output layer, we don't know what the source script
encoding was for these HTML blocks. This problem also exists in the
current implementation, because the ZEND_ECHO opcode does not keep track
of what the script encoding was. This needs to be fixed, obviously.
One approach could be to implement a separate opcode for inline HTML
blocks and store the name of the script encoding it came from in the
opcode. Then when the output layer (or whatever else) gets to it, we
can check the encoding name in the opcode vs. the output encoding and
perform transcoding if necessary. This does mean that we may need to
dynamically open and close converters on each output (if there were
different script encodings floating around), but that cost can be
alleviated by keeping some sort of converter cache around.
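
To make the idea a bit more concrete, here is a rough sketch of what the
opcode data and the converter cache could look like. None of this is
existing engine code: inline_html_op, converter, converter_cache and all
of the functions are hypothetical names, and converter is just a stand-in
for a real converter handle. The encoding comparison is a plain strcmp(),
so it ignores case differences and aliases, which a real version would
have to handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a real converter handle. */
typedef struct {
    char name[32];
} converter;

/* Hypothetical payload of a dedicated inline-HTML opcode. */
typedef struct {
    const char *script_encoding; /* encoding of the script the block came from */
    const char *html;
    size_t len;
} inline_html_op;

#define CACHE_SLOTS 4

typedef struct {
    converter *slots[CACHE_SLOTS];
    int next_victim; /* round-robin eviction index */
    int opens;       /* number of converters actually opened */
} converter_cache;

static converter *converter_open(const char *name)
{
    if (strlen(name) >= sizeof ((converter *)0)->name)
        return NULL; /* name too long for this sketch */
    converter *c = malloc(sizeof *c);
    if (c != NULL)
        strcpy(c->name, name);
    return c;
}

static void converter_close(converter *c)
{
    free(c);
}

/* Return the cached converter for `name`, opening one only on a miss. */
static converter *cache_get(converter_cache *cache, const char *name)
{
    assert(cache != NULL && name != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache->slots[i] && strcmp(cache->slots[i]->name, name) == 0)
            return cache->slots[i];
    }
    converter *c = converter_open(name);
    if (c == NULL)
        return NULL;
    cache->opens++;
    int slot = cache->next_victim;
    cache->next_victim = (slot + 1) % CACHE_SLOTS;
    if (cache->slots[slot])
        converter_close(cache->slots[slot]); /* evict the old entry */
    cache->slots[slot] = c;
    return c;
}

/* NULL means "no transcoding needed" (or the converter could not be opened). */
static converter *converter_for(converter_cache *cache,
                                const inline_html_op *op,
                                const char *output_encoding)
{
    if (strcmp(op->script_encoding, output_encoding) == 0)
        return NULL;
    return cache_get(cache, op->script_encoding);
}

static void cache_destroy(converter_cache *cache)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache->slots[i])
            converter_close(cache->slots[i]);
        cache->slots[i] = NULL;
    }
}
```

With something like this, two inline HTML blocks from scripts in the same
encoding would share one converter, and a block whose script encoding
already matches the output encoding would skip the cache entirely.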
I am open to other ideas.
-Andrei
On Aug 10, 2005, at 8:34 AM, Andrei Zmievski wrote:
That's not true, actually. 'echo' and 'print' resolve to the ZEND_ECHO
opcode, which calls zend_print_variable(), which in turn calls
zend_make_printable_zval(). Now, this last function is supposed to
take a zval and turn it into a printable string, of course, which is
then output using utility_functions->write_function aka
php_body_write(). All that function cares about is how to output a
binary string. So, if we want to bubble the conversion down to the
output layer, we probably need to change the write function so that
it takes a void* and a type and knows how to deal with them
appropriately.
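
As a rough illustration of a write function that takes a void* and a
type, something along these lines might work. This is only a sketch, not
the actual php_body_write() signature: write_type, out_sink and
typed_write() are hypothetical names, the sink is a fixed buffer instead
of the real output layer, and the output encoding is assumed to be UTF-8
purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags for what comes down the pipe. */
typedef enum { WRITE_BINARY, WRITE_UTF16 } write_type;

/* Hypothetical stand-in for the output layer: a fixed buffer. */
typedef struct {
    unsigned char buf[256];
    size_t used;
} out_sink;

static int sink_put(out_sink *s, const void *bytes, size_t n)
{
    if (n > sizeof s->buf - s->used)
        return -1; /* would overflow the sink */
    memcpy(s->buf + s->used, bytes, n);
    s->used += n;
    return 0;
}

/* Encode one code point as UTF-8; returns the byte count (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/*
 * len is in bytes for WRITE_BINARY and in 16-bit code units for
 * WRITE_UTF16. Binary data passes through untouched; UTF-16 data is
 * converted to the (assumed) UTF-8 output encoding.
 */
static int typed_write(out_sink *s, const void *data, size_t len,
                       write_type type)
{
    assert(s != NULL && data != NULL);
    if (type == WRITE_BINARY)
        return sink_put(s, data, len);

    const uint16_t *u = data;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = u[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            u[i + 1] >= 0xDC00 && u[i + 1] <= 0xDFFF) {
            /* Combine a surrogate pair into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (u[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate: substitute U+FFFD */
        }
        unsigned char enc[4];
        size_t n = utf8_encode(cp, enc);
        if (sink_put(s, enc, n) != 0)
            return -1;
    }
    return 0;
}
```

The point is only that the caller hands over the data together with its
type, and the conversion decision is made in one place at the bottom of
the output path.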