Andrei,
It was controlled by an ini setting. Certain APIs take or return offsets,
so in those instances the offsets were translated depending on the setting.
Here's an example (it's not currently implemented this way, though). Since
my concern was only the extension, I didn't touch the engine itself.
Pardon the formatting.
<code>
/* {{{ proto long BreakIterator::next([long offset]) */
static ZEND_BEGIN_ARG_INFO_EX(arginfo_breakiterator_next, 0, 0, 0)
    ZEND_ARG_INFO(0, offset)
ZEND_END_ARG_INFO()

BREAKITERATOR_METHOD(next)
{
    php_breakiterator_obj *obj =
        (php_breakiterator_obj *)zend_object_store_get_object(getThis() TSRMLS_CC);
    BreakIterator *iter = (BreakIterator *)obj->ptr;
    UnicodeString *text = obj->text;
    long offset, result;

    if (0 == ZEND_NUM_ARGS()) {
        offset = (long)iter->next();
    } else {
        long start = 0;

        if (FAILURE == zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "l", &start)) {
            return;
        }
        if (ICUG(codepoint_semantics)) {
            /* translate the caller's code-point index into a code-unit offset */
            FROM_CODEPOINT_INDEX(text->getBuffer(), text->length(), start, offset);
            offset = (long)iter->next(offset);
        } else {
            /* code-unit semantics: pass the value straight through to ICU */
            offset = (long)iter->next(start);
        }
    }

    /* DONE (-1) is not a position in the text, so never translate it */
    if (ICUG(codepoint_semantics) && offset != BreakIterator::DONE) {
        /* translate ICU's code-unit result back into a code-point index */
        TO_CODEPOINT_INDEX(text->getBuffer(), text->length(), offset, result);
        RETURN_LONG(result);
    } else {
        RETURN_LONG(offset);
    }
}
/* }}} */
</code>
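The FROM_CODEPOINT_INDEX and TO_CODEPOINT_INDEX macros aren't shown above. As a rough sketch, assuming they walk a UTF-16 buffer and count each well-formed surrogate pair as a single code point, the two conversions might look like this. The names cp_to_cu_index, cu_to_cp_index and cp_width are hypothetical, not part of ICU or the extension. Both walks are linear in the length of the text, which is the cost that switching codepoint_semantics off avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not ICU API: one guess at what the
 * FROM_CODEPOINT_INDEX / TO_CODEPOINT_INDEX macros might compute. */

static int is_lead(uint16_t cu)  { return cu >= 0xD800 && cu <= 0xDBFF; }
static int is_trail(uint16_t cu) { return cu >= 0xDC00 && cu <= 0xDFFF; }

/* Width in code units of the code point starting at buf[i]:
 * 2 for a well-formed surrogate pair, otherwise 1. */
static long cp_width(const uint16_t *buf, long len, long i)
{
    return (is_lead(buf[i]) && i + 1 < len && is_trail(buf[i + 1])) ? 2 : 1;
}

/* Code-point index -> UTF-16 code-unit offset.
 * Returns -1 if cp_index is negative or past the end of the text. */
static long cp_to_cu_index(const uint16_t *buf, long len, long cp_index)
{
    long cu = 0;

    assert(buf != NULL || len == 0);
    if (cp_index < 0)
        return -1;
    while (cp_index > 0 && cu < len) {
        cu += cp_width(buf, len, cu);
        cp_index--;
    }
    return cp_index == 0 ? cu : -1;
}

/* UTF-16 code-unit offset -> code-point index, by counting the code
 * points that start before the offset. Returns -1 if out of range. */
static long cu_to_cp_index(const uint16_t *buf, long len, long cu_index)
{
    long cu = 0, cp = 0;

    assert(buf != NULL || len == 0);
    if (cu_index < 0 || cu_index > len)
        return -1;
    while (cu < cu_index) {
        cu += cp_width(buf, len, cu);
        cp++;
    }
    return cp;
}
```

For the text "a", U+1F600, "b" (code units 0x0061, 0xD83D, 0xDE00, 0x0062), code-point index 2 maps to code-unit offset 3, and offset 3 maps back to index 2.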
clayton
"Andrei Zmievski" <andrei@gravitonic.com> wrote in message
news:F269F06B-C34A-4BE0-A486-9C0AAC9CA2DF@gravitonic.com...
> And this was controlled how and from where?
>
> -Andrei
>
>
> On Aug 14, 2005, at 12:29 PM, <cshmoove@bellsouth.net> wrote:
>
>> Back in the early days of the extension, I had a request global,
>> ICUG(codepoint_semantics), which controlled this. Setting it to false
>> would revert to code-unit indexing (which is what ICU uses internally).
>>
>> clayton
>>
>> "Andrei Zmievski" <andrei@gravitonic.com> wrote in message
>> news:133A42C8-184B-49F3-8EEA-3B595B4BD062@gravitonic.com...
>>
>>>
>>> Then why don't we put our collective brains together and think of a
>>> solution for this that does not involve hacks?
>>>
>>> -Andrei
>>>
>>> On Aug 14, 2005, at 3:51 AM, Derick Rethans wrote:
>>>
>>>>
>>>> In quite a few cases I'm sure there are no surrogates in the text
>>>> I'm parsing. Having to rescan the string for every access to a
>>>> character is not really wanted.
>>>>
>>>> Derick
>>>>
>>>>
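One possible answer to Derick's objection, sketched under assumptions that nobody in the thread proposed: scan the text once when it is set, remember whether it contains any surrogate code units, and make code-point indexing an O(1) identity when it doesn't. The cached_text type and its functions are hypothetical, and the sketch assumes the buffer is not modified after cached_text_init().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: has_surrogates is computed once at init time,
 * so later conversions can skip the rescan for surrogate-free text.
 * Assumes buf is not modified after cached_text_init(). */
typedef struct {
    const uint16_t *buf;
    long len;
    int has_surrogates;
} cached_text;

static void cached_text_init(cached_text *t, const uint16_t *buf, long len)
{
    long i;

    assert(buf != NULL || len == 0);
    t->buf = buf;
    t->len = len;
    t->has_surrogates = 0;
    for (i = 0; i < len; i++) {          /* one O(n) pass per string */
        if (buf[i] >= 0xD800 && buf[i] <= 0xDFFF) {
            t->has_surrogates = 1;
            break;
        }
    }
}

/* Code-point index -> code-unit offset; -1 if out of range. */
static long cached_cp_to_cu(const cached_text *t, long cp_index)
{
    long cu = 0;

    if (cp_index < 0)
        return -1;
    if (!t->has_surrogates)              /* no surrogates: identity, O(1) */
        return cp_index <= t->len ? cp_index : -1;
    while (cp_index > 0 && cu < t->len) {  /* slow path: O(n) scan */
        if (t->buf[cu] >= 0xD800 && t->buf[cu] <= 0xDBFF && cu + 1 < t->len
            && t->buf[cu + 1] >= 0xDC00 && t->buf[cu + 1] <= 0xDFFF)
            cu += 2;                     /* surrogate pair = one code point */
        else
            cu += 1;
        cp_index--;
    }
    return cp_index == 0 ? cu : -1;
}
```

The trade-off is one pass per string up front; after that, surrogate-free text pays nothing extra per access, while text that does contain surrogates still falls back to the linear scan.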