The main problem is the way the trim() function works.
One cannot reliably trim a substring using this function, because it treats its second argument as a collection of characters, each of which is trimmed from the string independently. For example, trim("Hello Webbs&nbsp;", "&nbsp;"); will remove the trailing "b" and "s" from the name as well, returning just "Hello We". Therefore, such a task can be reliably done with a regular expression only.
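The difference can be seen in a minimal sketch that compares trim() with an anchored regular expression on the same input. The variable names are illustrative only.

```php
<?php
// trim() treats its second argument as a set of single characters,
// not as a substring to be removed.
$input = "Hello Webbs&nbsp;";

// The characters &, n, b, s, p and ; are each stripped from both ends.
$charTrim = trim($input, "&nbsp;");

// An anchored regex removes only whole "&nbsp;" tokens.
$regexTrim = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $input);

var_dump($charTrim);  // string(8) "Hello We"
var_dump($regexTrim); // string(11) "Hello Webbs"
```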
The same goes for multi-byte characters, such as the non-breaking space. Each byte of such a character is trimmed separately, which may corrupt the original string (e.g. trim("· Hello world", "\xC2\xA0"); strips the first byte of "·" and leaves invalid UTF-8). The good news is that there are options to solve this problem:
- with the u modifier, the \s meta character in PHP regex recognizes the non-breaking space as well
- starting from PHP 8.4, there is the mb_trim() function, which is not only multi-byte safe, but also trims the non-breaking space character by default (along with many other space characters).
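To illustrate the corruption and the first of the two options, here is a small sketch. It assumes UTF-8 input and uses preg_match('//u', ...) as a quick validity check, which is a common idiom rather than anything prescribed above.

```php
<?php
// "·" (U+00B7) is encoded as \xC2\xB7, the non-breaking space (U+00A0) as \xC2\xA0.
$text  = "\xC2\xB7 Hello world";
$input = "\xC2\xA0" . $text; // one leading non-breaking space

// trim() strips the bytes \xC2 and \xA0 one by one, so it also eats
// the first byte of "·" and leaves a lone \xB7 behind.
$broken = trim($input, "\xC2\xA0");

// With the u modifier the pattern works on whole characters.
$safe = preg_replace('~^\s+|\s+$~u', '', $input);

var_dump(preg_match('//u', $broken) === 1); // bool(false): no longer valid UTF-8
var_dump($safe === $text);                  // bool(true)
```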
Below you will find recipes for various use cases.
Trimming literal HTML entities from a string
$before = "&nbsp;abc&nbsp;xyz&nbsp;";
$after = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $before);
var_dump($before, $after);
Trimming the "non-breaking space" sequences (\xC2\xA0) only:
$before = html_entity_decode("&nbsp;abc&nbsp;xyz&nbsp;");
$after = preg_replace('~^(?:\xC2\xA0)+|(?:\xC2\xA0)+$~', '', $before);
Trimming non-breaking spaces along with other whitespace characters:
PHP < 8.4
$after = preg_replace('~^\s+|\s+$~u', '', $before);
Note the use of the u pattern modifier. As mentioned earlier, one effect of this modifier is that it changes the meaning of the \s escape sequence from "any whitespace character" (equivalent to [\r\n\t\f\v ]) to "any kind of invisible character" (equivalent to [\p{Z}\h\v]), which is significantly broader.
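As a quick check of what \s covers under the u modifier, the sketch below trims "abc" wrapped in three different Unicode space characters. The sample set is an arbitrary choice for illustration, not an exhaustive list.

```php
<?php
// Each sample is "abc" wrapped in one kind of Unicode space (UTF-8 encoded).
$samples = [
    'NO-BREAK SPACE (U+00A0)'    => "\u{00A0}abc\u{00A0}",
    'EM SPACE (U+2003)'          => "\u{2003}abc\u{2003}",
    'IDEOGRAPHIC SPACE (U+3000)' => "\u{3000}abc\u{3000}",
];

$results = [];
foreach ($samples as $name => $before) {
    // With the u modifier, \s matches all three characters.
    $results[$name] = preg_replace('~^\s+|\s+$~u', '', $before);
    printf("%-28s => %s\n", $name, var_export($results[$name], true));
}
```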
PHP >= 8.4
$after = mb_trim($before);
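If the same code has to run on both sides of the 8.4 boundary, the two recipes can be wrapped in one function. This is only a sketch: trim_unicode_space() is a hypothetical name, and the two branches do not strip exactly the same character set, so treat them as close rather than identical.

```php
<?php
// Hypothetical helper, not part of PHP itself.
function trim_unicode_space(string $text): string
{
    if (function_exists('mb_trim')) {
        return mb_trim($text); // PHP >= 8.4
    }
    // Older versions: fall back to the regex recipe.
    // preg_replace() returns null on invalid UTF-8, so keep the input then.
    return preg_replace('~^\s+|\s+$~u', '', $text) ?? $text;
}

var_dump(trim_unicode_space("\u{00A0} abc\t\u{00A0}")); // string(3) "abc"
```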
Trimming both "non-breaking space" sequences and HTML entities
Here, you have to add them both to the regex.
It must be understood that there is no way to reliably chain two trimming functions (e.g. trim(preg_replace(...))), because trimming has to be done strictly in one pass: whichever function runs first cannot see the characters that become leading or trailing only after the other one has run. Hence preg_replace('~^(&nbsp;)*(.*?)(&nbsp;)*$~', '$2', trim("&nbsp; abc")); will leave the leading space intact. Therefore, if you need to remove both substrings and multi-byte characters, a regex is still the only option.
$before = html_entity_decode("&nbsp;")."&nbsp;abc&nbsp;xyz&nbsp;";
$pattern = '~^(?:\xC2\xA0|&nbsp;)*(.*?)(?:\xC2\xA0|&nbsp;)*$~';
$after = preg_replace($pattern, '$1', $before);
var_dump($before, $after);
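The one-pass requirement is easy to verify with a regular space and a literal entity. The following sketch compares a chained call with a single pattern. The input string is made up for the demonstration.

```php
<?php
$before = "&nbsp; abc"; // a literal entity first, then a regular space

// Chained: trim() runs first and finds nothing to strip, because the
// string starts with "&". The space exposed later is never revisited.
$chained = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', trim($before));

// One pass: both tokens are alternatives inside the same anchored group.
$onePass = preg_replace('~^(?:&nbsp;| )+|(?:&nbsp;| )+$~', '', $before);

var_dump($chained); // string(4) " abc"
var_dump($onePass); // string(3) "abc"
```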
Trimming "non-breaking space" sequences, HTML entities and regular whitespace characters
$before = " ".html_entity_decode("&nbsp;")."&nbsp;abc \n ";
$pattern = '~^(?:\xC2\xA0|&nbsp;| |\r|\n|\t)*(.*?)(?:\xC2\xA0|&nbsp;| |\r|\n|\t)*$~';
$after = preg_replace($pattern, '$1', $before);
var_dump($before, $after);
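When the list of things to trim grows, the pattern can be generated instead of written by hand. The sketch below is one possible way to do it. trim_tokens() is a hypothetical helper, and it uses the two-alternative form (^...+|...+$) from the first recipe rather than the capture-group form shown just above.

```php
<?php
// Hypothetical helper: trims any of the given literal tokens from both
// ends of $text in a single regex pass.
function trim_tokens(string $text, array $tokens): string
{
    if ($tokens === []) {
        return $text;
    }
    // Escape every token so it is matched literally, including the ~ delimiter.
    $quoted = array_map(function ($token) {
        return preg_quote($token, '~');
    }, $tokens);
    $alt = implode('|', $quoted);

    // D makes $ match only at the very end of the string.
    $pattern = '~^(?:' . $alt . ')+|(?:' . $alt . ')+$~D';

    return preg_replace($pattern, '', $text) ?? $text;
}

$before = " \xC2\xA0&nbsp;abc \n ";
$after  = trim_tokens($before, ["\xC2\xA0", '&nbsp;', ' ', "\r", "\n", "\t"]);
var_dump($after); // string(3) "abc"
```

Because the pattern only anchors on the two ends, text containing line breaks in the middle is handled as well.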