Revisions to How to trim &nbsp; (or non-breaking space) in PHP?

added 318 characters in body

edited May 30, 2025 at 2:36

Note the use of the u pattern modifier. As mentioned earlier, one effect of this modifier is that it changes the meaning of the \s escape sequence from "any whitespace character" (equivalent to [\r\n\t\f\v ]) to "any kind of invisible character" (equivalent to [\p{Z}\h\v]), which is significantly broader.
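As a rough illustration of that effect, the sketch below matches the UTF-8 bytes of a non-breaking space against \s with and without the u modifier. The variable names are made up for this example, and the result without u assumes the process has not switched to a locale in which the lead byte \xC2 counts as whitespace.

```php
<?php
// U+00A0 NO-BREAK SPACE, encoded as two bytes in UTF-8.
$nbsp = "\xC2\xA0";

// Without u the subject is matched byte by byte; \xC2 is not a whitespace
// byte (assumption: no unusual locale is in effect), so the match fails.
$withoutU = preg_match('~^\s+$~', $nbsp);

// With u the two bytes are read as one code point, which \s accepts.
$withU = preg_match('~^\s+$~u', $nbsp);

var_dump($withoutU, $withU);
```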

PHP >= 8.4


added 38 characters in body

edited May 19, 2025 at 19:49

Gras Double

with the u modifier, the \s meta character in PHP regex recognizes the non-breaking space as well
starting from PHP 8.4, there is the mb_trim() function, which is not only multi-byte safe but also trims the non-breaking space character by default (along with many other space characters).

Trimming literal `&nbsp;` HTML entities from a string

$before = "&nbsp;abc&nbsp;xyz&nbsp;"; 
$after = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $before);
var_dump($before, $after);
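To see what the repeated group changes, here is a small comparison on an input with two entities at each end. The sample string and variable names are made up for this illustration.

```php
<?php
$before = "&nbsp;&nbsp;abc&nbsp;&nbsp;";

// Un-repeated form: removes at most one entity per end.
$single = preg_replace('~^&nbsp;|&nbsp;$~', '', $before);

// Repeated non-capturing group: removes the whole run at each end.
$repeated = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $before);

var_dump($single, $repeated);
```

The first call leaves one entity on each side ("&nbsp;abc&nbsp;"), while the second returns "abc".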

Trimming the "non-breaking space" sequences (`\xC2\xA0`) only:

$before = html_entity_decode("&nbsp;abc&nbsp;xyz&nbsp;"); 
$after = preg_replace('~^(?:\xC2\xA0)+|(?:\xC2\xA0)+$~', '', $before);

Trimming non-breaking spaces along with other whitespace characters:

Trimming both "non-breaking space" sequences and HTML entities

$before = html_entity_decode("&nbsp;")."&nbsp;abc&nbsp;xyz&nbsp;"; 
$pattern = '~^(?:\xC2\xA0|&nbsp;)*(.*?)(?:\xC2\xA0|&nbsp;)*$~';
$after = preg_replace($pattern, '$1', $before);
var_dump($before, $after);

Trimming "non-breaking space" sequences, HTML entities and regular whitespace characters

$before = "&nbsp;&nbsp; ".html_entity_decode("&nbsp;")."&nbsp; abc \n&nbsp; ";
$pattern = '~^(?:\xC2\xA0|&nbsp;| |\r|\n|\t)*(.*?)(?:\xC2\xA0|&nbsp;| |\r|\n|\t)*$~';
$after = preg_replace($pattern, '$1', $before);
var_dump($before, $after);


deleted 3 characters in body

edited May 19, 2025 at 19:35

Gras Double

The main problem is the way the trim() function works.

One cannot reliably trim a substring using this function, because it treats its second argument as a collection of characters, each of which is trimmed from the string. For example, trim("Hello Webbs&nbsp;", "&nbsp;"); will remove the trailing "b" and "s" from the name as well, returning just "Hello We". Therefore, such a task can be done reliably only with a regular expression.
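A minimal sketch of that behaviour, assuming the second argument in the example is the literal six-character entity &nbsp; (which is what makes the "Hello We" result come out). The variable names are illustrative only.

```php
<?php
$name = "Hello Webbs&nbsp;";

// trim() reads "&nbsp;" as the character set {&, n, b, s, p, ;} and keeps
// stripping trailing characters from that set, including "bbs".
$charList = trim($name, "&nbsp;");

// A regex anchored at both ends removes the entity as a whole substring.
$regex = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $name);

var_dump($charList, $regex);
```

Here trim() returns "Hello We", while the regex version returns "Hello Webbs".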

The same goes for multi-byte characters, such as the non-breaking space. Each byte of such a character is trimmed separately, which may corrupt the original string (e.g. trim("· Hello world", "\xC2\xA0");). The good news is that there are options to solve this problem:
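The corruption can be reproduced with the example string from the paragraph above. The sketch writes the middle dot as explicit bytes so that it does not depend on the encoding of the source file, and it uses preg_match() with the u modifier purely as a UTF-8 validity check.

```php
<?php
// "·" (U+00B7 MIDDLE DOT) is \xC2\xB7 in UTF-8, so it shares its lead byte
// with the non-breaking space (\xC2\xA0).
$text = "\xC2\xB7 Hello world";

// trim() strips the single byte \xC2 and stops at \xB7.
$trimmed = trim($text, "\xC2\xA0");

// The first byte is now an orphaned continuation byte.
var_dump(bin2hex($trimmed[0]));

// preg_match() with the u modifier returns false for invalid UTF-8 input.
var_dump(preg_match('~~u', $trimmed));
```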

with the u modifier, the \s meta character in PHP regex recognizes the non-breaking space as well
starting from PHP 8.4, there is the mb_trim() function, which is not only multi-byte safe but also trims the non-breaking space character by default (along with many other space characters).

Below you will find recipes for various use cases.

Trimming literal `&nbsp;` HTML entity from a string

$before = "&nbsp;abc&nbsp;xyz&nbsp;"; 
$after = preg_replace('~^&nbsp;|&nbsp;$~', '', $before);
var_dump($before, $after);

Trimming the "non-breaking space" sequence (`\xC2\xA0`) only:

$before = html_entity_decode("&nbsp;abc&nbsp;xyz&nbsp;"); 
$after = preg_replace('~^\xC2\xA0|\xC2\xA0$~', '', $before);

Trimming non-breaking space along with other whitespace characters:

PHP < 8.4

$after = preg_replace('~^\s+|\s+$~u', '', $before);

PHP >= 8.4

$after = mb_trim($before);
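If the same code has to run on both sides of the 8.4 boundary, the two variants could be wrapped in one function. This is only a sketch: trim_unicode_space() is a hypothetical name, and the two branches are not guaranteed to strip exactly the same set of characters, since mb_trim() has its own default list while the fallback relies on \s.

```php
<?php
// Hypothetical helper; not part of PHP or of the original answer.
function trim_unicode_space(string $s): string
{
    if (function_exists('mb_trim')) {
        // PHP >= 8.4 with the mbstring extension loaded.
        return mb_trim($s);
    }
    // Older PHP: Unicode-aware \s via the u modifier. preg_replace() returns
    // null on invalid UTF-8, in which case the input is returned unchanged.
    return preg_replace('~^\s+|\s+$~u', '', $s) ?? $s;
}

var_dump(trim_unicode_space("\xC2\xA0 abc\t\xC2\xA0"));
```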

Trimming both "non-breaking space" sequence and HTML entity

Here, you have to add them both to the regex.

It must be understood that there is no way to reliably chain two trimming functions (e.g. trim(preg_replace(...))), because trimming has to be done strictly in one pass: the first function won't see the characters left exposed after the second one has run, and vice versa. Hence preg_replace('~^(&nbsp;)*(.*?)(&nbsp;)*$~', '$2', trim("&nbsp; abc")); will leave the leading space intact. Therefore, if you need to remove both substrings and multi-byte characters, regex is still the only option.
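The chaining problem can be checked directly. The sketch assumes the input is the entity followed by a regular space and then the text, and contrasts the chained call with a single pattern that handles both kinds of padding; the one-pass pattern is an illustration, not the answer's own recipe.

```php
<?php
$input = "&nbsp; abc";

// Chained: trim() runs first and finds nothing to remove, because the string
// starts with "&". The regex then strips the entity and exposes a space that
// nothing is left to remove.
$chained = preg_replace('~^(&nbsp;)*(.*?)(&nbsp;)*$~', '$2', trim($input));

// One pass: the entity and whitespace are alternatives in the same group.
$onePass = preg_replace('~^(?:&nbsp;|\s)+|(?:&nbsp;|\s)+$~', '', $input);

var_dump($chained, $onePass);
```

The chained call returns " abc" with the leading space still in place; the one-pass pattern returns "abc".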

$before = html_entity_decode("&nbsp;")."&nbsp;abc&nbsp;xyz&nbsp;"; 
$pattern = '~^(\xC2\xA0|&nbsp;)*(.*?)(\xC2\xA0|&nbsp;)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);

Trimming "non-breaking space" sequence, HTML entity and regular whitespace characters

$before = "&nbsp;&nbsp; ".html_entity_decode("&nbsp;")."&nbsp; abc \n&nbsp; ";
$pattern = '~^(\xC2\xA0|&nbsp;| |\r|\n|\t)*(.*?)(\xC2\xA0|&nbsp;| |\r|\n|\t)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);
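The recipes above all follow the same shape, so they could be folded into one helper that builds the alternation from a list of substrings. This is a sketch under assumptions: trim_tokens() is a hypothetical name, every token is expected to be a non-empty string, and the pattern uses the anchored-alternation form rather than the capture-group form shown above.

```php
<?php
// Hypothetical helper, not from the original answer: trims any of the given
// substrings from both ends of $s in a single pass.
function trim_tokens(string $s, array $tokens): string
{
    if ($tokens === []) {
        return $s;
    }
    // preg_quote() escapes regex metacharacters and the ~ delimiter.
    $alt = implode('|', array_map(fn ($t) => preg_quote($t, '~'), $tokens));
    // D makes $ match only at the very end, not before a final newline.
    return preg_replace('~^(?:' . $alt . ')+|(?:' . $alt . ')+$~D', '', $s);
}

$before = "&nbsp;&nbsp; \xC2\xA0&nbsp; abc \n&nbsp; ";
$after = trim_tokens($before, ["\xC2\xA0", "&nbsp;", " ", "\r", "\n", "\t"]);
var_dump($before, $after);
```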


added 32 characters in body; added 10 characters in body

edited Aug 12, 2024 at 14:11

Your Common Sense

deleted 3 characters in body

edited May 23, 2024 at 20:54

hakre

added 393 characters in body

edited May 23, 2024 at 9:52

Your Common Sense

added 820 characters in body

edited May 23, 2024 at 8:56

Your Common Sense

added 9 characters in body; added 28 characters in body

edited May 23, 2024 at 7:41

Your Common Sense

deleted 44 characters in body; deleted 2 characters in body; deleted 16 characters in body

edited May 22, 2024 at 12:04

Your Common Sense

created May 22, 2024 at 10:06

Your Common Sense
