added 318 characters in body
Gras Double

Note the use of the u pattern modifier. As mentioned earlier, one effect of this modifier is that it changes the meaning of the \s escape sequence from "any ASCII whitespace character" (equivalent to [\r\n\t\f\v ]) to "any Unicode whitespace or separator character" (equivalent to [\p{Z}\h\v]), which is significantly broader and includes the non-breaking space.
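A minimal sketch of that difference, assuming a UTF-8 encoded string; the variable names are illustrative and not part of the answer. Without u the outcome depends on the character tables PCRE is using, so no result is claimed for that case.

```php
<?php
// Illustrative input: text wrapped in UTF-8 non-breaking spaces (U+00A0).
$nbsp   = "\xC2\xA0";
$before = $nbsp . " abc " . $nbsp;

// Without u, the subject is matched byte by byte, so the NBSP bytes
// are normally not treated as whitespace.
$withoutU = preg_replace('~^\s+|\s+$~', '', $before);

// With u, \s also covers Unicode separators such as U+00A0.
$withU = preg_replace('~^\s+|\s+$~u', '', $before);

var_dump(bin2hex($withoutU)); // inspect which bytes survived
var_dump($withU);             // string(3) "abc"
```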

added 38 characters in body
Gras Double

Trimming literal &nbsp; HTML entities from a string

$before = "&nbsp;abc&nbsp;xyz&nbsp;";
$after = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $before);
var_dump($before, $after);

Trimming the "non-breaking space" sequences (\xC2\xA0) only:

$before = html_entity_decode("&nbsp;abc&nbsp;xyz&nbsp;");
$after = preg_replace('~^(?:\xC2\xA0)+|(?:\xC2\xA0)+$~', '', $before);

Trimming non-breaking spaces along with other whitespace characters:

Trimming both "non-breaking space" sequences and HTML entities

$before = html_entity_decode("&nbsp;")."&nbsp;abc&nbsp;xyz&nbsp;";
$pattern = '~^(?:\xC2\xA0|&nbsp;)*(.*?)(?:\xC2\xA0|&nbsp;)*$~';
$after = preg_replace($pattern, '$1', $before);
var_dump($before, $after);

Trimming "non-breaking space" sequences, HTML entities and regular whitespace characters

$before = "&nbsp;&nbsp; ".html_entity_decode("&nbsp;")."&nbsp; abc \n&nbsp; ";
$pattern = '~^(?:\xC2\xA0|&nbsp;| |\r|\n|\t)*(.*?)(?:\xC2\xA0|&nbsp;| |\r|\n|\t)*$~';
$after = preg_replace($pattern, '$1', $before);
var_dump($before, $after);
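
The recipes above could also be wrapped in a small reusable function. The sketch below is only an illustration: trim_substrings() is a hypothetical helper, not a PHP built-in and not part of the answer. It builds the alternation with preg_quote() and anchors it at both ends instead of capturing the middle with a dot, so it should also cope with strings that contain line breaks.

```php
<?php
// Hypothetical helper (assumption, not an existing PHP function):
// removes any run of the given substrings from both ends of $subject.
function trim_substrings(string $subject, array $substrings): string
{
    // Escape each substring so it is matched literally inside the pattern.
    $alternatives = implode('|', array_map(
        fn (string $s): string => preg_quote($s, '~'),
        $substrings
    ));

    // D makes $ match only at the very end of the subject.
    $pattern = '~^(?:' . $alternatives . ')+|(?:' . $alternatives . ')+$~D';

    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '', $subject) ?? $subject;
}

$before = "&nbsp;&nbsp; " . html_entity_decode("&nbsp;") . "&nbsp; abc \n&nbsp; ";
$after  = trim_substrings($before, ["\xC2\xA0", "&nbsp;", " ", "\r", "\n", "\t"]);
var_dump($after); // string(3) "abc"
```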
deleted 3 characters in body
Gras Double

The main problem is the way the trim() function works.

One cannot reliably trim a substring using this function, because it takes its second argument as a collection of characters, each of which would be trimmed from the string. For example, trim("Hello Webbs&nbsp;", "&nbsp;"); will remove trailing "b" and "s" from the name as well, returning just "Hello We". Therefore, such a task can be reliably done with a regular expression only.
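
A short sketch of that pitfall, using the example string from the paragraph above; the variable names are illustrative only.

```php
<?php
$name = "Hello Webbs&nbsp;";

// trim() treats "&nbsp;" as the character set { &, n, b, s, p, ; }
// and strips every trailing character that belongs to it.
$viaTrim = trim($name, "&nbsp;");

// A regex removes the entity only as a whole substring.
$viaRegex = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $name);

var_dump($viaTrim);  // string(8) "Hello We"
var_dump($viaRegex); // string(11) "Hello Webbs"
```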

The same goes for multi-byte characters, such as the non-breaking space. Each byte of such a character is trimmed separately, which may corrupt the original string (e.g. trim("· Hello world", "\xC2\xA0");). The good news is that there are ways to solve this problem.
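
The byte-wise corruption can be reproduced with the example above. This is a sketch that assumes the source string is UTF-8, where the middle dot "·" is encoded as \xC2\xB7 and therefore shares its first byte with the non-breaking space.

```php
<?php
$text = "\xC2\xB7 Hello world"; // "· Hello world" in UTF-8

// trim() strips the leading \xC2 byte because it is in the mask,
// leaving an orphaned \xB7 continuation byte behind.
$trimmed = trim($text, "\xC2\xA0");

var_dump(bin2hex(substr($trimmed, 0, 2))); // string(4) "b720"
var_dump(preg_match('//u', $trimmed));     // bool(false): no longer valid UTF-8

// Matching the two bytes as one sequence leaves the string untouched.
$safe = preg_replace('~^(?:\xC2\xA0)+|(?:\xC2\xA0)+$~', '', $text);
var_dump($safe === $text);                 // bool(true)
```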

Below you will find recipes for various use cases.

Trimming literal &nbsp; HTML entity from a string

$before = "&nbsp;abc&nbsp;xyz&nbsp;"; 
$after = preg_replace('~^&nbsp;|&nbsp;$~', '', $before);
var_dump($before, $after);

Trimming the "non-breaking space" sequence (\xC2\xA0) only:

$before = html_entity_decode("&nbsp;abc&nbsp;xyz&nbsp;"); 
$after = preg_replace('~^\xC2\xA0|\xC2\xA0$~', '', $before);

Trimming non-breaking space along with other whitespace characters:

PHP < 8.4

$after = preg_replace('~^\s+|\s+$~u', '', $before);

PHP >= 8.4

$after = mb_trim($before);

Trimming both "non-breaking space" sequence and HTML entity

Here, you have to add both of them to the regex.

It must be understood that there is no way to reliably chain two trimming functions (e.g. trim(preg_replace(...))), because trimming has to be done strictly in one pass: the first function won't notice the characters exposed once the second one has removed its part, and vice versa. Hence preg_replace('~^(&nbsp;)*(.*?)(&nbsp;)*$~', '$2', trim("&nbsp; abc")); will leave the leading space intact. Therefore, if you need to remove both substrings and multi-byte characters, a regex is still the only option.

$before = html_entity_decode("&nbsp;")."&nbsp;abc&nbsp;xyz&nbsp;"; 
$pattern = '~^(\xC2\xA0|&nbsp;)*(.*?)(\xC2\xA0|&nbsp;)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);
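
The chaining problem described above can be checked with a few lines. This is a sketch with illustrative variable names; the one-pass pattern simply adds a plain space to the alternation.

```php
<?php
$before = "&nbsp; abc";

// Chained: trim() runs first and finds nothing to remove, because the
// string starts with "&". The regex then strips the entity and exposes
// a leading space that nobody trims any more.
$chained = preg_replace('~^(?:&nbsp;)*(.*?)(?:&nbsp;)*$~', '$1', trim($before));

// One pass: entity and space are listed in the same alternation.
$onePass = preg_replace('~^(?:&nbsp;| )*(.*?)(?:&nbsp;| )*$~', '$1', $before);

var_dump($chained); // string(4) " abc"
var_dump($onePass); // string(3) "abc"
```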

Trimming "non-breaking space" sequence, HTML entity and regular whitespace characters

$before = "&nbsp;&nbsp; ".html_entity_decode("&nbsp;")."&nbsp; abc \n&nbsp; ";
$pattern = '~^(\xC2\xA0|&nbsp;| |\r|\n|\t)*(.*?)(\xC2\xA0|&nbsp;| |\r|\n|\t)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);
added 32 characters in body; added 10 characters in body
Your Common Sense
deleted 3 characters in body
hakre
added 393 characters in body
Your Common Sense
added 820 characters in body
Your Common Sense
added 9 characters in body; added 28 characters in body
Your Common Sense
deleted 44 characters in body; deleted 2 characters in body; deleted 16 characters in body
Your Common Sense
Your Common Sense