Tim Seguine

As with anything that shouldn't be done, I decided to see whether it is possible to match <script> tags robustly using regexes in PHP. Since script tags cannot be nested arbitrarily, I figured it should at least be possible.

This is what I came up with. It is designed to handle every edge case I could think of, including:

  • arbitrary attributes in the opening script tag
  • single-line and multiline comments, and single- and double-quoted strings (which might include arbitrary escape sequences), in the JavaScript, any of which may contain the characters </script>
  • capturing the smallest script tag it finds

Did I miss anything? Ideally, I want it to match exactly what a browser would consider a script element (which might not be possible), but at the very least, I would like it to match only well-formed script tags containing well-formed JavaScript.

Here is the string for the regex that I am passing to preg_match:

'#<script(?:[^>"]*(?:"[^"]*")?)*>((?:"(?:[^\\\\\\n"]*(?:\\\\.)*)*"|\'(?:[^\\\\\\n\']*(?:\\\\.)*)*\'|<[^/]?|/\\*(?:[^*]|\\*[^/]?)*\\*/|//.*|/[^/*]|[^\'"</])*)</script>#';
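For illustration, here is a minimal sketch of how the pattern might be exercised with preg_match. The sample input in $html and the variable names are made up for this example and are not part of the original post. The pattern is the one above, with both quoted-string branches using the same escaped-character form. One passing input shows only how the call is wired up; it says nothing about robustness in general.

```php
<?php
// The regex as a single-quoted PHP string: \\ yields one backslash and
// \' yields a single quote before PCRE ever sees the pattern.
$pattern = '#<script(?:[^>"]*(?:"[^"]*")?)*>((?:"(?:[^\\\\\\n"]*(?:\\\\.)*)*"|\'(?:[^\\\\\\n\']*(?:\\\\.)*)*\'|<[^/]?|/\\*(?:[^*]|\\*[^/]?)*\\*/|//.*|/[^/*]|[^\'"</])*)</script>#';

// Hypothetical input: one script element whose body contains the text
// "</script>" inside a double-quoted string and inside a block comment.
$html = '<p>before</p>'
      . '<script type="text/javascript">var s = "</script>"; /* </script> */ var t = 1;</script>'
      . '<p>after</p>';

// preg_match returns 1 on a match, 0 on no match, and false on error.
$found = preg_match($pattern, $html, $matches);

if ($found === 1) {
    // $matches[0] is the whole element, $matches[1] is the script body.
    echo $matches[1], PHP_EOL;
    // prints: var s = "</script>"; /* </script> */ var t = 1;
} elseif ($found === false) {
    // Reports PCRE failures such as hitting the backtrack limit.
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
} else {
    echo 'no match', PHP_EOL;
}
```

Comparing the result against false separately matters here, because a pattern with nested quantifiers like this one could hit the PCRE backtrack limit on some inputs, and that case would otherwise look the same as "no match".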

Note: I am not using this in production.
