I'm looking for a regular expression to find all instances of a CSS class name in HTML markup. So far I have this, assuming row is the class name that I'm looking for:

class=\"[a-zA-Z0-9\-_\s]*row[a-zA-Z0-9\-_\s]*\"

It correctly matches all of the following:

class="foo_bar bar row test"
class="row"
class="hello foo bar  row"
class=" foo bar  row test "

And correctly doesn't match this:

class="hello"  row

Unfortunately it incorrectly matches these (false positives):

class="narrow"
class="rowdy"
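For reference, the behaviour can be reproduced outside an editor with a short Python sketch. It assumes Python's `re` module treats the pattern the same way as the search tool in use, which may not hold for every engine. The `samples` dictionary and the `false_positives` helper are names made up for this illustration; the strings are just the examples from this question.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'class=\"[a-zA-Z0-9\-_\s]*row[a-zA-Z0-9\-_\s]*\"')

# Each sample maps to whether it *should* match.
samples = {
    'class="foo_bar bar row test"': True,
    'class="row"': True,
    'class="hello foo bar  row"': True,
    'class=" foo bar  row test "': True,
    'class="hello"  row': False,
    'class="narrow"': False,
    'class="rowdy"': False,
}


def false_positives(pattern, cases):
    """Return the samples that match although they should not."""
    return [text for text, expected in cases.items()
            if pattern.search(text) and not expected]


# Lists the two unwanted matches: the "narrow" and "rowdy" attributes.
print(false_positives(PATTERN, samples))
```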

What regex will find a specific CSS class name in HTML?

Update: There are lots of comments about how I shouldn't parse the DOM with regex. My use case is a 'find all' across a large project with thousands of HTML files, to find where specific CSS classes are being used. I'm not operating inside a browser and don't have access to a DOM.

  • Just to be sure: do you have to use regex as opposed to a DOM parser here? If you have to, I'd say adding \b (word boundary) before and after row should do it, though I didn't really think this through, so there might be better ways. Commented Mar 26, 2019 at 22:09
  • Try class="(?:row|[^"]* row)(?![^" ])[^"]*" if _row_ is not allowed too. See live demo here regex101.com/r/Xq4sT9/1 Commented Mar 26, 2019 at 22:14
  • Also what about "hello a-row"? Commented Mar 26, 2019 at 22:19
  • Oh yeah a word boundary isn't enough because of dashes (at least). Commented Mar 26, 2019 at 22:22
  • You forgot that class = " (notice the spaces) is also a legit syntax. And that a text class="row is also a legit text. Stop using regex to parse DOM. Use what browsers already use. A DOMParser. Tony the Pony he comes... Commented Mar 26, 2019 at 22:31

2 Answers

You need boundaries, but \b isn't enough: it also matches at the position between - and r in a-row, which is how \b is defined but not what you want here. To restrict the boundary to a space, or to the position right after the opening " or right before the closing " of the class attribute, you need a pattern with two branches:

class="(?:row|[^"]* row)(?![^" ])[^"]*"

The above can be shortened to the following, though it is not the preferred form:

class="(?:[^"]* )?row(?![^" ])[^"]*"

A shorter version that performs the same as the longer one:

class="(?:[^"]* )??row(?: [^"]*)?"
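As a sanity check that the three patterns above agree, here is a small Python sketch that runs each of them against the examples from the question plus the a-row case from the comments. It assumes Python's `re` engine; other engines may differ. The names `VARIANTS`, `SHOULD_MATCH`, `SHOULD_NOT_MATCH` and `check` are made up for this illustration.

```python
import re

# The three patterns from this answer, in the order given.
VARIANTS = [
    r'class="(?:row|[^"]* row)(?![^" ])[^"]*"',
    r'class="(?:[^"]* )?row(?![^" ])[^"]*"',
    r'class="(?:[^"]* )??row(?: [^"]*)?"',
]

SHOULD_MATCH = [
    'class="foo_bar bar row test"',
    'class="row"',
    'class="hello foo bar  row"',
    'class=" foo bar  row test "',
]

SHOULD_NOT_MATCH = [
    'class="hello"  row',
    'class="narrow"',
    'class="rowdy"',
    'class="hello a-row"',
]


def check(pattern):
    """True if the pattern matches every wanted case and no unwanted one."""
    regex = re.compile(pattern)
    return (all(regex.search(s) for s in SHOULD_MATCH)
            and not any(regex.search(s) for s in SHOULD_NOT_MATCH))


for pattern in VARIANTS:
    print(pattern, check(pattern))
```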

Regex breakdown:

  • class=" Match class=" literally
  • (?: Start of non-capturing group
    • row Match row
    • | Or
    • [^"]* row Match any run of characters other than ", followed by a space and row (that is, row preceded by a space character)
  • ) End of non-capturing group
  • (?![^" ]) The next immediate character should be space or "
  • [^"]*" Match up to and including "
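For the 'find all' use case in the question, the breakdown above can be turned into a small Python sketch. This is only one possible way to apply the pattern, not a definitive tool: `class_pattern` and `find_class_usages` are hypothetical helper names, the `*.html` glob is an assumption about the project layout, and the search is line by line, so a class attribute split across lines is missed. As noted in the comments, only the `class="` form with double quotes and no spaces around `=` is handled.

```python
import re
from pathlib import Path


def class_pattern(class_name):
    """Build the shortened pattern from this answer for any class name."""
    # re.escape guards against class names containing regex metacharacters.
    return re.compile(
        r'class="(?:[^"]* )?' + re.escape(class_name) + r'(?![^" ])[^"]*"'
    )


def find_class_usages(root, class_name, glob="*.html"):
    """Yield (path, line number, stripped line) for each line using the class."""
    pattern = class_pattern(class_name)
    for path in sorted(Path(root).rglob(glob)):
        # Undecodable bytes are replaced rather than aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield path, number, line.strip()


# Example usage ("src" is a placeholder directory name):
# for path, number, line in find_class_usages("src", "row"):
#     print(f"{path}:{number}: {line}")
```

Building the pattern from a function keeps the class name in one place, so the same scan can be repeated for each class being cleaned up.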

See live demo here

2 Comments

This is great and has saved me lots of time cleaning up CSS/HTML in a large project.
Glad to hear. I just looked at the regex and realized it could be written shorter without affecting performance. So I added it.

Try the regex below:

(class\s?=\s?)\"([\d\w\s-]*)(\brow\b)([\d\w\s]*)\"

Tested all the cases you mentioned

https://regex101.com

1 Comment

Thanks, that's pretty good. It fails this test though: class="flex-mt90 foo bar row" row, but I realize that I didn't have it in my list of examples. regex101.com/r/jeos4r/2
