terdon

A greedy matching system just means it will try to find the longest matching string (more precisely, the longest match at the first position where the whole regex can match; it stops searching once it has found that match), not that it will stop at a non-matching string even if matching strings exist. Think of it as the order to "find me the longest possible match, but do find me a match!". Since allowing the first .\+ to eat up the entire string would mean that the regex doesn't match, the engine will go back and try something else.

In your case, it is even simpler since you are anchoring the regex to the beginning and end of the line (^ and $), so the .\+ cannot keep everything up to the end of the line for itself: the other things that come after it in the regular expression still have to match.
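As a rough sketch of that "go back and try something else" behaviour, here is a made-up input line and pattern (neither comes from the question). The first group is greedy, yet it has to leave at least one digit for the rest of the anchored regex. It uses sed -E (extended syntax), so + is written without a backslash.

```shell
# Hypothetical input and pattern, chosen only for illustration.
line='abc123def456'

# The greedy (.+) would like the whole line, but ([0-9]+)$ must still match,
# so it gives back as little as possible: a single trailing digit.
first=$(printf '%s\n' "$line" | sed -E 's/^(.+)([0-9]+)$/\1/')
rest=$(printf '%s\n' "$line" | sed -E 's/^(.+)([0-9]+)$/\2/')

echo "first group: $first"    # first group: abc123def45
echo "second group: $rest"    # second group: 6
```

The greedy group ends up as long as it can be while still letting the whole expression match, which is why the second group only gets the final 6.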

Here's an example that might help explain greedy matching:

$ echo aaaaaaa | sed 's/a*/B/'
B

Here, since the regular expression a* means "match 0 or more consecutive a characters", the greedy match finds the longest possible matching string, all seven a characters, and replaces it with a single B. A non-greedy match, using Perl's regular expressions (the syntax PCRE follows) for example, would return:

$ echo aaaaaaa | perl -pe 's/a*?/B/'
Baaaaaaa

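That's because non-greedy will find the shortest matching string instead of the longest. Here the shortest match for a*? is the empty string at the very start of the line, so B is simply inserted before the first a.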

I don't understand why you mention alphanumeric or how that is relevant. Perhaps you have misunderstood . and think that it only matches alphanumeric characters, but it doesn't; . matches any single character (depending on which flavor of regular expressions you are using and what options you give it, it can even match newline characters). If you want alphanumeric strings, you can use the POSIX character class [[:alnum:]], which in the C locale is equivalent to [a-zA-Z0-9].
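To see the difference between . and [[:alnum:]], here is a minimal sketch on an invented input line that contains a space and some punctuation:

```shell
# Hypothetical input, chosen only for illustration.
line='foo bar_42!-baz'

# . matches any character, so .+ swallows the whole line.
dot=$(printf '%s\n' "$line" | sed -E 's/.+/X/')

# [[:alnum:]]+ stops at the first non-alphanumeric character (the space).
alnum=$(printf '%s\n' "$line" | sed -E 's/[[:alnum:]]+/X/')

echo "$dot"      # X
echo "$alnum"    # X bar_42!-baz
```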
