Stéphane Chazelas

becke-ch

POSIX: Regular Expression: "\(a\(b\)*\)*\2" matches "abab"

According to chapter "9. Regular Expressions", subchapter "9.3.6 BREs Matching Multiple Characters", number "3.": "... the expression "\(a\(b\)*\)*\2" fails to match 'abab' ...".

But when I tried this using "grep" and "expr":

echo "abab" | grep "\(a\(b\)*\)*\2"
expr "abab" : "\(a\(b\)*\)*\2"

they both matched the string "abab".

I guess that, according to the documentation, the subexpression "\(a\(b\)*\)*" matches "ab" on its first iteration and "a" on its second. According to item 3, "...when a subexpression matches more than one string, a back-reference expression corresponding to the subexpression shall refer to the last matched string...", so I assume "\2" refers back to the "b" captured during the first iteration, which is the last string that "\(b\)" actually matched. The expression "\(a\(b\)*\)*\2" would therefore match the string 'abab', as grep and expr report.
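To make that reasoning easier to check on a given system, here is a minimal sketch that reports whether grep matches and asks sed to print what each group captured. The helper name try_bre and the control pattern "\(ab\)\1" are my own additions for illustration; the result for the pattern in question depends on the regex implementation, so no expected output is shown for it.

```shell
#!/bin/sh
# try_bre PATTERN INPUT (hypothetical helper): print "match" or "no match"
# depending on whether grep's BRE engine finds PATTERN in INPUT.
try_bre() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then
        echo 'match'
    else
        echo 'no match'
    fi
}

# Control case: a plain back-reference that must match 'abab'.
control_result=$(try_bre '\(ab\)\1' 'abab')

# The pattern from the POSIX text; the outcome is implementation-dependent.
grep_result=$(try_bre '\(a\(b\)*\)*\2' 'abab')

echo "control pattern:  $control_result"
echo "question pattern: $grep_result"

# If sed matches, show the whole match (&) and both captured groups.
# Nothing is printed when sed's regex engine finds no match.
printf '%s\n' 'abab' | sed -n 's/\(a\(b\)*\)*\2/whole=[&] 1=[\1] 2=[\2]/p'
```

If the sed line prints 2=[b] while 1=[a], the implementation kept the "b" from the earlier iteration of the outer group, which is the behaviour I am assuming above.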

Could it be that the sample in the POSIX documentation is incorrect?