added 395 characters in body
Source Link
aviro
  • 7k
  • 18
  • 36

I'm trying to understand why grep -w (version 3.1 of the GNU implementation) matches only the first occurrence of a certain pattern in a line.

Here's an example. I would expect that it would match n1, n2 and n3, but it matches only the first one.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[0-9]=*"
n1

Or if I tell it to match only n2 or n3, again it matches the first one, and ignores n3.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[23]=*" 
n2

What am I missing here? Is there any explanation for this behavior, or is it some sort of bug in grep?

The idea is to match either:

  1. n[0-9] preceded and followed by a non-word character.
  2. A substring that begins with n[0-9], is followed by any number of = characters, and ends with a non-word character.

So for instance, if the string is n1=1 n2=== n3=3 n4== n5, the expected result should be:

n1
n2===
n3
n4==
n5
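
For illustration only, the two rules above can be spelled out as a single pattern with lookarounds. This is a minimal sketch, assuming a GNU grep built with PCRE support (-P); the variable names are made up for this example, and it only states the intended result rather than explaining what -w does.

```shell
# Sketch of the intended matching rules (assumes GNU grep with -P support).
# (?<!\w) : the match must not be preceded by a word character
# (?!\w)  : the match must not be followed by a word character
input='n1=1 n2=== n3=3 n4== n5'
result=$(printf '%s\n' "$input" | grep -oP '(?<!\w)n[0-9]=*(?!\w)')
printf '%s\n' "$result"
```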

Clarification: I know that the goal can be achieved by grep -ow -e 'n[0-9]' -e "n[0-9]=*", but that's beside the point. The goal of the question is to understand how grep works.

Additional tests

If I add n<num>= to different places in the line (without a following word character after the equal sign), it will match those as well, but again it will ignore n3=3.

$ echo 'n1=1 n2= n3=3 n4=' | grep -ow "n[0-9]=*"
n1
n2=
n4=

The last thing I found is that if I add -P to interpret the pattern as a Perl-compatible regular expression, grep no longer seems to honor the -w description, which says that the substring "must be either at the end of the line or followed by a non-word constituent character": it matches n1= even though it's followed by the character 1, which is a word constituent character ("letters, digits, and the underscore").

$ echo 'n1=1 n2= n3=3 n4=' | grep -owP "n[0-9]=*"
n1=
n2
n3=
n4

So it seems that grep -wP searches for a word boundary at the end of the substring rather than a non-word constituent character. It seems equivalent to:

$ echo 'n1=1 n2= n3=3 n4=' | grep -o "\bn[0-9]=*\b"
n1=
n2
n3=
n4
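
A quick way to check that apparent equivalence is to capture both outputs and compare them. This is a minimal sketch, assuming GNU grep with -P support and the GNU \b extension in basic regular expressions; the variable names are hypothetical, and the comparison may well come out differently on grep versions other than the 3.1 used here.

```shell
# Compare the output of the two commands on the same input.
# Assumes GNU grep: -P needs PCRE support, and \b is a GNU extension.
input='n1=1 n2= n3=3 n4='
with_wP=$(printf '%s\n' "$input" | grep -owP 'n[0-9]=*')
with_b=$(printf '%s\n' "$input" | grep -o '\bn[0-9]=*\b')
if [ "$with_wP" = "$with_b" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```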
added 395 characters in body
Source Link
aviro
  • 7k
  • 18
  • 36

I'm trying to understand why grep -w (version 3.1 of the GNU implementation) matches only the first occurrence of a certain pattern in a line.

Here's an example. I would expect that it would match n1, n2 and n3, but it matches only the first one.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[0-9]=*"
n1

Or if I tell it to match only n2 or n3, again it matches the first one, and ignores n3.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[23]=*" 
n2

The idea is to match either:

  1. n[0-9] preceded and followed by a non-word character.
  2. A substring that begins with n[0-9], is followed by any number of = characters, and ends with a non-word character.

What am I missing here? Is there any explanation for this behavior, or is it some sort of bug in grep?

Clarification: I know that the goal can be achieved by grep -ow -e 'n[0-9]' -e "n[0-9]=*", but that's beside the point. The goal of the question is to understand how grep works.

Additional tests

If I add n<num>= to different places in the line (without a following word character after the equal sign), it will match those as well, but again it will ignore n3=3.

$ echo 'n1=1 n2= n3=3 n4=' | grep -ow "n[0-9]=*"
n1
n2=
n4=

Last thing that I've found is that if I add -P to interpret the pattern as a Perl-compatible regular expression, it doesn't seem to keep the -w description that says that the substring "must be either at the end of the line or followed by a non-word constituent character", since it matches n1= even though it's followed by the character 1, which is a word constituent character ("letters, digits, and the underscore").

$ echo 'n1=1 n2= n3=3 n4=' | grep -owP "n[0-9]=*"
n1=
n2
n3=
n4

So it seems that grep -wP searches for a word boundary at the end of the substring rather than a non-word constituent character. It seems equivalent to:

$ echo 'n1=1 n2= n3=3 n4=' | grep -o "\bn[0-9]=*\b"
n1=
n2
n3=
n4

clarification as to what implementation of grep we're talking about (from comments)
Source Link
Stéphane Chazelas
  • 594.5k
  • 97
  • 1.1k
  • 1.7k

I'm trying to understand why grep -w (version 3.1 of the GNU implementation) matches only the first occurrence of a certain pattern in a line.

Here's an example. I would expect that it would match n1, n2 and n3, but it matches only the first one.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[0-9]=*"
n1

Or if I tell it to match only n2 or n3, again it matches the first one, and ignores n3.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[23]=*" 
n2

What am I missing here? Is there any explanation for this behavior, or is it some sort of bug in grep?

Additional tests

If I add n<num>= to different places in the line (without a following word character after the equal sign), it will match those as well, but again it will ignore n3=3.

$ echo 'n1=1 n2= n3=3 n4=' | grep -ow "n[0-9]=*"
n1
n2=
n4=

Last thing that I've found is that if I add -P to interpret the pattern as a Perl-compatible regular expression, it doesn't seem to keep the -w description that says that the substring "must be either at the end of the line or followed by a non-word constituent character", since it matches n1= even though it's followed by the character 1, which is a word constituent character ("letters, digits, and the underscore").

$ echo 'n1=1 n2= n3=3 n4=' | grep -owP "n[0-9]=*"
n1=
n2
n3=
n4

So it seems that grep -wP searches for a word boundary at the end of the substring rather than a non-word constituent character. It seems equivalent to:

$ echo 'n1=1 n2= n3=3 n4=' | grep -o "\bn[0-9]=*\b"
n1=
n2
n3=
n4

I'm trying to understand why grep -w matches only the first occurrence of a certain pattern in a line.

Here's an example. I would expect that it would match n1, n2 and n3, but it matches only the first one.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[0-9]=*"
n1

Or if I tell it to match only n2 or n3, again it matches the first one, and ignores n3.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[23]=*" 
n2

What am I missing here? Is there any explanation for this behavior, or is it some sort of bug in grep?

Additional tests

If I add n<num>= to different places in the line (without a following word character after the equal sign), it will match those as well, but again it will ignore n3=3.

$ echo 'n1=1 n2= n3=3 n4=' | grep -ow "n[0-9]=*"
n1
n2=
n4=

Last thing that I've found is that if I add -P to interpret the pattern as a Perl-compatible regular expression, it doesn't seem to keep the -w description that says that the substring "must be either at the end of the line or followed by a non-word constituent character", since it matches n1= even though it's followed by the character 1, which is a word constituent character ("letters, digits, and the underscore").

$ echo 'n1=1 n2= n3=3 n4=' | grep -owP "n[0-9]=*"
n1=
n2
n3=
n4

So it seems that grep -wP searches for a word boundary at the end of the substring rather than a non-word constituent character. It seems equivalent to:

$ echo 'n1=1 n2= n3=3 n4=' | grep -o "\bn[0-9]=*\b"
n1=
n2
n3=
n4

Source Link
aviro
  • 7k
  • 18
  • 36