grep -w matches only the first occurrence of a pattern in a line

edited Mar 3, 2022 at 17:41 by aviro

I'm trying to understand why grep -w (version 3.1 of the GNU implementation) matches only the first occurrence of a certain pattern in a line.

Here's an example. I would expect it to match n1, n2 and n3, but it matches only the first one.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[0-9]=*"
n1

Or, if I tell it to match only n2 or n3, it again matches only the first one (n2) and ignores n3.

$ echo 'n1=1 n2=2 n3=3' | grep -ow "n[23]=*"
n2

What am I missing here? Is there any explanation for this behavior, or is it some sort of bug in grep?

The idea is to match either:

n[0-9] preceded and followed by a non-word character.
A substring that begins with n[0-9], followed by any number of = characters, and ends with a non-word character.

So for instance, if the string is n1=1 n2=== n3=3 n4== n5, the expected result should be:

n1
n2===
n3
n4==
n5
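The following is a minimal reference model of the -w rule as the manual describes it. The match must be preceded by the start of the line or a non-word constituent character. It must be followed by the end of the line or a non-word constituent character. This is a sketch, not GNU grep's actual algorithm:

- The function name word_matches is hypothetical.
- It uses awk's ERE matching.
- It assumes ASCII input, with word characters limited to [A-Za-z0-9_].
- At each start position it tries the longest candidate first and shrinks it until the rule holds.

```sh
#!/bin/sh
# Hypothetical reference model (NOT GNU grep's code): print every
# non-overlapping match of ERE $1 in line $2 that satisfies the -w rule as
# documented: preceded by start-of-line or a non-word character, and
# followed by end-of-line or a non-word character.
# Assumptions: ASCII input, word characters are [A-Za-z0-9_], and the
# pattern contains no backslashes (awk -v would process them as escapes).
word_matches() {
  printf '%s\n' "$2" | awk -v pat="$1" '
  function isword(c) { return c ~ /^[A-Za-z0-9_]$/ }
  {
    line = $0; n = length(line); re = "^(" pat ")$"; i = 1
    while (i <= n) {
      found = 0
      # the character before the start must not be a word character
      if (i == 1 || !isword(substr(line, i - 1, 1))) {
        # try the longest candidate first, then shrink it
        for (len = n - i + 1; len >= 1; len--) {
          cand = substr(line, i, len)
          # the candidate must match the whole pattern, and the next
          # character must be end-of-line or a non-word character
          if (cand ~ re && (i + len > n || !isword(substr(line, i + len, 1)))) {
            print cand; i += len; found = 1; break
          }
        }
      }
      if (!found) i++
    }
  }'
}

word_matches 'n[0-9]=*' 'n1=1 n2=2 n3=3'   # prints n1, n2 and n3, one per line
```

The model gives the following results on the other inputs from this question:

- For 'n1=1 n2=== n3=3 n4== n5' it gives n1, n2===, n3, n4==, n5.
- For 'n1=1 n2= n3=3 n4=' it gives n1, n2=, n3, n4=.

In both cases it includes n3, which grep 3.1 omits in the runs shown here.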

Clarification: I know that the goal can be achieved by grep -ow -e 'n[0-9]' -e "n[0-9]=*", but that's beside the point. The goal of the question is to understand how grep works.

Additional tests

If I add n<num>= at different places in the line (with no word character after the equal sign), grep matches those as well, but again it ignores n3=3.

$ echo 'n1=1 n2= n3=3 n4=' | grep -ow "n[0-9]=*"
n1
n2=
n4=

One last thing I've found: if I add -P to interpret the pattern as a Perl-compatible regular expression, grep no longer seems to follow the -w description, which says that the substring "must be either at the end of the line or followed by a non-word constituent character". It matches n1= even though it's followed by the character 1, which is a word-constituent character ("letters, digits, and the underscore").

$ echo 'n1=1 n2= n3=3 n4=' | grep -owP "n[0-9]=*"
n1=
n2
n3=
n4

So it seems that grep -wP checks for a word boundary at the end of the substring, rather than for a following non-word constituent character. It seems equivalent to:

$ echo 'n1=1 n2= n3=3 n4=' | grep -o "\bn[0-9]=*\b"
n1=
n2
n3=
n4
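The two end-of-match conditions discussed above can be compared by hand on a single candidate. The sketch below uses a hypothetical helper, end_checks, with the same assumptions as before: awk, ASCII input, and word characters [A-Za-z0-9_]. It reports whether a candidate passes each condition:

- The documented -w condition: the next character is absent or non-word.
- A word boundary: exactly one of the last and next characters is a word character.

It is only an illustration of the two conditions, not grep's code.

```sh
#!/bin/sh
# Hypothetical helper (illustration only, not grep's code).
# $1 = line, $2 = 1-based start of the candidate, $3 = its length.
# Reports two end-of-match tests for the candidate:
#   -w (as documented): next char is end-of-line or a non-word character
#   word boundary:      exactly one of last/next char is a word character
# Assumes ASCII input; word characters are [A-Za-z0-9_].
end_checks() {
  printf '%s\n' "$1" | awk -v s="$2" -v l="$3" '
  function isword(c) { return c ~ /^[A-Za-z0-9_]$/ }
  {
    last = substr($0, s + l - 1, 1)
    nxt = substr($0, s + l, 1)          # empty string at end of line
    w = (nxt == "" || !isword(nxt))
    b = (isword(last) != isword(nxt))
    printf "%s -w:%s boundary:%s\n", substr($0, s, l), (w ? "ok" : "fail"), (b ? "ok" : "fail")
  }'
}

end_checks 'n1=1 n2= n3=3 n4=' 1 3   # candidate "n1=": prints n1= -w:fail boundary:ok
end_checks 'n1=1 n2= n3=3 n4=' 6 3   # candidate "n2=": prints n2= -w:ok boundary:fail
```

Under the word-boundary condition, n1= is accepted while n2= is rejected, leaving n2 after backtracking. This is consistent with the grep -wP output above. Under the documented -w condition, the opposite holds for both candidates.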

asked Mar 3, 2022 at 13:36 by aviro