I want to extract the portion of a string that contains one of several target words. For example, from the following string:

...
def a:
...
target1
...
def b:
...
def c:
...

I want to get this part:

def a:
...
target1
...

Here is my Java code:

String s = "(def\\W(.*)\\W(target1|target2|target3)\\W(.*)def\\W)";
Pattern p = Pattern.compile(s);
Matcher m = p.matcher(sourceString);

while(m.find()){
    System.out.println(m.group(0));
}

The problem is that it does not find anything.

Thanks so much for your help!

  • And your question? What's the problem with your code so far? Commented Jul 23, 2015 at 16:08
  • @tnw Right now the code does not print out anything. Commented Jul 23, 2015 at 16:10
  • @Ryan .. I am not sure if this will work for you but can you try something like "(def.*(target1|target2|target3))" Commented Jul 23, 2015 at 16:17
  • Are you trying to find targets from one def to the next def? I assume that means no def in between right? Commented Jul 23, 2015 at 17:14

3 Answers

1

You can use:

Pattern p = Pattern.compile(
  "(\\bdef\\s((?!\\bdef\\b).)*?\\b(?:target1|target2|target3)\\b.*?(?=\\sdef))",
  Pattern.DOTALL);
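
For anyone who wants to try this outside a regex tester, here is a minimal sketch. It assumes the closing lookahead is meant to be "whitespace followed by def" (written as `(?=\\sdef)`); the class name `DefBlockFinder` and the method `findBlocks` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefBlockFinder {
    // Assumption: the lookahead at the end is whitespace + "def".
    // ((?!\bdef\b).)*? consumes one character at a time, but refuses to step onto a new "def".
    private static final Pattern BLOCK = Pattern.compile(
        "(\\bdef\\s((?!\\bdef\\b).)*?\\b(?:target1|target2|target3)\\b.*?(?=\\sdef))",
        Pattern.DOTALL);

    // Hypothetical helper: returns every def block that contains a target word.
    static List<String> findBlocks(String source) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(source);
        while (m.find()) {
            blocks.add(m.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Input with an extra "def b:" before the block that holds the target.
        String source = "...\ndef b:\n...\ndef a:\n...\ntarget1\n...\ndef b:\n...\ndef c:\n...\n";
        for (String block : findBlocks(source)) {
            System.out.println(block);
        }
    }
}
```

Because the closing lookahead requires another def after the block, a target that sits in the last block of the input is not matched by this pattern.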



1 Comment

Thanks a lot. It looks great, but there is one problem now: with the following input, the match spans multiple "def" blocks. ... def b: ... def a: ... target1 ... def b: ... def c: ...
1

By default, . matches any character except line separators. To make the dot match every character, add the Pattern.DOTALL flag.

Pattern p = Pattern.compile(s,Pattern.DOTALL);
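
To see the effect of the flag in isolation, here is a small sketch with a simplified pattern (`def.*target1`, not the one from the question). The class `DotAllDemo` and the method `spansLines` are illustrative names only.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Returns true when "def.*target1" matches the whole input,
    // compiled with or without Pattern.DOTALL.
    static boolean spansLines(String input, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("def.*target1", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "def a:\ntarget1";
        System.out.println(spansLines(input, false)); // false: the dot stops at the line break
        System.out.println(spansLines(input, true));  // true: the dot also matches '\n'
    }
}
```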

You may also want to make .* reluctant with .*?.

You can use a regex like:

String s = "(def\\W(.*?)\\W(target1|target2|target3)\\W(.*?))def\\W";
//          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - group 1

and inside the loop use m.group(1) instead of m.group(0).
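
Put together, the pieces might look like the sketch below. The wrapper class `ReluctantGroupDemo` and the method `extract` are hypothetical; the pattern, the DOTALL flag and the use of group 1 are the ones described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantGroupDemo {
    // Hypothetical helper: collects group 1 of every match.
    static List<String> extract(String sourceString) {
        String s = "(def\\W(.*?)\\W(target1|target2|target3)\\W(.*?))def\\W";
        Pattern p = Pattern.compile(s, Pattern.DOTALL); // dot may cross line breaks
        Matcher m = p.matcher(sourceString);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1)); // group 1 leaves out the trailing "def"
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "...\ndef a:\n...\ntarget1\n...\ndef b:\n...\ndef c:\n...\n";
        for (String block : extract(sample)) {
            System.out.print(block);
        }
    }
}
```

Note that the reluctant .*? only keeps the match short from its starting point. If an earlier def without a target comes first, the match still starts at that earlier def and runs through to the target.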

1 Comment

Thanks! I made some progress with your solution, but now there is another problem: "def" gets matched multiple times. Is there any way I can restrict "def" to occur only twice?
1

Try something like this -

 #  "(?ms)^def\\b(?:(?!^def\\b).)*?\\b(target[123])\\b(?:(?!^def\\b).)*"

 (?ms)                         # Multi-line and Dot-all modes
 ^ def \b                      # 'def'
 (?:
      (?! ^ def \b )                # Not 'def' 
      . 
 )*?
 \b 
 ( target [123] )              # (1), 'target' 1 or 2 or 3
 \b 
 (?:
      (?! ^ def \b )                # Not 'def' 
      . 
 )*

Output:

 **  Grp 0 -  ( pos 0 , len 27 ) 
def a:
...
target1
...

 **  Grp 1 -  ( pos 13 , len 7 ) 
target1  
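
In Java, the same pattern could be used roughly like this. This is only a sketch: the class `TargetBlockScanner` and the method `scan` are made-up names, and the input is assumed to use `\n` line endings with each def at the start of a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TargetBlockScanner {
    // (?ms) turns on multi-line and dot-all modes inside the pattern itself.
    // (?:(?!^def\b).)* consumes characters but never runs into a line that starts with "def".
    private static final Pattern BLOCK = Pattern.compile(
        "(?ms)^def\\b(?:(?!^def\\b).)*?\\b(target[123])\\b(?:(?!^def\\b).)*");

    // Hypothetical helper: each entry is { whole block, target word that was found }.
    static List<String[]> scan(String source) {
        List<String[]> hits = new ArrayList<>();
        Matcher m = BLOCK.matcher(source);
        while (m.find()) {
            hits.add(new String[] { m.group(0), m.group(1) });
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = "...\ndef b:\n...\ndef a:\n...\ntarget1\n...\ndef b:\n...\ndef c:\n...\n";
        for (String[] hit : scan(source)) {
            System.out.println("Target: " + hit[1]);
            System.out.print(hit[0]);
        }
    }
}
```

Since both the part before and the part after the target refuse to cross a line starting with def, each match stays inside a single def block, and a target in the last block is matched as well.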
