
Sorry if this is a lame question; I am quite new to Java development and regex patterns.

Basically, I have a long string that contains multiple occurrences of substrings like InstanceId: i-1234XYAadsadd, and I want to use a regex to extract the i-1234XYAadsadd part of each one into an ArrayList. Please help me with the correct regular expression.

// instanceResultString is the actual string containing occurrences of the pattern
List<String> instanceIdList = new ArrayList<String>();
Matcher matcher = Pattern.compile("InstanceId:[.]*,").matcher(instanceResultString);
while (matcher.find())
    instanceIdList.add(matcher.group());
  • Maybe "InstanceId:\\s*(\\S+)," and access .group(1) Commented Sep 26, 2016 at 20:24
  • I guess this will include the whitespace after InstanceId:, but I want to exclude it. Commented Sep 26, 2016 at 20:25
  • Ideone is too slow right now, so I can't show the demo. Try Matcher matcher = Pattern.compile("InstanceId:\\s*(\\S+),").matcher(instanceResultString); while(matcher.find()) instanceIdList.add(matcher.group(1)); Commented Sep 26, 2016 at 20:27
  • See ideone.com/LaFmXw Commented Sep 26, 2016 at 20:30
  • Great, thanks a lot :) Commented Sep 26, 2016 at 20:33

1 Answer


The key point here is that the strings you want to match are made of non-whitespace characters, and the \S pattern matches exactly one non-whitespace character.

See this demo:

import java.util.*;
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String instanceResultString = "InstanceId: i-1234XYAadsadd, More text: InstanceId: u-222tttt, dde InstanceId: i-8999UIIIgjkkd,";
        List<String> instanceIdList = new ArrayList<>();
        Matcher matcher = Pattern.compile("InstanceId:\\s*(\\S+),").matcher(instanceResultString);
        while (matcher.find())
            instanceIdList.add(matcher.group(1)); // Group 1 holds only the ID
        System.out.println(instanceIdList); // Demo line
        // => [i-1234XYAadsadd, u-222tttt, i-8999UIIIgjkkd]
    }
}

Where

  • InstanceId: - a literal InstanceId: text
  • \\s* - zero or more whitespace characters
  • (\\S+) - Group 1 (we grab its contents with .group(1)), capturing one or more (but as many as possible) non-whitespace characters
  • , - a comma.
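The question's code calls .group(), while the answer uses .group(1). A minimal sketch of the difference, using the answer's pattern: .group() (equivalent to .group(0)) returns the entire match, including the InstanceId: prefix, the whitespace, and the trailing comma, whereas .group(1) returns only what the parentheses captured. The class name GroupDemo, the helper collect, and the sample input are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the whole match with capturing group 1.
public class GroupDemo {
    private static final Pattern INSTANCE_ID = Pattern.compile("InstanceId:\\s*(\\S+),");

    // Hypothetical helper: collects group n of every match (0 = whole match, 1 = the ID).
    public static List<String> collect(String input, int n) {
        List<String> out = new ArrayList<>();
        Matcher matcher = INSTANCE_ID.matcher(input);
        while (matcher.find()) {
            out.add(matcher.group(n));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "InstanceId: i-1234XYAadsadd, More text: InstanceId: u-222tttt,";
        // Each element keeps the "InstanceId: " prefix and the trailing comma.
        System.out.println(collect(input, 0));
        // Only the captured IDs.
        System.out.println(collect(input, 1)); // [i-1234XYAadsadd, u-222tttt]
    }
}
```

This is why the question's .group() call would still return the InstanceId: prefix even with a working pattern.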

1 Comment

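Note that you might also consider the InstanceId:\\s*(\\S+)\\b pattern, which does not require a trailing comma: \\S+ first consumes the whole run of non-whitespace characters after InstanceId: and any whitespace, then backtracks to the last word boundary, so trailing punctuation such as a comma is left out of Group 1.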
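A minimal sketch of the \\b variant, assuming some IDs may not be followed by a comma (for example, the last one in the string). The class name WordBoundaryDemo, the method extractIds, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: the \b variant does not need a comma after the ID.
public class WordBoundaryDemo {
    // \S+ first grabs the whole non-whitespace run, then backtracks until \b holds,
    // so trailing punctuation such as a comma is not captured in group 1.
    private static final Pattern INSTANCE_ID = Pattern.compile("InstanceId:\\s*(\\S+)\\b");

    public static List<String> extractIds(String input) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = INSTANCE_ID.matcher(input);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // The last ID has no trailing comma, so the comma-based pattern would miss it.
        String input = "InstanceId: i-1234XYAadsadd, dde InstanceId: i-8999UIIIgjkkd";
        System.out.println(extractIds(input)); // [i-1234XYAadsadd, i-8999UIIIgjkkd]
    }
}
```

The trade-off: an ID that itself ends in a non-word character (say, a hyphen) would lose that character, since the match stops at the last word boundary.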
