1

I am using Java. I need to parse the following line using a regex:

<actions>::=<action><action>|X|<game>|alpha

It should give me the tokens <action>, <action>, X and <game>.

What kind of regex will work?

I was trying something like "<[a-zA-Z]>", but that doesn't handle X or alpha.

2
  • 1
    Should it match alpha or not? Commented Mar 7, 2013 at 5:59
  • Yes, it should also include alpha. Commented Mar 7, 2013 at 6:14

4 Answers

5

You can try something like this:

String str="<actions>::=<action><action>|X|<game>|alpha";
str=str.split("=")[1];
Pattern pattern = Pattern.compile("<.*?>|\\|.*?\\|");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group());
}

1 Comment

This returns X as |X|. The regex should ignore the | separators.
1

You should have something like this:

String input = "<actions>::=<action><action>|X|<game>|alpha";
Matcher matcher = Pattern.compile("(<[^>]+>)(<[^>]+>)\\|([^|]+)\\|(<[^|]+>)").matcher(input);
while (matcher.find()) {
     System.out.println(matcher.group().replaceAll("\\|", ""));
}

You didn't specify whether alpha should be returned; this version does not return it.

You can return alpha by adding |\\w* to the end of the regex I wrote.

This will return:

<action><action>X<game>

2 Comments

The pattern should not include the "|". This prints: token:<action> token:<action> token:<action> token:|X| token:<game>
Can you also tell me how to tokenize this: <actions>::=<action><action><action>action? Here there are no "|" characters, and the tokens should be <action>, <action>, <action> and action. Thanks.
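One way to cover both inputs from these comments is to stop describing the whole line and instead match one token at a time: either a bracketed name or a run of characters that contains no "<", ">" or "|". The following is only a sketch under assumptions that are not stated in the thread: the class and method names (RuleTokenizer, tokenize) are made up for illustration, "::=" is assumed to appear once, and bare tokens are assumed to contain no whitespace that needs trimming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleTokenizer {
    // A token is either a bracketed name such as <action>, or a run of
    // characters containing none of '<', '>' and '|' (e.g. X or alpha).
    private static final Pattern TOKEN = Pattern.compile("<[^<>|]+>|[^<>|]+");

    // Hypothetical helper: returns the tokens on the right-hand side of "::=".
    static List<String> tokenize(String rule) {
        int idx = rule.indexOf("::=");
        // Assumption: if "::=" is missing, treat the whole string as the right-hand side.
        String rhs = idx >= 0 ? rule.substring(idx + 3) : rule;

        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(rhs);
        while (m.find()) {
            // The '|' separators never match TOKEN, so they are skipped.
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<actions>::=<action><action>|X|<game>|alpha"));
        // prints [<action>, <action>, X, <game>, alpha]
        System.out.println(tokenize("<actions>::=<action><action><action>action"));
        // prints [<action>, <action>, <action>, action]
    }
}
```

Because the pattern describes a single token, it does not depend on how many tokens there are or on whether "|" appears at all.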
0

From the original question it is not clear whether the < and > characters appear literally in the input; I'll assume they do.

String pattern = "<actions>::=<(.*?)><(.+?)>\\|(.+)\\|<(.*?)>\\|alpha";

In Java you can use Pattern and Matcher; here is the basic idea:

   Pattern p = Pattern.compile(pattern, Pattern.DOTALL | Pattern.MULTILINE);
   Matcher m = p.matcher(text);  // text is the input line
   if (m.find()) {               // only read groups after a successful match
      for (int g = 1; g <= m.groupCount(); g++) {
         System.out.println(m.group(g));  // the four groups, without < and >
      }
   }

1 Comment

Wait, why is alpha hardcoded here? Yes, the tokens should include those delimited by "<" and ">" and also words without these delimiters. In the example above, the tokens should be <action>, <action>, X, <game>, alpha.
0

You can use the following Java regex:

Pattern pattern = Pattern.compile
       ("::=(<[^>]+>)(<[^>]+>)\\|([^|]+)\\|(<[^>]+>)\\|(\\w+)$");

1 Comment

@Dev: Or see the Java code with above regex running here: ideone.com/8b7DP0
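For readers who cannot open the link, here is a rough sketch of how the regex from this answer could be exercised with Matcher groups. The class and method names (FixedShapeDemo, groups) are invented for this illustration and are not taken from the linked code. Note that this pattern fixes the shape of the line to exactly five tokens, so an input with a different layout does not match at all.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedShapeDemo {
    // The regex from the answer above: two bracketed tokens, a bare token
    // between pipes, another bracketed token, then a trailing word.
    static final Pattern RULE = Pattern.compile(
            "::=(<[^>]+>)(<[^>]+>)\\|([^|]+)\\|(<[^>]+>)\\|(\\w+)$");

    // Hypothetical helper: returns the captured groups, or an empty array
    // when the input does not have the expected shape.
    static String[] groups(String input) {
        Matcher m = RULE.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            out[g - 1] = m.group(g);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                groups("<actions>::=<action><action>|X|<game>|alpha")));
        // prints [<action>, <action>, X, <game>, alpha]
        System.out.println(Arrays.toString(
                groups("<actions>::=<action><action><action>action")));
        // prints [] because this line has a different shape
    }
}
```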
