I took the following from this Oracle tutorial on Java regex:

Intersections

To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.

Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.

Why would this be useful? If one wants to match only 3, 4, or 5, why not simply write [345] instead of the intersection?

Thanks in advance.

  • If you have two groups of numbers and you want to see whether a given number falls within both ranges, it would be useful. Why wouldn't it be useful? Commented Apr 10, 2013 at 15:33
  • In this trivial case it's not useful. They're just giving a simple example of how intersection works. If you were dynamically generating the regex, this might be useful. Otherwise, I find patterns of the form [0-9&&[^45]] a more typical use case. Commented Apr 10, 2013 at 15:36

1 Answer

Let us consider a simple problem: match English consonants in a string. Listing out all consonants (or a list of ranges) would be one way:

[B-DF-HJ-NP-TV-Zb-df-hj-np-tv-z]

Another way is to use look-around:

(?=[A-Za-z])[^AEIOUaeiou]
(?![AEIOUaeiou])[A-Za-z]

Not sure if there is any other way to do this without the use of character class intersection.

Character class intersection solution (Java):

[A-Za-z&&[^AEIOUaeiou]]
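
As a rough sketch of how the approaches above compare in Java, the following runs the intersection class, one of the look-around patterns, and the explicit range list against the same input. The class name ConsonantDemo, the helper findAll, and the sample string are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsonantDemo {
    // Hypothetical helper: collect every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "Hello, World";
        // Character class intersection: letters that are not vowels.
        System.out.println(findAll("[A-Za-z&&[^AEIOUaeiou]]", input));
        // Negative look-ahead followed by a plain letter class.
        System.out.println(findAll("(?![AEIOUaeiou])[A-Za-z]", input));
        // Explicit list of consonant ranges.
        System.out.println(findAll("[B-DF-HJ-NP-TV-Zb-df-hj-np-tv-z]", input));
        // Each line prints [H, l, l, W, r, l, d]
    }
}
```

All three patterns select the same characters here; the intersection form just states the intent ("a letter, but not a vowel") inside a single character class.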

For .NET, there is no intersection, but there is character class subtraction:

[A-Za-z-[AEIOUaeiou]]

I don't know the implementation details, but I wouldn't be surprised if character class intersection/subtraction is faster than look-around, which is the cleanest alternative when character class operations are not available.

Another possible usage is when you have a pre-built character class and you want to remove some characters from it. One case I have come across where class intersection might apply is matching all whitespace characters except the newline.
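
A minimal sketch of that whitespace case in Java, assuming the pattern [\s&&[^\n]] is what is wanted; the class name WhitespaceDemo, the method maskInlineWhitespace, and the sample input are made up for illustration.

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // \s intersected with "anything but \n": whitespace except the line feed.
    // Note that \r still matches, since only \n is excluded here.
    static final Pattern INLINE_WS = Pattern.compile("[\\s&&[^\\n]]");

    // Hypothetical helper: replace each matched whitespace character with '_'.
    static String maskInlineWhitespace(String input) {
        return INLINE_WS.matcher(input).replaceAll("_");
    }

    public static void main(String[] args) {
        // The space and the tab are replaced; the newline is left alone.
        System.out.println(maskInlineWhitespace("a b\tc\nd"));
    }
}
```

If carriage returns should be kept as well, the negated class can be extended to [^\n\r].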

Another possible use case as @beerbajay has commented:

I think the built-in character classes are the main use case, e.g. [\p{InGreek}&&\p{Ll}] for lowercase Greek letters.
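
To illustrate the suggestion above, here is a small sketch that applies [\p{InGreek}&&\p{Ll}] to a mixed string. The class name GreekLowerDemo, the method lowercaseGreek, and the sample text are assumptions for the example; the input is written with Unicode escapes so the source file encoding does not matter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreekLowerDemo {
    // Characters in the Greek block that are also lowercase letters.
    static final Pattern GREEK_LOWER = Pattern.compile("[\\p{InGreek}&&\\p{Ll}]");

    // Hypothetical helper: concatenate every matching character.
    static String lowercaseGreek(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = GREEK_LOWER.matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // alpha, beta, capital Gamma, " abc ", capital Delta, omega
        String input = "\u03B1\u03B2\u0393 abc \u0394\u03C9";
        // Only alpha, beta and omega satisfy both conditions.
        System.out.println(lowercaseGreek(input));
    }
}
```

The Latin letters fail the block test and the capital Greek letters fail the category test, so neither group appears in the result.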


4 Comments

I think the built-in character classes are the main use case, e.g. [\p{InGreek}&&\p{Ll}] for lowercase Greek letters.
@beerbajay: You are probably right. I have yet to run into that use case myself (well, whether we run into a given use case depends on what we are doing).
This answer has been added to the Stack Overflow Regular Expression FAQ, under "Character Classes".
I can't get [A-Za-z&&[^AEIOUaeiou]] working in RegexBuddy (Java or Perl flavors), or in Debuggex with any flavor. But in Java, Arrays.toString(Pattern.compile("[A-Za-z&&[^AEIOUaeiou]]").split("hello")) returns [, e, , o] as expected.
