Revisions to Simplified regular expression engine

deleted 1 characters in body

Source Link

edited Apr 14, 2012 at 0:23

cmh

495
3
9

That being said, this solution seems to me to be going in the correct approach (wrt time complexity)right direction, and for this restricted language we can stop short of actually building the DFA equivalent to the expression. (Which is expensive in space to create, but will be cheaper each time it is run against a candidate string).

One possible improvement would be to cacheWe acheive this by caching the result of each regex-substring for future use (the dynamic programming approach). TheThis prevents the problematic recursive explosion in the original problem and will guarantee an O(nm) runtime, but could cost up to O(nm) space for each regex candidate pair.

deleted 1 characters in body

Source Link

edited Apr 14, 2012 at 0:18

cmh

495
3
9

There are a couple of smalls optimisations that you can apply to your current solution to avoid very pessimistic cases (e.g. ******). Chains of sequential * can be compressed into one *. Similar special cases for more complex regex could be derived.

One possible solutionimprovement would be a dynamic programming approach whereto cache the result of each regex, substring attempt is cached-substring for future use (the dynamic programming approach). The prevents the problematic recursive explosion in the original problem and will guarantee an O(nm) runtime, but could cost up to O(nm) space for each regex candidate pair.

There are a couple of smalls optimisations that you can apply to your current solution to avoid very pessimistic cases (e.g. ******). Chains of sequential * can be compressed into one *. Similar special cases for more complex regex could be derived.

One possible solution would be a dynamic programming approach where the result of each regex, substring attempt is cached for future use. The prevents the problematic recursive explosion in the original problem and will guarantee an O(nm) runtime, but could cost up to O(nm) space for each regex candidate pair.

One possible improvement would be to cache the result of each regex-substring for future use (the dynamic programming approach). The prevents the problematic recursive explosion in the original problem and will guarantee an O(nm) runtime, but could cost up to O(nm) space for each regex candidate pair.

Source Link

answered Apr 14, 2012 at 0:13

cmh

495
3
9

In answer to 1) - No. Consider the regex *********a and the string b. Your code will have to consider all 9! (362880) cases before returning false.

That being said, this seems to me to be the correct approach (wrt time complexity) short of actually building the DFA equivalent to the expression. (Which is expensive in space to create, but will be cheaper each time it is run against a candidate string).

There are a couple of smalls optimisations that you can apply to your current solution to avoid very pessimistic cases (e.g. ******). Chains of sequential * can be compressed into one *. Similar special cases for more complex regex could be derived.

One possible solution would be a dynamic programming approach where the result of each regex, substring attempt is cached for future use. The prevents the problematic recursive explosion in the original problem and will guarantee an O(nm) runtime, but could cost up to O(nm) space for each regex candidate pair.

static Map<Pair<String, String>, boolean> result_cache;

//Assume we have a pair, as in:
//http://stackoverflow.com/a/677248/1331457
public static boolean match(String regx, String candidate)
{
    Pair<String, String> key = Pair(regx, candidate);
    if result_cache.containsKey(key){
        return true;
    }
    boolean result = false;
    // Empty regx will always return false
    if (regx.isEmpty())
    {
        result = false;
    }
    else
    {
        if (regx.charAt(0) == '*')
        {
            // Last * matches everything
            if (regx.length() == 1)
            {
                result = true;
            }
            else
            {
                result = matchStar(regx.substring(1), candidate);
            }
        }
        // Return false if text is empty but pattern is not *
        else if (candidate.isEmpty())
        {
            result = false;
        }
        else if (regx.charAt(0) == '.' || regx.charAt(0) == candidate.charAt(0))
        {
            // If the last regx matches the last text
            if (regx.length() == 1 && candidate.length() == 1)
            {
                result = true;
            }
            // If hasn't reached the end, try to match the rest strings
            else
            {
                result = match(regx.substring(1), candidate.substring(1));
            }
        }
        else
        {
            result = false; 
        }
    }
    result_cache.put(key, result);
    return result;
}

//Similarly for other methods.

Stack Exchange Network

Return to Answer