
The following is the error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

This is my pattern:

>>> re101121=re.compile("""(?i)激[ _]{0,}活[ _]{0,}邮[ _]{0,}箱|(click|clicking)[ _]{1,}[here ]{0,1}to[ _]{1,}verify|stop[ _]{1,}mail[ _]{1,}.{1,16}[ _]{1,}here|(click|clicking|view|update)([ _-]{1,}|\\xc2\\xa0)(on|here|Validate)[^a-z0-9]{1}|(點|点)[ _]{0,}(擊|击)[ _]{0,}(這|这|以)[ _]{0,}(裡|里|下)|DHL[ _]{1,}international|DHL[ _]{1,}Customer[ _]{1,}Service|Online[ _]{1,}Banking|更[ _]{0,}新[ _]{0,}您[ _]{0,}的[ _]{0,}(帐|账)[ _]{0,}户|CONFIRM[ _]{1,}ACCOUNT[ _]{1,}NOW|avoid[ _]{1,}Account[ _]{1,}malfunction|confirm[ _]{1,}this[ _]{1,}request|verify your account IP|Continue to Account security|继[\\s-_]*续[\\s-_]*使[\\s-_]*用|崩[\\s-_]*溃[\\s-_]*信[\\s-_]*息|shipment[\\s]+confirmation|will be shutdown in [0-9]{0,} (hours|days)|DHL Account|保[ ]{0,}留[ ]{0,}密[ ]{0,}码|(Password|password|PASSWORD).*(expired|expiring)|login.*email.*password.*confirm|[0-9]{0,} messages were quarantined|由于.*错误(的)?(送货)?信息|confirm.*(same)? password|keep.*account secure|settings below|loss.*(email|messages)|simply login|quick verification now""")
  • Welcome to SO! This code works for me in Python 2.7, which it appears you're using from your error (I took the liberty of tagging it to avoid confusion with 3). Can you show a minimal reproducible example? Thanks. As an aside, {0,} could be simply * and always use raw strings with regex, like r"... stuff ...".
    – ggorlen
    Commented Apr 22, 2021 at 3:01
  • When I deleted some rules so that the pattern wasn't as long, the error disappeared. I don't understand whether that's because the pattern is too long or because some of the rules are invalid.
    – XiaoTian
    Commented Apr 22, 2021 at 3:04
  • Probably the latter. Please show your full failing example or there's not much I can offer here. If the string is too large, you can binary search it to find the minimal failing pattern (or, better yet, please do that anyway even if it's not that large, so the problem is isolated). BTW, I used # -*- coding: utf-8 -*- when I tried to reproduce this.
    – ggorlen
    Commented Apr 22, 2021 at 3:05
  • OK, thanks for your comment, and here is my full example:
    – XiaoTian
    Commented Apr 22, 2021 at 3:09
  • Sorry, the example is too large to fit in a comment, so I have put it in the question.
    – XiaoTian
    Commented Apr 22, 2021 at 3:13

1 Answer


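After minimization, your error boils down to re.compile("""[\\s-_]"""). This is indeed a bad character range: the dash sits between \s, which is a character class rather than a single character, and _, so the parser is asked to build a range with a class as one of its endpoints. You probably meant the dash to be literal: re.compile(r"[\s\-_]") (always use raw strings for regex, r"..."). Moving the dash to the end of the bracket group works too: r"[\s_-]".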
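Here is a small sketch that reproduces the error and checks that both fixes accept the same characters. It is written for Python 3 (re.fullmatch does not exist in 2.7), although 2.7 raises the same "bad character range" error, as the traceback above shows. The helper name try_compile is purely illustrative.

```python
import re


def try_compile(pattern):
    """Return the compiled pattern, or the re.error message if it is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        return "error: %s" % exc


broken = try_compile(r"[\s-_]")    # "\s" is a class, so it can't start a range
escaped = try_compile(r"[\s\-_]")  # fix 1: escape the dash
trailing = try_compile(r"[\s_-]")  # fix 2: put the dash last in the class

print(broken)
for text in [" ", "\t", "-", "_", "a"]:
    # Both fixes accept whitespace, "-" and "_", and reject everything else.
    print(repr(text), bool(escaped.fullmatch(text)), bool(trailing.fullmatch(text)))
```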

In the future, try to binary search to find the minimal failing input: remove the right half of the regex. If it still fails, the problem must have been in the left half. Remove the right half of the remaining substring and repeat until you're down to a minimal failing case. This technique doesn't always work when the problem spans both halves, but it can't hurt to try.
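Because this pattern is one long top-level alternation, the halving can be automated over its |-separated branches: each branch is a complete regex by itself, so a subset of branches fails to compile only if it contains a broken branch. The sketch below is a hypothetical helper (split_top_level, compiles and find_bad_alternative are illustrative names, not part of the re module). Its splitter is deliberately simplified and does not handle every corner of regex syntax. The sample pattern is a short ASCII stand-in for the question's regex.

```python
import re


def compiles(pattern):
    """True if `pattern` is a valid regex."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def split_top_level(pattern):
    """Split at '|' characters that are outside groups and [...] classes.

    Simplified: handles escapes, nested groups and character classes,
    but not every corner of the syntax (e.g. a ']' placed first in a class).
    """
    parts, current = [], []
    depth, in_class, i = 0, False, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            current.append(pattern[i:i + 2])  # keep the escape pair together
            i += 2
            continue
        if in_class:
            in_class = ch != "]"  # stay inside the class until ']'
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def find_bad_alternative(pattern):
    """Bisect the top-level alternatives and return one that fails, or None."""
    alts = split_top_level(pattern)
    if compiles("|".join(alts)):
        return None
    while len(alts) > 1:
        half = len(alts) // 2
        left = alts[:half]
        # "Remove the right half": if the left half still fails, the bug is
        # there; otherwise it must be in the right half.
        alts = left if not compiles("|".join(left)) else alts[half:]
    return alts[0] if not compiles(alts[0]) else None


print(find_bad_alternative(r"click here|stop[\s-_]*mail|verify"))
```

Since the left half is preferred whenever it fails, this returns the leftmost broken branch; after fixing it, run it again to find the next one.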

As mentioned in the comments, it's pretty odd to have such a massive regex as this, but I'll assume you know what you're doing.

As another aside, there are some antipatterns in this regex (pardon the pun) like {0,} which can be simplified to *.
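The same shorthand applies to {1,} (which is +) and {0,1} (which is ?). A minimal sketch of that cleanup is shown below. simplify_quantifiers is a hypothetical helper doing a naive text replacement, so it assumes those sequences only ever appear as quantifiers and never as escaped literals. The sample pattern is an ASCII stand-in modeled on the question's rules.

```python
import re

# Naive textual rewrite: assumes "{0,}", "{1,}" and "{0,1}" only appear as
# quantifiers, never as escaped literals such as "\{0,}".
SHORTHANDS = {"{0,}": "*", "{1,}": "+", "{0,1}": "?"}


def simplify_quantifiers(pattern):
    """Replace verbose {m,n} quantifiers with their shorthand equivalents."""
    for verbose, short in SHORTHANDS.items():
        pattern = pattern.replace(verbose, short)
    return pattern


verbose = r"click[ _]{0,}here|DHL[ _]{1,}international|stop[ _]{0,1}mail"
short = simplify_quantifiers(verbose)
print(short)  # click[ _]*here|DHL[ _]+international|stop[ _]?mail

# Sanity check: both spellings accept and reject the same sample strings.
samples = ["clickhere", "click _ here", "DHLinternational", "DHL  international",
           "stopmail", "stop mail", "stop  mail"]
for s in samples:
    assert bool(re.search(verbose, s)) == bool(re.search(short, s)), s
```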

  • Thanks for your help. When I deleted those statements, it executed successfully.
    – XiaoTian
    Commented Apr 22, 2021 at 3:33
