All Questions

1 vote
2 answers
404 views

regex returning true with incorrect Japanese characters

I am using a form to check whether a postal code in the Japanese format has been entered. I realized today that some information got through, even though it should never have "passed" the ...
FlyingPikachu's user avatar
1 vote
2 answers
195 views

Why can't Python 2's re module identify the u'®' character?

I have a string that I want to clean with re.sub in Python 2. The following statement worked: >>> import re >>> re.sub(u"[™®]", "", u"a™b®c") 'abc' But when I tried the ...
calvin's user avatar
  • 3,015
11 votes
4 answers
1k views

Treat an emoji as one character in a regex [duplicate]

Here's a small example: reg = ur"((?P<initial>[+\-👍])(?P<rest>.+?))$" (In both cases the file has -*- coding: utf-8 -*-) In Python 2: re.match(reg, u"👍hello").groupdict() # => {u'...
naiveai's user avatar
  • 883
15 votes
3 answers
5k views

Regex to Match Horizontal White Spaces

I need a regex in Python 2 that matches only horizontal whitespace, not newlines. \s matches all whitespace, including newlines. >>> re.sub(r"\s", "", "line 1.\nline 2\n&...
Memduh's user avatar
  • 866
3 votes
1 answer
972 views

find all the matches for unicodes in a string in python

import re b="united thats weak. See ya 👋" print b.decode('utf-8') #output: u'united thats weak. See ya \U0001f44b' print re.findall(r'[\U0001f600-\U0001f650]',b.decode('utf-8'),flags=re....
Sandeep Chitta's user avatar
1 vote
2 answers
261 views

Python regex is having problems finding a special unicode character

I am currently parsing through some old exams to determine the frequency of the questions (because many questions would resurface at this year's exam). I am using pyperclip to get the input for the re....
gloriousCatnip's user avatar
4 votes
2 answers
407 views

How to build a regular vocabulary of emoticons in python?

I have a list of codes of emoticons inside a file UTF32.red.codes in plain text. The plain content of the file is \U0001F600 \U0001F601 \U0001F602 \U0001F603 \U0001F604 \U0001F605 \U0001F606 \...
emanuele's user avatar
  • 2,589
0 votes
1 answer
76 views

Python unicode regex issue

Why does this work: >>> ss u'\U0001f300' >>> r = re.compile(u"[u'\U0001F300-\U0001F5FF']+", re.UNICODE) >>> r.search(ss) # this works <_sre.SRE_Match object at ...
Ankur Agarwal's user avatar
1 vote
4 answers
293 views

Unicode search not working

Consider this. # -*- coding: utf-8 -*- data = "cdbsb \xe2\x80\xa6 abc" print data #prints cdbsb … abc ^ print re.findall(ur"[\u2026]", data ) Why can't re find this Unicode character? ...
vks's user avatar
  • 68k
0 votes
1 answer
847 views

Parsing log file using python and storing its valid value in database using sqlite

Hi, I am a newbie to Python. I am creating a small program which parses the load log file of a particular website and stores the valid data in a particular field of a database. But some of the fields have ...
Binit Singh's user avatar