I need a stricter regular expression for replacing certain values in my pandas DataFrame. The issue came up after I solved the question that I posted here.

The issue is that .replace(idsToReplace, regex=True) is not strict. So if idsToReplace contains:

NY : New York
NYC : New York City

and the comment in which we are replacing the ID is:

My cat from NYC is large.

The resulting response is:

My cat from New York is large.
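
A minimal sketch of the underlying cause, using the standard re module rather than pandas (the variable names are only illustrative): an unanchored pattern NY also matches inside the longer token NYC.

```python
import re

comment = "My cat from NYC is large."

# The unanchored pattern "NY" matches the first two characters of "NYC".
match = re.search("NY", comment)
print(match.group(0), match.span())
```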

Is there a pythonic way, within the pandas replace function, to make the regular expression stricter so that it matches NYC rather than NY?

  • there's no concept of strictness in regex, it just matches what you tell it to. You may be looking for \b word boundaries.
    – Aaron
    Commented Sep 21, 2017 at 13:53
  • Sorry, do you need to replace My cat from NYC is large. with My cat from New York City is large. if the dict is d = {'NYC': 'New York City', 'NY' : 'New York'} ?
    – jezrael
    Commented Sep 21, 2017 at 14:27
  • The issue was that the word NYC was being captured by NY instead of NYC. Thus the correct answer is: 'My cat from New York City is large'. I am doing some tests, but so far your answer below seems to work with the word boundaries
    – owwoow14
    Commented Sep 21, 2017 at 14:32
  • @owwoow14 - Super, glad I could help!
    – jezrael
    Commented Sep 22, 2017 at 5:17

1 Answer

Add \b word boundaries to each key of the dict, so that a key only matches as a whole word:

import re
import pandas as pd

d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY': 'New York'}

data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']}

# Wrap each key in \b word boundaries so that 'NY' cannot match inside 'NYC';
# re.escape keeps keys that contain regex metacharacters literal.
d = {r'\b' + re.escape(k) + r'\b': v for k, v in d.items()}

df = pd.DataFrame(data)

df['commentTest'] = df['Comment'].replace(d, regex=True)
print(df)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                         commentTest  
0                 The New York City tree is very big  
1  New York The cat from the United Kingdom is small  
2                 The rock was found in Los Angeles.  
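
As a variation on the same idea, here is a hedged sketch that builds one combined pattern and replaces in a single pass with the standard re module. The names ids_to_replace, build_pattern and expand_ids are hypothetical, chosen only for illustration. Keys are sorted longest first and escaped, then wrapped in \b word boundaries.

```python
import re

# Hypothetical mapping, for illustration only.
ids_to_replace = {"NY": "New York", "NYC": "New York City", "UK": "United Kingdom"}


def build_pattern(mapping):
    # Longest keys first so "NYC" is tried before "NY"; re.escape guards
    # against keys that contain regex metacharacters.
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")


pattern = build_pattern(ids_to_replace)


def expand_ids(text):
    # Replace every whole-word key with its value from the mapping.
    return pattern.sub(lambda m: ids_to_replace[m.group(0)], text)


print(expand_ids("My cat from NYC is large."))
```

Assuming the DataFrame from the answer, this could be applied per row with something like df['Comment'].map(expand_ids). Because every key is handled in one pass, text inserted by one replacement is never rescanned by another key.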
