
I have a pandas column like this:

LOD-NY-EP-ADM
LOD-NY-EC-RUL
LOD-NY-EC-WFL
LOD-NY-LSM-SER
LOD-NY-PM-MOB
LOD-NY-PM-MOB
LOD-NY-RMK
LOD-NY-EC-TIM

I want the output in a new column as:

EP
EC
EC
LSM
PM
PM
RMK
EC

I tried this:

pattern=df.column[0:10].str.extract(r"\w*-NY-(.*?)-\w*",expand=False)

It works for every row except LOD-NY-RMK, where it returns NaN. Nothing follows RMK in that value, but since the pattern ends in -\w* (zero or more word characters), I expected it to match even when nothing comes after RMK.

Any idea what's going wrong?

If you are not familiar with pandas syntax, a plain regular-expression solution over an array of these strings is fine too.


2 Answers

1

Could you just use regular Python? Let df be your dataframe, and row be the name of your column.

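series = df.row
# take the third hyphen-separated field of each value
new_list = [i.split('-')[2] for i in series]
# reuse the original index so the result aligns with df
new_series = pd.Series(new_list, index=series.index)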
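
The same split can also be done with pandas string methods instead of a list comprehension. This is only a sketch: the sample values are copied from the question, while the column names code and segment are made up for illustration.

```python
import pandas as pd

# Sample values copied from the question; the column name "code" is hypothetical.
df = pd.DataFrame({"code": [
    "LOD-NY-EP-ADM", "LOD-NY-EC-RUL", "LOD-NY-EC-WFL", "LOD-NY-LSM-SER",
    "LOD-NY-PM-MOB", "LOD-NY-PM-MOB", "LOD-NY-RMK", "LOD-NY-EC-TIM",
]})

# Split each value on "-" and take the third field (index 2).
# .str[2] gives NaN rather than raising if a value has fewer than three fields.
df["segment"] = df["code"].str.split("-").str[2]

print(df["segment"].tolist())
```

Because the result is assigned straight back to the dataframe, the index stays aligned with the original rows.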

1 Comment

Can we try this with a regular expression? I could also slice the string and count the index positions, but I wanted to check whether a regular expression makes sense here.
1
pattern=df.column[0:10].str.extract(r"\w*-NY-(\w+)",expand=False)

See https://regex101.com/r/3uDpam/3

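Your regex required every matching string to contain three - characters: the \w* at the end is optional, but the - in front of it is not, so LOD-NY-RMK could not match. I changed it so the last -XX could occur 0 or 1 times.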

UPDATE: Changed so 2nd group is non-capturing (added ?:)

UPDATE: Thanks to Casimir, removed useless group at end of pattern
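
As a quick check of the difference, the sketch below runs the pattern from the question and the final pattern from this answer on three of the sample values. The Series s is built here only for illustration.

```python
import pandas as pd

# Three sample values from the question, including the one that failed.
s = pd.Series(["LOD-NY-EP-ADM", "LOD-NY-LSM-SER", "LOD-NY-RMK"])

# Pattern from the question: the "-" before the final \w* is a required
# literal, so a value with nothing after the third field cannot match.
original = s.str.extract(r"\w*-NY-(.*?)-\w*", expand=False)

# Final pattern from this answer: capture the word characters right after
# "-NY-". \w does not match "-", so the capture stops at the next hyphen
# or at the end of the string.
revised = s.str.extract(r"\w*-NY-(\w+)", expand=False)

print(original.tolist())  # last entry is NaN
print(revised.tolist())   # ['EP', 'LSM', 'RMK']
```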

6 Comments

Something is weird. The regex works if I try it on a single string, but when I apply it to the pandas column as above, the output includes the hyphen along with the extracted string, and it still fails for RMK. The RMK case works fine if I just use the regex on a string.
Maybe Pandas is using the last matching group. Try this \w*-NY-(\w+)(?:-\w+)? (so the 2nd group is non-capturing)
OK, I found the issue. It is also adding a column for the second capture group (the trailing letters), which I don't want. I only want the first group to be extracted. How do we do the equivalent of group(0) in pandas extract?
Is there any documentation on this? How do you know that adding ?: makes it work?
Since it is optional, writing (?:-\w+)? at the end of a pattern is useless. (Writing something optional at the end of a pattern is always useless, unless you have to capture something inside the optional part.)
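
To illustrate the point discussed in these comments, here is a small sketch comparing a capturing and a non-capturing trailing group in str.extract. The Series s and the variable names are made up for the example.

```python
import pandas as pd

s = pd.Series(["LOD-NY-EP-ADM", "LOD-NY-RMK"])

# Two capturing groups: extract returns a DataFrame with one column per group,
# so the optional "-ADM" part (hyphen included) shows up as a second column.
two_groups = s.str.extract(r"\w*-NY-(\w+)(-\w+)?")

# (?:...) groups without capturing, so only the first group produces output.
one_group = s.str.extract(r"\w*-NY-(\w+)(?:-\w+)?", expand=False)

print(two_groups.shape)    # (2, 2)
print(one_group.tolist())  # ['EP', 'RMK']
```

Dropping the optional trailing group entirely, as in the final version of the answer, should give the same single-column result.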