group and output on partial column value pandas python

Question

I have a sample data set:

import pandas as pd
import re

df = {'READID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
      'VG': ['LV5-F*01', 'LV5-F*01', 'LV5-A*02', 'LV5-D*01', 'LV5-E*01',
             'LV5-C*01', 'LV5-D*01', 'LV5-E*01', 'LV5-F*01'],
      'Pro': [1, 1, 1, 0.33, 0.59, 1, 0.96, 1, 1]}

df = pd.DataFrame(df)

It looks like this:

df
Out[12]: 
     Pro    READID        VG
0   1.00       1      LV5-F*01
1   1.00       2      LV5-F*01
2   1.00       3      LV5-A*02
3   0.33       4      LV5-D*01
4   0.59       5      LV5-E*01
5   1.00       6      LV5-C*01
6   0.96       7      LV5-D*01
7   1.00       8      LV5-E*01
8   1.00       9      LV5-F*01

I want to group by column 'VG', but only by the part before the '*' in each row, and then write each group to a separate file.

My concept is:

1. Group the dataset 'df' by column 'VG'.
2. For each row of column 'VG', look at only the part before the '*', e.g. 'LV5-F', 'LV5-A', 'LV5-D', etc.
3. Group the dataset once again, this time by the values from step 2.
4. Output each grouped set to a separate file.

Desired output, as individual separate files:

'LV5-F.txt':
     Pro    READID        VG
0   1.00       1      LV5-F*01
1   1.00       2      LV5-F*01
8   1.00       9      LV5-F*01


'LV5-A.txt':
     Pro    READID        VG
2   1.00       3      LV5-A*02


'LV5-D.txt':
     Pro    READID        VG
3   0.33       4      LV5-D*01
6   0.96       7      LV5-D*01


'LV5-E.txt':
     Pro    READID        VG
4   0.59       5      LV5-E*01
7   1.00       8      LV5-E*01


'LV5-C.txt':
     Pro    READID        VG
5   1.00       6      LV5-C*01

My attempt:

(df.groupby('VG')
   .apply(lambda x: re.findall('([0-9A-Z-]+)\*',x) )
   .groupby('VG')
   .apply(lambda gp: gp.to_csv('{}.txt'.format(gp.name), sep='\t',   index=False))
 )

But it failed at the '.apply(lambda x: re.findall('([0-9A-Z-]+)\*', x))' step, and I'm not sure why: when I ran the re.findall call on its own, outside the lambda, it worked fine.
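A likely explanation, sketched here on a shortened copy of the sample data: groupby(...).apply hands the lambda a whole DataFrame (one per group), while re.findall expects a single string. The names single, error_name and per_row are only for illustration.

```python
import re
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    'READID': [1, 2, 3],
    'VG': ['LV5-F*01', 'LV5-F*01', 'LV5-A*02'],
    'Pro': [1, 1, 1],
})

pattern = r'([0-9A-Z-]+)\*'

# On a single string the pattern behaves as expected.
single = re.findall(pattern, 'LV5-F*01')

# A DataFrame is not a string, so the re module raises a TypeError.
try:
    re.findall(pattern, df)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Applying the regex to each string in the column works.
per_row = df['VG'].map(lambda s: re.findall(pattern, s)[0]).tolist()

print(single, error_name, per_row)
```

So the regex itself is fine; it just has to be applied to the strings in the 'VG' column rather than to the grouped frames.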

piRSquared · Accepted Answer · 2016-08-03 19:07:23Z

2

You'll have to adjust the to_csv function below to suit your needs. In particular, instead of printing, pass it a file name to write to.

But I'd structure it this way:

def to_csv(df):
    print(df.to_csv())  # no path given, so to_csv returns the CSV text

#    extract
#     within
#     parens
#    /------\
# r'^([^\*]+)'
#   ^ \----/
#   |   \__________________________
# match       |          |         |
# beginning  [^this]    \*        '+'
# of string  matches   have to    match
#            not this  escape *   one or more
#
df.groupby(df.VG.str.extract(r'^([^\*]+)', expand=False)).apply(to_csv)

,Pro,READID,VG
2,1.0,3,LV5-A*02

,Pro,READID,VG
2,1.0,3,LV5-A*02

,Pro,READID,VG
5,1.0,6,LV5-C*01

,Pro,READID,VG
3,0.33,4,LV5-D*01
6,0.96,7,LV5-D*01

,Pro,READID,VG
4,0.59,5,LV5-E*01
7,1.0,8,LV5-E*01

,Pro,READID,VG
0,1.0,1,LV5-F*01
1,1.0,2,LV5-F*01
8,1.0,9,LV5-F*01
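The printed output above is comma-separated because to_csv uses ',' as its default separator. A minimal sketch of the alternatives, using only the 'LV5-D' rows from the question (the variable names are illustrative):

```python
import pandas as pd

# The two 'LV5-D' rows from the sample data, with their original index labels.
df = pd.DataFrame({
    'Pro': [0.33, 0.96],
    'READID': [4, 7],
    'VG': ['LV5-D*01', 'LV5-D*01'],
}, index=[3, 6])

comma_text = df.to_csv()        # default separator is ','
tab_text = df.to_csv(sep='\t')  # tab-separated instead
aligned_text = df.to_string()   # column-aligned text, like the console display

print(aligned_text)
```

Passing a file path as the first argument to to_csv writes the text to that file instead of returning it.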

edited Aug 3, 2016 at 19:07

answered Aug 3, 2016 at 18:47

piRSquared


3 Comments

Jessica Over a year ago

I got an error: "TypeError: extract() got an unexpected keyword argument 'expand'". Also, why does the output you show contain commas? Is there a way to produce the output that I desired?

piRSquared Over a year ago

@Jessica drop that argument. pandas 0.18.1 complains if you leave it out; earlier versions complain if you pass it at all.
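For readers on a pandas version that accepts the expand argument, here is a small sketch of what it changes. It assumes a single capture group, as in the answer; as_series and as_frame are illustrative names.

```python
import pandas as pd

vg = pd.Series(['LV5-F*01', 'LV5-A*02'], name='VG')

# expand=False with one capture group gives back a Series,
# which can be used directly as a groupby key.
as_series = vg.str.extract(r'^([^*]+)', expand=False)

# expand=True gives back a one-column DataFrame instead.
as_frame = vg.str.extract(r'^([^*]+)', expand=True)

print(type(as_series).__name__, type(as_frame).__name__)
```

In recent pandas releases the default is expand=True, so passing expand=False explicitly is the safer choice when the result is meant to be a grouping key.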

Jessica Over a year ago

Can you explain the regex part to me? r'^([^\*]+)' Thanks.
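As a rough illustration of that pattern with the standard re module: '^' anchors at the start of the string, '[^*]' matches any character that is not a '*', '+' repeats that one or more times, and the parentheses capture the result. Inside a character class the '*' does not need escaping, so '[^*]' and '[^\*]' behave the same. The variable names are illustrative.

```python
import re

pattern = re.compile(r'^([^*]+)')

# Everything up to (but not including) the first '*' is captured.
prefix = pattern.match('LV5-F*01').group(1)

# With no '*' at all, the whole string is captured.
no_star = pattern.match('LV5-F').group(1)

# A string that starts with '*' gives no match, because '+' needs
# at least one non-'*' character.
leading_star = pattern.match('*01')

print(prefix, no_star, leading_star)
```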

Jessica · Accepted Answer · 2016-08-03 19:05:43Z

1

I modified my code with help from @piRSquared and it worked:

df.groupby(df.VG.str.extract(r'^([^\*]+)')).apply(lambda gp: gp.to_csv('{}.txt'.format(gp.name), sep='\t', index=False))
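On newer pandas versions, where str.extract returns a DataFrame unless expand=False is passed, the one-liner above may not work as a grouping key. A sketch of the same idea with an explicit loop follows. It assumes a pandas version that accepts expand, and it writes into a temporary directory (out_dir, a hypothetical location) rather than the working directory.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    'READID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'VG': ['LV5-F*01', 'LV5-F*01', 'LV5-A*02', 'LV5-D*01', 'LV5-E*01',
           'LV5-C*01', 'LV5-D*01', 'LV5-E*01', 'LV5-F*01'],
    'Pro': [1, 1, 1, 0.33, 0.59, 1, 0.96, 1, 1],
})

# Grouping key: the part of VG before the '*', as a Series.
key = df['VG'].str.extract(r'^([^*]+)', expand=False)

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = {}
for name, group in df.groupby(key):
    # One tab-separated file per prefix, e.g. 'LV5-F.txt'.
    path = os.path.join(out_dir, '{}.txt'.format(name))
    group.to_csv(path, sep='\t', index=False)
    written[name] = path

print(sorted(written))
```

Iterating over the groupby object avoids calling apply purely for its side effects and makes the file name explicit.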

answered Aug 3, 2016 at 19:05

Jessica
