My aim is to read Excel data and store each row's first name, second name, and domain in the variables first name, second name, and domain respectively.

2 Answers

You can iterate over the rows with pandas, build the new values, and then save them to Excel with pandas again:

import pandas as pd

df = pd.read_excel('input.xlsx', index_col=None)

output = {'0': [], '1': [], '2': [], '3': [], '4': []}
for index, row in df.iterrows():
    output['0'].append(f"{row['First']}@{row['Domain']}")
    output['1'].append(f"{row['Second']}@{row['Domain']}")
    output['2'].append(f"{row['First']}{row['Second']}@{row['Domain']}")
    output['3'].append(f"{row['First']}.{row['Second']}@{row['Domain']}")
    output['4'].append(f"{row['First'][0]}{row['Second']}@{row['Domain']}")

df = pd.DataFrame(output, columns=list(output.keys()))
df.to_excel('output.xlsx')


3 Comments

Thanks! But isn't this going to be very inefficient if there are 10,000+ rows? Wouldn't I have to initialize 10k arrays? Is there a faster way to do that?
Sorry, I forgot to tag you.
Sorry, I have no idea about a faster way of doing it. Probably use C++.
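
On the efficiency question: the loop above keeps only five lists, one per output column, and each list grows by one entry per row, so 10,000 rows do not mean 10,000 arrays. If the loop itself turns out to be slow, one possible alternative is to let pandas concatenate whole columns at once. The following is a minimal sketch, assuming the columns are named First, Second and Domain as in the answer; the in-memory sample data is hypothetical and stands in for input.xlsx.

```python
import pandas as pd

# Hypothetical sample data standing in for pd.read_excel('input.xlsx').
df = pd.DataFrame({
    "First": ["john", "jane"],
    "Second": ["smith", "doe"],
    "Domain": ["example.com", "example.org"],
})

first, second, domain = df["First"], df["Second"], df["Domain"]
suffix = "@" + domain  # element-wise string concatenation over the whole column

# Each expression works on entire columns, so no explicit Python loop is needed.
output = pd.DataFrame({
    "0": first + suffix,
    "1": second + suffix,
    "2": first + second + suffix,
    "3": first + "." + second + suffix,
    "4": first.str[0] + second + suffix,  # first initial + second name
})

print(output)
```

Calling output.to_excel('output.xlsx') afterwards would write the result as in the answer. Whether this is noticeably faster depends on the data, so it is worth timing both versions on the real file.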

If I understand correctly, you want something like this:

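import pandas

df = pandas.read_excel("input.xlsx")

def generate(data):
    # data is one row; this assumes exactly three columns: first name, last name, domain
    first, last, domain = data
    # build the five address variants for this row
    return [fl + '@' + domain for fl in
            [first, last, first + last, first + '.' + last, first[0] + last]]

# axis='columns' passes each row to generate; result_type='expand' turns the returned list into columns
df.apply(generate, axis='columns', result_type='expand').to_excel("output.xlsx")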

The right function for this is DataFrame.apply. With axis='columns', the argument passed to generate is a sequence corresponding to one row.
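
To see what DataFrame.apply with result_type='expand' produces without needing an Excel file, here is a small sketch on an in-memory frame. The single sample row and the column names First, Second and Domain are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical one-row frame standing in for the Excel input.
df = pd.DataFrame(
    [["john", "smith", "example.com"]],
    columns=["First", "Second", "Domain"],
)

def generate(row):
    # Unpacking relies on the row having exactly three values, in this order.
    first, last, domain = row
    local_parts = (first, last, first + last, first + "." + last, first[0] + last)
    return [local + "@" + domain for local in local_parts]

# Each returned list becomes one output row; its items become columns 0 to 4.
expanded = df.apply(generate, axis="columns", result_type="expand")

print(expanded.iloc[0].tolist())
```

Because the row is unpacked positionally, a sheet with extra columns would need to be narrowed to the three relevant ones before calling apply.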
