Creating a Pandas DataFrame from a DataFrame using the values in a numpy array to access the data frame index

Question

I have a large data set of 7,000 rows with 40 features. I want to create two new data frames from rows of the original. Which rows go into which data frame is determined by the values in a 1D numpy array: I want to compare each value in the array against the index of the original dataframe and, where they match, take the entire row of the original dataframe and add it to the new dataframe.

#reading in my cleaned customer data and creating the original dataframe.
customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col = 0)
#this is the 1D array that has a single element that corresponds to the index number of customer_data
group_list = np.array([2045,323,41,...,n])
# creating the arrays with a slice from group_list with the values of the row indexes for the groups
group_1 = np.array(group_list[:1972])
group_2 = np.array(group_list[1972:])
for X in range(len(group_list):
    i = 0
    #this is where I get stuck
    if group_1[i] == **the index of the original dataframe**
        group1_df = pd.append(customer_data)
    else:
        group2_df = pd.append(customer_data)
    i = i+1

Obviously, I have some serious syntax and possibly some serious logic issues with what I'm doing, but I've been beating my head against this wall for a week now, and my brain is mush.

What I expect to happen is that the row of the original data frame with index 2045 would go into group1_df.

Ultimately, I'm looking to create two data frames (group1_df and group2_df) that have the same features as the original dataset, the first one having 1,972 records and the second having 5,028.

The dataset looks something like this:

Welcome to StackOverflow. Your question is described well; the only thing that's missing is some example data (5-10 rows) and what your expected output looks like based on that example data.
– Erfan, Commented Jul 13, 2019 at 19:49

Parfait · Accepted Answer · 2019-07-13 19:56:26Z

1

Consider DataFrame.reindex to align each group's values with the index of customer_data.

import numpy as np
import pandas as pd

customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col = 0)

group_list = np.array([2045,323,41,...,n])

# select the rows whose index labels appear in each slice of group_list
group1_df = customer_data.reindex(group_list[:1972], axis = 'index')
group2_df = customer_data.reindex(group_list[1972:], axis = 'index')
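
To see what reindex does on a small scale, here is a minimal sketch. The six-row frame, its column names (age, spend), the label values, and the split point of 2 are all made up for illustration and stand in for the real customer_data, group_list, and 1972. It assumes every value in group_list is a label in the index and appears only once.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data: six rows indexed by customer ID.
customer_data = pd.DataFrame(
    {
        "age": [34, 51, 27, 45, 62, 38],
        "spend": [120.0, 80.5, 42.0, 310.25, 15.0, 99.9],
    },
    index=[41, 323, 2045, 77, 500, 12],
)

# Hypothetical group_list: each index label once, in arbitrary order.
group_list = np.array([2045, 323, 41, 12, 500, 77])
split = 2  # plays the role of 1972 in the question

# reindex returns the rows for the requested labels, in the order requested.
group1_df = customer_data.reindex(group_list[:split], axis="index")
group2_df = customer_data.reindex(group_list[split:], axis="index")

# A label that is not in the index produces a row of NaN instead of an error.
missing_df = customer_data.reindex([999], axis="index")

print(group1_df)
print(group2_df)
```

Two things are worth noting: the rows come back in the order of the array slice rather than the order of the original frame, and a label missing from the index silently yields an all-NaN row, so it may be worth checking for NaN rows if group_list could contain stale IDs.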


run-out · Accepted Answer · 2019-07-13 19:58:38Z

0

If your numpy array is a, and your dataframe is df, you can build a boolean mask from the index with isin and select the matching rows:

group1_df = df.loc[df.index.isin(a[:1972]), :]
group2_df = df.loc[df.index.isin(a[1972:]), :]
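
A small runnable sketch of the same idea follows. The frame, its columns, the array values, and the split point of 2 are invented for illustration (the split stands in for 1972), and the sketch assumes the array contains every index label exactly once.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame: six rows indexed by customer ID.
df = pd.DataFrame(
    {
        "age": [34, 51, 27, 45, 62, 38],
        "spend": [120.0, 80.5, 42.0, 310.25, 15.0, 99.9],
    },
    index=[41, 323, 2045, 77, 500, 12],
)

# Hypothetical array of index labels, in arbitrary order.
a = np.array([2045, 323, 41, 12, 500, 77])
split = 2  # plays the role of 1972 in the question

# Boolean mask: True where the row's index label is in the first slice.
mask1 = df.index.isin(a[:split])

group1_df = df.loc[mask1, :]
group2_df = df.loc[df.index.isin(a[split:]), :]

# When the array covers the whole index, the second group is just the rest.
group2_alt = df.loc[~mask1, :]

print(group1_df)
print(group2_df)
```

With a mask, the rows keep the order they have in df rather than the order of the array, and labels in the array that do not exist in the index are simply ignored.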
