
I have a large dataset of 7,000 rows with 40 features. I want to create two new dataframes from rows of the original. A 1D numpy array determines which rows go into which dataframe: I want to compare the values in the array against the index of the original dataframe and, where they match, take the entire row of the original dataframe and add it to the new dataframe.

#reading in my cleaned customer data and creating the original dataframe.
customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col = 0)
#this is the 1D array that has a single element that corresponds to the index number of customer_data
group_list = np.array([2045,323,41,...,n])
# creating the arrays with a slice from group_list with the values of the row indexes for the groups
group_1 = np.array(group_list[:1972])
group_2 = np.array(group_list[1972:])
for X in range(len(group_list):
    i = 0
    #this is where I get stuck
    if group_1[i] == **the index of the original dataframe**
        group1_df = pd.append(customer_data)
    else:
        group2_df = pd.append(customer_data)
    i = i+1

Obviously, I have some serious syntax and possibly some serious logic issues with what I'm doing, but I've been beating my head against this wall for a week now, and my brain is mush.

What I expect is that the row with index 2045 in the original dataframe would go into group1_df.

Ultimately, I'm looking to create two data frames (group1_df and group2_df) that have the same features as the original dataset, the first one having 1,972 records and the second having 5,028.

The dataset looks something like this: Copy of the data set I'm working with

1
  • Welcome to StackOverflow. Your question is described well; the only thing missing is some example data (5-10 rows) and what your expected output looks like based on that example data. Commented Jul 13, 2019 at 19:49

2 Answers

1

Consider DataFrame.reindex, which aligns each group's values with the index of customer_data and returns the matching rows in the order the values are given.

import numpy as np
import pandas as pd

customer_data = pd.read_excel('Clean Customer Data.xlsx', index_col=0)

# 1D array of index labels from customer_data (elided as in the question)
group_list = np.array([2045,323,41,...,n])

group1_df = customer_data.reindex(group_list[:1972], axis='index')
group2_df = customer_data.reindex(group_list[1972:], axis='index')
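
To see how this behaves, here is a minimal sketch on a small made-up frame. The column names, index labels, and the `split` variable are hypothetical stand-ins for the real customer_data, group_list, and the 1972 cut-off, and it assumes every value in the array appears in the index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data: 6 rows with index labels 0..5
customer_data = pd.DataFrame(
    {
        "age": [25, 31, 47, 52, 38, 29],
        "spend": [10.0, 20.5, 7.25, 99.0, 15.0, 42.0],
    },
    index=[0, 1, 2, 3, 4, 5],
)

# Hypothetical stand-in for group_list: each index label once, in shuffled order
group_list = np.array([4, 1, 5, 0, 3, 2])
split = 2  # plays the role of 1972 in the original code

# Rows come back in the order of the labels passed to reindex
group1_df = customer_data.reindex(group_list[:split], axis="index")
group2_df = customer_data.reindex(group_list[split:], axis="index")

print(group1_df.index.tolist())  # [4, 1]
print(group2_df.index.tolist())  # [5, 0, 3, 2]
```

Note that reindex does not raise for a label that is missing from the index; it returns a row of NaN for that label instead, so it may be worth checking the result for missing values if the array could contain labels that are not in the dataframe.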

0

If your numpy array is a and your dataframe is df:

group1_df = df.loc[df.index.isin(a[:1972]), :]
group2_df = df.loc[df.index.isin(a[1972:]), :]
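
A small sketch of the isin approach on made-up data may help show one difference from label-based selection: the boolean mask keeps rows in the order they appear in df, not in the order of the array. The frame, the array values, and `split` below are hypothetical, and the sketch assumes the array holds every index label exactly once.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with index labels 10..14
df = pd.DataFrame(
    {"spend": [10.0, 20.5, 7.25, 99.0, 15.0]},
    index=[10, 11, 12, 13, 14],
)

# Hypothetical array: each index label once, in shuffled order
a = np.array([13, 10, 14, 11, 12])
split = 2  # plays the role of 1972

# Boolean masks: True where the row's index label is in the slice
group1_df = df.loc[df.index.isin(a[:split]), :]
group2_df = df.loc[df.index.isin(a[split:]), :]

print(group1_df.index.tolist())  # [10, 13] (df order, not array order)
print(group2_df.index.tolist())  # [11, 12, 14]
```

Because the array covers every label once, the second group is also the complement of the first mask, so `df.loc[~df.index.isin(a[:split]), :]` would select the same rows here.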
