All Questions
1,660 questions
2
votes
2
answers
62
views
Pandas: Fill in missing values with an empty numpy array
I have a Pandas Dataframe that I derive from a process like this:
df1 = pd.DataFrame({'c1':['A','B','C','D','E'],'c2':[1,2,3,4,5]})
df2 = pd.DataFrame({'c1':['A','B','C'],'c2':[1,2,3],'c3': [np.array((...
0
votes
1
answer
62
views
Writing complex Pandas DataFrame to HDF5 using h5py
I have a Pandas DataFrame with mixed scalar and array-like data of different raw types (int, float, str). The DataFrame's types look like this:
'col1', dtype('float64')
'col2', dtype('O') <-- array,...
1
vote
2
answers
92
views
Pandas indexing
Can someone explain what is meant by
Both loc and iloc [in Pandas] are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.
Because I ...
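A minimal sketch of the row-first, column-second ordering the quoted sentence describes, using a small made-up DataFrame. The names `df`, `x`, `y` and the index labels are illustrative assumptions, not taken from the question:

```python
import pandas as pd

# Hypothetical example data; not from the original question.
df = pd.DataFrame({"x": [10, 20, 30], "y": [1, 2, 3]}, index=["a", "b", "c"])

# loc: row label first, then column label.
by_label = df.loc["b", "y"]

# iloc: row position first, then column position.
by_position = df.iloc[1, 1]

# Plain [] indexing on a DataFrame selects the column first, then the row.
column_then_row = df["y"]["b"]
```

All three expressions address the same cell; only the order in which row and column are given differs.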
0
votes
0
answers
62
views
Different values in group by columns before and after group by in pandas dataframe [duplicate]
I have the following code:
sum_columns = ['p', 'q', 'r', 'Ax', 'Ay', 'Az']
avg_columns = ['Bx', 'By', 'Bz', 'G2 C03']
agg_map = {col: 'sum' for col in sum_columns}
agg_map.update({col: 'mean' for col in ...
0
votes
2
answers
47
views
Create Pivot table and add additional columns from another dataframe
Given two identically formatted dataframes:
df1
Counterparty Product Deal Date Value
foo bar Buy 01/01/24 10.00
foo bar Buy 01/01/24 10.00
foo ...
-1
votes
1
answer
212
views
How to filter array in Pandas dataframe? [duplicate]
Is it possible to filter an array without creating new columns?
For example, I have this dataframe:
userID goalsID
25 [1,2,4,5]
188 [3,6]
79 [1,9]
How to filter array by digit &...
0
votes
0
answers
19
views
Convert Array to Pandas Dataframe Columns [duplicate]
I have a column "ym:s:goalsID" in my dataframe. How can I convert this column to 4 separate columns?
Now it is:
ym:s:goalsID
[26783434,282511740,26783434,282511740]
I ...
1
vote
0
answers
94
views
Python Iterating over Numpy Tile and for-loops
Goal: Here is a sample of a dataset that has "ID", "PHASENAME", "CDAYS", "MULTI_FACTOR", "DAY_COUNTER", and "DAILY_LABOR_PERCENT". I was ...
1
vote
1
answer
84
views
Convert String to Array[Int] in a Hive column using Spark or Hive
I have sample data as in below string format in Hive table:
+----------------------+
| col1 |
+----------------------+
| 160-80-40 sec|
| 160-80-40 sec|
| 10-10-10-...
0
votes
1
answer
90
views
pandas quantile vs the Excel equivalent calculation difference
There is probably a way to do this; the problem is that I don't know it. I have data sets that are often between 80 and 120 values long. I am trying to compute the 90% value for each separate data set. I was ...
0
votes
4
answers
109
views
Efficient way to iterate rows in two arrays and then copy array back into a dataframe
I am learning numpy and I have a dataframe of asset prices and thought it might be better to do a calculation in numpy and then put the data back into a dataframe when done. I have a working program ...
0
votes
0
answers
34
views
Python function : return array that was named automatically [duplicate]
My final goal is to recover data from a table, and store it in numpy arrays to work with it later.
I had to automatically name each column of my dataframe (I've opened my file using pandas) due to my ...
1
vote
1
answer
59
views
How to split an array using its minimum entry
I am trying to split a dataset into two separate ones by finding its minimum point in the first column. I have used idxmin to firstly identify the location of the minimum entry and secondly iloc to ...
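A rough sketch of the idxmin-then-iloc approach mentioned above, on made-up data. The frame `df`, its column names, and the choice to put the minimum row in the second half are all assumptions for illustration; the excerpt does not show the asker's actual code:

```python
import pandas as pd

# Hypothetical example data; not from the original question.
df = pd.DataFrame({"x": [5.0, 3.0, 1.0, 2.0, 4.0], "y": [0, 1, 2, 3, 4]})

# idxmin returns the index *label* of the minimum in the first column.
min_label = df.iloc[:, 0].idxmin()

# iloc needs a *position*, so convert the label (assumes a unique index).
min_pos = df.index.get_loc(min_label)

# Assumption: the minimum row starts the second part.
before = df.iloc[:min_pos]
after = df.iloc[min_pos:]
```

With a default integer index the label and the position coincide, but the `get_loc` step keeps the split correct when the index has been reordered or relabeled.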
1
vote
1
answer
46
views
pandas to_csv function changing 2d array to a single string
I am trying to precalculate sentence embeddings and I want to store it in a csv file, so that I can reuse it later. I create a Pandas dataframe, and I have the embeddings stored correctly as a 2d ...
1
vote
1
answer
77
views
Adding numpy arrays to cells of a pandas DataFrame depends on initialisation
I was trying to add a list of numpy arrays as elements to the pandas DataFrame:
DataFrame
using:
df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]),np.array([2,3])]
Whether or not this is allowed seems ...