All Questions
4,412 questions
-1
votes
0
answers
35
views
Getting different results from Groupby for different sized Dataframes
I'm running the same functions on these two dfs that are identical except that they have different lengths (same number of columns and data types). When I run the larger one I get exactly as I would ...
0
votes
1
answer
62
views
Apply different aggregate functions to different columns of a pandas dataframe, and run a pivot/crosstab?
The issue
In SQL it is very easy to apply different aggregate functions to different columns, e.g. :
select item, sum(a) as [sum of a], avg(b) as [avg of b], min(c) as [min of c]
In Python, not so ...
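For readers skimming this excerpt, a minimal sketch of how the SQL above is commonly expressed in pandas, using named aggregation. The sample data is made up for illustration; only the column names `item`, `a`, `b`, `c` come from the excerpt, and the output names (`sum_of_a` and so on) are hypothetical stand-ins for the bracketed SQL aliases.

```python
import pandas as pd

# Hypothetical data; only the column names mirror the SQL in the excerpt.
df = pd.DataFrame({
    "item": ["x", "x", "y", "y"],
    "a": [1, 2, 3, 4],
    "b": [10.0, 20.0, 30.0, 50.0],
    "c": [5, 3, 9, 7],
})

# Named aggregation: output_name=(source_column, aggregate_function).
result = df.groupby("item", as_index=False).agg(
    sum_of_a=("a", "sum"),
    avg_of_b=("b", "mean"),
    min_of_c=("c", "min"),
)
print(result)
```

This covers only the per-column aggregation; the pivot/crosstab part of the question is not addressed here.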
2
votes
2
answers
116
views
Conditional running total based on date field in Pandas
I have a dataframe with below data.
DateTime          Tag  Qty
2025-01-01 13:00  1    270
2025-01-03 13:22  1    32
2025-01-10 12:33  2    44
2025-01-22 10:04  2    120
2025-01-29 09:30  3    182
2025-02-02 15:05  1    216
To be ...
0
votes
0
answers
28
views
dask: looping over groupby groups efficiently
Example DataFrame:
import pandas as pd
import dask.dataframe as dd
data = {
'A': [1, 2, 1, 3, 2, 1],
'B': ['x', 'y', 'x', 'y', 'x', 'y'],
'C': [10, 20, 30, 40, 50, 60]
}
pd_df = pd....
2
votes
2
answers
58
views
Pandas Group by without performing aggregation
I have a pandas dataframe as follows:
Athlete ID  City  No. of Sport Fields
1231        LA    81
4231        NYC   80
2234        NJ    64
1223        SF    75
4531        LA    81
2345        NYC.  80
...
I want to print the City and No. of Sport Fields ...
0
votes
2
answers
40
views
Issue in Pandas Dataframe grouping and getting the difference of a column
I'm stuck with a problem. I have a DataFrame as below; it has data for distributors who supply items to different locations. Now I want to calculate, for a particular day, whether any item (example: ...
0
votes
0
answers
27
views
Transpose several columns into one and groupby several columns
I have a dataset which contains two time stamps and several data columns. My aim is to put everything in three columns: two time columns and one data column, which results in several rows of ...
2
votes
1
answer
60
views
Subtle mistake in pandas .apply(lambda g: g.shift(1, fill_value=0).cumsum())
I have a dataframe that records the performance of F1-drivers and it looks like
Driver_ID Date Place
1 2025-02-13 1
1 2024-12-31 1
1 2024-11-03 2
1 ...
2
votes
3
answers
75
views
Pandas groupby with tag-style list
I have a dataset with 'tag-like' groupings:
Id tags
0 item1 ['friends','family']
1 item2 ['friends']
2 item3 []
3 item4 ['family','holiday']
So a row can belong to ...
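One common way to group rows that carry several tags is to give each (row, tag) pair its own row first. The sketch below assumes the `tags` column holds real Python lists (not their string representation) and that the goal is to collect the `Id` values per tag; the excerpt is truncated, so the actual goal may differ.

```python
import pandas as pd

# Data copied from the excerpt; tags are assumed to be real Python lists.
df = pd.DataFrame({
    "Id": ["item1", "item2", "item3", "item4"],
    "tags": [["friends", "family"], ["friends"], [], ["family", "holiday"]],
})

# explode() yields one row per (Id, tag); an empty list becomes a NaN tag.
exploded = df.explode("tags")

# groupby drops NaN keys by default, so item3 (no tags) is left out.
by_tag = exploded.groupby("tags")["Id"].agg(list)
print(by_tag)
```

An untagged row such as `item3` disappears from the result; passing `dropna=False` to `groupby` would keep it under a NaN key instead.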
2
votes
1
answer
60
views
How to use vectorized calculations in pandas to find out where a value or category is changing with corrected first row?
With a dataset with millions of records, I have items with various categories and measurements, and I'm trying to figure out how many of the records have changed, in particular when the category or ...
2
votes
2
answers
55
views
Why does summing data grouped by df.iloc[:, 0] also sum up the column names?
I have a DataFrame with a species column and four arbitrary data columns. I want to group it by species and sum up the four data columns for each one. I've tried to do this in two ways: once by ...
-1
votes
1
answer
50
views
Grouping Rows of Data to Generate analytical
I am working with a data set of NHS attendance data (a snippet of the columns and rows is included). The data continues all the way until the final hour of Sunday. I have successfully cleaned the ...
2
votes
2
answers
82
views
How to use numpy.where in a pipe function for pandas dataframe groupby?
Here is a script to simulate the issue I am facing:
import pandas as pd
import numpy as np
data = {
'a':[1,2,1,1,2,1,1],
'b':[10,40,20,10,40,10,20],
'c':[0.3, 0.2, 0.6, 0.4, 0....
-1
votes
1
answer
48
views
Iterate over multiple dataframes and group them based on mean value
I have a list of 81 different dataframes.
I would like to calculate the average value of the same column in each dataframe. Based on the mean values I would like to compare and ...
2
votes
1
answer
158
views
Resampling By Group in Polars
I'm trying to build a Monte Carlo simulator for my data in Polars. I am attempting to group by a column, resample the groups and then, unpack the aggregation lists back in their original sequence. I'...