All Questions
53 questions
1 vote, 1 answer, 63 views
How to fill a pandas column by calculation involving values from two dataframes
I have two dataframes; one (baseDataframe) keeps some base data for every individual n
n date1 age
0 1135 2021-05-08 <NA>
1 1339 2021-08-02 <NA>
2 1456 2021-08-07 <...
0 votes, 1 answer, 82 views
How to create a loop that will output percentage in a new column
I have a dataframe that looks like this:
COL1  COL2  COL3    COL4     COL5  COL6
S1    A     Apples  Aisle 1  N     Section 2
S1    A     Apples  Aisle 2  Y     Section 2
S1    A     Apples  Aisle 1  Y     Section 2
S2    A     Apples  Aisle 1  N     Section 2
...
1 vote, 1 answer, 71 views
Efficiently bin data in python
I am trying to summarise a database of activity times by user id to give an array of event count per timeslot to show a summary of weekly behaviour. I have approx. 35k ids over 3M events covering 2y ...
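One way the weekly summary described in this excerpt might be computed, sketched in plain Python rather than pandas. The hourly slot size (168 slots per week), the `(user_id, timestamp)` pair layout, and the name `weekly_histogram` are all assumptions made for illustration; the truncated question does not specify them.

```python
from collections import defaultdict
from datetime import datetime

SLOTS_PER_WEEK = 7 * 24  # assumption: one slot per hour of the week


def weekly_histogram(events):
    """Count events per hour-of-week slot for each user id.

    `events` is an iterable of (user_id, datetime) pairs (hypothetical layout).
    Slot 0 is Monday 00:00-00:59; slot 167 is Sunday 23:00-23:59.
    """
    counts = defaultdict(lambda: [0] * SLOTS_PER_WEEK)
    for user_id, ts in events:
        # weekday() is 0 for Monday, so this maps each timestamp to 0..167
        slot = ts.weekday() * 24 + ts.hour
        counts[user_id][slot] += 1
    return dict(counts)


# Made-up sample events, for illustration only
events = [
    (1, datetime(2021, 5, 3, 9, 15)),   # a Monday, 09:15
    (1, datetime(2021, 5, 10, 9, 45)),  # the following Monday, 09:45
    (2, datetime(2021, 5, 9, 23, 5)),   # a Sunday, 23:05
]
histogram = weekly_histogram(events)
print(histogram[1][9])    # user 1, Monday 09:00 slot
print(histogram[2][167])  # user 2, Sunday 23:00 slot
```

Events from different weeks fall into the same slot, which is what turns two years of data into a single weekly profile per id.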
2 votes, 2 answers, 67 views
Putting NaN when a day in a DataFrame doesn't return a value
I want to get the last in the "15:30:00.0" row for every day in the data frame, but as you can see on the 16th we don't get any rows for anything from 13:00:00.0 to 15:30:00.0
My DataFrame:
...
0 votes, 1 answer, 504 views
How to groupby, iterate over rows and count in pandas?
I have a dataframe with columns city, date, source and count of each source.
I need to group by city, then iterate over rows of date column with the following condition: check each row and if the ...
1 vote, 4 answers, 72 views
Python group by row and topic and create new columns with binary value
I have a large csv where each row is a separate school course, each of which is tagged with one or more topics, like so:
school name  department  course name  topics
A            A1          X            1; 2
A            A1          Y            1; 3
B            B1          Z            1; 2; ...
0 votes, 2 answers, 257 views
Filter, iterate, cumsum, add to dataframe
I have the following dataframe:
a,b,c,d
x,3,4,8
x,4,4,7
x,8,8,8
y,6,6,2
y,5,1,3
y,6,2,1
y,6,8,6
z,4,6,3
z,2,8,6
z,9,9,3
z,2,8,6
z,9,9,3
I’m looking to:
Filter for each value via (loop) in column a (...
1 vote, 1 answer, 7k views
How to iterate through each row of a groupby object created by groupby()?
I'm working with a large dataset that includes all police stops in my city since 2014. The dataset has millions of rows, but sometimes there are multiple rows for a single stop (so, if the police ...
3 votes, 3 answers, 57 views
In Python I need to do an iterative groupby that accesses the previous "grouped value" to establish the value of the row of the aggregated column
I have the following dataset that you can replicate with this code:
number_order = [2,2,3,3,5,5,5,6]
number_fakecouriers = [1,2,1,2,1,2,3,3]
dictio = {"number_order":number_order, "...
1 vote, 1 answer, 188 views
GroupBy transform median with date filter pandas
I have 2 dataframes:
df1:
artist_id  concert_date  region_id
12345      2019-10       22
33322      2018-11       44
df2:
artist_id  date     region_id  popularity
12345      2019-10  22         76
12345      2019-11  44         23
I need to add the median ...
0 votes, 1 answer, 36 views
Calculate summary statistic by category and filter - efficient code?
I have the following two dataframes.
df1:
code name region
0 AFG Afghanistan Middle East
1 NLD Netherlands Western Europe
2 AUT Austria Western Europe
3 IRQ ...
0 votes, 1 answer, 39 views
Mumbojumbo .rolling() .max() .groupby() combination in python pandas
I am looking to do a "rolling" .max() .min() of column B "grouped by" date (column A values). However, the trick is that it should start again on every row, so I cannot use, for example ...
1 vote, 0 answers, 58 views
How can I iterate through a DataFrame in the quickest way in terms of below example?
I am working on a project where I know store_code, product_code, a bigger product group "pmg" and a percentage per pmg which says what % of products the employees touch, but right now I should ...
0 votes, 1 answer, 287 views
Find Percentage of each class for every ID
I am working with a data frame that has 20 IDs, and for each ID there are about 10-15 stores, and each store is assigned a status (Zero, Negative and Positive).
Data:
data =
ID STORE STATUS ...
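The per-ID percentage asked about in this excerpt could be sketched as below, in plain Python. The `(id, store, status)` tuple layout, the sample rows, and the name `status_percentages` are assumptions for illustration only; the actual data is truncated in the listing.

```python
from collections import Counter, defaultdict


def status_percentages(rows):
    """Return {id: {status: percent}} from (id, store, status) tuples."""
    per_id = defaultdict(Counter)
    for id_, _store, status in rows:
        per_id[id_][status] += 1
    # Divide each status count by the number of stores seen for that id
    return {
        id_: {s: 100 * n / sum(c.values()) for s, n in c.items()}
        for id_, c in per_id.items()
    }


# Made-up sample rows (hypothetical, not from the question)
rows = [
    (1, "s1", "Positive"), (1, "s2", "Negative"),
    (1, "s3", "Positive"), (1, "s4", "Zero"),
    (2, "s1", "Zero"), (2, "s2", "Zero"),
]
result = status_percentages(rows)
print(result[1])
print(result[2])
```

In pandas, the same idea is usually expressed by grouping on the ID column and calling `value_counts(normalize=True)` on the status column, though whether that matches the asker's exact layout is not clear from the excerpt.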
0 votes, 1 answer, 220 views
Sum all values row-wise conditionally grouped by id
My end goal is to sum all minutes only from initial to final in column periods. This needs to be grouped by id.
I have thousands of ids, and not all of them have the same amount of min between initial ...