Questions tagged [pandas]
pandas is a Python library for panel data manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance.
1,340 questions
3 votes, 0 answers, 249 views
pandas string regex returns the same data
I have data in pandas as below:
123-543-2345
876|678|3469
304-762-2467
I am trying to change all of them to this format: 123-543-2345
I ...
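A minimal sketch of one way to normalize these values, assuming they sit in a column named `phone` (hypothetical) and every entry has exactly 10 digits: strip all non-digits, then re-insert the dashes. Two things worth checking in any regex attempt: `|` is a regex metacharacter (alternation), so a literal pipe needs `\|` or `regex=False`, and `str.replace` returns a new Series that has to be assigned back.

```python
import pandas as pd

# Hypothetical column name "phone"; the three values are the ones in the question.
df = pd.DataFrame({"phone": ["123-543-2345", "876|678|3469", "304-762-2467"]})

# Step 1: drop every non-digit character, whatever the separator was.
digits = df["phone"].str.replace(r"\D", "", regex=True)

# Step 2: re-insert dashes as 3-3-4. The result is assigned back, because
# str.replace does not modify the Series in place.
df["phone"] = digits.str.replace(r"^(\d{3})(\d{3})(\d{4})$", r"\1-\2-\3", regex=True)
```

Entries that do not contain exactly 10 digits are left as bare digit strings, which makes them easy to spot afterwards.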
5 votes, 1 answer, 104 views
Regarding the proper approach to cleaning data for correlation using scipy.stats
I have a dataframe with two columns:
'HAD_DISEASE' (whether the subject has had the disease), which takes the value 1 (yes) or 2 (no).
'VNR', also an integer (...
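A sketch of the recoding step, using made-up toy values because the real `VNR` data is not shown: map 1/2 to 1/0 and use `scipy.stats.pointbiserialr`, which is Pearson's r between a binary and a continuous variable. For the magnitude the recoding is optional, since 2 - x is a negative linear transform of x; the raw 1/2 coding gives the same |r| with the sign flipped.

```python
import pandas as pd
from scipy import stats

# Toy values for illustration only; the question's actual VNR values are elided.
df = pd.DataFrame({
    "HAD_DISEASE": [1, 2, 1, 2, 2, 1, 2, 1],
    "VNR": [12, 7, 15, 6, 9, 11, 5, 14],
})

# Recode 1 = yes / 2 = no into 1 = yes / 0 = no.
df["had_disease_bin"] = (df["HAD_DISEASE"] == 1).astype(int)

# Point-biserial correlation (binary vs. continuous) and its p-value.
r_pb, p_value = stats.pointbiserialr(df["had_disease_bin"], df["VNR"])

# Pearson r on the raw 1/2 coding: same magnitude, opposite sign.
r_raw, _ = stats.pearsonr(df["HAD_DISEASE"], df["VNR"])
```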
3 votes, 3 answers, 265 views
Target imbalance observed, but the model still predicts correctly
I am working on a dataset where I need to predict which rides might be cancelled. If a ride is predicted to be cancelled (because of drivers), then the ...
4 votes, 1 answer, 77 views
How to build a smooth model from several data points
I am trying to model the arc of a basketball free-throw trajectory. Per person, the dataset usually has 6 points, each giving the height of the basketball at various times (in seconds) after the player ...
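Ignoring air resistance, the height of a thrown ball is a quadratic function of time, so one hedged approach is a per-person least-squares parabola with `np.polyfit`, evaluated on a dense time grid with `np.polyval` to get a smooth curve. The column names `person`, `t`, `height` and the synthetic, noise-free data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical long format: 6 (time, height) readings per person, synthetic values.
t = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
df = pd.DataFrame({
    "person": ["A"] * 6 + ["B"] * 6,
    "t": np.concatenate([t, t]),
    "height": np.concatenate([
        2.0 + 7.0 * t - 4.9 * t**2,   # person A
        1.8 + 6.5 * t - 4.9 * t**2,   # person B
    ]),
})

def fit_parabola(group: pd.DataFrame) -> pd.Series:
    # Least-squares fit of height = a*t**2 + b*t + c (highest power first).
    a, b, c = np.polyfit(group["t"], group["height"], deg=2)
    return pd.Series({"a": a, "b": b, "c": c})

# One row of coefficients per person.
coeffs = df.groupby("person")[["t", "height"]].apply(fit_parabola)

# Smooth curve for person A on a fine time grid.
t_dense = np.linspace(0.0, 1.0, 50)
smooth_a = np.polyval(coeffs.loc["A", ["a", "b", "c"]].to_numpy(), t_dense)
```

With real, noisy measurements the fitted `a` will not be exactly -g/2, and six points leave few degrees of freedom, so the residuals are worth inspecting.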
4 votes, 2 answers, 145 views
Quants: Beta calculation using pandas
Edit: adding one key piece of information (df and dailyRet), which I only realized was important after solving this issue.
...
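For reference, beta is usually computed as Cov(r_stock, r_market) / Var(r_market), which equals the OLS slope of stock returns on market returns. A minimal sketch with synthetic returns; the columns `stock` and `market` inside `dailyRet` are hypothetical, since the question's actual `df` and `dailyRet` are elided. If the inputs are prices rather than returns, `pct_change()` converts them first.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for illustration; column names are hypothetical.
rng = np.random.default_rng(0)
market = rng.normal(0.0005, 0.01, size=250)
stock = 0.0002 + 1.3 * market + rng.normal(0.0, 0.005, size=250)
dailyRet = pd.DataFrame({"stock": stock, "market": market})

# Full-sample beta: sample covariance over sample variance (both use ddof=1).
beta = dailyRet["stock"].cov(dailyRet["market"]) / dailyRet["market"].var()

# Cross-check: the OLS slope of stock on market is the same quantity.
slope = np.polyfit(dailyRet["market"], dailyRet["stock"], deg=1)[0]

# Rolling 60-day beta; the first 59 values are NaN (incomplete window).
window = 60
rolling_beta = (
    dailyRet["stock"].rolling(window).cov(dailyRet["market"])
    / dailyRet["market"].rolling(window).var()
)
```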
4 votes, 1 answer, 102 views
NamedAgg: How to interpret pandas documentation notation/conventions?
I've been using pandas.NamedAgg all over my Python script, but I'm still a newbie to both Python and pandas.
Today, I went to the documentation to see if I can streamline my code by leaving out the <...
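The pandas user guide describes `pd.NamedAgg` as a namedtuple with the fields `column` and `aggfunc`, and notes that plain `(column, aggfunc)` tuples are allowed in its place. A small sketch (data modeled on the pandas documentation example) showing that both spellings produce the same frame:

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Explicit form: keyword = output column name, value = NamedAgg(column, aggfunc).
explicit = animals.groupby("kind").agg(
    min_height=pd.NamedAgg(column="height", aggfunc="min"),
    max_weight=pd.NamedAgg(column="weight", aggfunc="max"),
)

# Shorthand form: a plain (column, aggfunc) tuple means the same thing.
shorthand = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_weight=("weight", "max"),
)
```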
7 votes, 2 answers, 159 views
Loan prediction model relying almost entirely on Credit_History and ignoring other features
I'm building a machine learning model to predict loan approval rate. My dataset includes features like:
Credit_History
...
4 votes, 0 answers, 41 views
Trying to train an ML regression model on the US Department of Transportation Kaggle Flights dataset (5 million records, 7 features)
For a college project for my data science course I am trying to fit a model based on the U.S. DOT's 2015 Kaggle Flight Cancellations dataset, but am not having great luck with model performance (MSE ...
2 votes, 0 answers, 71 views
Pandas Merge Produces Different Model Results Despite Identical Data - Why?
After merging DataFrames, my model gives worse performance even when using the same original features.
Minimal Example:
...
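The minimal example is elided, so this is not a diagnosis, but one common way a merge changes model inputs is a duplicated key on the right-hand side, which silently adds rows. A sketch with hypothetical frames `features` and `extra`, using `merge`'s `validate` argument to catch the duplicate and a stable sort to pin down row order before splitting or training:

```python
import pandas as pd

# Hypothetical frames: id 2 appears twice on the right-hand side.
features = pd.DataFrame({"id": [1, 2, 3], "x": [0.1, 0.2, 0.3]})
extra = pd.DataFrame({"id": [1, 2, 2, 3], "y": [10, 20, 21, 30]})

# 3 rows in, 4 rows out: row id=2 is repeated once per matching right row.
merged = features.merge(extra, on="id", how="left")

# validate="many_to_one" requires unique keys on the right and raises otherwise.
caught = False
try:
    features.merge(extra, on="id", how="left", validate="many_to_one")
except pd.errors.MergeError:
    caught = True

# Deterministic row order, so train/test splits line up across runs.
merged = merged.sort_values("id", kind="stable").reset_index(drop=True)
```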
1 vote, 0 answers, 39 views
Q-values output is NaN in DQN model - input state is normalized and padded
I'm training a Deep Q-Network (DQN) to trade crypto using historical data. My model keeps outputting NaN values for the Q-values during prediction. I'm using a custom function getState2() to generate ...
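`getState2()` is not shown, so this is only a hedged checklist written as code. If the normalization is min-max (an assumption), a constant price window gives 0/0 = NaN, and NaN used as padding would flow straight into the network as well. The helper names `minmax_normalize` and `assert_finite` are hypothetical.

```python
import numpy as np

def minmax_normalize(window: np.ndarray) -> np.ndarray:
    """Scale to [0, 1]; return zeros when the window is constant (zero range)."""
    window = np.asarray(window, dtype=float)
    span = window.max() - window.min()
    if span == 0:
        return np.zeros_like(window)
    return (window - window.min()) / span

def assert_finite(state: np.ndarray) -> None:
    # Cheap guard to run on every state before it reaches the model.
    if not np.all(np.isfinite(state)):
        raise ValueError("state contains NaN or inf")

flat = np.array([100.0, 100.0, 100.0])  # e.g. a window with no price movement

# Naive min-max on a constant window: 0 / 0 produces NaN.
with np.errstate(invalid="ignore"):
    naive = (flat - flat.min()) / (flat.max() - flat.min())

safe = minmax_normalize(flat)
assert_finite(safe)
```

Padding with zeros (or the edge value) instead of NaN avoids the second failure mode.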
2 votes, 1 answer, 65 views
Preprocessing multivalue attributes in a dataframe, similar to nominal attributes
Description:
Input is a CSV file
The CSV file contains columns of different data types: ordinal, nominal, numerical, and multivalue
For the multivalue columns, the minimum is 1, ...
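For multivalue columns, one option is multi-hot encoding with `Series.str.get_dummies`, which splits each string on a separator and creates one 0/1 column per distinct value. The column name `genres` and the `;` separator are assumptions; the real separator in the CSV is not shown.

```python
import pandas as pd

# Hypothetical multivalue column: each cell lists one or more values.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "genres": ["action;drama", "drama", "comedy;action;drama"],
})

# One 0/1 column per distinct value (columns come out sorted alphabetically).
dummies = df["genres"].str.get_dummies(sep=";").add_prefix("genres_")

# Replace the raw column with its multi-hot encoding.
encoded = pd.concat([df.drop(columns="genres"), dummies], axis=1)
```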
7 votes, 1 answer, 97 views
Difference between transform('min') and min() in pandas
I am currently working on a dataset that has two columns: customerID and date.
I want to find the minimum date for each customerID.
Initially, I used the following code:
...
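A small sketch with made-up dates showing the difference: `min()` reduces to one row per `customerID`, while `transform('min')` broadcasts each group's minimum back onto the original rows, so the result has the original index and can be assigned as a new column.

```python
import pandas as pd

# Made-up data using the question's column names.
df = pd.DataFrame({
    "customerID": ["a", "b", "a", "b", "a"],
    "date": pd.to_datetime(
        ["2024-03-01", "2024-01-15", "2024-01-10", "2024-02-01", "2024-05-20"]
    ),
})

# Reduction: one value per customer, indexed by customerID.
per_customer = df.groupby("customerID")["date"].min()

# Broadcast: same minima, aligned to every original row.
df["first_date"] = df.groupby("customerID")["date"].transform("min")
```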
6 votes, 1 answer, 207 views
Fuzzy Matching Names Between Two Excel Files to Fill in Amounts in Python
As part of my internship, I am working on a project where I need to process two Excel files:
File 1 contains names and numbers.
File 2 contains names and an empty column for amounts.
The goal is to ...
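A minimal sketch using the standard library's `difflib.get_close_matches`; the names, amounts, column names and the 0.85 cutoff are made up for illustration, and the real files would be loaded with `pd.read_excel`. Names without a close enough match get NaN so they can be reviewed by hand.

```python
import difflib

import pandas as pd

# Stand-ins for the two Excel files (hypothetical names and amounts).
file1 = pd.DataFrame({"name": ["Jonathan Smith", "Maria Garcia", "Li Wei"],
                      "amount": [120.0, 75.5, 300.0]})
file2 = pd.DataFrame({"name": ["Jonathon Smith", "Maria Garcia", "Unknown Person"]})

lookup = dict(zip(file1["name"], file1["amount"]))

def best_match(name, choices, cutoff=0.85):
    # Closest name in `choices` by difflib's similarity ratio, or None.
    hits = difflib.get_close_matches(name, choices, n=1, cutoff=cutoff)
    return hits[0] if hits else None

file2["matched_name"] = file2["name"].apply(best_match, choices=list(lookup))
file2["amount"] = file2["matched_name"].map(lookup)  # unmatched -> NaN
```

Keeping `matched_name` next to the original name makes it easy to audit every fuzzy match before trusting the filled amounts.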
1 vote, 1 answer, 122 views
Separating data that overlaps between rows in a CSV file using the pandas library
I downloaded this e-commerce dataset from Kaggle:
https://www.kaggle.com/datasets/kolawale/focusing-on-mobile-app-or-website
After converting it to a CSV file, there seems to be an issue. The ...
2 votes, 2 answers, 87 views
High accuracy on validation set, very low accuracy on test set!
I'm running a binary classification model; 75% of the data is FALSE and 25% is TRUE. I get 100% training accuracy and 96.5% validation accuracy, but only 40% accuracy on the test set. ...
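One sanity check, sketched with pandas only on a synthetic 75/25 label column: a model that always predicts FALSE scores about 75% on a test set with the same class mix, so 40% is below that trivial baseline. That suggests comparing how the test set differs from the training and validation sets (class mix, time period, possible leakage) rather than blaming the imbalance alone. `GroupBy.sample` gives a stratified split here; the column name `label` is an assumption.

```python
import pandas as pd

# Synthetic labels with the question's 75/25 class mix.
df = pd.DataFrame({"label": [False] * 750 + [True] * 250})

# Stratified 20% test split: sample the same fraction from each class.
test = df.groupby("label").sample(frac=0.2, random_state=0)
train = df.drop(test.index)

# Accuracy of always predicting the majority class (FALSE) on the test set.
majority_baseline = (test["label"] == False).mean()

# Class share per split; a large gap here would explain a train/test mismatch.
share_true = {"train": train["label"].mean(), "test": test["label"].mean()}
```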