Questions tagged [data-wrangling]
55 questions
-1
votes
1
answer
26
views
Is negative delay (early arrival) an actual delay?
I'm creating a Tableau dashboard using the dataset '2015 Flight Delays and Cancellations' and trying to answer a question related to delays. I noticed that my delay columns have values ranging from -...
0
votes
1
answer
60
views
Learning material for R
I am looking for learning material for R (including base R and tidyverse approaches, with a focus on readr, dplyr and tidyr) and data visualisation using the ggplot2 library for commercial teaching ...
0
votes
1
answer
49
views
Each person gets their top, or second choice of activity over a period of 6 slots
We are running a camp for 130 children, and on 3 days they can pick different activities to do. One activity for slot 1 (45 min), the other for slot 2 (another 45 min), enabling them to do 6 ...
0
votes
1
answer
41
views
Tool/package for merging tables with inconsistent column names and categorical variable encoding?
I have tens of spreadsheets with facility-level rows. Each spreadsheet corresponds to a month. They each contain approximately the same variables (tens of them), but often with different column naming ...
0
votes
1
answer
2k
views
Export pandas dataframe to dictionary as tuple keys and value [closed]
I have a pandas dataframe df that looks like this:
col1 col2 col3
A X 1
B Y 2
C Z 3
...
2
votes
0
answers
87
views
Dealing with class imbalance in test set
I am building a machine that tries to predict which ISP customers will complain due to issues with the network. I am having some difficulties.
The idea is to use network metrics of ~300K customers as ...
3
votes
3
answers
1k
views
Giving each person in order their top choice which is still available in Google Sheets
The problem I want to solve is my residential building's garage choices.
There will be a random distribution of parking spaces.
I thought that it would be better if each person writes down which ...
1
vote
1
answer
531
views
Data Wrangling and data cleaning
I found several sources on data wrangling, and they say different things.
In this one (link), they say data cleaning is a subcategory of data wrangling.
In this PDF, data wrangling is EDA and model ...
0
votes
1
answer
79
views
Advantages to combining similarly-named columns for supervised ML?
Is there any benefit to combining similarly named columns, either for an improvement in accuracy or for speeding up training/prediction, in the case of logistic regression, random forest or neural network ...
0
votes
1
answer
5k
views
Filter for top 10 highest values of group in dataset (in R)
Context: I am trying to find the top 10 highest values of count in my data frame conditional on them falling within the years 1970-1979. My data frame looks as below:
...
1
vote
1
answer
394
views
Data cleaning in Pandas, where the csv file has all data of each row in 1 field [closed]
I have really messy data that looks like this:
As you can see, all the data in each row is contained in 1 column, separated by semicolons.
How do I arrange this data so that it is spread out over ...
0
votes
1
answer
5k
views
Compare multiple values from a DataFrame against a single row from another
I'm trying to compare address values for inaccuracies, for example, given multiple records like:
Reference  Apartment             Address       PostCode
AS097      NaN                   00 Name Road  BH1 4HB
AS097      Flat 1 Building Name  00 Name ...
1
vote
0
answers
35
views
What is a good way to handle nominal spatial data with a changing number of categories for use in a prediction model?
For a project I'm going to be working with spatial data with a nominal attribute (land use). Every year the number of categories for this attribute changes because categories split or merge. I do have ...
-1
votes
1
answer
72
views
Data wrangling dates
I have a feature with data creation dates. I have normalized them all to the same format and split them into 'day', 'month' and 'year' columns. But now I have a question. Should I apply normalization or ...
-1
votes
1
answer
56
views
Similar values cleaning [closed]
Does anyone know an algorithm to identify account names that are similar enough to be potentially merged and imported as one?
Duplicates with different values:
Geico val1 NaN =====>>...