
Questions tagged [data-wrangling]

-1 votes
1 answer
26 views

I'm creating a Tableau dashboard using the dataset '2015 Flight Delays and Cancellations' and trying to answer a question related to delays. I noticed that my delay columns have values ranging from -...
Omar's user avatar
  • 1
0 votes
1 answer
60 views

I am looking for learning material for R, covering both base R and tidyverse approaches (with a focus on readr, dplyr & tidyr), and for data visualisation using the ggplot2 library, for commercial teaching ...
Hamideh's user avatar
  • 942
0 votes
1 answer
49 views

We are running a camp for 130 children, and on 3 days they can pick different activities to do. One activity for slot 1 (45 min), the other for slot 2 (another 45 min), enabling them to do 6 ...
Pascal's user avatar
  • 9
0 votes
1 answer
41 views

I have tens of spreadsheets with facility-level rows. Each spreadsheet corresponds to a month. They each contain approximately the same variables (tens of them), but often with different column naming ...
katriel's user avatar
  • 111
0 votes
1 answer
2k views

I have a pandas dataframe df that looks like this:
col1 col2 col3
A    X    1
B    Y    2
C    Z    3
...
Scratch's user avatar
2 votes
0 answers
87 views

I am building a machine that tries to predict which ISP customers will complain due to issues with the network. I am having some difficulties. The idea is to use network metrics of ~300K customers as ...
Leon's user avatar
  • 133
3 votes
3 answers
1k views

The problem I want to solve is my residential building's garage choices. There will be a random distribution of parking spaces. I thought that it would be better if each person writes down which ...
Heleno Paiva's user avatar
1 vote
1 answer
531 views

I found some information about Data Wrangling and they say different things. In this one (link), they say data cleaning is a subcategory of data wrangling. In this PDF, data wrangling is EDA and model ...
Inuraghe's user avatar
  • 501
0 votes
1 answer
79 views

Is there any benefit to combining similarly named columns, either for an improvement in accuracy or for speeding up training/prediction, in the case of logistic regression, random forest or neural network ...
v81's user avatar
  • 3
0 votes
1 answer
5k views

Context: I am trying to find the top 10 highest values of count in my data frame conditional on them falling within the years 1970-1979. My data frame looks as below: ...
n.baes's user avatar
  • 39
1 vote
1 answer
394 views

I have really messy data that looks like this: As you can see, all the data in each row is contained in one column, separated by semicolons. How do I arrange this data so that they are spread out over ...
PlatinumMaths's user avatar
0 votes
1 answer
5k views

I'm trying to compare address values for inaccuracies, for example, given multiple records like: Reference Apartment Address PostCode AS097 NaN 00 Name Road BH1 4HB AS097 Flat 1 Building Name 00 Name ...
Ricardo Sanchez's user avatar
1 vote
0 answers
35 views

For a project I'm going to be working with spatial data with a nominal attribute (land use). Every year the number of categories for this attribute changes because categories split or merge. I do have ...
Nander Vilar Castellar's user avatar
-1 votes
1 answer
72 views

I have a feature with data creation dates. I have normalized them all to the same format and split them into 'day', 'month' and 'year' columns. But now I have a question. Should I apply normalization or ...
Luiscri's user avatar
  • 101
-1 votes
1 answer
56 views

Does anyone know an algorithm for identifying account names that are similar enough to be potentially merged and imported as one? Duplicates with different values: Geico val1 NaN =====>>...
miro_muras's user avatar
