Questions tagged [r]

R is a free, open-source programming language and software environment for statistical computing, bioinformatics, and graphics.

132 votes
1 answer
386k views

I am building a regression model and I need to calculate the following to check for correlations: correlation between 2 multi-level categorical variables; correlation between a multi-level categorical ...
GeorgeOfTheRF
3 votes
1 answer
9k views

From various books and blog posts, I understood that the Variance Inflation Factor (VIF) is used to detect collinearity. They say that a VIF of up to 10 is acceptable. But I have a question. As we can see in ...
thewhitetulip
55 votes
9 answers
11k views

R has many libraries aimed at data analysis (e.g. JAGS, BUGS, ARULES, etc.), and is mentioned in popular textbooks such as: J. Kruschke, Doing Bayesian Data Analysis; B. Lantz, "Machine ...
akellyirl
7 votes
1 answer
3k views

I have a dataset of around 180k observations of 13 variables (a mix of numerical and categorical features). It is a binary classification problem, but the classes are imbalanced (25:1 in favor of the negative class). I ...
Filip
6 votes
2 answers
14k views

I was wondering which language I should use, R or Python, for my internship in fraud detection in an online banking system. I have to build machine learning algorithms (NN, etc.) that predict transaction ...
Hamza
37 votes
7 answers
6k views

From my limited dabbling with data science using R, I realized that cleaning bad data is a very important part of preparing data for analysis. Are there any best practices or processes for cleaning ...
Jay Godse
33 votes
3 answers
45k views

XGBoost has been doing a great job when it comes to dealing with both categorical and continuous dependent variables. But how do I select the optimized parameters for an XGBoost problem? This is ...
Dawny33
21 votes
6 answers
17k views

I work in an office where SQL Server is the backbone of everything we do, from data processing to cleaning to munging. My colleague specializes in writing complex functions and stored procedures to ...
AffableAmbler
17 votes
2 answers
9k views

I am trying to build a recommendation system using collaborative filtering. I have the usual [user, movie, rating] information. I would like to incorporate an ...
Sidhha
16 votes
1 answer
10k views

What is the difference between binary:logistic and reg:logistic in xgboost in R? Is it only in the evaluation metric? If yes, how does RMSE on binary classification compare to error rate? Is the ...
user2530062
16 votes
1 answer
32k views

So, our data set this week has 14 attributes and each column has very different values. One column has values below 1 while another column has values that go from three to four whole digits. We ...
Jae
10 votes
1 answer
4k views

I implemented a custom objective and metric for an xgboost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The ...
Peter
7 votes
6 answers
5k views

I'm interested in model debugging, and one of the points that it mentions is to compare your model with a "less complex" one to check if the performance is substantially better on the most ...
Multivac
7 votes
2 answers
8k views

I'm exploring options for recommender systems optimized for the insurance industry, which would take into account i) product holdings ii) user characteristics (segment, age, affluence, etc.). I ...
Kasia Kulma
7 votes
1 answer
4k views

I've compared logistic regression models in R (glm) and in Spark (LogisticRegressionWithLBFGS) on a dataset of 390 obs. of ...
SparkUser