Questions tagged [regression]
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
1,586 questions
3
votes
0
answers
22
views
Media Value Simulation (Regression)
I am estimating a regression model to predict media value and later use residuals for Monte Carlo simulation.
The model includes:
• Market fixed effects (grouped)
• Asset categories
• A hierarchical ...
5
votes
1
answer
326
views
Decision Tree for Predicting Upcoming Years [closed]
Is a decision-tree-based model unsuitable for predicting upcoming years when the model was built using panel data? I want to predict upcoming years (for example, 2027).
0
votes
0
answers
18
views
Prediction Error [duplicate]
I am currently building a gradient boosting regressor model using a dataset of whole cities and subregions in a country. Hence, my target is spread over a very wide range, with a minimum of 3k and a maximum of 1000k. ...
1
vote
0
answers
129
views
How to properly predict goals in soccer matches using match statistics?
This is my first time posting here.
I'm a beginner in Data Science and currently trying to apply what I've learned to a real-world problem.
I built a web scraping script to collect statistics from ...
5
votes
1
answer
90
views
Hyperparameter Tuning Results
I did some hyperparameter tuning on the learning rate and n_estimators for a GBR model. However, the grid search gives me a higher learning rate (0.2) and a higher n_estimators (300) than the default values.
When ...
3
votes
1
answer
60
views
Hyperparameter Tuning
I tried building a GBR model using the default hyperparameters and got an RMSE almost 2 times higher than the MAE (my target spans a very wide range, from $10^3$ to $10^6$). I try to ...
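As an illustration of the gap this excerpt describes, the sketch below uses made-up residuals (not the asker's data) to show that RMSE equals MAE when all errors have the same magnitude and rises well above it once a few large errors appear. The helper names `mae` and `rmse` are hypothetical.

```python
import math

def mae(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical residuals: every prediction is off by the same amount
uniform_errors = [1_000.0] * 10

# Hypothetical residuals: nine small misses and one large miss,
# as can happen when the target spans several orders of magnitude
skewed_errors = [1_000.0] * 9 + [50_000.0]

print("uniform:", rmse(uniform_errors) / mae(uniform_errors))
print("skewed: ", rmse(skewed_errors) / mae(skewed_errors))
```

RMSE squares each residual before averaging, so a handful of large misses dominates it; a high RMSE-to-MAE ratio therefore points to a heavy-tailed error distribution rather than to uniformly poor predictions.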
4
votes
1
answer
87
views
Data Splitting for Hyperparameter Tuning
I want to do some hyperparameter tuning for my Gradient Boosting Regressor model to reduce the RMSE, because when I evaluate the model on the test set the RMSE is almost 2 times higher than the MAE. ...
5
votes
2
answers
174
views
Is Logistic Regression actually used for regression?
This is a question from my homework assignment. The full question is "Is Logistic Regression actually used for regression (predicting a continuous value)? If not, state what task it really ...
0
votes
0
answers
33
views
Can I use the slope of a regression to establish a correlation, if r_square is less than 20%?
I fit a regression line between a variable and a target value. The coefficient of determination (R_square) between the two is very low (< 20%). Does the calculated slope hold any significance in this ...
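To make the distinction in this excerpt concrete, here is a minimal sketch on synthetic data (an assumption, not the asker's data) in which R² is well below 20% while the slope's t-statistic is still large. The function `fit_line` and the chosen effect size of 0.3 are hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Return slope, R^2 and the slope's t-statistic for simple OLS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    # In simple regression, |t| = sqrt(r^2 * (n - 2) / (1 - r^2));
    # the sign follows the slope.
    t_stat = math.copysign(math.sqrt(r2 * (n - 2) / (1 - r2)), slope)
    return slope, r2, t_stat

random.seed(0)  # fixed seed so the synthetic data is reproducible
x = [random.gauss(0, 1) for _ in range(1000)]
y = [0.3 * xi + random.gauss(0, 1) for xi in x]  # weak but real linear effect

slope, r2, t_stat = fit_line(x, y)
print(f"slope={slope:.3f}  R^2={r2:.3f}  t={t_stat:.1f}")
```

R² measures how much of the target's variance the line explains, whereas the t-statistic measures how precisely the slope is estimated; with enough samples, a slope can be clearly non-zero even when most of the variance is noise.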
10
votes
1
answer
324
views
How do I train a regression model on time series data containing a band of zeros?
I am trying to create some kind of regression model. The target is continuous and can be both negative and positive. However, the issue is that there is a region/band that I know is roughly -50 to 50, ...
4
votes
2
answers
164
views
Should I use cross-validation or train-test split for a small sample size?
My dataset has fewer than 1000 samples and fewer than 10 features. Which method should I use? And if I use cross-validation, how do I choose the right fold size?
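For readers unfamiliar with how folds are formed, the following is a minimal sketch of k-fold index splitting in plain Python. The function name `k_fold_indices` is hypothetical, and the sketch omits shuffling, which would normally be applied first unless the data is ordered in time.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Toy example: 10 samples split into 5 folds of 2 test samples each
folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

With k folds, each model is trained on roughly (k - 1)/k of the data, so a larger k leaves more data for training at the cost of fitting more models.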
5
votes
1
answer
247
views
XGBoost or GBR?
What are the pros and cons of using XGBoost vs. GBR (scikit-learn) when dealing with data of between 500 and 1000 records and about 5 columns?
1
vote
0
answers
40
views
Is it really necessary to enforce constraints on outputs in a neural network?
I have a question. I'm doing a regression with 20 outputs that are non-negative and sum to 1. I thought that since their sum is equal to 1, maybe I can predict the first 19 ...
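One common way to enforce the two constraints mentioned in this excerpt is a softmax output layer; the sketch below shows only the transformation itself in plain Python. This is an illustrative assumption rather than the approach the asker took, and the `logits` values are made up.

```python
import math

def softmax(logits):
    """Map arbitrary real scores to non-negative values that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 20 outputs
logits = [0.1 * i for i in range(20)]
outputs = softmax(logits)

print(min(outputs), sum(outputs))
```

Because the constraint is built into the transformation, all 20 outputs are predicted jointly and none has to be derived by subtracting the others from 1.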
5
votes
2
answers
306
views
How can I identify which activation function is the best for my neural network based on inputs and outputs?
My professor said I shouldn't make choices blindly in a neural network and should choose activation functions carefully based on my inputs and outputs and their constraints. In the project I have the ...
6
votes
2
answers
84
views
How to handle irrelevant categorical variables in aggregated data?
I’m working with ad server data where I can’t get user-level data — only aggregated reports. The data is aggregated on multiple categorical dimensions (e.g., day × product × medium × source × campaign ...