
Issue: I'm getting an R² of about 0.64 and want to improve my results, but I don't know what's holding them back. I have already removed outliers, converted string features to numerical ones, and normalized the data. Is there any issue with my output? Please ask me anything if I didn't phrase the question correctly; I'm just starting out on Stack Overflow.

y.value_counts()
3.3    215
3.0    185
2.7    154
3.7    134
2.3     96
4.0     54
2.0     31
1.7     21
1.3     20

This is the histogram of my outputs. I'm not experienced with regression and would really appreciate help.

[Histogram of my outputs]

Removing collinearity from my inputs


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = df  # z-score normalization was also tried: data = z_scores(df)
correlation = data.corr()

target_col = 'Please enter your Subjects GPA which you have studied? (CS) [Introduction to ICT]'
k = 22  # keep the k features most correlated with the target
cols = correlation.nlargest(k, target_col)[target_col].index

# Heatmap of pairwise correlations among the selected features
cm = np.corrcoef(data[cols].values.T)
f, ax = plt.subplots(figsize=(15, 15))
sns.heatmap(cm, vmax=.8, linewidths=0.01, square=True, annot=True, cmap='viridis',
            linecolor="white", xticklabels=cols.values, annot_kws={'size': 12},
            yticklabels=cols.values)

[Correlation heatmap]

# Keep the selected feature names, dropping the target itself and one
# redundant feature
cols = pd.DataFrame(cols)
cols = cols.set_axis(["Selected Features"], axis=1)
cols = cols[cols['Selected Features'] != 'Please enter your Subjects GPA which you have studied? (CS) [Introduction to ICT]']
cols = cols[cols['Selected Features'] != 'Your Fsc/Ics marks percentage?']
X = df[cols['Selected Features'].tolist()]
X
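For comparison, collinearity can also be removed automatically with a threshold on the absolute correlation matrix. This is only a sketch on toy data; the column names and the 0.8 cutoff are assumptions, not values from the question:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; "b" is deliberately near-collinear with "a"
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df_demo = pd.DataFrame({"a": a,
                        "b": a + rng.normal(scale=0.05, size=100),
                        "c": rng.normal(size=100)})

corr = df_demo.corr().abs()
# Upper triangle of the matrix so each pair is checked only once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.8).any()]
X_demo = df_demo.drop(columns=to_drop)
print(to_drop)  # the near-duplicate column "b"
```

This keeps one column out of each highly correlated pair instead of filtering names by hand.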

Then I applied a random forest regressor and got these results:

import math

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

regressor = RandomForestRegressor(n_estimators=10, random_state=0)
model = regressor.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MAE Score: ", mean_absolute_error(y_test, y_pred))
print("MSE Score: ", mean_squared_error(y_test, y_pred))
print("RMSE Score: ", math.sqrt(mean_squared_error(y_test, y_pred)))
print("R2 score : %.2f" % r2_score(y_test, y_pred))

I got these results:

MAE Score:  0.252967032967033
MSE Score:  0.13469450549450546
RMSE Score:  0.36700750059706605
R2 score : 0.64
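With around 900 rows and a 10% test split, a single R² from one split can be noisy; cross-validation gives a steadier estimate. A minimal sketch, using `make_regression` as a synthetic stand-in for the real X and y (an assumption):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data as a stand-in for the real X, y
X_demo, y_demo = make_regression(n_samples=900, n_features=20,
                                 noise=10.0, random_state=0)

reg = RandomForestRegressor(n_estimators=100, random_state=0)
# Five-fold cross-validated R²: mean is the estimate, std shows split-to-split variance
scores = cross_val_score(reg, X_demo, y_demo, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```

If the standard deviation across folds is large, the 0.64 from one split may simply be luck of the draw.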

1 Answer

In order to get better results you need to do hyper-parameter tuning. Try to focus on these:

  1. Parameters to tune:
     - n_estimators: number of trees in the forest
     - max_features: max number of features considered for splitting a node
     - max_depth: max number of levels in each decision tree
     - min_samples_split: min number of data points placed in a node before the node is split
     - min_samples_leaf: min number of data points allowed in a leaf node
     - bootstrap: method for sampling data points (with or without replacement)
    
  2. Parameters currently in use (RandomForestRegressor):
    {'bootstrap': True,
    'criterion': 'mse',
    'max_depth': None,
    'max_features': 'auto',
    'max_leaf_nodes': None,
    'min_impurity_decrease': 0.0,
    'min_impurity_split': None,
    'min_samples_leaf': 1,
    'min_samples_split': 2,
    'min_weight_fraction_leaf': 0.0,
    'n_estimators': 10,
    'n_jobs': 1,
    'oob_score': False,
    'random_state': 42,
    'verbose': 0,
    'warm_start': False} 
    
  3. Use k-fold cross-validation.

  4. Use GridSearchCV.
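The last two points can be combined: GridSearchCV runs k-fold cross-validation over a parameter grid. A minimal sketch on synthetic data; the grid values and `make_regression` stand-in are assumptions, not the asker's data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real X, y
X_demo, y_demo = make_regression(n_samples=300, n_features=10,
                                 noise=5.0, random_state=0)

# Example grid over a few of the parameters listed above (values are assumptions)
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=3, scoring="r2", n_jobs=-1)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

`best_params_` gives the winning combination; refit on the full training set (GridSearchCV does this by default) before evaluating on the held-out test set.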
