Test accuracy is greater than train accuracy: what to do?

Question

I am using a random forest. My test accuracy is 70%, but my train accuracy is only 34%. What should I do? How can I solve this problem?

Welcome to SO. Please be more specific and show code and data.
– petezurich, Commented Jul 22, 2018 at 11:29

WestCoastProjects · Accepted Answer · 2018-07-22 19:08:18Z

21

Test accuracy should not be higher than train accuracy, since the model is optimized on the training data. This behavior can come about in the following ways:

You did not use the same source dataset for test. You should do a proper train/test split in which both parts have the same underlying distribution. Most likely you provided a completely different (and more agreeable) dataset for test.
An unreasonably high degree of regularization was applied. Even so, there would need to be some element of "test data distribution is not the same as that of train" for the observed behavior to occur.
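
A proper split could look something like the sketch below. It is only an illustration: the question shows no data, so `make_classification` stands in for the real dataset, and the sizes and parameter values are arbitrary assumptions. The point is that train and test are carved out of one dataset, with `stratify=y` keeping the class proportions the same in both parts.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (an assumption; the question shows none).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Split ONE dataset so that train and test share the same distribution.
# stratify=y keeps the class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Compare accuracy on the data the model was fitted to and on held-out data.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

With a split like this, a test accuracy far above train accuracy would be surprising and would point to a problem elsewhere in the pipeline.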


5 Comments

Eran Yogev Over a year ago

I agree with @javadba and would like to add: another reason could be data contamination, where records from the train set also exist in the test set.
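
A quick way to look for this kind of contamination is to check whether any test record also appears in the train set. A minimal sketch, assuming each record is a short list of hashable values; `find_overlap` and the sample rows are made up for illustration, and only exact duplicates are detected.

```python
def find_overlap(train_rows, test_rows):
    """Return the set of records that appear in both train and test."""
    # Lists are not hashable, so convert each record to a tuple first.
    train_set = {tuple(row) for row in train_rows}
    test_set = {tuple(row) for row in test_rows}
    return train_set & test_set


# Hypothetical records: two features followed by a label.
train_rows = [[5.1, 3.5, 0], [4.9, 3.0, 0], [6.2, 2.9, 1]]
test_rows = [[4.9, 3.0, 0], [5.9, 3.2, 1]]

overlap = find_overlap(train_rows, test_rows)
print(f"{len(overlap)} of {len(test_rows)} test records also appear in train")
```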

Ethereal Over a year ago

I disagree with this; model metrics would be more representative of real-world performance if the test data were more accurate than the train data. The downside is that the model performance would be better if the train data were more accurate.

AndW Over a year ago

I disagree with the statement that it should not be higher than train. In OP's case, the gap is large enough that it should not be. However, in some cases (such as mine), an adversarially trained model can be expected to have a lower accuracy on its training set than on a benign test set.

WestCoastProjects Over a year ago

@a6623 Thanks for that clarification, i.e., my statement does not hold in all cases. In particular, adversarial models are a different animal.

WestCoastProjects Over a year ago

@Ethereal I have now run across a scenario that matches your description. The training data contains more difficult-to-model scenarios than some of the prediction/test-only datasets, so some of the latter actually have higher statistical performance in prediction.

AndW · Accepted Answer · 2022-07-16 02:57:59Z

The other answers are correct in most cases. But I'd like to offer another perspective. There are specific training regimes that could cause the training data to be harder for the model to learn - for instance, adversarial training or adding Gaussian noise to the training examples. In these cases, the benign test accuracy could be higher than train accuracy, because benign examples are easier to evaluate. This isn't always a problem, however!

If this applies to you, and the gap between train and test accuracies is larger than you'd like (~30%, as in your question, is a pretty big gap), then this indicates that your model is underfitting the harder patterns, so you'll need to increase the expressiveness of your model. In the case of random forests, this might mean growing the trees to a greater depth.
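
To illustrate, here is a rough sketch of that setup: Gaussian noise is added to the training examples only, the test set stays benign, and a shallow forest is compared with one whose trees are allowed to grow fully. The synthetic data, the noise scale, and the two `max_depth` values are all assumptions chosen for the example. Whether test accuracy actually ends up above train accuracy depends on the data and the noise level.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the asker's dataset).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Make the training examples harder by adding Gaussian noise.
# The test set is left clean (benign).
rng = np.random.default_rng(0)
X_train_noisy = X_train + rng.normal(scale=1.0, size=X_train.shape)

results = {}
for depth in (2, None):  # None lets each tree grow until its leaves are pure
    model = RandomForestClassifier(
        n_estimators=100, max_depth=depth, random_state=0
    )
    model.fit(X_train_noisy, y_train)
    train_acc = model.score(X_train_noisy, y_train)
    test_acc = model.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

Deeper trees should fit the noisy training examples more closely. Keep an eye on the test score as well, since very deep trees can also overfit.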

karel · Accepted Answer · 2021-10-23 08:03:59Z

1

First, you should check the data that is used for training. I think there is some problem with the data; it may not be properly pre-processed.

Also, in this case, you should try more epochs. Plot the learning curve to analyze when the model is going to converge.

You should check the following:

1. Both training and validation accuracy scores should increase and loss should decrease.
2. If there is something wrong in step 1 after any particular epoch, then train your model until that epoch only, because your model is over-fitting after that.
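
The two checks above amount to a simple form of early stopping. Here is a minimal sketch, assuming you train an iterative model (one that actually has epochs) and record the validation loss after each epoch. `best_epoch`, the `patience` parameter, and the loss values are made up for illustration.

```python
def best_epoch(val_losses, patience=2):
    """Return the 0-based epoch with the lowest validation loss seen
    before the loss fails to improve for `patience` epochs in a row."""
    best_idx = 0
    best_loss = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Validation loss improved: remember this epoch, reset the counter.
            best_idx, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                # No improvement for `patience` epochs: likely over-fitting.
                break
    return best_idx


# Hypothetical validation losses, one value per epoch.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
stop_at = best_epoch(val_losses)
print(f"train until epoch index {stop_at}")
```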

edited Oct 23, 2021 at 8:03

karel

answered Jul 10, 2020 at 16:09

Mukul Kirti Verma
