asked May 31, 2019 at 23:42

Why did the same algorithm give very different metrics on similar datasets?

I am fairly new to ML and still in the learning phase.

I used a random forest (with tuned hyperparameters) for a binary classification problem on one dataset (dataset A) and got an F1 score of 0.78. I then received a second dataset (dataset B) that was very similar to dataset A: by similar, I mean it has the same variables and the same class distribution in the target variable. I built and trained a separate random forest model on dataset B. I expected its F1 score to be around 0.78, but it was 0.50.

I don't understand why there is such a stark difference in the F1 scores for the two datasets. Both datasets (A and B) are very similar to each other, and I trained a separate model on each of them.
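To make the comparison concrete, here is a minimal sketch of how the F1 score is computed from predictions. It is not the actual code or data from either dataset: the helper `f1_score_binary` and the label lists are hypothetical, chosen only to show that two sets of labels with the same class distribution can still produce quite different F1 scores, since F1 depends on the precision and recall of the positive class.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: both "datasets" have 4 positives out of 10 samples.
y_true_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_a = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # 3 TP, 1 FP, 1 FN

y_true_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_b = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]  # 2 TP, 2 FP, 2 FN

f1_a = f1_score_binary(y_true_a, y_pred_a)
f1_b = f1_score_binary(y_true_b, y_pred_b)
print(f"F1 on A: {f1_a:.2f}, F1 on B: {f1_b:.2f}")
```

In this toy example the class balance is identical, but the second model misses more positives and raises more false alarms, so its F1 is lower.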

Any input on how to approach this issue? Thanks!

random-forest

data_analyst