Remove extra commentary

edited Jun 1, 2019 at 2:44

I am fairly new to ML.

I used a Random Forest and tuned its hyperparameters for a binary classification problem on a dataset (dataset A). I got an F1 score of 0.78. I then used a second dataset (dataset B). It was very similar to dataset A (same variables and the same distribution of classes in the target variable). I again built and trained a separate Random Forest model for dataset B. I expected the F1 score to be around 0.78, but the F1 score for dataset B was 0.50.

Why could there be such a large difference between the F1 scores of the two datasets?

Both datasets (A & B) are very similar to each other and I trained separate models on both of them.

Any inputs on how to approach this issue would be greatly appreciated. Thanks!
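For reference, the F1 score compared above is the harmonic mean of precision and recall for the positive class. Below is a minimal sketch in plain Python of how that number is computed; the function name `f1_score_binary` and the toy labels are illustrative assumptions, not code or data from the post.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score for the positive class of a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Count the three outcomes that F1 depends on (true negatives are ignored).
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from datasets A or B.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score_binary(y_true, y_pred)
```

Because F1 uses only true positives, false positives, and false negatives, it can move noticeably when a model's errors on the positive class change, even if the two datasets look alike in their variables and class balance.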

minor edits

edit approved Jun 1, 2019 at 2:44

user29169


Why does the same algorithm give very different metrics on similar datasets?

I am fairly new to ML.

I used a Random Forest and tuned its hyperparameters for a binary classification problem on a dataset (dataset A). I got an F1 score of 0.78. I then used a second dataset (dataset B). It was very similar to dataset A (same variables and the same distribution of classes in the target variable). I again built and trained a separate Random Forest model for dataset B. I expected the F1 score to be around 0.78, but the F1 score for dataset B was 0.50.

I don't understand why there is such a large difference between the F1 scores of the two datasets. Both datasets (A & B) are very similar to each other and I trained separate models on both of them.

Any inputs on how to approach this issue would be greatly appreciated. Thanks!

asked May 31, 2019 at 23:42

data_analyst

Why did the same algorithm give very different metrics on similar datasets?

I am fairly new to ML and still in the learning phase.

I used Random Forest (hypertuned the parameters) for a binary classification problem on one dataset (dataset A). I got an F1 score of 0.78. I then got a second dataset (dataset B). It was very similar to dataset A. By similar I mean the same variables and the same distribution of classes in the target variable. I again built and trained a different random forest model for dataset B. I expected the F1 score to be around 0.78, but the F1 score for dataset B was 0.50.

I don't understand why there is such a stark difference in the F1 scores for the 2 datasets. Both datasets (A & B) are very similar to each other and I trained separate models on both of them.

Any inputs on how to approach this issue? Thanks!

random-forest
