First of all, it’s easy to say “don’t be nervous”, but honestly, being nervous is a good sign. It means you care about the work you’re doing.
Research can be frustrating precisely because you don’t always get the clean or “expected” results you hoped for. And that’s completely okay. Negative results are still results. What matters is that you do your due diligence to make sure your findings aren’t caused by a bug, a flawed experimental setup, or some preprocessing mistake. Once you’ve ruled those out, the results, whether they confirm your hypothesis or not, are something you can legitimately defend.
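As a concrete example of that due diligence: one classic bug that produces suspiciously uniform results is rows leaking from the training set into the test set. Here is a minimal sketch of a duplicate-row check, using random synthetic data as a stand-in for your real feature matrix (all names and sizes here are illustrative):

```python
# Quick sanity check: do any test rows also appear in the training set?
# Random synthetic data stands in for the real feature matrices.
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(100, 32))  # stand-in binary features
X_test = rng.integers(0, 2, size=(40, 32))

# Hash each row by its raw bytes and count overlaps.
train_rows = {row.tobytes() for row in X_train}
leaked = sum(row.tobytes() in train_rows for row in X_test)
print(f"{leaked} test rows also appear in the training set")
```

If that count is nonzero on your real data, fix the split before interpreting any metrics.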
Now, about your actual situation:
Is it bad if different classifiers give the same metrics?
Not necessarily. This can happen, and it doesn’t automatically mean something is wrong. In particular:
- If the dataset is very simple, linearly separable, or highly imbalanced, you might see several classifiers converging to the same predictions.
- If your hierarchical setup constrains the labels strongly, the base classifiers may have limited room to differ.
- Ontology-derived features can be very sparse or highly correlated, which often causes multiple models to behave similarly.
- It’s also possible that the signal in the data is weak, and only a certain baseline level of performance is achievable.
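One way to probe the points above is to compare the raw prediction vectors, not just the summary metrics: identical accuracy does not necessarily mean identical predictions. A minimal sketch, assuming scikit-learn-style classifiers and synthetic data standing in for your ontology-derived features (the model choices are illustrative):

```python
# Do classifiers that share a metric also share predictions?
# Synthetic data stands in for the real ontology-derived features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    "nb": GaussianNB().fit(X_tr, y_tr),
}
preds = {name: m.predict(X_te) for name, m in models.items()}

# Pairwise agreement rate: 1.0 means the models are interchangeable
# on this test set, even if their internals differ.
agreement = np.mean(preds["logreg"] == preds["nb"])
print(f"agreement: {agreement:.3f}")
```

If the agreement rate is at or near 1.0, the models really have converged to the same decision function on your data, which is itself a reportable observation.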
Your observation that the Dummy Classifier performs noticeably worse is actually reassuring: it suggests the rest of your models are learning real signal rather than just exploiting class priors.
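That baseline comparison is worth making explicit in your write-up. A sketch using scikit-learn's `DummyClassifier`, again with synthetic data and an illustrative model choice:

```python
# Compare a real model against a majority-class dummy baseline.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

dummy_score = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=5
).mean()
model_score = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5
).mean()

# A clear gap over the dummy baseline means the model is learning
# real structure, not just the label distribution.
print(f"dummy: {dummy_score:.3f}  model: {model_score:.3f}")
```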
What this means for your thesis
If your hypothesis was “these classifiers should behave differently”, but your results say otherwise, that’s not a failure. That’s an interesting finding. It gives you room to explore questions such as:
- Why might different classifiers converge on the same decision boundary?
- Is there some inherent property of ontology-based features that reduces model diversity?
- Does the hierarchical/Bayesian structure constrain the predictions so much that model choice matters less?
- Does this suggest something about the nature of the dataset or the task itself?
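If you want to argue formally that two classifiers are statistically indistinguishable on your test set, McNemar's test on their discordant predictions is a standard tool. A sketch of the exact version, implemented via a binomial test on the discordant counts (models and data are illustrative stand-ins):

```python
# McNemar's test: do two classifiers have significantly different
# error patterns, or are they statistically indistinguishable?
import numpy as np
from scipy.stats import binomtest
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

pa = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)
pb = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr).predict(X_te)

# Discordant pairs: cases where exactly one of the two models is correct.
b = int(np.sum((pa == y_te) & (pb != y_te)))
c = int(np.sum((pa != y_te) & (pb == y_te)))

# Exact McNemar test = two-sided binomial test on the discordant counts.
p_value = binomtest(b, b + c, p=0.5).pvalue if (b + c) > 0 else 1.0
print(f"b={b} c={c} p={p_value:.3f}")
```

A large p-value supports the claim that the two models genuinely perform the same, rather than the difference merely being too small to notice by eye.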
This kind of analysis can actually strengthen your thesis. Showing that something didn’t work as expected, and explaining why, demonstrates deeper understanding than just getting the “textbook” results.
Bottom line
You don’t need to be afraid of presenting results that don’t match your initial expectations. Your job is to defend your findings, whatever direction they point in. As long as you have:
- validated your pipeline,
- justified your interpretation,
- and connected your results back to the literature,
you’re in a solid position.
And worst case? You’ve discovered something unexpected, which is literally what research is.