Imagine that you are the boss and hire a data scientist at a high salary to build a model that you hope will achieve a high accuracy score.$^{\dagger}$ She comes back to you reporting $92\%$ accuracy. “Wow,” you think. “That sounds like an A in school! Great job! You’re worth every penny of your salary!”
Then you realize that you could have gotten that same $92\%$ accuracy by predicting the dominant outcome every time. At this point, you realize that your data scientist has not accomplished anything beyond what you could have accomplished, and you no longer feel that she is worth the high salary you pay her.
In your case, since you know that $92\%$ of the patients lack hyperthyroidism, you could get a classifier with $92\%$ accuracy just by predicting that every time.
You’re always allowed to—and I would argue should be encouraged to—compare to some kind of baseline model.
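A minimal sketch of what that comparison can look like, using scikit-learn’s `DummyClassifier` as the majority-class baseline. The data here are made up (synthetic features and a roughly $92/8$ class split to mimic your situation), and logistic regression stands in for whatever the data scientist actually built:

```python
# Baseline comparison sketch: majority-class dummy vs. an actual model.
# X and y are synthetic stand-ins; y = 1 means hyperthyroid (~8% of cases).
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                     # made-up features
y = (rng.random(n) < 0.08).astype(int)          # ~8% positives, mimicking the 92/8 split

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: always predict the dominant class ("no hyperthyroidism").
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# The data scientist's model (logistic regression purely as a stand-in).
model = LogisticRegression().fit(X_train, y_train)

print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))  # ~0.92
print("model accuracy:   ", accuracy_score(y_test, model.predict(X_test)))
```

If the model’s score does not clearly beat the baseline’s, the $92\%$ by itself tells you very little.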
An issue that might trip you up is that there are three categories, yet the task is to classify into hyperthyroid or not, which is binary. Whether these categories should be combined or handled separately really warrants a distinct question, however.
$^{\dagger}$A critical aspect of this is the assumption that accuracy is the metric of interest. I am not sold on this. First, any threshold-based metric is known to have issues. Second, even if you are in a position where you must make discrete classifications instead of predicting tendencies, as the link discusses, the costs of wrong decisions need not be equal. Unequal costs seem highly plausible here, and you might be willing to sacrifice a bit of accuracy for gains in sensitivity or specificity. The idea of comparing to a baseline still applies to such a performance metric, though. If your data scientist cannot make a model that performs better than your naïve model that classifies everyone the same way, she probably hasn’t helped you that much, no matter how much the accuracy score looks like an A in school.
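To put a number on the footnote’s point, here is a quick illustration with made-up labels: the always-“not hyperthyroid” classifier scores about $92\%$ accuracy yet has sensitivity $0$, since it never flags a single sick patient:

```python
# Why the 92%-accurate naive classifier can still be useless:
# it misses every hyperthyroid patient, so sensitivity is 0.
# y_true is a synthetic label vector; 1 = hyperthyroid, 0 = not.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.08).astype(int)   # ~8% positives
y_naive = np.zeros_like(y_true)                  # always predict "not hyperthyroid"

tn, fp, fn, tp = confusion_matrix(y_true, y_naive, labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)           # ~0.92
sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # 0.0: misses every sick patient
specificity = tn / (tn + fp)                         # 1.0: trivially perfect on the healthy

print(accuracy, sensitivity, specificity)
```

A model that trades a little accuracy for nonzero sensitivity may be far more valuable, which is exactly why the baseline comparison should be made on whatever metric reflects your actual costs.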
Related.