
I am a little bit lost in tidymodels. I have some data from topic modeling:

  • prevalent_topic: a factor variable holding the most prevalent topic, ranging from "Topic_1" to "Topic_5"
  • value1 and value2: two numeric variables used as predictors

I want to predict/classify the prevalent_topic based on value1 and value2:

prevalent_topic ~ value1 + value2

I started with multiclass classification using glmnet and nnet with tidymodels. Now I want to try "one-vs-rest" binary classification and created a recipe to begin with:

dfFT_rec <- recipe( ~ value1 + value2, data = dfFT_train) %>%
  step_dummy(prevalent_topic, one_hot = TRUE) %>%
  step_normalize(c(value1, value2)) 

The second step creates dummy variables that I would like to use as outcomes, e.g. "prevalent_topic_Topic_1", "prevalent_topic_Topic_2", ...

I tried to update the recipe's formula to "prevalent_topic_Topic_1 ~ value1 + value2", but that did not work. I also tried to fit a workflow to my data without specifying the outcome, but only got an error: "logistic_reg() was unable to find an outcome."

Is this possible at all? Or is there a different way to turn an outcome factor variable into dummy-coded outcome variables?

1 Answer


As long as the values in prevalent_topic are mutually exclusive (and are in the normal factor class), you can use multinom_reg() to get a model. Instead of fitting a set of logistic regressions, you can simultaneously model all of your categories.
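For reference, here is a minimal sketch of that multinomial route, assuming dfFT_train holds prevalent_topic, value1, and value2 as described in the question (the object names are just illustrative):

library(tidymodels)

multi_rec <- recipe(prevalent_topic ~ value1 + value2, data = dfFT_train) %>%
  step_normalize(all_numeric_predictors())

multi_spec <- multinom_reg() %>%
  set_engine("nnet") %>%   # "glmnet" would also work, but needs a penalty value
  set_mode("classification")

multi_fit <- workflow() %>%
  add_recipe(multi_rec) %>%
  add_model(multi_spec) %>%
  fit(data = dfFT_train)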

If they are not mutually exclusive (like a multiple-choice question), you would probably need to make separate factors and model each one separately. That "multilabel" structure isn't currently supported in tidymodels. You might look at the recipe step step_dummy_multi_choice() (https://recipes.tidymodels.org/reference/step_dummy_multi_choice.html), followed by step_bin2factor() (https://recipes.tidymodels.org/reference/step_bin2factor.html), to make the different outcome columns.
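As a rough illustration of that multilabel route (the data frame df_multilabel and its topic_choice_* columns are hypothetical, not from the question):

library(tidymodels)

# Hypothetical multi-choice data: each row may list several topics,
# spread over factor columns topic_choice_1, topic_choice_2, ...
multilabel_rec <- recipe(~ ., data = df_multilabel) %>%
  # collapse the choice columns into one 0/1 indicator per observed level
  step_dummy_multi_choice(starts_with("topic_choice_")) %>%
  # turn those 0/1 indicators into yes/no factors that could serve as outcomes
  # (assumes the indicator columns keep a "topic_choice" prefix, so the same selector matches)
  step_bin2factor(starts_with("topic_choice_"))

Each resulting yes/no column would then have to be modelled separately, since tidymodels itself has no multilabel mode.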


1 Comment

Thanks! Yes, they are mutually exclusive, and I have already tried multinom_reg(). The reason I want to try one-vs-rest binary classification is that the topics are far from equally distributed, with counts ranging from 4 to 5,000. Trying to even that out with the themis package is not necessarily the better approach. I will check out `step_dummy_multi_choice()`, but perhaps dummy-coding the outcome in advance of the ML workflow might be another approach.
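For what it's worth, a sketch of that "dummy-code the outcome in advance" idea, here for a single one-vs-rest model (Topic_1 vs. the rest); is_topic_1 is a made-up column name, and the same pattern would be repeated for the other topics:

library(tidymodels)

# Binary outcome: is the prevalent topic "Topic_1" or not?
df_ovr <- dfFT_train %>%
  mutate(is_topic_1 = factor(ifelse(prevalent_topic == "Topic_1", "yes", "no"),
                             levels = c("yes", "no")))

ovr_rec <- recipe(is_topic_1 ~ value1 + value2, data = df_ovr) %>%
  step_normalize(all_numeric_predictors())

ovr_fit <- workflow() %>%
  add_recipe(ovr_rec) %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  fit(data = df_ovr)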
