$\begingroup$

I am trying to use the XGBoost model to perform a multi-class classification over 40 classes.

The code is as follows:

xgb_params = list(colsample_bytree = 0.7,
                  subsample = 0.7,
                  eta = 0.05,
                  objective = "multi:softmax",
                  max_depth = 5,
                  min_child_weight = 1,
                  eval_metric = "mlogloss",
                  num_class = categoryclassnos,
                  nthread = 4)

fit.xgb = xgb.train(params = xgb_params,
                    data = dtrain,
                    nrounds = 500,
                    watchlist = list(train = dtrain, test = dtest),
                    print_every_n = 50)

However, I am getting the following error:

Check failed: (info.labels.size()) != (0) label set cannot be empty

I have reproduced the dataset and the R script here.

Any help or pointers would be greatly appreciated.

$\endgroup$
  • $\begingroup$ In your code, what does trainlabelsfactored <- as.integer(train$primarydeptt) - 1 return? $\endgroup$ Commented Jun 1, 2017 at 19:02
  • $\begingroup$ This returns integers in the range [0, max] based on the factor values. XGBoost requires labels in this range (starting from 0). $\endgroup$ Commented Jun 2, 2017 at 1:45
  • $\begingroup$ Continuing to explore this: as an experiment, I added a label value to dtest, and the error went away. I still don't get the logic, since a label is unlikely to be available for the test data set when predicting. $\endgroup$ Commented Jun 2, 2017 at 9:53
  • $\begingroup$ When creating the xgb.DMatrix (dtrain, dtest), specify the label parameter. $\endgroup$ Commented Jun 27, 2018 at 8:18
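
The 0-based labels discussed in the comments above can be illustrated with a minimal Python sketch. It roughly mirrors what as.integer(factor) - 1 does in R when the factor levels are in sorted order. The helper name encode_labels and the sample department names are hypothetical, not taken from the original script.

```python
def encode_labels(values):
    """Map each distinct class to an integer code in 0..K-1 (sorted order)."""
    classes = sorted(set(values))
    index = {name: code for code, name in enumerate(classes)}
    return [index[v] for v in values], classes


# Hypothetical class names standing in for the real department column.
departments = ["toys", "garden", "toys", "books"]
codes, classes = encode_labels(departments)

print(codes)         # [2, 1, 2, 0]
print(len(classes))  # 3, the value to pass as num_class
```

The number of distinct classes is what num_class should be set to, and the largest code is always one less than that.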

2 Answers

$\begingroup$

In Python, I needed to set the label parameter when creating dtrain: dtrain = xgb.DMatrix(X_train, label=y_train, feature_names=cfg_col_X)

PS: I was going to post this as a comment, but I don't have enough reputation points.

$\endgroup$
$\begingroup$

To train a model you need two things: your training data (a matrix of variables and observations) and your labels (the known classification of each observation in the training data). Perhaps the last column in your data is actually the label? If so, you will need to split dtrain by column, so that the data parameter gets all columns but the last and the label parameter gets just the last column.

Once you have the model, you can use it to predict on data that doesn’t have labels and the output will be the labels.
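
A minimal Python sketch of that column split, assuming the label really is the last column and is already encoded as 0-based integers. The helper names split_features_and_label and build_dmatrix and the toy table are hypothetical; xgb.DMatrix with its label argument is the same call used in the other answer.

```python
import numpy as np


def split_features_and_label(table):
    """Return (features, labels), treating the last column as the label."""
    table = np.asarray(table, dtype=float)
    features = table[:, :-1]           # every column except the last
    labels = table[:, -1].astype(int)  # last column as integer class codes
    return features, labels


def build_dmatrix(features, labels=None):
    """Wrap the arrays in an xgb.DMatrix (requires the xgboost package)."""
    import xgboost as xgb
    return xgb.DMatrix(features, label=labels)


# Hypothetical toy table: two feature columns, then a 0-based class label.
table = [
    [5.1, 3.5, 0],
    [6.2, 2.9, 1],
    [7.3, 3.0, 2],
    [5.9, 3.2, 1],
]
X, y = split_features_and_label(table)

print(X.shape)     # (4, 2)
print(y.tolist())  # [0, 1, 2, 1]
```

Training would then use build_dmatrix(X, y), while build_dmatrix(X_new) with no labels is enough for prediction. Any set that is evaluated during training (such as the test entry in the question's watchlist) still needs labels, because the evaluation metric is computed against them.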

$\endgroup$
