Multiclass classification error metrics in Apache Spark

Question

Is it possible to compute error metrics (precision and recall) for a multiclass classification problem in Apache Spark? I am using logistic regression from Spark's MLlib to build my model and want to evaluate it using these metrics.

tourist · Accepted Answer · 2018-07-03 00:13:44Z

From the MLlib docs:

Assuming your test data is an RDD[LabeledPoint] named test and your trained model is named model:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

// Pair each prediction with its true label: RDD[(prediction, label)]
val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}

val metrics = new MulticlassMetrics(predictionAndLabels)
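The snippet above assumes model and test already exist. A minimal end-to-end sketch of where they might come from, assuming a local SparkContext and a tiny made-up 3-class data set (both are illustrative assumptions, not part of the original answer), could look like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Local context for a standalone run; inside spark-shell, reuse the existing `sc` instead.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("multiclass-metrics-sketch"))

// Made-up, well-separated 3-class data; labels must be 0.0, 1.0, ..., numClasses - 1.
val training = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(0.0, 0.1)), LabeledPoint(0.0, Vectors.dense(0.3, 0.0)),
  LabeledPoint(1.0, Vectors.dense(5.0, 5.1)), LabeledPoint(1.0, Vectors.dense(5.2, 4.8)),
  LabeledPoint(2.0, Vectors.dense(10.0, 0.2)), LabeledPoint(2.0, Vectors.dense(9.7, 0.1))
))
val test = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(0.2, 0.2)),
  LabeledPoint(1.0, Vectors.dense(4.9, 5.0)),
  LabeledPoint(2.0, Vectors.dense(10.1, 0.0))
))

// setNumClasses(3) makes LBFGS logistic regression train a multinomial (3-class) model.
val model = new LogisticRegressionWithLBFGS().setNumClasses(3).run(training)

val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
  (model.predict(features), label)
}
val metrics = new MulticlassMetrics(predictionAndLabels)

// The metrics are computed from the RDD, so read them before stopping the context.
val accuracy = metrics.accuracy
val labels = metrics.labels
println(s"Accuracy = $accuracy, labels = ${labels.mkString(", ")}")
sc.stop()
```

Keeping training and test as separate hard-coded sets (rather than calling randomSplit) avoids an empty test split on such a tiny data set.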

Confusion matrix

println("Confusion matrix:")
println(metrics.confusionMatrix)
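To read the printed matrix: in MulticlassMetrics, rows correspond to actual labels and columns to predicted labels, both in ascending label order. The following plain-Scala sketch builds the same layout from (prediction, label) pairs; the function name and the sample pairs are made up for illustration, and this is not Spark's own implementation:

```scala
// (prediction, label) pairs, same orientation as predictionAndLabels; values are made up.
val pairs: Seq[(Double, Double)] =
  Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0))

// Hypothetical helper: rows = actual label, columns = predicted label, both ascending.
def confusionMatrix(pairs: Seq[(Double, Double)]): Array[Array[Long]] = {
  val labels = pairs.map(_._2).distinct.sorted
  val index = labels.zipWithIndex.toMap
  val m = Array.fill(labels.size, labels.size)(0L)
  pairs.foreach { case (prediction, label) =>
    // A prediction that never occurs as a true label has no column, so it is skipped.
    for (row <- index.get(label); col <- index.get(prediction)) m(row)(col) += 1
  }
  m
}

confusionMatrix(pairs).foreach(row => println(row.mkString(" ")))
```

The diagonal holds the correct predictions; everything off the diagonal in row i is a sample of label i that was misclassified.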

Overall Statistics

val accuracy = metrics.accuracy
println("Summary Statistics")
println(s"Accuracy = $accuracy")
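Accuracy is simply the fraction of rows whose prediction equals the label. A plain-Scala sketch of that definition, using a hypothetical helper and made-up pairs:

```scala
// (prediction, label) pairs; values are made up for illustration.
val pairs: Seq[(Double, Double)] =
  Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0))

// Hypothetical helper: correct predictions divided by the total number of rows.
def accuracy(pairs: Seq[(Double, Double)]): Double =
  pairs.count { case (prediction, label) => prediction == label }.toDouble / pairs.size

println(s"Accuracy = ${accuracy(pairs)}")
```

Here 4 of the 6 predictions are correct, so the accuracy is 4/6.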

Precision by label

val labels = metrics.labels
labels.foreach { l =>
  println(s"Precision($l) = " + metrics.precision(l))
}
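Precision for a label l is TP / (TP + FP): of all rows predicted as l, the fraction whose true label really is l. A plain-Scala sketch of that definition (hypothetical helper, made-up pairs):

```scala
// (prediction, label) pairs; values are made up for illustration.
val pairs: Seq[(Double, Double)] =
  Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0))

// Hypothetical helper: TP / (TP + FP) for label l.
def precision(pairs: Seq[(Double, Double)], l: Double): Double = {
  val predictedAsL = pairs.filter(_._1 == l)   // TP + FP
  val tp = predictedAsL.count(_._2 == l)       // TP
  // Convention chosen for this sketch: 0.0 when nothing was predicted as l.
  if (predictedAsL.isEmpty) 0.0 else tp.toDouble / predictedAsL.size
}

Seq(0.0, 1.0, 2.0).foreach(l => println(s"Precision($l) = ${precision(pairs, l)}"))
```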

Recall by label

labels.foreach { l =>
  println(s"Recall($l) = " + metrics.recall(l))
}
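Recall for a label l is TP / (TP + FN): of all rows whose true label is l, the fraction that was predicted as l. A plain-Scala sketch (hypothetical helper, made-up pairs):

```scala
// (prediction, label) pairs; values are made up for illustration.
val pairs: Seq[(Double, Double)] =
  Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0))

// Hypothetical helper: TP / (TP + FN) for label l.
// Assumes l occurs at least once as a true label (otherwise 0/0 gives NaN).
def recall(pairs: Seq[(Double, Double)], l: Double): Double = {
  val actuallyL = pairs.filter(_._2 == l)   // TP + FN
  val tp = actuallyL.count(_._1 == l)       // TP
  tp.toDouble / actuallyL.size
}

Seq(0.0, 1.0, 2.0).foreach(l => println(s"Recall($l) = ${recall(pairs, l)}"))
```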

False positive rate by label

labels.foreach { l =>
  println(s"FPR($l) = " + metrics.falsePositiveRate(l))
}
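The false positive rate for a label l is FP / (FP + TN): of all rows whose true label is not l, the fraction that was nevertheless predicted as l. A plain-Scala sketch (hypothetical helper, made-up pairs):

```scala
// (prediction, label) pairs; values are made up for illustration.
val pairs: Seq[(Double, Double)] =
  Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0))

// Hypothetical helper: FP / (FP + TN) for label l.
def falsePositiveRate(pairs: Seq[(Double, Double)], l: Double): Double = {
  val negatives = pairs.filter(_._2 != l)   // rows whose true label is not l
  val fp = negatives.count(_._1 == l)       // ...but which were predicted as l
  fp.toDouble / negatives.size
}

Seq(0.0, 1.0, 2.0).foreach(l => println(s"FPR($l) = ${falsePositiveRate(pairs, l)}"))
```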

F-measure by label

labels.foreach { l =>
  println(s"F1-Score($l) = " + metrics.fMeasure(l))
}
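F1 is the harmonic mean of precision and recall; the general F-beta score weights recall beta times as much as precision, and MulticlassMetrics also has an fMeasure(label, beta) overload for it. A plain-Scala sketch of the formula, taking precision and recall as inputs (the helper name is hypothetical):

```scala
// Hypothetical helper: F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1.
def fMeasure(precision: Double, recall: Double, beta: Double = 1.0): Double = {
  val b2 = beta * beta
  // Convention chosen for this sketch: 0.0 when both precision and recall are 0.
  if (precision + recall == 0.0) 0.0
  else (1 + b2) * precision * recall / (b2 * precision + recall)
}

println(fMeasure(1.0, 0.5))              // F1 for P = 1.0, R = 0.5
println(fMeasure(0.5, 1.0, beta = 2.0))  // F2 favours recall
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high.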

Weighted stats

println(s"Weighted precision: ${metrics.weightedPrecision}")
println(s"Weighted recall: ${metrics.weightedRecall}")
println(s"Weighted F1 score: ${metrics.weightedFMeasure}")
println(s"Weighted false positive rate: ${metrics.weightedFalsePositiveRate}")
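The weighted statistics average the per-label values, weighting each label by how often it occurs as a true label: weighted(metric) = sum over labels l of metric(l) * count(l) / N. A plain-Scala sketch, where the helper name and the per-label numbers (taken from a small made-up example with label counts 2, 3 and 1) are illustrative assumptions:

```scala
// Hypothetical helper: average of per-label values, weighted by true-label frequency.
def weightedAverage(perLabel: Map[Double, Double], labelCounts: Map[Double, Long]): Double = {
  val total = labelCounts.values.sum.toDouble
  labelCounts.map { case (label, count) => perLabel(label) * count / total }.sum
}

// Made-up example: how often each label occurs as a true label.
val labelCounts = Map(0.0 -> 2L, 1.0 -> 3L, 2.0 -> 1L)
val precisionByLabel = Map(0.0 -> 1.0, 1.0 -> 2.0 / 3, 2.0 -> 0.5)
val recallByLabel = Map(0.0 -> 0.5, 1.0 -> 2.0 / 3, 2.0 -> 1.0)

println(s"Weighted precision: ${weightedAverage(precisionByLabel, labelCounts)}")
println(s"Weighted recall: ${weightedAverage(recallByLabel, labelCounts)}")
```

One consequence of this weighting: weighted recall always equals overall accuracy, because recall(l) * count(l) / N reduces to TP(l) / N, and summing over all labels gives the total number of correct predictions divided by N.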

OK, thanks. This uses the RDD API, I guess. Is there a way to do the same thing with the DataFrame API, since it is much better optimised?
AFAIK there is only an RDD-based API for this, but I could very well be wrong.
@RajnilGuha if you needed the DataFrame API, you should have clarified this in your question; you should also have shown what you have tried so far (from which, arguably, the API used would be apparent). I hope it is now clear why such 3-line "questions" get closed and downvoted...
@naveenmarri there is indeed a MulticlassClassificationEvaluator in Spark ML providing weighted precision & recall; see also the docs on model tuning.
@RajnilGuha an attempt can very well be "that's my code so far, and from this point I need to compute this & that"... Otherwise you expect someone to come up with an end-to-end scenario of his/her own, which may or may not be appropriate for your case. Actually, this is exactly what happened here: a respondent spent time helping you, only to be told afterwards "that's nice, but I don't want it for the RDD API" (and did not even get an upvote as an elementary courtesy for the time spent)...
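Following up on the MulticlassClassificationEvaluator mentioned in the comments, here is a minimal sketch for the DataFrame-based spark.ml API. It assumes a DataFrame with a Double "prediction" column and a numeric "label" column; the toy rows below are made up, and in practice the DataFrame would come from calling transform on a fitted spark.ml model with a test DataFrame. The metric names used here ("accuracy", "weightedPrecision", "weightedRecall", "f1", where "f1" is the weighted F-measure) are accepted in Spark 2.x and later; newer versions accept more names, so check the evaluator's docs for your version.

```scala
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession

// Local session for a standalone run; inside spark-shell, reuse the existing `spark`.
val spark = SparkSession.builder().master("local[2]").appName("evaluator-sketch").getOrCreate()
import spark.implicits._

// Made-up (prediction, label) rows standing in for model.transform(testDF).
val predictions = Seq(
  (0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)
).toDF("prediction", "label")

val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")

// evaluate() runs eagerly and returns a Double for the currently selected metric.
val results: Map[String, Double] =
  Seq("accuracy", "weightedPrecision", "weightedRecall", "f1").map { name =>
    name -> evaluator.setMetricName(name).evaluate(predictions)
  }.toMap

results.foreach { case (name, value) => println(s"$name = $value") }
spark.stop()
```

Reusing one evaluator and switching setMetricName keeps the column configuration in one place; the evaluator can also be passed to CrossValidator or TrainValidationSplit for model tuning.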
