Scorer

class
Compute evaluation scores

The Scorer computes evaluation scores. It’s typically created by Language.evaluate. It also provides a number of static methods for scoring Token and Doc attributes.

Scorer.__init__ method

Create a new Scorer.

Name | Description | Type
nlp | The pipeline to use for scoring, where each pipeline component may provide a scoring method. If none is provided, then a default pipeline for the multi-language code xx is constructed containing: senter, tagger, morphologizer, parser, ner, textcat. | Language
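
A minimal sketch of constructing a Scorer, assuming spaCy v3 is installed. The blank English pipeline is an assumption made only for illustration; in practice you would pass a trained pipeline.

```python
import spacy
from spacy.scorer import Scorer

# Assumption: a blank English pipeline stands in for a trained one.
nlp = spacy.blank("en")

# Pass the pipeline whose components should provide the scoring methods.
scorer = Scorer(nlp)
```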

Scorer.score method

Calculate the scores for a list of Example objects using the scoring methods provided by the components in the pipeline.

The returned Dict contains the scores provided by the individual pipeline components. For the scoring methods provided by the Scorer and used by the core pipeline components, the individual score names start with the Token or Doc attribute being scored:

  • token_acc, token_p, token_r, token_f
  • sents_p, sents_r, sents_f
  • tag_acc, pos_acc, morph_acc, morph_per_feat, lemma_acc
  • dep_uas, dep_las, dep_las_per_type
  • ents_p, ents_r, ents_f, ents_per_type
  • textcat_macro_auc, textcat_macro_f

Name | Description | Type
examples | The Example objects holding both the predictions and the correct gold-standard annotations. | Iterable[Example]
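
The following sketch shows one way to call Scorer.score, assuming spaCy v3. It uses a blank English pipeline (an assumption for illustration), so the only scores in the result are expected to come from the tokenizer. The reference Doc is built with explicit spaces so that it is not skipped during tokenization scoring.

```python
import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example

# Assumption: a blank pipeline, so only the tokenizer contributes scores.
nlp = spacy.blank("en")

# The predicted Doc comes from the pipeline's tokenizer.
predicted = nlp.make_doc("I like cats")

# The reference Doc holds the gold-standard tokenization.
reference = Doc(
    nlp.vocab,
    words=["I", "like", "cats"],
    spaces=[True, True, False],
)

examples = [Example(predicted, reference)]
scorer = Scorer(nlp)
scores = scorer.score(examples)

# Score names start with the attribute being scored, e.g. token_acc.
for name, value in sorted(scores.items()):
    print(name, value)
```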

Scorer.score_tokenization staticmethod v3.0

Scores the tokenization:

  • token_acc: number of correct tokens / number of gold tokens
  • token_p, token_r, token_f: precision, recall and F-score for token character spans

Docs with has_unknown_spaces are skipped during scoring.

Name | Description | Type
examples | The Example objects holding both the predictions and the correct gold-standard annotations. | Iterable[Example]
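
As an illustration of the span-based precision and recall, the sketch below scores a predicted tokenization that splits a token the gold tokenization keeps whole. The sentence and the split are made-up inputs; spaCy v3 is assumed.

```python
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

vocab = Vocab()

# Predicted tokenization splits "don't" into two tokens.
predicted = Doc(vocab, words=["do", "n't", "go"], spaces=[False, True, False])

# Gold tokenization keeps "don't" as one token. Passing spaces explicitly
# means has_unknown_spaces is False, so the Doc is not skipped.
reference = Doc(vocab, words=["don't", "go"], spaces=[True, False])

scores = Scorer.score_tokenization([Example(predicted, reference)])

# Only the character span of "go" appears in both tokenizations.
print(scores["token_p"], scores["token_r"], scores["token_f"])
```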

Scorer.score_token_attr staticmethod v3.0

Scores a single token attribute. Tokens with missing values in the reference doc are skipped during scoring.

Name | Description | Type
examples | The Example objects holding both the predictions and the correct gold-standard annotations. | Iterable[Example]
attr | The attribute to score. | str
keyword-only
getter | Defaults to getattr. If provided, getter(token, attr) should return the value of the attribute for an individual Token. | Callable[[Token, str], Any]
missing_values | Attribute values to treat as missing annotation in the reference annotation. Defaults to {0, None, ""}. | Set[Any]
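
A small sketch of scoring the tag attribute, assuming spaCy v3. The words and tags are invented for illustration, and the predicted Doc gets one tag wrong on purpose.

```python
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

vocab = Vocab()
words = ["She", "eats", "apples"]
spaces = [True, True, False]

# Gold-standard tags.
reference = Doc(vocab, words=words, spaces=spaces, tags=["PRP", "VBZ", "NNS"])

# Predicted tags, with the last one deliberately wrong.
predicted = Doc(vocab, words=words, spaces=spaces, tags=["PRP", "VBZ", "NN"])

# The result key is built from the attribute name: "tag" gives "tag_acc".
scores = Scorer.score_token_attr([Example(predicted, reference)], "tag")
print(scores["tag_acc"])
```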

Scorer.score_token_attr_per_feat staticmethod v3.0

Scores a single token attribute per feature, for an attribute in the Universal Dependencies FEATS format. Tokens with missing values in the reference doc are skipped during scoring.

Name | Description | Type
examples | The Example objects holding both the predictions and the correct gold-standard annotations. | Iterable[Example]
attr | The attribute to score. | str
keyword-only
getter | Defaults to getattr. If provided, getter(token, attr) should return the value of the attribute for an individual Token. | Callable[[Token, str], Any]
missing_values | Attribute values to treat as missing annotation in the reference annotation. Defaults to {0, None, ""}. | Set[Any]
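
A hedged sketch of per-feature scoring for morphological features, assuming a recent spaCy v3 release. The helper morph_key_getter is a hypothetical name introduced here; it assumes the scorer expects the string-store key of the morphological analysis rather than the MorphAnalysis object itself, which may differ between versions. The feature values are invented.

```python
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

vocab = Vocab()
words = ["She", "eats", "apples"]
spaces = [True, True, False]

# Gold-standard features in Universal Dependencies FEATS format.
reference = Doc(
    vocab, words=words, spaces=spaces,
    morphs=["Case=Nom|Number=Sing", "Number=Sing|Person=3", "Number=Plur"],
)

# Predicted features, with Number deliberately wrong on the second token.
predicted = Doc(
    vocab, words=words, spaces=spaces,
    morphs=["Case=Nom|Number=Sing", "Number=Plur|Person=3", "Number=Plur"],
)


def morph_key_getter(token, attr):
    # Hypothetical helper: return the hash key of the token's morphology.
    return getattr(token, attr).key


scores = Scorer.score_token_attr_per_feat(
    [Example(predicted, reference)], "morph", getter=morph_key_getter
)

# One entry per feature name, each with precision, recall and F-score.
for feat, prf in sorted(scores["morph_per_feat"].items()):
    print(feat, prf)
```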

Scorer.score_spans staticmethod v3.0

Returns PRF scores for labeled or unlabeled spans.

Name | Description | Type
examples | The Example objects holding both the predictions and the correct gold-standard annotations. | Iterable[Example]
attr | The attribute to score. | str
keyword-only
getter | Defaults to getattr. If provided, getter(doc, attr) should return the Span objects for an individual Doc. | Callable[[Doc, str], Iterable[Span]]
has_annotation | Defaults to None. If provided, has_annotation(doc) should return whether a Doc has annotation for this attr. Docs without annotation are skipped for scoring purposes. | Optional[Callable[[Doc], bool]]
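
To illustrate span scoring on named entities, the sketch below compares a prediction that finds one of two gold entities. The sentence and labels are made up, and spaCy v3 is assumed.

```python
from spacy.scorer import Scorer
from spacy.tokens import Doc, Span
from spacy.training import Example
from spacy.vocab import Vocab

vocab = Vocab()
words = ["Alice", "visited", "Paris", "."]
spaces = [True, True, False, False]

# Gold annotation: two entities.
reference = Doc(vocab, words=words, spaces=spaces)
reference.ents = [
    Span(reference, 0, 1, label="PERSON"),
    Span(reference, 2, 3, label="GPE"),
]

# Prediction: only the first entity is found.
predicted = Doc(vocab, words=words, spaces=spaces)
predicted.ents = [Span(predicted, 0, 1, label="PERSON")]

# With attr="ents", the default getter reads doc.ents.
scores = Scorer.score_spans([Example(predicted, reference)], "ents")
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
print(scores["ents_per_type"])
```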

Scorer.score_deps staticmethod v3.0

Calculate the UAS, LAS, and LAS per type scores for dependency parses. Tokens with missing values for the attr (typically dep) are skipped during scoring.

Name | Description | Type
examples | The Example objects holding both the predictions and the correct gold-standard annotations. | Iterable[Example]
attr | The attribute to score. | str
keyword-only
getter | Defaults to getattr. If provided, getter(token, attr) should return the value of the attribute for an individual Token. | Callable[[Token, str], Any]
head_attr | The attribute containing the head token. | str
head_getter | Defaults to getattr. If provided, head_getter(token, attr) should return the head for an individual Token. | Callable[[Token, str], Token]
ignore_labels | Labels to ignore while scoring (e.g. "punct"). | Iterable[str]
missing_values | Attribute values to treat as missing annotation in the reference annotation. Defaults to {0, None, ""}. | Set[Any]
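
A sketch of dependency scoring, assuming spaCy v3. The helper dep_getter is a hypothetical name introduced here; it returns the lowercased string label so that string entries in ignore_labels can match. The parse is invented, and the prediction mislabels one arc while keeping all heads correct.

```python
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

vocab = Vocab()
words = ["She", "eats", "apples", "."]
spaces = [True, True, False, False]
heads = [1, 1, 1, 1]  # absolute index of each token's head

reference = Doc(
    vocab, words=words, spaces=spaces, heads=heads,
    deps=["nsubj", "ROOT", "dobj", "punct"],
)
# Same heads, but "apples" gets the wrong label.
predicted = Doc(
    vocab, words=words, spaces=spaces, heads=heads,
    deps=["nsubj", "ROOT", "nsubj", "punct"],
)


def dep_getter(token, attr):
    # Hypothetical helper: read the string label (e.g. token.dep_), lowercased.
    return getattr(token, attr + "_").lower()


scores = Scorer.score_deps(
    [Example(predicted, reference)],
    "dep",
    getter=dep_getter,
    head_attr="head",
    ignore_labels=("punct",),
)
print(scores["dep_uas"], scores["dep_las"])
print(scores["dep_las_per_type"])
```

Because every head is correct, the unlabeled score is unaffected by the wrong label, while the labeled score drops.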

Scorer.score_cats staticmethod v3.0

Calculate PRF and ROC AUC scores for a doc-level attribute that is a dict containing scores for each label, like Doc.cats. The returned dictionary contains the following scores:

  • {attr}_micro_p, {attr}_micro_r and {attr}_micro_f: each instance across each label is weighted equally
  • {attr}_macro_p, {attr}_macro_r and {attr}_macro_f: the average values across evaluations per label
  • {attr}_f_per_type and {attr}_auc_per_type: each contains a dictionary of scores, keyed by label
  • A final {attr}_score and corresponding {attr}_score_desc (text description)

The reported {attr}_score depends on the classification properties:

  • binary exclusive with positive label: {attr}_score is set to the F-score of the positive label
  • 3+ exclusive classes, macro-averaged F-score: {attr}_score = {attr}_macro_f
  • multilabel, macro-averaged AUC: {attr}_score = {attr}_macro_auc

Name | Description | Type
examples | The Example objects holding both the predictions and the correct gold-standard annotations. | Iterable[Example]
attr | The attribute to score. | str
keyword-only
getter | Defaults to getattr. If provided, getter(doc, attr) should return the cats for an individual Doc. | Callable[[Doc, str], Dict[str, float]]
labels | The set of possible labels. Defaults to []. | Iterable[str]
multi_label | Whether the attribute allows multiple labels. Defaults to True. | bool
positive_label | The positive label for a binary task with exclusive classes. Defaults to None. | Optional[str]
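
A sketch of the binary exclusive case with a positive label, assuming spaCy v3. The labels POS and NEG, the texts, and the scores are invented for illustration. Two examples with different gold labels are used so that the ROC AUC is defined.

```python
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

vocab = Vocab()


def make_example(words, spaces, pred_cats, gold_cats):
    # Hypothetical helper: build an Example from predicted and gold cats.
    predicted = Doc(vocab, words=words, spaces=spaces)
    predicted.cats = pred_cats
    reference = Doc(vocab, words=words, spaces=spaces)
    reference.cats = gold_cats
    return Example(predicted, reference)


examples = [
    make_example(
        ["Great", "film"], [True, False],
        {"POS": 0.9, "NEG": 0.1}, {"POS": 1.0, "NEG": 0.0},
    ),
    make_example(
        ["Dull", "plot"], [True, False],
        {"POS": 0.3, "NEG": 0.7}, {"POS": 0.0, "NEG": 1.0},
    ),
]

scores = Scorer.score_cats(
    examples,
    "cats",
    labels=["POS", "NEG"],
    multi_label=False,
    positive_label="POS",
)

# For this setting, cats_score is the F-score of the positive label.
print(scores["cats_score"], scores["cats_score_desc"])
print(scores["cats_f_per_type"])
```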