Pipeline Functions
merge_noun_chunks function
Merge noun chunks into a single token. Also available via the string name
"merge_noun_chunks". After initialization, the component is typically added to
the processing pipeline using nlp.add_pipe.
| Name | Type | Description |
|---|---|---|
| doc | Doc | The Doc object to process, e.g. the Doc in the pipeline. |
| RETURNS | Doc | The modified Doc with merged noun chunks. |
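
As a minimal sketch of the behavior, the function can also be called directly on a Doc. Noun chunks depend on part-of-speech tags and a dependency parse, so this example assumes spaCy v3, whose Doc constructor accepts pos, heads and deps, and supplies hand-written annotations in place of a trained tagger and parser. The sentence and its annotations are illustrative only.

```python
import spacy
from spacy.pipeline import merge_noun_chunks
from spacy.tokens import Doc

# A blank English pipeline provides the vocab and the English
# noun-chunk rules without requiring a trained model.
nlp = spacy.blank("en")

# Hand-written annotations (assumption: spaCy v3 Doc constructor,
# where heads are absolute token indices).
doc = Doc(
    nlp.vocab,
    words=["The", "quick", "fox", "jumps"],
    pos=["DET", "ADJ", "NOUN", "VERB"],
    heads=[2, 2, 3, 3],
    deps=["det", "amod", "nsubj", "ROOT"],
)

# Call the pipeline function directly on the Doc.
doc = merge_noun_chunks(doc)
tokens = [token.text for token in doc]
print(tokens)  # ['The quick fox', 'jumps']
```

In a real pipeline, the component would be added after the tagger and parser so that the annotations it relies on are already set.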
merge_entities function
Merge named entities into a single token. Also available via the string name
"merge_entities". After initialization, the component is typically added to
the processing pipeline using nlp.add_pipe.
| Name | Type | Description |
|---|---|---|
| doc | Doc | The Doc object to process, e.g. the Doc in the pipeline. |
| RETURNS | Doc | The modified Doc with merged entities. |
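
The following sketch illustrates the effect by calling the function directly on a Doc. To avoid depending on a trained model, it assumes a blank English pipeline and sets a single entity span by hand; the sentence and the "PERSON" label are illustrative choices, not part of the API.

```python
import spacy
from spacy.pipeline import merge_entities
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no trained entity recognizer.
nlp = spacy.blank("en")
doc = nlp("I like David Bowie")

# Mark tokens 2-3 ("David Bowie") as an entity by hand
# (in practice this would come from the entity recognizer).
doc.ents = [Span(doc, 2, 4, label="PERSON")]

# Each entity span is merged into one token.
doc = merge_entities(doc)
tokens = [token.text for token in doc]
print(tokens)  # ['I', 'like', 'David Bowie']
```

In a full pipeline, the component would normally run after the entity recognizer, since it merges whatever spans are present in doc.ents at that point.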
merge_subtokens function v2.1
Merge subtokens into a single token. Also available via the string name
"merge_subtokens". After initialization, the component is typically added to
the processing pipeline using nlp.add_pipe.
As of v2.1, the parser can predict “subtokens” that should later be merged into
a single token. This is especially relevant for languages like
Chinese, Japanese or Korean, where a “word” isn’t defined as a
whitespace-delimited sequence of characters. Under the hood, this component uses
the Matcher to find sequences of tokens with the dependency
label "subtok" and then merges them into a single token.
| Name | Type | Description |
|---|---|---|
| doc | Doc | The Doc object to process, e.g. the Doc in the pipeline. |
| label | unicode | The subtoken dependency label. Defaults to "subtok". |
| RETURNS | Doc | The modified Doc with merged subtokens. |
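
The merging step described above can be sketched with a hand-annotated Doc. This example assumes spaCy v3, whose Doc constructor accepts heads and deps, and uses a made-up split of "Washington" into two pieces to stand in for parser output; a trained parser would normally assign the "subtok" label.

```python
import spacy
from spacy.pipeline import merge_subtokens
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hypothetical parser output: "Wash" is labeled as a subtoken of the
# token that follows it (assumption: spaCy v3 Doc constructor, where
# heads are absolute token indices).
doc = Doc(
    nlp.vocab,
    words=["Wash", "ington", "wins"],
    spaces=[False, True, False],
    heads=[1, 2, 2],
    deps=["subtok", "nsubj", "ROOT"],
)

# Runs of "subtok" tokens are merged together with the token that follows.
doc = merge_subtokens(doc, label="subtok")
tokens = [token.text for token in doc]
print(tokens)  # ['Washington', 'wins']
```

Passing a different value for label would merge tokens carrying that dependency label instead, which may be useful if a model uses its own name for subtokens.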

