Cython Structs

C-language objects that let you group variables together

TokenC C struct

Cython data container for the Token object.

Name	Type	Description
`lex`	`const LexemeC*`	A pointer to the lexeme for the token.
`morph`	`uint64_t`	An ID allowing lookup of morphological attributes.
`pos`	`univ_pos_t`	Coarse-grained part-of-speech tag.
`spacy`	`bint`	A binary value indicating whether the token has trailing whitespace.
`tag`	`attr_t`	Fine-grained part-of-speech tag.
`idx`	`int`	The character offset of the token within the parent document.
`lemma`	`attr_t`	Base form of the token, with no inflectional suffixes.
`sense`	`attr_t`	Space for storing a word sense ID, currently unused.
`head`	`int`	Offset of the syntactic parent relative to the token.
`dep`	`attr_t`	Syntactic dependency relation.
`l_kids`	`uint32_t`	Number of left children.
`r_kids`	`uint32_t`	Number of right children.
`l_edge`	`uint32_t`	Offset of the leftmost token of this token’s syntactic descendants.
`r_edge`	`uint32_t`	Offset of the rightmost token of this token’s syntactic descendants.
`sent_start`	`int`	Ternary value indicating whether the token is the first word of a sentence. `0` indicates a missing value, `-1` indicates `False` and `1` indicates `True`. The default value, `0`, is interpreted as no sentence break. Sentence boundary detectors will usually set `0` for all tokens except those that follow a sentence boundary.
`ent_iob`	`int`	IOB code of the named entity tag. `0` indicates a missing value, `1` indicates `I`, `2` indicates `O` and `3` indicates `B`.
`ent_type`	`attr_t`	Named entity type.
`ent_id`	`attr_t`	ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution.
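
The integer codes for `sent_start` and `ent_iob` are easy to misread, so the following pure-Python sketch spells out the mappings described in the table. The lookup tables and function names are hypothetical helpers written for this illustration; they are not part of spaCy's API.

```python
# Hypothetical helpers that decode the integer codes stored on a TokenC struct.
# The mappings follow the table above; nothing here is spaCy API.
SENT_START_MEANING = {0: None, -1: False, 1: True}
ENT_IOB_MEANING = {0: "", 1: "I", 2: "O", 3: "B"}


def decode_sent_start(value):
    """Map the ternary sent_start code to None (missing), False or True."""
    return SENT_START_MEANING[value]


def decode_ent_iob(code):
    """Map the integer IOB code to its letter; "" stands for a missing value."""
    return ENT_IOB_MEANING[code]


def starts_sentence(value):
    """The default 0 is read as 'no sentence break', so only 1 starts a sentence."""
    return value == 1
```

Note that a missing `sent_start` value and an explicit `False` are distinct codes, even though both are treated as "no sentence break" when deciding where sentences begin.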

Token.get_struct_attr staticmethod nogil

Get the value of an attribute from the TokenC struct by attribute ID.

Name	Type	Description
`token`	`const TokenC*`	A pointer to a `TokenC` struct.
`feat_name`	`attr_id_t`	The ID of the attribute to look up. The attributes are enumerated in `spacy.attrs`.
RETURNS	`attr_t`	The value of the attribute.

Token.set_struct_attr staticmethod nogil

Set the value of an attribute of the TokenC struct by attribute ID.

Name	Type	Description
`token`	`TokenC*`	A pointer to a `TokenC` struct. The pointer is not `const`, because the struct is modified.
`feat_name`	`attr_id_t`	The ID of the attribute to set. The attributes are enumerated in `spacy.attrs`.
`value`	`attr_t`	The value to set.
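
As a rough mental model of attribute access by ID, the sketch below mirrors the getter and setter in pure Python. `AttrId`, `TokenModel` and the two helper functions are hypothetical names invented for this illustration, and the numeric IDs are placeholders rather than spaCy's real enum values.

```python
from dataclasses import dataclass
from enum import IntEnum


class AttrId(IntEnum):
    # Placeholder IDs for illustration only; spaCy defines its own enum values.
    LEMMA = 1
    TAG = 2
    DEP = 3


# Which struct field each attribute ID refers to in this model.
_FIELD_FOR_ATTR = {AttrId.LEMMA: "lemma", AttrId.TAG: "tag", AttrId.DEP: "dep"}


@dataclass
class TokenModel:
    """A stand-in for a few TokenC fields, all stored as integer IDs."""
    lemma: int = 0
    tag: int = 0
    dep: int = 0


def get_attr_sketch(token, feat_name):
    """Return the field selected by feat_name."""
    return getattr(token, _FIELD_FOR_ATTR[feat_name])


def set_attr_sketch(token, feat_name, value):
    """Overwrite the field selected by feat_name."""
    setattr(token, _FIELD_FOR_ATTR[feat_name], value)
```

The point of the ID-based interface is that calling code can select a field at run time with a single integer, instead of hard-coding the field name.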

token_by_start function

Find a token in a TokenC* array by the offset of its first character.

Example

from spacy.tokens.doc cimport Doc, token_by_start
from spacy.vocab cimport Vocab

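# The text is "hello world": "hello" starts at offset 0, "world" at offset 6.
cdef Doc doc = Doc(Vocab(), words=["hello", "world"])
assert token_by_start(doc.c, doc.length, 6) == 1   # "world" starts at offset 6
assert token_by_start(doc.c, doc.length, 4) == -1  # no token starts at offset 4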

Name	Type	Description
`tokens`	`const TokenC*`	A `TokenC*` array.
`length`	`int`	The number of tokens in the array.
`start_char`	`int`	The start index to search for.
RETURNS	`int`	The index of the token in the array or `-1` if not found.

token_by_end function

Find a token in a TokenC* array by its end offset, i.e. the character offset immediately after its final character.

Example

from spacy.tokens.doc cimport Doc, token_by_end
from spacy.vocab cimport Vocab

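# The text is "hello world": "hello" ends at offset 5, "world" at offset 11.
cdef Doc doc = Doc(Vocab(), words=["hello", "world"])
assert token_by_end(doc.c, doc.length, 5) == 0   # "hello" ends at offset 5
assert token_by_end(doc.c, doc.length, 1) == -1  # no token ends at offset 1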

Name	Type	Description
`tokens`	`const TokenC*`	A `TokenC*` array.
`length`	`int`	The number of tokens in the array.
`end_char`	`int`	The end index to search for.
RETURNS	`int`	The index of the token in the array or `-1` if not found.
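
To make the offset conventions concrete, here is a minimal pure-Python sketch of both lookups as a linear scan. It is an illustration under stated assumptions, not the library's implementation: `TokenModel`, `make_tokens` and the `*_sketch` functions are hypothetical names, words are assumed to be separated by single spaces, and a token's end offset is modelled as `idx` plus its length in characters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenModel:
    idx: int     # character offset of the token within the document
    length: int  # number of characters in the token


def make_tokens(words: List[str]) -> List[TokenModel]:
    """Lay out words separated by single spaces and record their offsets."""
    tokens, offset = [], 0
    for word in words:
        tokens.append(TokenModel(idx=offset, length=len(word)))
        offset += len(word) + 1  # one trailing space after each word
    return tokens


def token_by_start_sketch(tokens: List[TokenModel], start_char: int) -> int:
    """Index of the token whose first character is at start_char, else -1."""
    for i, tok in enumerate(tokens):
        if tok.idx == start_char:
            return i
    return -1


def token_by_end_sketch(tokens: List[TokenModel], end_char: int) -> int:
    """Index of the token that ends at end_char (exclusive offset), else -1."""
    for i, tok in enumerate(tokens):
        if tok.idx + tok.length == end_char:
            return i
    return -1
```

With `make_tokens(["hello", "world"])`, the sketch reproduces the four assertions from the two examples: start offset 6 and end offset 5 each match a token, while offsets 4 and 1 fall inside a token and give `-1`.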

set_children_from_heads function

Set attributes that allow lookup of syntactic children on a TokenC* array. This function must be called after making changes to the TokenC.head attribute, in order to make the parse tree navigation consistent.

Example

from spacy.tokens.doc cimport Doc, set_children_from_heads
from spacy.vocab cimport Vocab

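cdef Doc doc = Doc(Vocab(), words=["Baileys", "from", "a", "shoe"])
# TokenC.head is an offset relative to the token itself, not an absolute index.
doc.c[0].head = 0   # "Baileys" is the root
doc.c[1].head = -1  # "from" attaches to "Baileys"
doc.c[2].head = 1   # "a" attaches to "shoe"
doc.c[3].head = -2  # "shoe" attaches to "from"
set_children_from_heads(doc.c, doc.length)
assert doc.c[3].l_kids == 1  # "a" is the only left child of "shoe"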

Name	Type	Description
`tokens`	`TokenC*`	A `TokenC*` array. The pointer is not `const`, because the structs are modified.
`length`	`int`	The number of tokens in the array.
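
The bookkeeping this function performs can be sketched in pure Python. The version below is an illustration under stated assumptions rather than spaCy's implementation: `children_from_heads_sketch` is a hypothetical name, heads are given as offsets relative to each token (with `0` marking a root), and the edges are returned as absolute token indices for readability.

```python
def children_from_heads_sketch(heads):
    """Derive child counts and subtree edges from relative head offsets.

    heads[i] is the offset of token i's parent relative to i (0 for a root).
    Returns four lists indexed by token: l_kids, r_kids, l_edge, r_edge.
    """
    n = len(heads)
    l_kids = [0] * n
    r_kids = [0] * n
    l_edge = list(range(n))  # every token initially spans only itself
    r_edge = list(range(n))
    for child, offset in enumerate(heads):
        parent = child + offset
        if parent > child:
            l_kids[parent] += 1  # the child sits to the left of its parent
        elif parent < child:
            r_kids[parent] += 1  # the child sits to the right of its parent
        # Widen the span of every ancestor so that it covers this token.
        node = child
        for _ in range(n):  # bounded walk, so a malformed cycle cannot loop forever
            if heads[node] == 0:
                break
            node += heads[node]
            l_edge[node] = min(l_edge[node], child)
            r_edge[node] = max(r_edge[node], child)
    return l_kids, r_kids, l_edge, r_edge
```

For the tree in which "from" attaches to "Baileys", "shoe" to "from" and "a" to "shoe", the relative heads are `[0, -1, 1, -2]`, and the sketch reports one left child for "shoe" and a subtree for "Baileys" that spans all four tokens.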

LexemeC C struct

Struct holding information about a lexical type. LexemeC structs are usually owned by the Vocab, and accessed through a read-only pointer on the TokenC struct.

Name	Type	Description
`flags`	`flags_t`	Bit-field for binary lexical flag values.
`id`	`attr_t`	Usually used to map lexemes to rows in a matrix, e.g. for word vectors. Does not need to be unique, so currently misnamed.
`length`	`attr_t`	Number of Unicode characters in the lexeme.
`orth`	`attr_t`	ID of the verbatim text content.
`lower`	`attr_t`	ID of the lowercase form of the lexeme.
`norm`	`attr_t`	ID of the lexeme’s norm, i.e. a normalized form of the text.
`shape`	`attr_t`	Transform of the lexeme’s string, to show orthographic features.
`prefix`	`attr_t`	Length-N substring from the start of the lexeme. Defaults to `N=1`.
`suffix`	`attr_t`	Length-N substring from the end of the lexeme. Defaults to `N=3`.
`cluster`	`attr_t`	Brown cluster ID.
`prob`	`float`	Smoothed log probability estimate of the lexeme’s word type (context-independent entry in the vocabulary).
`sentiment`	`float`	A scalar value indicating positivity or negativity.

Lexeme.get_struct_attr staticmethod nogil

Get the value of an attribute from the LexemeC struct by attribute ID.

Name	Type	Description
`lex`	`const LexemeC*`	A pointer to a `LexemeC` struct.
`feat_name`	`attr_id_t`	The ID of the attribute to look up. The attributes are enumerated in `spacy.attrs`.
RETURNS	`attr_t`	The value of the attribute.

Lexeme.set_struct_attr staticmethod nogil

Set the value of an attribute of the LexemeC struct by attribute ID.

Name	Type	Description
`lex`	`LexemeC*`	A pointer to a `LexemeC` struct. The pointer is not `const`, because the struct is modified.
`feat_name`	`attr_id_t`	The ID of the attribute to set. The attributes are enumerated in `spacy.attrs`.
`value`	`attr_t`	The value to set.

Lexeme.c_check_flag staticmethod nogil

Check the value of a binary flag attribute.

Name	Type	Description
`lexeme`	`const LexemeC*`	A pointer to a `LexemeC` struct.
`flag_id`	`attr_id_t`	The ID of the flag to look up. The flag IDs are enumerated in `spacy.attrs`.
RETURNS	`bint`	The boolean value of the flag.

Lexeme.c_set_flag staticmethod nogil

Set the value of a binary flag attribute.

Name	Type	Description
`lexeme`	`LexemeC*`	A pointer to a `LexemeC` struct. The pointer is not `const`, because the struct is modified.
`flag_id`	`attr_id_t`	The ID of the flag to set. The flag IDs are enumerated in `spacy.attrs`.
`value`	`bint`	The value to set.
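
Since `flags` is described as a bit-field, checking and setting a flag comes down to masking a single bit. The pure-Python sketch below illustrates that idea. It assumes a 64-bit unsigned field and uses the flag ID directly as the bit position; both are assumptions of this illustration, and `check_flag_sketch` and `set_flag_sketch` are hypothetical names.

```python
MASK_64 = (1 << 64) - 1  # keep results within an unsigned 64-bit field


def check_flag_sketch(flags: int, flag_id: int) -> bool:
    """Return True if the bit at position flag_id is set."""
    return bool(flags & (1 << flag_id))


def set_flag_sketch(flags: int, flag_id: int, value: bool) -> int:
    """Return a copy of flags with the bit at position flag_id set or cleared."""
    bit = 1 << flag_id
    if value:
        return (flags | bit) & MASK_64
    return flags & ~bit & MASK_64
```

Unlike the C function, which writes through the pointer, the sketch returns the updated bit-field, because Python integers are immutable.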
