Cython Classes
Doc cdef class
The Doc object holds an array of TokenC structs.
Attributes
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. |
| vocab | Vocab | A reference to the shared Vocab object. |
| c | TokenC* | A pointer to a TokenC struct. |
| length | int | The number of tokens in the document. |
| max_length | int | The underlying size of the Doc.c array. |
Doc.push_back method
Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython’s fused types.
| Name | Type | Description |
|---|---|---|
| lex_or_tok | LexemeOrToken | The word to append to the Doc. |
| has_space | bint | Whether the word has trailing whitespace. |
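
As a rough mental model of how length, max_length and push_back relate, the pure-Python sketch below mimics a growable token array. ToyDoc, its fields and the capacity-doubling strategy are assumptions made for illustration only; the real Doc operates on C structs and its growth policy is not described here.

```python
class ToyDoc:
    """Toy stand-in for a Doc backed by a fixed-capacity token array."""

    def __init__(self, max_length=2):
        # Pre-allocated slots play the role of the Doc.c array.
        self.c = [None] * max_length
        self.length = 0               # number of tokens actually stored
        self.max_length = max_length  # capacity of the underlying array

    def push_back(self, word, has_space):
        # Grow the array when it is full (doubling is an assumed strategy).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Rebuild the text from the words and their trailing-space flags.
        used = self.c[:self.length]
        return "".join(w + (" " if sp else "") for w, sp in used)


doc = ToyDoc()
for word, has_space in [("Hello", True), ("world", False), ("!", False)]:
    doc.push_back(word, has_space)
```

After the three appends, doc.length is 3 while doc.max_length is 4: the number of tokens and the size of the backing array are tracked separately, which is the distinction the two attributes above capture.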
Token cdef class
A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.
Attributes
| Name | Type | Description |
|---|---|---|
| vocab | Vocab | A reference to the shared Vocab object. |
| c | TokenC* | A pointer to a TokenC struct. |
| i | int | The offset of the token within the document. |
| doc | Doc | The parent document. |
Token.cinit method
Create a Token object from a TokenC* pointer.
| Name | Type | Description |
|---|---|---|
| vocab | Vocab | A reference to the shared Vocab. |
| c | TokenC* | A pointer to a TokenC struct. |
| offset | int | The offset of the token within the document. |
| doc | Doc | The parent document. |
| RETURNS | Token | The newly constructed object. |
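
The point that a Token does not own its struct can be illustrated with a small pure-Python analogy. ToyToken and the plain list standing in for the token array are hypothetical; the sketch only shows what a non-owning view implies, not how the Cython class is implemented.

```python
class ToyToken:
    """Toy view onto one slot of a shared token array; it stores no token data."""

    def __init__(self, tokens, offset):
        self._tokens = tokens  # reference to the parent's array, not a copy
        self.i = offset        # position of the token within the document

    @property
    def text(self):
        # Always read through to the parent's storage.
        return self._tokens[self.i]


tokens = ["Hello", "world"]
view = ToyToken(tokens, 1)
before = view.text
tokens[1] = "there"   # mutate the parent's storage
after = view.text     # the view reflects the change because it owns nothing
```

Because the view only points into storage held elsewhere, it stays meaningful only as long as that storage does.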
Span cdef class
A Cython class providing access and methods for a slice of a Doc object.
Attributes
| Name | Type | Description |
|---|---|---|
| doc | Doc | The parent document. |
| start | int | The index of the first token of the span. |
| end | int | The index of the first token after the span. |
| start_char | int | The index of the first character of the span. |
| end_char | int | The index of the first character after the span. |
| label | attr_t | A label to attach to the span, e.g. for named entities. |
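
To make the relationship between token indices and character offsets concrete, here is a small pure-Python sketch. It assumes that end_char, like end, is exclusive, so that slicing the document text with the two offsets yields the span text. The words list and the span_offsets helper are hypothetical and exist only for this illustration.

```python
# Tokens as (text, has_trailing_space) pairs.
words = [("Give", True), ("it", True), ("back", False), ("!", False)]

# Character offset at which each token starts.
starts = []
pos = 0
for text, has_space in words:
    starts.append(pos)
    pos += len(text) + (1 if has_space else 0)

doc_text = "".join(t + (" " if sp else "") for t, sp in words)


def span_offsets(start, end):
    """Return (start_char, end_char) for the token slice [start, end)."""
    last_text, _ = words[end - 1]
    start_char = starts[start]
    # Exclusive end: one past the last character of the last token.
    end_char = starts[end - 1] + len(last_text)
    return start_char, end_char


start_char, end_char = span_offsets(1, 3)  # tokens "it" and "back"
span_text = doc_text[start_char:end_char]
```

Note that the trailing space of the last token is not part of the span under this convention.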
Lexeme cdef class
A Cython class providing access and methods for an entry in the vocabulary.
Attributes
| Name | Type | Description |
|---|---|---|
| c | LexemeC* | A pointer to a LexemeC struct. |
| vocab | Vocab | A reference to the shared Vocab object. |
| orth | attr_t | ID of the verbatim text content. |
Vocab cdef class
A Cython class providing access and methods for a vocabulary and other data shared across a language.
Attributes
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| strings | StringStore | A StringStore that maps strings to hash values and vice versa. |
| length | int | The number of entries in the vocabulary. |
Vocab.get method
Retrieve a LexemeC* pointer from the vocabulary.
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| string | unicode | The string of the word to look up. |
| RETURNS | const LexemeC* | The lexeme in the vocabulary. |
Vocab.get_by_orth method
Retrieve a LexemeC* pointer from the vocabulary by its orth ID.
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| orth | attr_t | ID of the verbatim text content. |
| RETURNS | const LexemeC* | The lexeme in the vocabulary. |
StringStore cdef class
A lookup table to retrieve strings by 64-bit hashes.
Attributes
| Name | Type | Description |
|---|---|---|
| mem | cymem.Pool | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. |
| keys | vector[hash_t] | A list of hash values in the StringStore. |
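
The idea of a two-way lookup between strings and 64-bit hashes can be sketched in pure Python as follows. ToyStringStore, hash_string and the use of BLAKE2b are assumptions chosen for this illustration: the hash function used by the real StringStore is not specified here, and this sketch will not produce the same hash values.

```python
import hashlib


def hash_string(string):
    """Map a string to a 64-bit integer (BLAKE2b is a stand-in hash)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    def __init__(self):
        self._map = {}  # hash -> string
        self.keys = []  # hashes in insertion order

    def add(self, string):
        key = hash_string(string)
        if key not in self._map:
            self._map[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # Strings map to hashes, hashes map back to strings.
        if isinstance(item, str):
            return hash_string(item)
        return self._map[item]


store = ToyStringStore()
key = store.add("coffee")
store.add("coffee")  # adding the same string again does not create a new key
```

Looking up store[key] returns "coffee" and store["coffee"] returns the same key, so a single integer can stand in for the string wherever a fixed-size ID is more convenient.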

