Cython Classes

Doc cdef class

The Doc object holds an array of TokenC structs.

Attributes

Name | Type | Description
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Doc object is garbage collected.
vocab | Vocab | A reference to the shared Vocab object.
c | TokenC* | A pointer to a TokenC struct.
length | int | The number of tokens in the document.
max_length | int | The underlying size of the Doc.c array.

Doc.push_back method

Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython’s fused types.

Name | Type | Description
lex_or_tok | LexemeOrToken | The word to append to the Doc.
has_space | bint | Whether the word has trailing whitespace.
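The relationship between length and max_length can be illustrated with a small pure-Python model. This is only a sketch: TokenArray, its dictionary records, and the doubling growth policy are assumptions made for illustration, not spaCy's Cython implementation.

```python
class TokenArray:
    """Toy stand-in for the Doc.c array; not spaCy's implementation."""

    def __init__(self, initial_capacity=2):
        self.max_length = initial_capacity      # slots currently allocated
        self.length = 0                         # slots currently in use
        self._slots = [None] * self.max_length

    def push_back(self, word, has_space):
        # Grow the buffer when it is full. Doubling is an assumption of this sketch.
        if self.length == self.max_length:
            self._slots.extend([None] * self.max_length)
            self.max_length *= 2
        self._slots[self.length] = {"orth": word, "spacy": has_space}
        self.length += 1

    @property
    def text(self):
        # Rebuild the text from the used slots and their trailing-space flags.
        used = self._slots[: self.length]
        return "".join(t["orth"] + (" " if t["spacy"] else "") for t in used)


arr = TokenArray()
for word, space in [("Hello", True), ("world", False), ("!", False)]:
    arr.push_back(word, space)
```

After the three appends, arr.length counts the tokens in use while arr.max_length reports the larger allocated capacity, and the has_space flag alone decides whether a space follows each word in arr.text.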

Token cdef class

A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.

Attributes

Name | Type | Description
vocab | Vocab | A reference to the shared Vocab object.
c | TokenC* | A pointer to a TokenC struct.
i | int | The offset of the token within the document.
doc | Doc | The parent document.

Token.cinit method

Create a Token object from a TokenC* pointer.

Name | Type | Description
vocab | Vocab | A reference to the shared Vocab.
c | TokenC* | A pointer to a TokenC struct.
offset | int | The offset of the token within the document.
doc | Doc | The parent document.
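The idea that a Token only borrows its struct can be sketched in plain Python. ToyDoc and ToyToken are hypothetical names, the vocab argument is left out for brevity, and a dictionary stands in for the TokenC struct, so treat this as an illustration of the ownership pattern rather than the real class.

```python
class ToyDoc:
    """Owns the token records (stand-in for the TokenC array)."""

    def __init__(self, words):
        self.c = [{"orth": w} for w in words]
        self.length = len(self.c)


class ToyToken:
    """Borrowed view: keeps a reference to a record, never a copy."""

    def __init__(self, c, offset, doc):
        self.c = c          # reference to the record owned by the doc
        self.i = offset     # position of the token within the document
        self.doc = doc      # keeps the owner alive while the view exists

    @property
    def text(self):
        return self.c["orth"]


doc = ToyDoc(["spaCy", "is", "fast"])
token = ToyToken(doc.c[2], 2, doc)
doc.c[2]["orth"] = "quick"   # a change made through the owner...
current_text = token.text    # ...is visible through the borrowed view
```

Because the view shares the record instead of copying it, holding a reference to the parent document is what keeps the underlying data valid for as long as the token is in use.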

Span cdef class

A Cython class providing access and methods for a slice of a Doc object.

Attributes

Name | Type | Description
doc | Doc | The parent document.
start | int | The index of the first token of the span.
end | int | The index of the first token after the span.
start_char | int | The index of the first character of the span.
end_char | int | The character offset at which the span ends, i.e. the index just past its last character.
label | attr_t | A label to attach to the span, e.g. for named entities.
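The token and character boundaries of a span can be worked through with a short example. This sketch uses plain Python lists, assumes single spaces between words, and treats end_char as the offset just past the last character so that slicing the text yields the span; span_bounds is a hypothetical helper, not part of spaCy.

```python
words = ["Apple", "is", "looking", "at", "buying", "a", "startup"]

# Character offset of each token, assuming single spaces between words.
offsets = []
pos = 0
for w in words:
    offsets.append(pos)
    pos += len(w) + 1
text = " ".join(words)


def span_bounds(start, end):
    """Return (start_char, end_char) for the token slice [start, end)."""
    start_char = offsets[start]
    end_char = offsets[end - 1] + len(words[end - 1])
    return start_char, end_char


start, end = 2, 5                       # tokens "looking", "at", "buying"
start_char, end_char = span_bounds(start, end)
span_text = text[start_char:end_char]
```

With half-open boundaries on both levels, end - start gives the number of tokens and text[start_char:end_char] gives the covered text without any off-by-one adjustment.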

Lexeme cdef class

A Cython class providing access and methods for an entry in the vocabulary.

Attributes

Name | Type | Description
c | LexemeC* | A pointer to a LexemeC struct.
vocab | Vocab | A reference to the shared Vocab object.
orth | attr_t | ID of the verbatim text content.

Vocab cdef class

A Cython class providing access and methods for a vocabulary and other data shared across a language.

Attributes

Name | Type | Description
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected.
strings | StringStore | A StringStore that maps strings to hash values and vice versa.
length | int | The number of entries in the vocabulary.

Vocab.get method

Retrieve a LexemeC* pointer from the vocabulary.

Name | Type | Description
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected.
string | unicode | The string of the word to look up.

Vocab.get_by_orth method

Retrieve a LexemeC* pointer from the vocabulary.

Name | Type | Description
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected.
orth | attr_t | ID of the verbatim text content.
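How the two lookups relate can be sketched with a toy vocabulary in pure Python. Everything here is an assumption made for illustration: ToyVocab and hash_string are hypothetical names, BLAKE2b stands in for whatever hash the library uses, the mem argument is omitted, and the sketch assumes a missing entry is created on first lookup.

```python
import hashlib


def hash_string(string):
    # 64-bit hash via BLAKE2b; a stand-in chosen for this sketch only.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    def __init__(self):
        self._by_orth = {}     # orth ID -> lexeme record
        self._strings = {}     # orth ID -> original string

    @property
    def length(self):
        return len(self._by_orth)

    def get_by_orth(self, orth):
        # Assumption: a missing entry is created on first lookup.
        if orth not in self._by_orth:
            self._by_orth[orth] = {"orth": orth}
        return self._by_orth[orth]

    def get(self, string):
        # Looking up by string reduces to hashing, then looking up by ID.
        orth = hash_string(string)
        self._strings[orth] = string
        return self.get_by_orth(orth)


vocab = ToyVocab()
first = vocab.get("hello")
second = vocab.get_by_orth(hash_string("hello"))
```

Both calls hand back the same record, which mirrors the idea that a lexeme is stored once and shared by every lookup path.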

StringStore cdef class

A lookup table to retrieve strings by 64-bit hashes.

Attributes

Name | Type | Description
mem | cymem.Pool | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected.
keys | vector[hash_t] | A list of hash values in the StringStore.
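A two-way lookup between strings and 64-bit hashes can be modelled in a few lines of Python. ToyStringStore, its add method, and the choice of BLAKE2b as the hash function are assumptions of this sketch; it is meant to show the shape of the data structure, not the library's actual hashing or storage.

```python
import hashlib


class ToyStringStore:
    def __init__(self):
        self.keys = []          # hashes in insertion order
        self._table = {}        # hash -> string

    @staticmethod
    def _hash(string):
        # 64-bit hash via BLAKE2b; a stand-in chosen for this sketch only.
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, string):
        key = self._hash(string)
        if key not in self._table:
            self._table[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # Strings map to hashes, hashes map back to strings.
        if isinstance(item, str):
            return self._hash(item)
        return self._table[item]


store = ToyStringStore()
coffee = store.add("coffee")
again = store.add("coffee")    # adding twice does not duplicate the key
```

Storing only the hash on each token or lexeme and resolving it through a shared table like this keeps per-item records small, at the cost of one extra lookup when the string itself is needed.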